Poojitha - Data Engineer
[email protected]
Location: Atlanta, Georgia, USA
Relocation: Yes
Visa: H1B
Resume file: Poojitha DE Resume_1772205797854.docx
Poojitha
Email: [email protected] | Phone: +1 (202) 642-4110
________________________________________
PROFESSIONAL SUMMARY:
Senior Data Engineer with over 10 years of hands-on experience building and supporting large-scale data platforms across finance, healthcare, insurance, and telecom. Known for translating complex business needs into reliable, production-ready data solutions that support analytics, reporting, and operational decision-making.
Experienced in designing and working with cloud-based data architectures across AWS, Azure, and GCP, using tools such as BigQuery, Dataflow, AWS Glue, Azure Synapse, and Snowflake to build scalable data lakes and warehouses while improving pipeline performance and managing infrastructure costs.
Strong background in distributed data processing and big data technologies including Apache Spark (batch and streaming), Kafka, Hadoop, and Airflow, supporting high-volume data processing, near real-time analytics, fraud detection, and machine learning data pipelines.
Well-rounded data engineering professional with experience across the full data lifecycle, including ingestion, transformation, modeling, validation, and governance using Python, SQL, Pandas, and PySpark, while supporting compliance with HIPAA, GDPR, and SEC regulations.
Played a key role in modernizing legacy on-premises data systems by supporting migrations to cloud-based data lakes and warehouses, building CI/CD pipelines using Jenkins and Kubernetes, and implementing monitoring and alerting solutions to improve platform reliability.
Hands-on experience building real-time and streaming data solutions using Kafka, Azure Event Hubs, and Spark Streaming to process high-velocity data from IoT systems, insurance claims, and financial transactions, enabling faster fraud detection and operational insights.
Supported integration of machine learning workflows into enterprise data platforms using Databricks MLflow, Vertex AI, TensorFlow, and Feast, helping streamline model deployment, feature management, and predictive analytics use cases.
Strong focus on data quality, monitoring, and operational stability, implementing validation and observability frameworks using Great Expectations, Splunk, Cloud Monitoring, and ELK Stack to quickly detect and resolve production data issues.
Contributed to improving platform scalability and cost efficiency by applying best practices such as partitioning, clustering, auto-scaling, and cloud cost monitoring across hybrid and multi-cloud environments.
Known as a collaborative team contributor who supports and mentors engineers on cloud architecture, CI/CD practices, data engineering design patterns, and API integrations including REST and FHIR to improve data accessibility for analytics and business teams.
TECHNICAL SKILLS:
Cloud Platforms: AWS (Glue, EMR, S3, Redshift, Lake Formation, IAM, CloudWatch), Azure (Synapse, Data Lake Gen2, Event Hubs, Sentinel, HDInsight), GCP (BigQuery, Dataflow, Dataproc, Cloud Storage, Composer, IAM, VPC)
Big Data & Distributed Processing: Apache Spark, Hadoop, Apache Kafka, Databricks, Azure HDInsight
Data Warehousing & Storage: Snowflake, Amazon Redshift, Azure Synapse, Data Lake Gen2, Amazon S3, Cloud Storage
Programming & Query Languages: Python, SQL, Scala, Pandas
Databases (SQL & NoSQL): PostgreSQL, MySQL, Oracle, MongoDB, Cassandra
ETL / Data Integration: AWS Glue, Talend, Apache NiFi, Cloud Dataflow
Streaming & Event Processing: Apache Kafka, Azure Event Hubs, Spark Streaming, Pub/Sub
APIs & Data Interoperability: REST APIs, Postman, FHIR
DevOps & CI/CD: Jenkins, GitHub Actions, Kubernetes
Monitoring & Observability: Prometheus, Splunk, ELK Stack, CloudWatch
ML & MLOps: TensorFlow, MLflow, Vertex AI, Feast
Governance & Security: Collibra, Lake Formation, Data Catalog, HIPAA, GDPR, SEC
Visualization: Tableau, Power BI, Matplotlib, Tableau Prep
Data Quality: Great Expectations, Validation Frameworks
PROFESSIONAL EXPERIENCE:
Client: State Street, MA
Role: Sr. Data Engineer | June 2025 to Present
Responsibilities:
Built and deployed scalable data pipelines on Google Cloud Platform using BigQuery, Dataflow, and Pub/Sub to process high volumes of structured and semi-structured financial data across batch and streaming workloads with low latency.
Implemented governance controls for AI workloads using IAM, Data Catalog, and metadata tagging to ensure explainability, traceability, and regulatory compliance for LLM-generated outputs in financial environments.
Developed preprocessing frameworks in Python to clean, tokenize, and structure unstructured data (PDFs, regulatory filings, operational reports) for downstream LLM inference and AI-driven insights.
Designed and developed scalable ETL/ELT pipelines using AWS Glue and PySpark, processing high-volume structured and semi-structured datasets into curated data lakes on Amazon S3.
Managed end-to-end data engineering solutions using Cloud Storage, Dataproc, and Cloud Composer, creating resilient pipelines and enabling smooth integration between cloud-native services and hybrid/on-premises systems.
Built distributed data transformation workflows using Azure Databricks (PySpark) to process large-scale datasets for analytics, reporting, and ML-ready data preparation.
Developed hybrid ingestion pipelines using AWS Glue and S3 to process external financial datasets and securely transfer curated data into BigQuery environments for advanced analytics and risk modeling.
Implemented event-driven ingestion pipelines using AWS Lambda, S3, and EventBridge, enabling near real-time data processing and automation workflows.
Optimized BigQuery storage and query patterns to support vector similarity searches and high-performance retrieval workloads for AI-driven financial insights.
Enabled secure LLM inference workflows by designing controlled data access layers using IAM and VPC Service Controls, ensuring sensitive financial data remained compliant during AI processing.
Created reusable and scalable data ingestion frameworks using Kafka, Pub/Sub, and Cloud Functions to process high-volume transaction streams while maintaining schema validation, data consistency, and regulatory compliance standards.
Optimized cross-cloud storage strategies by implementing lifecycle policies across AWS S3 and Azure Data Lake, improving cost efficiency and maintaining regulatory compliance.
Built scalable data preparation workflows using BigQuery and Dataflow to generate embeddings and vector-ready datasets for downstream LLM-powered search and anomaly detection solutions.
Established secure data access and sharing frameworks using IAM roles, service accounts, and VPC Service Controls, ensuring compliance with enterprise security and financial regulatory requirements while enabling cross-team collaboration.
Improved BigQuery performance through partitioning, clustering, materialized views, and query optimization techniques, supporting large-scale analytical workloads for real-time risk monitoring and fraud detection.
Enabled event-driven architectures using Azure Event Hubs and Kafka, supporting real-time transaction monitoring and fraud detection use cases across hybrid cloud environments.
Collaborated with data science and AI teams to support prompt engineering workflows and fine-tuning datasets for fraud detection and anomaly identification use cases.
Implemented metadata tracking and lineage controls for AI-generated outputs to ensure explainability, traceability, and regulatory compliance in financial AI solutions.
Enabled machine learning workflows by integrating Vertex AI with existing data pipelines, supporting automated model deployment, real-time and batch inference, and performance monitoring for financial anomaly detection.
________________________________________
Client: Rural Community Insurance Services, MN
Role: Sr. Data Engineer | Oct 2023 to May 2025
Responsibilities:
Developed scalable ETL and ELT pipelines using Apache Spark and AWS Glue to process high-volume insurance claims data, enabling near real-time ingestion and transformation for operational reporting and analytics.
Built and maintained cloud-based analytical data warehouses on Snowflake and Azure Synapse, implementing dimensional and analytical data models to support underwriting, actuarial insights, and enterprise reporting.
Delivered data integration and streaming solutions using Kafka and Azure Event Hubs, consolidating policy, claims, and telematics data to support fraud detection and advanced analytics.
Enhanced performance of complex SQL queries and Python-based transformations across large datasets in BigQuery, improving processing efficiency and enabling faster insights for premium pricing and claims forecasting.
Collaborated with actuarial and analytics teams to develop ML-ready datasets using Databricks, integrating feature engineering workflows and supporting TensorFlow-based model development for personalized insurance offerings.
Strengthened data governance and security practices using Collibra and AWS Lake Formation, implementing data classification, lineage tracking, and role-based access controls while meeting HIPAA, GDPR, and internal compliance standards.
Automated deployment and validation of data engineering workflows by developing CI/CD pipelines using Jenkins and Kubernetes, supporting seamless promotion across development, staging, and production environments.
Investigated and resolved production pipeline issues using Splunk and ELK Stack, identifying performance bottlenecks and improving overall system reliability and monitoring visibility.
Supported and guided junior data engineers through code reviews and knowledge sharing, encouraging best practices in cloud data architecture, Spark optimization, and pipeline design.
________________________________________
Client: Quest Diagnostics, NJ
Role: Sr. Data Engineer | Nov 2022 to Sep 2023
Responsibilities:
Managed and supported Cloudera Hadoop environments on Linux, handling cluster provisioning, configuration, patching, upgrades, and performance tuning to maintain stability and reliability of distributed data platforms.
Developed high-throughput ETL pipelines using AWS Glue and Apache Spark to aggregate laboratory test results from multiple clinical systems, enabling efficient ingestion, transformation, and delivery of diagnostic data.
Built and enhanced cloud-based data lakes using Amazon S3 and Azure Data Lake Gen2, implementing lifecycle management, tiered storage, and access control strategies to support advanced clinical analytics and long-term data retention.
Delivered real-time and near real-time streaming solutions using Apache Kafka and Azure Stream Analytics to track patient sample workflows, improving operational visibility and helping teams quickly identify processing bottlenecks.
Designed and maintained scalable analytical data models in Snowflake and Amazon Redshift, enabling complex SQL analytics for population health reporting, regulatory compliance, and enterprise clinical insights.
Integrated machine learning workflows using Databricks MLflow and Python, enabling automated anomaly detection on laboratory test datasets and improving early identification of data quality issues.
Strengthened data security and compliance by implementing HIPAA-aligned controls using IAM policies, role-based access, encryption standards, and Azure Sentinel monitoring to safeguard sensitive healthcare information.
Automated data validation and governance processes using Great Expectations and Apache Airflow, improving consistency, accuracy, and reliability of data used by BI, analytics, and research teams.
Supported modernization of legacy Hadoop workloads by transitioning batch and analytics pipelines to cloud-native BigQuery environments, improving scalability, resilience, and operational efficiency.
Worked closely with data science teams to build and maintain feature stores using Feast, enabling reusable feature pipelines and accelerating predictive diagnostics and laboratory analytics model development.
Created monitoring and observability dashboards using Prometheus and Tableau Prep, helping teams proactively identify performance issues and improve overall data platform stability.
Provided technical guidance and cross-team collaboration support, helping engineers adopt cloud data architecture best practices, distributed processing techniques, and healthcare data standards.
________________________________________
Client: Centene Corporation, MO
Role: Data Engineer | Jan 2021 to Oct 2022
Responsibilities:
Developed and maintained ETL and ELT pipelines using Python, SQL, and AWS Glue to process large-scale Medicaid claims data, improving data ingestion, transformation, and delivery for analytics and regulatory reporting.
Built and supported analytical data marts in Azure Synapse and Snowflake by designing optimized data models and SQL patterns to support member analytics, enrollment tracking, utilization analysis, and provider performance insights.
Delivered streaming and event-driven ingestion pipelines using Apache Kafka and Azure Event Hubs, improving near real-time data availability for fraud detection, compliance monitoring, and investigative reporting.
Enhanced distributed data processing workflows using Apache Spark on Databricks, tuning jobs for improved performance, scalability, and reliability across enterprise reporting and analytical workloads.
Strengthened data reliability by developing automated quality validation frameworks using Pandas and Great Expectations, ensuring data completeness, accuracy, and consistency for downstream dashboards and care management reporting.
Supported migration of legacy relational database workloads to AWS Redshift by designing scalable schemas, partitioning strategies, and optimized query models to support population health analytics across multiple healthcare programs.
Integrated RESTful APIs and FHIR-based data exchange processes using Python and Postman, enabling secure and standardized interoperability with external healthcare providers and partner systems.
Automated build, testing, and deployment processes for data engineering solutions using Jenkins and GitHub Actions, improving release efficiency and reducing operational risk.
Monitored and resolved production data pipeline issues using Splunk and AWS CloudWatch, performing root cause analysis and implementing proactive improvements to maintain system reliability and HIPAA compliance.
Collaborated with business analysts and reporting teams to deliver self-service BI datasets in Power BI, improving accessibility to actionable insights and supporting better visibility into member outcomes and operational performance.
Contributed to cloud cost optimization and governance initiatives across AWS and Azure by implementing auto-scaling strategies, workload right-sizing, and usage monitoring to balance performance, reliability, and operational spend.
________________________________________
Company: Motorola Solutions, Bangalore
Role: Data Engineer | Jun 2016 to Jun 2019
Responsibilities:
Developed scalable ETL pipelines using Python, SQL, and Talend to ingest and process high-volume telecom network telemetry data, supporting predictive maintenance and operational analytics for public safety communication systems.
Built and maintained Hadoop-based data lakes using Hive and HBase to manage large volumes of IoT sensor and radio telemetry data, enabling analytics focused on emergency response efficiency and system reliability.
Delivered real-time and near real-time streaming pipelines using Apache Kafka and Spark Streaming, enabling low-latency data processing for fleet tracking and operational dashboards used by public safety teams.
Improved performance of complex SQL workloads across Oracle and PostgreSQL by tuning queries, indexing strategies, and data models to enhance reporting efficiency for global operations teams monitoring network and device health.
Created distributed data processing jobs using Python and Scala on Apache Spark to transform raw logs and telemetry feeds into curated, analytics-ready datasets for monitoring and troubleshooting use cases.
Supported migration of on-premises data warehouse workloads to AWS-based platforms using Amazon EMR and S3, improving platform scalability, system resiliency, and cost efficiency during fluctuating workload demands.
Strengthened data reliability by developing custom validation and reconciliation frameworks using Python, ensuring consistency, completeness, and accuracy of datasets used for compliance and operational decision-making.
Built interactive analytics dashboards in Tableau by integrating data from multiple operational systems, providing field engineers and operations teams with real-time visibility into network performance and system health.
Worked closely with software engineering teams to integrate data pipelines into CI/CD workflows using Jenkins, enabling automated testing, faster deployments, and alignment with agile development practices.
Designed data models, normalization structures, and schemas for radio communication and telemetry datasets, supporting early-stage machine learning initiatives focused on anomaly detection and infrastructure monitoring.
Contributed to cloud migration proof-of-concept initiatives using Azure HDInsight, helping establish architectural patterns and best practices to support enterprise big data modernization efforts.
________________________________________
Company: Bacancy, Ahmedabad, Gujarat
Role: Associate Data Engineer | Jun 2014 to May 2017
Responsibilities:
Built and maintained ETL scripts and stored procedures using MySQL and PostgreSQL to extract, transform, and load structured datasets, helping deliver timely analytics and consulting insights to clients.
Supported the development of data pipelines using Python and Apache NiFi to process structured client data for e-commerce and business intelligence use cases while maintaining data freshness and reliability.
Cleaned, normalized, and validated datasets using Pandas and Talend Open Studio, helping ensure accurate and reliable data for BI dashboards, reports, and analytics applications across multiple client projects.
Worked with Hadoop ecosystem tools and used HiveQL to query large datasets, supporting scalable data storage and analytics for retail and enterprise big data initiatives.
Designed MongoDB NoSQL schemas and aggregation pipelines to support web application backends, improving application performance and enabling near real-time analytics for client platforms.
Supported AWS S3 integration for cloud-based data storage by assisting with migration of legacy data files, helping reduce reliance on on-prem systems and improving data accessibility.
Developed streaming data prototypes using Apache Kafka to ingest log and event data, supporting early-stage dashboards used for monitoring application performance and system operations.
Created automated reports and data visualizations using Tableau and Python (Matplotlib), helping stakeholders track project metrics, operational KPIs, and business performance trends.
Investigated and resolved data pipeline issues in development and staging environments using ELK Stack, improving pipeline reliability and reducing delays during testing and deployment cycles.
Assisted with database performance improvements through indexing, query tuning, and schema enhancements, helping improve efficiency for high-volume client databases.
Contributed to code reviews and documentation of data workflows, helping establish consistent development practices and supporting faster onboarding of new team members.