
Location: Princeton, New Jersey, USA
Relocation: Yes
Visa: H1B
Name: Kalpana Gujja
Role: Sr. Data Engineer
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/kalpana-gujja-ka123/
Phone: (512) 866-3963
Professional Summary:
Senior Data Engineer with 12+ years of experience building and managing scalable data platforms on AWS, Azure, and Snowflake.
Expertise in ETL/ELT pipelines, data lakes, and data warehouses supporting analytics, BI, and ML workloads.
Hands-on experience with AWS services including S3, Glue (PySpark), Redshift, Lambda, Kinesis, and EMR.
Skilled in Azure services including Data Lake Storage, Synapse Analytics, Azure Databricks, Data Factory, Azure SQL Database, and Azure Functions.
Proficient in Snowflake for scalable cloud data warehousing, including data modeling, schema design, and performance tuning.
Proficient in Python, SQL, and PySpark for large-scale data processing, transformation, and analysis.
Experienced in data visualization and reporting using Tableau and Power BI, delivering actionable insights for business and operational decisions.
Adept at data modeling, pipeline optimization, and automation, ensuring scalable, secure, and cost-efficient cloud solutions.
Strong experience in real-time streaming data pipelines using AWS Kinesis and Azure Event Hubs for time-sensitive analytics.
Implemented CI/CD pipelines for automated deployment of data workflows and infrastructure as code.
Deep understanding of data governance, security, and compliance best practices in cloud environments.
Skilled in performance tuning, query optimization, and cost management across large-scale data platforms.
Experienced in mentoring and leading technical teams, reviewing code, and ensuring adherence to best practices.
Track record of driving end-to-end data platform modernization, enabling faster insights and improved operational efficiency.
Technical Skills:
Big Data & Hadoop: HDFS, Hive, Spark (Scala, PySpark, Streaming), Kafka, Sqoop, Oozie, Zookeeper, Pig
Cloud Platforms: AWS (Glue, EMR, Redshift, Kinesis, S3, DynamoDB, IAM, Lambda), Azure (Data Factory, Databricks, Synapse, Data Lake, Event Hub, DevOps)
ETL & Data Integration: AWS Glue, SSIS, Informatica, Azure Data Factory, IBM DataStage
Databases & Warehousing: Snowflake, Redshift, Synapse, Databricks, SQL Server, Oracle, PostgreSQL, Teradata, DynamoDB, Cosmos DB
Languages & Scripting: Python, Java, Terraform, Unix Shell, SQL, Oracle PL/SQL, T-SQL
Visualization: Power BI (DAX, Power Query, Custom Visuals, RLS), Tableau, Excel, SSRS
DevOps & Tools: Git, Jenkins, Docker, Kubernetes, Terraform, CloudFormation
File Formats: JSON, Parquet, Avro, ORC, CSV

Work Experience:
PG&E Corporation, Oakland, CA | Feb 2025 - Present
AWS Data Engineer
Project Description:
Modernizing PG&E's enterprise data platform on AWS to support large-scale energy usage analytics, compliance reporting, and operational monitoring. The project integrates real-time smart meter data, IoT sensor streams, and enterprise systems into a secure, governed, and cost-efficient data lake and warehouse, enabling data-driven decision-making across business units.
Responsibilities:
Designed and implemented scalable data pipelines on AWS using Glue (PySpark), Lambda, and Kinesis/MSK to process high-volume batch and streaming data.
Built a centralized data lake on Amazon S3 with Glue Data Catalog and Redshift/Snowflake integration, supporting analytics and BI dashboards.
Orchestrated workflows with Apache Airflow and AWS Step Functions, implementing retries, monitoring, and SLA compliance for reliable data delivery.
Developed ETL/ELT frameworks to integrate diverse data sources (smart meters, IoT, APIs, RDBMS, NoSQL) into governed datasets.
Applied data governance and security policies with IAM, Lake Formation, and KMS encryption to meet regulatory and compliance requirements.
Tuned Spark jobs and optimized queries in Redshift and Snowflake, improving performance and reducing costs.
Established monitoring and alerting frameworks with CloudWatch, Datadog, and PagerDuty for proactive issue detection.
Partnered with analysts, data scientists, and business stakeholders to deliver curated datasets for regulatory reports, dashboards, and predictive models.
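The partitioned S3 data-lake layout described above can be sketched in a few lines of Python. This is an illustrative sketch only; the bucket and table names (`energy-lake-curated`, `meter_readings`) are hypothetical, not taken from the project.

```python
from datetime import datetime

def partition_path(bucket: str, table: str, reading_ts: str) -> str:
    """Build a Hive-style partition path (year=/month=/day=) for a meter
    reading, the layout Glue and Athena can both discover and prune."""
    ts = datetime.fromisoformat(reading_ts)
    return (f"s3://{bucket}/{table}/"
            f"year={ts.year}/month={ts.month:02d}/day={ts.day:02d}/")

# Route a smart-meter reading into the curated zone
path = partition_path("energy-lake-curated", "meter_readings",
                      "2025-03-07T14:25:00")
# -> s3://energy-lake-curated/meter_readings/year=2025/month=03/day=07/
```

Writing by partition key like this is what lets downstream Athena/Redshift Spectrum queries prune to a date range instead of scanning the whole lake.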

Environment & Tools:
AWS (S3, Glue, Glue Catalog, Redshift, Snowflake, MSK/Kafka, Kinesis, Lambda, Lake Formation, Athena), Apache Airflow, Step Functions, Python, PySpark, SQL, Terraform, CloudWatch, Datadog, PagerDuty, Power BI, Tableau.

CVS Health, Dallas, TX | Apr 2024 - Jan 2025
Data Engineer
Project Description:
Modernized CVS Health's enterprise data platform by migrating from on-premises systems to AWS, creating a scalable and secure foundation for healthcare analytics, claims processing, and regulatory reporting. The solution integrated structured, semi-structured, and unstructured healthcare datasets into an AWS-based data lake and warehouse, ensuring HIPAA compliance and enabling advanced BI use cases.
Responsibilities:
Designed and developed scalable ETL/ELT pipelines using AWS Glue (PySpark), Lambda, and Step Functions to process diverse healthcare data formats (JSON, Avro, Parquet, HL7).
Built an enterprise data lake on Amazon S3 with Glue Data Catalog, optimized with partitioning and bucketing for query efficiency.
Developed real-time ingestion pipelines using Kinesis Data Streams, Firehose, and Kafka to process claims and pharmacy data with low latency.
Migrated legacy ETL workflows from SQL Server/Oracle/SSIS to AWS-native services, improving reliability and reducing operational costs.
Modeled data in Amazon Redshift and Snowflake to support BI dashboards for claims, payor analysis, and compliance reporting.
Automated pipeline orchestration with Apache Airflow and Step Functions, ensuring SLA adherence and monitoring.
Implemented data validation and quality checks using Great Expectations and Deequ, ensuring compliance with healthcare data standards.
Enforced HIPAA-compliant security policies with IAM, Lake Formation, and KMS encryption, maintaining strict governance across sensitive healthcare datasets.
Collaborated with business stakeholders to deliver curated datasets and BI dashboards in Power BI and Tableau with row-level security for regulatory users.
Deployed infrastructure-as-code (IaC) templates with Terraform and CloudFormation, ensuring repeatable and secure deployments.
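The row-level quality gate described above can be sketched as a plain-Python partition of passing and failing records. This is a minimal sketch in the spirit of the Great Expectations/Deequ checks; the field names (`claim_id`, `member_id`, `amount`) are illustrative, not the project's actual schema.

```python
def validate_claims(rows):
    """Quality gate for claims records: required fields present and the
    amount a non-negative number. Returns (passed, failed) so bad rows
    can be quarantined instead of silently loaded into the warehouse."""
    required = ("claim_id", "member_id", "amount")
    passed, failed = [], []
    for row in rows:
        ok = all(row.get(k) is not None for k in required)
        ok = ok and isinstance(row.get("amount"), (int, float)) and row["amount"] >= 0
        (passed if ok else failed).append(row)
    return passed, failed
```

Quarantining failures rather than dropping them keeps an audit trail, which matters for regulated healthcare data.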
Environment & Tools:
AWS (Glue, EMR, S3, Kinesis, Redshift, Lambda, Athena, Lake Formation), Snowflake, Python, PySpark, SQL, Apache Airflow, Kafka, Tableau, QuickSight, Power BI, Great Expectations, Deequ, Terraform, CloudFormation.
Semnox Solutions, Karnataka, India | Oct 2019 - Nov 2022
Data Engineer
Project Description: Migrated and modernized the enterprise data platform by implementing scalable, cloud-native ELT pipelines on Snowflake to integrate data from diverse sources into a governed, performant, and cost-efficient data warehouse. Delivered curated datasets, KPIs, and dashboards to support analytics and business intelligence.
Responsibilities:
Designed and developed scalable data pipelines to load, transform, and store data in Snowflake from on-premises databases, APIs, and flat files.
Used Python for orchestration, automation, and Snowflake integration, improving pipeline reliability and performance.
Implemented and optimized ELT workflows using Snowflake streams, tasks, stages, and SnowSQL, orchestrated with Airflow, dbt, and cloud functions.
Built robust data models (star & snowflake schemas) with secure, performant views, materialized views, and stored procedures for analytics and reporting.
Automated and orchestrated pipelines using Apache Airflow, dbt, and AWS Lambda/Azure Functions.
Enforced data quality, governance, and security by applying RBAC, masking policies, and monitoring warehouse usage and costs.
Collaborated with business stakeholders to gather requirements and delivered curated datasets, KPIs, and dashboards using Power BI and Tableau.
Migrated legacy ETL workflows from on-premises SQL Server/Oracle to Snowflake ELT pipelines, improving performance and reducing costs.
Delivered curated datasets and KPIs for interactive Power BI dashboards, improving decision-making across business functions.
Tuned queries and managed multi-cluster warehouses for optimal performance, concurrency, and workload management.
Supported deployment processes with CI/CD pipelines, version control, and automated testing to ensure reliable, repeatable releases.
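The stream-driven upserts mentioned above are typically applied by a Snowflake task running a MERGE against the stream. Below is a minimal sketch that generates such a statement; the table, stream, and column names (`dw.orders`, `stg.orders_stream`, `order_id`) are hypothetical, not from the project.

```python
def build_merge_sql(target: str, stream: str, key: str, cols: list) -> str:
    """Generate the MERGE a Snowflake task would run to apply changes
    captured by a stream onto a target table (upsert by business key)."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    src_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {stream} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({src_list})"
    )

sql = build_merge_sql("dw.orders", "stg.orders_stream", "order_id",
                      ["status", "amount"])
```

Because a stream only surfaces rows changed since the last consumption, the MERGE touches the delta rather than reprocessing the full table.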
Environment & Tools: AWS (S3, Lambda, Glue, Redshift), Azure (Blob Storage, Data Factory, Functions), Snowflake, Apache Airflow, dbt, SQL, Python, Databricks, Kafka, Terraform, CloudWatch, Power BI, Tableau, CI/CD pipelines.

Meril Life Sciences Pvt. Ltd, Gujarat, India | July 2017 - Sep 2019
Azure Data Engineer
Project Description: Built a cloud-based data platform on Azure to ingest, transform, and store structured and unstructured data from medical devices, hospital systems, and ERP databases. Enabled real-time analytics, reporting, and predictive insights to support device performance monitoring and patient care.
Key Responsibilities:
Developed ETL/ELT pipelines using Azure Data Factory for batch and real-time data ingestion.
Processed and transformed data with Azure Databricks (PySpark/Scala) into curated bronze, silver, and gold layers.
Designed fact and dimension tables in Azure Synapse Analytics for efficient analytics and reporting.
Automated workflows and monitoring to ensure reliable, low-latency data processing.
Implemented data security, access control, and masking for sensitive patient and operational data.
Migrated legacy ETL workflows (SSIS, Informatica) to Azure-native pipelines for better scalability and cost efficiency.
Delivered validated datasets for Power BI dashboards and predictive analytics models.
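The bronze-to-silver promotion described above can be sketched in plain Python: drop records without a device id, cast readings to numbers, and deduplicate on the natural key. The field names (`device_id`, `timestamp`, `reading`) are illustrative; the production jobs ran in Azure Databricks with PySpark.

```python
def bronze_to_silver(records):
    """Promote raw device telemetry (bronze) to a cleaned silver layer:
    drop rows missing a device id, cast readings to float, and
    deduplicate on (device_id, timestamp), keeping the last arrival."""
    latest = {}
    for r in records:
        if not r.get("device_id"):
            continue
        try:
            reading = float(r["reading"])
        except (KeyError, TypeError, ValueError):
            continue
        latest[(r["device_id"], r.get("timestamp"))] = {
            "device_id": r["device_id"],
            "timestamp": r.get("timestamp"),
            "reading": reading,
        }
    return list(latest.values())
```

Keeping the last arrival per key is a common late-data policy for device streams; the gold layer then aggregates this cleaned data for reporting.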
Environment & Tools: Microsoft Azure, Azure Blob Storage, Azure Data Lake (Gen1/Gen2), Azure Data Factory (V1 & V2), Azure SQL Data Warehouse (Synapse), SQL Server 2016/2017, Azure Databricks, Python, Scala, Power BI, Azure DevOps, SSIS.

Innovapptive, Hyderabad, India | Oct 2015 - Jun 2017
ELT Developer
Project Description: Built and maintained the enterprise data warehouse and reporting infrastructure to support business operations and analytics. Developed efficient ELT pipelines, optimized data models, and delivered dashboards and reports enabling business users to make data-driven decisions.
Responsibilities:
Designed and developed ELT pipelines to extract, transform, and load structured and semi-structured data from diverse sources into the data warehouse.
Wrote optimized SQL and PL/SQL queries, stored procedures, and functions for high-performance data processing and analytics.
Designed data models (star and snowflake schemas) to support reporting and analytics.
Built and maintained operational and analytical dashboards and reports using Power BI, Tableau, and SSRS, providing business teams with actionable insights.
Collaborated with business analysts and end-users to gather requirements and translate them into technical specifications.
Automated repetitive workflows using SSIS, T-SQL, Python, and Shell scripts, improving reliability and reducing manual effort.
Ensured data quality and integrity by implementing validation rules, reconciliation checks, and data profiling techniques.
Contributed to the migration of legacy data processes to modern platforms, enhancing scalability and maintainability.
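One building block of the star-schema loads above is surrogate-key assignment for dimension tables. A minimal sketch, with hypothetical column names; production loads did this in SQL/SSIS rather than Python.

```python
def load_dimension(existing, source_rows, natural_key):
    """Assign surrogate keys when loading a dimension table: reuse the
    key for members already in the dimension, allocate the next integer
    for new members. `existing` maps natural key -> surrogate key."""
    dim = dict(existing)
    next_sk = max(dim.values(), default=0) + 1
    for row in source_rows:
        nk = row[natural_key]
        if nk not in dim:
            dim[nk] = next_sk
            next_sk += 1
    return dim
```

Fact rows then store the small integer surrogate key instead of the natural key, which keeps fact tables compact and insulates them from source-system key changes.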
Environment & Tools: SQL Server 2012/2014, Oracle 11g/12c, MySQL, SSIS 2012/2014, Tableau 9/10, Power BI, Python, TFS, Git, Windows Server 2012/2016, Linux, Excel (advanced), SharePoint.

AVEVA, Hyderabad, India | Mar 2013 - Sep 2015
Data Analyst / BI Developer
Project Description: Developed and maintained business intelligence solutions to enable data-driven decision-making across the organization. Delivered dashboards, reports, and insights by integrating and analyzing data from multiple sources, improving visibility into key business metrics and operational performance.
Responsibilities:
Designed, developed, and maintained interactive dashboards and reports using Power BI, SSRS, and Tableau, enabling stakeholders to monitor KPIs and trends effectively.
Analyzed large datasets from diverse sources to uncover patterns, anomalies, and actionable insights, supporting business process improvements.
Developed and optimized complex SQL queries, stored procedures, and ETL workflows to extract, transform, and load data into data warehouses.
Collaborated with business users, SMEs, and cross-functional teams to gather reporting requirements, document specifications, and deliver aligned solutions.
Automated recurring data processing and reporting tasks, reducing manual effort and increasing accuracy and timeliness.
Ensured data quality, integrity, and consistency through validation checks and reconciliation processes.
Supported migration of legacy reports and processes to modern BI platforms, enhancing usability and scalability.
Conducted unit testing and worked with business users to validate reports and ensure they met business needs.
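The reconciliation checks above amount to comparing control totals (row counts, sums) between source and warehouse. A minimal sketch; the metric names and tolerance are illustrative.

```python
def reconcile(source_totals, warehouse_totals, tolerance=0.0):
    """Compare control totals between source and warehouse; return the
    metrics that drifted beyond tolerance as (source, warehouse) pairs
    so a load can be flagged for investigation."""
    return {
        metric: (src, warehouse_totals.get(metric))
        for metric, src in source_totals.items()
        if abs(src - warehouse_totals.get(metric, 0)) > tolerance
    }
```

An empty result means the load reconciled; anything else is surfaced before reports are published.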
Environment & Tools: Microsoft SQL Server, Oracle, MySQL, SSIS, ETL (T-SQL/PL-SQL), SSRS, Power BI, Tableau.
Education:
B.Tech in Computer Science and Engineering, JNTUH, India | Jun 2009 - May 2013
Master's in Data Science, Lewis University, Romeoville, IL, USA | Jan 2023 - Dec 2024
