| Prathyusha Mynampati - Sr Data Engineer |
| [email protected] |
| Location: Plainsboro, New Jersey, USA |
| Relocation: yes |
| Visa: Green Card |
| Resume file: Prathyusha_Senior_Data_Engineer_1771868553330.docx |
|
PRATHYUSHA MYNAMPATI
(669) 323-9876; [email protected]

SUMMARY:
- AWS Data Engineer with 12+ years of experience designing scalable, secure, and high-performance data platforms on AWS, supporting enterprise analytics and reporting.
- Expert in building advanced ETL/ELT pipelines using AWS Glue, PySpark, EMR, Lambda, Step Functions, and Glue Workflows, enabling both batch and real-time data processing (a brief sketch follows this summary).
- Hands-on Snowflake experience, including warehouse provisioning, schema design, Snowpipe, Streams, Tasks, performance tuning, Time Travel, and cost-optimized data loading from AWS S3.
- Skilled in building modern data architectures using AWS S3 data lakes, Lakehouse patterns, and the medallion (Bronze/Silver/Gold) architecture for curated, analytics-ready datasets.
- Proficient in distributed data processing with Python, SQL, Spark, PySpark, Spark SQL, and serverless compute, delivering efficient and scalable transformations.
- Experienced with the AWS analytics ecosystem, including Redshift, Athena, EMR, Glue Data Catalog, DynamoDB Streams, and Kinesis Streams/Firehose for ingestion and real-time processing.
- Strong cloud security and governance expertise, implementing best practices using IAM, KMS encryption, VPC networks, Lake Formation, role-based access control, and audit logging.
- Proven ability to automate data workflows using CI/CD (Jenkins, GitHub Actions), Terraform/CloudFormation, versioned deployments, and automated environment management.
- Focused on data quality and observability, leveraging CloudWatch, Glue Data Quality, Snowflake Query History, and logging frameworks to ensure consistent, trusted data delivery.
- Collaborative partner to BI, data science, and product teams, enabling analytics, dashboarding, forecasting, and ML initiatives by providing well-structured, high-quality datasets.
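The summary above references Glue/PySpark batch pipelines feeding a medallion (Bronze/Silver/Gold) layout. The following is a minimal illustrative sketch of that pattern, not code from any engagement below; the bucket names, columns, and types are hypothetical.

```python
# Minimal Bronze -> Silver PySpark sketch (hypothetical paths and columns).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver").getOrCreate()

# Read raw JSON landed in the Bronze zone of an S3 data lake.
bronze = spark.read.json("s3://example-lake/bronze/transactions/")

# Standardize types, deduplicate on the business key, stamp load metadata.
silver = (
    bronze
    .withColumn("txn_ts", F.to_timestamp("txn_ts"))
    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
    .dropDuplicates(["txn_id"])
    .withColumn("load_date", F.current_date())
)

# Persist as partitioned Parquet in the Silver zone for downstream curation.
(silver.write
    .mode("overwrite")
    .partitionBy("load_date")
    .parquet("s3://example-lake/silver/transactions/"))
```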
TECHNICAL SKILLS:
Cloud Platforms: AWS (S3, Glue, Redshift, Athena, EMR, Kinesis, Lambda, Step Functions, RDS, DynamoDB, EC2, CloudFormation, CodePipeline); Azure (ADF, Data Lake, Synapse, Databricks).
Data Engineering: ETL/ELT pipelines, batch & streaming data processing, data warehousing, Snowflake, dbt, data modeling (star & snowflake schemas), data wrangling, performance optimization.
Programming & Scripting: Python (pandas, PySpark, boto3, NumPy), SQL (T-SQL, PL/SQL), Shell, PowerShell.
DevOps & Automation: CI/CD (Jenkins, GitHub Actions, Azure DevOps, AWS CodePipeline), IaC (Terraform, CloudFormation), Docker, Kubernetes (EKS/AKS).
Security & Governance: IAM, RBAC, KMS encryption, VPC isolation, PrivateLink, Glue Data Catalog, Unity Catalog, audit/compliance (GDPR/SOX).
Monitoring & Logging: CloudWatch, AWS Config, Azure Monitor, Log Analytics.
Visualization & Reporting: Power BI, Tableau, QuickSight, Cognos, R Shiny.

WORK EXPERIENCE:

Wells Fargo, Charlotte, NC | Feb 2024 - Present
Sr. Data Engineer
Project Description: This project involves modernizing Wells Fargo's legacy ETL ecosystem by migrating workloads to a cloud-native architecture built on AWS, Databricks, and Snowflake. The initiative enables real-time data ingestion, high-performance analytics, and secure enterprise-wide data sharing. It supports key banking use cases such as fraud detection, customer insights, and regulatory reporting while ensuring compliance with SOX and GDPR standards.
Key Responsibilities:
- Designed and implemented scalable ETL/ELT pipelines using PySpark, AWS Glue 4.0, and SQL to process structured and semi-structured data from mainframe, Oracle, and Kafka sources.
- Built real-time streaming pipelines using Amazon Kinesis Data Streams, Firehose, and Lambda to deliver near real-time analytics for fraud-monitoring use cases (a brief sketch follows this role).
- Developed Databricks (v13 LTS) workflows for complex transformations, data enrichment, and ML-ready dataset creation.
- Modeled and optimized Snowflake tables using clustering keys, micro-partitioning, and query tuning to improve dashboard and report performance.
- Automated pipeline orchestration using AWS Step Functions and Apache Airflow 2.x, improving operational reliability and traceability.
- Deployed infrastructure and data pipelines using Terraform (v1.7) and AWS CloudFormation integrated with GitHub Actions for CI/CD automation.
- Implemented monitoring and alerting using CloudWatch and Datadog to track Spark job performance, pipeline health, and SLA adherence.
- Applied AWS FinOps practices, including auto-scaling EMR 6.x clusters, resource right-sizing, and Spot Instance adoption, to reduce compute costs.
- Strengthened data governance through IAM role-based access control, KMS encryption, VPC endpoints, and Glue Data Catalog metadata classification.
- Collaborated with data scientists to provide curated datasets for Amazon SageMaker ML model development and with analytics teams for business reporting.
- Coordinated with Wells Fargo compliance teams to maintain SOX and GDPR alignment across data pipelines, storage, and access layers.
Environment: AWS (S3, Glue 4.0, Redshift RA3, EMR 6.x, Lambda, Step Functions, Kinesis, CloudWatch, IAM, KMS), Databricks (v13 LTS), Snowflake, Python, PySpark, SQL Server, Oracle, Terraform (v1.7), GitHub Actions, Docker, Airflow 2.x, Kafka, Datadog.
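The Kinesis/Lambda fraud-monitoring bullet above corresponds to the standard Kinesis-to-Lambda consumer pattern; a minimal sketch follows. The flagging rule, threshold, and alert bucket are hypothetical, not details of the actual project.

```python
# Sketch of a Lambda consumer on a Kinesis stream (hypothetical rule/bucket).
import base64
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    suspicious = []
    for record in event["Records"]:
        # Kinesis delivers each record's payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Hypothetical rule: flag unusually large transactions for review.
        if payload.get("amount", 0) > 10_000:
            suspicious.append(payload)
    if suspicious:
        s3.put_object(
            Bucket="example-fraud-alerts",
            Key=f"alerts/{context.aws_request_id}.json",
            Body=json.dumps(suspicious),
        )
    return {"flagged": len(suspicious)}
```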
Brinker International, Coppell, TX | Feb 2023 - Jan 2024
Sr. Data Engineer
Project Description: Worked on building a cloud-native data platform to support restaurant operations, customer analytics, and real-time reporting by ingesting and transforming large-scale operational and customer data.
Key Responsibilities:
- Designed batch and streaming data pipelines using AWS Glue 4.0, Kinesis, Lambda, and Step Functions for high-volume operational data.
- Built a scalable S3-based data lake with Bronze/Silver/Gold layers for governed storage and efficient transformations.
- Developed PySpark ETL pipelines in Glue to process POS, menu, loyalty, and inventory datasets with improved freshness and reliability.
- Integrated enterprise data sources into Redshift models and optimized performance using sort keys, distribution styles, and materialized views.
- Developed Snowflake curated layers and optimized micro-partitioning/clustering for faster analytics.
- Implemented secure Snowflake role-based access and governed data sharing across analytics teams.
- Created modular dbt models (staging and marts layers) to standardize SQL transformations and improve maintainability.
- Configured dbt tests (unique, not_null, relationships) to automate data quality validation.
- Automated CI/CD for pipelines using Terraform and CodePipeline and implemented observability using Airflow and CloudWatch.
- Strengthened IAM, KMS, and VPC-based security and contributed to cost optimization across Glue, Redshift, and S3.
Environment: AWS (Glue 4.0, Redshift, S3, Kinesis, Lambda, Step Functions), Snowflake, dbt, Airflow, Python (PySpark, pandas), Terraform, CodePipeline, CloudWatch, Tableau, QuickSight, Git/GitHub, Jira.

Nike Inc., Hillsboro, OR | Mar 2021 - Dec 2022
AWS Data Engineer
Project Description: Contributed to building a cloud-based data platform supporting near real-time analytics and reporting for global retail and e-commerce operations. The platform ingested transactional, batch, and streaming data to enable business insights, operational visibility, and scalable analytics.
Key Responsibilities:
- Designed scalable batch and streaming pipelines using AWS Glue (v3.0), PySpark, Kinesis Streams, Firehose, and Lambda for high-volume retail and e-commerce data.
- Developed ETL/ELT workflows in Glue and Step Functions to transform raw, semi-structured, and structured data into curated datasets.
- Built and enhanced an S3-based data lake with Bronze/Silver/Gold architecture for standardized ingestion, transformation, and governed storage.
- Leveraged Databricks for scalable data processing, notebook-based transformations, and collaborative development with data science teams.
- Modeled and optimized Redshift RA3 schemas, implemented distribution and sort keys, and leveraged Redshift Spectrum for hybrid query performance across S3.
- Automated data validation and quality checks using Python and Glue workflows to enforce schema integrity, referential checks, and business rule accuracy (a brief sketch follows this role).
- Implemented CI/CD automation for Glue jobs, Lambda functions, and schema updates using CodePipeline, CodeBuild, and Terraform.
- Enabled proactive monitoring through CloudWatch logs, metrics, and custom alerts for pipeline performance and SLA tracking.
- Enforced security and compliance using IAM policies, KMS encryption, VPC endpoints, and controlled Redshift access.
- Supported BI and reporting teams by publishing curated datasets to QuickSight and Power BI.
Environment: AWS (S3, Glue v3.0, Redshift RA3, Redshift Spectrum, Athena, Kinesis Streams, Firehose, Lambda, Step Functions, DynamoDB, EventBridge, CloudWatch), Databricks, PySpark, Terraform v1.x, CloudFormation, CodePipeline, CodeBuild, CodeDeploy, Git/GitHub.
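The data-validation bullet in the Nike role describes rule-based quality checks run inside Glue/PySpark jobs. A minimal sketch of that approach follows; the datasets, keys, and rules are hypothetical.

```python
# Sketch of rule-based data-quality checks in PySpark (hypothetical data).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dq_checks").getOrCreate()

orders = spark.read.parquet("s3://example-lake/silver/orders/")
stores = spark.read.parquet("s3://example-lake/silver/stores/")

checks = {
    # Schema/key integrity: no null or duplicate business keys.
    "null_order_ids": orders.filter(F.col("order_id").isNull()).count(),
    "dup_order_ids": orders.count() - orders.select("order_id").distinct().count(),
    # Referential check: every order must reference a known store.
    "orphan_store_refs": orders.join(stores, "store_id", "left_anti").count(),
    # Business rule: order totals must be non-negative.
    "negative_totals": orders.filter(F.col("order_total") < 0).count(),
}

failed = {name: n for name, n in checks.items() if n > 0}
if failed:
    # Failing fast lets the orchestrator (Step Functions/Airflow) halt the run.
    raise ValueError(f"Data quality checks failed: {failed}")
```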
Boehringer Ingelheim, Ridgefield, CT | Nov 2019 - Jan 2021
Cloud Engineer
Project Description: Supported cloud transformation by migrating on-prem systems to AWS and Azure, building secure and scalable infrastructure aligned with healthcare compliance.
Responsibilities:
- Designed and implemented cloud infrastructure across AWS and Azure, ensuring scalability, performance, and security.
- Automated environment provisioning using Terraform, CloudFormation, and ARM templates.
- Built CI/CD pipelines using Azure DevOps and Jenkins for application and data pipeline deployments.
- Migrated on-premises workloads and databases to AWS RDS, S3, and Azure Data Lake.
- Deployed and managed containerized applications using Docker, EKS, and AKS.
- Implemented monitoring and observability with CloudWatch, Azure Monitor, Prometheus, and Grafana.
- Collaborated with InfoSec to enforce IAM, encryption, compliance, and disaster recovery policies.
- Performed cloud cost optimization using AWS Cost Explorer and Azure Advisor.
- Supported data teams by establishing secure data pipelines and storage zones in S3 and Azure Blob.
Environment: AWS (EC2, S3, RDS, Lambda, IAM, VPC, EKS), Azure (Blob, Data Lake, Functions, AKS, Monitor, Key Vault), Terraform, ARM, CloudFormation, Azure DevOps, Jenkins, Docker, Kubernetes, Prometheus, Grafana, Python, PowerShell.

Piper Sandler, Minneapolis, MN | Aug 2016 - Jul 2019
Data Engineer
Project Description: Worked on modernizing legacy on-prem data workflows into cloud-based pipelines using AWS, Azure, Snowflake, and Databricks. Developed scalable ETL/ELT processes, unified cloud storage, and analytics-ready datasets for enterprise reporting and BI platforms.
Responsibilities:
- Built and maintained ETL/ELT pipelines using Databricks, AWS Glue, and ADF for ingesting and transforming structured and semi-structured data.
- Migrated on-prem data assets into AWS S3, Azure Data Lake, and Snowflake, improving performance and reducing processing times.
- Developed transformation logic using PySpark and SQL within Databricks for analytics and reporting use cases.
- Designed optimized data models in Snowflake and implemented dbt for transformations, lineage tracking, and documentation.
- Automated data validation and quality checks using Python, dbt tests, and rule-based frameworks.
- Implemented pipeline performance improvements, including partitioning, caching, and parallel execution.
- Set up cloud monitoring using CloudWatch and Azure Monitor to track job status and SLAs.
- Supported BI teams by delivering curated datasets to Power BI and QuickSight dashboards.
Environment: AWS S3, AWS Glue, Azure Data Lake, Snowflake, dbt, Databricks, Synapse, PySpark, Python, SQL, Power BI, Azure DevOps, Git, CloudWatch, ADF.

Deluxe, Minneapolis, MN | Mar 2015 - Aug 2016
Data Analyst / Data Engineer
Project Description: Worked on enhancing enterprise data platforms to support reporting, analytics, and business intelligence initiatives. The role focused on analyzing large datasets, building data models, modernizing ETL workflows, and delivering dashboards that supported strategic and operational decision-making across business teams.
Responsibilities:
- Analyzed high-volume datasets from Oracle, SQL Server, flat files, and API feeds to identify business trends, operational gaps, and performance insights.
- Designed and implemented data marts using star and snowflake schemas for streamlined reporting and analytical consumption.
- Developed and optimized SQL queries, stored procedures, and ETL scripts to cleanse, transform, and validate data for downstream systems.
- Automated recurring data extraction, transformation, and reporting processes using Python, shell scripts, and scheduling tools, reducing manual tasks and errors.
- Performed statistical analysis and data modeling using Python (pandas, NumPy) and R to support forecasting, performance metrics, and operational decision-making.
- Built interactive dashboards and visualizations using Tableau, Cognos, and R to communicate KPIs and business insights.
- Collaborated with business stakeholders to interpret analytical requirements and convert them into technical data models and metrics.
- Performed extensive data reconciliation and validation to ensure quality, accuracy, and consistency across multiple systems (a brief sketch follows this role).
- Partnered with DBAs, ETL developers, and QA teams to support data integration across Oracle, SQL Server, and Teradata environments.
Environment: Python, R, Oracle, SQL Server, Teradata, Informatica PowerCenter, SSIS, SQL, PL/SQL, T-SQL, Tableau, Cognos, SVN.
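The reconciliation bullet in the Deluxe role is a standard two-source comparison. A minimal pandas sketch follows, assuming hypothetical daily-revenue extracts and column names.

```python
# Sketch of cross-system reconciliation in pandas (hypothetical extracts).
import pandas as pd

# Daily revenue pulled from two source systems (columns: biz_date, revenue).
oracle = pd.read_csv("oracle_daily_revenue.csv")
sqlserver = pd.read_csv("sqlserver_daily_revenue.csv")

merged = oracle.merge(
    sqlserver,
    on="biz_date",
    how="outer",
    suffixes=("_oracle", "_sqlserver"),
    indicator=True,  # adds a _merge column marking each row's source(s)
)

# Dates present in only one system.
missing = merged[merged["_merge"] != "both"]
# Dates where both systems report, but amounts disagree beyond a tolerance.
mismatched = merged[
    (merged["_merge"] == "both")
    & ((merged["revenue_oracle"] - merged["revenue_sqlserver"]).abs() > 0.01)
]

print(f"{len(missing)} dates missing from one system; "
      f"{len(mismatched)} dates with amount mismatches")
```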
Intelex Systems, India | Feb 2011 - Oct 2014
Data Analyst
Project Description: Provided analytical support to business and operations teams by extracting, analyzing, and visualizing data. Delivered meaningful insights through reports, KPIs, and dashboards to help drive strategic and operational decisions.
Responsibilities:
- Collected and integrated data from relational databases, flat files, and APIs to build unified datasets for reporting and analysis.
- Performed exploratory data analysis (EDA) to identify business trends, anomalies, and correlations.
- Developed optimized SQL scripts to extract, aggregate, and validate large datasets for KPI and trend reporting.
- Cleaned, standardized, and validated raw data to ensure accuracy, completeness, and referential integrity.
- Built interactive dashboards and visual analytical summaries in Tableau and RStudio for leadership and non-technical teams.
- Applied descriptive and inferential statistical techniques (regression, correlation analysis, PCA) to support business recommendations.
- Collaborated with business stakeholders to translate reporting needs into data models and source-to-target mappings.
- Documented data lineage, business rules, and report logic for audit and maintenance purposes.
- Conducted time-series and variance analysis to track operational metrics and identify improvement opportunities.
Environment: Python (pandas, NumPy), R, SQL, Tableau, RStudio, Unix, snowflake schema modeling.

EDUCATION:
Bachelor of Computer Science, Amity University, Noida, India