| YASHWANTH - Senior Data Engineer Azure & Snowflake |
| [email protected] |
| Location: Houston, Texas, USA |
| Relocation: All states |
| Visa: GC |
| Resume file: R Yashwanth Senior DE_DA Resume_1767803824351.docx |
R Yashwanth
Senior Data Engineer | Phone: (430) 231-1142 | Email: [email protected]

Professional Summary
- 10+ years of experience delivering enterprise-grade ETL, ELT, and data warehousing solutions across healthcare, finance, insurance, and public sector domains.
- Expert in modern ETL frameworks, designing and optimizing scalable data pipelines using Azure Data Factory (ADF), Databricks (PySpark/Scala), Informatica PowerCenter, and SSIS.
- Cloud migration specialist with proven success in modernizing SQL Server, Oracle, and flat-file systems into Snowflake, Azure Synapse, and AWS/GCP-native warehouses for improved scalability and performance.
- Strong knowledge of real-time streaming architectures, building low-latency ingestion pipelines using Apache Kafka, Spark Structured Streaming, and event-driven frameworks.
- Implemented enterprise-wide data governance and security frameworks (HIPAA, GDPR, SOX, 21 CFR Part 11), leveraging Azure Purview for lineage, metadata, classification, and PII masking.
- Built reusable ETL templates and parameterized frameworks, accelerating onboarding of new data sources by up to 50% and enforcing consistency across projects.
- Skilled in data modeling: Star and Snowflake schemas, fact/dimension modeling, and Slowly Changing Dimensions (SCD Types 1 & 2) to preserve historical data and support advanced analytics.
- Proficient in DevOps and CI/CD automation for data projects, using Azure DevOps, Jenkins, and GitHub Actions to streamline testing, deployments, and rollback processes.
- Experienced in performance tuning: optimized Spark jobs (caching, partitioning, broadcast joins) and Snowflake workloads (clustering, warehouse tuning), achieving 60% faster queries and a 25% reduction in cloud costs.
- Designed and delivered Lakehouse architectures, integrating Azure Data Lake, AWS S3, and GCS with curated Snowflake/BigQuery semantic layers for cross-domain analytics.
- Created real-time pipelines for mission-critical use cases, including fraud detection, IoT telemetry, predictive maintenance, and actuarial forecasting.
- Integrated BI tools (Power BI, Tableau, SSRS, Looker) with Snowflake and SQL Server to deliver executive dashboards, KPI scorecards, and ad hoc exploration layers.
- Developed automated testing and data quality frameworks (schema validation, anomaly detection, reconciliation checks) to ensure data accuracy, consistency, and trustworthiness.
- Implemented secure, multi-tenant Snowflake environments with RBAC, column-level masking, and secure data sharing for departmental and partner use cases.
- Collaborated with cross-functional stakeholders (clinicians, actuaries, finance controllers, and compliance teams) to translate complex requirements into scalable, production-grade data solutions.
- Recognized for mentorship and leadership: onboarding junior engineers, promoting best practices in ETL design, and leading Agile delivery (Scrum, Jira, sprint planning, retrospectives).
- Trusted to deliver mission-critical data platforms with real-time availability, strong compliance posture, and measurable cost savings across global enterprises.
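The automated data quality framework mentioned above (schema validation and reconciliation checks) can be illustrated with a minimal sketch. This assumes rows arrive in Python as plain dicts; the names (`EXPECTED_SCHEMA`, `validate_schema`, `reconcile_counts`) and the example columns are illustrative, not taken from any specific project or library.

```python
# Minimal data-quality sketch, assuming rows arrive as plain dicts.
# Schema and function names are hypothetical examples.

EXPECTED_SCHEMA = {"patient_id": int, "visit_date": str, "charge_amount": float}

def validate_schema(row: dict) -> list[str]:
    """Return a list of violations for one row against the expected schema."""
    errors = []
    for column, expected_type in EXPECTED_SCHEMA.items():
        if column not in row:
            errors.append(f"missing column: {column}")
        elif not isinstance(row[column], expected_type):
            errors.append(f"{column}: expected {expected_type.__name__}, "
                          f"got {type(row[column]).__name__}")
    return errors

def reconcile_counts(source_count: int, target_count: int,
                     tolerance: float = 0.0) -> bool:
    """Source-to-target reconciliation: row counts must match within a tolerance."""
    if source_count == 0:
        return target_count == 0
    return abs(source_count - target_count) / source_count <= tolerance

good = {"patient_id": 1, "visit_date": "2024-01-05", "charge_amount": 120.0}
bad = {"patient_id": "x", "visit_date": "2024-01-05"}
print(validate_schema(good))  # []
print(validate_schema(bad))   # type violation plus a missing column
print(reconcile_counts(1000, 990, tolerance=0.02))
```

In production such checks would typically run inside the pipeline (e.g. as a PySpark or ADF validation step) rather than row-by-row in plain Python; the sketch only shows the shape of the checks.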
Technical Skills

Category                  | Technologies / Tools
------------------------- | ----------------------------------------------------------------
ETL & Data Processing     | Azure Data Factory (ADF), Informatica PowerCenter, SSIS, PySpark, Sqoop, Oozie
Cloud Platforms & Storage | Microsoft Azure (Databricks, Synapse, Data Lake), AWS (S3)
Programming & Scripting   | Python, SQL, Scala, Shell Scripting
Data Warehousing          | Snowflake, Azure Synapse Analytics, SQL Server, Oracle
Streaming & Real-Time     | Apache Spark, Spark Streaming, Kafka, Delta Lake
BI & Reporting            | Power BI, SSRS, Tableau
DevOps & CI/CD            | Azure DevOps, Jenkins, GitHub Actions, Git
Data Modeling             | Star Schema, Snowflake Schema, SCD Type 1 & 2, ERwin
Security & Governance     | Azure Purview, Data Lineage, Data Masking, HIPAA & GDPR Compliance
Methodologies             | Agile/Scrum, Jira, Confluence, Sprint Planning

Professional Experience

Senior Data Engineer, Azure & Snowflake
Bayer Healthcare, Whippany, NJ (May 2022 - Present)
- Designed and deployed enterprise-scale ETL pipelines using Azure Data Factory to ingest large volumes of clinical and research data into Snowflake, improving data accessibility across teams.
- Optimized Spark-based data transformations in Azure Databricks, reducing batch processing runtime by ~35% and accelerating analytics for patient outcomes.
- Built a healthcare-optimized Snowflake data warehouse with clustering keys and multi-cluster warehouses to enable scalable, high-concurrency access for analysts and clinicians.
- Developed real-time streaming data pipelines with Apache Kafka and Spark Structured Streaming to ingest IoT medical device telemetry, enabling instant alerts on critical patient health events.
- Created reusable, parameter-driven ETL frameworks and templates for onboarding new electronic medical record (EMR) and diagnostic data sources, reducing development time by 40%.
- Implemented robust data governance with Azure Purview (data cataloging, lineage) and data masking to ensure HIPAA compliance and protect PHI across the pipeline.
- Set up end-to-end CI/CD pipelines using Azure DevOps and Jenkins for ADF, Databricks notebooks, and Snowflake, enabling automated deployments and consistent releases across environments.
- Utilized Snowflake Streams and Tasks to implement incremental data loading, reducing dashboard refresh times and achieving near real-time data availability for stakeholders.
- Performed extensive performance tuning and cost optimization on Snowflake (clustering, query profiling, caching) and Spark jobs, cutting cloud compute costs by ~25% while maintaining SLA targets.

Key Achievements: Enabled faster insights and stronger compliance by modernizing the data platform; recognized by Bayer leadership for delivering a secure, scalable analytics architecture that lowered costs and reduced data latency from days to minutes.

Data Engineer, Budget Analytics & ETL Modernization
Illinois Department of Innovation & Technology (DoIT), Springfield, IL (May 2021 - Apr 2022)
- Consolidated financial data from 25+ state agencies into a centralized Snowflake data warehouse using Azure Data Factory, enabling unified statewide budget analysis and oversight.
- Developed dimensional data models and implemented SCD Type 2 logic to support year-over-year trend and variance analysis, providing dynamic historical reporting capabilities.
- Authored PySpark transformation scripts to clean, normalize, and standardize multi-agency datasets, resolving schema inconsistencies and improving cross-department data accuracy by ~40%.
- Configured fine-grained role-based access controls in Snowflake to allow secure inter-agency data sharing and collaboration while maintaining compliance with state governance policies.
- Built interactive Power BI dashboards with drill-through filters and DAX calculations on top of the Snowflake warehouse, empowering stakeholders with on-demand insights into fund allocations, expenditures, and budget performance.
- Automated end-to-end ETL orchestration using Azure Logic Apps and ADF event triggers to enable real-time data refresh cycles, eliminating manual intervention and delays in reporting.
- Created parameterized ADF pipeline templates for onboarding new agencies, reducing setup time from 7 days to under 48 hours and ensuring consistency across implementations.
- Tuned Snowflake virtual warehouses (right-sizing, result caching) and optimized query logic to improve dashboard responsiveness and reduce compute costs by ~30%.
- Implemented incremental data loading and delta processing using Snowflake Streams and Tasks, cutting report generation time from days to minutes and keeping data current for decision-makers.

Key Achievements: Eliminated over 5,000 hours of annual manual work by modernizing legacy Excel-based processes. Recognized by state leadership for enabling real-time fiscal transparency and data-driven budgeting through an automated, resilient analytics platform.

Senior Data Engineer, Insurance Analytics (ETL & Compliance)
AXA Insurance, New York, NY (May 2020 - Apr 2021)
- Reduced insurance claims processing time by ~40% by optimizing Spark ETL workflows in Azure Databricks, accelerating adjudication and settlements.
- Built scalable Snowflake data models for actuarial forecasting and underwriting analytics, leveraging clustering keys and multi-cluster warehouses.
- Implemented real-time data ingestion with Kafka + Spark Streaming to detect fraud in near real time, improving risk response times.
- Partnered with compliance/legal teams to deliver GDPR/IFRS 17-compliant datasets, ensuring regulatory audit readiness.

Senior Data Engineer, Azure Data Platform
Delta Airlines, Atlanta, GA (May 2019 - Apr 2020)
- Developed real-time IoT ingestion pipelines using Azure Data Factory + Spark Streaming to capture aircraft telemetry for predictive maintenance.
- Modeled Snowflake data warehouses for flight, crew, and loyalty data, enabling high-concurrency queries for global operations.
- Tuned Spark ETL jobs (broadcast joins, caching) to improve execution times by 50% across large aviation datasets.
- Delivered Power BI dashboards with route performance, fleet utilization, and delay KPIs, reducing decision-making latency.

Data Engineer, Hadoop ETL & Migration
Impetus (Consulting), Dallas, TX (May 2018 - Apr 2019)
- Migrated legacy MapReduce jobs to Spark-based frameworks, cutting batch runtimes by 40% and improving reliability.
- Designed HiveQL and PySpark ETL flows to process structured/semi-structured web and log data into AWS S3.
- Orchestrated scalable workflows using Oozie with error handling and retries, ensuring SLA compliance.
- Documented metadata lineage and built data dictionaries to support downstream BI and audit teams.

ETL Developer, Informatica & SQL Server BI
Harmonia Holdings Group, Maryland (Apr 2016 - Apr 2018)
- Built ETL pipelines with Informatica PowerCenter + SSIS to integrate ERP, flat-file, and MySQL data into a SQL Server data warehouse.
- Implemented SCD Type 2 logic in dimensional models to preserve historical accuracy for finance/HR analytics.
- Automated SSRS dashboard refreshes and ETL schedules with SQL Server Agent, reducing manual work by 50%.
- Tuned SQL queries, Informatica mappings, and cache settings to improve pipeline throughput and reporting speed.

Junior Data Analyst / BI Developer
Client: Ashburn, VA (Jun 2014 - Mar 2016)
- Developed SSIS and T-SQL ETL workflows for daily infrastructure health and asset utilization reporting.
- Built and maintained Power BI/SSRS dashboards for uptime, performance, and incident tracking.
- Authored stored procedures and dimensional models to support ad hoc and SLA-bound reporting.
- Created automated SQL/SSIS error alerts, ensuring 99% SLA compliance for operational reports.
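Several roles above cite SCD Type 2 logic for preserving history in dimensional models. A minimal sketch of the versioning pattern, written in plain Python for illustration (in the roles above this was implemented in Informatica, SQL, or PySpark MERGE logic; the field names `emp_id`, `dept`, `valid_from`, `valid_to`, `is_current` are hypothetical):

```python
from datetime import date

def apply_scd2(dimension: list[dict], incoming: dict, load_date: date) -> list[dict]:
    """SCD Type 2: expire the current version of a changed row, append a new one.

    Each row carries a business key (`emp_id`), one tracked attribute
    (`dept`), and the SCD2 bookkeeping columns.
    """
    for row in dimension:
        if row["emp_id"] == incoming["emp_id"] and row["is_current"]:
            if row["dept"] == incoming["dept"]:
                return dimension          # no change: history stays as-is
            row["valid_to"] = load_date   # close out the old version
            row["is_current"] = False
            break
    dimension.append({
        "emp_id": incoming["emp_id"],
        "dept": incoming["dept"],
        "valid_from": load_date,
        "valid_to": None,                 # open-ended current version
        "is_current": True,
    })
    return dimension

dim = []
dim = apply_scd2(dim, {"emp_id": 7, "dept": "Finance"}, date(2017, 1, 1))
dim = apply_scd2(dim, {"emp_id": 7, "dept": "HR"}, date(2017, 6, 1))
print(len(dim))  # 2: the expired Finance row plus the current HR row
```

The key property is that queries can reconstruct the dimension as of any date by filtering on `valid_from`/`valid_to`, which is what enables the year-over-year trend and variance analysis described above.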
Education
Master of Science in Management Information Systems, Lamar University, Beaumont, TX (May 2014)