Rama Prasada - Lead Azure Data Engineer/Data Engineer
[email protected]
Location: North Richland Hills, Texas, USA
Relocation: REMOTE
Visa: H1b
RAMA PRASADA REDDY YANNAM
Email: [email protected]
Call: +1 734-373-0232
Address: Northlake, Texas
Profile Summary
Lead Data Engineer with 15 years of experience leading teams and delivering projects across data engineering, solution design, data analysis, information management, data governance, and business intelligence, with expertise in Azure and AWS technologies. Experienced with Java, Scala, Python, Databricks, PySpark, data warehousing, data marts, data modeling, ETL pipeline design, and visualization. Skilled at collaborating with stakeholders including business users, architects, solution designers, project managers, and delivery leads to deliver business outcomes.
Professional Summary
15+ years of IT experience across diverse industries, with a focus on Data Engineering and Java application development.
10 years of experience as a Data Engineer, specializing in designing and implementing end-to-end Hadoop infrastructure, including HDFS, Hive, Sqoop, Oozie, Spark, Snowflake, Scala, NoSQL, Zookeeper, Apache Flume, Apache Flink, Apache Kafka, and Airflow.
5+ years of Java development experience, with expertise in developing web-based and client-server applications using Java, React, Angular 2, Spring, Hibernate and SQL.
Experience in building APIs for data scraping using Python.
Experience with Python libraries such as Pandas, pyodbc, and pytest.
Experience in Azure and AWS Cloud services and infrastructure management.
Proficient in designing and developing Spark applications using Scala, PySpark and Java.
Skilled in designing efficient Data Lakes and optimizing data storage architecture to support complex data extraction, transformation, and loading (ETL) processes.
Expertise in Delta Live Tables (DLT) and Databricks Auto Loader to streamline real-time data ingestion and transformation within a Medallion Architecture, ensuring up-to-date insights and reliable data pipelines (a brief ingestion sketch follows this summary).
Experienced in implementing Data governance and security best practices using Unity Catalog in Databricks, maintaining consistent data access control and compliance across distributed systems.
Experienced using DBT (Data Build Tool) to design and manage ELT processes for transforming and modeling data within modern data pipelines.
Hands-on experience with ADF pipelines, SQL, PL/SQL, and advanced data modeling using Star and Snowflake schemas. Adept at using Azure Functions and Logic Apps for integration and automation.
Experienced with Alloy Studio, Alloy Cube, and Alloy Query Browser, with practical knowledge in Pure to Alloy migration.
Built and maintained scalable, high-performance data warehouse solutions on Snowflake to support analytics workloads.
Hands-on experience with Flume and Kafka for data streaming.
Expertise in NoSQL databases such as HBase, MongoDB and DynamoDB.
Experienced in creating design documentation using Visual Studio and Draw.io.
Extensive exposure to the Agile software development process.
Domain experience in Healthcare, Insurance, Banking, Travel, and E-Commerce.
Proven ability to rapidly acquire new technologies and optimize data engineering processes for improved efficiency and scalability.
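The Auto Loader and Medallion Architecture points above can be illustrated with a minimal bronze-layer ingest on Databricks. This is a sketch only; the storage paths, checkpoint location, and table name are placeholders, not values from any engagement.
```python
# Minimal Databricks Auto Loader (cloudFiles) sketch for a bronze-layer ingest.
# All paths and the target table name below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

raw_path = "abfss://raw@examplelake.dfs.core.windows.net/orders/"          # assumption
checkpoint = "abfss://meta@examplelake.dfs.core.windows.net/_chk/orders"   # assumption
bronze_table = "example_catalog.bronze.orders"                             # assumption

(spark.readStream
      .format("cloudFiles")                        # Databricks Auto Loader source
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", checkpoint)
      .load(raw_path)
      .withColumn("_ingested_at", F.current_timestamp())
      .writeStream
      .option("checkpointLocation", checkpoint)
      .trigger(availableNow=True)                  # incremental, batch-style run
      .toTable(bronze_table))
```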
Technical Skills
Programming Languages & Scripting:
Java, Python, Unix, Shell Script, SQL
SQL & NoSQL Databases & File Formats:
Oracle, MySQL, PostgreSQL, SQL Server, HBase, MongoDB, Cassandra, DynamoDB, Redis, Parquet, Avro, ORC
Big Data Technologies:
HDFS, Hive, Sqoop, Oozie, Zookeeper, Apache Spark, Apache Kafka, Apache Flink, Apache Flume, Cloudera Manager, PySpark, SparkSQL, DBT (Data Build Tool)
Cloud & Data Platforms:
AWS (Redshift, Glue, EMR, EC2, S3, IAM, RDS, Data Pipeline, Lambda, CloudWatch, SNS, SQS, Step Functions, DynamoDB, Athena)
Azure (Data Factory, Databricks, Event Hub, Blob Storage, Azure Functions, Data Lakehouse, Delta Tables, Delta Lake, Secret Management, Unity Catalog, ADLS Gen2, Synapse Analytics, Azure DevOps)
Data Warehouse & Reporting:
Snowflake, Power BI, Tableau
Frameworks and Others:
Spring REST, Hibernate, React, Angular2, Django, Flask, Pandas, Microservices
CI/CD and Tools:
Apache Airflow, Jenkins, GitHub, Grafana, Docker, Kubernetes, Elasticsearch, Logstash, Kibana
Experience
Client: Albertsons, USA
Role: Technical Lead | ADB | Jun 2025 - Present
Architected and led the development of end-to-end data pipelines using Azure Databricks and Azure Blob Storage.
Designed scalable PySpark and SQL-based ETL frameworks for data ingestion, transformation, data quality validation, and performance optimization.
Built and maintained batch full-load and CDC pipelines using PySpark on Azure Databricks (see the CDC merge sketch following this role).
Developed and optimized Delta Lake tables, leveraging partitioning and best practices for storage and query performance.
Led a team of data engineers, conducted design/code reviews, and enforced best practices in coding standards, CI/CD, Git branching, and DevOps workflows.
Collaborated closely with product teams to understand requirements, define KPIs, and deliver robust and scalable data solutions.
Partnered with backend and QA teams to ensure data quality testing, validation, and defect resolution.
Provided extensive support for Azure Databricks job monitoring, troubleshooting, and issue resolution.
Implemented complex functionalities using Graph Frames for graph-based data processing and analytics.
Built reusable Python/PySpark libraries and modules to enhance development efficiency and reduce repetitive coding.
Managed Databricks job orchestration, cluster configuration, environment setup, and production support, including monitoring, debugging, and root-cause analysis.
Implemented automated CI/CD pipelines using GitHub Actions to deploy notebooks, jobs, clusters, and related configurations.
Performed performance tuning of Spark jobs, SQL queries, and Delta tables to ensure high throughput and low latency.
Documented technical designs, data flows, and process workflows for development and support teams.
Environment: Azure Databricks, Azure Data Factory, Azure Data Lake Gen2, Python, SQL, SQL Server, Spark SQL, PySpark, GitHub, Stonebranch.
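A minimal sketch of the CDC upsert pattern referenced above, assuming the delta-spark Python MERGE API; the landing path, target table, key column, and CDC op codes are illustrative only.
```python
# Sketch of a CDC upsert into a Delta table with MERGE.
# The landing path, table name, key, and "op" column values are illustrative.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

updates = spark.read.parquet(
    "abfss://landing@examplelake.dfs.core.windows.net/customers_cdc/")  # assumption

target = DeltaTable.forName(spark, "silver.customers")  # assumed target table

(target.alias("t")
       .merge(updates.alias("s"), "t.customer_id = s.customer_id")
       .whenMatchedDelete(condition="s.op = 'D'")        # delete records
       .whenMatchedUpdateAll(condition="s.op = 'U'")     # updates
       .whenNotMatchedInsertAll(condition="s.op = 'I'")  # new rows
       .execute())
```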
Client: Goldman Sachs, Texas, USA
Role: Lead Data Engineer | PySpark, Azure
Sigmoid | May 2024 - Jun 2025
Led a team of Senior Data Engineers, offering technical guidance, mentoring, and presenting data architecture and delivery outcomes to senior business stakeholders.
Architected and implemented scalable data pipelines on Azure Databricks (PySpark), aligning with Medallion Architecture to ensure data quality, lineage, and accessibility.
Built and deployed ETL/ELT pipelines using Azure Data Factory (ADF), Databricks, PySpark SQL, Delta Tables, Unity Catalog and PL/SQL.
Designed and implemented real-time streaming data pipelines using Apache Kafka to process high-throughput, structured and semi-structured data (see the streaming sketch following this role).
Extracted, transformed, and loaded data into Azure Data Lake Storage Gen2 using PySpark notebooks and ADF pipelines.
Stored raw data in Parquet format to optimize storage and speed up downstream processing.
Enforced centralized data governance using Unity Catalog, implementing RBAC, audit logging, and fine-grained access controls.
Built and scheduled ETL/ELT pipelines to extract Workday data (e.g., employee, payroll, financials) into Azure Data Lake Storage (ADLS) and processed it using Azure Data Factory and Data Flows.
Leveraged Workday RaaS (Report-as-a-Service) and Custom Reports to extract complex datasets in JSON formats and orchestrated secure data ingestion into Azure.
Automated workflows for incremental loads, change data capture (CDC), and error handling for Workday source systems using Logic Apps and ADF triggers.
Applied GDPR-compliant data masking and security policies to safeguard sensitive information.
Built and optimized Delta Lake architectures supporting both batch and real-time ACID-compliant analytics workloads.
Migrated models from Pure Data Browser to Alloy/Legend, enhancing accessibility and model reuse.
Developed and maintained orchestration workflows in Airflow, automating job scheduling, data transformations, and error handling.
Tuned SQL queries by analyzing execution plans and applying indexing strategies to reduce latency.
Delivered analytical dashboards and reports using Tableau to enable data-driven decision-making across business units.
Conducted data profiling and defined remediation rules to address data quality issues.
Designed and implemented CI/CD pipelines using Azure DevOps to automate the build, test, and deployment of data pipelines and analytics solutions across dev, test, and production environments.
Managed infrastructure as code (IaC) using ARM templates, Terraform, and Bicep for provisioning Azure Data Factory, Databricks, Synapse, and other Azure resources.
Automated deployment and version control of Azure Data Factory (ADF) pipelines, datasets, and linked services using Azure Repos and Azure Pipelines.
Defined and managed SDLC processes for the Pure to Alloy model migration, improving deployment efficiency and traceability.

Environment: Azure Databricks, Azure Data Factory, Azure Synapse, Azure Data Lake Gen2, Azure DevOps, Parquet, Python, PL/SQL, SQL Server, SparkSQL, PySpark, Apache Kafka, Alloy Query, Alloy Studio, Unity Catalog, Tableau.
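A condensed sketch of the Kafka-to-Delta bronze ingest described above; the broker address, topic, checkpoint path, and table name are assumptions made for illustration.
```python
# Sketch: stream events from Kafka into a bronze Delta table with Structured Streaming.
# Broker, topic, checkpoint location, and table name are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

events = (spark.readStream
               .format("kafka")
               .option("kafka.bootstrap.servers", "broker:9092")   # assumption
               .option("subscribe", "trade-events")                # assumption
               .option("startingOffsets", "latest")
               .load()
               .select(F.col("key").cast("string"),
                       F.col("value").cast("string").alias("payload"),
                       F.col("timestamp")))

(events.writeStream
       .format("delta")
       .option("checkpointLocation", "/mnt/checkpoints/trade_events")  # assumption
       .outputMode("append")
       .toTable("bronze.trade_events"))                                # assumption
```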
Client: Change Healthcare
Role: Lead Data Engineer | PySpark, Azure, ADB
Innova Solutions Private Limited, Bengaluru, India | Mar 2022- May 2024
Led Senior Data Engineers, reviewed delivery outcomes, and presented results to senior business stakeholders.
Architected and implemented scalable data pipelines on Azure Databricks, aligning with Medallion Architecture principles to ensure data quality, lineage, and accessibility.
Collaborated with Business Analysts, Solution Designers, and other delivery teams to develop and deploy data models and ETL pipelines in the Azure environment using ADF, Databricks, Spark SQL, and PySpark.
Developed complex data transformation logic using PySpark DataFrames and Spark SQL to support business intelligence and analytics initiatives.
Led schema design and implemented data quality validation within PySpark workflows to ensure data integrity and compliance with governance standards.
Built and maintained robust Snowflake data warehouse solutions to support business intelligence and analytics needs.
Developed and managed end-to-end data pipeline processes for both real-time and batch data ingestion into the Snowflake warehouse.
Ensured optimal performance, scalability, and reliability of the Snowflake environment through proactive monitoring and maintenance.
Led Senior Data Engineers, provided performance reviews, and drove team initiatives.
Interacted with business users to clarify the business logic required for the data models.
Gathered business requirements by organizing and managing regular meetings with Business Analysts, Data Stewards, and subject matter experts.
Utilized Spark SQL API and PySpark in Databricks to extract, load and transform data and perform SQL queries.
Conducted frequent meetings with the ETL development team to coordinate the process and efficiently organize and distribute the workflow across the team.
Developed scalable data pipelines using Azure EventHub, Data Factory, Databricks, and Blob Storage.
Built and maintained enterprise data lakes with Azure Data Lake and Snowflake for efficient processing and storage.
Partnered with GRC and risk management teams to define data requirements and governance rules for Archer datasets.
Ensured secure, auditable data integration between Archer and other enterprise platforms.
Optimized performance of Archer data integrations, reducing latency and improving refresh rates for critical compliance dashboards.
Implemented data validation and reconciliation checks to ensure accuracy and consistency of Archer-sourced data.
Automated deployments with CI/CD pipelines using GitHub, Azure DevOps, and Terraform.
Ensured data governance, security, and compliance in data workflows.
Integrated Power BI for analytics and reporting.
Developed and deployed RESTful APIs using FastAPI to serve curated data from Azure Databricks and Snowflake (see the API sketch following this role).
Implemented custom Java connectors for Azure Event Hub using the Event Hub SDK.
Integrated Azure Event Hub SDK (Java) to implement reliable real-time event-driven architecture for data processing.
Ensured idempotent event processing and retry mechanisms using checkpointing and Event Hub consumer groups.
Implemented OAuth2-based authentication and role-based access control for REST APIs.
Used Azure Key Vault and App Configuration for secure configuration and secret management.
Developed monitoring dashboards with Grafana and Power BI to track KPIs.
Automated monitoring, alerting, and reporting processes for data quality using Python, SQL, and orchestration tools
Environment: Azure Databricks, Azure Data Factory, Azure Data Lake Gen2, SparkSQL, PySpark, FastAPI, Azure Event Hub, Python, Power BI.
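A small sketch of the FastAPI pattern above, serving curated Snowflake data; the account settings, warehouse/database/schema names, table, and endpoint are hypothetical, and credentials are assumed to come from environment variables backed by Key Vault.
```python
# Sketch of a FastAPI endpoint serving curated warehouse data.
# Connection settings, table, and endpoint names are placeholders.
import os

import snowflake.connector
from fastapi import FastAPI

app = FastAPI(title="curated-data-api")

def get_connection():
    # Credentials come from environment variables (Key Vault-backed in practice).
    return snowflake.connector.connect(
        account=os.environ["SF_ACCOUNT"],
        user=os.environ["SF_USER"],
        password=os.environ["SF_PASSWORD"],
        warehouse="ANALYTICS_WH",   # assumption
        database="CURATED",         # assumption
        schema="CLAIMS",            # assumption
    )

@app.get("/claims/{claim_id}")
def get_claim(claim_id: str):
    with get_connection() as conn:
        cur = conn.cursor(snowflake.connector.DictCursor)
        cur.execute("SELECT * FROM CLAIM_SUMMARY WHERE CLAIM_ID = %s", (claim_id,))
        return {"claim_id": claim_id, "rows": cur.fetchall()}
```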
Role: Sr. Data Engineer | Full-time
McAfee Software India Private Limited, Bengaluru, India | Nov 2018-Mar 2022
Implemented a data pipeline for efficient dataset processing and custom Flume sink for data ingestion.
Developed Kafka producers/consumers and configured HDFS for failover backup.
Created Spark jobs and SQL queries in Databricks for data transformation and testing.
Designed Hive queries and functions for data loading and filtering.
Built real-time and batch data ingestion frameworks with PySpark, integrated with AWS Glue and Apache Kafka for near real-time processing.
Optimized PySpark jobs by tuning memory usage, partitioning strategies, broadcast joins, and leveraging DataFrame API for better performance.
Integrated PySpark pipelines with cloud-native storage solutions(S3) and workflow orchestration tools like Airflow.
Led migration of on-prem applications to AWS cloud.
Developed AWS dashboards for error pattern analysis and Lambda functions for data forwarding.
Automated deployment using AWS CloudFormation and created Spark EMR jobs for DynamoDB uploads.
Implemented DBT best practices including version control (Git), testing (schema & data tests), documentation, and model lineage tracking.
Automated DBT model runs using CI/CD pipelines in AWS CodePipeline, Azure DevOps, or GitHub Actions, ensuring robust deployment of data models.
Built a framework for data uploads to multiple endpoints and implemented centralized logging with CloudWatch.
Leveraged AWS Glue Crawlers to automatically catalog data stored in S3 and keep schemas up to date for seamless querying.
Wrote optimized SQL queries in Amazon Athena to analyze data directly in Amazon S3 using serverless, cost-efficient methods (see the Athena sketch following this role).
Integrated AWS Glue and Athena with other AWS services (e.g., S3, Redshift, Lambda, CloudWatch) for end-to-end data processing and analytics workflows.
Implemented partitioning and compression strategies in Glue and Athena to improve query performance and reduce cost.
Scheduled and orchestrated ETL workflows using AWS Glue Workflows, Triggers, and Step Functions.
Monitored and debugged Glue jobs using CloudWatch Logs and implemented error handling and retry mechanisms.
Environment: Python, SparkSQL, PySpark, Java, Flask, S3, SNS, SQS, AWS Glue, Athena, EMR, EC2, Docker, Snowflake, Kafka, Flume, Hive, AWS Insight, CloudWatch, Databricks, ECS.
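A brief sketch of querying a partitioned S3 dataset through Athena from Python with boto3, in the spirit of the Glue/Athena bullets above; the database, table, and results bucket are made-up names.
```python
# Sketch: run a query against a partitioned Athena table and poll for completion.
# Region, database, table, and results bucket are hypothetical.
import time

import boto3

athena = boto3.client("athena", region_name="us-east-1")

QUERY = """
    SELECT event_date, count(*) AS events
    FROM telemetry_db.device_events            -- table partitioned by event_date
    WHERE event_date BETWEEN date '2021-01-01' AND date '2021-01-31'
    GROUP BY event_date
"""

run = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "telemetry_db"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll until the query finishes, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=run["QueryExecutionId"])
    status = state["QueryExecution"]["Status"]["State"]
    if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if status == "SUCCEEDED":
    rows = athena.get_query_results(
        QueryExecutionId=run["QueryExecutionId"])["ResultSet"]["Rows"]
```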
Role: Associate Developer | Full-time
JP Morgan Services India Private Limited, Bengaluru, India | Dec 2015 - Nov 2018
Analyzed Hadoop cluster and big data tools (HDFS, Hive, Spark, Java, Sqoop).
Built scalable distributed data solutions using Hadoop infrastructure.
Developed and tested Spark code in Java and Spark-SQL/Streaming for efficient data processing.
Managed data import/export (SQL Server, Oracle, CSV, text files) between file systems and persistence systems (see the sketch following this role).
Designed a data warehouse with Hive for improved data management.
Utilized Sqoop to export data to relational databases for reporting and visualization.
Tested application flows with Postman to ensure functionality and reliability.
Developed and maintained cloud-based data pipelines for structured and unstructured data processing, handling massive volumes of data from multiple sources in real-time and batch processing modes.
Built real-time streaming applications for network performance monitoring and service assurance, utilizing tools like Apache Kafka and AWS Kinesis for scalable event-driven architectures.
Optimized ETL/ELT workflows using AWS services such as Redshift, AWS Glue, and EMR, alongside Apache ecosystem tools.
Automated data engineering processes and enhanced system performance using Python for developing data workflows, including Apache Beam and AWS Data Pipeline for stream processing.
Authored and optimized complex SQL queries for data analysis and performance tuning on big data platforms like Redshift and Athena, handling high-throughput data.
Diagnosed and resolved production issues to ensure data system reliability, utilizing Hadoop and Apache Spark to ensure high availability, fault tolerance, and optimized resource usage.
Automated deployment processes using GitHub Actions as part of a CI/CD pipeline, significantly reducing manual effort and accelerating release cycles.
Wrote complex SQL queries (PostgreSQL, MySQL) to analyze and transform data for case management, CRM, and business analytics systems.
Contributed to DevOps pipelines for seamless integration and deployment of data workflows, utilizing Docker, Kubernetes, and cloud-native AWS services for continuous data pipeline automation.
Communicated technical solutions and insights to stakeholders, driving informed decision-making and the strategic use of Big Data technologies in telecom analytics.
Environment: Java, HDFS, Spark, Sqoop, Hive, AWS S3, AWS EMR, Athena, Glue, CloudWatch, SQL, Kafka, Lambda, DynamoDB.
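A short sketch of the file-based import pattern above: reading delimited files, de-duplicating, and landing date-partitioned Parquet so downstream SQL engines (Hive, Athena, Redshift Spectrum) can query it. The S3 paths and column names are illustrative.
```python
# Sketch: batch import of CSV files into date-partitioned Parquet on S3.
# Bucket paths and column names (txn_id, txn_timestamp) are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

raw = (spark.read
            .option("header", "true")
            .option("inferSchema", "true")
            .csv("s3://example-landing/transactions/"))       # assumption

cleaned = (raw.withColumn("txn_date", F.to_date("txn_timestamp"))
              .dropDuplicates(["txn_id"]))                     # de-dupe on business key

(cleaned.write
        .mode("overwrite")
        .partitionBy("txn_date")                               # partition for query pruning
        .parquet("s3://example-curated/transactions/"))        # assumption
```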
Client: Target, USA
Role: Senior Associate Engineer Technology L1
Sapient Consulting Pvt. Ltd, Bengaluru, India | June 2014 - Dec 2015
Developed user input forms with Angular2 and Material design for generating PowerPoint slides.
Created efficient, testable code through software analysis, programming, testing, and debugging.
Enabled users to generate financial reports in PowerPoint via a web browser interface.
Implemented Spring REST Web services for data retrieval from various applications.
Developed DAO layer using Hibernate Framework for streamlined data access.
Built administrative pages with Spring REST to update metadata without application releases.
Contributed to JUnit test case development using Mockito for effective testing.
Used Apache POI to generate MS Excel reports and charts.
Tested application flows with Postman to ensure functionality and reliability.
Environment: Java, Angular2, VBA, Spring Boot, Spring REST, Hibernate, Oracle, PowerPoint, Micro Services, AWS, ECS, Docker, JUnit.
Client: SSP Worldwide General Insurance, UK
Role: Senior Software Engineer
NIIT Technologies Pvt. Ltd (Coforge), Noida, India | January 2012 - May 2014
Developed and maintained Java web applications using Spring, Hibernate, React and REST APIs.
Collaborated with cross-functional teams to deliver high-quality software on time and within budget.
Participated in code reviews, testing, and deployment to ensure maintainable code.
Troubleshot and resolved technical issues, providing effective solutions to meet client needs.
Developed and maintained dynamic, responsive web interfaces using React.js and Redux in a microservices-based architecture.
Integrated front-end components with RESTful APIs and backend services developed in Java (Spring Boot).
Applied React Hooks and functional components to modernize legacy class-based components and improve performance.
Implemented state management strategies using Redux and Context API for scalable component interaction.
Enhanced performance through lazy loading, memoization, and component reusability patterns.
Developed reusable UI components and design systems shared across multiple web applications.
Performed software analysis, programming, testing, and debugging.
Created design documents for various applications.
Environment: Java, React JS, Spring Boot, Spring REST, Hibernate, Oracle, Jenkins, JUnit
Client: Sony PS
Role: Software Engineer
ITech Solutions, Pune, India | March 2010 - December 2011
Led development of business logic using Java and J2EE technologies.
Implemented MVC architecture for structured, scalable applications.
Developed dynamic web pages with JSP and JavaScript to enhance user interfaces.
Created customer and batch upload modules for comprehensive functionality.
Executed test cases with JUnit and Mockito to ensure software reliability.
Environment: Java, Spring Boot, Spring REST, Hibernate, MySQL
Education
Bachelor of Technology in Computer Science and Engineering (JNTU, Anantapur), 2005 - 2009