
Tejasri Elimineti - Data Engineer
[email protected]
Location: USA
Relocation: Yes
Visa: GC
Tejasri Elimineti
[email protected]; (952) 356-8965
LinkedIn: linkedin.com/in/elimineti-tejasri

SUMMARY:
Senior Data Engineer with 12 years of experience delivering large-scale data engineering and cloud solutions across healthcare, retail, finance, manufacturing, and e-commerce. Experienced in leading end-to-end projects from requirements gathering to production deployment, ensuring scalability, reliability, and business impact.
Hands-on expertise in cloud platforms, including AWS (Glue, S3, Redshift, EMR, Lambda, Kinesis, Step Functions) and Azure (Data Factory, Databricks, Synapse, Data Lake, Event Hub). Skilled at migrating on-premises systems to cloud-native platforms while optimizing cost, security, and performance.
Strong background in big data frameworks and distributed processing with Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), and Kafka. Experienced in building both real-time streaming and batch ETL/ELT pipelines to process high-volume structured and unstructured data.
Deep experience with ETL and data integration tools including Informatica PowerCenter, AWS Glue, Azure Data Factory, Talend, and SSIS. Proficient in designing reusable frameworks, applying partitioning strategies, and performance-tuning pipelines for faster execution and reliability.
Proficient in SQL, PL/SQL, Python, and Shell scripting, with the ability to write complex queries, optimize stored procedures, and automate workflows. Experienced in data modeling (star and snowflake schemas), building fact/dimension tables, and enabling BI teams with clean, analytics-ready datasets.
Skilled in designing and implementing data lakes, data warehouses, and Delta Lake architectures to support enterprise analytics, BI dashboards, and regulatory reporting. Experience includes building scalable solutions for claims, retail transactions, e-commerce clickstream, and IoT data.
Strong understanding of data governance, lineage, and compliance frameworks, including HIPAA, SOX, and GDPR. Experience in implementing encryption, fine-grained IAM roles, and metadata standards to ensure secure, governed access to sensitive data.
Effective in Agile/Scrum environments, collaborating closely with cross-functional teams of business analysts, BI developers, architects, and data scientists. Proven ability to translate business needs into scalable technical solutions.
Adept at workflow automation and orchestration, using tools like Airflow, Control-M, AWS Step Functions, and Databricks jobs to reduce manual intervention and improve system reliability.
Modernized legacy data systems into cloud-native architectures, improving performance, data accessibility, and analytics capabilities.

TECHNICAL SKILLS:
Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda, Kinesis, Step Functions, Athena, CloudWatch), Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage, Event Hub, Stream Analytics, Purview).
Big Data & Distributed Processing: Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), Kafka.
ETL / Data Integration: Informatica PowerCenter (8.x/9.x), AWS Glue, Azure Data Factory, Talend, SSIS.
Databases & Data Warehousing: Oracle (10g/11g), SQL Server, Teradata, Snowflake, Amazon Redshift, Azure Synapse, BigQuery.
Programming & Scripting: Python, SQL, PL/SQL, Shell Scripting (UNIX/Linux), Scala (basic).
Data Modeling & BI Tools: Star/Snowflake schema design, Power BI, Tableau, SSRS, Excel (advanced).
Workflow Orchestration: Apache Airflow, Control-M, Step Functions, Databricks Jobs.
Version Control & DevOps: Git, GitHub, Jenkins, CI/CD pipelines, Agile/Scrum methodology.
Other: Data Quality & Governance (metadata management, reconciliation, lineage), Performance Tuning (ETL, SQL, Spark), Security & Compliance (HIPAA, SOX, GDPR).

WORK EXPERIENCE:
Vanguard Group, Malvern, PA | Jan 2025 – Present
Sr. Data Engineer
Project Description: The project's objective is to build an enterprise-grade AI-driven data platform that centralizes structured and semi-structured data across multiple domains - investments, customer analytics, and risk management - to enable self-service analytics, machine learning workloads, and generative AI insights. The platform integrates diverse data sources using automated ingestion frameworks and enforces governance, lineage, and security through AWS Lake Formation and IAM policies.
Key Responsibilities:
Designed and implemented ETL/ELT pipelines using AWS Glue, PySpark, and Lambda, transforming raw data into curated datasets following the Medallion (Bronze-Silver-Gold) architecture to support AI and ML model readiness (see the sketch below).
Built real-time streaming pipelines using Kinesis Data Streams and SQS, enabling low-latency data ingestion for AI-powered risk scoring and customer insight models.
Developed and optimized Redshift and Snowflake data warehouses for AI workloads by tuning sort keys, clustering, and distribution styles to accelerate model training data retrieval.
Automated infrastructure provisioning and data pipeline deployments using Terraform and GitHub Actions, ensuring consistent and reproducible environments for ML and analytics teams.
Implemented data quality, validation, and drift detection frameworks using Great Expectations and custom Python-based ML validation scripts to ensure accuracy of model input data.
Leveraged AWS Lake Formation for centralized governance, lineage tracking, and fine-grained access control for AI training datasets.
Developed and maintained feature engineering pipelines that feed the SageMaker Feature Store, enabling data scientists to reuse standardized AI features across models.
Integrated ML monitoring and observability using CloudWatch, Step Functions, and custom Lambda triggers to track model input freshness, schema changes, and anomalies.
Partnered with data scientists and ML engineers to define AI-ready data schemas, design data marts for predictive analytics, and automate dataset versioning for experimentation.
Contributed to the design of a GenAI data foundation layer using AWS Bedrock and OpenAI APIs for proof-of-concept LLM-based insights on investment and risk datasets.
Environment: AWS (S3, Glue, Lambda, Redshift, Lake Formation, Kinesis, SQS, Step Functions, Athena, CloudWatch, SageMaker), Python, PySpark, SQL, Terraform, GitHub Actions, Snowflake, Tableau, QuickSight, Great Expectations, Jira, Agile-Scrum
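
For illustration, a minimal sketch of the Bronze-to-Silver PySpark pattern described above; the bucket paths, table, and column names (trades, trade_id, notional) are hypothetical, not taken from the actual platform.

```python
# Minimal Bronze -> Silver PySpark job; paths and columns are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bronze_to_silver_trades").getOrCreate()

# Bronze: raw, schema-on-read landing zone on S3 (hypothetical bucket/prefix).
bronze = spark.read.json("s3://example-lake/bronze/trades/")

# Silver: typed, deduplicated records ready for ML feature pipelines.
silver = (
    bronze
    .withColumn("trade_ts", F.to_timestamp("trade_ts"))
    .withColumn("notional", F.col("notional").cast("double"))
    .filter(F.col("trade_id").isNotNull())
    .dropDuplicates(["trade_id"])
)

# Partitioning by trade date lets model-training reads prune to the dates needed.
(silver
 .withColumn("trade_date", F.to_date("trade_ts"))
 .write.mode("overwrite")
 .partitionBy("trade_date")
 .parquet("s3://example-lake/silver/trades/"))
```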

Tapestry Inc., New York City, NY | Sep 2023 – Oct 2024
Senior Data Engineer
Project: Cloud Data Lake & Analytics Modernization
Tapestry, a global luxury fashion house, initiated a cloud-first data modernization program to unify data from retail stores, e-commerce platforms, supply chain systems, and customer loyalty applications. The goal was to build a centralized AWS data lake and analytics ecosystem to support enterprise-wide reporting, financial planning, and customer insights. As a Cloud Data Engineer, I was responsible for designing scalable data pipelines, optimizing data storage, and enabling secure, governed access to enterprise data.
Key Responsibilities:
Designed and built ETL/ELT pipelines using AWS Glue, Lambda, and PySpark to ingest and process data from ERP, POS, Salesforce, and third-party retail systems into Amazon S3 and Redshift.
Developed PySpark transformations on AWS EMR and Glue for cleansing, enrichment, and aggregation of large retail transaction datasets.
Implemented Delta Lake and partitioning strategies on S3 to optimize storage, query performance, and incremental loads (see the sketch below).
Designed and optimized Redshift models and fact/dimension tables to support enterprise BI dashboards and ad-hoc analysis.
Integrated real-time pipelines with Amazon Kinesis and Glue streaming jobs for e-commerce clickstream and inventory data feeds.
Automated orchestration and monitoring of workflows using Step Functions, Airflow, and CloudWatch, ensuring reliability and transparency.
Established data quality and governance frameworks, including reconciliation reports, metadata standards, and security policies.
Worked closely with BI and finance teams to enable self-service analytics through Tableau and Power BI connected to Redshift and Athena.
Collaborated with cross-functional teams on cloud cost optimization, leveraging lifecycle policies, compression, and Redshift workload management.
Ensured compliance with SOX and GDPR regulations by implementing encryption (KMS), fine-grained IAM policies, and audit logging.
Environment: AWS Glue, S3, Redshift, EMR (PySpark), Kinesis, Lambda, Step Functions, Airflow, Athena, Tableau, Power BI, SQL, Python, Git, Agile.
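
As a hedged example of the incremental-load pattern noted above, the sketch below upserts a staged batch into a Delta table on S3 using the delta-spark Python API; the paths and the transaction_id key are illustrative.

```python
# Hedged sketch: idempotent incremental upsert into a Delta table on S3.
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder.appName("pos_incremental_load")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Staged batch from the day's ingestion run (illustrative path).
updates = spark.read.parquet("s3://example-retail-lake/staging/pos_sales/")

target = DeltaTable.forPath(spark, "s3://example-retail-lake/curated/pos_sales/")

# MERGE keeps reloads idempotent: corrected or late-arriving POS records
# update in place, new transactions insert.
(target.alias("t")
 .merge(updates.alias("s"), "t.transaction_id = s.transaction_id")
 .whenMatchedUpdateAll()
 .whenNotMatchedInsertAll()
 .execute())
```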

Health Care Service Corporation, Richardson, TX | Nov 2021 – May 2023
Senior Data Engineer
Project: Enterprise Data Platform & Cloud Modernization.
HCSC embarked on a large-scale cloud migration initiative to modernize its data ecosystem and enable advanced analytics across healthcare claims, provider, and member data. The goal was to move from legacy on-premises systems to a cloud-native data lake and warehouse on AWS, ensuring scalability, cost efficiency, and support for enterprise reporting. As an AWS Data Engineer, I was responsible for designing and implementing robust data pipelines, ensuring data quality, and supporting compliance requirements across the enterprise.
Key Responsibilities:
Designed and developed end-to-end data pipelines using AWS Glue, Lambda, and PySpark to ingest and transform data from Oracle, SQL Server, APIs, and flat files into Amazon S3 and Redshift.
Built scalable data lake architecture on S3 with partitioning, compaction, and lifecycle policies to optimize cost and performance.
Integrated streaming data pipelines using Amazon Kinesis and Glue streaming jobs to process real-time claims and provider feeds.
Developed PySpark transformations on EMR and Glue for large-scale healthcare data cleansing, enrichment, and standardization.
Designed and implemented Redshift schema models (star and snowflake) to support enterprise reporting and dashboards.
Automated orchestration and scheduling of jobs using AWS Step Functions, Lambda triggers, and Airflow.
Implemented data quality frameworks with validation rules, reconciliation checks, and exception handling (see the sketch below).
Tuned performance of Glue jobs, Redshift queries, and EMR clusters to improve throughput and reduce processing time.
Ensured HIPAA compliance and security by implementing encryption (KMS), IAM policies, and audit logging across AWS services.
Environment: AWS Glue, S3, Lambda, EMR (PySpark), Redshift, Kinesis, Step Functions, Airflow, Athena, Python, SQL, Oracle, SQL Server, Git, Agile.
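
A minimal sketch of the reconciliation-style checks referenced above, comparing a row count and a control total between raw and curated claim sets; the paths and the paid_amount column are assumptions for illustration.

```python
# Sketch of a post-load reconciliation check (names and paths illustrative).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims_reconciliation").getOrCreate()

source = spark.read.parquet("s3://example-hcsc-lake/raw/claims/")
target = spark.read.parquet("s3://example-hcsc-lake/curated/claims/")

# Compare a row count and a financial control total between the layers.
checks = {
    "row_count": (source.count(), target.count()),
    "paid_amount_sum": (
        source.agg(F.sum("paid_amount")).first()[0],
        target.agg(F.sum("paid_amount")).first()[0],
    ),
}

failures = {name: vals for name, vals in checks.items() if vals[0] != vals[1]}
if failures:
    # Failing the job lets Step Functions/Airflow route to the exception path.
    raise ValueError(f"Reconciliation failed: {failures}")
```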


Kroger, Blue Ash, OH | Oct 2018 – Jun 2021
Data Engineer
Project: Enterprise Data Lake & Cloud Migration
The initiative focused on consolidating structured and unstructured data from point-of-sale (POS), supply chain, e-commerce, and customer loyalty systems into a centralized Azure Data Lake. As an Azure Data Engineer, I was responsible for designing and implementing scalable data pipelines, integrating enterprise data sources, and ensuring high performance and data quality across the platform.
Key Responsibilities:
Designed and implemented ETL/ELT pipelines using Azure Data Factory (ADF) for ingesting data from on-prem Oracle/SQL Server systems, APIs, and flat files into Azure Data Lake Storage (ADLS) and Azure SQL Data Warehouse (Synapse Analytics).
Developed and optimized PySpark/Spark transformations in Azure Databricks for cleansing, aggregating, and standardizing large-scale datasets (see the sketch below).
Built real-time ingestion pipelines using Azure Event Hub and Stream Analytics for processing e-commerce clickstream and POS data.
Implemented data partitioning and Delta Lake architecture to support efficient querying and incremental processing.
Designed data models and star schemas for downstream BI solutions in Power BI and SSRS.
Applied data quality checks, exception handling, and reconciliation frameworks to ensure trusted analytics outputs.
Automated job scheduling and orchestration through ADF pipelines and Databricks notebooks, reducing manual intervention.
Monitored and optimized pipeline performance, improving processing times for critical retail datasets.
Collaborated with cross-functional teams including business analysts, architects, and data scientists to deliver end-to-end analytics solutions.
Supported data governance initiatives, including metadata management and implementing Azure Purview for data lineage and compliance.
Environment: Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Data Lake Storage, Azure Synapse Analytics, Event Hub, Stream Analytics, Power BI, SQL Server, Oracle, Python, Git, and Agile.
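
An illustrative Databricks-style sketch combining the cleansing and Delta partitioning patterns above; the ADLS paths and POS column names are hypothetical.

```python
# Databricks-style sketch: standardize raw POS feeds and append them to a
# date-partitioned Delta table (ADLS paths and columns are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pos_standardize").getOrCreate()

raw = spark.read.json("abfss://[email protected]/pos/")

clean = (
    raw
    .withColumn("sale_ts", F.to_timestamp("sale_ts"))
    .withColumn("store_id", F.upper(F.trim("store_id")))
    .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
    .dropDuplicates(["receipt_id", "line_no"])
)

# Partitioning by sale_date keeps incremental loads and BI queries cheap.
(clean
 .withColumn("sale_date", F.to_date("sale_ts"))
 .write.format("delta")
 .mode("append")
 .partitionBy("sale_date")
 .save("abfss://[email protected]/pos_sales/"))
```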

SoCal Gas, Los Angeles, CA | Mar 2016 – Sep 2018
Data Engineer
Project Description: Worked on enterprise data modernization initiatives to support SoCal Gas's operations, billing, and customer service functions. The focus was on building scalable ETL pipelines to process large volumes of customer usage, billing, and meter data, integrating disparate systems into a centralized data warehouse. The project involved migrating legacy data pipelines to modern big data frameworks, enabling advanced analytics for regulatory compliance, energy consumption forecasting, and operational efficiency.
Key Responsibilities:
Designed and developed ETL workflows using Informatica, SQL, and Python to extract, transform, and load customer and operational data from multiple source systems into the enterprise data warehouse.
Worked with Oracle and SQL Server databases for relational storage and optimized queries to improve data processing performance.
Implemented data quality checks, validations, and reconciliation processes to ensure accurate reporting for billing and compliance.
Built and optimized batch pipelines for processing meter data, billing transactions, and asset management information.
Collaborated with business analysts, operations teams, and compliance officers to translate energy usage and billing requirements into technical data models.
Migrated portions of legacy ETL jobs into Hadoop/Spark-based pipelines to handle larger data volumes more efficiently (see the sketch below).
Developed ad-hoc and scheduled reports using Power BI and Tableau for customer analytics, usage forecasting, and regulatory reporting.
Ensured compliance with energy regulatory standards by maintaining audit-ready data pipelines and producing consistent reporting outputs.
Performed performance tuning and query optimization to reduce ETL load times and improve reporting SLAs.
Supported data governance and metadata management efforts, ensuring lineage and traceability of critical energy data.
Environment: Oracle 11g/12c, SQL Server, Hadoop, Spark, Informatica PowerCenter, Python, UNIX Shell Scripting, Power BI, Tableau, Control-M, Git, Teradata, Agile/Scrum.
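
A hedged sketch of what such a Spark-based replacement for a legacy extract could look like: a partitioned JDBC read from Oracle landed as Parquet on HDFS. The host, credentials, bounds, and billing.meter_reads table are placeholders.

```python
# Sketch: partitioned JDBC extract from Oracle landed as Parquet on HDFS.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("meter_extract").getOrCreate()

meter_reads = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//example-host:1521/ORCL")
    .option("dbtable", "billing.meter_reads")
    .option("user", "etl_user")
    .option("password", "********")
    # Partitioned reads spread the extract across executors instead of
    # funneling everything through a single JDBC connection.
    .option("partitionColumn", "meter_id")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .option("numPartitions", "16")
    .load()
)

meter_reads.write.mode("overwrite").parquet("hdfs:///data/billing/meter_reads/")
```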

Experis, India | Aug 2013 – Dec 2015
Data Consultant
Key Responsibilities:
Designed and optimized complex SQL and PL/SQL scripts for data validation, analysis, and integration from online quoting systems.
Developed and maintained stored procedures, triggers, indexes, and views in Oracle and SQL Server to implement business rules and enable reporting.
Built Hive bucketed and partitioned tables to optimize query performance and support large-scale distributed data processing (see the sketch below).
Wrote MapReduce programs and HiveQL scripts to extract, transform, and load (ETL) data into the Hadoop Distributed File System (HDFS).
Used Sqoop for high-performance bulk data transfer between Oracle and Hive for downstream analytics.
Assisted in configuring and maintaining Hadoop ecosystem components such as Hive, HBase, and Sqoop.
Created and enhanced ETL workflows in Informatica PowerCenter, implementing complex transformations to meet business requirements.
Leveraged mapplets and reusable transformations in Informatica to improve standardization and reusability across ETL jobs.
Configured parameterized mappings and sessions for dynamic job execution and runtime flexibility.
Monitored and troubleshot ETL workflows using Workflow Manager and Workflow Monitor.
Defined and enforced metadata, data warehouse standards, and naming conventions to ensure consistency and maintainability.
Tuned ETL job performance by resolving target bottlenecks, optimizing queries, and applying pipeline partitioning in Informatica.
Environment: Informatica PowerCenter 8.6/9.x, Apache Hadoop, Hive, HBase, MapReduce, Sqoop, Oracle 10g/11g, SQL Server, PL/SQL, UNIX, Shell Scripting, Python (basic scripting), Toad, Windows NT.
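
For illustration, the DDL pattern behind the bucketed, partitioned Hive tables mentioned above, issued here through a Hive-enabled SparkSession rather than the Hive CLI used at the time; the database, table, and column names are invented.

```python
# Illustrative DDL for a partitioned, bucketed Hive table.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("quotes_ddl")
    .enableHiveSupport()
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS quotes")

spark.sql("""
    CREATE TABLE IF NOT EXISTS quotes.online_quotes (
        quote_id     BIGINT,
        customer_id  BIGINT,
        premium      DECIMAL(12,2),
        created_ts   TIMESTAMP
    )
    PARTITIONED BY (quote_date STRING)          -- prune scans to a day's data
    CLUSTERED BY (customer_id) INTO 32 BUCKETS  -- faster joins and sampling
    STORED AS ORC
""")
```

Partitioning bounds each query to the days it needs, while bucketing on the join key lets Hive use bucketed map joins instead of full shuffles.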

Oasis Infotech, India | Apr 2011 – Jun 2013
ETL Developer
Project: Laboratory Information Management System
Key Responsibilities:
Designed, developed, and maintained ETL workflows using Informatica PowerCenter 8.6/9.0, extracting and transforming data from Oracle databases and flat files into the enterprise data warehouse.
Wrote and optimized complex SQL and PL/SQL queries, stored procedures, and triggers to enable reporting and analytics.
Automated data ingestion and validation processes with UNIX Shell Scripts and scheduled jobs via cron.
Performed data validation, reconciliation, and quality checks, ensuring accuracy between source and target systems (see the sketch below).
Supported business decision-making by preparing data summaries, performance reports, and ad-hoc analysis using SQL and Excel.
Worked on database performance tuning and indexing strategies to optimize ETL and reporting workloads.
Partnered with business analysts to translate requirements into scalable ETL and reporting solutions.
Investigated and resolved data inconsistencies by tracing upstream feeds and implementing corrections in ETL workflows.
Assisted with data migration, backup, and recovery activities during database upgrades and platform transitions.
Environment: Oracle 10g/11g, PL/SQL, Informatica PowerCenter 8.6/9.0, UNIX/Linux, Shell Scripting, SQL*Plus, Toad, MS Excel, Windows Server.
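
A small Python sketch of the source-to-target reconciliation idea above, using cx_Oracle; the original checks were implemented in SQL and shell, and the connection string and table names here are hypothetical.

```python
# Hedged sketch: source-to-target row-count reconciliation after an ETL load.
import cx_Oracle

conn = cx_Oracle.connect("etl_user", "********", "example-host/ORCL")
cur = conn.cursor()

cur.execute("SELECT COUNT(*) FROM stg_lab_results")
source_count = cur.fetchone()[0]

cur.execute(
    "SELECT COUNT(*) FROM fact_lab_results WHERE load_date = TRUNC(SYSDATE)"
)
target_count = cur.fetchone()[0]

# A non-zero exit lets the cron wrapper catch the failure and alert.
if source_count != target_count:
    raise SystemExit(
        f"Reconciliation failed: staged={source_count}, loaded={target_count}"
    )
```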

EDUCATION:
IIMT College of Engineering, Greater Noida, India | Jun 2007 – May 2011
Bachelor of Technology in Computer Science
