Tejasri E - Data Engineer
[email protected]
Location: Moorestown, New Jersey, USA
Relocation: Yes
Visa: GC
Tejasri E
[email protected]; (952)-356-8965
LinkedIn: linkedin.com/in/elimineti-tejasri

SUMMARY:
Senior Data Engineer with 12 years of experience delivering large-scale data engineering and cloud solutions across healthcare, retail, finance, manufacturing, and e-commerce. Experienced in leading end-to-end projects from requirements gathering to production deployment, ensuring scalability, reliability, and business impact.
Hands-on expertise with AWS (Glue, S3, Redshift, EMR, Lambda, Kinesis, Step Functions) and Azure (Data Factory, Databricks, Synapse, Data Lake, Event Hub). Skilled at migrating on-premises systems to cloud-native platforms while optimizing cost, security, and performance.
Strong background in big data frameworks and distributed processing with Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), and Kafka. Experienced in building both real-time streaming and batch ETL/ELT pipelines to process high-volume structured and unstructured data.
Deep experience with ETL and data integration tools including Informatica PowerCenter, AWS Glue, Azure Data Factory, Talend, and SSIS. Proficient in designing reusable frameworks, applying partitioning strategies, and performance-tuning pipelines for faster execution and reliability.
Proficient in SQL, PL/SQL, Python, and Shell scripting, with the ability to write complex queries, optimize stored procedures, and automate workflows. Experienced in data modeling (star and snowflake schemas), building fact/dimension tables, and enabling BI teams with clean, analytics-ready datasets.
Skilled in designing and implementing data lakes, data warehouses, and Delta Lake architectures to support enterprise analytics, BI dashboards, and regulatory reporting. Experience includes building scalable solutions for claims, retail transactions, e-commerce clickstream, and IoT data.
Exposure to Generative AI and LLM integration, including building data foundation layers and pipelines to support AI-driven analytics using AWS Bedrock, OpenAI APIs, and SageMaker Feature Store.
Strong understanding of data governance, lineage, and compliance frameworks, including HIPAA, SOX, and GDPR. Experience in implementing encryption, fine-grained IAM roles, and metadata standards to ensure secure, governed access to sensitive data.
Effective in Agile/Scrum environments, collaborating closely with cross-functional teams of business analysts, BI developers, architects, and data scientists. Proven ability to translate business needs into scalable technical solutions.
Adept at workflow automation and orchestration, using tools like Airflow, Control-M, AWS Step Functions, and Databricks jobs to reduce manual intervention and improve system reliability.
Modernized legacy data systems into cloud-native architectures, improving performance, data accessibility, and analytics capabilities.

TECHNICAL SKILLS:
Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda, Kinesis, SNS, SQS, CloudTrail, Step Functions, Athena, CloudWatch), Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage, Event Hub, Stream Analytics, Purview).
Big Data & Distributed Processing: Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), Kafka.
ETL / Data Integration: Informatica PowerCenter (8.x/9.x), AWS Glue, Azure Data Factory, Talend, SSIS.
Databases & Data Warehousing: Oracle (10g/11g), SQL Server, Teradata, Snowflake, Amazon Redshift, Azure Synapse.
Machine Learning & AI: AWS SageMaker, SageMaker Feature Store, Feature Engineering Pipelines, Generative AI Integration, AWS Bedrock, OpenAI APIs, LLM-powered analytics workflows.
Programming & Scripting: Python, SQL, PL/SQL, Shell Scripting (UNIX/Linux), Scala (basic).
Data Modeling & BI Tools: Star/Snowflake schema design, Power BI, Tableau, SSRS, Excel (advanced).
Streaming & Real-Time Processing: Kafka, AWS Kinesis, Azure Event Hub, Spark Streaming.
Workflow Orchestration: Apache Airflow, Control-M, Step Functions, Databricks Jobs.
Version Control & DevOps: Git, GitHub, Jenkins, CI/CD pipelines, Agile/Scrum methodology.
Other: Data Quality & Governance (metadata management, reconciliation, lineage), Performance Tuning (ETL, SQL, Spark), Security & Compliance (HIPAA, SOX, GDPR).

WORK EXPERIENCE:
Vanguard Group, Malvern, PA | Jan 2025 - Till Date
Senior Data Engineer
Project Description: The project's objective is to build an enterprise-grade AI-driven data platform that centralizes structured and semi-structured data across multiple domains (investments, customer analytics, and risk management) to enable self-service analytics, machine learning workloads, and generative AI insights. The platform integrates diverse data sources using automated ingestion frameworks and enforces governance, lineage, and security through Lake Formation and IAM policies.
Key Responsibilities:
Designed and implemented scalable ETL/ELT pipelines using Glue, PySpark, and AWS Lambda, transforming raw data into curated datasets following the Medallion (Bronze-Silver-Gold) architecture to support advanced analytics and machine learning workloads.
Built real-time streaming pipelines using Amazon Kinesis Data Streams and SQS, enabling low-latency ingestion of transactional and behavioral data for downstream analytics and predictive models.
Developed and optimized Amazon Redshift data warehouses by tuning sort keys, clustering, and distribution styles, improving query performance and accelerating access to analytical and training datasets.
Developed feature engineering pipelines integrated with SageMaker Feature Store, enabling reusable and standardized machine learning features across multiple predictive models.
Contributed to GenAI proof-of-concept initiatives by designing a data foundation layer integrating Bedrock and OpenAI APIs to enable LLM-based insights on investment and risk datasets.
Leveraged AWS Lake Formation to implement centralized data governance, metadata cataloging, and fine-grained access control for enterprise analytics datasets.
Implemented data quality validation and monitoring frameworks using Great Expectations and custom Python validation scripts to ensure data accuracy, completeness, and schema consistency across pipelines.
Implemented pipeline monitoring and observability using CloudWatch, Step Functions, and Lambda triggers, tracking pipeline health, schema drift, and data freshness.
Automated infrastructure provisioning and CI/CD deployments for data pipelines using Terraform and GitHub Actions, ensuring consistent, scalable, and reproducible environments across development, staging, and production.
Collaborated with Data Scientists, ML Engineers, and Analytics teams to design AI-ready datasets, build curated data marts, and enable reproducible datasets for experimentation and model training.
Environment: AWS (S3, Glue, Lambda, Redshift, Lake Formation, Kinesis, SQS, Step Functions, Athena, CloudWatch, SageMaker), Python, PySpark, SQL, Terraform, GitHub Actions, Tableau, QuickSight, Great Expectations, Jira, Agile-Scrum.

Tapestry Inc., New York City, NY | Sep 2023 - Oct 2024
Senior Data Engineer
Project: Cloud Data Lake & Analytics Modernization
Tapestry, a global luxury fashion house, initiated a cloud-first data modernization program to unify data from retail stores, e-commerce platforms, supply chain systems, and customer loyalty applications. The goal was to build a centralized AWS data lake and analytics ecosystem to support enterprise-wide reporting, financial planning, and customer insights. As a Cloud Data Engineer, I was responsible for designing scalable data pipelines, optimizing data storage, and enabling secure, governed access to enterprise data.
Key Responsibilities:
Designed and built ETL/ELT pipelines using AWS Glue, Lambda, and PySpark to ingest and process data from ERP, POS, Salesforce, and third-party retail systems into Amazon S3 and Amazon Redshift.
Developed PySpark transformations on EMR and Glue for cleansing, enrichment, and aggregation of large retail transaction datasets.
Implemented Delta Lake architecture and partitioning strategies on Amazon S3 to optimize storage efficiency, improve query performance, and support incremental data processing.
Integrated real-time streaming pipelines using Apache Kafka and Glue Streaming to process e-commerce clickstream events and inventory data feeds.
Designed and optimized database schemas in Amazon Redshift and PostgreSQL, including fact and dimension models as well as normalized tables, to support both analytical reporting and application-level data access.
Automated workflow orchestration and monitoring using Step Functions, Apache Airflow, and Amazon CloudWatch, while integrating Amazon SNS for real-time alerts and notifications on pipeline failures and job status updates.
Established data quality and governance frameworks, including reconciliation reports, metadata standards, and validation rules to ensure data accuracy, consistency, and reliability.
Enabled self-service analytics in collaboration with BI and finance teams by connecting Tableau and Power BI to Amazon Redshift and Amazon Athena.
Optimized cloud costs and performance using S3 lifecycle policies, compression techniques, and Redshift workload management.
Ensured SOX and GDPR compliance by implementing encryption with AWS KMS, fine-grained IAM access controls, and centralized audit logging using AWS CloudTrail for tracking user activities and data access across services.
Environment: AWS Glue, Amazon S3, Amazon Redshift, EMR (PySpark), Apache Kafka, AWS Lambda, Step Functions, Apache Airflow, Amazon Athena, Amazon SNS, AWS CloudTrail, Tableau, Power BI, SQL, Python, Git, Agile.

Health Care Service Corporation, Richardson, TX | Nov 2021 - May 2023
Senior Data Engineer
Project: Enterprise Data Platform & Cloud Modernization
HCSC embarked on a large-scale cloud migration initiative to modernize its data ecosystem and enable advanced analytics across healthcare claims, provider, and member data. The goal was to move from legacy on-premises systems to a cloud-native data lake and warehouse on AWS, ensuring scalability, cost efficiency, and support for enterprise reporting. As a Data Engineer, I was responsible for designing and implementing robust data pipelines, ensuring data quality, and supporting compliance requirements across the enterprise.
Key Responsibilities:
Designed and developed end-to-end ETL/ELT pipelines using Glue, Lambda, and PySpark to ingest and transform data from Oracle, SQL Server, APIs, and flat files into Amazon S3 and Amazon Redshift.
Built a scalable data lake architecture on Amazon S3 using partitioning strategies, file compaction, and lifecycle policies to optimize storage efficiency and reduce costs.
Built real-time streaming pipelines using Amazon Kinesis and AWS Glue Streaming to process healthcare claims and provider data feeds.
Developed PySpark transformations on EMR and Glue for large-scale healthcare data cleansing, enrichment, and standardization.
Designed and implemented Amazon Redshift schema models, including star and snowflake schemas, to support enterprise reporting and analytical dashboards.
Automated workflow orchestration and job scheduling using Step Functions, Lambda triggers, and Apache Airflow to manage complex ETL dependencies.
Managed pipeline code, Glue scripts, and Airflow DAGs using Git, enabling version control, collaborative development, and code reviews across the data engineering team.
Implemented data quality frameworks using validation rules, reconciliation checks, and exception handling to ensure accuracy and completeness of healthcare datasets.
Optimized performance of Glue jobs, Amazon Redshift queries, and EMR clusters, improving pipeline throughput and reducing processing time.
Ensured HIPAA compliance and data security by implementing encryption using KMS, fine-grained IAM access policies, and centralized audit logging across AWS services.
Environment: AWS Glue, S3, Lambda, EMR (PySpark), Redshift, Kinesis, Step Functions, Airflow, Athena, Python, SQL, Oracle, SQL Server, Git, Agile.

Kroger, Blue Ash, OH | Oct 2018 - Jun 2021
Data Engineer
Project: Enterprise Data Lake & Cloud Migration
The initiative focused on consolidating structured and unstructured data from point-of-sale (POS), supply chain, e-commerce, and customer loyalty systems into a centralized Azure Data Lake. As an Azure Data Engineer, I was responsible for designing and implementing scalable data pipelines, integrating enterprise data sources, and ensuring high performance and data quality across the platform.
Key Responsibilities:
Designed and implemented ETL/ELT pipelines using Azure Data Factory (ADF) for ingesting data from on-prem Oracle/SQL Server systems, APIs, and flat files into Azure Data Lake Storage (ADLS) and Azure SQL Data Warehouse (Synapse Analytics).
Developed and optimized PySpark/Spark transformations in Azure Databricks for cleansing, aggregating, and standardizing large-scale datasets.
Built real-time ingestion pipelines using Azure Event Hub and Stream Analytics for processing e-commerce clickstream and POS data.
Implemented data partitioning and Delta Lake architecture to support efficient querying and incremental processing.
Designed data models and star schemas for downstream BI solutions in Power BI and SSRS.
Applied data quality checks, exception handling, and reconciliation frameworks to ensure trusted analytics outputs.
Automated job scheduling and orchestration through ADF pipelines and Databricks notebooks, reducing manual intervention.
Monitored and optimized pipeline performance, improving processing times for critical retail datasets.
Collaborated with cross-functional teams including business analysts, architects, and data scientists to deliver end-to-end analytics solutions.
Supported data governance initiatives, including metadata management and implementing Azure Purview for data lineage and compliance.
Environment: Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Data Lake Storage, Azure Synapse Analytics, Event Hub, Stream Analytics, Power BI, SQL Server, Oracle, Python, Git, and Agile.

SoCal Gas, Los Angeles, CA | Mar 2016 - Sep 2018
Data Engineer
Project Description: Worked on enterprise data modernization initiatives to support SoCal Gas's operations, billing, and customer service functions. The focus was on building scalable ETL pipelines to process large volumes of customer usage, billing, and meter data, integrating disparate systems into a centralized data warehouse. The project involved migrating legacy data pipelines to modern big data frameworks, enabling advanced analytics for regulatory compliance, energy consumption forecasting, and operational efficiency.
Key Responsibilities:
Designed and developed ETL workflows using Informatica, SQL, and Python to extract, transform, and load customer and operational data from multiple source systems into the enterprise data warehouse.
Worked with Oracle and SQL Server databases for relational storage and optimized queries to improve data processing performance.
Implemented data quality checks, validations, and reconciliation processes to ensure accurate reporting for billing and compliance.
Built and optimized batch pipelines for processing meter data, billing transactions, and asset management information.
Collaborated with business analysts, operations teams, and compliance officers to translate energy usage and billing requirements into technical data models.
Migrated portions of legacy ETL jobs to Hadoop/Spark-based pipelines to handle larger data volumes more efficiently.
Developed ad-hoc and scheduled reports using Power BI and Tableau for customer analytics, usage forecasting, and regulatory reporting.
Ensured compliance with energy regulatory standards by maintaining audit-ready data pipelines and producing consistent reporting outputs.
Performed performance tuning and query optimization to reduce ETL load times and improve reporting SLAs.
Supported data governance and metadata management efforts, ensuring lineage and traceability of critical energy data.
Environment: Oracle 11g/12c, SQL Server, Hadoop, Spark, Informatica PowerCenter, Python, UNIX Shell Scripting, Power BI, Tableau, Control-M, Git, Teradata, Agile/Scrum.

Experis, India | Aug 2013 - Dec 2015
Data Consultant
Key Responsibilities:
Designed and optimized complex SQL/PL-SQL scripts for validating, analyzing, and integrating data from online quoting systems.
Developed and maintained stored procedures, triggers, indexes, and views in Oracle and SQL Server to implement business rules and enable reporting.
Built Hive bucketed and partitioned tables to optimize query performance and support large-scale distributed data processing.
Wrote MapReduce programs and HiveQL scripts to extract, transform, and load (ETL) data into the Hadoop Distributed File System (HDFS).
Used Sqoop for high-performance bulk data transfer between Oracle and Hive for downstream analytics.
Assisted in configuring and maintaining Hadoop ecosystem components such as Hive, HBase, and Sqoop.
Created and enhanced ETL workflows in Informatica PowerCenter, implementing complex transformations to meet business requirements.
Leveraged mapplets and reusable transformations in Informatica to improve standardization and reusability across ETL jobs.
Configured parameterized mappings and sessions for dynamic job execution and runtime flexibility.
Monitored and troubleshot ETL workflows using Workflow Manager and Workflow Monitor.
Defined and enforced metadata, data warehouse standards, and naming conventions to ensure consistency and maintainability.
Tuned ETL job performance by resolving target bottlenecks, optimizing queries, and applying pipeline partitioning in Informatica.
Environment: Informatica PowerCenter 8.6/9.x, Apache Hadoop, Hive, HBase, MapReduce, Sqoop, Oracle 10g/11g, SQL Server, PL/SQL, UNIX, Shell Scripting, Python (basic scripting), Toad, Windows NT.

Oasis Infotech, India | Apr 2011 - Jun 2013
ETL Developer
Project: Laboratory Information Management System
Key Responsibilities:
Designed, developed, and maintained ETL workflows using Informatica PowerCenter 8.6/9.0, extracting and transforming data from Oracle databases and flat files into the enterprise data warehouse.
Wrote and optimized complex SQL/PL-SQL queries, stored procedures, and triggers to enable reporting and analytics.
Automated data ingestion and validation processes with UNIX Shell Scripts and scheduled jobs via cron.
Performed data validation, reconciliation, and quality checks, ensuring accuracy between source and target systems.
Supported business decision-making by preparing data summaries, performance reports, and ad-hoc analysis using SQL and Excel.
Worked on database performance tuning and indexing strategies to optimize ETL and reporting workloads.
Partnered with business analysts to translate requirements into scalable ETL and reporting solutions.
Investigated and resolved data inconsistencies by tracing upstream feeds and implementing corrections in ETL workflows.
Assisted with data migration, backup, and recovery activities during database upgrades and platform transitions.
Environment: Oracle 10g/11g, PL/SQL, Informatica PowerCenter 8.6/9.0, UNIX/Linux, Shell Scripting, SQL*Plus, Toad, MS Excel, Windows Server.

CERTIFICATION:
AWS Certified Solutions Architect - Associate

EDUCATION:
IIMT College of Engineering, Greater Noida, India | Jun 2007 - May 2011
Bachelor of Technology in Computer Science