| Tejasri - Senior Data Engineer |
| [email protected] |
| Location: Moorestown, New Jersey, USA |
| Relocation: Yes |
| Visa: GC |
| Resume file: Tejasri - Senior Data Engineer_1767107108791.docx |
|
Tejasri Elimineti
Email: [email protected] | Phone: (952) 356-8965 | LinkedIn: linkedin.com/in/elimineti-tejasri

SUMMARY:
- Senior Data Engineer with 12 years of experience delivering large-scale data engineering and cloud solutions across healthcare, retail, finance, and e-commerce. Experienced in leading end-to-end projects from requirements gathering to production deployment, ensuring scalability, reliability, and business impact.
- Hands-on expertise in cloud platforms - AWS (Glue, S3, Redshift, EMR, Lambda, Kinesis, Step Functions) and Azure (Data Factory, Databricks, Synapse, Data Lake, Event Hub). Skilled at migrating on-premises systems to cloud-native platforms while optimizing cost, security, and performance.
- Strong background in big data frameworks and distributed processing with Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), and Kafka. Experienced in building both real-time streaming and batch ETL/ELT pipelines to process high-volume structured and unstructured data.
- Deep experience with ETL and data integration tools including Informatica PowerCenter, AWS Glue, Azure Data Factory, Talend, and SSIS. Proficient in designing reusable frameworks, applying partitioning strategies, and performance-tuning pipelines for faster execution and reliability.
- Proficient in SQL, PL/SQL, Python, and Shell scripting, with the ability to write complex queries, optimize stored procedures, and automate workflows. Experienced in data modeling (star and snowflake schemas), building fact/dimension tables, and enabling BI teams with clean, analytics-ready datasets.
- Skilled in designing and implementing data lakes, data warehouses, and Delta Lake architectures to support enterprise analytics, BI dashboards, and regulatory reporting. Experience includes building scalable solutions for claims, retail transactions, e-commerce clickstream, and IoT data.
- Strong understanding of data governance, lineage, and compliance frameworks, including HIPAA, SOX, and GDPR. Experience in implementing encryption, fine-grained IAM roles, and metadata standards to ensure secure, governed access to sensitive data.
- Effective in Agile/Scrum environments, collaborating closely with cross-functional teams of business analysts, BI developers, architects, and data scientists.
- Adept at workflow automation and orchestration, using tools like Airflow, Control-M, AWS Step Functions, and Databricks jobs to reduce manual intervention and improve system reliability.
- Modernized legacy data systems by shifting them to cloud-native architectures, improving performance, data availability, and overall reporting capability.

TECHNICAL SKILLS:
Cloud Platforms: AWS (S3, Glue, Redshift, EMR, Lambda, Kinesis, Step Functions, Athena, CloudWatch, Lake Formation), Azure (Data Factory, Databricks, Synapse Analytics, Data Lake Storage, Event Hub, Stream Analytics, Purview).
Cloud Data Warehouses: Snowflake, Amazon Redshift, Azure Synapse.
Big Data & Distributed Processing: Apache Spark (PySpark, Spark SQL), Hadoop (HDFS, Hive, MapReduce, Sqoop), Kafka.
ETL / Data Integration: Informatica PowerCenter (8.x/9.x), AWS Glue, Azure Data Factory, Talend, SSIS.
Databases & Data Warehousing: Oracle (10g/11g), SQL Server, Teradata, Snowflake, Amazon Redshift, Azure Synapse.
Programming & Scripting: Python, SQL, PL/SQL, Shell Scripting (UNIX/Linux), Scala (basic).
Data Modeling & BI Tools: Star/Snowflake schema design, Power BI, Tableau, SSRS, Excel (advanced).
Workflow Orchestration: Airflow, Control-M, Step Functions, Databricks Jobs.
Version Control & DevOps: Git, GitHub, Jenkins, CI/CD pipelines, Agile/Scrum methodology.
Other: Data Quality & Governance (metadata management, reconciliation, lineage), Performance Tuning (ETL, SQL, Spark), Security & Compliance (HIPAA, SOX, GDPR).
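To ground the orchestration tooling listed above, the following is a minimal, hypothetical Airflow DAG sketch of the kind of daily pipeline scheduling referenced in the summary and skills. The DAG name, schedule, and task callables are illustrative placeholders only, not code from any of the engagements below.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_sales(**context):
    # Placeholder: pull the previous day's transactions from a source system.
    print(f"extracting for {context['ds']}")


def load_to_warehouse(**context):
    # Placeholder: load curated data into the warehouse layer.
    print(f"loading for {context['ds']}")


with DAG(
    dag_id="daily_retail_etl",                     # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 6 * * *",                 # run once a day at 06:00
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    extract = PythonOperator(task_id="extract_sales", python_callable=extract_sales)
    load = PythonOperator(task_id="load_to_warehouse", python_callable=load_to_warehouse)

    extract >> load                                # simple linear dependency

Declaring retries and a fixed schedule in the DAG itself is what removes the manual re-runs mentioned in the summary; real pipelines would add sensors, alerts, and SLAs on top of this skeleton.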
WORK EXPERIENCE:

Vanguard Group, Malvern, PA | Jan 2025 - Present
Sr. Data Engineer
Project Description: The project's objective is to build an enterprise-grade AI-driven data platform that centralizes structured and semi-structured data across multiple domains - investments, customer analytics, and risk management - to enable self-service analytics, machine learning workloads, and generative AI insights. The platform integrates diverse data sources using automated ingestion frameworks and enforces governance, lineage, and security through AWS Lake Formation and IAM policies.
Key Responsibilities:
- Designed and implemented ETL/ELT pipelines using AWS Glue, PySpark, and Lambda, transforming raw data into curated datasets following the Medallion (Bronze-Silver-Gold) architecture to support AI and ML model readiness (a simplified sketch of this pattern follows this role).
- Built and supported real-time and near-real-time data ingestion pipelines using Amazon Kinesis, SQS, and Glue streaming jobs to enable low-latency data processing for risk scoring and customer insight use cases.
- Developed and optimized cloud data warehouse layers on Amazon Redshift and Snowflake, designing analytics-ready schemas and tuning data access patterns to support AI model training and downstream reporting.
- Automated infrastructure provisioning and data pipeline deployments using Terraform and GitHub Actions, ensuring consistent, reproducible environments across development, test, and production.
- Implemented AI-focused data quality, validation, and drift detection frameworks using Great Expectations and custom Python-based checks to ensure reliability and freshness of model input datasets.
- Enforced centralized data governance, lineage, and fine-grained access control using AWS Lake Formation, IAM policies, and metadata standards for sensitive AI training and analytics data.
- Developed and maintained feature engineering pipelines feeding the SageMaker Feature Store, enabling reuse of standardized features across multiple machine learning models.
- Integrated monitoring and observability for data and ML pipelines using CloudWatch, Step Functions, and Lambda-based alerts to track schema changes, data freshness, and operational anomalies.
- Collaborated with data scientists and ML engineers to define AI-ready data schemas, design analytical data marts, and support dataset versioning for experimentation and model development.
- Contributed to the design and implementation of a GenAI data foundation layer leveraging AWS Bedrock and OpenAI APIs to support proof-of-concept LLM-driven insights on investment and risk datasets.
Environment: AWS (S3, Glue, Lambda, Redshift, Lake Formation, Kinesis, Step Functions, Athena, CloudWatch, SageMaker), Python, PySpark, SQL, Terraform, GitHub Actions, Snowflake, Great Expectations, Jira, Agile/Scrum.
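Below is a minimal PySpark sketch of the Bronze-to-Silver refinement step in the Medallion layout described above. The S3 paths, dataset, and column names are hypothetical placeholders, and the job is shown as plain PySpark (runnable on Glue or EMR) rather than the actual project code.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_to_silver_transactions").getOrCreate()

# Bronze: raw, schema-on-read landing data exactly as ingested.
bronze = spark.read.json("s3://example-data-lake/bronze/transactions/")

# Silver: de-duplicated, typed, and conformed records ready for modeling and feature work.
silver = (
    bronze
    .dropDuplicates(["transaction_id"])
    .withColumn("amount", F.col("amount").cast("double"))
    .withColumn("event_date", F.to_date("event_ts"))
    .filter(F.col("transaction_id").isNotNull())
)

(
    silver.write
    .mode("overwrite")
    .partitionBy("event_date")          # partition for efficient incremental reads downstream
    .parquet("s3://example-data-lake/silver/transactions/")
)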
Tapestry Inc., New York City, NY | Sep 2023 - Oct 2024
Senior Data Engineer
Project: Cloud Data Lake & Analytics Modernization
Tapestry, a global luxury fashion house, initiated a cloud-first data modernization program to unify data from retail stores, e-commerce platforms, supply chain systems, and customer loyalty applications. The goal was to build a centralized AWS data lake and analytics ecosystem to support enterprise-wide reporting, financial planning, and customer insights. As a Cloud Data Engineer, I was responsible for designing scalable data pipelines, optimizing data storage, and enabling secure, governed access to enterprise data.
Key Responsibilities:
- Designed and built scalable ETL/ELT pipelines using AWS Glue, Databricks (PySpark), and AWS Lambda to ingest and process retail, e-commerce, and finance data from ERP, POS, Salesforce, and third-party systems into AWS S3 and Amazon Redshift.
- Developed PySpark transformations on AWS EMR and Glue to cleanse, enrich, and aggregate high-volume retail transaction and inventory datasets for reporting and analytics.
- Implemented Delta Lake tables and partitioning strategies on AWS S3 to support efficient incremental loads, improve query performance, and manage growing retail data volumes (see the incremental-merge sketch after this role).
- Designed and optimized Redshift data models, including fact and dimension tables, to support enterprise BI dashboards and ad-hoc analysis for business and finance teams.
- Built real-time data ingestion pipelines using Amazon Kinesis and AWS Glue streaming jobs to process e-commerce clickstream and inventory feeds.
- Automated workflow orchestration and monitoring using AWS Step Functions, Airflow, and CloudWatch to ensure reliable daily and intraday data processing.
- Established data quality and governance practices, including reconciliation checks, metadata standards, and access controls, to ensure trusted analytics outputs.
- Enabled self-service analytics by supporting Tableau and Power BI reporting on Redshift and Athena for retail operations, merchandising, and finance users.
- Partnered with engineering, BI, and infrastructure teams on cloud cost optimization initiatives, including lifecycle policies, compression strategies, and Redshift workload management.
- Ensured compliance with SOX and GDPR requirements by implementing encryption using KMS, fine-grained IAM policies, and audit logging across data pipelines.
Environment: AWS Glue, Databricks, S3, Redshift, EMR (PySpark), Kinesis, Lambda, Step Functions, Airflow, Athena, Tableau, Power BI, SQL, Python, Git, Agile.
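A minimal sketch of the Delta Lake incremental-load pattern described in the role above, assuming a Databricks or Delta-enabled Spark runtime; the table path, join keys, and dataset are hypothetical assumptions rather than project code.

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("inventory_incremental_merge").getOrCreate()

# Day's change records staged from the source system.
updates = spark.read.parquet("s3://example-retail-lake/staging/inventory_updates/")

# Existing curated Delta table on S3.
target = DeltaTable.forPath(spark, "s3://example-retail-lake/delta/inventory/")

# Upsert the incremental changes instead of reloading the full table.
(
    target.alias("t")
    .merge(updates.alias("u"), "t.sku = u.sku AND t.store_id = u.store_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

A merge like this is what keeps the incremental load cheap: only changed keys are rewritten, while partitioning on the table path limits how much data each run touches.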
Health Care Service Corporation, Richardson, TX | Nov 2021 - May 2023
Senior Data Engineer
Project: Enterprise Data Platform & Cloud Modernization
HCSC embarked on a large-scale cloud migration initiative to modernize its data ecosystem and enable advanced analytics across healthcare claims, provider, and member data. The goal was to move from legacy on-premises systems to a cloud-native data lake and warehouse on AWS, ensuring scalability, cost efficiency, and support for enterprise reporting. As an AWS Data Engineer, I was responsible for designing and implementing robust data pipelines, ensuring data quality, and supporting compliance requirements across the enterprise.
Key Responsibilities:
- Designed and developed end-to-end batch data pipelines using AWS Glue, Lambda, and PySpark to migrate healthcare claims, provider, and member data from Oracle, SQL Server, APIs, and flat files into Amazon S3 and Redshift.
- Built and maintained a scalable healthcare data lake on AWS S3, implementing partitioning, compaction, and lifecycle policies to manage large historical datasets and control storage costs.
- Integrated real-time data ingestion using Amazon Kinesis and Glue streaming jobs to process inbound claims and provider feeds requiring near real-time availability.
- Developed PySpark-based data standardization and cleansing logic on EMR and Glue to normalize healthcare datasets across multiple source systems.
- Designed and implemented Redshift star and snowflake schemas tailored for enterprise healthcare reporting and regulatory dashboards.
- Automated job scheduling and dependency management using AWS Step Functions, Lambda triggers, and Airflow to support reliable daily and intraday processing cycles.
- Implemented healthcare-focused data quality controls, including validation rules, reconciliation checks, and exception handling to ensure accuracy of claims and member data.
- Performed performance tuning of Glue jobs, EMR workloads, and Redshift queries to improve batch processing throughput and meet reporting SLAs.
- Ensured HIPAA compliance by implementing encryption using KMS, IAM-based access controls, and audit logging across AWS data services.
Environment: AWS Glue, S3, Lambda, EMR (PySpark), Redshift, Kinesis, Step Functions, Airflow, Athena, Python, SQL, Oracle, SQL Server, Git, Agile.

Kroger, Blue Ash, OH | Oct 2018 - Jun 2021
Data Engineer
Project: Enterprise Data Lake & Cloud Migration
The initiative focused on consolidating structured and unstructured data from point-of-sale (POS), supply chain, e-commerce, and customer loyalty systems into a centralized Azure Data Lake. As an Azure Data Engineer, I was responsible for designing and implementing scalable data pipelines, integrating enterprise data sources, and ensuring high performance and data quality across the platform.
Key Responsibilities:
- Designed and implemented ETL/ELT pipelines using Azure Data Factory (ADF) for ingesting data from on-prem Oracle/SQL Server systems, APIs, and flat files into Azure Data Lake Storage (ADLS) and Azure SQL Data Warehouse (Synapse Analytics).
- Developed and optimized PySpark/Spark transformations in Azure Databricks for cleansing, aggregating, and standardizing large-scale datasets.
- Built real-time ingestion pipelines using Azure Event Hub and Stream Analytics for processing e-commerce clickstream and POS data.
- Implemented data partitioning and Delta Lake architecture to support efficient querying and incremental processing.
- Designed data models and star schemas for downstream BI solutions in Power BI and SSRS.
- Applied data quality checks, exception handling, and reconciliation frameworks to ensure trusted analytics outputs (illustrated in the sketch after this role).
- Automated job scheduling and orchestration through ADF pipelines and Databricks notebooks, reducing manual intervention.
- Monitored and optimized pipeline performance, improving processing times for critical retail datasets.
- Collaborated with cross-functional teams including business analysts, architects, and data scientists to deliver end-to-end analytics solutions.
- Supported data governance initiatives, including metadata management and implementing Azure Purview for data lineage and compliance.
Environment: Azure Data Factory, Azure Databricks (PySpark, Spark SQL), Azure Data Lake Storage, Azure Synapse Analytics, Event Hub, Stream Analytics, Power BI, SQL Server, Oracle, Python, Git, and Agile.
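A minimal PySpark sketch of the source-to-target reconciliation checks described in the HCSC and Kroger roles above; the paths, table names, and failure behavior are hypothetical assumptions, not project code.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_reconciliation").getOrCreate()

source = spark.read.parquet("s3://example-lake/curated/claims/")   # curated layer output
target = spark.table("analytics.fact_claims")                      # published warehouse table

# Compare daily row counts between the curated layer and the published table.
src_counts = source.groupBy("claim_date").agg(F.count("*").alias("src_rows"))
tgt_counts = target.groupBy("claim_date").agg(F.count("*").alias("tgt_rows"))

recon = (
    src_counts.join(tgt_counts, "claim_date", "full_outer")
    .fillna(0, ["src_rows", "tgt_rows"])
    .withColumn("diff", F.abs(F.col("src_rows") - F.col("tgt_rows")))
)

mismatches = recon.filter(F.col("diff") > 0)
if mismatches.count() > 0:
    mismatches.show(truncate=False)
    # Fail the run so bad data never reaches downstream reports.
    raise ValueError("Row-count reconciliation failed; investigate before publishing.")

Failing fast on a count mismatch is the simplest form of the reconciliation gate; production checks usually add column-level sums, null-rate thresholds, and exception routing on top.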
SoCal Gas, Los Angeles, CA | Mar 2016 - Sep 2018
Data Engineer
Project Description: Worked on enterprise data modernization initiatives to support SoCal Gas's operations, billing, and customer service functions. The focus was on building scalable ETL pipelines to process large volumes of customer usage, billing, and meter data, integrating disparate systems into a centralized data warehouse. The project involved migrating legacy data pipelines to modern big data frameworks, enabling advanced analytics for regulatory compliance, energy consumption forecasting, and operational efficiency.
Key Responsibilities:
- Designed and developed ETL workflows using Informatica, SQL, and Python to extract, transform, and load customer and operational data from multiple source systems into the enterprise data warehouse.
- Worked with Oracle and SQL Server databases for relational storage and optimized queries to improve data processing performance.
- Implemented data quality checks, validations, and reconciliation processes to ensure accurate reporting for billing and compliance.
- Built and optimized batch pipelines for processing meter data, billing transactions, and asset management information.
- Collaborated with business analysts, operations teams, and compliance officers to translate energy usage and billing requirements into technical data models.
- Migrated portions of legacy ETL jobs into Hadoop/Spark-based pipelines for handling larger data volumes more efficiently.
- Developed ad-hoc and scheduled reports using Power BI and Tableau for customer analytics, usage forecasting, and regulatory reporting.
- Ensured compliance with energy regulatory standards by maintaining audit-ready data pipelines and producing consistent reporting outputs.
- Performed performance tuning and query optimization to reduce ETL load times and improve reporting SLAs.
- Supported data governance and metadata management efforts, ensuring lineage and traceability of critical energy data.
Environment: Oracle 11g/12c, SQL Server, Hadoop, Spark, Informatica PowerCenter, Python, UNIX Shell Scripting, Power BI, Tableau, Control-M, Git, Teradata, Agile/Scrum.

Experis, India | Aug 2013 - Dec 2015
Data Consultant
Key Responsibilities:
- Designed and optimized complex SQL/PL-SQL scripts for data validation, analysis, and integration from online quoting systems.
- Developed and maintained stored procedures, triggers, indexes, and views in Oracle and SQL Server to implement business rules and enable reporting.
- Built Hive bucketed and partitioned tables to optimize query performance and support large-scale distributed data processing (see the sketch after this role).
- Wrote MapReduce programs and HiveQL scripts to extract, transform, and load (ETL) data into the Hadoop Distributed File System (HDFS).
- Used Sqoop for high-performance bulk data transfer between Oracle and Hive for downstream analytics.
- Assisted in configuring and maintaining Hadoop ecosystem components such as Hive, HBase, and Sqoop.
- Created and enhanced ETL workflows in Informatica PowerCenter, implementing complex transformations to meet business requirements.
- Leveraged mapplets and reusable transformations in Informatica to improve standardization and reusability across ETL jobs.
- Configured parameterized mappings and sessions for dynamic job execution and runtime flexibility.
- Monitored and troubleshot ETL workflows using Workflow Manager and Workflow Monitor.
- Defined and enforced metadata, data warehouse standards, and naming conventions to ensure consistency and maintainability.
- Tuned ETL job performance by resolving target bottlenecks, optimizing queries, and applying pipeline partitioning in Informatica.
Environment: Informatica PowerCenter 8.6/9.x, Apache Hadoop, Hive, HBase, MapReduce, Sqoop, Oracle 10g/11g, SQL Server, PL/SQL, UNIX, Shell Scripting, Python (basic scripting), Toad, Windows NT.
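The sketch below illustrates the partitioned, bucketed Hive table design mentioned in the Experis role. It is rendered through Spark SQL for consistency with the other Python examples rather than the raw HiveQL used at the time, and the database, table, and column names are hypothetical.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("quotes_hive_tables")
    .enableHiveSupport()      # register tables in the Hive metastore
    .getOrCreate()
)

spark.sql("CREATE DATABASE IF NOT EXISTS staging")

# Partitioning by quote_date lets queries prune whole days; bucketing by customer_id
# clusters each partition into a fixed number of files for faster joins and sampling.
spark.sql("""
    CREATE TABLE IF NOT EXISTS staging.quotes_bucketed (
        quote_id    BIGINT,
        customer_id BIGINT,
        premium     DECIMAL(12,2)
    )
    PARTITIONED BY (quote_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Partition-pruned query: only the single quote_date partition is scanned.
daily_totals = spark.sql("""
    SELECT customer_id, SUM(premium) AS total_premium
    FROM staging.quotes_bucketed
    WHERE quote_date = '2015-06-30'
    GROUP BY customer_id
""")
daily_totals.show(10)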
Oasis Infotech, India | Apr 2011 - Jun 2013
ETL Developer
Project: Laboratory Information Management System
Key Responsibilities:
- Designed, developed, and maintained ETL workflows using Informatica PowerCenter 8.6/9.0, extracting and transforming data from Oracle databases and flat files into the enterprise data warehouse.
- Wrote and optimized complex SQL/PL-SQL queries, stored procedures, and triggers to enable reporting and analytics.
- Automated data ingestion and validation processes with UNIX Shell Scripts and scheduled jobs via cron.
- Performed data validation, reconciliation, and quality checks, ensuring accuracy between source and target systems.
- Supported business decision-making by preparing data summaries, performance reports, and ad-hoc analysis using SQL and Excel.
- Worked on database performance tuning and indexing strategies to optimize ETL and reporting workloads.
- Partnered with business analysts to translate requirements into scalable ETL and reporting solutions.
- Investigated and resolved data inconsistencies by tracing upstream feeds and implementing corrections in ETL workflows.
- Assisted with data migration, backup, and recovery activities during database upgrades and platform transitions.
Environment: Oracle 10g/11g, PL/SQL, Informatica PowerCenter 8.6/9.0, UNIX/Linux, Shell Scripting, SQL*Plus, Toad, MS Excel, Windows Server.

EDUCATION:
IIMT College of Engineering, Greater Noida, India | Jun 2007 - May 2011
Bachelor of Technology in Computer Science