| Lakshmi Tejaswani - Senior Data Engineer |
| [email protected] |
| Location: Hicksville, New York, USA |
| Relocation: yes |
| Visa: GC EAD |
| Resume file: Lakshmi Tejaswani_Data Engineer_Uptd_1775393836485.docx |
|
Senior Data Engineer
Lakshmi Tejaswani | Phone: +1 (203) 564-9583 | Email: [email protected]

PROFESSIONAL SUMMARY:
- Data Engineer with 9+ years of experience designing, building, and optimizing scalable data platforms and pipelines across the financial services, retail, insurance, and healthcare domains.
- Proficient in cloud-based data engineering, leveraging AWS (S3, Redshift, EMR, Glue, Lambda, Kinesis) and Azure (ADLS, Databricks, Synapse, Data Factory, AKS, Key Vault) for enterprise-grade solutions.
- Experienced in Big Data ecosystems including Hadoop, Spark, Hive, Kafka, Airflow, Dataproc, Dataflow, and Databricks for high-volume data processing.
- Skilled in ETL/ELT design, lakehouse architectures, and modern data warehouse solutions for structured and semi-structured datasets.
- Built real-time streaming pipelines for fraud detection, operational analytics, claims monitoring, and marketing performance using Kafka, Spark Streaming, and Kinesis.
- Hands-on with Python, PySpark, Spark SQL, SQL, Scala, PL/SQL, T-SQL, shell scripting, and Java for complex data transformations and automation.
- Implemented CI/CD pipelines, Docker containerized workloads, and Kubernetes/AKS deployments for scalable data applications.
- Designed dimensional and relational data models, including star and snowflake schemas with fact and dimension tables, for OLAP/OLTP reporting.
- Ensured data quality, lineage, governance, and regulatory compliance (HIPAA, GDPR, HITECH, financial regulations).
- Migrated legacy data warehouses from Oracle and Teradata to Redshift, Snowflake, and Azure Synapse, improving reporting and analytics efficiency.
- Developed multi-terabyte ETL pipelines with optimized Spark jobs using partitioning, caching, and cluster tuning, reducing runtimes by 30-40% (a brief sketch of these tuning patterns follows this summary).
- Built incremental load and CDC pipelines to enable near real-time dashboards and alerts for claims, underwriting, and transactions.
- Delivered analytical datasets, dashboards, and reports using Power BI, Tableau, and QuickSight for executive and operational insights.
- Automated workflow orchestration and monitoring using Apache Airflow, AWS Step Functions, and Azure Data Factory pipelines.
- Managed metadata, cataloging, and data lineage using AWS Glue Data Catalog and Azure Purview to enforce enterprise governance.
- Integrated structured, semi-structured, and unstructured data from multiple sources (APIs, CSV, JSON, XML, Avro, Parquet) for analytics and ML workloads.
- Partnered with business, actuarial, fraud, and analytics teams to translate KPIs into actionable, scalable data models.
- Mentored junior engineers on Spark optimization, Azure/AWS best practices, and enterprise data architecture standards.
- Applied machine learning, statistical modeling, and predictive analytics using Python, R, and Azure ML for operational insights and risk mitigation.
- Strong experience with Agile/Scrum methodologies, collaborating with cross-functional teams to deliver business-impacting data solutions.
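The partitioning and caching tuning cited above follows a common PySpark pattern; a minimal sketch is below. All paths, column names, and the shuffle-partition setting are hypothetical illustrations, not taken from any actual project codebase.

```python
# Minimal sketch of the partitioning/caching tuning pattern described above.
# Paths, columns, and the shuffle-partition count are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("etl-tuning-sketch")
         .config("spark.sql.shuffle.partitions", "400")  # sized to cluster cores
         .getOrCreate())

txns = spark.read.parquet("/lake/raw/transactions")      # hypothetical path

# Cache a dimension reused across several joins instead of re-reading it.
accounts = spark.read.parquet("/lake/raw/accounts").cache()

enriched = txns.join(accounts, "account_id")             # hypothetical join key

# Partition output on a low-cardinality column so downstream jobs prune files.
(enriched.write
 .mode("overwrite")
 .partitionBy("txn_date")
 .parquet("/lake/curated/transactions"))
```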
TECHNICAL SKILLS:
Languages: Python 3.x, SQL, PL/SQL, T-SQL, Java, C, C++, ASP, Visual Basic, HTML5, XML, Perl, UNIX Shell Scripting
Databases: Oracle 19c/12c, Microsoft Access 2016, SQL Server 2019, Sybase, DB2, Teradata r15, Hive 2.3
Data Modeling Tools: Erwin 9.7, ER/Studio, star-schema modeling, snowflake schema modeling, fact and dimension tables, Informatica (ETL), pivot tables
BI Tools: Tableau, Tableau Server, Tableau Reader, SAP BusinessObjects, SSIS, SSRS, Crystal Reports
Azure: Data Lake, Data Factory, SQL Data Warehouse, Data Lake Analytics, Databricks, Synapse, Blob Storage, and other Azure services
Cloud (AWS): S3, EMR, EC2, Glue, Redshift, Athena, IAM, Kinesis, VPC, DynamoDB, Amazon RDS, Lambda, DMS, QuickSight, Elastic Load Balancing, Auto Scaling, CloudWatch, SNS, SQS
Applications: Toad for Oracle, Oracle SQL Developer, MS Word, MS Excel, MS PowerPoint, Teradata r15
Big Data: Hadoop, Spark, Hive, Cassandra, MongoDB, MapReduce, Sqoop
Methodologies: Agile, RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Ralph Kimball and Bill Inmon approaches, Waterfall
Operating Systems: Microsoft Windows 9x/NT/2000/XP/Vista/7/8/10 and Unix

PROFESSIONAL EXPERIENCE:

Truist Financial | Charlotte, North Carolina | April 2025 - Present
Senior Data Engineer
Project: Real-Time Fraud Detection & Risk Analytics Platform (Azure Lakehouse Architecture)
Roles & Responsibilities:
- Led the design and implementation of a real-time fraud detection and risk analytics platform, processing millions of daily financial transactions with low latency.
- Designed and built an end-to-end Azure Lakehouse solution integrating ADLS Gen2, Azure Databricks, Synapse Analytics, and Power BI to support scalable fraud monitoring and reporting.
- Developed high-throughput streaming pipelines using Kafka and Databricks (PySpark, Spark SQL) to enable near real-time transaction scoring and anomaly detection (see the sketch after this list).
- Improved ETL performance by 40% through Spark optimization techniques including partitioning, caching, and cluster tuning.
- Implemented Bronze, Silver, and Gold data layers to standardize ingestion, transformation, and analytics-ready datasets across structured and semi-structured sources.
- Built graph-based fraud detection workflows using TigerGraph (GSQL) to identify suspicious account relationships and transaction networks.
- Developed scalable ELT pipelines in Azure Data Factory to integrate core banking systems, APIs, and third-party data sources into ADLS.
- Implemented data validation and reconciliation frameworks to ensure accuracy and consistency across ingestion, staging, and warehouse layers.
- Designed analytical datasets in Azure Synapse to support fraud investigation teams and compliance reporting.
- Optimized TimescaleDB for time-series transaction analysis using hypertables, compression policies, and continuous aggregates.
- Automated infrastructure provisioning using ARM templates and Ansible to maintain consistency across Dev, QA, and Production environments.
- Containerized data services using Docker and deployed workloads on Azure Kubernetes Service (AKS) using Helm charts.
- Established monitoring and alerting mechanisms to proactively detect pipeline failures and performance bottlenecks.
- Integrated semi-structured data formats such as JSON, Avro, and Parquet into scalable Spark pipelines.
- Enhanced Hive and Spark SQL query performance by implementing partitioning and bucketing strategies on large datasets.
- Analyzed transaction log patterns using Python libraries to identify anomaly trends and strengthen fraud detection strategies.
- Collaborated closely with fraud analysts, risk teams, and business stakeholders to translate regulatory requirements into scalable data solutions.
- Optimized Databricks cluster utilization to reduce compute costs while maintaining SLA commitments.
- Participated in Agile development cycles including sprint planning, technical reviews, and production deployments.
- Provided guidance to junior engineers on Spark optimization and Azure data engineering best practices.
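A Kafka-to-Databricks streaming pipeline of the kind described above typically reads with Spark Structured Streaming and lands scored records in a Bronze layer. The sketch below assumes a hypothetical topic, schema, broker address, paths, and a toy rule-based flag standing in for the real scoring logic; none of these details come from the actual platform.

```python
# Hedged sketch: Kafka -> Spark Structured Streaming -> Delta Bronze layer.
# Topic, broker, schema, threshold, and paths are all hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("txn-anomaly-scoring").getOrCreate()

txn_schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "transactions")               # hypothetical topic
       .load())

# Kafka delivers bytes; parse the JSON payload against the declared schema.
txns = (raw.select(F.from_json(F.col("value").cast("string"),
                               txn_schema).alias("t"))
           .select("t.*"))

# Naive rule-based flag standing in for the real anomaly-scoring logic.
scored = txns.withColumn("suspicious", F.col("amount") > 10_000)

query = (scored.writeStream
         .format("delta")                                   # Bronze landing table
         .option("checkpointLocation", "/chk/txn_scoring")  # placeholder path
         .outputMode("append")
         .start("/lake/bronze/txn_scored"))                 # placeholder path
query.awaitTermination()
```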
Environment: Azure (ADLS Gen2, Blob Storage, Azure Data Factory, Azure Databricks, Synapse Analytics, Azure Kubernetes Service (AKS), Azure Key Vault, Azure API Management, Azure VMs), Kafka, Spark, PySpark, Spark SQL, Spark Streaming, TigerGraph (GSQL), TimescaleDB, SQL Server (Always On), Hadoop, Hive, Sqoop, Python, Scala, SQL, Docker, Kubernetes, Helm, Ansible, Power BI, Agile (Scrum).

Target | Minneapolis, Minnesota | May 2023 - March 2025
Senior Data Engineer
Project: Enterprise Retail Analytics & Marketing Performance Platform (Azure Modern Data Warehouse)
Responsibilities:
- Led the development of a centralized Azure-based analytics platform to unify sales, marketing, and digital campaign data across multiple business units.
- Designed and implemented scalable ETL/ELT pipelines using Azure Data Factory and Azure Databricks to ingest data from POS systems, e-commerce platforms, APIs, and third-party marketing vendors.
- Built a modern cloud data warehouse solution using Azure Data Lake and Azure Synapse to support enterprise reporting and advanced analytics.
- Processed multi-terabyte retail datasets using Databricks (PySpark, Spark SQL) to generate customer insights, campaign attribution metrics, and revenue analytics.
- Improved campaign performance visibility by developing curated datasets to evaluate Daily Deals PLA ads and measure ROI uplift compared to non-promotional items.
- Optimized Spark workloads by tuning joins, partitioning strategies, and cluster configurations, reducing data processing time by 35%.
- Implemented data governance and cataloging using Azure Purview, enabling enterprise-wide data lineage, classification, and regulatory compliance (GDPR/HIPAA).
- Designed dimensional data models (star schema) to support OLAP reporting and self-service analytics.
- Built analytical data marts in Azure Synapse to support executive dashboards and KPI tracking.
- Developed Databricks notebooks to cleanse, transform, and aggregate structured and semi-structured datasets stored in Azure Data Lake.
- Integrated data from relational databases, flat files, Snowflake, and Teradata systems into unified reporting layers.
- Automated infrastructure provisioning using ARM templates and Terraform, enabling Infrastructure as Code (IaC) best practices.
- Implemented version-controlled Terraform deployments integrated with Git for environment consistency and CI/CD automation.
- Developed complex SQL and PL/SQL queries, stored procedures, and performance-optimized transformations for high-volume retail datasets.
- Enhanced query performance by implementing indexing strategies and partitioning on large analytical tables.
- Established data quality validation checks to ensure accuracy, completeness, and consistency across reporting pipelines.
- Partnered with product managers and marketing teams to translate business KPIs into scalable Azure data models.
- Collaborated with senior leadership to align cloud data architecture with strategic business objectives.
- Enabled near real-time data availability for analytics teams through optimized incremental data loading strategies.
- Standardized data ingestion frameworks to handle CSV, Avro, and other retail data formats efficiently.
- Improved operational monitoring of data workflows using logging and alerting mechanisms.
- Contributed to Agile ceremonies including sprint planning, backlog grooming, and release management.
- Mentored junior engineers on Azure best practices, Spark optimization, and data warehouse design principles.
- Reduced manual reporting efforts by enabling automated dashboards and self-service BI capabilities.
- Supported enterprise analytics initiatives by preparing clean, analytics-ready datasets for data science and forecasting models.
Environment: Azure (Azure Data Factory, Azure Databricks, Azure Data Lake, Synapse Analytics, Azure Purview), Apache Spark, PySpark, Spark SQL, SQL, PL/SQL, Snowflake, Teradata, Terraform, ARM Templates, Git, Python, Scala, ETL/ELT, JIRA, Agile.

Homesite Insurance | Boston, Massachusetts | July 2021 - April 2023
Data Engineer
Project: Real-Time Insurance Policy & Claims Analytics Platform (AWS Modern Data Lake)
Responsibilities:
- Built a centralized AWS data platform integrating S3, Glue, EMR, Redshift, Lambda, and Kinesis to consolidate policy, claims, and customer datasets for analytics.
- Developed scalable ETL/ELT pipelines using AWS Glue and PySpark, ingesting structured and semi-structured data (JSON, XML, Avro, Parquet) from multiple sources (see the sketch after this list).
- Automated workflow orchestration using AWS Step Functions, Lambda, and Airflow to ensure reliable daily pipeline execution with alerting.
- Migrated legacy Oracle data warehouse workloads to Redshift, improving query performance and reporting efficiency.
- Optimized Spark jobs on EMR clusters through partitioning, caching, and cluster tuning to reduce ETL runtime.
- Designed incremental loading and CDC pipelines to support near real-time claims and underwriting dashboards.
- Implemented AWS Glue Data Catalog for metadata management, data lineage, and compliance with GDPR requirements.
- Developed Hive external tables on S3 for analytics and downstream reporting, standardizing the data lake structure.
- Created Dockerized backend services deployed on Kubernetes for scalable API-driven data applications.
- Enforced data security and governance using IAM roles, encryption (KMS), and access controls across AWS environments.
- Collaborated with underwriting, claims, and actuarial teams to translate business KPIs into optimized cloud data models.
- Built analytical Redshift data marts supporting executive dashboards, reporting, and predictive insurance modeling.
- Monitored pipeline health, performance, and failures using logging and automated alerts to ensure SLA compliance.
- Implemented Infrastructure as Code (IaC) using Terraform for consistent and reproducible AWS environment deployments.
- Mentored junior engineers on AWS best practices, Spark optimization, and data lake architecture for enterprise-scale data workflows.
- Developed end-to-end data pipelines to aggregate claims, underwriting, and customer interaction data for fraud detection and operational insights.
- Implemented near real-time streaming data ingestion using Kinesis and Spark Streaming to enable proactive alerts on suspicious claims.
- Conducted performance tuning of Redshift and EMR clusters, reducing report generation time by 30% for executive dashboards.
- Created reusable PySpark modules for data cleansing, transformation, and enrichment to standardize ETL workflows across multiple domains.
- Implemented version-controlled CI/CD pipelines for ETL scripts using Git and Terraform, improving deployment reliability.
- Established logging, monitoring, and alerting frameworks across AWS data workflows to ensure SLA adherence and early anomaly detection.
- Collaborated with security teams to implement encryption, access control, and audit logging for sensitive policy and claims data.
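An AWS Glue PySpark job of the kind referenced above typically reads a cataloged source, applies light cleansing, and writes partitioned Parquet back to S3. In the sketch below the database, table, columns, and bucket names are hypothetical placeholders, not details from the actual platform.

```python
# Hedged sketch of an AWS Glue PySpark ETL job: catalog source -> cleansed,
# partitioned Parquet on S3. All names are hypothetical.
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Source: raw claims registered in the Glue Data Catalog (hypothetical names).
claims = glue_context.create_dynamic_frame.from_catalog(
    database="insurance_raw", table_name="claims_json")

# Light cleansing in plain Spark before writing an analytics-ready layer.
df = claims.toDF().dropDuplicates(["claim_id"]).filter("claim_amount >= 0")

(df.write
   .mode("overwrite")
   .partitionBy("claim_year")   # partition pruning for Athena / Spectrum queries
   .parquet("s3://example-lake/curated/claims/"))  # hypothetical bucket

job.commit()
```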
Environment: AWS (S3, Glue, EMR, Redshift, Lambda, Kinesis, Step Functions, IAM, KMS), Apache Airflow, Apache Spark, PySpark, Spark Streaming, Hive, Docker, Kubernetes, Terraform, Git, Python, SQL, Oracle, ETL/ELT, Agile (Scrum).

Abbott Laboratories | Abbott Park, Illinois | Feb 2019 - June 2021
Data Engineer
Project: Enterprise Healthcare Data & Analytics Platform (AWS Modern Data Lake & ML Integration)
Responsibilities:
- Developed scalable ETL/ELT pipelines using Python, Spark, and SQL to process large-scale healthcare datasets including claims, eligibility, provider, member, and clinical data.
- Built and optimized AWS-based data workflows using S3, Redshift, EMR, Glue, Athena, and Kinesis to support analytics and reporting at global scale.
- Automated serverless ETL orchestration using AWS Lambda, Step Functions, and Airflow to ensure reliable daily and real-time data processing.
- Designed multi-petabyte data platform architectures, integrating structured and semi-structured data into a centralized data lake for analytics and compliance.
- Integrated AWS SageMaker ML pipelines into production ETL for predictive analytics, recommendation engines, and anomaly detection in healthcare operations.
- Conducted ETL tasks using Sqoop, Hive, and Pig to efficiently transfer and integrate legacy and third-party healthcare data.
- Implemented CI/CD pipelines using AWS CodePipeline, improving deployment reliability and reducing manual intervention for ETL processes.
- Ensured data quality, validation, and governance aligned with HIPAA, HITECH, and organizational standards, maintaining regulatory compliance.
- Developed Databricks-based pipelines for automated data transformation, cleansing, and enrichment, standardizing datasets for downstream analytics.
- Managed high-performance databases (MySQL, DynamoDB, Redshift) for optimized storage, retrieval, and analytics workloads.
- Monitored and optimized system performance using AWS CloudWatch, EMR cluster tuning, and Spark job optimization for faster reporting.
- Created interactive dashboards and visualizations using Amazon QuickSight, enabling self-service reporting and actionable business insights.
- Documented and maintained technical processes and Scala/PySpark pipeline code for knowledge transfer and onboarding of new team members.
- Collaborated with cross-functional teams including data science, analytics, and IT security to translate business requirements into scalable AWS data solutions.
- Mentored junior engineers on best practices in AWS architecture, Spark optimization, ETL design, and enterprise-scale data engineering workflows.
- Designed incremental and change data capture (CDC) pipelines to enable near real-time updates of patient, claims, and provider datasets.
- Standardized semi-structured healthcare data formats (JSON, XML, Avro, Parquet) across pipelines to improve processing efficiency and analytics readiness.
- Implemented role-based access controls and encryption (KMS) across AWS environments to secure sensitive healthcare data and maintain compliance.
- Developed reusable PySpark modules for data cleansing, transformation, and aggregation to accelerate development of new ETL workflows (see the sketch after this list).
- Collaborated with business stakeholders to define KPIs and translate healthcare operational requirements into scalable data models for reporting and predictive analytics.
- Optimized Redshift and EMR cluster performance by fine-tuning queries, partitions, and resource configurations, reducing report generation time and improving analytics efficiency.
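Reusable PySpark cleansing modules like the ones referenced above usually package small, composable DataFrame transformations. The sketch below is purely illustrative: the function names, sentinel values, and the column names in the usage comment are hypothetical, not the actual module's API.

```python
# Illustrative sketch of a reusable PySpark cleansing module; all names and
# rules here are hypothetical examples.
from pyspark.sql import DataFrame, functions as F
from pyspark.sql.window import Window

def trim_strings(df: DataFrame) -> DataFrame:
    """Trim whitespace on every string-typed column."""
    for field in df.schema.fields:
        if field.dataType.simpleString() == "string":
            df = df.withColumn(field.name, F.trim(F.col(field.name)))
    return df

def standardize_nulls(df: DataFrame, sentinels=("", "NA", "NULL")) -> DataFrame:
    """Replace common sentinel strings with real nulls."""
    return df.replace(list(sentinels), None)

def dedupe_latest(df: DataFrame, keys, ts_col: str) -> DataFrame:
    """Keep the most recent record per business key (simple CDC upsert pattern)."""
    w = Window.partitionBy(*keys).orderBy(F.col(ts_col).desc())
    return (df.withColumn("_rn", F.row_number().over(w))
              .filter("_rn = 1")
              .drop("_rn"))

# Typical chained use (hypothetical columns):
# claims = dedupe_latest(standardize_nulls(trim_strings(raw)),
#                        keys=["claim_id"], ts_col="updated_at")
```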
Environment: Python, AWS (S3, Redshift, EMR, Glue, Lambda, Step Functions, Athena, Kinesis, CodePipeline, CloudWatch, SageMaker), Spark, PySpark, Scala, Hadoop, Snowflake, EKS, ECS, Ansible, MySQL, DynamoDB, Bitbucket, Pandas, NumPy, JSON, XML, Agile, Scrum, JIRA.

Edvensoft Solutions India Pvt. Ltd | India | Aug 2016 - Sept 2018
ETL Developer
Project: Enterprise Data Integration & ETL Automation Platform (Hybrid Data Warehouse)
Responsibilities:
- Developed end-to-end ETL pipelines to ingest, transform, and load data from multiple relational, flat-file, and API sources into central data warehouses.
- Integrated structured, semi-structured, and unstructured data from SQL, Oracle, SAP, and third-party systems into a unified reporting environment.
- Designed and implemented complex transformation logic including joins, aggregations, conditional splits, lookups, and calculations for analytics-ready datasets.
- Automated batch ETL workflows using Informatica PowerCenter, Talend, SSIS, Control-M, and Autosys to reduce manual intervention and improve reliability.
- Built reusable ETL modules and templates, standardizing processes across projects and accelerating development cycles.
- Designed fact and dimension tables with proper normalization, indexing, and partitioning to optimize query performance in the warehouse.
- Implemented data validation, cleansing, and reconciliation frameworks to ensure accuracy, consistency, and completeness of ETL output (illustrated in the sketch at the end of this resume).
- Developed audit trails and lineage documentation for ETL processes to support regulatory compliance and internal governance.
- Enforced data security and access controls within databases and ETL workflows, safeguarding sensitive organizational data.
- Created parameterized ETL jobs for dynamic processing across multiple environments, improving scalability and maintainability.
- Integrated hybrid ETL pipelines combining traditional RDBMS and emerging big data platforms for enterprise-wide analytics.
- Collaborated with business stakeholders to translate requirements into scalable ETL solutions for reporting and decision-making.
- Developed high-performance ETL pipelines to process heterogeneous datasets, ensuring timely availability of analytics-ready data.
- Streamlined development and deployment by building modular ETL components reusable across multiple projects.
- Monitored ETL execution, tracked errors, and implemented logging mechanisms to ensure operational efficiency and SLA compliance.
Environment: Informatica PowerCenter, Talend, SSIS (SQL Server Integration Services), DataStage, Control-M, Autosys, Cron, SQL Server, Oracle, MySQL, PostgreSQL, SAP HANA, Unix/Linux Shell Scripting, SQL.
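The source-versus-target reconciliation checks mentioned throughout this resume commonly reduce to comparing row counts (or checksums) after each load. The generic sketch below uses sqlite3 only as a stand-in for any DB-API connection; table names, the tolerance parameter, and the helper names are hypothetical.

```python
# Generic sketch of a post-load row-count reconciliation check.
# sqlite3 is a stand-in for any DB-API source/target connection.
import sqlite3

def row_count(conn: sqlite3.Connection, table: str) -> int:
    """Count rows in a table (table name must come from a trusted config)."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

def reconcile(src_conn, tgt_conn, table: str, tolerance: int = 0) -> bool:
    """Flag the load as failed if source and target counts diverge."""
    src = row_count(src_conn, table)
    tgt = row_count(tgt_conn, table)
    ok = abs(src - tgt) <= tolerance
    print(f"{table}: source={src} target={tgt} ok={ok}")
    return ok
```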