| Chetan Yadav A - Lead AI/ML Data Engineer |
| [email protected] |
| Location: Remote, USA |
| Relocation: Ready to relocate |
| Visa: H1B |
| Resume file: Chetan_Yadav_DE_AI_ML_Engineer_1772033822303.docx |
|
Chetan Yadav A
Senior AI/ML & Data Engineer
[email protected] | (314) 763-9001 | LinkedIn

Professional Summary:
- Lead AI/ML Data Engineer with 11+ years of experience building AI-ready data platforms and production-grade Generative AI systems across the healthcare, financial services, and retail domains.
- Proven expertise delivering end-to-end Data + AI platforms spanning ingestion, transformation, feature engineering, model deployment, and LLM orchestration across AWS, Azure, and GCP.
- Hands-on experience implementing Model Context Protocol (MCP) patterns to improve LLM context management, tool integration, and enterprise-grade prompt orchestration.
- Strong hands-on experience with Databricks, Apache Spark, Delta Lake, and Snowflake, enabling scalable Lakehouse architectures and high-throughput data platforms.
- Experienced in Snowflake-centric ELT architectures leveraging dbt and component-based data integration platforms (including Keboola-style orchestration) to standardize transformations, enforce data validation, and support analytics and ML readiness.
- Designed and deployed enterprise Generative AI solutions, including RAG pipelines, semantic retrieval systems, and LLM-powered APIs, using LangChain, vector search, and FastAPI.
- Experienced in building AI microservices, LLM orchestration layers, and production LLM APIs for conversational AI, knowledge discovery, and intelligent automation.
- Implemented robust MLOps and LLMOps frameworks using MLflow, Docker, Kubernetes, and CI/CD pipelines for model lifecycle governance and observability.
- Delivered real-time and streaming data pipelines using Kafka, Kinesis, and Spark Structured Streaming for low-latency analytics and AI feature engineering.
- Built multi-cloud data platforms leveraging AWS-native services, Azure data ecosystems, and GCP analytics tooling for platform portability and resilience.
- Strong experience in vector search and embeddings-based architectures enabling contextual search, conversational memory, and semantic ranking.
- Implemented enterprise-grade data governance, lineage, and security controls aligned with HIPAA, GDPR, and other compliance requirements.
- Modernized legacy ETL systems into cloud-native Lakehouse architectures, improving scalability, reliability, and cost efficiency.
- Designed metadata-driven ingestion frameworks and reusable data components, accelerating enterprise data platform development.
- Extensive experience designing AI-ready Snowflake data platforms leveraging dbt for modular transformations and scalable ML feature engineering.
- Built API-first data platforms using FastAPI and REST microservices, enabling internal AI platform consumption.
- Engineered feature pipelines and curated training datasets for ML and Generative AI models, implementing validation, schema enforcement, and data versioning to ensure reproducible model training and inference.
- Collaborated with data scientists, architects, and stakeholders to translate complex business problems into scalable AI and data engineering solutions.
- Recognized for delivering high-impact AI and data initiatives with measurable improvements in performance, platform stability, and operational efficiency.
Technical Skills:
- Programming & Query Languages: Python, Node.js (integration), TypeScript (exposure), PySpark, Scala (working knowledge), Shell Scripting
- AI/ML & LLM Enablement: Generative AI, LLM Pipelines, RAG Architectures, LangChain, Semantic Search, Vector Databases (FAISS), Prompt Engineering, Agentic Workflows, Hugging Face Transformers, OpenAI GPT Models, MLflow, NLP, Feature Engineering, Model Serving APIs, Azure ML, Scikit-learn, TensorFlow
- Data Engineering & Big Data: Apache Spark, Databricks, Delta Lake, Structured Streaming, Apache Kafka, CDC Patterns, Data Contracts, Feature Stores (conceptual), Data Mesh (exposure)
- Cloud Platforms:
  - AWS: Glue, EMR, Lambda, Redshift, S3, DynamoDB, Lake Formation, Kinesis, Bedrock (exposure), IAM
  - Azure: Data Factory, Synapse Analytics, ADLS Gen2, Azure Databricks, Azure OpenAI (exposure), Azure Functions, Azure DevOps
  - GCP: BigQuery, Dataflow, Cloud Storage, Pub/Sub (exposure), Vertex AI (exposure), Dataproc (exposure)
- Containerization & Orchestration: Docker, Kubernetes (EKS)
- ETL/ELT & Workflow Orchestration: Databricks Workflows, AWS Glue, Azure Data Factory, GCP Dataflow (exposure), Airflow (exposure), dbt, Keboola (component-based ELT orchestration), Step Functions, Event-driven Pipelines
- Databases & Storage: Snowflake, Redshift, BigQuery, SQL Server, PostgreSQL, Oracle, MySQL, DynamoDB, MongoDB, Vector Storage Patterns
- Data Modeling & Warehousing: Dimensional Modeling (Star & Snowflake Schemas), ELT Design, Data Lineage, Data Quality Frameworks
- Monitoring & Observability: AWS CloudWatch, ELK Stack (Elasticsearch, Logstash, Kibana), Grafana, Prometheus
- DevOps & Infrastructure: Terraform, CloudFormation, Jenkins, GitHub Actions, Azure DevOps, Docker, Kubernetes (EKS/AKS exposure), CI/CD Pipelines
- Business Intelligence & Analytics: Power BI, Tableau, QuickSight, Looker, Excel (Power Query, DAX, Pivot Tables)
- Security & Governance: OAuth 2.0, JWT, Role-Based Access Control (RBAC), Secrets Management, AWS IAM, HIPAA-aligned data controls
- Version Control & Collaboration: Git, GitHub, Bitbucket, GitLab, JIRA, Confluence
- Methodologies: Agile (Scrum, Kanban), Waterfall

Certifications:
- Certified Python Developer (PCAP)
- Microsoft Certified: Python for Data Science
- Azure AI Fundamentals
- GCP Professional Data Engineer
- AWS Certified Solutions Architect – Professional
- Databricks Certified Data Engineer Associate

Professional Experience:

HCA Healthcare, Rahway, NJ | April 2023 – Present
Role: Senior AI/ML & Data Engineer
Responsibilities:
- Architected and delivered large-scale AWS-native data and AI platforms using S3, Glue, IAM, Lambda, and Kinesis, supporting enterprise healthcare analytics and AI-driven decision workflows.
- Designed and implemented scalable ingestion and transformation pipelines using AWS Glue, Databricks, and PySpark, enabling reliable processing of structured clinical data and unstructured healthcare documents.
- Implemented Model Context Protocol (MCP)-based context orchestration patterns to enable structured context injection and tool-aware reasoning for enterprise LLM applications.
- Built and deployed Generative AI solutions, including Retrieval-Augmented Generation (RAG) pipelines, to enable intelligent clinical search, knowledge discovery, and automated summarization.
- Optimized AI model performance through hyperparameter tuning, model evaluation (precision, recall, F1), latency profiling, and inference cost optimization in cloud environments.
- Implemented unit and integration testing using PyTest for distributed data processing pipelines, improving reliability and regression stability.
- Designed and integrated Azure OpenAI-based enterprise LLM solutions leveraging secure identity boundaries, RBAC enforcement, and enterprise API integration patterns.
- Developed and deployed supervised and unsupervised ML models using Scikit-learn and PyTorch for classification, anomaly detection, and predictive analytics use cases.
- Developed LLM-powered APIs using LangChain and FastAPI, enabling secure enterprise consumption of AI capabilities across internal tools and applications.
- Designed scalable backend services using Python and integrated Node.js-based service patterns to enable AI-powered application workflows and API orchestration in cloud environments.
- Built and optimized Snowflake-based analytical data models using advanced SQL transformations, clustering strategies, and cost-aware warehouse scaling to support marketing analytics and ML feature engineering workloads.
- Engineered feature pipelines and curated training datasets for ML and Generative AI models, implementing validation, schema enforcement, and data versioning to ensure reproducible model training and inference.
- Engineered semantic retrieval platforms using embeddings and vector indexing, enabling contextual search across SOPs, care protocols, and research content.
- Implemented near real-time streaming pipelines using Kafka and AWS Kinesis to power low-latency analytics and real-time AI feature ingestion.
- Established strong data governance and security controls using IAM, Glue Catalog, and encryption standards aligned with HIPAA and regulated data environments.
- Introduced MLOps and LLMOps practices using MLflow for experiment tracking, prompt lifecycle management, and production model governance.
- Built reusable metadata-driven ingestion frameworks that accelerated onboarding of new datasets across analytics and AI initiatives.
- Optimized Spark workloads through partition tuning, caching, and adaptive compute strategies, significantly improving runtime performance and cloud efficiency.
- Implemented observability and monitoring frameworks using QuickSight, Power BI, and cloud-native telemetry to track pipeline health and AI usage.
- Collaborated with domain experts, architects, and clinical stakeholders to translate complex healthcare workflows into scalable, production-grade AI and data solutions.

Key Technologies: AWS (S3, Glue, IAM, Lambda, Kinesis, CloudWatch), Databricks, PySpark, Delta Lake, Snowflake, LangChain, OpenAI APIs, MLflow, FastAPI, Kafka, Docker, Kubernetes, Terraform, Power BI, QuickSight

USAA, San Antonio, TX | September 2021 – February 2023
Role: Lead Data Engineer
Responsibilities:
- Led the design and evolution of multi-cloud data platforms across GCP and Azure, supporting enterprise financial analytics, compliance reporting, and modernization initiatives.
- Built distributed batch pipelines using BigQuery and Dataflow, enabling high-throughput processing of large-scale financial datasets.
- Designed and implemented streaming ingestion architectures using Pub/Sub and streaming Dataflow jobs for near real-time data availability.
- Developed reusable Python-based ETL frameworks to standardize ingestion, validation, and transformation patterns across business domains.
- Implemented high-performance analytical data models in BigQuery using partitioning, clustering, and cost-aware query optimization strategies.
- Enabled downstream analytics integrations using Azure Data Factory and Synapse, supporting enterprise reporting and BI teams.
- Implemented secure multi-cloud access patterns using GCP IAM and Azure RBAC, ensuring alignment with enterprise governance and compliance standards.
- Containerized distributed data workloads using Kubernetes, improving scalability, portability, and deployment consistency.
- Automated infrastructure provisioning using Terraform, enabling repeatable multi-environment deployments.
- Delivered platform-wide optimizations across compute, storage, and query layers, improving performance and operational efficiency.
- Authored data contracts, ingestion standards, and operational playbooks that improved platform governance and maintainability.
- Partnered with architects and business stakeholders to align data platform capabilities with enterprise cloud modernization strategies.

Key Technologies: GCP (BigQuery, Dataflow, Pub/Sub, Cloud Storage, IAM), Azure (Data Factory, Synapse, RBAC), Apache Spark, Python, Terraform, Kubernetes, Snowflake, Airflow (exposure), CI/CD

Kroger, Houston, TX | January 2021 – August 2021
Role: Senior Python Developer
Responsibilities:
- Built and maintained Azure-based data platforms using Azure Data Factory, Azure Databricks, and ADLS Gen2 to support retail analytics and operational reporting at scale.
- Developed scalable batch and incremental pipelines using PySpark and Delta Lake, processing POS, inventory, and IoT datasets for downstream analytics and forecasting.
- Designed and implemented RESTful data services using FastAPI and Flask to expose curated datasets and enable real-time integrations across retail systems.
- Collaborated on React and TypeScript-based frontend components, integrating RESTful APIs and AI-powered backend services to deliver responsive, intelligent retail applications.
- Supported frontend-backend integration patterns, including state management, routing, and component-driven architecture, to enable complex application workflows.
- Implemented event-driven ingestion workflows using Azure Functions and messaging services to process partner and vendor data feeds.
- Engineered feature pipelines supporting demand forecasting and pricing analytics models, ensuring consistency between training and inference datasets.
- Supported ML lifecycle governance using MLflow, enabling reproducible experiments and improved model traceability.
- Established secure data access patterns using Azure Key Vault and RBAC, improving enterprise data protection and secrets management.
- Built interactive Power BI dashboards delivering insights into supply chain performance, inventory optimization, and operational KPIs.
- Automated CI/CD pipelines using Azure DevOps and Jenkins, improving deployment reliability for data pipelines and APIs.
- Optimized data workflows using partitioning and parallel execution strategies, improving pipeline throughput and stability.
- Collaborated with product managers, analysts, and platform teams to deliver scalable data solutions aligned with fast-paced retail environments.
- Provided production support and performance tuning, ensuring high availability and reliability of critical data services.

Key Technologies: Azure (Data Factory, ADLS Gen2, Azure Databricks, Key Vault, Azure Functions), Python, PySpark, Delta Lake, FastAPI, Flask, MLflow (exposure), Power BI, Azure DevOps, Jenkins

BNY Mellon Technologies Pvt. Ltd., Chennai | July 2014 – December 2019
Role: Data Engineer
Responsibilities:
- Developed enterprise data pipelines using Python and SQL to support treasury, risk, and regulatory reporting platforms in highly governed financial environments.
- Automated manual reporting workflows using Python and VBA, significantly improving data accuracy, auditability, and operational efficiency.
- Designed complex SQL transformations, stored procedures, and reconciliation logic for large-scale financial datasets.
- Worked with component-based data integration platforms (including Keboola-style orchestration frameworks) to manage modular ETL workflows, dependency tracking, and standardized data transformations across financial reporting pipelines.
- Built hybrid ETL pipelines integrating Python, SQL Server, and legacy tooling to streamline data ingestion and transformation workflows.
- Supported early cloud modernization initiatives, including migrations to Azure SQL and Azure Data Factory ingestion patterns.
- Designed semantic reporting layers and optimized data models to improve BI usability and query performance.
- Participated in Snowflake adoption initiatives, helping transition legacy reporting workloads to modern analytical warehousing platforms.
- Implemented data validation and reconciliation frameworks ensuring accuracy and consistency of regulatory submissions.
- Built automated monitoring scripts and alerting mechanisms to improve data reliability and SLA adherence.
- Collaborated with finance, audit, and risk teams to deliver compliant, audit-ready data solutions.
- Investigated production issues and performed root cause analysis, improving reliability of mission-critical reporting systems.
- Contributed to documentation and knowledge transfer, improving maintainability and onboarding of new engineering resources.

Environment: Python, SQL, PL/SQL, SQL Server, Azure (ADF, Azure SQL), Snowflake, Tableau, Power BI, Airflow, Talend, Excel VBA

Education:
- Master of Science in Advanced Data Analytics, Saint Louis University, 2021
- Bachelor of Commerce in Computer Science, Osmania University, Hyderabad, 2014