Satvik Jonnalagadda - Senior AI/ML Engineer
[email protected]
Location: Remote, USA
Relocation: Yes
Visa: GC Holder
PROFESSIONAL SUMMARY
Senior AI/ML Engineer with 11+ years of IT experience, including 8+ years in data engineering and 5+ years shipping production machine learning systems. Most recent focus is on Generative AI, RAG, and agentic systems in HIPAA-regulated healthcare at AbbVie. Career started in traditional data engineering with SSIS, Informatica, and warehouse modeling, then grew with the discipline as it evolved: cloud lakehouses on AWS, Azure, and GCP, large-scale Spark and Kafka pipelines, classical ML in production, deep learning in computer vision and NLP, and now LLM-driven agentic systems. Deep expertise in Python, PySpark, FastAPI, LangChain, LangGraph, PyTorch, and TensorFlow, with end-to-end coverage of the MLOps and LLMOps lifecycle via MLflow, Weights & Biases, SageMaker, Vertex AI, Docker, Kubernetes, Terraform, and CI/CD. Current AbbVie work covers multimodal RAG with GPT-4o and Claude vision, LangGraph agent orchestration with MCP and native tool/function calling, AWS Bedrock managed fine-tuning on clinical corpora, GPU-accelerated medical-imaging inference, LLM evaluation with LangSmith and Ragas, and Responsible-AI guardrails for clinical documents. Experienced in mentoring engineers, leading technical design sessions, and driving AI/ML architecture in fast-paced, cross-functional, matrixed environments.
TECHNICAL SKILLS
GenAI & Agentic Systems:
LLMs, LangChain, LangGraph, Hugging Face, RAG, Vector DBs, Agent Orchestration, Tool/Function Calling, MCP, Prompt Engineering, In-Context Learning, LLM Evals (LangSmith, Ragas), Guardrails / Responsible AI, Managed Foundation Model Fine-tuning (AWS Bedrock)
ML & Deep Learning:
PyTorch, TensorFlow, Scikit-learn, XGBoost, LightGBM, BERT/Transformers, spaCy, NLP, Computer Vision, CNN/LSTM, Recommender Systems, Time-Series Forecasting (ARIMA, Prophet), Feature Engineering, Optuna, SHAP
MLOps & LLMOps:
MLflow, Weights & Biases, SageMaker, Vertex AI, Azure ML, Model Registry, Experiment Tracking, Prompt Versioning, Eval Pipelines, Model Monitoring & Drift Detection, NVIDIA Triton, CUDA / GPU Optimization, TensorRT
Languages:
Python, SQL (T-SQL, PL/SQL), PySpark, Spark SQL, Bash
Data Engineering & Streaming:
Apache Spark, Delta Lake, Databricks, dbt, Airflow, Apache Kafka, Spark Structured Streaming, AWS Kinesis, GCP Pub/Sub, Hadoop, Hive, Sqoop
Cloud Platforms:
AWS: S3, Glue, EMR, SageMaker, Lambda, EKS, Bedrock, Redshift, Kinesis
Azure: ADF, ADLS Gen2, Synapse, Databricks, Azure ML, DevOps
GCP: BigQuery, Vertex AI, Dataflow, Pub/Sub, GKE, Cloud Run
Databases & Storage:
PostgreSQL, SQL Server, Snowflake, MongoDB, DynamoDB, Cassandra, Redis, ElasticSearch
API, CI/CD & Infrastructure:
FastAPI, Flask, REST APIs, Docker, Kubernetes (EKS, AKS, GKE), Terraform, Git, GitHub Actions, Jenkins, Azure DevOps, Helm
BI & Testing:
Tableau, Power BI, Plotly, Matplotlib, Pytest, Great Expectations, LLM Eval Frameworks
PROFESSIONAL EXPERIENCE
Senior AI/ML Engineer | Feb 2025 - Present
AbbVie | Vernon Hills, IL
Designed and shipped a HIPAA-aware, agentic RAG platform over 500K+ biomedical documents (clinical trial protocols, FDA submissions, internal study reports) using LangChain, LangGraph, OpenAI GPT-4o, and Anthropic Claude with hybrid retrieval, reranking, and citation-grounded responses across Pinecone, FAISS, and pgvector indexes, cutting manual literature review effort by ~60% for clinical research and pharmacovigilance teams.
Built multi-agent orchestration with LangGraph and Model Context Protocol (MCP) routing compound clinical questions through specialized retrieval, summarization, structured-data lookup, and target-discovery tools via tool/function calling, enabling researchers to ask multi-step questions across unstructured EHR notes, curated trial data, and biomedical knowledge graphs.
Implemented multimodal RAG pipelines combining radiology reports, pathology images, and structured EHR fields through vision-language LLMs (GPT-4o, Claude) with Hugging Face embedding models, supporting drug-discovery and clinical research workflows where text alone misses signal buried in scans and figures.
Stood up a production LLM evaluation harness using LangSmith, Ragas, and custom eval frameworks that measure retrieval precision/recall, citation faithfulness, hallucination rate, and clinical safety regressions, gating every prompt and model upgrade through versioned eval suites with prompt versioning before promotion.
Engineered LLM guardrails and Responsible-AI controls for HIPAA-regulated workloads, including PHI detection and de-identification, prompt-injection filters, output safety classifiers, and human-in-the-loop review gates, keeping every GenAI surface in clinical apps audit-ready.
Customized foundation models via AWS Bedrock managed fine-tuning on AbbVie's curated biomedical and adverse-event corpora, lifting downstream extraction F1 by ~15% on AbbVie-specific entities vs. off-the-shelf baselines without managing GPU training infrastructure.
Trained and shipped production NLP models on PyTorch, TensorFlow, and BERT/Transformers, including Transformer-based NER and multi-label document classification on EHR data, with spaCy preprocessing pipelines integrated into enterprise document workflows processing 20K+ records weekly.
Deployed medical-imaging deep-learning models (CNN, Vision Transformers) for clinical image triage on NVIDIA Triton with CUDA-accelerated GPU optimization on EKS, serving scalable batch and real-time inference across regulated clinical imagery.
Engineered FHIR-based clinical data lakes with PHI de-identification logic in PySpark on AWS S3, Glue, and Delta Lake, enabling safe use of clinical data for ML training, fine-tuning, and RAG indexes.
Stood up real-time clinical event monitoring pipelines using Kafka, AWS Kinesis, and Spark Structured Streaming over EHR and genomic event streams, delivering sub-minute latency with schema-registry enforcement, dead-letter queue handling, and idempotent processing into Delta Lake and Snowflake orchestrated by Apache Airflow and dbt.
Built the full MLOps/LLMOps lifecycle on SageMaker, MLflow, Weights & Biases, EKS, Docker, Helm, Terraform, and GitHub Actions, covering experiment tracking, prompt versioning, model monitoring, drift detection, eval pipelines, staged deployment, and reliable rollback, which shrank release cycles from days to hours and maintained 99.5%+ pipeline uptime.
Designed a Patient 360 platform fusing EHR, pharmacy, claims, and trial data on PostgreSQL, DynamoDB, Redis, and ElasticSearch, powering AI-assisted care and research recommendations exposed via FastAPI microservices with Great Expectations and Pytest data-quality and regression coverage.
Wired ML and GenAI predictions into three internal clinical applications via FastAPI; published Tableau and Power BI dashboards (with Plotly for ad-hoc analyses) tracking pipeline SLAs, model accuracy, RAG retrieval quality, and clinical KPIs for product and leadership stakeholders.
Mentored junior engineers through code reviews, architectural guidance, and knowledge-sharing sessions; led AI/ML solution design with data engineers, architects, scientists, and clinical stakeholders in an Agile, cross-functional matrixed environment.
Environment: Python, PySpark, Apache Spark, Kafka, AWS Kinesis, Airflow, dbt, Delta Lake, Snowflake, AWS (S3, Glue, SageMaker, Lambda, EKS, Bedrock), PyTorch, TensorFlow, BERT, Hugging Face, LangChain, LangGraph, MCP (Model Context Protocol), OpenAI GPT-4o, Anthropic Claude (Multimodal), FAISS, Pinecone, pgvector, AWS Bedrock Managed Fine-tuning, MLflow, Weights & Biases, LangSmith, Ragas, NVIDIA Triton, CUDA, Docker, Kubernetes (EKS), Helm, Terraform, FastAPI, PostgreSQL, DynamoDB, Redis, ElasticSearch, Tableau, Power BI, Plotly, GitHub Actions, Great Expectations, Pytest, HIPAA, FHIR
AI/ML Engineer | Oct 2022 - Jan 2025
John Deere | Urbandale, IA
Trained and deployed deep-learning computer-vision models (CNN, Vision Transformers) for in-field weed detection, crop-health classification, and equipment-perimeter awareness on millions of frames from connected agricultural machinery, supporting precision herbicide application and reduced chemical use across global fleets.
Built equipment failure prediction and driver-behavior classification models with Scikit-learn, TensorFlow, PyTorch, XGBoost, and LightGBM on Databricks; reduced unplanned downtime through proactive service scheduling across connected agricultural fleets.
Established real-time streaming feature pipelines using Kafka and Spark Structured Streaming over Avro-encoded sensor data from 10K+ active machines with schema-registry enforcement and idempotent consumer design; maintained an ADLS Gen2 + Delta Lake lakehouse as the central ML feature store.
Built Azure ADF and PySpark ingestion pipelines processing 50M+ daily IoT telemetry events in JSON, Avro, and Parquet formats from connected equipment, feeding predictive-maintenance, yield-optimization, and route-planning models.
Produced K-Means customer segmentation and collaborative-filtering recommender systems ranking implement and parts upsells for dealer-portal personalization across millions of customer-equipment interactions; commercial teams used outputs to personalize service plans and parts replenishment.
Developed time-series forecasting models (Prophet, ARIMA, LSTM, LightGBM) for parts demand and seasonal service planning across regional fleets, supporting supply-chain decisions for global parts distribution.
Launched NLP pipelines using spaCy and BERT for sentiment analysis and issue classification on service-technician reports and dealer feedback; surfaced insights via Power BI and Tableau dashboards used by product and aftermarket teams.
Optimized deep-learning inference for edge and in-cab deployment using TensorRT and CUDA on NVIDIA GPUs, hitting sub-second latency for autonomous decision loops on agricultural equipment.
Automated the full MLOps lifecycle on Azure DevOps, MLflow, and Azure ML, covering experiment tracking, Optuna-driven hyperparameter tuning, SHAP-based explainability, model registry, scheduled retraining, drift detection, and staged deployment, which cut time-to-production by ~50%.
Integrated GCP BigQuery, Vertex AI, Dataflow, and Pub/Sub for cross-cloud SQL analytics and managed ML workflows; wired model predictions into internal fleet-management apps via FastAPI microservices on Docker and Kubernetes (AKS and GKE) and Cloud Run, exercising Azure and GCP within the same architecture.
Environment: Python, PySpark, Spark SQL, Apache Spark, Kafka, Azure (ADF, ADLS Gen2, Synapse, Databricks, DevOps, Azure ML), GCP (BigQuery, Vertex AI, Dataflow, Pub/Sub, GKE, Cloud Run), Delta Lake, TensorFlow, PyTorch, Scikit-learn, XGBoost, LightGBM, spaCy, BERT, CNN/LSTM, Vision Transformers, Optuna, SHAP, MLflow, TensorRT, CUDA, Docker, Kubernetes (AKS, GKE), FastAPI, Power BI, Tableau, Agile/Scrum
Data Scientist | May 2020 - Sep 2022
Edward Jones | St. Louis, MO
Built XGBoost-based client attrition / churn prediction models on 2M+ client records, engineering behavioral features from transaction history, advisor touchpoints, and portfolio activity; SHAP-based explanations surfaced top retention drivers, improving early-warning targeting for advisor outreach by ~30%.
Developed K-Means and RFM-based client segmentation producing 8 actionable personas across the 2M+ client base; segmentation powered tailored marketing journeys and advisor-engagement plays, lifting campaign response rates by ~25%.
Trained product propensity and next-best-action classifiers (XGBoost, LightGBM, logistic regression) ranking advisor leads by likelihood to adopt specific financial products, deployed as weekly batch-scoring jobs on Azure Databricks to feed advisor-facing tooling that informed real-time client conversations.
Produced time-series forecasting models for client asset flows, AUM trajectories, and campaign uptake using ARIMA, Prophet, and gradient-boosted regressors; outputs fed regional leadership dashboards used in quarterly planning.
Designed an A/B test analysis framework in PySpark to measure campaign and outreach lift, computing statistical significance, effect sizes, and segment-level cuts; results gated go/no-go decisions on retention and cross-sell campaigns.
Stood up the modeling lifecycle on Azure Databricks and MLflow, covering feature pipelines, experiment tracking, Optuna-driven hyperparameter tuning, model registry, and scheduled retraining, which cut model handoff and redeployment friction across the data-science team and reduced time-to-production by ~40%.
Built NLP pipelines (Scikit-learn, Pandas, classical text features) for issue classification and topic modeling on advisor call-center notes and service complaints, feeding compliance-oriented monitoring workflows.
Migrated legacy Hive batch workloads into optimized PySpark and Spark SQL transformations on Azure Databricks, cutting nightly feature-build runtimes from 6+ hours to under 90 minutes and unblocking faster model iteration.
Consolidated data from Snowflake, SQL Server, and Oracle into a unified ADLS Gen2 + Synapse platform, retiring 12+ fragmented data silos and standardizing the data foundation for downstream modeling.
Wrapped scoring services in Flask APIs with Pytest coverage for internal model consumers.
Partnered with marketing, compliance, and advisor-enablement teams to translate business problems into modeling specs and present results to non-technical stakeholders, including risk and compliance reviewers.
Published self-service Tableau dashboards (with Matplotlib for analyses) tracking model performance, campaign uptake, and segmentation health, used by 20+ analysts and senior leaders.
Environment: Python, PySpark, Spark SQL, Apache Spark, XGBoost, LightGBM, Scikit-learn, SHAP, Optuna, MLflow, Pandas, NumPy, Matplotlib, Azure Databricks, ADLS Gen2, Synapse, Snowflake, Hadoop, Hive, SQL Server, Oracle, Flask, Pytest, Tableau, Git
Data Engineer | Jan 2018 - Apr 2020
Sam's Club (Walmart) | Arizona
Architected an AWS-based data lake (S3 + EMR + Spark) processing 100M+ weekly retail transaction records across hundreds of stores, replacing an on-prem Hadoop cluster and cutting infrastructure costs.
Wired up AWS Glue and Lambda ingestion pipelines pulling from 15+ data sources (POS, third-party vendors, and loyalty platforms), with Apache Kafka streaming for near-real-time inventory and member analytics.
Trained and shipped early Scikit-learn member churn and basket-affinity models with collaborative-filtering recommendation prototypes, feeding targeted retention and cross-sell campaigns. This was the first hands-on production ML work that began my pivot from data engineering into ML.
Built XGBoost-based demand-forecasting models for category-level replenishment, reducing stockouts and waste across regional distribution centers.
Authored PySpark transformation scripts and Hive/Sqoop batch ETL for data cleansing and structured star-schema data marts on AWS Redshift for self-service BI.
Built member-360 attribute pipelines (purchase history, store-visit patterns, and loyalty signals) stored across Cassandra, MongoDB, and ElasticSearch for low-latency personalization lookups in member-facing experiences.
Set up Jenkins-based CI/CD for data pipelines and exposed early ML scoring through lightweight Flask services.
Optimized SQL/Hive queries, cutting analytical runtimes by ~45% on distributed Hadoop clusters.
Launched Tableau dashboards used daily by merchandising and operations leadership.
Environment: Python, PySpark, Apache Spark, Hadoop, Hive, AWS (S3, EMR, Glue, Lambda, Redshift), Kafka, Sqoop, Scikit-learn, XGBoost, Pandas, NumPy, Cassandra, MongoDB, ElasticSearch, Flask, Jenkins, Tableau, Git
Data Engineer | Jul 2014 - Aug 2017
Careator Technologies Pvt. Ltd. | Hyderabad, India
Owned enterprise ETL workflows in SSIS and Informatica PowerCenter, pulling data from SQL Server, flat files, and XML sources across 5+ business units.
Built OLAP dimensional models and tuned T-SQL and PL/SQL queries for high-volume operational reporting; automated SSRS report delivery to 50+ business users, cutting manual effort by ~70%.
Established data warehouse architecture and ETL documentation standards that reduced pipeline incident rate and accelerated team onboarding; this foundation work set up my later move into cloud and ML.
Environment: SQL Server, T-SQL, PL/SQL, SSIS, SSRS, Informatica PowerCenter, Oracle, OLAP, Tableau, Excel, Data Warehousing
EDUCATION
Bachelor of Technology in Computer Science and Engineering | Indian Institute of Technology, Madras | May 2014
