| Naveena - Data Scientist / AI-ML Engineer |
| [email protected] |
| Location: Atlanta, Georgia, USA |
| Relocation: YES |
| Visa: H1B |
| Resume file: Naveena Resume_1775144452300.docx |
LinkedIn: linkedin.com/in/reka-n-7241632b4
Professional Summary:
10+ years of experience as a Data Scientist / AI-ML Engineer delivering end-to-end machine learning, deep learning, and data engineering solutions across healthcare, automotive, IoT, NLP, and computer vision domains. Proven expertise in building production-grade AI systems, real-time pipelines, and scalable MLOps platforms on AWS, Azure, and GCP. Strong background in ML/DL algorithms, NLP, time-series forecasting, and computer vision, with hands-on deployment using Docker, Kubernetes, and serverless architectures. Experienced in ETL, data lakehouse architectures, CI/CD for ML, and model governance. Adept at translating complex analytics into actionable business insights, collaborating with stakeholders, and driving AI solutions aligned with enterprise KPIs. Passionate about MLOps automation, AI ethics, and generative AI innovation.

Education: Bachelor's in Computer Science, JNTUH

Skills:
- Machine Learning & Artificial Intelligence
- Data Analysis & Predictive Modeling
- Languages: Python, R, SQL, PL/SQL, Java, Scala
- GPU Optimization: Gaudi 3, CUDA, Nsight, TensorRT, Habana SynapseAI SDK, kernel profiling, memory tuning
- Supported deployment and operationalization of NLP and deep learning models in cloud environments using AWS SageMaker and Azure Machine Learning.
- Machine Learning & AI: Scikit-learn, XGBoost, TensorFlow, Keras, MLflow, StatsModels, Prophet, Evidently AI, NLP (NLTK, Gensim)
- Deep Learning & NLP: Neural Networks, CNN, RNN, Transformers, BERT, Text Classification, Sentiment Analysis
- GCP: {GCP Workflows, Cloud Functions, VMs}
- MLOps & Orchestration: Kubernetes, Helm, Terraform, Ray Serve, MLflow, Prometheus, Grafana
- Cross-Functional Collaboration
- Cloud Platforms: Azure, AWS (EKS, SageMaker, Lambda), GCP (Vertex AI, GKE, Cloud Run)
- APIs: REST, gRPC, FastAPI, Flask
- CI/CD: GitHub Actions, Jenkins, ArgoCD
- AI Frameworks & Model Deployment
- Built and maintained MLOps pipelines to support NLP workflows such as text processing, classification, and entity recognition through automated CI/CD processes.
- Supported the end-to-end ML model lifecycle, including development, testing, deployment, and monitoring, using AWS SageMaker and Azure Machine Learning.
- Built and maintained automated MLOps pipelines for model training, validation, and deployment using CI/CD tools such as Jenkins, GitHub Actions, and Azure DevOps.
- Enabled model validation and versioning workflows, ensuring reproducibility, auditability, and compliance with enterprise standards.
- Deployed ML models as containerized microservices (Docker, Kubernetes) for scalable, real-time inference in cloud environments.
- Integrated event-driven architectures (Kafka) with ML pipelines to support real-time data ingestion and model scoring workflows.
- Implemented data pipelines and model validation (Airflow, AWS Glue, EMR) for preprocessing, feature engineering, and feeding models with high-quality, validated datasets.
- Established monitoring and observability for ML systems using CloudWatch, Prometheus, Grafana, and Splunk to track model performance, drift, and system health.
- AWS: {EC2, S3, Lambda, ECR, SQS, Step Functions, RDS}
- Azure: {Logic Apps, Azure Functions, Blob Storage, Cosmos DB, Azure Form Recognizer, OpenAI Services}
- Cloud Computing & Big Data Processing
- Data-Driven Decision-Making
- AI Ethics & Compliance
- Communication & Presentation Skills
- Microsoft Excel & Networking

Professional Experience:

Client: Centene Corporation  Mar 2024 - Present
Role: Data Scientist / AI-ML Engineer
Responsibilities:
- Developed and deployed machine learning models for healthcare prediction and classification (risk scoring, patient outcome prediction, anomaly detection) using Python and scikit-learn.
- Processed and analyzed structured and unstructured healthcare data, including CSV datasets, clinical logs, and text-based records, ensuring data quality and regulatory compliance.
- Used SQL to extract, join, and validate training and inference datasets from relational data sources.
- Supported both real-time inference via REST APIs and batch scoring pipelines based on business and operational requirements.
- Documented model assumptions, data constraints, performance trade-offs, and known limitations to support transparency and downstream consumption.
- Built and evaluated regression models to predict continuous outcomes, applying appropriate loss functions and validation strategies to ensure reliable predictions.
- Designed and trained machine learning and deep learning models using Python, TensorFlow, and Keras for healthcare risk prediction and patient outcome classification.
- Performed data cleaning, feature engineering, and feature selection on patient demographics, clinical indicators, lab results, and historical records to improve model accuracy.
- Designed and deployed RESTful APIs to expose ML models for real-time inference using Python-based frameworks (Flask/FastAPI) and containerized microservices.
- Integrated ML models with enterprise systems via API Gateway and Lambda, enabling seamless data validation, scoring, and response workflows.
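The regression work above pairs loss functions with validation strategies; a minimal NumPy-only sketch of k-fold RMSE estimation for a least-squares model is below. The data is synthetic and the helper name `kfold_rmse` is illustrative, not taken from any production system.

```python
import numpy as np

def kfold_rmse(X, y, k=5, seed=0):
    """Estimate out-of-fold RMSE for ordinary least squares via k-fold CV."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Closed-form least squares on the training fold (bias column added)
        Xtr = np.c_[np.ones(len(train)), X[train]]
        Xte = np.c_[np.ones(len(test)), X[test]]
        w, *_ = np.linalg.lstsq(Xtr, y[train], rcond=None)
        pred = Xte @ w
        scores.append(np.sqrt(np.mean((pred - y[test]) ** 2)))
    return float(np.mean(scores))

# Synthetic "risk score" data: 200 patients, 3 numeric features
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = X @ np.array([0.5, -0.2, 0.8]) + rng.normal(scale=0.1, size=200)
print(round(kfold_rmse(X, y), 3))  # out-of-fold RMSE near the 0.1 noise level
```

Averaging the per-fold RMSE gives an out-of-sample error estimate, which is the property that makes cross-validation preferable to a single train/test split on small clinical datasets.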
- Implemented secure API communication using OAuth2, JWT, and HTTPS to ensure compliance with healthcare data security standards (HIPAA).
- Automated CI/CD pipelines for API deployment, ensuring version control, rollback strategies, and consistent delivery of model endpoints.
- Tuned and evaluated models using cross-validation and healthcare-relevant metrics such as precision, recall, F1-score, ROC-AUC, and RMSE.
- Designed and managed data pipelines and monitoring (Airflow, AWS Glue, EMR) to prepare and validate datasets for ML model consumption.
- Built REST-based inference APIs using FastAPI/Flask to integrate ML models into healthcare applications and reporting systems.
- Deployed models on AWS and GCP, leveraging EC2 / Cloud VMs for scalable training and REST APIs for real-time inference.
- Collaborated with data engineers and software teams to integrate ML pipelines with databases and downstream clinical workflows.
- Implemented model monitoring and retraining strategies to detect data drift and maintain consistent prediction performance over time.
- Used Pandas, NumPy, and SQL to extract, transform, and prepare healthcare datasets from relational and cloud-based data sources.
- Developed an end-to-end RAG (Retrieval-Augmented Generation) system integrating Amazon Bedrock and OpenAI models for domain-specific document intelligence.
- Designed vector search pipelines using FAISS with custom embeddings from Hugging Face Transformers.
- Applied MLOps practices including experiment tracking, model versioning, and CI/CD pipelines using MLflow and Git-based workflows.
- Containerized ML services using Docker and deployed solutions on cloud platforms (AWS / Azure) for scalable and secure execution.
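The FAISS vector search pipeline mentioned above reduces to nearest-neighbor lookup over normalized embeddings; this NumPy sketch mimics what an inner-product index returns after L2-normalization. The 4-dimensional toy vectors stand in for real transformer embeddings and are invented for illustration.

```python
import numpy as np

def top_k(query_vec, doc_matrix, k=2):
    """Return indices of the k most similar rows by cosine similarity,
    mirroring an inner-product search over L2-normalized vectors."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_matrix / np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    sims = d @ q
    return np.argsort(-sims)[:k]

# Toy 4-dimensional "embeddings" standing in for transformer outputs
docs = np.array([
    [1.0, 0.0, 0.0, 0.0],   # doc 0
    [0.9, 0.1, 0.0, 0.0],   # doc 1 (near doc 0)
    [0.0, 0.0, 1.0, 0.0],   # doc 2
])
print(top_k(np.array([1.0, 0.05, 0.0, 0.0]), docs))  # → [0 1]
```

In a RAG system the returned indices select the document chunks that are passed to the LLM as context; a library like FAISS does the same ranking with approximate-search index structures for scale.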
- Integrated model monitoring using MLflow and drift detection with Evidently AI.
- Deployed scalable inference APIs using FastAPI with Docker and Kubernetes.
- Built intelligent agent workflows with tool-calling and multi-step reasoning.
- Documented model assumptions, validation results, and limitations to support transparency, audits, and stakeholder reviews.
- Implemented PySpark-based data processing on large healthcare datasets from Delta Lake and HDFS, enabling batch processing of 10M+ records per run.
- Orchestrated event-driven document workflows on AWS using Glue, Lambda, and Athena, with retry logic and conditional triggers.
- Employed MLflow for experiment tracking, version control, and deployment orchestration with reproducibility.
- Built FastAPI microservices for real-time scoring and wrapped NLP models into scalable containers using Docker and Kubernetes.
- Integrated NLP-based models into REST APIs and event-driven systems (Kafka) for real-time processing and downstream consumption.
- Ensured secure and compliant deployment of AI/ML models handling sensitive healthcare data, aligned with HIPAA and enterprise governance standards.
- Created a data ingestion framework with Airflow DAGs, managing retries, error handling, and SLA monitoring.
- Managed HIPAA-compliant data pipelines, ensuring PHI protection, encryption, and role-based access via IAM policies.
- Built alerting systems integrated with Slack, SNS, and CloudWatch alarms to flag anomalies or failed jobs.
- Implemented automated data validation and schema enforcement using PyDeequ and Great Expectations.
- Ensured near-perfect uptime of production scoring endpoints through health checks and failover mechanisms.

Client: General Motors  Aug 2021 - Jan 2024
Role: Machine Learning Engineer
Responsibilities:
- Designed deep learning pipelines using CNNs to automatically detect surface defects in vehicle images, improving QA throughput.
- Leveraged ResNet, EfficientNet, and YOLOv5 to develop real-time object detection models for visual inspections on production lines.
- Trained computer vision models from scratch, including CNNs and Transformer-based architectures, for image classification and object detection.
- Developed deep learning pipelines using TensorFlow, Keras, and PyTorch for real-time visual inspection of vehicle components.
- Designed a multi-modal AI pipeline combining image embeddings (CNN/CLIP) and textual metadata using Transformer-based models.
- Fine-tuned Vision Transformers (ViT) and BERT-based encoders for joint image-text understanding.
- Integrated the defect detection system into factory IoT cameras, enabling continuous inspection of vehicle components.
- Built an image pre-processing module using OpenCV to normalize resolution, correct skew, and remove noise.
- Developed multi-label classification models using PyTorch and Keras, achieving very high precision on test images.
- Performed model quantization and pruning to enable deployment on edge devices and on-prem GPU clusters.
- Built custom loss functions to optimize recall for high-priority defect classes, reducing false negatives.
- Created pipelines using Airflow to retrain models automatically on new data every 3 weeks.
- Used Kafka to stream inspection data from the manufacturing floor to a central Spark-based cluster.
- Developed an anomaly detection module using Isolation Forest and One-Class SVM for non-visible defect patterns.
- Visualized historical defect rates across plants using Looker, driving strategic quality improvement initiatives.
- Applied explainable AI techniques like Grad-CAM and SHAP to make model decisions interpretable by engineers.
- Built REST APIs with FastAPI for real-time inference integrated with GM's MES system.
- Designed and deployed computer vision and deep learning models on cloud-based infrastructure, leveraging scalable GPU environments.
- Evaluated and compared cloud ML tooling and deployment patterns across platforms to support enterprise manufacturing use cases.
- Worked with EEG / ECoG-like biomedical and time-series signals, performing preprocessing, normalization, and noise filtering.
- Performed large-scale model training on GPU-enabled cloud environments (AWS / Azure / GCP).
- Used Azure Databricks to orchestrate batch scoring jobs and schedule pipeline execution using Job Clusters.
- Stored results in Snowflake and created audit dashboards in Power BI for leadership tracking.
- Designed intelligent AI workflows combining vision models and decision logic for automated quality inspection systems.
- Implemented MLflow tracking for model metrics, reproducibility, and hyperparameter tuning.
- Orchestrated event-driven ML workflows using Airflow, Kafka, and Spark for scalable production deployments.
- Used Terraform scripts for deploying infrastructure and managing configuration across dev, UAT, and prod.
- Conducted root cause analysis on misclassified defects using segmentation and clustering techniques.
- Employed Data Version Control (DVC) to track changes in image datasets and training artifacts.
- Developed custom augmentation pipelines to simulate lighting, angle, and surface variation scenarios.
- Managed a team of 5; led daily standups, sprint planning, and technical design sessions.
- Integrated MCP with Python-based APIs to support scalable and production-ready GenAI services.
- Collaborated with control system engineers and supply chain teams to align models with hardware capabilities.
- Reduced inspection time per vehicle from 4 minutes to under 1 minute by integrating optimized ML models.
- Created comprehensive SOP documentation and conducted training sessions for QA team adoption.
- Optimized GPU inference using CUDA and NVIDIA NIM runtime for low-latency deployment.
- Used TensorBoard and Weights & Biases to visualize model training metrics and early-stopping triggers.
- Delivered quarterly presentations on AI model ROI and impact on operational efficiency.
- Set up monitoring with Prometheus and Grafana to track scoring latency and GPU resource utilization.
- Ensured compliance with ISO 26262 and automotive safety standards during model deployment.

Client: Vistan NextGen, Hyderabad  Apr 2019 - Aug 2021
Role: Data Scientist / ML Engineer
Responsibilities:
- Designed and deployed deep learning models (CNNs, VGG-16, InceptionV3) to classify retinal images for abnormalities such as Diabetic Retinopathy and Macular Edema.
- Worked with ophthalmologists to label over 100K high-resolution retina images, creating a gold-standard training dataset.
- Conducted pixel-level segmentation of fundus images using U-Net and Mask R-CNN, improving disease detection precision.
- Designed modular AI components that functioned as coordinated workflow stages, similar to agent-based systems.
- Applied data augmentation techniques (rotation, flipping, brightness variation) to enhance model generalizability.
- Implemented automated image preprocessing pipelines using OpenCV, including blood vessel extraction, histogram equalization, and noise reduction.
- Developed AI-driven clinical decision support systems using deep learning models integrated into enterprise healthcare platforms.
- Built a dynamic prompt routing framework selecting the optimal LLM based on task complexity.
- Architected an agent-based AI system supporting tool-calling, memory persistence, and contextual task planning.
- Exposed AI models via Python-based APIs and connectors for mobile and enterprise applications.
- Utilized transfer learning to train models on small labeled datasets with high accuracy and reduced training time.
- Designed a modular agent orchestration pipeline inspired by LangGraph-style state transitions.
- Integrated APIs, databases, and external tools into autonomous reasoning workflows.
- Built a model inference API using Flask and containerized it with Docker for deployment on cloud endpoints.
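The augmentation techniques listed above (flipping, brightness variation) reduce to a few array operations; a minimal NumPy sketch follows, with a made-up 2x2 grayscale patch standing in for a real fundus image and the helper name `augment` chosen for illustration.

```python
import numpy as np

def augment(img, brightness=1.2, flip=True):
    """One simple augmentation pass: horizontal flip plus brightness
    scaling, clipped back into the valid 8-bit range."""
    out = img[:, ::-1] if flip else img
    out = np.clip(out.astype(np.float32) * brightness, 0, 255)
    return out.astype(np.uint8)

img = np.array([[10, 200], [0, 250]], dtype=np.uint8)  # toy 2x2 grayscale patch
print(augment(img))  # flipped columns, values scaled by 1.2 and capped at 255
```

Applying such transforms randomly at training time exposes the model to lighting and orientation variation it will see in clinical captures, which is what improves generalizability.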
- Enabled real-time diagnosis support for ophthalmologists via lightweight, mobile-accessible API services.
- Integrated Grad-CAM heatmaps to highlight regions of concern, improving physician trust in model predictions.
- Tracked experiment metadata, model metrics, and model versions using MLflow and TensorBoard.
- Optimized training with TPUs on Google Colab Pro and integrated GCP Storage buckets for storing large image datasets.
- Developed data versioning pipelines using DVC, facilitating reproducible experiments and smooth model updates.
- Implemented binary and multi-class classification architectures and achieved a high F1-score on validation sets.
- Performed threshold tuning based on ROC-AUC curves to maximize clinical decision support effectiveness.
- Created interactive dashboards using Streamlit to demo model outputs for cross-functional stakeholders.
- Developed custom annotation tools using Tkinter and PIL to accelerate manual image review.
- Conducted statistical analysis on patient demographics and diagnostic trends using Pandas and StatsModels.
- Trained models for early-stage classification using ensemble methods (XGBoost, LightGBM) as baselines.
- Ensured the system met medical data compliance (HIPAA) and regional healthcare data policies.
- Worked in an Agile Scrum team, contributed to user story grooming, and drove end-to-end delivery of ML modules.
- Validated models using k-fold cross-validation and managed model drift using Evidently AI dashboards.
- Participated in weekly code reviews, shared knowledge through internal wikis, and mentored junior engineers.
- Developed automated testing suites using PyTest and Postman for model API performance and contract testing.
- Logged metrics and inference feedback loops from real usage to iteratively improve prediction accuracy.
- Delivered detailed documentation on model training, hyperparameters, and evaluation strategies.
- Built real-time inference APIs and integrated ML outputs into downstream business intelligence dashboards.
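Drift management of the kind described above is often quantified with a Population Stability Index between a reference sample and live traffic. This is a NumPy-only sketch of that idea, not the actual dashboards; the 0.1 / 0.25 cut-offs are the common rule of thumb, and the Gaussian samples are synthetic.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major drift."""
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf            # catch out-of-range values
    e = np.histogram(expected, edges)[0] / len(expected)
    a = np.histogram(actual, edges)[0] / len(actual)
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
ref = rng.normal(0, 1, 5000)
print(psi(ref, rng.normal(0, 1, 5000)) < 0.1)      # same distribution: stable
print(psi(ref, rng.normal(1.0, 1, 5000)) > 0.25)   # shifted mean: drift flagged
```

Running this check per feature on each scoring batch is what triggers the retraining and review workflows mentioned in the monitoring bullets.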
- Conducted clinical stakeholder workshops to explain model functioning, build trust, and gather continuous feedback.
- Designed alert mechanisms for low-confidence predictions to prompt secondary physician review.
- Partnered with product and design teams to embed AI models into a retina screening mobile app prototype.
- Actively participated in AI-in-healthcare forums and presented learnings on deep learning in diagnostics.
- Conducted time series analysis on vision deterioration across patient timelines using Prophet and ARIMA.

Client: Inovalon, Hyderabad  Nov 2015 - Apr 2019
Role: Python Developer
Responsibilities:
- Developed ETL pipelines in Python for parsing and processing real-time telemetry data from IoT devices across smart homes and industrial sensors.
- Designed and built scalable data pipelines for ingesting high-velocity IoT telemetry data from 10K+ devices using Kafka, PySpark, and Airflow.
- Integrated multiple data sources and platforms through secure REST-based connectors.
- Developed automated ETL frameworks integrating AWS IoT Core, Hive, and MongoDB, enabling real-time monitoring and predictive fault detection.
- Designed cloud-hosted API integrations for sensor data analytics, generating real-time alerts and operational dashboards for leadership.
- Built visualization dashboards in Power BI and Looker to monitor device anomalies, optimize energy consumption, and automate capacity planning.
- Optimized telemetry data ingestion pipelines to handle 50GB+ of daily streaming data, ensuring performance and scalability.
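A telemetry ETL step like those described above reduces to parsing raw records and flagging out-of-range readings; a minimal pure-Python sketch is below. The record schema (`device_id`, `readings`) and the threshold values are invented for illustration, not taken from the actual pipelines.

```python
import json

THRESHOLDS = {"temp_c": 80.0, "vibration_g": 2.5}  # illustrative limits

def parse_event(raw):
    """Parse one raw telemetry record (a JSON string) into a flat dict."""
    evt = json.loads(raw)
    return {"device": evt["device_id"], **evt["readings"]}

def alerts(events):
    """Yield (device, metric, value) for every reading over its threshold."""
    for evt in events:
        for metric, limit in THRESHOLDS.items():
            value = evt.get(metric)
            if value is not None and value > limit:
                yield (evt["device"], metric, value)

stream = [
    '{"device_id": "pump-01", "readings": {"temp_c": 71.2, "vibration_g": 1.1}}',
    '{"device_id": "pump-02", "readings": {"temp_c": 93.5, "vibration_g": 0.9}}',
]
print(list(alerts(parse_event(r) for r in stream)))  # → [('pump-02', 'temp_c', 93.5)]
```

In production the same parse-then-flag logic runs inside Kafka consumers or PySpark streaming jobs, with the flagged tuples fanned out to dashboards and alert channels.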