
Raja Reddy - Senior AI/ML Engineer
[email protected]
Location: Dallas, Texas, USA
Relocation: YES
Visa: GC
NAME: RAJA REDDY
SR. AI/ML ENGINEER | DATA SCIENTIST
EMAIL ID: [email protected]
PHONE: +1 704-380-9579
LINKEDIN: http://www.linkedin.com/in/rajareddy1223

PROFESSIONAL SUMMARY
Senior AI/ML Engineer with 11+ years of experience designing, building, and operating production-grade machine learning and generative AI platforms using Python and SQL, delivering scalable, secure, and business-critical solutions across large enterprise and regulated environments.
Managed the complete AI/ML lifecycle initiatives across data engineering, model development, deployment, and operations, ensuring high availability, performance consistency, and measurable business impact in real world production environments.
Experienced in machine learning and deep learning, using scikit learn, XGBoost, LightGBM, TensorFlow, and PyTorch to build classification, regression, forecasting, and risk focused models for fraud detection, analytics, and decision systems.
Delivered in Agile/Scrum, partnering with data engineering, platform, security, and business stakeholders to convert ambiguous requirements into well defined, scalable, production ready AI/ML solutions.
Strong hands on experience delivering enterprise GenAI and LLM driven applications, including RAG based systems, copilots, and AI assisted analytics using Azure OpenAI Service, AWS Bedrock, and SageMaker within secure cloud environments.
Implemented security and governance controls for AI/LLM systems, including IAM based access control, data privacy safeguards, secrets management, audit logging, and policy driven prompt filtering for compliant GenAI deployments.
Built RAG pipelines using embeddings, vector databases, hybrid semantic search, reranking strategies, and metadata filtering to ground LLM responses and reduce hallucinations in production use cases; a minimal sketch of this pattern follows this summary.
Developed LLM applications with LangChain, applying orchestration patterns such as prompt templates, tool calling, context routing, response post processing, and human in the loop workflows for enterprise ready AI systems.
Implemented prompt, model, and data lifecycle management including prompt versioning, model versioning, evaluation, A/B testing, rollback strategies, and controlled releases across dev environments.
Designed scalable data ingestion, transformation, and feature pipelines using Pandas, PySpark, Apache Spark, Kafka, and SQL, enforcing training inference parity and producing high quality ML ready datasets.
Built production ML and LLM inference services using FastAPI and Flask, defining OpenAPI contracts, implementing input validation, authentication, API versioning, and logging to enable secure, scalable, and reliable enterprise system integration.
Established MLOps and LLMOps practices using MLflow style experiment tracking, centralized model registries, Git based workflows, CI/CD pipelines, and automated testing to support reproducible and scalable ML and LLM deployments.
Deployed AI/ML workloads using Docker and Kubernetes (Amazon EKS, Azure AKS) with autoscaling, blue-green and canary deployments, and observability via CloudWatch, Azure Monitor, Prometheus, and Grafana. Followed strong software engineering principles including OOP, clean architecture, modular services, reusable components, and code reviews.
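
A minimal sketch of the RAG grounding pattern described above, assuming document embeddings are precomputed; the function names and prompt wording are illustrative, not the production implementation:

    import numpy as np

    def retrieve(query_vec, doc_vecs, docs, k=3):
        # Rank documents by cosine similarity to the query embedding.
        sims = doc_vecs @ query_vec / (
            np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
        )
        return [docs[i] for i in np.argsort(sims)[::-1][:k]]

    def grounded_prompt(question, passages):
        # Constrain the LLM to retrieved context to reduce hallucinations.
        context = "\n\n".join(passages)
        return ("Answer using only the context below. If the answer is not "
                f"in the context, say so.\n\nContext:\n{context}\n\n"
                f"Question: {question}")

In production this ranking step is delegated to a vector database, with reranking and metadata filtering layered on top.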

TECHNICAL SKILLS

AI/ML Tools: Python, NumPy, Pandas, scikit-learn, XGBoost, LightGBM, PyTorch, TensorFlow; feature engineering, model evaluation, cross-validation, hyperparameter tuning, A/B testing
Generative AI & LLMs: Azure OpenAI Service, AWS Bedrock, Large Language Models (LLMs), LangChain; prompt engineering, RAG pipelines, embeddings, vector search, reranking, hallucination reduction, human-in-the-loop workflows
Model Serving & APIs: FastAPI, Flask, REST APIs, gRPC, OpenAPI/Swagger; inference services, authentication, versioning, structured logging, batch vs. real-time serving
Data Engineering: SQL, Apache Spark, PySpark, Kafka, AWS Glue, Pandas; data ingestion, transformation pipelines, feature pipelines, training-inference parity, data quality validation
Cloud Platforms: AWS (SageMaker, Bedrock, S3, Redshift, DynamoDB, CloudWatch) | Azure (Azure Machine Learning, Azure OpenAI Service, AKS, Azure Monitor)
MLOps/LLMOps: MLflow-style experiment tracking, model registries, CI/CD (Jenkins, GitHub Actions), Docker, Kubernetes (EKS, AKS); model versioning, controlled releases, blue-green and canary deployments
Observability & Reliability: CloudWatch, Azure Monitor, Prometheus, Grafana, Evidently AI; performance monitoring, data drift, latency tracking, cost observability
Databases & Storage: PostgreSQL, SQL Server, Oracle, Snowflake, Amazon Redshift, DynamoDB, Amazon S3; analytical, transactional, and vector-backed retrieval workloads
Security & Governance: IAM/RBAC, secrets management, audit logging, data privacy controls, prompt and output filtering, compliance-aligned AI deployments
Software Engineering & Delivery: Object-oriented design, clean architecture, modular services, reusable components, code reviews, automated testing, Agile/Scrum

PROFESSIONAL EXPERIENCE

Client: Citibank, New York, NY    Sep 2024 – Present
Role: Senior AI/ML Engineer
Responsibilities:
Designed and delivered production-grade ML solutions for fraud detection and transaction risk scoring using Python, FastAPI, and AWS SageMaker, reducing manual review workload and improving decision outcomes by approximately 30% through automated model inference, faster prioritization, and intelligent case routing (a minimal serving sketch appears at the end of this section).
Implemented a secure, event driven AI platform using microservices, REST/gRPC APIs, Amazon Bedrock, and Kubernetes (EKS), enabling low latency real time inference and scalable model serving with IAM/RBAC based access control and reliable integration with core banking services.
Delivered AI initiatives in biweekly Agile/Scrum sprints, partnering with Product Owners and stakeholders to refine use case requirements, define acceptance criteria, and release audit-ready capabilities aligned with risk and control requirements.
Built data ingestion pipelines across REST APIs, AWS S3, SQL/NoSQL data stores, and Kafka streams to support batch and near real time machine learning workloads, including schema validation and automated retries.
Engineered scalable feature pipelines using Pandas, PySpark, AWS Glue, and SQL to transform raw inputs into model ready features, with built in quality validation and end to end traceability.
Implemented and maintained analytical, operational, and vector data stores across Redshift, Snowflake, DynamoDB, S3, Pinecone, and pgvector (PostgreSQL), enabling analytics workloads, model training, and RAG based AI use cases.
Designed and deployed vector search capabilities using Pinecone and pgvector enabling semantic search and embedding based retrieval to power RAG pipelines for internal knowledge systems supporting AI assisted analysis and decision workflows.
Evaluated machine learning models including XGBoost, LightGBM, and PyTorch-based neural networks, balancing accuracy, latency, and regulatory constraints across enterprise production workloads.
Assessed foundation and large language models such as OpenAI and Claude via AWS Bedrock, selecting approaches aligned with performance requirements, governance controls, and compliance-driven enterprise use cases.
Implemented advanced AI techniques including RAG pipelines, fine tuning with LoRA/PEFT, and real time inference workflows to enhance model relevance, mitigate hallucinations, and improve response accuracy for banking use cases.
Optimized ML and LLM performance through hyperparameter tuning, prompt engineering, and inference optimization, applying batch versus real time serving trade offs to increase throughput and cost efficiency under production workloads.
Controlled and reduced cloud resource usage and inference costs by right sizing compute, tuning batch sizes, and aligning model serving strategies with workload patterns, ensuring scalable and cost efficient AI operations in production.
Built standardized, reusable AI components using PyTorch, Scikit-learn, LangChain, and Hugging Face Transformers to accelerate development velocity and promote consistent engineering patterns across ML initiatives.
Leveraged Spark Streaming and modular AI frameworks to enable scalable, real-time processing and ensure reliable, production-ready implementations across multiple enterprise machine learning and GenAI programs.
Applied solid software engineering practices, including object-oriented design, modular Python architectures, reusable libraries, and standardized API contracts to reduce technical debt and improve maintainability across production AI systems.
Designed and executed structured experimentation and validation, including A/B testing and offline metrics, to verify model performance against business KPIs and risk thresholds before production release.
Containerized AI and ML services with Docker and Amazon ECR to standardize build artifacts and ensure consistent deployments across development, staging, and production.
Deployed and operated machine learning services on Kubernetes (EKS), applying autoscaling, rolling updates, and blue green deployment strategies to maintain high availability and minimize downtime during production releases.
Built and maintained CI/CD pipelines using Jenkins and GitHub Actions to automate model packaging, validation, security scanning, and deployment across development, staging, and production environments, improving reliability and consistency.
Provisioned and managed cloud resources using Infrastructure as Code (Terraform and CloudFormation), ensuring secure, repeatable, and compliant environments aligned with Citi's cloud governance standards.
Implemented end to end monitoring and observability for production AI systems using CloudWatch, Prometheus, Grafana, and Evidently AI, enabling proactive tracking of model performance, data drift, inference latency, and system health.
Implemented security, privacy, and responsible AI controls including data access restrictions, PII handling patterns, model input/output validation, and audit logging to ensure AI systems complied with enterprise risk, security, and regulatory policies.
Produced API documentation and knowledge artifacts (Swagger/OpenAPI, Confluence) enabling effective onboarding, audit readiness, and maintainable AI platform operations.
Supported production operations through incident triage, root cause analysis, and post release performance reviews, applying feedback and error analysis to continuously improve model reliability, accuracy, and operational resilience.
Environment: Python, SQL, Pandas, NumPy, PySpark, Scikit learn, XGBoost, LightGBM, PyTorch, Hugging Face Transformers, LangChain, Prompt Engineering, Retrieval Augmented Generation (RAG), LoRA, PEFT, REST APIs, gRPC, FastAPI, Docker, Kubernetes (EKS), AWS SageMaker, AWS Bedrock, AWS S3, AWS Redshift, Snowflake, DynamoDB, PostgreSQL (pgvector), Pinecone, Apache Kafka, Spark Streaming, Jenkins, GitHub Actions, Terraform, AWS CloudFormation, CloudWatch, Prometheus, Grafana, Evidently AI, Git, Confluence, Agile/Scrum.
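
As an illustration of the FastAPI serving pattern described in this section, a minimal versioned scoring endpoint with typed input validation; the schema, route, and scoring logic are hypothetical stand-ins rather than the actual Citi service:

    from fastapi import FastAPI
    from pydantic import BaseModel, Field

    app = FastAPI(title="fraud-scoring", version="1.0.0")

    class Transaction(BaseModel):
        # Pydantic validates types and ranges before the model is invoked.
        amount: float = Field(gt=0)
        merchant_id: str
        country: str

    @app.post("/v1/score")
    def score(txn: Transaction) -> dict:
        # Stand-in for the real model call (e.g., a loaded XGBoost booster).
        risk = min(txn.amount / 10_000.0, 1.0)
        return {"risk_score": risk, "model_version": "1.0.0"}

The /v1 prefix in the route is what supports the controlled, backward-compatible API releases mentioned above.
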
Client: CVS Health, Woonsocket, Rhode Island    Oct 2022 – Aug 2024
Role: AI/ML Engineer
Responsibilities:
Designed and delivered production grade ML solutions for care analytics and member risk stratification using Python, Azure Machine Learning, and REST APIs, enabling clinical and business teams to act on automated risk scores and operational KPIs with faster, more consistent decision making.
Implemented scalable, microservices based AI solutions on Azure, integrating data pipelines, model training, and inference workflows to support batch and near real time healthcare analytics use cases.
Delivered ML initiatives in Agile/Scrum two week sprints, partnering with Product Owners and stakeholders to refine use case requirements, define acceptance criteria, prioritize high value work, and release production ready models aligned with healthcare delivery constraints.
Built data ingestion pipelines sourcing data from REST APIs, Azure Data Lake Storage (ADLS), relational databases, and event streams, enabling consistent and secure data flow for machine learning workloads.
Developed data preprocessing, cleansing, and feature engineering workflows using Pandas, PySpark, and Azure Data Factory, ensuring data quality, reproducibility, and readiness for model training.
Implemented data validation and quality checks across ingestion and transformation pipelines to detect missing data, anomalies, and schema drift early, improving trust in downstream ML outputs.
Managed structured and semi structured datasets across Azure Data Lake, Azure SQL Database, and Azure Synapse Analytics, supporting analytics, model training, and reporting needs at enterprise scale.
Designed and trained predictive machine learning models using Scikit learn, XGBoost, LightGBM, and PyTorch to support healthcare risk prediction, utilization forecasting, and operational optimization.
Applied supervised learning, feature selection, and ensemble methods while aligning model behavior with healthcare domain constraints, privacy, and regulatory requirements, emphasizing stable performance across diverse member groups.
Optimized model performance through hyperparameter tuning, feature refinement, and cross-validation, improving accuracy, stability, and generalization across diverse member populations (see the tuning sketch at the end of this section).
Implemented model governance practices including documentation of assumptions, feature lineage, evaluation results, and thresholds to support healthcare compliance, audit reviews, and responsible model usage.
Leveraged Azure Machine Learning for experiment tracking, model versioning, and reproducible training pipelines, enabling controlled promotion of models from development to production.
Applied strong software engineering practices, including modular Python development, object oriented design, reusable libraries, and API first implementations to maintain clean, maintainable ML codebases.
Containerized machine learning services using Docker, published images to Azure Container Registry (ACR), and deployed inference endpoints via Azure Machine Learning managed endpoints for scalable production serving.
Implemented CI/CD workflows using Azure DevOps pipelines to automate model training, validation, and deployment across development, QA, and production environments.
Supported production ML operations through incident triage, root cause analysis, and post deployment reviews, collaborating with platform teams to maintain reliable and stable ML services.
Managed Azure infrastructure using Infrastructure as Code (ARM templates, Terraform), ensuring secure, repeatable, and compliant cloud resource management.
Implemented monitoring and reliability controls using Azure Monitor and Application Insights to track service health, latency, and operational signals for production ML endpoints.
Supported unit and integration testing with PyTest and maintained documentation via internal knowledge base and Swagger/OpenAPI to improve maintainability and audit readiness.
Environment: Python, Scikit-learn, XGBoost, LightGBM, PyTorch, Pandas, PySpark, REST APIs, Azure Machine Learning, Azure Data Factory, Azure Data Lake Storage (ADLS), Azure Synapse Analytics, Azure SQL Database, Docker, Azure Container Registry (ACR), Azure DevOps, ARM Templates, Terraform, Azure Monitor, Application Insights, Git, Agile/Scrum.
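
A compact illustration of the cross-validated hyperparameter tuning described in this section, run on synthetic stand-in data since member features cannot be shown; the grid values are illustrative:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    # Synthetic stand-in for member-level features and risk labels.
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    search = GridSearchCV(
        RandomForestClassifier(random_state=0),
        param_grid={"n_estimators": [100, 300], "max_depth": [4, 8]},
        cv=5,
        scoring="roc_auc",  # threshold-free metric suited to risk stratification
    )
    search.fit(X, y)
    print(search.best_params_, round(search.best_score_, 3))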

Client: State of Connecticut, Hartford, CT    Feb 2020 – Sep 2022
Role: Data Scientist/ML Engineer
Responsibilities:
Designed and delivered data science and ML solutions for financial risk scoring and operational analytics using Python, SQL, and Amazon Web Services, improving decision support and reporting accuracy across state programs.
Built end to end analytical workflows spanning ingestion, feature engineering, model training, and reporting layers, enabling repeatable delivery of predictive insights for program stakeholders.
Collaborated closely with data engineering, analytics, and operations teams to validate data sources, define analytical requirements, and ensure consistent business logic across dashboards, reports, and ML outputs.
Conducted analytical feasibility assessments and requirement analysis to identify suitable data sources, modeling approaches, and delivery timelines, helping optimize effort, cost, and expected program outcomes.
Developed ETL/ELT pipelines integrating AWS S3 and RDS with Snowflake, optimizing SQL extraction and incremental loads to meet analytics and audit reporting requirements.
Engineered scalable data transformation and feature preparation workflows using Pandas, NumPy, and SQL to produce model ready datasets from structured and semi structured sources.
Implemented data validation and quality checks across ingestion and transformation layers to identify data gaps, inconsistencies, and anomalies early, improving reliability of analytics and downstream ML models.
Designed dimensional data models using star and snowflake schemas, applying normalization/denormalization patterns to support OLTP/OLAP workloads across data marts and reporting layers.
Performed exploratory, univariate, and multivariate analysis to identify trends, correlations, and anomalies, strengthening feature selection and improving model explainability for risk analytics.
Built and tuned predictive models using Scikit learn, XGBoost, Random Forest, and SVM for classification and regression use cases, supporting risk assessment and operational forecasting initiatives.
Applied preprocessing techniques (imputation, outlier treatment, encoding, scaling) to improve model stability, generalization, and performance consistency across diverse datasets.
Implemented offline evaluation using cross validation and classification metrics (precision/recall, ROC AUC) to select models aligned with program KPIs and false positive reduction goals.
Developed NLP pipelines using NLTK and Scikit-learn for text classification and sentiment analysis on unstructured inputs, improving signal quality and reducing manual review effort (a minimal pipeline sketch appears at the end of this section).
Conducted time series forecasting and trend analysis to support planning and operational optimization, translating outputs into stakeholder friendly insights and dashboards.
Packaged and deployed trained models using Docker and AWS SageMaker for scalable hosting, enabling consistent runtime environments across development and production.
Developed REST based Python services to expose ML predictions and analytical outputs for integration with web based applications and reporting workflows.
Implemented production monitoring using CloudWatch logs/metrics and alerting patterns to track latency, failures, and model service health post deployment.
Supported audit readiness by documenting data lineage, model assumptions, evaluation results, and deployment configurations, ensuring reproducibility and compliance with state reporting standards.
Ensured consistent definitions, metrics, and transformations across analytical outputs by enforcing data standards and reconciliation checks, improving trust in reports and ML driven insights across stakeholders.
Supported stakeholder reviews and knowledge transfer by presenting analytical findings, model outcomes, and limitations to technical and non-technical audiences.
Created ad hoc and scheduled analytics using SQL and Microsoft Power BI, validating data quality and documenting assumptions, transformations, and model inputs for long term maintainability.
Environment: Python, SQL, Pandas, NumPy, Scikit learn, XGBoost, Random Forest, SVM, NLTK, Time Series Forecasting, EDA, Feature Engineering, ETL/ELT, Star/Snowflake Dimensional Modeling, REST APIs, AWS (S3, RDS, SageMaker, CloudWatch), Snowflake, Docker, Power BI, Git, Agile/Scrum.
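
A minimal version of the scikit-learn text-classification pipeline referenced in this section; the toy inputs and labels are illustrative only:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    texts = ["refund not processed", "very helpful staff", "payment failed again"]
    labels = [0, 1, 0]  # 0 = complaint, 1 = positive

    clf = Pipeline([
        ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),
        ("model", LogisticRegression(max_iter=1000)),
    ])
    clf.fit(texts, labels)
    print(clf.predict(["staff were very helpful"]))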

Client: Publix Super Markets, Lakeland, Florida    May 2017 – Dec 2019
Role: Data Scientist
Responsibilities:
Developed demand forecasting models to predict store level sales and inventory requirements, enabling merchandising teams to plan replenishment strategies and reduce stockouts across multiple product categories.
Built recommendation models using customer purchase history and basket affinity analysis to personalize promotions, improve product discoverability, and increase engagement across in store and digital retail channels.
Created churn prediction models to identify customers at risk of disengagement, supporting targeted retention initiatives and data driven loyalty campaigns aimed at improving customer lifetime value.
Designed and implemented NLP models for product categorization and customer review sentiment analysis, helping business teams understand feedback trends and optimize assortment planning decisions.
Implemented fraud detection models to identify suspicious retail transactions, reducing manual fraud reviews while improving transaction level monitoring and customer account protection.
Developed pricing analytics and elasticity models to evaluate price sensitivity across products, supporting data driven pricing strategies, promotional planning, and margin optimization.
Automated feature engineering and data preprocessing pipelines using Python and Apache Spark, enabling repeatable, scalable model training and reducing manual data preparation effort.
Standardized model training, validation, and experimentation workflows in cloud environments to ensure consistent evaluation, reproducibility, and comparability of predictive models.
Conducted A/B testing and statistical analysis on promotions and recommendation strategies, measuring impact on conversion rates, customer engagement, and sales performance (a worked example follows at the end of this section).
Built interactive business dashboards and KPI reports using Tableau and Power BI to communicate predictive insights, sales trends, and customer behavior to non-technical stakeholders.
Collaborated with data engineering, UX/UI, and business teams to integrate predictive insights into web and mobile retail applications.
Performed exploratory data analysis (EDA), data validation, and anomaly detection on large retail datasets to improve feature quality, model reliability, and trust in analytical outputs.
Documented model assumptions, evaluation metrics, and data transformations, supporting knowledge transfer, maintainability, and consistent reuse of analytical solutions across teams.
Environment: Python, R, SQL, Pandas, NumPy, scikit learn, XGBoost, TensorFlow, Apache Spark, Hadoop, Kafka, SQL Server, Oracle, Jupyter Notebooks, Tableau, Power BI, Git, Agile/Scrum.
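
A small worked example of the promotion A/B analysis mentioned in this section, using a chi-square test of independence on toy conversion counts:

    from scipy.stats import chi2_contingency

    # Rows: control vs. promotion variant; columns: converted vs. not.
    table = [[120, 880],
             [150, 850]]
    chi2, p, dof, _ = chi2_contingency(table)
    print(f"chi2={chi2:.2f}, p={p:.4f}")  # small p suggests the lift is not chance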

Client: T-Mobile, Herndon, VA    Aug 2016 – May 2017
Role: Data Analyst Process Associate
Responsibilities:
Developed customer churn analysis models using Logistic Regression, Random Forest, and XGBoost to identify at risk subscribers and support targeted retention strategies.
Supported network anomaly detection analysis using Isolation Forest and autoencoder-based techniques to surface unusual traffic patterns and help reduce service disruption incidents (see the sketch at the end of this section).
Performed sentiment analysis on customer support interactions using NLTK and spaCy to identify recurring service issues and improve customer experience insights.
Assisted in building batch and near real time data pipelines using Apache Kafka and Apache Spark to support telecom operational analytics and reporting.
Automated customer segmentation analysis based on usage behavior and engagement patterns, enabling more personalized marketing and retention campaigns.
Collaborated with data engineering teams on ETL workflows and feature preparation, ensuring clean and analysis ready datasets for reporting and ML use cases.
Conducted A/B testing and statistical analysis on retention offers to evaluate the effectiveness of personalized plans and promotional strategies.
Produced SQL based reports and ad hoc analysis to support regulatory reviews and customer satisfaction KPI tracking.
Environment: Python, R, SQL, Pandas, scikit-learn, XGBoost, TensorFlow (basic), Autoencoders, NLTK, spaCy, Apache Kafka, Apache Spark, Hadoop, Tableau, Power BI, SQL Server.
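
An illustrative Isolation Forest sketch of the anomaly-detection approach referenced in this section, on synthetic traffic features:

    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)
    normal = rng.normal(100, 10, size=(500, 2))   # typical traffic features
    spikes = rng.normal(300, 20, size=(5, 2))     # injected anomalies
    X = np.vstack([normal, spikes])

    model = IsolationForest(contamination=0.01, random_state=0).fit(X)
    flags = model.predict(X)                      # -1 marks anomalous points
    print(int((flags == -1).sum()), "points flagged")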

Client: American Express, India    Mar 2014 – May 2016
Role: Data Analyst Process Associate
Responsibilities:
Created SQL based credit risk and portfolio reports using PostgreSQL, Cassandra, and SQL Server to support risk assessment and financial decision making teams.
Built and maintained ETL workflows using Alteryx and Python to clean, validate, and prepare large financial datasets, reducing manual data preparation effort and reporting delays.
Developed interactive Power BI dashboards to provide visibility into credit portfolios, risk metrics, and operational KPIs for finance and risk stakeholders.
Applied feature engineering and SQL performance tuning to improve the speed and reliability of large scale financial data queries and analytical reports.
Performed exploratory data analysis (EDA) using pandas and NumPy to identify trends, anomalies, and data quality issues in transactional and portfolio datasets.
Partnered with finance and risk teams to define data models, reporting requirements, and KPIs aligned with business objectives and regulatory expectations.
Produced ad hoc and recurring reports using SQL and Excel to support business teams, automating routine reporting and improving turnaround time.
Collaborated with BI developers to standardize dashboard design and reporting definitions, increasing usability and adoption of analytics across teams.
Conducted data validation, integrity checks, and root cause analysis to ensure accuracy and compliance with internal audit and governance standards (a minimal example follows at the end of this section).
Environment: Python, R, SQL, SQL Server, PostgreSQL, Cassandra, Alteryx, Power BI, Tableau, Excel, Git.
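
A minimal pandas version of the integrity checks described in this section; the columns and rules are illustrative:

    import pandas as pd

    df = pd.DataFrame({
        "account_id": [1, 2, 2, 3],
        "balance": [1200.0, None, 8000.0, -50.0],
    })

    # Checks of the kind run before reports go out to risk stakeholders.
    issues = {
        "duplicate_accounts": int(df["account_id"].duplicated().sum()),
        "missing_balance": int(df["balance"].isna().sum()),
        "negative_balance": int((df["balance"] < 0).sum()),
    }
    print(issues)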

Education:
Computer Science, Ramaiah University of Applied Sciences