Jaya Krishna - Senior Data Engineer
[email protected]
Location: Dallas, Texas, USA
Relocation: Yes
Visa: Green Card

Phone: +1 940 295 5544
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/jaya-krishna-223002139/

PROFESSIONAL SUMMARY:

Microsoft Certified Fabric Data Engineer Associate with 12+ years of experience designing, developing, and deploying large-scale data solutions, data warehouses, and analytics platforms across Azure and AWS.
Strong expertise in Azure services including Data Factory (ADF), Synapse Analytics, Data Lake Gen2, Databricks, Cosmos DB, Event Hubs, Functions, Logic Apps, Azure SQL, and Azure Analysis Services.
Experienced in Databricks (PySpark, Spark SQL, Delta Lake, MLflow) for ETL, real-time streaming, and ML model integration.
Proficient in Big Data ecosystem: Hadoop, HDFS, MapReduce, Spark (Core, SQL, Streaming, MLlib), Hive, Sqoop, Kafka, Zookeeper, and HBase for distributed processing and advanced analytics.
Skilled in data modeling (Star Schema, Snowflake Schema, Slowly Changing Dimensions) and building data warehouses/EDWs using SQL Server, Oracle, Teradata, and Snowflake.
Proficient in Python (Pandas, NumPy, SciPy, Matplotlib, PySpark) and Shell scripting for automation, data wrangling, and pipeline integration.
Expertise in CI/CD pipelines using Azure DevOps, Jenkins, GitHub Actions, and Bitbucket; automated ETL deployments with Groovy, Maven, SonarQube, and Docker/Kubernetes.

TECHNICAL SKILLS:

Programming & Scripting: Python, SQL, PySpark, Scala, Java, T-SQL, PL/SQL
Azure Cloud Services: Azure Data Factory, Azure Databricks, Azure Function Apps, Azure DevOps, Azure Synapse Analytics, Azure Data Lake Gen2, Cosmos DB, Azure AI Foundry, Azure Purview, Microsoft Fabric, Azure Kubernetes Service (AKS), Azure SQL Server
AWS Cloud Services: EC2, S3, Lambda, Route 53, Elastic Beanstalk, VPC, IAM, Elastic Container Service (ECS), DynamoDB, Auto Scaling, Security Groups, Redshift, CloudWatch
Big Data Technologies: HDFS, MapReduce, Spark, Hive, HBase, YARN, Apache Airflow, Apache NiFi, Kafka, Spark Streaming, Oozie, Sqoop, Zookeeper, Pig, Cribl, Vector, Flume
Databricks: Delta Lake, Delta Live Tables, Pipelines, Unity Catalog, MLflow, Genie, Databricks SQL, Agent Bricks, Lakehouse Architecture
Databases: MySQL, Oracle, MS-SQL Server, Teradata, HBase, Snowflake, Cassandra, Cosmos DB, DynamoDB, MongoDB, Azure SQL DB
Data Modelling: Fact & Dimension Modelling, Star Schema, Snowflake Schema, Medallion Architecture, SCD (Slowly Changing Dimension), Partitioning & Bucketing Strategies
Data Warehousing & ETL: Informatica PowerCenter, Informatica Workflow Manager, dbt (data build tool), Alteryx, SSIS, SSRS, Erwin Data Modeler, Oracle Data Warehouse
Python Libraries: NumPy, Pandas, SciPy, Matplotlib, Seaborn, Scikit-learn, TensorFlow, PyTorch, NLTK, PyTest
Version Control & CI/CD: Git, GitHub, Bitbucket, Jenkins
Machine Learning: Regression, Classification, Clustering (K-Means, Hierarchical), Decision Trees, Random Forest, SVM, XGBoost, Time Series Forecasting, LSTM for Time Series
Generative AI: LLMs, LangChain, LangSmith, Retrieval-Augmented Generation (RAG), Vector Databases (FAISS, Chroma, Pinecone), Embedding Models, Prompt Engineering
Hadoop Frameworks: Cloudera CDH, Hortonworks HDP
File Formats: ORC, Parquet, JSON, CSV, Avro, TXT, XML, Excel
Visualization Tools: Power BI, Tableau
PROFESSIONAL EXPERIENCE:

Toyota, TX Oct 2022 - Present
Sr. Data Engineer
Built and optimized end-to-end ETL/ELT pipelines using Azure Data Factory, Databricks (PySpark/Scala), and Snowflake to process large-scale transactional and semi-structured data (JSON, Avro, Parquet, ORC), enabling analytics for finance and operations.
Designed and implemented Snowflake data warehouses, creating schemas, tables, secure views, and materialized views to support BI dashboards and enterprise reporting.
Developed complex SQL queries, stored procedures, and UDFs in Snowflake and Azure SQL to support business use cases including trend analysis, forecasting, and KPI reporting.
Built data lakehouse architecture by combining Snowflake with cloud object stores for Medallion (bronze/silver/gold) layering.
Built and optimized Delta Lake pipelines using Apache Spark and Databricks on Azure, processing petabytes of structured and semi-structured data with improved performance and scalability over traditional data lake models.
Enhanced Spark and Databricks jobs through caching, partitioning, and configuration tuning, reducing runtime by 35% and improving scalability for large-volume datasets.
Designed GenAI-enabled data workflows leveraging Databricks MLflow, vector embeddings, and RAG pipelines for metadata enrichment and intelligent data discovery.
Built incremental and CDC-based data ingestion pipelines using ADF with watermarking, Change Tracking, and event-driven triggers, significantly improving pipeline efficiency and reducing load times.
Implemented Azure Databricks Unity Catalog for centralized governance, enabling fine-grained access control, lineage tracking, and compliance across multiple workspaces and data domains.
Managed Delta Lake storage on Azure by leveraging Azure Data Lake Storage (ADLS Gen2), ensuring secure, scalable, and cost-effective data storage solutions for high-volume data ingestion.
Developed mappings and transformations using the Mapping Designer, Transformation Developer, and Mapplet Designer in Informatica PowerCenter.
Implemented Delta Lake versioning and Time Travel features to enable seamless rollback and data auditing capabilities in Databricks, enhancing disaster recovery and providing data lineage for compliance purposes.
Implemented data governance and compliance for Salesforce data integration, maintaining metadata and lineage tracking in Azure Purview to ensure secure, accurate, and compliant data pipelines.
Integrated structured and unstructured data (tables, logs, documents, APIs) into unified GenAI pipelines using Databricks Lakehouse architecture.
Integrated MLflow with Databricks to version and track machine learning models, ensuring reproducibility of experiments and enabling seamless model promotion across different environments.
Ensured data security by encrypting PII in ADLS with Key Vault managed keys and enforcing authentication/authorization via Azure AD/Entra ID.
Designed, developed, and maintained complex semantic models (star schemas, dimensions, fact tables) for Power BI and Microsoft Fabric-based reporting, enabling actionable insights for business stakeholders.
Modeled Snowflake data warehouse structures using dimensional modeling (Star/Snowflake schemas), implementing Slowly Changing Dimensions (SCDs) and Change Data Capture (CDC) for historical tracking.
Designed and implemented a Microsoft Fabric proof of concept (POC) to evaluate unified analytics capabilities using OneLake, Lakehouse, and Fabric Data Factory, reducing data duplication and simplifying data access across domains.
Built Fabric Lakehouse architecture on OneLake, enabling seamless ingestion, transformation, and querying of structured and semi-structured data using Spark notebooks and SQL endpoints.
Orchestrated CI/CD workflows for Fabric pipelines and AI deployments using Azure DevOps, ensuring automated testing, RBAC-based access control, and zero-trust security compliance.
Collaborated with product owners, architects, and business analysts in Agile Scrum ceremonies, using JIRA to track sprints, backlog items, and releases.
Environment: Azure Databricks, Apache Spark, ADLS Gen2, Azure Data Factory, Cosmos DB, Azure SQL DB, Microsoft Fabric, PostgreSQL, MySQL, Azure Key Vault, Azure Purview, Logic Apps, Airflow, Informatica, Kubernetes (AKS), Snowflake, Spark SQL, SSIS, Git, JIRA, Power BI.

Micron Technology, ID Apr 2020 - Sept 2022
Data Engineer
Built Databricks notebooks to extract and process data from DB2, Teradata, and SQL Server into Azure SQL DB and Synapse, applying cleansing, wrangling, and ETL transformations.
Designed and developed scalable pipelines in Azure Data Factory (ADF v2) and Databricks to orchestrate ingestion, transformations, and loading into Synapse and Snowflake, with AWS Glue/Lambda used where S3 was the staging layer.
Migrated on-prem Hadoop and Oracle ETL workloads to Azure HDInsight and Synapse, while also integrating with AWS EMR for select analytics use cases.
Implemented PySpark-based solutions for data quality (deduplication, null handling, schema validation), ensuring high-quality data pipelines.
Built data lakes on Azure Data Lake Gen2 and AWS S3, applying partitioning, lifecycle policies, and multi-region strategies to optimize storage performance and costs.
Automated ingestion into S3 and ADLS using AWS Glue, Lambda, and Python scripts, integrating with Athena, Redshift, and EMR for querying and analytics.
Automated multi-cloud ingestion pipelines (AWS S3, ADLS Gen2, GCP Storage) into a Unity Catalog-enabled Lakehouse, leveraging PySpark for scalable transformations.
Built real-time streaming pipelines in Databricks using Kafka + DLT, enabling near real-time insights and anomaly detection for operational dashboards.
Designed Lambda-style architectures combining batch (Synapse, Snowflake, S3) and streaming (Kafka, Event Hubs, Spark) for unified analytics.
Implemented CI/CD pipelines in Azure DevOps, Jenkins, and GitHub Actions for automated deployment of ETL pipelines and ML models across hybrid environments.
Migrated legacy Oracle ETL jobs into Azure Synapse and Databricks, applying partitioning strategies for performance and scalability.
Designed and deployed Kafka Connect pipelines for ingesting high-volume streaming data from relational and NoSQL sources into data lakes and warehouses, enabling near real-time analytics.
Designed serverless stream processing pipelines by publishing telemetry data into Kafka clusters and Spark jobs via Azure Event Hubs.
Enhanced existing Python APIs and modules for data validation and automated loading into HBase, Cosmos DB, and SQL-based warehouses.
Managed database integration across heterogeneous platforms (MySQL, Oracle) ensuring seamless end-to-end pipelines.
Documented high-level and low-level designs, created interface specifications, and delivered workflow guides for cross-team knowledge sharing.
Actively participated in Agile sprints with product owners and architects, mentoring junior engineers, troubleshooting issues, and aligning delivery with business requirements.
Environment: Azure Data Factory, Azure Databricks, Azure Synapse, ADLS Gen2, Azure DevOps, SQL Server, Oracle, Power BI, Apache Kafka, Spark Streaming, Python, PySpark, Git, Airflow, AWS S3, Glue, Lambda, Athena, Redshift, EMR

Cardinal Health, OH Apr 2018 - Mar 2020
Data Engineer
Designed and deployed Azure Analysis Services tabular models to power BI dashboards and enterprise reporting.
Built data pipelines in Azure Data Factory (ADF) and PySpark to ingest, cleanse, and transform data from Teradata, SQL Server, Oracle, and DB2 into Azure SQL DW/Synapse.
Developed a scalable Spark-based ETL framework leveraging Spark Data Sources, DataFrames, and Hive objects to migrate datasets from RDBMS systems into Azure Data Lake.
Optimized Spark jobs by converting RDDs to DataFrames, caching, and partitioning, reducing batch runtime and improving streaming throughput.
Used Synapse Spark pools for big data transformations, blending batch and streaming data in a unified platform.
Deployed Python and Scala Spark SQL scripts to perform complex aggregations and enrichments across high-volume transactional datasets.
Integrated Informatica PowerExchange with IICS to process condensed files and load into Azure SQL DW.
Leveraged IICS event-driven triggers and dynamic mappings to automate data flows, reducing manual overhead and improving SLA adherence.
Used Informatica DQ features to enforce quality rules, lineage, and metadata management, ensuring clean and governed data pipelines.
Designed and executed SQL/T-SQL stored procedures, triggers, and UDFs to optimize query performance in Azure SQL and Synapse environments.
Utilized NoSQL stores (MongoDB, Cassandra) for high-velocity transactions and semi/unstructured data, enabling flexible schemas for operational use cases.
Automated database imports/exports with SSIS/DTS to integrate legacy systems into the cloud pipeline.
Delivered BI dashboards in Tableau, Power BI, and SSRS, providing business stakeholders with real-time KPI monitoring and predictive trend analysis.
Implemented CI/CD practices with Terraform + Git, enabling repeatable infrastructure deployments and configuration automation.
Environment: Azure Analysis Services, Azure Data Factory, Synapse Spark Pools, Azure SQL DW, Informatica IICS, Informatica PowerExchange, Python, Scala, MongoDB, Cassandra, SQL Server, Oracle, SSIS, Power BI

Capital One, NY Mar 2016 - Mar 2018
Big Data Engineer
Monitored and managed Hadoop clusters on Hortonworks (HDP), ensuring stability, scalability, and optimal performance.
Designed and developed Spark applications using RDDs, DataFrames, Datasets, and Spark SQL to process data from RDBMS and streaming sources.
Migrated iterative MapReduce jobs into Spark transformations, improving performance and reducing runtimes.
Integrated RDBMS data (Oracle, MySQL, SQL Server) into Hadoop using Sqoop, exporting results back for downstream consumption.
Developed PySpark ETL pipelines and used Python scripting for database updates, cleansing, and data manipulation.
Leveraged advanced Python libraries (NumPy, SciPy) for numerical computations and scientific analysis inside data workflows.
Optimized HDFS storage with compression techniques and partitioning to maximize efficiency.
Designed data storage solutions using Hive, HBase, and Elasticsearch for analytics and fast search capabilities.
Automated ingestion and backups with AWS CLI + Lambda, archiving data from Hadoop into S3, EBS, and AMIs.
Managed AWS infrastructure with EC2, ELB, CloudWatch, VPC, IAM, and CloudFront for scalable, secure deployments.
Provisioned high-availability clusters on EC2 with auto-scaling to support Hadoop/Spark workloads.
Implemented role-based security and governance using AWS IAM policies with MFA and least privilege access.
Collaborated with cross-functional teams in Agile sprints, conducting stand-ups, sprint reviews, and knowledge-sharing sessions with offshore teams.
Environment: Hadoop (HDP), Spark (RDD, DataFrames, SQL), Hive, Sqoop, HBase, Kafka, Storm, Elasticsearch, Python, MySQL, SQL Server, Oracle, AWS EC2, S3, Lambda, ELB, CloudWatch, IAM, VPC, Jenkins, Agile.

Kroger, OH Aug 2014 - Feb 2016
Big Data Engineer
Developed Apache Spark applications in Scala and Python to process large-scale data from RDBMS and streaming sources, leveraging RDDs, DataFrames, and Spark SQL for efficient transformations.
Built data ingestion pipelines using Sqoop for RDBMS integration and Flume for log/behavioral data, ensuring scalable ingestion into Hadoop.
Designed Hive tables (external & managed) and automated ingestion/repairs using shell scripts, enabling faster query performance for analytics teams.
Optimized MapReduce jobs with techniques like partitioning, combiners, distributed cache, and bucketing strategies, reducing query execution time.
Designed and developed dimensional data models using Erwin, structuring claims and transaction data for efficient reporting in Hadoop.
Utilized Oozie workflows and Control-M scheduling to automate ETL processes, reducing manual intervention and increasing pipeline reliability.
Leveraged Zookeeper for distributed synchronization, configuration management, and metadata handling across Hadoop ecosystem components.
Built and managed ETL pipelines capable of processing ~450 GB/day, optimizing Spark jobs with in-memory caching and broadcast joins.
Migrated data between UNIX file systems and HDFS, creating reusable scripts for bulk loading, validation, and monitoring of data pipelines.
Utilized Git for version control (migrating from SVN), managing code repositories and enabling collaborative development in Agile sprints.
Environment: Hadoop, HDFS, Spark (RDD, DataFrame), Hive, Pig, Sqoop, Flume, Kafka, Oozie, Zookeeper, MapReduce, HBase, Cassandra, Python, Scala, Shell scripting, Erwin Data Modeler, SSIS, Tableau, Control-M, Git.

FIN Infocom Pvt Ltd, India Jun 2013 - Jul 2014
Data Warehouse Engineer
Performed data analysis of source and target systems with a strong understanding of data warehousing concepts including staging, dimensions, facts, star schema, and snowflake schema.
Built diverse OLAP Cubes and Dimensions in SSAS, writing MDX scripts to enhance cube functionality for advanced BI analytics.
Developed and deployed SSIS packages for ETL automation, scheduling jobs, and optimizing performance for large-volume data loads.
Automated report generation and cube refresh processes using SSIS jobs and SSRS reports, ensuring timely and accurate delivery of KPIs to business users.
Worked on ETL processes using PL/SQL procedures, triggers, and functions for data transformation and validation.
Utilized Informatica Workflow Manager to configure and manage complex ETL workflows across multiple environments.
Loaded and transformed data from diverse sources including flat files, XML, Oracle, and DB2 into staging and data warehouse layers.
Developed slowly changing dimensions (SCD Type 1/2) in Informatica to maintain accurate historical data.
Engaged in production deployments, debugging ETL failures, and providing L2 support for data integration issues.
Environment: Informatica PowerCenter, Oracle 11g, SQL Server 2000/2008, SSIS, SSRS, SSAS, DB2, PL/SQL, XML, Flat Files, Tableau, Visual SourceSafe, Agile Scrum.

EDUCATION:

Bachelor's in Computer Science, GITAM University May 2013

CERTIFICATIONS:

Databricks Certified Data Engineer Associate
Microsoft Certified: Fabric Data Engineer Associate