| Mokesh Balakrishnan - Data Engineer |
| [email protected] |
| Location: Boston, Massachusetts, USA |
| Relocation: Yes |
| Visa: GC |
| Resume file: Mokesh Data Engineer_ Resume (1) (1)_1772057499732.docx |
Mokesh Balakrishnan
Sr. Data Engineer
Phone: 5088174241 | Email: [email protected] | LinkedIn: www.linkedin.com/in/mokesh-b-11b46992

Professional Summary
- Senior Data Engineer with 10+ years of experience designing and implementing enterprise data lakes, modern data warehouses, and large-scale ETL/ELT pipelines across the Banking, Financial Services, Insurance, and Healthcare domains.
- Extensive expertise in building modern data lakes and cloud-native data warehousing solutions using Microsoft Azure technologies.
- Strong hands-on experience with Azure Data Factory, Azure Databricks, Azure Synapse Analytics, and Azure Data Lake Storage Gen2.
- Proficient in Snowflake data modeling, secure data sharing, and performance optimization for large-scale analytics workloads.
- Advanced knowledge of Apache Spark, PySpark, Spark SQL, and distributed data processing frameworks.
- Experienced in developing both batch and real-time streaming data pipelines using Kafka and Spark Streaming.
- Proven ability to migrate legacy on-premises ETL systems (Oracle, SQL Server, SSIS) to scalable Azure cloud architectures.
- Skilled in dimensional modeling techniques, including Star Schema, Slowly Changing Dimensions (Types 1 and 2), and Change Data Capture (CDC).
- Designed optimized schemas and transformation logic to support regulatory reporting, risk analysis, and business intelligence dashboards.
- Strong performance tuning expertise, including partitioning strategies, caching, indexing, broadcast joins, and Spark cluster optimization.
- Built reusable ETL/ELT frameworks to standardize enterprise-wide data ingestion and transformation processes.
- Hands-on experience integrating structured and semi-structured data from APIs, databases, flat files, and streaming platforms.
- Added robust data quality validations and governance controls to ensure accuracy, compliance, and secure data handling.
- Provided production support for mission-critical pipelines, performing root cause analysis and troubleshooting failures.
- Worked with high-volume transactional, claims, eligibility, and compliance datasets in regulated environments.
- Collaborated with architects, DevOps engineers, DBAs, and reporting teams in Agile/Scrum environments to deliver scalable solutions.
- Consistently delivered secure, high-performance, business-aligned data solutions enabling analytics, compliance reporting, and strategic decision-making.
- Experienced in designing medallion architectures (Bronze, Silver, Gold layers) in Azure Data Lake to support scalable analytics pipelines (see the sketch following this summary).
- Strong expertise in handling large-scale structured and semi-structured data, including JSON, Parquet, Avro, and CSV formats.
- Implemented secure data access controls using RBAC, IAM policies, and encryption standards to protect sensitive financial and healthcare data.
- Experienced in workload management and cost optimization strategies across Azure and Snowflake environments.
- Designed automated monitoring and alerting mechanisms using Azure Monitor, Log Analytics, and CloudWatch to ensure pipeline reliability.
- Strong understanding of data governance frameworks, including data lineage, metadata management, and compliance auditing.
- Led enterprise-scale cloud data modernization initiatives, migrating legacy ETL systems to secure, scalable Azure-based architectures.
- Architected and optimized multi-terabyte data platforms supporting real-time analytics and regulatory reporting requirements.
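Illustration (hypothetical, not client code): a minimal PySpark sketch of the medallion (Bronze/Silver/Gold) layering named above; the storage account, paths, and transaction columns are assumptions made for the example.

    # Medallion-layer sketch; paths, columns, and schema are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("medallion-sketch").getOrCreate()
    lake = "abfss://datalake@examplestorage.dfs.core.windows.net"

    # Bronze: land raw source data as-is, adding only ingestion metadata.
    bronze = (spark.read.json(f"{lake}/raw/transactions")
                   .withColumn("ingest_ts", F.current_timestamp()))
    bronze.write.mode("append").parquet(f"{lake}/bronze/transactions")

    # Silver: cleanse and conform (dedupe, typing, mandatory-field checks).
    silver = (bronze.dropDuplicates(["txn_id"])
                    .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
                    .filter(F.col("txn_id").isNotNull()))
    silver.write.mode("overwrite").parquet(f"{lake}/silver/transactions")

    # Gold: business-level aggregate ready for reporting tools.
    gold = silver.groupBy("account_id").agg(F.sum("amount").alias("total_amount"))
    gold.write.mode("overwrite").parquet(f"{lake}/gold/account_totals")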
Education
Master of Science in Computer Science, University of Maryland, Baltimore County, August 2012 - December 2013
Bachelor's in Electronics and Communication Engineering, JNTUK, Kakinada, India, 2008 - 2012

Technical Skills
Azure Services: Azure Data Factory, Airflow, Azure Databricks, Logic Apps, Function Apps, Snowflake, Azure DevOps, Azure SQL, advanced SQL performance tuning, Root Cause Analysis (RCA)
Big Data Technologies: MapReduce, Hive, Python, PySpark, Scala, Kafka, Spark Streaming, Oozie, Sqoop, ZooKeeper
Hadoop Distributions: Cloudera, Hortonworks
Languages: Java, SQL, PL/SQL, Python, HiveQL, Scala
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Build Automation Tools: Ant, Maven
Version Control: Git, GitHub
IDE & Build Tools: Eclipse, Visual Studio
Databases: MS SQL Server 2016/2014/2012, Azure SQL DB, Azure Synapse, MS Excel, MS Access, Oracle 11g/12c, Cosmos DB
AWS Services: S3, Redshift, Glue, Lambda, Athena, EMR, Managed Airflow (MWAA), CloudFormation, EC2, RDS, CloudWatch, IAM, Step Functions

Work Experience

Azure Data Engineer, Jan 2023 - Present
Citizens Bank - Johnston, RI
Responsibilities:
- Collaborated with business stakeholders, risk teams, and banking application owners to design and support enterprise-grade Azure data pipelines for reporting, compliance, and customer analytics.
- Designed, developed, and maintained scalable ETL/ELT pipelines using Azure Data Factory (ADF) and Azure Databricks, integrating data from on-premises systems (MySQL, Cassandra) and cloud platforms into Azure Data Lake and analytical environments.
- Worked within a financial services environment, ensuring regulatory compliance, data accuracy, and enterprise-grade security standards for sensitive investment and portfolio datasets.
- Built reliable batch and near real-time ingestion frameworks using Kafka and Spark Streaming to process high-volume transactional datasets.
- Developed complex transformation logic using PySpark and Spark SQL, implementing data cleansing, enrichment, and standardization before loading into curated data models.
- Implemented dimensional modeling techniques, including SCD Types 1 and 2, surrogate keys, and Change Data Capture (CDC), to enable accurate historical and regulatory reporting (a sketch follows this section).
- Wrote and optimized advanced SQL queries, stored procedures, and views to support data validation, reconciliation, and performance-sensitive reporting workloads.
- Designed and implemented data validation frameworks, including row-count reconciliation, duplicate detection, and anomaly checks, to ensure financial data integrity.
- Automated workflows using Azure Functions and Azure Logic Apps to integrate APIs and external enterprise systems.
- Performed Spark performance tuning, SQL query optimization, and partition optimization to reduce processing time and improve pipeline efficiency.
- Owned production support, including monitoring, alerting, incident management, and structured root cause analysis (RCA), to ensure platform stability and SLA adherence.
- Implemented CI/CD pipelines using Git, Jenkins, and Azure DevOps to enable version-controlled, traceable deployments across DEV, QA, and PROD environments.
- Worked within Agile/Scrum methodology, contributing to sprint planning, estimation, and release cycles using JIRA.
- Partnered with architects, DBAs, DevOps, and reporting teams to deliver secure, compliant, high-performing Azure data solutions aligned with enterprise data governance standards.
Environment: Microsoft Azure (Azure Data Factory, Azure Databricks, Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage Gen2, Azure Functions, Azure Logic Apps), Snowflake, SQL Server, Oracle, MySQL, Cassandra, Apache Spark, PySpark, Spark SQL, Scala, Kafka, Spark Streaming, Hadoop (HDFS, YARN, MapReduce), Hive, Python, Shell Scripting, CI/CD (Jenkins), Git, JIRA, Power BI, Agile/Scrum, Data Modeling (SCD, CDC), ETL/ELT Pipelines, Data Warehousing, Production Support.
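Illustration (hypothetical, not client code): one common way to implement the SCD Type 2 pattern described above is a Delta Lake MERGE on Databricks; the table path, staging table, keys, and tracked attribute below are assumptions for the example.

    # SCD Type 2 upsert sketch using Delta Lake MERGE (Databricks);
    # names, paths, and the tracked attribute are hypothetical.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    dim_path = "abfss://curated@examplestorage.dfs.core.windows.net/dim_customer"

    dim = DeltaTable.forPath(spark, dim_path)
    changes = spark.table("staging.customer_changes")  # assumed CDC feed

    # Step 1: expire the current row when a tracked attribute has changed.
    (dim.alias("t")
        .merge(changes.alias("s"),
               "t.customer_id = s.customer_id AND t.is_current = true")
        .whenMatchedUpdate(condition="t.address <> s.address",
                           set={"is_current": "false",
                                "end_date": "current_date()"})
        .execute())

    # Step 2: append the new versions as open (current) rows. A fuller
    # implementation would first filter to records that actually changed.
    (changes.withColumn("is_current", F.lit(True))
            .withColumn("start_date", F.current_date())
            .withColumn("end_date", F.lit(None).cast("date"))
            .write.format("delta").mode("append").save(dim_path))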
Azure Data Engineer, Sep 2021 - Jan 2023
PNC Bank - Pittsburgh, PA
Responsibilities:
- Built and maintained scalable data pipelines in Azure to support banking applications, including risk reporting, customer analytics, and regulatory reporting.
- Improved Spark job performance by implementing partitioning strategies, caching, and broadcast joins to reduce shuffle time and enhance processing efficiency (a sketch follows this section).
- Designed and implemented data ingestion frameworks to collect data from databases, APIs, and flat files using Azure Data Factory (ADF), Apache Kafka, and Apache NiFi.
- Migrated on-premises Oracle ETL processes to Azure Synapse Analytics and Azure Data Lake, modernizing legacy workflows into cloud-based architectures.
- Processed large volumes of structured and semi-structured data in Azure Databricks using PySpark, Spark SQL, and Scala.
- Built batch and real-time data processing solutions using Spark Streaming and Kafka for transaction and event-driven data.
- Developed Hive tables, partitions, and bucketing strategies to improve query performance, and implemented Hive UDFs to support complex banking business logic.
- Designed and maintained data models in Hive and Snowflake to support reporting and analytics use cases.
- Used PolyBase and Azure Synapse to efficiently transfer and load large datasets across Azure services.
- Managed database access controls and performed secure migration of on-premises databases to Azure Data Lake using Azure Data Factory.
- Created end-to-end data pipelines using Spark, Azure Data Factory, and Apache Airflow to ensure reliable data orchestration across environments.
- Deployed applications and data workflows using Azure DevOps, implementing CI/CD pipelines and automating build and release processes.
- Provided production support by troubleshooting job failures, resolving data quality issues, and performing performance tuning.
- Collaborated with business analysts, data architects, and DevOps teams to deliver scalable, secure, compliant data solutions aligned with banking security standards.
- Used JIRA for task tracking and Git for version control in an Agile/Scrum environment.
Environment: Microsoft Azure (Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake Storage, Azure SQL Database, Azure Blob Storage, Azure Logic Apps, Azure DevOps), Snowflake, MS SQL Server, Oracle, Apache Spark, PySpark, Spark SQL, Scala, Kafka, Spark Streaming, Apache NiFi, Hadoop (HDFS, YARN, MapReduce), Hive, PolyBase, Python, Shell Scripting, Git, Jenkins, JIRA, Power BI, Agile/Scrum, Data Modeling, ETL/ELT, Production Support.
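Illustration (hypothetical, not client code): a minimal PySpark sketch of the shuffle-reduction techniques named above, combining a broadcast join with partitioned output; the paths, table names, and columns are assumptions for the example.

    # Broadcast-join and partitioning sketch; names and paths are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("txn-enrichment").getOrCreate()

    transactions = spark.read.parquet("/data/bank/transactions")  # large fact
    branches = spark.read.parquet("/data/bank/branches")          # small dim

    # Broadcasting the small table avoids shuffling the large fact table.
    enriched = transactions.join(broadcast(branches), "branch_id")

    # Repartition on the write key to avoid many small files, then partition
    # the output so downstream queries can prune by business date.
    (enriched.repartition("txn_date")
             .write.mode("overwrite")
             .partitionBy("txn_date")
             .parquet("/data/bank/enriched"))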
Big Data Engineer, Jun 2019 - Aug 2021
Humana - Louisville, KY
Responsibilities:
- Helped design and build an enterprise data lake in Azure to support healthcare analytics, claims reporting, and member data processing.
- Developed data pipelines using Azure Data Factory (ADF) and Azure Databricks to ingest data from claims systems, provider data sources, and internal databases.
- Processed large volumes of structured and semi-structured healthcare data using PySpark and Spark SQL, loading curated data into Azure Data Lake and Azure Synapse Analytics.
- Collaborated with business stakeholders and data architects to gather reporting requirements and ensure data accuracy for claims, eligibility, and provider analytics.
- Performed data cleansing, standardization, and reference data management to maintain consistency across healthcare systems.
- Created tabular models in Azure Analysis Services (AAS) to provide reliable datasets for dashboards and operational reporting.
- Built batch and near real-time data solutions using Apache Spark to meet timely healthcare reporting requirements.
- Optimized Spark jobs by tuning partitions, executor configurations, and memory allocation to improve performance on large claims datasets.
- Implemented data validation checks, data quality rules, and integrity controls within pipelines to ensure completeness of sensitive healthcare data (a sketch follows this section).
- Developed reusable ETL frameworks to migrate data from on-premises SQL Server and other databases into Azure Data Lake.
- Wrote Python and Bash scripts to automate data extraction, transformation, logging, and monitoring processes.
- Configured secure cluster access using Kerberos authentication and followed data security best practices to protect healthcare information (PHI compliance).
- Migrated legacy SSIS/DTS packages to modern cloud-based data processing solutions in Azure.
- Delivered curated datasets for SSRS and Tableau dashboards supporting operations and leadership reporting.
- Provided production support by troubleshooting pipeline failures, resolving data quality issues, and ensuring timely delivery of critical healthcare reports.
Environment: Microsoft Azure (Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2, Azure Blob Storage, Azure SQL Database, Azure Synapse Analytics, Azure Analysis Services), Apache Spark, PySpark, Spark SQL, Python, Scala, Hadoop (HDFS, Hive, HBase), Sqoop, SQL Server (SSIS, DTS), Snowflake, SSRS, Tableau, Bash Scripting, Git, JIRA, CI/CD (Azure DevOps), ETL/ELT Pipelines, Data Modeling, Big Data Processing, Production Support.
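Illustration (hypothetical, not client code): a minimal PySpark sketch of the validation checks described above (row-count reconciliation, duplicate detection, completeness); the path, control total, and column names are assumptions for the example.

    # Pipeline validation sketch; names and the control total are illustrative.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.getOrCreate()
    claims = spark.read.parquet("/lake/silver/claims")

    # Row-count reconciliation against a count captured from the source extract.
    expected = 1_250_000  # assumed control total from the source system
    actual = claims.count()
    assert actual == expected, f"Row-count mismatch: {actual} != {expected}"

    # Duplicate detection on the natural key.
    dupes = claims.groupBy("claim_id").count().filter(F.col("count") > 1)
    assert dupes.count() == 0, "Duplicate claim_id values detected"

    # Completeness check on fields required for eligibility reporting.
    missing = claims.filter(F.col("member_id").isNull() |
                            F.col("service_date").isNull())
    assert missing.count() == 0, "Required claim fields contain nulls"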
Data Engineer, Feb 2017 - Jun 2019
MetLife - Jersey City, NJ
Responsibilities:
- Designed and developed data lake applications to transform raw insurance data into analytics-ready datasets for business reporting.
- Built data ingestion pipelines using Apache Flume and Apache Sqoop to load customer behavior data, policy details, and transaction records into HDFS for distributed processing.
- Developed MapReduce programs to filter invalid records, clean log data, and convert unstructured files into structured formats for downstream analysis.
- Worked extensively with the Hadoop ecosystem, including HDFS, NameNode, DataNode, Resource Manager, and MapReduce, for distributed data processing.
- Utilized Hive to perform data transformations, joins, aggregations, and event-based processing, storing curated datasets in HDFS.
- Created internal and external Hive tables with static and dynamic partitioning to enhance query performance and efficiently manage large datasets.
- Leveraged Spark SQL to process JSON data, convert it into DataFrames/RDDs, and load structured outputs into Hive tables.
- Automated batch workflows using Apache Oozie to schedule, orchestrate, and monitor data processing jobs.
- Configured Fair Scheduler settings to optimize cluster resource utilization across multiple MapReduce jobs.
- Developed shell scripts to automate end-to-end data movement, synchronization, and job execution between Hadoop clusters.
- Contributed to CI/CD implementation using Jenkins, Maven, Nexus, and GitHub to streamline build and deployment pipelines.
- Prepared technical design documents (TDDs) outlining data architecture, processing logic, and performance optimization strategies.
Environment: Cloudera CDH 3/4, Hadoop, HDFS, MapReduce, Hive, Apache Spark, Spark SQL, Oozie, Pig, Flume, Sqoop, MySQL, Shell Scripting, Jenkins, Maven, Nexus, GitHub, Linux, Data Lake Architecture, Batch Processing.

Data Warehouse Developer, Oct 2014 - Feb 2017
CareSource - Dayton, OH
Responsibilities:
- Designed and maintained SQL Server databases to support operational reporting, server monitoring, and performance inventory tracking.
- Developed and maintained ETL processes using SSIS (SQL Server Integration Services) to extract data from multiple source systems, apply business rule transformations, and load into enterprise data marts.
- Built automated SSIS jobs for scheduled data loads, report generation, and cube refresh processes to ensure timely data availability.
- Deployed SSIS packages to production environments and configured environment-specific parameters to support seamless migration across DEV, QA, and PROD.
- Developed OLAP cubes using SSAS (SQL Server Analysis Services) to enable multidimensional reporting and advanced analytical queries.
- Created drill-down and drill-through reports using SSRS (SQL Server Reporting Services) and Power BI, implementing dynamic filters, sorting, and subtotals for enhanced business insights.
- Developed stored procedures, user-defined functions (UDFs), and triggers to enforce business logic and maintain data integrity.
- Collaborated with reporting teams to design scalable data marts supporting downstream analytics and ad hoc reporting.
- Built user access tools to enable self-service reporting and direct querying of analytical cubes by business users.
- Utilized SQL Profiler for query monitoring, performance tuning, and database optimization.
- Shared curated datasets externally via Snowflake Secure Data Sharing, enabling secure collaboration without additional pipelines (a sketch follows this section).
- Worked in an Agile Scrum environment, participating in daily stand-ups, sprint planning, and iterative delivery cycles.
Environment: Windows Server, MS SQL Server 2014, SSIS, SSAS (OLAP Cubes), SSRS, SQL Profiler, Power BI, C#, PerformancePoint Server, Snowflake, SharePoint, Visual SourceSafe, Trello, MS Office Suite.
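Illustration (hypothetical, not client code): a minimal sketch of a Snowflake Secure Data Sharing setup of the kind mentioned above, issued through the snowflake-connector-python driver; the account, database, schema, share, and consumer names are placeholders.

    # Snowflake Secure Data Sharing sketch; all identifiers are placeholders.
    import os
    import snowflake.connector

    conn = snowflake.connector.connect(
        account="xy12345",
        user="etl_user",
        password=os.environ["SNOWFLAKE_PASSWORD"],
        role="ACCOUNTADMIN",
    )
    cur = conn.cursor()

    # Create the share, expose one curated schema, and add a consumer account.
    cur.execute("CREATE SHARE IF NOT EXISTS reporting_share")
    cur.execute("GRANT USAGE ON DATABASE analytics TO SHARE reporting_share")
    cur.execute("GRANT USAGE ON SCHEMA analytics.curated TO SHARE reporting_share")
    cur.execute("GRANT SELECT ON ALL TABLES IN SCHEMA analytics.curated TO SHARE reporting_share")
    cur.execute("ALTER SHARE reporting_share ADD ACCOUNTS = partner_org.partner_account")

    cur.close()
    conn.close()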