| Manju Latha Looking for Data Engineer Positions - Snowflake Data Engineer |
| [email protected] |
| Location: St Louis, Missouri, USA |
| Relocation: Yes |
| Visa: |
| Resume file: LATHA_Snowflake_Data Engineer_1767380098745.docx |
|
LATHA CH
(314) 375-2288 | [email protected]

Professional Summary:
- AWS Certified Data Analytics professional and accomplished Data Engineer with over five years of experience architecting, developing, and optimizing data pipelines, dbt models, ETL processes, and scalable data models across multi-cloud environments (AWS, Azure, GCP).
- Proven expertise in Snowflake, Python, PySpark, and SQL, delivering high-performance, data-driven solutions for analytics, reporting, and AI/ML integration.
- Skilled in real-time data processing, cloud-native data engineering, and data warehousing, with a strong background in Agile methodologies.
- Results-oriented Data Engineer with hands-on experience designing and implementing scalable data processing pipelines using Apache Spark and Python in distributed environments.
- Strong expertise in building robust ETL workflows with tools such as Alteryx and Apache NiFi, enabling efficient data ingestion, transformation, and integration from diverse sources.
- Proficient in writing advanced SQL for data extraction, transformation, and performance tuning across large datasets.
- Experienced with cloud platforms such as AWS and Azure: deploying Spark jobs, managing storage (S3, Blob), and integrating with cloud-native ETL services like Glue and Data Factory.
- Adept at collaborating with cross-functional teams to translate business requirements into efficient data solutions, ensuring high performance and data quality in production environments.
- Expertise in data engineering, pipeline design, development, and implementation as a Sr. Data Engineer/Developer and Data Modeler.
- Developed large-scale Snowflake-based data warehouse solutions, leveraging Snowpipe, Streams, and Tasks for real-time ingestion and transformation.
- Strong understanding of the SDLC and Agile/Waterfall methodologies.
- Automated dbt model deployments with CI/CD pipelines, ensuring version control and quality checks.
- Strong experience in building ETL pipelines, data warehousing, and data modeling.
- Migrated legacy Hadoop and Hive workloads into Snowflake and cloud-native architectures, improving scalability and reducing costs.
- Experienced in migrating SQL databases to Azure Data Lake, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; handling data integration; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
- Designed and implemented robust, scalable ETL/ELT pipelines using SQL, dbt, Airflow, and PySpark with CI/CD integration.
- Implemented CI/CD pipelines using Azure DevOps for data engineering solutions.
- Implemented real-time data ingestion pipelines using Snowpipe, storage integrations, and ADF.
- Collaborated with teams to manage, version-control, and deploy Alteryx workflows that interact with Snowflake environments.
- Built real-time analytics pipelines using Kafka and Snowflake for event-driven insights.
- Implemented AI/ML pipelines with Airflow, MLflow, and FastAPI, enabling real-time model deployment and inference.
- Delivered data visualization and reporting via Power BI, Tableau, and Looker integrated with Snowflake.
- Scheduled and orchestrated dbt runs using Airflow, Prefect, and other orchestration tools (see the sketch following this summary).
- Tested Python data pipelines using pytest for robustness and accuracy.
- Skilled in utilizing Matillion's orchestration and transformation components to optimize data pipelines.
- Deployed RESTful APIs built with Python (Flask or Django) on Tomcat for data integration and analytics.
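As a point of reference for the dbt-on-Airflow orchestration mentioned above, here is a minimal sketch of a nightly dbt build DAG. It assumes Airflow 2.x with the BashOperator; the DAG id, schedule, and project/profiles paths are illustrative placeholders rather than details from any project described in this resume.

```python
# Minimal Airflow 2.x DAG sketch: schedule a nightly dbt run followed by dbt tests.
# DAG id, schedule, and directory paths are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_nightly_build",          # hypothetical DAG name
    start_date=datetime(2024, 1, 1),
    schedule_interval="0 2 * * *",       # run at 02:00 UTC daily
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/analytics --profiles-dir /opt/dbt",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/analytics --profiles-dir /opt/dbt",
    )
    dbt_run >> dbt_test                  # tests run only after a successful build
```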
TECHNICAL SKILLS:
Data Modeling Tools: Erwin Data Modeler, ER Studio v17, Snowflake, dbt (Data Build Tool), Azure Data Factory
Programming Languages: Python, SQL, PL/SQL, Scala, Unix shell scripting
Methodologies: System Development Life Cycle (SDLC), Agile (Scrum/Kanban), RAD (Rapid Application Development), JAD (Joint Application Development)
Cloud Platforms: Snowflake, AWS, Azure Databricks, ADF, SnowSQL, Snowpark, Google Cloud Platform (GCP), Google BigQuery
Databases: Oracle 12c/11g, Teradata R15/R14, PostgreSQL, MongoDB
Data Analytics and Visualization Tools: Tableau, Power BI, SSAS, Business Objects, Looker
ETL / Data Integration Tools: Informatica 9.6/9.1, Apache NiFi, Talend, Apache Airflow, Tableau Prep
Operating Systems: Windows, Unix, Linux (Red Hat/Ubuntu), Sun Solaris, Mac OS
Big Data Tools: Hadoop ecosystem, Apache Spark (including PySpark), Kafka, HBase, Hive, Apache Flink, Delta Lake

Professional Experience:

Data Engineer - Navvis (Healthcare), Remote | May 2022 - Present
Responsibilities:
- Performed data cleansing and transformations using HiveQL and MapReduce across various file formats.
- Optimized Spark algorithms in Hadoop using Spark SQL, DataFrames, and YARN for improved performance.
- Created data visualizations using Tableau, Matplotlib, and Seaborn for insights and reporting.
- Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver in GCP environments.
- Integrated dbt with cloud-based warehouses such as Snowflake, BigQuery, and Redshift to optimize ELT pipelines.
- Built end-to-end ML workflows in Python for data preprocessing, feature engineering, modeling, and deployment.
- Designed and optimized large-scale data processing pipelines using Apache Spark, handling batch and streaming data in distributed environments (a PySpark sketch follows this section).
- Developed Spark applications in PySpark for ETL and data transformation tasks, improving processing time by X% over legacy systems.
- Tuned Spark jobs for performance by managing partitioning, caching, and cluster resource allocation.
- Built robust data ingestion and transformation scripts using Python, leveraging libraries such as Pandas, NumPy, and PySpark for scalable data workflows.
- Created custom Python modules to automate repetitive ETL tasks and integrated them with orchestration tools like Airflow and NiFi.
- Developed error-handling and logging mechanisms in Python to improve observability and reliability in production ETL pipelines.
- Developed and maintained ETL workflows in Alteryx Designer for data blending, transformation, and preparation across multiple data sources.
- Implemented Apache NiFi flows for real-time and batch data ingestion, routing, and transformation, enabling seamless integration with cloud and on-premises systems.
- Collaborated with data analysts to convert business logic into scalable ETL processes using visual and code-based tools.
- Wrote complex SQL queries and stored procedures to support data extraction, cleansing, and validation across large relational databases.
- Performed performance tuning on SQL queries to reduce runtime and improve overall efficiency in reporting and analytics workflows.
- Deployed Spark jobs and ETL workflows to cloud platforms (AWS EMR, Azure Databricks), optimizing for cost and scalability.
- Utilized cloud-native tools such as AWS S3, Lambda, and Glue, and Azure Data Factory, for data storage, transformation, and orchestration.
- Configured IAM roles, storage permissions, and environment variables to ensure secure and efficient data operations in cloud environments.
- Skilled in Python libraries (Pandas, NumPy) for data manipulation and analysis.
- Designed Star and Snowflake data models for the enterprise data warehouse using Erwin.
- Tuned dbt models for performance by optimizing SQL queries and leveraging Snowflake's capabilities, resulting in faster execution times.
- Worked on migrating on-premises data from IBM DB2 to Snowflake.
- Created pipelines in Azure Data Factory using linked services, datasets, and pipelines to extract, transform, and load data from sources such as Azure SQL, ADLS Gen2, IBM DB2, Snowflake, and Azure SQL Data Warehouse.
- Consumed external APIs to extract, transform, and load (ETL) data into data warehouses.
- Configured Airflow on Kubernetes and Docker for high availability and scalability.
- Managed data storage and processing pipelines in AWS serving AI and ML services across production, development, and testing environments using SQL, Spark, and Python.
- Designed and maintained ETL pipelines to process large-scale EHR data using Python and SQL.
- Managed deployment and scaling of Python-based applications on Apache Tomcat for seamless integration with enterprise systems.
- Implemented Snowpipe for real-time data ingestion pipelines using S3 and AWS storage integrations.
- Built ETL pipelines using Snowflake for data ingestion from various sources (e.g., S3, Azure Data Lake, GCS) and transformed raw data for downstream applications.
- Developed data models and transformations using dbt with Python integrations.
- Extracted and transformed data from Snowflake, Oracle, DB2, and HDFS using Sqoop for reporting and visualization.
- Worked on encoding and decoding JSON objects with PySpark to create and modify DataFrames in Apache Spark.
- Developed Scala applications on Amazon EMR for data ingestion from S3 to SQS queues.
- Monitored applications using CloudWatch for logs and performance metrics.
- Designed a library for emailing executive reports from the Tableau REST API using Python, Kubernetes, Git, AWS CodeBuild, and Airflow.
- Automated deployment workflows for Tomcat-hosted services using CI/CD tools such as Jenkins and GitLab CI.
- Integrated data into Oracle data warehousing solutions for analytics and reporting.
- Utilized Snowflake's SQL capabilities and Snowpipe to automate data loading from cloud storage, ensuring near-real-time data availability.
- Managed healthcare databases (e.g., PostgreSQL, Snowflake) to support EHR analytics and reporting.
- Experienced in working with various file formats such as CSV, Excel, JSON, and Parquet using Pandas.
- Designed audit logging capabilities within ETL workflows to track business logic applications.
- Automated ETL pipelines for customer data analysis, improving reporting accuracy by 15% and enabling better decision-making for marketing campaigns.
- Experienced with Docker and Kubernetes for containerized data pipelines.
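Below is a minimal PySpark sketch of the kind of batch ETL described in this section: read raw files, standardize a few columns, and write partitioned Parquet. The bucket paths, column names, and claims schema are hypothetical placeholders, not details from the Navvis project.

```python
# Illustrative PySpark batch ETL sketch: CSV in, cleaned and partitioned Parquet out.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("claims_etl_sketch").getOrCreate()

raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("s3a://example-bucket/raw/claims/")   # placeholder source path
)

cleaned = (
    raw.dropDuplicates(["claim_id"])                            # de-duplicate on the business key
       .withColumn("service_date", F.to_date("service_date"))   # normalize date strings
       .withColumn("paid_amount", F.col("paid_amount").cast("double"))
       .filter(F.col("paid_amount").isNotNull())                # drop rows missing amounts
)

(
    cleaned.write
    .mode("overwrite")
    .partitionBy("service_date")
    .parquet("s3a://example-bucket/curated/claims/")            # placeholder target path
)

spark.stop()
```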
Data Engineer - Synchrony Financial, Hyderabad, India | Sep 2019 - Jul 2021
Responsibilities:
- Performed data extraction, transformation, loading, and integration across data warehouses, operational data stores, and master data management systems.
- Developed Spark applications using Scala and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Created interactive reports and dashboards in Looker, using Snowflake connections to deliver valuable data insights to stakeholders.
- Automated ETL processes by implementing Airflow DAGs for report generation and data ingestion, reducing manual interventions.
- Managed Databricks notebooks and Delta Lake with Python and Spark SQL.
- Implemented Apache Airflow for authoring, scheduling, and monitoring data pipelines.
- Integrated Hadoop with Python libraries such as PyArrow for reading and writing data in HDFS and PySpark for distributed data processing.
- Installed and configured Apache Airflow for an S3 bucket and the Snowflake data warehouse, and created DAGs to run in Airflow.
- Experienced in developing MapReduce programs using Apache Hadoop for working with big data.
- Used Pandas for quick prototyping and preprocessing in ETL pipelines.
- Created a de-normalized BigQuery schema for analytical and reporting requirements and provided solutions and support for the GCP data store backing BI platforms.
- Skilled in using NumPy's vectorized operations to optimize performance for large datasets.
- Produced unit tests for Spark transformations and helper methods (a pytest sketch follows this section); designed data processing pipelines.
- Leveraged Kubernetes to deploy and scale distributed data processing frameworks.
- Involved in data migration to Snowflake using AWS S3 buckets.
- Worked on real-time streaming applications using Java with Apache Flink or Kafka Streams.
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
- Designed and implemented a configurable data delivery pipeline, built with Python, for scheduled updates to customer-facing data stores.
- Designed and implemented RESTful APIs using Flask/Django for seamless data integration.
- Developed a batch data ingestion pipeline using Sqoop and Hive to ingest, transform, and analyze supply chain data.
- Worked extensively on Hive queries to load data from sources such as Teradata, DB2, Oracle, and mainframes.
- Handled large datasets using chunking and optimized Pandas functions for performance.
- Migrated the transformed data to Azure Data Lake for consumption by downstream consumers depending on business need.
- Implemented a continuous delivery pipeline with Docker, GitHub, and Azure.
- Involved in setting up the Apache Airflow service in GCP.
- Worked on GCP Dataproc, GCS, Cloud Functions, and BigQuery.
- Built and maintained dashboards using visualization tools such as Tableau and QlikView.
Environment: Azure Data Factory, U-SQL, Azure Data Lake Analytics, Azure SQL, Azure DW, Databricks, GitHub, Docker, Talend Big Data Integration, Snowflake, Oracle, SQL Server, MySQL, NoSQL, MongoDB, HBase, Cassandra, Python (PySpark, pytest, PyMongo, PyExcel, Matplotlib, NumPy, Pandas).
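As an example of the unit testing mentioned above, here is a small pytest sketch for a transformation helper of the kind used in these pipelines. The helper, column name, and test data are hypothetical and are not taken from the original project code.

```python
# Illustrative pytest sketch for a small pipeline helper.
# The helper and column names are hypothetical placeholders.
import pandas as pd


def normalize_amounts(df: pd.DataFrame) -> pd.DataFrame:
    """Cast the amount column to float and drop rows where it cannot be parsed."""
    out = df.copy()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    return out.dropna(subset=["amount"])


def test_normalize_amounts_drops_bad_rows():
    df = pd.DataFrame({"amount": ["10.5", "not-a-number", None]})
    result = normalize_amounts(df)
    # Only the parseable value survives, and it is numeric.
    assert list(result["amount"]) == [10.5]


def test_normalize_amounts_preserves_valid_rows():
    df = pd.DataFrame({"amount": ["1", "2.25"]})
    result = normalize_amounts(df)
    assert len(result) == 2
    assert result["amount"].dtype == float
```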
EDUCATION:
Master's degree from Webster University, Webster Groves, USA.
Bachelor's degree from JNTU, Kakinada, India.