Venkatasasidhar Shashi Mamidibathula
Senior Data Engineer
Email: [email protected] | LinkedIn: linkedin.com/in/shashi-mamidi | Phone: (732) 917-5921
Location: Richardson, Texas, USA
Visa status: US Citizen (with Public Trust security clearance)

Professional Summary
* Senior data engineer with 19+ years of strong experience working with Big data technologies and various database systems
* Nine-plus years of experience working with Big data (Apache Spark, Python, Scala, Apache Airflow / MWAA, etc.), Cloud, AWS, and Hadoop technologies (MapReduce, Hive, Impala, Sqoop, Oozie, Hue, etc.)
* Extensive experience working with various types of databases (Relational, NoSQL, etc.) for structured, semi-structured, and unstructured data processing
* Expertise in OLTP / OLAP, ETL / ELT, and data pipeline building for business analytics / intelligence, including normalization and de-normalization of data
* Thorough understanding of system / data analysis, design, and programming, providing technical and functional services for a wide variety of industries and domains
* Very good understanding of, and hands-on experience in, SSAD, SDLC, Agile and Waterfall methodologies, CI/CD (Git, Jenkins), data modeling, query optimization, and performance tuning
* Proficient in writing complex shell scripts and scheduling them as cron jobs
* Took sole responsibility for the execution of several Big data projects
* Experience migrating data from Oracle and other databases over to AWS Redshift
* Extensive experience with ERP systems (Oracle Manufacturing / Financials / CRM / HRMS)
* Strong interpersonal, verbal, and written communication skills

Expertise
Hadoop HDFS, Amazon S3, Redshift | Spark V2/V3, Scala, sbt | Python V2/V3, PySpark | Apache Airflow / MWAA | NoSQL, Cassandra, Hive | RDBMS (Oracle, MySQL, MariaDB) | Databricks, Delta Lake | Data Lakes, MLlib | IntelliJ IDEA, Notebooks | Oracle Manufacturing / SCM | Oracle Financials / CRM | Unix shell programming | PL/SQL, SQL | Oracle Forms, Reports, C, Pro*C | Agile & Waterfall methodologies | Data Warehousing, ETL/ELT | Informatica | CI/CD, Git, Jenkins | PostgreSQL | COBOL | Spark Streaming, Kafka

Education
M.S. (Hons) Mathematics, Birla Institute of Technology and Science, Pilani, India
Master of Management Studies (MMS), Birla Institute of Technology and Science, Pilani, India

Professional Experience

Sr. AWS Data Engineer (contract) | Dec 2023 to Oct 2025
GDIT Inc, Remote, USA
Clients served: Department of Veterans Affairs (VA), USA
* Worked on database analysis, design, and preparation of ER diagrams (using Visio) for the target tables that store the transformed data
* Analyzed the system for new enhancements / functionalities and performed impact analysis of the application before implementing ETL changes
* Developed stored procedures in Redshift to handle full and incremental loads of dimension and fact tables
* Performance-tuned ETL pipelines: identified / altered partition keys, restructured the database, etc.
* Transformed data using Python, PySpark, AWS Lambda, and AWS Glue (a representative staging sketch follows this section)
* In Databricks, created ad-hoc SQL queries and PySpark scripts to share with data scientists as notebooks
* Provided test cases (in SQL, etc.) to the QA team and coordinated with the QA / test engineers until the deliverables were signed off
* Deliverables were monitored and tracked using Agile methodologies (Jira and Confluence) and CI/CD pipelines
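
For illustration, a minimal PySpark sketch of the incremental staging pattern referenced above; all table, column, bucket, and watermark names are hypothetical placeholders, not taken from the project:

    # Minimal PySpark sketch of an incremental dimension staging load.
    # Paths, columns, and the watermark value are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.window import Window

    spark = SparkSession.builder.appName("dim_customer_incremental").getOrCreate()

    # Read only records changed since the last successful load; in practice
    # the watermark would come from an audit / control table.
    last_load_ts = "2025-01-01 00:00:00"
    changed = (spark.read.parquet("s3://example-bucket/staging/customers/")
               .filter(F.col("updated_at") > F.lit(last_load_ts)))

    # Keep only the latest version of each business key.
    w = Window.partitionBy("customer_id").orderBy(F.col("updated_at").desc())
    latest = (changed.withColumn("rn", F.row_number().over(w))
              .filter(F.col("rn") == 1)
              .drop("rn"))

    # Stage the delta on S3 for a downstream Redshift COPY / merge procedure.
    latest.write.mode("overwrite").parquet("s3://example-bucket/delta/dim_customer/")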
Senior Data Engineer (consultant) | March 2023 to Nov 2023
Accenture / Wisemen Consultants, Remote, USA
Clients served: Entergy Corporation, LA, USA
* The project was to migrate data (50+ TB) and programs from Hadoop / HDFS to AWS infrastructure; worked as an individual contributor focusing primarily on the Autosys-to-Airflow migration
* Successfully generated several hundred complex MWAA Airflow DAGs from legacy Autosys (JIL) and Informatica workflow definitions, using Python, Jinja2 templates, pandas, NumPy, etc. (a simplified generation sketch follows this section)
* Wrote PySpark programs to create execution / exception reports; the scripts are executed via DAGs on EMR / EKS virtual clusters
* In the initial stages of the migration (Hadoop to AWS), participated in the AWS DataSync transfer of data from on-prem to AWS S3 buckets
* Supported the data scientists with the required SQL / Python / PySpark scripts within the Databricks platform
* Wrote several AWS Lambda functions to perform project-specific activities
* Extensively used GitHub and CI/CD pipelines to move code across environments
* Gave several KT / training sessions on Airflow and AWS-related services to other team members
* Used Jupyter Notebooks for Python / pandas development and for proofs of concept
* Agile methodologies (Jira and Confluence) were used to track project progress
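
For illustration, a simplified sketch of the template-driven DAG generation described above, assuming the Autosys JIL definitions have already been parsed into Python dicts; the DAG ID, schedule, and commands shown are hypothetical:

    # Sketch of template-driven Airflow DAG generation; the JIL parsing
    # step is elided and the job metadata below is illustrative only.
    from jinja2 import Template

    DAG_TEMPLATE = Template('''\
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from datetime import datetime

    with DAG(dag_id="{{ dag_id }}", schedule_interval="{{ schedule }}",
             start_date=datetime(2023, 1, 1), catchup=False) as dag:
    {% for task in tasks %}
        {{ task.name }} = BashOperator(task_id="{{ task.name }}",
                                       bash_command="bash {{ task.command }}")
    {% endfor %}
    ''', trim_blocks=True)

    # One dict per legacy Autosys job, produced by the (elided) JIL parser.
    job = {
        "dag_id": "billing_daily",
        "schedule": "0 6 * * *",
        "tasks": [{"name": "extract", "command": "run_extract.sh"},
                  {"name": "load", "command": "run_load.sh"}],
    }

    # Render one generated DAG file per job into the Airflow dags folder.
    with open(f"dags/{job['dag_id']}.py", "w") as fh:
        fh.write(DAG_TEMPLATE.render(**job))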
Senior Data Engineer | May 2022 to Feb 2023
ClearScale, LLC, San Francisco, CA, USA
Clients served: (1) Experian, CA, USA (2) Retool, CA, USA
* Migrated several PySpark, Scala, and Java programs from Cloudera Hadoop to Amazon EMR; using Maven, the programs were recompiled against the applicable AWS JARs
* Built data pipelines to perform ETL operations on CSV data files; part of the data was sent to Solr for searches
* Wrote a few Python (AWS Lambda and Step Functions) and PySpark programs for data manipulation and cleansing
* Orchestrated execution of programs on AWS EMR using ZooKeeper and Oozie workflows; created and modified shell scripts to submit Spark jobs
* Migrated large volumes of data from Azure PostgreSQL to AWS PostgreSQL and Aurora using pg_dump, pg_restore, pglogical, and other SQL scripts

Senior Data Engineer | Jun 2021 to Apr 2022
Broadridge Financial Solutions / IntelliBus, Newark, NJ, USA
* Built data pipelines to receive customer data (in CSV format), convert it into Parquet format, perform ETL processing, load it into Redshift, and generate CSV files on S3 to be consumed by upstream systems
* Extensively used Redshift to create internal and external tables, and RDS (with SQL Server / PostgreSQL); created several SQL scripts / stored procedures to move data across systems; used AWS Glue for storing schemas, etc.
* Developed a data pipeline to get data from Kafka streams to S3 and then over to DynamoDB (NoSQL); the Amazon S3 Sink Connector was used to facilitate this
* Created and modified Step Functions / Lambda functions using Python, with libraries such as NumPy, pandas, boto3, etc.
* Leveraged several EC2 instances for various activities; wrote / modified shell scripts to automate file transfers and batch runs
* Prepared a PoC evaluating Apache Iceberg with PySpark programs to improve performance and reduce data storage requirements

Senior Data Engineer | Apr 2020 to May 2021
PNC Financial Services / Medigrity Innovations, PA, USA
* Designed, developed, implemented, and tuned complex data pipelines for large-scale distributed systems for a large US-based financial services organization, using Apache Spark and Scala on Amazon AWS (S3, EMR, etc.)
* Performed data analysis and developed analytic solutions; investigated data to discover correlations / trends and explain them
* Worked with data engineers and data architects to define back-end requirements for data products (aggregations, materialized views, tables, visualization)
* Prepared data pipelines to facilitate new-customer acquisition and personalized marketing for the client's existing customers; by enabling efficient analysis of existing customer data, this helped the client improve their campaign management by 20%
* Authored Python (PySpark) scripts with custom UDFs for row / column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks; migrated data from on-prem to AWS storage buckets
* Architected and designed serverless application CI/CD using the AWS Serverless Application Model (Lambda)

Sr. Data Engineer | Oct 2015 to Mar 2020
Medigrity Innovations, Hyderabad, India
Clients served: ITC Ltd, India; ICICI Lombard, India; SurgiKart.com, India
* For a large consumer goods manufacturing conglomerate in India, created data pipelines (for data requested by data scientists) to augment demand forecasting, optimal pricing of end products, packaging parameters, and effective resource utilization of manufacturing processes
* Migrated data warehouse data from Oracle RDBMS over to the Amazon Redshift cloud data warehouse, using Amazon S3, CSV files, and schema design and creation at the destination; this project also involved fine-tuning Oracle table structures and cleaning up unwanted and redundant data before sending the schema and table data over to AWS
* Using Kafka and Spark Streaming, implemented scalable, fault-tolerant live data streams for a high-traffic eCommerce site (see the streaming sketch after this section); also worked on recommendation systems with machine learning (ALS) and MLlib; Apache Hadoop, HDFS, and Spark with Scala were used in this project
* Designed and built various data pipelines for intelligently utilizing health insurance data for a large health insurance provider in India; on-prem Hadoop clusters were used in this project
* Used Oozie to automate data loading into the Hadoop Distributed File System
* Used Sqoop to move data between RDBMS sources and HDFS
* Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from multiple file formats
* Used Docker to run and deploy applications in multiple containers, with Docker Swarm and Weave for orchestration and networking
* Prepared a PoC evaluating Pentaho vs. Informatica as the tool to build the data warehouse
* Many of the projects were hosted on the Databricks platform on AWS
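
For illustration, a minimal PySpark Structured Streaming sketch of the Kafka ingestion pattern described above; the original project was implemented in Scala, and the broker, topic, schema, and paths here are hypothetical:

    # Minimal Structured Streaming sketch: consume JSON events from Kafka
    # and land them as Parquet with checkpointing for fault tolerance.
    # Requires the spark-sql-kafka connector package on the classpath.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("orders_stream").getOrCreate()

    schema = StructType([
        StructField("user_id", StringType()),
        StructField("sku", StringType()),
        StructField("price", DoubleType()),
    ])

    # Parse each Kafka message value (JSON) into typed columns.
    events = (spark.readStream
              .format("kafka")
              .option("kafka.bootstrap.servers", "broker1:9092")
              .option("subscribe", "orders")
              .load()
              .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
              .select("e.*"))

    (events.writeStream
     .format("parquet")
     .option("path", "hdfs:///data/orders/")
     .option("checkpointLocation", "hdfs:///checkpoints/orders/")
     .start()
     .awaitTermination())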
Technical Manager / Service Delivery Manager | Aug 2009 to Jun 2011
Apps Associates Private Limited, Hyderabad, India
* As a technical manager, supervised the analysis, design, and implementation of technical projects for various clients in the USA, in the areas of Oracle EBS Applications (R11 / R12) implementations, Oracle Business Intelligence / OBIEE, Data Warehousing, etc.
* All projects were in the Oracle EBS Applications (R12 / R11) and Oracle Business Intelligence (BI) space, covering database performance tuning, creating and presenting Proofs of Concept (POCs), etc.
* Achieved close to 100% customer satisfaction in addressing and resolving issues / pending tasks and in analyzing and mitigating risks

Sr. Oracle EBS Applications Consultant | Oct 2001 to May 2009
Self-Employed, Massachusetts, USA
Clients served: Accellent, Inc., Wilmington, MA; Caliper Life Sciences, Hopkinton, MA; Ametek Aerospace & Defense, Wilmington, MA; Mercury Computer Systems, Chelmsford, MA; Sycamore Networks, Inc., Chelmsford, MA; NetScout, Inc., Westford, MA; Staples, Inc., Framingham, MA; Hollingsworth & Vose, East Walpole, MA; MKS Instruments, Andover, MA
* Implemented and customized various Oracle ERP modules (SCM / Financials / HRMS / CRM) spanning multiple versions
* Performed data mapping and conversion from legacy systems into Oracle Applications, report and workflow customizations, and design and development of custom modules; data conversions were done with 99.999% accuracy
* Interfaced external applications / systems with various Oracle EBS modules; was instrumental in gathering requirements from stakeholders / end-users and in designing, developing, and implementing solutions with 100% customer satisfaction; extensively used various Oracle tools: Forms and Reports, Discoverer, XML / BI Publisher, PL/SQL, Unix shell scripting, Oracle Configurator, Data Warehousing (ETL)
* Prepared various functional and technical documents to capture requirements and come up with an execution plan using available tools and techniques; performance-tuned existing and newly built modules / functionalities to achieve significant program and process improvements

Previous organizations:
Atos Syntel, various locations across USA - Sr. Technical Consultant, Dec 1996 to Sep 2001
Optimum Technologies, various locations across USA - Software Engineer, Mar 1995 to Nov 1996

Certifications
* AWS Certified Solutions Architect - Associate, Amazon Web Services (AWS)
* Databricks Certified Data Engineer Associate (#169255084), Databricks
* Databricks Certified Associate Developer for Apache Spark 3.0 - Python, Databricks
* Generative AI Fundamentals Academy Accreditation, Databricks
* Certified Scrum Master (CSM) (#1331234), Scrum Alliance
* Project Management Professional (PMP) (#315006), Project Management Institute (PMI)
* Certified in Production & Inventory Management (CPIM), Association of Supply Chain Management
* ITIL V4 Foundation Certified, Axelos