Venkatasasidhar Shashi Mamidibathula - Senior Data Engineer
Email: [email protected] | Phone: (732) 917-5921 | LinkedIn: linkedin.com/in/shashi-mamidi
Location: Richardson, Texas, USA
Visa status: US Citizen (with Public Trust security clearance)

Professional Summary
* Senior data engineer with 19+ years of strong experience with big data technologies and a wide variety of database systems
* 9+ years of experience with big data (Apache Spark, Python, Scala, Apache Airflow / MWAA, etc.), cloud / AWS, and Hadoop technologies (MapReduce, Hive, Impala, Sqoop, Oozie, Hue, etc.)
* Extensive experience with relational, NoSQL, and other database types for structured, semi-structured, and unstructured data processing
* Expertise in OLTP / OLAP, ETL / ELT, data pipeline building for business analytics / intelligence, and normalization and de-normalization of data
* Thorough understanding of system / data analysis, design, and programming, providing technical and functional services across a wide variety of industries and domains
* Strong understanding of, and hands-on experience with, SSAD, SDLC, Agile and Waterfall methodologies, CI/CD (Git, Jenkins), data modeling, query optimization, and performance tuning
* Proficient in writing complex shell scripts and scheduling them as cron jobs
* Took sole responsibility for the end-to-end execution of several big data projects
* Experienced in migrating data from Oracle and other databases to AWS Redshift
* Extensive experience with ERP systems (Oracle Manufacturing / Financials / CRM / HRMS)
* Strong interpersonal skills and strong verbal and written communication.

Expertise
Hadoop HDFS, Amazon S3, Redshift
Spark V2/V3, Scala, sbt
Python V2/V3, PySpark
Apache Airflow / MWAA
NoSQL, Cassandra, Hive
RDBMS (Oracle, MySQL, MariaDB)
Databricks, Delta Lake
Data Lakes, MLlib
IntelliJ IDEA, Notebooks
Oracle Manufacturing / SCM
Oracle Financials / CRM
Unix Shell programming
PL/SQL, SQL
Oracle Forms, Reports, C, Pro*C
Agile & Waterfall methodologies
Data Warehousing, ETL/ELT
Informatica
CI / CD, Git, Jenkins
PostgreSQL
COBOL
Spark Streaming, Kafka

Education
M.S. (Hons) Mathematics, Birla Institute of Technology and Science, Pilani, India
Master of Management Studies (MMS), Birla Institute of Technology and Science, Pilani, India

Professional Experience
Sr. AWS Data Engineer (contract) Dec 2023 to Oct 2025
GDIT Inc, Remote, USA
Clients served: Department of Veterans Affairs (VA), USA
* Worked on database analysis, design, and preparation of ER diagrams (using Visio) for the target tables that store the transformed data
* Analyzed the system for new enhancements / functionalities and performed impact analysis of the application before implementing ETL changes
* Developed stored procedures in Redshift to handle full and incremental loads of dimension and fact tables (incremental pattern sketched after this list)
* Performance-tuned ETL pipelines: identified and altered partition keys, restructured the database, etc.
* Transformed data using Python, PySpark, AWS Lambda, and AWS Glue
* In Databricks, created ad-hoc SQL queries and PySpark scripts to share with data scientists as notebooks
* Provided test cases (in SQL, etc.) to the QA team and coordinated with the QA / test engineers until the deliverables were signed off
* Monitored and tracked deliverables using Agile methodologies (Jira and Confluence) and CI/CD pipelines.
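
A minimal sketch of the incremental-load pattern referenced above, assuming hypothetical staging / dimension tables and the boto3 Redshift Data API (cluster, database, and table names are illustrative, not the project's):

import boto3

# Hypothetical tables; the real loads used project-specific schemas.
UPDATE_SQL = """
UPDATE dim_customer
SET name = s.name, updated_at = s.updated_at
FROM stg_customer s
WHERE dim_customer.customer_id = s.customer_id
  AND s.updated_at > dim_customer.updated_at;
"""

INSERT_SQL = """
INSERT INTO dim_customer (customer_id, name, updated_at)
SELECT s.customer_id, s.name, s.updated_at
FROM stg_customer s
LEFT JOIN dim_customer d ON d.customer_id = s.customer_id
WHERE d.customer_id IS NULL;
"""

client = boto3.client("redshift-data")
# batch_execute_statement runs the statements as a single transaction.
client.batch_execute_statement(
    ClusterIdentifier="example-cluster",  # illustrative
    Database="analytics",
    DbUser="etl_user",
    Sqls=[UPDATE_SQL, INSERT_SQL],
)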

Senior Data Engineer (consultant) March 2023 to Nov 2023
Accenture / Wisemen Consultants, Remote, USA
Clients served: Entergy Corporation, LA, USA
* The project migrated data (50+ TB) and programs from Hadoop / HDFS to AWS infrastructure. Worked as an individual contributor focusing primarily on the Autosys-to-Airflow migration.
* Generated several hundred complex MWAA Airflow DAGs from legacy Autosys (JIL) and Informatica workflow definitions, using Python, Jinja2 templates, pandas, NumPy, etc. (template pattern sketched after this list)
* Wrote PySpark programs to create execution / exception reports; the PySpark scripts were executed via DAGs on EMR / EKS virtual clusters.
* In the initial stages of the migration (Hadoop to AWS), participated in the AWS DataSync transfer of data from on-prem to AWS S3 buckets.
* Supported the data scientists with the required SQL / Python / PySpark scripts within the Databricks platform
* Wrote several AWS Lambda functions to perform project-specific activities.
* Extensively used GitHub and CI/CD pipelines to move code across environments.
* Gave several KT / training sessions on Airflow and AWS-related services to other team members
* Used Jupyter Notebooks for Python / pandas development and for proofs of concept.
* Tracked project progress using Agile methodologies (Jira and Confluence).
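
A minimal sketch of the template-driven DAG generation described above, assuming hypothetical job metadata already parsed out of an Autosys JIL file (job name, schedule, and command are illustrative):

from jinja2 import Template

# Hypothetical metadata extracted from a parsed Autosys JIL job definition.
job = {
    "dag_id": "daily_billing_load",
    "schedule": "0 6 * * *",
    "command": "spark-submit billing_load.py",
}

DAG_TEMPLATE = Template('''\
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="{{ dag_id }}",
    schedule_interval="{{ schedule }}",
    start_date=datetime(2023, 1, 1),
    catchup=False,
) as dag:
    run_job = BashOperator(task_id="run_job", bash_command="{{ command }}")
''')

# Render one .py DAG file per legacy job; MWAA picks these up from its S3 dags/ folder.
with open(f"dags/{job['dag_id']}.py", "w") as fh:
    fh.write(DAG_TEMPLATE.render(**job))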

Senior Data Engineer May 2022 to Feb 2023
ClearScale, LLC San Francisco, CA, USA
Clients served: (1) Experian, CA, USA (2) ReTool, CA, USA
* Migrated several PySpark, Scala, and Java programs from Cloudera Hadoop to Amazon EMR; using Maven, recompiled the programs with the applicable AWS JARs
* Built data pipelines to perform ETL operations on CSV data files; part of the data was sent to Solr for search indexing
* Wrote several Python (AWS Lambda and Step Functions) and PySpark programs for data manipulation and cleansing
* Orchestrated program execution on AWS EMR using ZooKeeper and Oozie workflows; created and modified shell scripts to submit Spark jobs
* Migrated large volumes of data from Azure PostgreSQL to AWS PostgreSQL and Aurora using pg_dump, pg_restore, pglogical, and other SQL scripts (dump / restore step sketched after this list).
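
A minimal sketch of the dump-and-restore step referenced above, assuming hypothetical source and target endpoints (credentials would come from PGPASSWORD / .pgpass; pglogical handled ongoing replication separately):

import subprocess

SRC = "source-db.postgres.database.azure.com"              # illustrative
DST = "target-db.cluster-abc.us-east-1.rds.amazonaws.com"  # illustrative

# Dump in custom format (-Fc) so pg_restore can parallelize the load.
subprocess.run(
    ["pg_dump", "-Fc", "-h", SRC, "-U", "admin", "-d", "appdb",
     "-f", "appdb.dump"],
    check=True,
)

# Restore with parallel jobs; --no-owner avoids role mismatches across clouds.
subprocess.run(
    ["pg_restore", "-h", DST, "-U", "admin", "-d", "appdb",
     "--no-owner", "-j", "4", "appdb.dump"],
    check=True,
)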

Senior Data Engineer Jun 2021 to Apr 2022
BroadRidge Financial Solutions / IntelliBus, Newark, NJ, USA
* Built data pipelines to receive customer data (in CSV format), convert it to Parquet, perform ETL processing, load it into Redshift, and generate CSV files on S3 for consumption by upstream systems (conversion step sketched after this list)
* Extensively used Redshift to create internal and external tables, and RDS (with SQL Server / PostgreSQL); created several SQL scripts / stored procedures to move data across systems; used AWS Glue for storing schemas, etc.
* Developed a data pipeline to move data from Kafka streams to S3 and then to DynamoDB (NoSQL); the "Amazon S3 Sink Connector" was used to facilitate this.
* Created and modified Step Functions / Lambda functions using Python, with libraries such as NumPy, pandas, and boto3
* Leveraged several EC2 instances for various activities; wrote / modified shell scripts to automate file transfers and batch runs
* Prepared a POC evaluating Apache Iceberg with PySpark programs to improve performance and reduce data storage requirements
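
A minimal sketch of the CSV-to-Parquet conversion step in the pipeline above, assuming hypothetical S3 paths and a synthetic load_date partition column:

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("csv-to-parquet").getOrCreate()

# Read the customer-supplied CSV drop (paths are illustrative).
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("s3://example-landing/incoming/*.csv")
      .withColumn("load_date", F.current_date()))

# Write columnar Parquet, partitioned for downstream Redshift COPY / Spectrum reads.
(df.write
   .mode("overwrite")
   .partitionBy("load_date")
   .parquet("s3://example-curated/parquet/"))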

Senior Data Engineer Apr 2020 to May 2021
PNC Financial Services / Medigrity Innovations, PA, USA
* Designed, developed, implemented and tuned complex data pipelines for large-scale distributed systems for a large US based financial services organization, using Apache Spark, Scala on Amazon AWS (using S3, EMR, etc.)
* Performed data analysis and developed analytic solutions; investigated data to discover correlations / trends and explain them
* Worked with data engineers and data architects to define back-end requirements for data products (aggregations, materialized views, tables, visualization)
* Prepared data pipelines supporting new-customer acquisition and personalized marketing for the client's existing customers; by making existing customer data efficiently analyzable, this improved the client's campaign management by 20%
* Authored Python (PySpark) scripts with custom UDFs for row / column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks (UDF pattern sketched after this list); migrated data from on-prem storage to AWS S3 buckets
* Architected and designed serverless application CI/CD using the AWS Serverless Application Model (Lambda).
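
A minimal sketch of the custom-UDF pattern mentioned above, using a hypothetical column-cleaning rule (the real UDFs implemented project-specific conforming logic):

from pyspark.sql import SparkSession, functions as F, types as T

spark = SparkSession.builder.appName("udf-demo").getOrCreate()

# Hypothetical cleaning rule: normalize free-text state codes to two letters.
@F.udf(returnType=T.StringType())
def clean_state(raw):
    return (raw or "").strip().upper()[:2]

df = spark.createDataFrame([(" tx ",), ("CA",)], ["state_raw"])
df.withColumn("state", clean_state("state_raw")).show()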

Sr. Data Engineer Oct 2015 to Mar 2020
Medigrity Innovations, Hyderabad, India
Clients served: ITC Ltd, India; ICICI Lombard, India; SurgiKart.com, India
* For a large consumer goods manufacturing conglomerate in India, created data pipelines (for data requested by data scientists) to augment demand forecasting, optimal pricing of end products, packaging parameters, and effective resource utilization of manufacturing processes
* Migrated data warehouse data from Oracle RDBMS to the Amazon Redshift cloud data warehouse, using Amazon S3, CSV files, and schema design and creation at the destination. This project also involved fine-tuning Oracle table structures and cleaning up unwanted and redundant data before sending the schema and table data to AWS
* Implemented scalable, fault-tolerant live data streams with Kafka and Spark Streaming for a high-traffic eCommerce site (ingestion pattern sketched after this list); also worked on recommendation systems with machine learning (ALS) and MLlib; Apache Hadoop, HDFS, and Spark with Scala were used in this project
* Designed and built various data pipelines for intelligently utilizing health insurance data for a large health insurance provider in India; on-prem Hadoop clusters were used in this project
* Automated data loading into the Hadoop Distributed File System with Oozie workflows
* Used Sqoop to move data between RDBMS sources and HDFS
* Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation across multiple file formats.
* Used Docker to run and deploy applications across multiple containers, orchestrated with tools such as Docker Swarm
* Prepared a PoC evaluating Pentaho vs. Informatica as the tool for building the data warehouse
* Many of the projects were hosted on the Databricks platform on AWS
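
A minimal sketch of the Kafka-to-Spark streaming ingestion referenced above, written against the current Structured Streaming API (broker, topic, and HDFS paths are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-ingest").getOrCreate()

# Subscribe to the live event topic (names are illustrative).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .load()
          .selectExpr("CAST(value AS STRING) AS payload"))

# Land raw events on HDFS; checkpointing provides the fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/clickstream/")
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream/")
         .start())
query.awaitTermination()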

Technical Manager / Service Delivery Manager Aug 2009 to Jun 2011
Apps Associates Private Limited, Hyderabad, India
* As a technical manager, supervised the analysis, design, and implementation of technical projects for various clients in the USA, in the areas of Oracle EBS Applications (R11 / R12) implementations, Oracle Business Intelligence / OBIEE, data warehousing, etc.
* Work also included database performance tuning and creating and presenting proofs of concept (POCs)
* Achieved close to 100% customer satisfaction in addressing and resolving issues / pending tasks and in analyzing and mitigating risks

Sr. Oracle EBS Applications Consultant Oct 2001 to May 2009
Self Employed, Massachusetts, USA
Clients served:
Accellent, Inc., Wilmington, MA, USA
Caliper Life Sciences, Hopkinton, MA, USA
Ametek Aerospace & Defense, Wilmington, MA, USA
Mercury Computer Systems, Chelmsford, MA, USA
Sycamore Networks, Inc., Chelmsford, MA, USA
NetScout, Inc., Westford, MA, USA
Staples, Inc., Framingham, MA, USA
Hollingsworth & Vose, East Walpole, MA, USA
MKS Instruments, Andover, MA, USA
* Implemented and customized various Oracle ERP modules (SCM / Financials / HRMS / CRM) spanning multiple versions
* Performed data mapping and conversion from legacy systems into Oracle Applications, along with report and workflow customizations and the design and development of custom modules. Data conversions were done with 99.999% accuracy
* Interfaced external applications / systems with various Oracle EBS modules; was instrumental in gathering requirements from stakeholders / end users and in designing, developing, and implementing solutions with 100% customer satisfaction; extensively used various Oracle tools - Forms and Reports, Discoverer, XML / BI Publisher, PL/SQL, Unix shell scripting, Oracle Configurator, and data warehousing (ETL)
* Prepared various functional and technical documents to capture requirements and define an execution plan using available tools and techniques; performance-tuned existing and newly built modules / functionalities to achieve significant program and process improvements

Previous organizations:
Atos Syntel - Sr. Technical Consultant
Various locations across USA, Dec 1996 - Sep 2001
Optimum Technologies - Software Engineer
Various locations across USA, Mar 1995 - Nov 1996

Certifications
AWS Certified Solutions Architect Associate
Amazon Web Services (AWS)
Databricks Certified Data Engineer Associate (# 169255084)
Databricks
Databricks Certified Assoc. Developer for Spark 3.0 - Python
Databricks
Generative AI Fundamentals Academy Accreditation
Databricks
Certified Scrum Master (CSM) (#1331234)
Scrum Alliance
Project Management Professional (PMP) (#315006)
Project Management Institute (PMI)
Certified in Production & Inventory Management (CPIM)
Association of Supply Chain Management
ITIL V4 Foundation Certified
Axelos