
Sri Sindhu - Senior Data Engineer
[email protected]
Location: Iowa City, Iowa, USA
Relocation: Open (515-605-7328)
Visa: H4EAD
Sri Ravuru
Senior Data Engineer
Email: [email protected] | Ph #: 515-605-7328
LinkedIn: linkedin.com/in/sindhu-r-698430269

PROFESSIONAL SUMMARY
9+ years of expertise in data engineering and data science, with a focus on developing scalable end-to-end ETL/ELT pipelines covering data collection, ingestion, transformation, modeling, integration, and analytics for structured and unstructured data sources (a minimal pipeline sketch follows this summary).
Extensive hands-on experience with the Hadoop ecosystem (HDFS, MapReduce, Spark, Scala, Hive, Pig, Sqoop, Flume, Oozie, Impala, HBase, YARN) and real-time data streaming with Kafka, Storm, and Spark Streaming.
Extensive expertise in building secure and scalable cloud-native data systems using AWS (EC2, S3, EMR, RDS, Redshift, Glue, Lambda, IAM, CloudWatch, SQS, SNS), Azure (ADF, Data Lake, Databricks), and GCP (Compute Engine, Cloud Storage, Cloud SQL) services.
Experience developing batch and real-time data pipelines in PySpark, Spark SQL, Scala, and Python, as well as orchestrating workflows with Airflow, NiFi, AWS Step Functions, and Azure Data Factory.
Deep understanding of data warehousing and dimensional modeling (Star Schema, Snowflake Schema), as well as the creation of enterprise data lakes and optimized data marts for analytics and business intelligence reporting.
Practical knowledge of Snowflake (SnowSQL, Snowpipe) and Amazon Redshift, including performance tuning via complex SQL queries, stored procedures, indexing, and query optimization techniques.
Extensive expertise with NoSQL and RDBMS databases such as MongoDB, Cassandra, DynamoDB, MySQL, PostgreSQL, Oracle, and SQL Server, ensuring data integrity, migration, and validation.
Used Scikit-learn, TensorFlow, Keras, PyTorch, and SageMaker to build regression, clustering, PCA, SVM, decision-tree, and deep learning models for predictive analytics and business insights.
Experience with data preparation, feature engineering, exploratory data analysis (EDA), statistical modeling, and large-scale data transformations with NumPy, Pandas, and PySpark.
Created interactive dashboards and reporting solutions with Tableau, AWS QuickSight, and Data Studio, allowing business stakeholders to gain real-time insights.
Analyzed complex relational datasets in PostgreSQL to identify data patterns, anomalies, and optimization opportunities supporting enterprise analytics initiatives.
Implemented CI/CD and DevOps practices using Git, Jenkins, and cloud automation tools to deliver scalable data products with monitoring, logging, and performance optimization.
Strong grasp of Agile processes and excellent communication skills, with experience coaching team members and delivering enterprise-grade, secure, and high-performance data solutions.
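The sketch below illustrates the end-to-end batch ETL/ELT pattern summarized above: extract raw files from object storage, apply typed transformations, and write partitioned Parquet for analytics. It is a minimal illustration; the bucket names, paths, and columns are hypothetical, not details from any engagement listed in this resume.

```python
# Minimal PySpark batch ETL sketch: raw CSV in, curated partitioned Parquet out.
# All bucket names, paths, and column names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-etl-sketch").getOrCreate()

# Extract: read raw structured data from object storage.
raw = spark.read.option("header", True).csv("s3a://example-raw-bucket/orders/")

# Transform: cast types, drop rows missing the business key, derive a partition column.
clean = (
    raw.withColumn("order_ts", F.to_timestamp("order_ts"))
       .withColumn("amount", F.col("amount").cast("double"))
       .dropna(subset=["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
)

# Load: write partitioned Parquet for downstream analytics and BI.
clean.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3a://example-curated-bucket/orders/"
)
```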

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, YARN, Pig, Hive, Kafka, Apache NiFi, Flume, Sqoop, Spark Core, Spark SQL, Spark Streaming, HBase, Oozie, ZooKeeper
Languages: Python, SQL, R, Scala
Databases: Oracle, MySQL, MS SQL Server, PostgreSQL, Teradata, Sydata, SAS Studio
NoSQL Databases: MongoDB, Cassandra
Machine Learning Libraries: TensorFlow, PyTorch, Scikit-learn, Keras
ETL Tools: Informatica, DataStage, AWS Glue
Reporting Tools: Tableau, Power BI, Cognos
Version Control Tools: Git, TortoiseSVN
Visualization Tools: Tableau, Python (Matplotlib, Seaborn)
IDE/Testing Tools: Eclipse, IntelliJ, PyCharm, Anaconda, RStudio
DevOps: Docker, Kubernetes, Terraform, CI/CD pipelines

Brown Brothers Harriman Financial | New York, NY | Sept 2023 – Present
Senior Data Engineer
Responsibilities:

Designed and developed scalable batch and real-time data pipelines for large-scale data ingestion, transformation, and analytics using Hadoop (HDFS, MapReduce, Hive, Pig, HBase), Spark (PySpark, Spark SQL, Scala), Databricks, and Kafka/Kinesis.
Designed and implemented ETL/ELT pipelines using Azure Data Factory and Databricks to migrate Salesforce data (Sales Cloud, Service Cloud, SFMC) into cloud-based data warehouses. Led end-to-end Salesforce data migration projects, including extraction from multiple Salesforce orgs, transformation, and loading into Azure Synapse for analytics and reporting.
Designed, developed, and maintained scalable ETL/ELT pipelines to process large volumes of structured and unstructured data and optimized complex SQL queries for data extraction, transformation, and performance tuning.
Developed and optimized scalable data pipelines using Python and SQL, enabling efficient ingestion, transformation, and processing of large-scale structured and unstructured datasets.
Built robust ETL/ELT pipelines leveraging tools such as Apache Spark, Hadoop, and Airflow, ensuring high data availability, reliability, and performance across distributed systems. Implemented real-time data streaming solutions using Apache Kafka, improving data latency and enabling near real-time analytics and decision-making.
Designed and developed scalable ETL/ELT pipelines using PySpark, Python, and SQL to process large volumes of structured and unstructured data. Built data lake solutions on Oracle Cloud Infrastructure (OCI), enabling centralized storage, governance, and analytics-ready datasets.
Integrated data from diverse sources including IoT devices, sensors, MES, ERP systems, and relational databases, ensuring seamless ingestion and consistency. Automated data workflows using Apache Airflow, improving pipeline reliability, scheduling, and monitoring.
Leveraged PySpark for distributed data processing, optimizing performance through partitioning, caching, and parallel execution (see the tuning sketch at the end of this section). Collaborated closely with data scientists, analysts, and business stakeholders to deliver high-quality datasets supporting advanced analytics and ML use cases.
Implemented data quality checks, validation rules, and governance frameworks, ensuring accuracy, consistency, and compliance across data pipelines. Optimized processing performance by tuning queries, improving job efficiency, and reducing data latency and infrastructure costs.
Developed and maintained CI/CD pipelines for data engineering workflows, enabling automated deployment, testing, and version control. Used Linux and shell scripting to automate batch processing, job scheduling, and system-level operations.
Worked with big data ecosystem tools including Kafka, Spark, Hadoop, and Databricks to build robust, scalable data solutions. Guided offshore and junior team members, enforcing coding standards and best practices and conducting code reviews.
Engineered Spark (PySpark) data processing workflows to handle large datasets, optimizing performance through partitioning, caching, and parallel processing. Collaborated with data science teams to develop and integrate machine learning models using frameworks such as TensorFlow, PyTorch, and Scikit-learn, supporting the end-to-end ML lifecycle.
Ensured data quality, validation, and consistency through automated checks and reconciliation processes and collaborated with cross-functional teams to gather requirements and deliver data-driven solutions.
Built and supported data warehouses and data lakes for enterprise analytics and implemented CI/CD pipelines for data workflows to ensure efficient deployment and version control.
Designed, developed, and maintained end-to-end data solutions supporting advanced analytics initiatives and data-driven decision-making across multiple business domains. Built scalable data pipelines and ETL/ELT workflows in Python, Java, SQL, and Scala to process large volumes of structured and unstructured data from multiple sources.
Created AI/ML solutions with Python, Pandas, NumPy, Scikit-learn, TensorFlow, Keras, and NLTK, including regression, clustering (K-Means, Gaussian Mixture), Random Forest, KNN, time-series forecasting, and real-time prediction models (buy probability, customer segmentation).
Proficient in working with MongoDB, Cassandra, and DynamoDB, including designing NoSQL data models for high-volume applications, data migration, API integration, and ensuring security and governance compliance.
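A minimal sketch of the partitioning and caching techniques named in the bullets above: repartitioning by the aggregation key to control shuffle, and caching a DataFrame reused by multiple downstream jobs. The dataset, key column, and partition count are illustrative assumptions.

```python
# Sketch of common Spark performance patterns: repartition + cache + reuse.
# Paths, columns, and the partition count are illustrative, not project values.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("perf-tuning-sketch").getOrCreate()

events = spark.read.parquet("s3a://example-bucket/events/")

# Repartition by the grouping key so each aggregation shuffles evenly.
events = events.repartition(200, "customer_id")

# Cache once, since two independent aggregations read the same DataFrame.
events.cache()

daily = events.groupBy("customer_id", F.to_date("event_ts").alias("day")).count()
totals = events.groupBy("customer_id").agg(F.sum("amount").alias("total_amount"))

daily.write.mode("overwrite").parquet("s3a://example-bucket/daily_counts/")
totals.write.mode("overwrite").parquet("s3a://example-bucket/customer_totals/")

events.unpersist()
```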
Affinity Bank | Atlanta, GA | June 2021 – Aug 2023
Senior Data Engineer
Responsibilities:

Developed scalable batch and real-time data pipelines with Spark (PySpark/Scala/Spark SQL), Kafka, Flink, Kinesis, and Apache Iceberg, allowing for fault-tolerant streaming and large-scale data processing across cloud platforms.
Created and maintained technical documentation, including architecture diagrams, data flow designs, and code documentation, and collaborated with stakeholders and development teams using Jira and Confluence for tracking and knowledge sharing.
Optimized query performance by analyzing execution plans, indexing strategies, and partitioning techniques. Supported BI reporting solutions using PeopleSoft data to track KPIs and business performance.
Developed high-performance data processing applications using Scala and Apache Spark, handling large-scale structured and unstructured datasets. Implemented functional programming patterns and immutable data structures for scalable, maintainable code.
Leveraged SQL and modern analytics platforms (Databricks, cloud data lakes) to extract, validate, and integrate data for downstream analytics and reporting.
Ensured adherence to data governance, security, and compliance standards using IAM roles, encryption, and access controls. Documented ETL processes, data flow diagrams, and pipeline configurations for maintainability and knowledge sharing.
Built and optimized ETL/ELT pipelines using tools such as Informatica, Talend, Azure Data Factory (ADF), and Apache Spark for seamless data ingestion from Salesforce, and modeled and transformed Salesforce data into analytics-ready star/snowflake schemas for reporting in BI tools such as Power BI and Tableau.
Built and optimized data models in Snowflake and BigQuery, leveraging clustering, partitioning, and materialized views for high-performance analytics and cost-efficient storage.
Built and maintained web applications using Django, designing modular, developer-centric solutions with robust backend functionality. Designed, developed, and integrated REST and GraphQL APIs to enable seamless communication between microservices, third-party applications, and client systems.
Set up and managed Databricks clusters, refactored ETL notebooks, and developed PySpark transformations, including hashing and encryption approaches for sensitive data compliance and governance.
Developed and integrated RESTful APIs for data ingestion and system integration, enabling real-time data exchange between applications. Implemented CI/CD pipelines for data engineering workflows using DevOps practices to automate code deployment, testing, and environment management.
Streamlined microservice and containerized data application deployments by automating CI/CD pipelines with Jenkins, GitHub, Docker, Kubernetes, and integrated DevOps tools (Jira, Slack).
Designed and built REST APIs in Python (Flask) backed by PostgreSQL and MongoDB, providing high-performance data access layers and NoSQL solutions for scalable applications (a minimal Flask sketch follows this section).
Used extensive data modeling, mapping, and transformation techniques, including SQL validation checks (duplicates, null handling, and aggregations), to ensure high data quality across ETL cycles.
Created AI/ML solutions utilizing Python, Pandas, NumPy, and Scikit-learn, including regression models, forecasting, customer segmentation, and real-time purchase probability prediction models for business intelligence.
Created interactive BI dashboards with Tableau and Qlik, combining multiple data sources (SQL, Salesforce, APIs, and Excel) to support executive reporting and decision-making.
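A minimal Flask-plus-PostgreSQL sketch of the REST data access layer pattern described above. The connection settings, table, and route are hypothetical stand-ins, not the actual service.

```python
# Minimal Flask REST API sketch backed by PostgreSQL.
# Connection settings, table, and columns are hypothetical placeholders.
from flask import Flask, jsonify
import psycopg2

app = Flask(__name__)

def get_conn():
    # Assumes a local PostgreSQL instance; real deployments would read config/secrets.
    return psycopg2.connect(
        host="localhost", dbname="analytics", user="app", password="secret"
    )

@app.route("/customers/<int:customer_id>", methods=["GET"])
def get_customer(customer_id):
    conn = get_conn()
    try:
        with conn.cursor() as cur:
            # Parameterized query guards against SQL injection.
            cur.execute(
                "SELECT id, name, segment FROM customers WHERE id = %s",
                (customer_id,),
            )
            row = cur.fetchone()
    finally:
        conn.close()
    if row is None:
        return jsonify({"error": "not found"}), 404
    return jsonify({"id": row[0], "name": row[1], "segment": row[2]})

if __name__ == "__main__":
    app.run(debug=True)
```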
Kin Insurance Company | Chicago, IL | Nov 2018 – May 2021
Data Engineer
Responsibilities:
Migrated data pipelines from legacy systems to AWS Snowflake using DBT, Glue, and Python.
Built scalable ELT processes on Redshift, enhancing load and query efficiency.
Migrated on-prem data ingestion workflows to ADF and Snowflake, improving performance and maintainability.
Automated data extraction and ingestion from relational databases and flat files using ADF and Python scripts.
Developed and maintained data ingestion frameworks integrating AWS S3 and Snowflake.
Ensured data security, compliance, and governance in cloud deployments and API integrations.
Mentored junior engineers on Python development, Django frameworks, API best practices, and cloud deployment strategies.
Enhanced data ingestion monitoring and alerting with ADF triggers and log analytics for operational visibility.
Implemented real-time stream ingestion using Kafka and Spark Streaming for transactional data feeds.
Collaborated with cross-functional teams to integrate cloud-based applications, data pipelines, and APIs, ensuring consistency, reliability, and adherence to best practices.
Worked on Spark SQL and Scala to replace legacy Hive queries, improving performance.
Developed One Lake ingestion workflows using AWS Lambda and S3 for Capital One data lake.
Built and scheduled Control-M workflows for data orchestration and batch processing.
Integrated SSIS with Azure Data Factory for hybrid ETL workflows in cloud-native environments.
Implemented real-time stream processing using Apache Spark Structured Streaming integrated with Apache Kafka and AWS S3 (see the streaming sketch at the end of this section).
Utilized Jira and Confluence in Agile-based sprint planning and documentation.
Engineered Delta Lake structures on Databricks and enabled incremental loads using ADF.
Conducted performance tuning for Spark and Snowflake jobs, reducing processing times.
Built reusable PySpark modules and unit-tested data transformations.
Wrote scalable ETL jobs using Apache Spark in PySpark and Scala to handle structured and semi-structured data (JSON, Parquet, Avro).
Collaborated with business analysts to convert business logic into efficient ETL workflows.
Handled JSON/XML data parsing and transformation for ingestion into Redshift.
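A sketch of the Kafka-to-S3 Structured Streaming pattern referenced above: consume JSON events from a topic, parse them against an explicit schema, and append Parquet to S3 with checkpointing. The broker address, topic, schema, and paths are assumptions for illustration.

```python
# Spark Structured Streaming sketch: Kafka topic -> parsed JSON -> Parquet on S3.
# Broker, topic, schema, and paths are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.appName("stream-sketch").getOrCreate()

schema = StructType([
    StructField("txn_id", StringType()),
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
])

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
)

# Kafka delivers raw bytes; cast and parse the JSON payload into typed columns.
parsed = stream.select(
    F.from_json(F.col("value").cast("string"), schema).alias("txn")
).select("txn.*")

# Checkpointing allows the file sink to recover cleanly across restarts.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "s3a://example-bucket/transactions/")
    .option("checkpointLocation", "s3a://example-bucket/checkpoints/transactions/")
    .outputMode("append")
    .start()
)
query.awaitTermination()
```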
New York Life Insurance Company | New York, NY | May 2017 – Oct 2018
ETL Developer
Responsibilities:

Designed and developed ETL processes using Informatica and SQL for financial data systems.
Performed data extraction, transformation, and loading from multiple relational sources.
Developed star and snowflake schema models to support BI reporting and analytics.
Implemented job scheduling and monitoring using Control-M and Autosys.
Developed complex SQL queries and stored procedures for data transformation logic.
Ensured data consistency and accuracy through automated validation scripts (a validation sketch follows this section).
Prepared detailed technical documentation for ETL design, mapping, and testing.
Implemented reusable mappings and workflows to optimize development time.
Collaborated with QA and UAT teams for validation and deployment support.
Monitored ETL workflows for failures and executed root cause analysis for incident resolution.
Enhanced performance through tuning of database queries and indexes.
Automated report generation for daily, weekly, and monthly data quality summaries.
Worked closely with business users to define and refine ETL specifications.
Supported production deployments and resolved post-implementation issues.
Contributed to the migration of legacy ETL systems to Informatica PowerCenter.
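A sketch of the kind of automated validation script mentioned above: duplicate-key and null checks expressed as SQL and reported as pass/fail. The connection string, table, and column names are hypothetical.

```python
# Automated data validation sketch: run SQL checks and report PASS/FAIL.
# Connection string, table, and column names are hypothetical placeholders.
import sqlalchemy as sa

engine = sa.create_engine("postgresql://app:secret@localhost/warehouse")

CHECKS = {
    "duplicate_policy_ids": """
        SELECT COUNT(*) FROM (
            SELECT policy_id FROM fact_policy
            GROUP BY policy_id HAVING COUNT(*) > 1
        ) dupes
    """,
    "null_policy_ids": "SELECT COUNT(*) FROM fact_policy WHERE policy_id IS NULL",
}

with engine.connect() as conn:
    for name, sql in CHECKS.items():
        offending = conn.execute(sa.text(sql)).scalar()
        status = "PASS" if offending == 0 else "FAIL"
        print(f"{name}: {status} ({offending} offending rows)")
```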



Educational Details:
Master of Business Administration (MBA), National Institute of Technology Warangal, 2011
