| Suman Lamichhane - GCP Data Engineer |
| [email protected] |
| Location: Queens, New York, USA |
| Relocation: Anywhere |
| Visa: |
| Resume file: suman_lamichhane_resume_1767743321210.docx |
|
Suman Lamichhane
Email: [email protected] Ph: 718-878-8875 GitHub: https://github.com/sumanlamichhane984 PROFESSIONAL SUMMARY Results-driven Senior Data Engineer with 5+ years of experience architecting and operationalizing cloud-native data platforms, streaming pipelines, and LakeHouse solutions across finance, healthcare, and telecom domains. Proven expertise in building scalable ETL/ELT workflows using Apache Spark, Flink, Kafka, Airflow, and dbt, supporting highthroughput batch and real-time analytics. Hands-on with multi-cloud ecosystems (AWS, Azure, GCP) and services like Redshift, Snowflake, EMR, S3, Glue, Synapse, and BigQuery. Experienced in Kubernetes-based deployments, IaC with Terraform, and data governance using AWS Lake Formation, Azure Purview, and Great Expectations. Adept at implementing microservice architectures, API-driven ingestion, and PoC-based iteration using Agile. Skilled in Python, SQL, and Shell scripting with a strong focus on data quality, lineage, and compliance (HIPAA, GDPR, SOX). Technical skills Operating Systems: Linux, Unix, Ubuntu, Windows, macOS Cloud Platforms: AWS (EC2, S3, Glue, Lambda, EMR, Redshift, Athena, Kinesis, MSK, CloudFormation, Secrets Manager, QuickSight, Lake Formation), Azure (ADF, Synapse, Databricks, ADLS, Functions, Monitor, Key Vault, Purview, Azure SQL, Logic Apps, API Management, AKS, Bicep), GCP (BigQuery, Dataflow, Pub/Sub, Cloud SQL, Composer, Deployment Manager, Cloud Storage, Dataproc) Big Data & Streaming: Apache Spark, PySpark, Hadoop, Hive, Pig, HDFS, MapReduce, Kafka, Kafka Connect, Flume, NiFi, Delta Lake, Zookeeper, Storm, Impala, Sqoop, Apache Flink, Apache Hudi, Apache Iceberg ETL & Orchestration: dbt, Informatica, SSIS, Airflow, Azure Data Factory, AWS Glue, GCP Composer, Oozie Programming & Scripting: Python, Scala, SQL, PL/SQL, Java, Bash, Shell Scripting, HiveQL, SAS, KQL, UNIX Machine Learning & AI: MLflow, TensorFlow, PyTorch, Spark MLlib, Scikit-learn, XGBoost, CatBoost, LightGBM, Decision Trees, Random Forest, SVM, Na ve Bayes, PCA, LDA, K-Means, KNN, Logistic/Linear Regression, CNN, RNN, LSTM, GRU Databases & Storage: Snowflake, Redshift, Synapse, BigQuery, Azure SQL, SQL Server, MySQL, PostgreSQL, Oracle, DB2, Teradata, MongoDB, Cassandra, DynamoDB, Cosmos DB, Elasticsearch Data Quality & Testing: Great Expectations, dbt, pytest, data profiling, schema validation, threshold checks DevOps, IaC & CI/CD: GitHub Actions, Jenkins, Azure DevOps, GitLab, Docker, Kubernetes (Cluster Admin, Helm), Terraform, Maven, AWS CloudFormation, Azure Bicep, GCP Deployment Manager, Prometheus Metadata, Governance & Security: Apache Atlas, Azure Purview, AWS Lake Formation, Alation, Collibra, Looker, Data Catalog, RBAC, PII Masking, Data Lineage, GDPR, SOC 2, HIPAA, PCI DSS, SOX Visualization & Monitoring: Power BI, Tableau, Looker, Amazon QuickSight, Kibana, Grafana, Azure Monitor, Elasticsearch, CloudWatch, Google Data Studio, SAP Project Management & Compliance: Agile (Scrum, Kanban), JIRA, Confluence, ServiceNow Education Master s in business administration (MBA), Business Analytics: The University of Findlay, Findlay/OH. PROFESSIONAL WORK EXPERIENCE JP Morgan Chase Bank New York, NY Senior Data Engineer | Aug 2022 Present Designed and scaled product analytics and telemetry data platforms on AWS to support deep analysis of high-volume user interaction and event-level datasets. 
- Built end-to-end data ingestion and transformation pipelines using Python, SQL, PySpark, and Databricks, enabling advanced product analytics and player behavior modeling.
- Engineered event-driven data models optimized for retention analysis, feature adoption tracking, and user lifecycle analytics.
- Processed large-scale semi-structured telemetry data to uncover usage patterns, player relationships, and behavioral trends across digital platforms.
- Enabled experimentation frameworks by delivering analytics-ready datasets for A/B testing, hypothesis testing, and impact measurement.
- Developed scalable time-series analytics pipelines supporting engagement metrics, churn analysis, and longitudinal player behavior studies.
- Partnered closely with Product, Engineering, and Analytics teams to translate ambiguous product questions into measurable data science solutions.
- Supported machine learning workflows by building feature engineering pipelines for classification, clustering, and anomaly detection models.
- Implemented data quality validation frameworks using Great Expectations, ensuring accuracy and reliability of analytical insights.
- Optimized complex Snowflake and Redshift queries to support high-performance analytical workloads on large telemetry datasets.
- Automated analytics workflows using dbt, enforcing version control, testing, and documentation best practices.
- Integrated streaming data architectures using Apache Kafka and Spark Structured Streaming to deliver near real-time insights.
- Implemented governance, RBAC, PII masking, and audit readiness using AWS Lake Formation across analytical datasets.
- Designed medallion architecture layers to separate raw telemetry, refined analytics, and model-consumption datasets.
- Built Power BI and QuickSight semantic layers to support self-service analytics for non-technical stakeholders.
- Automated infrastructure provisioning using Terraform, improving scalability and reliability of analytics environments.
- Monitored pipelines using CloudWatch, enabling proactive incident response and performance optimization.
- Authored detailed technical documentation explaining data models, analytics logic, and modeling assumptions.
- Actively contributed to architecture reviews, sprint planning, and backlog grooming within Agile teams.
- Mentored junior engineers on analytics engineering, product data modeling, and best practices for large-scale telemetry analytics.

CareFair Healthcare - Columbus, OH
Data Engineer | May 2020 - July 2022
- Built cloud-native product analytics platforms in Azure, enabling deep insights into user interactions and digital product performance.
- Designed scalable ETL/ELT pipelines using Azure Data Factory, PySpark, and SQL to process large, diverse event datasets.
- Modeled analytics-ready schemas in Snowflake and Azure Synapse Analytics to support funnel analysis, cohort analysis, and engagement metrics.
- Analyzed behavioral and operational datasets to identify usage trends, drop-off points, and opportunities to improve user experience.
- Partnered with Product, Design, and Operations teams to define core product KPIs and success metrics.
- Enabled experiment design and analysis by delivering clean, consistent datasets for feature testing and rollout evaluation.
- Built reusable data models supporting regression analysis, classification, and trend forecasting.
- Implemented data validation and anomaly detection checks using Great Expectations, improving confidence in analytical outputs.
- Developed event-driven ingestion pipelines using Kafka and Azure Event Hubs for near real-time analytics.
- Enforced HIPAA, GDPR, and enterprise data governance standards using Azure Purview, RBAC, and encryption controls.
- Automated transformations and metric definitions using dbt, ensuring consistency across reports and analyses.
- Delivered interactive Power BI dashboards used by leadership to monitor product performance and user engagement.
- Migrated legacy analytics workloads into Azure Data Lake, improving scalability and performance.
- Supported machine learning initiatives by delivering curated historical datasets for training and validation.
- Authored comprehensive technical documentation covering pipelines, analytics logic, and compliance requirements.
- Collaborated in cross-functional Agile ceremonies, including architecture reviews and sprint planning.
- Reduced manual reporting through data automation and standardized analytics workflows.
- Enabled self-service analytics by delivering well-documented, analytics-ready datasets.
- Supported stakeholder alignment by translating analytical findings into actionable insights.
- Contributed to long-term analytics platform roadmap planning and modernization initiatives.