
Frank Zhu - DevOps/Cloud Engineer
[email protected]
Location: Vancouver, British Columbia, Canada
Relocation: Any
Visa: Any
CHUNYANG (FRANK) ZHU
+1 (236) 999-9661 Burnaby, BC, Canada [email protected]
SUMMARY
Senior SRE with deep experience in AI platform reliability, Kubernetes infrastructure, and cloud automation. Led
global operations for Microsoft AI Translator, driving 99.9% availability, secure model releases, multi-region Kubernetes
operations, and SLO-driven incident reduction.
EDUCATION
M.S. Cybersecurity, New York Institute of Technology - Vancouver 2023 - 2024
M.S. Computer Science, Washington University in St. Louis 2014 - 2016
B.S. Computer Science, University at Buffalo, SUNY 2010 - 2014
SKILLS
Cloud & Infra: Azure, Kubernetes (AKS), Docker, OpenShift
Programming: Python, Java, Go
Observability: Prometheus, Grafana, Splunk, Datadog
Databases & Msg: MongoDB, MySQL, Couchbase, Kafka
DevOps: Azure DevOps, CI/CD Pipelines, VMware, Terraform
EXPERIENCE
Microsoft Feb 2025 - Present
Site Reliability Engineer Lead (Contract via CSI Interfusion) Remote
Operated multi-region Linux/Kubernetes environments for global AI inference services, ensuring 99.9% availability
while performing node and VM-level troubleshooting beyond Kubernetes workloads.
Managed lifecycle operations for Linux VMs underpinning AKS clusters, including provisioning, patching, scaling,
node replacement, and resolving OS/network/storage issues impacting service health.
Designed and optimized production-grade AKS clusters with security hardening, autoscaling improvements, and
enhanced diagnostics, reducing MTTR by 50%.
Built automated CI/CD pipelines for microservices and AI models, enabling safe canary deployments, traceable
rollouts, and weekly global releases.
Led large-scale Linux patch automation and CVE remediation across distributed VM fleets, meeting Cyber EO
compliance and standardizing secure release governance.
Developed observability using Prometheus/Grafana, implementing SLO-based alerting, performance monitoring,
and integrating Exporter metrics into modern monitoring workflows.
Automated operational workflows using Python, Go, and Bash, improving deployment, monitoring, and troubleshooting
efficiency.
Coordinated a distributed 6-engineer SRE team, establishing DRI/on-call workflows, escalation processes, and
operational best practices across global services.
Branch Metrics 2020 - 2021
Solution Engineer Remote
Built and deployed mobile attribution integrations for enterprise clients, improving onboarding efficiency and
reducing integration time by 40%.
Designed end-to-end SaaS data flows using REST APIs, webhooks, and event-based pipelines to support high-scale
marketing analytics.
NVIDIA 2017 - 2018
System Software Engineer Shanghai, China
Developed and maintained microservices for the NVIDIA Gaming Platform (50M+ users), enabling scalable
game distribution and user services.
Rebuilt CI/CD pipelines with automated testing and structured logging, reducing deployment time and improving
release reliability.
Implemented service-level monitoring and debugging tools to enhance production visibility and accelerate issue
resolution.
Walmart Inc. 2016 - 2017
Programmer Analyst Bentonville, AR, USA
Built Azure-based Java microservices supporting robotics-driven inventory automation, improving system efficiency
and operational throughput.
Implemented event-driven workflows and messaging using Azure services to support large-scale warehouse operations.
Collaborated with robotics and infrastructure teams to ensure reliability, service scalability, and smooth CI/CD
delivery.
