Datheng Wang

DevOps Engineer | Cloud Engineer

About

  • Passionate, technology-driven, and self-motivated Full Stack Developer with expertise in cloud computing, DevOps culture and practice, and cloud-native solutions. My core mission is to transform systems into highly resilient, self-healing entities through advanced architectural design, deep automation, and cutting-edge engineering practices.
  • I build unbreakable systems and enjoy the thrill of solving complex challenges with technology.
Last updated: Wed Oct 29 2025

Experience

  • Summary:

    • Passionate, technology-driven System Architect and Technical Director who plays a critical role in making applications and infrastructure services reliable, visible, resilient, and self-healing for stakeholders, supporting daily operation, troubleshooting, performance analysis, and capacity planning through advanced architectural design, deep automation, and cutting-edge engineering practices.

    Responsibilities:

    • Architecture Design & Optimization: Spearhead the design, implementation, and optimization of High Availability (HA) and Disaster Recovery (DR) solutions across on-premise and public cloud environments, encompassing multi-AZ (Availability Zone) and multi-region architectures to eliminate single points of failure and ensure business continuity.
    • Infrastructure as Code (IaC) Leadership: Champion and drive Infrastructure as Code (IaC) practices, leveraging tools such as Terraform, ARM templates, and AWS CloudFormation to automate, version-control, and ensure repeatable, idempotent provisioning of all infrastructure components, achieving consistency and efficiency.
    • Observability & Monitoring Enhancement: Enhance system observability by establishing comprehensive metrics collection, alerting mechanisms, and centralized logging using platforms like Prometheus, Grafana, Alertmanager, OpenSearch, and the ELK stack, enabling real-time performance analysis, proactive issue detection, and rapid troubleshooting.
    • Automation & Operational Efficiency: Develop robust automation scripts and tools using Ansible, Bash, and Python to eliminate manual, repetitive tasks, thereby streamlining operational workflows, accelerating incident response, and improving recovery times.
    • Incident Management & Continuous Improvement: Participate in on-call rotations for critical incidents and lead blameless post-mortem processes to perform deep-dive root cause analysis, driving actionable insights and continuous improvement across systems and processes.
    • Software Engineering Principles & CI/CD: Apply strong software development domain knowledge, including design patterns, code structure, and programming languages, with expertise in continuous integration and deployment (CI/CD) pipelines using Git, GitHub/GitLab, Jenkins, and ArgoCD.
    • Team Leadership & Mentorship: Lead and mentor a team of engineers, fostering a culture of collaboration, continuous learning, and professional growth, while ensuring alignment with organizational goals and technical excellence.
    • Communication & ITSM Integration: Possess excellent written and verbal communication skills, with experience in ITOM/ITSM integration, specifically ServiceNow ITOM for event management and operational intelligence, alongside strong people management capabilities.
    • DevOps & Modern Ops Practices: Maintain deep awareness and implementation expertise in DevOps, DevSecOps, GitOps, and AIOps strategies, fostering a culture of automation, collaboration, and continuous delivery.

    Achievements:

    • Collaborated successfully with geographically dispersed teams across different countries, adapting communication to time zone and cultural differences during critical incident resolution.
    • Built an active-active architecture across Azure and Alibaba Cloud to support daily processing of over 100,000 orders for the PO system.
    • Built an on-premises high-availability Kubernetes cluster and implemented a Kubernetes elastic scaling strategy, keeping peak Pod startup latency under 500 ms.
    • Built a Proxmox bare-metal cluster to support millisecond-level scaling for over 100 containerized applications.
    • Enhanced system disaster recovery capability through Chaos Monkey drills, achieving zero business interruption throughout the year.
    • Developed a microservice monitoring system based on Spring Cloud, with an average response time of less than 200 ms.
    • Implemented OpenTelemetry, Prometheus + Grafana + Alertmanager, and an ELK log analysis platform, improving fault localization efficiency by 80%.
    • Designed an Argo CD continuous delivery pipeline to achieve 1,000 fault-free deployments per day.
    • Standardized deployments through Helm charts, reducing the environment consistency error rate by 90%.
    • Implemented GitOps practices, reducing the configuration drift rate from 15% per week to 0.5% per month.
    • Chaos Engineering
    • SRE
    • DevOps
    • CI/CD
    • GitOps
    • Linux
    • Docker
    • Kubernetes
    • Prometheus
    • OpenTelemetry
    • Grafana
    • Azure Cloud
    • Alibaba Cloud
    • AWS
    • EKS
    • Jenkins
    • ArgoCD
    • Helm
    • Terraform
    • Ansible
    • Bash
    • Python
    • Java
    • Spring Boot
    • Spring Cloud
    • ELK
  • Summary:

    • Collaborated with the business development team in China and worked with global teams, ensuring seamless alignment between business requirements and technical solutions such as server operation, storage capacity, application deployment, and SRE.

    Responsibilities:

    • Application Operational Management: Collaborated with development teams throughout the application lifecycle to ensure seamless deployment of new systems, maintaining production-grade quality and zero customer impact.
    • Change Management: Owned production environment governance under the ITIL 4 framework, enforcing compliance with Change Management, Incident Management, and Release Management policies.
    • Incident Management: Conducted proactive monitoring, root cause analysis, and resolution of production incidents, escalating to cross-functional teams when critical business continuity risks emerged.
    • Environment Patch Management: Executed systematic patching strategies to uphold security posture, ensuring 100% compliance with the latest vulnerability remediation protocols.
    • Mentored two management trainees, accelerating their technical growth and productivity.

    Achievements:

    • Led the migration of a legacy system from traditional physical servers to a virtualization platform (VMware vSphere), reaching a 60% virtualization rate and boosting scalability and reliability.
    • Passed ISO 27001 information security management certification.
    • Implemented Hierarchical Storage Management, significantly reducing total cost of ownership (TCO) for storage across HDS SAN, IBM SAN, Dell iSCSI, and NetApp NAS storage systems.
    • ITIL4
    • ISO 27001
    • TCO
    • Incident Management
    • Change Management
    • Security Patch Management
    • VMware vSphere
    • SAN Storage
    • NAS Storage
    • iSCSI Storage
    • Windows Server
    • Linux Server
  • Summary:

    • Worked with the development team on deployment, operation, and maintenance of the Sourcing Platform (the major business system), including HA servers, network, and storage systems.

    Achievements:

    • Oversaw the full infrastructure lifecycle of the Sourcing Platform, from design through daily monitoring, including server and storage selection, backup, and high-availability solutions. Collaborated closely with development teams to implement agile practices, enabling rapid, reliable iteration and accelerated version releases.
  • Summary:

    • During this time, the company grew from 50 to more than 500 people, and I rose to head of the company's IT department.

    Achievements:

    • I built the company's IT system from scratch, including the network system, server system, storage system, security system, OA system, and business system. I also led a team to maintain the company's IT system and provide technical support to the company's employees.

Projects

Skills

  • DevOps

    Git
    Gitea
    Docker
    Kubernetes
    bare metal k8s cluster
    EKS
    AKS
    GKE
    Helm
    Portainer
    Harbor
    Sonatype Nexus
    Terraform
    Ansible
    Bash Script
    CI/CD
    Argo CD
    Argo Workflows
    Argo Rollouts
    Jenkins
    GitLab CI/CD
    GitHub Actions
    Observability
    Prometheus
    Grafana
    Loki
    OpenTelemetry
    ELK Stack
    Elasticsearch
    Logstash
    Kibana
    Fluentd
  • Cloud Computing

    AWS
    Google Cloud
    Azure Cloud
    Alibaba Cloud
    Huawei Cloud
  • Backend

    Java
    Python
    Microservices
    Spring Boot
    Spring Cloud
    Spring Data JPA
    Spring Security
    Spring MVC
    Spring AOP
    Spring Cloud Gateway
    Spring Cloud Alibaba
    Spring AI
    Spring AI Alibaba
    MCP
    REST
  • Frontend

    Astro Web Framework
    WordPress
    Angular
    Vue.js
    Vite
    TypeScript
    JavaScript
    Bootstrap
    NG-Alain
    Tailwind CSS
    CSS
    HTML
    SPA
    SSG
    SEO
  • Database

    SQL
    ORM
    JPA
    MySQL
    PostgreSQL
    NoSQL
    MongoDB
    Redis
  • Testing

    JUnit
    TestNG
    AssertJ
    Mockito
    Spring Boot Test
    Jasmine
    Jest
    Postman
    TDD
  • Collaboration

    Git
    Agile
    Jira
    Teams
    Zoom
    Slack
  • OS

    Linux
    macOS
    Windows
  • Misc

    Kafka
    RocketMQ
    Regex
    Markdown
    MDX
    CLI

Education

  • Zhejiang University

    Computer Science and Technology, B.S.
    • Studied essential theories and expertise in computer science and information technology, connecting computer theory with applications, software with hardware, and engineering methodology with technology.
  • Guangdong University of Foreign Studies

    English Junior College
    • Studied the foundations of composition, critical thinking, and research in an English program covering all areas of literature and language.

Certificates

Languages

  • Chinese: Native speaker

  • English: Professional working proficiency

  • Cantonese: Native speaker

Interests

  • Hiking

  • Running

  • Reading