DevOps Engineer (Reliability)

Overview
Skills
  • Linux
  • Microservices
  • AWS
  • Kubernetes
  • Terraform
  • CDKTF
  • Chaos Mesh
  • Reliability engineering
Description

monday.com is looking for a Reliability Engineer to join our Reliability team. This role is integral to keeping our platform robust and dependable for millions of users worldwide.

About the Role

  • Maintain a comprehensive understanding of our service architecture and its dependencies.
  • Identify and mitigate risks associated with tightly coupled services and complex interconnections.
  • Lead service re-architecture initiatives to improve reliability and scalability.
  • Review new services and ensure they meet our reliability standards.
  • Advocate for chaos engineering: collaborate with R&D teams, build tools and environments, and improve system resilience.
  • Manage the full lifecycle of reliability tools and services in line with our architectural guidelines.
  • Collaborate with teams to define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs) that align with business goals and user expectations.
  • Our stack: Kubernetes, Datadog, Chaos Mesh, AWS, Terraform, CDKTF.

Requirements

  • Proven experience with Kubernetes (k8s) and Linux administration and internals.
  • Proven experience with microservice architectures and reliability engineering.
  • Deep understanding of reliability concepts (e.g., SLOs, SLIs, and service interconnections).
  • Strong background in incident response and resilience efforts.
  • Ability to collaborate across teams to drive reliability improvements.
  • Nice to have: prior experience with chaos engineering.
monday.com