Site Reliability Manager

No longer accepting applications

Overview

Job Type: Hybrid

Experience: 5 years

Job Position: Cloud/DevOps

Updated: Jun 09, 2025

Location: Center District

Salary: N/A

Skills

Bash
Go
Python
CI/CD
AWS
Kubernetes
Grafana
Terraform
Datadog
Prometheus

We are looking for a Site Reliability Engineering (SRE) & Production Team Leader to join our Engineering team: someone with a passion for observability, monitoring, automation, and high-availability systems, and a desire to solve complex technological challenges with a proactive approach to continuous improvement.

We use a varied and interesting technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.

In this position, you will use your expertise in building and scaling SRE operations, and will design, implement, and operate a world-class reliability strategy.

About Us

Check Point is a key player in the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.

Major Responsibilities

Design, build, and manage our SRE framework to ensure observability, resilience, and high availability.
Develop and automate solutions for proactive monitoring, incident response, and performance optimization.
Improve and maintain our alerting and monitoring stack, leveraging tools like Datadog, Prometheus, and Grafana.
Lead post-mortem analysis and implement continuous improvement initiatives.
Collaborate with DevOps, Engineering, and Product teams to ensure smooth and efficient delivery of reliable services.

Desired Background

SRE & Production Manager with 5+ years of experience in SRE, Production Engineering, or DevOps, including 2+ years in a leadership role.
Experience with monitoring and observability tools like Datadog, Prometheus, and Grafana.
A problem solver, capable of finding creative solutions and getting things done.
Fluent in incident management, RCA processes, and operational best practices.

It would be great if you also have:

Experience in high-scale distributed systems.
Background in security and compliance for cloud infrastructure.
Familiarity with AWS (EKS, EC2, RDS, S3, networking configurations).
Understanding of cost optimization and resource management in cloud environments.
Familiarity with machine learning or predictive analytics for proactive reliability management.
Proficiency in Python, Go, or Bash for automation and scripting.

Check Point Software Technologies

