We are looking for a Site Reliability Engineer (SRE) to join our Engineering team: someone with a passion for observability, monitoring, automation, and high-availability systems, and a desire to solve complex technical challenges with a proactive approach to continuous improvement.
We work with an interesting and varied technology stack: Kubernetes, Terraform, CI/CD pipelines, Datadog, Prometheus, and cloud-native architectures.
In this position, you will apply your expertise in building and scaling SRE operations to design, implement, and run a world-class reliability strategy.
About Us
Check Point is a key player in the network security field, striving to provide the leading SASE platform in the market. Our innovative approach, merging cloud and on-device protection, redefines how businesses connect in the era of cloud and remote work.
Major Responsibilities
- Develop and maintain our monitoring, alerting, and logging systems, ensuring high visibility into the production environment.
- Implement automation to improve system reliability, scalability, and efficiency.
- Troubleshoot and resolve production incidents, leading root cause analyses (RCAs) and implementing permanent fixes.
- Collaborate with software engineers and DevOps teams to enhance application performance and resilience.
- Continuously improve operational processes, focusing on reducing toil and improving reliability.
Desired Background
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Hands-on experience with monitoring and observability tools such as Datadog, Prometheus, and Grafana.
- Strong understanding of Linux systems, networking, and cloud-native architectures.
- Experience with Kubernetes, Terraform, and CI/CD pipelines.
- A problem solver, capable of finding creative solutions and getting things done.
- Well versed in incident management, RCA processes, and operational best practices.
It would be great if you also have:
- Experience with high-scale distributed systems.
- A background in security and compliance for cloud infrastructure.
- Familiarity with AWS (EKS, EC2, RDS, S3, and networking configurations).
- Proficiency in Python, Go, or Bash for automation and scripting.
- An understanding of cost optimization and resource management in cloud environments.
- Familiarity with machine learning or predictive analytics for proactive reliability management.