DevJobs

SRE Critical Infrastructure Team-2592

Overview
Skills
  • SQL
  • Python
  • CI/CD
  • Jenkins
  • Kubernetes
  • Grafana
  • Networking
  • Elastic
  • HTTP
  • HTTPS
  • Infrastructure as Code
  • DNS
  • Prometheus
  • DHCP
  • RKE2
  • Argo CD
  • TCP/IP
  • UDP

We are seeking a Site Reliability Engineer (SRE) to join our team, which is responsible for critical infrastructure.

Key Responsibilities

Operate and manage the Kubernetes platform (RKE2)

Set up and manage CI/CD pipelines using tools such as Jenkins, Argo CD and others

Implement Infrastructure as Code (IaC) and infrastructure automation

Design and maintain monitoring systems using Prometheus, Grafana and Elastic

Develop internal monitoring and quality control tools

Analyze incidents, trends and availability using SQL

Build self-service capabilities for development teams

Participate in on-call rotations, handle production incidents, and lead documented post-mortems

Write and maintain operational documentation


Requirements

Familiarity with TCP/IP and communication protocols

Advanced networking knowledge: deep understanding of TCP/IP, network protocols (UDP, HTTP, HTTPS, DNS, DHCP) and network security

Experience with Kubernetes in production environments

Experience with Argo CD, Jenkins and other CI/CD tools

Experience with monitoring and logging tools (Prometheus, Grafana, Elastic)

Development experience in Python and knowledge of SQL

Strong understanding of infrastructure and troubleshooting skills in complex environments

Shabak - Israeli Security Agency - Career