DevJobs

AI Harness Engineer

Skills
  • Python
  • Kubernetes
  • Claude Code
  • Distributed systems
  • Database internals
  • DataDog
  • Elastic
  • Linux kernel drivers
  • LLM APIs
  • Prometheus
  • Reverse engineering
AI Harness Engineer (Backend Developer)


What uses fewer tokens: giving an agent a dedicated MCP server, or putting it in a sandbox with bash and a whitelisted set of network endpoints? What are the tradeoffs beyond token cost?

We're looking for engineers to answer questions like this. You should have hands-on experience in challenging technical domains, such as designing distributed systems, database internals, low-level performance work, Linux kernel drivers, reverse engineering binaries, or anything else that is deeply technical and leaves little room for error.

You will apply your skills to building agentic architectures, creating auto-research harnesses, shipping complex backend systems, and more.


Why this role is different

We operate at a velocity that makes some people uncomfortable.

Unlike traditional backend work, best practices can change overnight when Anthropic releases a new model. You need to be comfortable with rapid change and regularly invalidating previous assumptions.

To do this sustainably, we've invested heavily in automated agent harnesses for benchmarking and improving agents. A big part of this role is not only improving the agent that our customers use, but also improving the meta-version of it: the harness where AI can run experiments on AI, automatically.

We’re a fast-growing startup, so you will do things beyond a traditional backend role. You’ll work closely with our CTO on special projects, and you’ll likely meet some of our customers to troubleshoot interesting edge cases in agent behaviour.


Example features you could work on
  • Agent-to-agent communication
  • Agent orchestrator/supervisor
  • Agent memory
  • Optimizing the cost of high-frequency agents (agents that wake up every X minutes)
  • Webhook-triggered agents
  • Evals infrastructure
  • Auto-research loops
  • Grouping alerts into unique incidents - without invoking an LLM on each alert

What we expect you already know
  • 7+ years of backend engineering
  • Ability to understand and communicate tradeoffs
  • Extensive experience using coding agents like Claude Code to complete tasks that were previously too large to tackle in a reasonable timeframe

Nice to have
  • Hands-on experience building with LLM APIs and tool design
  • Python and Kubernetes experience
  • Familiarity with Prometheus / DataDog / Elastic / other observability stacks
  • Open source contributions to agent frameworks, MCP servers, or eval tooling
  • You've reverse-engineered Claude Code (or read the leaked source) to figure out how a specific subsystem works

If you're a strong engineer and this role excites you, please reach out even if you don't check every box.

About us

We build SRE agents that help companies reduce cloud downtime by catching new errors early, finding their root causes, and fixing them automatically. We have a wide range of customers today, including several household names whose products you use.

To apply: send a short note to [email protected] with your CV and your answer (or just your hypothesis) to the opening question above.



Robusta