
AI Researcher

Skills
  • Deep learning
  • ML
  • Multi-node Clusters
  • Transformer Architectures
  • Statistical Modeling
  • Profiling
  • Precision Scaling
  • Optimizer Behavior
  • Multi-GPU Clusters
  • LLM Training
  • LLM Deployment
  • Distributed Training
  • Megatron-LM
  • CUDA
  • NeMo
  • GPU Kernel Development
  • SGLang
  • DeepSpeed
  • Triton
  • vLLM
Description

Location: Tel Aviv / Ra'anana

Job Summary

DriveNets is seeking a senior AI Researcher to join its R&D group and lead the frontier of large-scale LLM optimization. You will focus on maximizing performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.

At DriveNets, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we’re applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.

Key Responsibilities

  • Conduct cutting-edge research in artificial intelligence and machine learning, from problem formulation to experimental validation.
  • Research, design, implement, and evaluate novel algorithms, models, optimization strategies, and architectures for large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).
  • Translate research ideas into working prototypes and production-ready solutions.
  • Stay up to date with state-of-the-art research, frameworks, and emerging trends in the AI ecosystem.
  • Publish research findings internally and externally (papers, technical reports, blog posts, or patents) and present results to internal and external technical audiences.
  • Collaborate closely with engineers, product teams, and other researchers to align research with real-world impact.
  • Profile distributed training and inference pipelines, identifying algorithmic, memory, and scheduling inefficiencies to inform technical decision-making and long-term research roadmaps.
  • Validate research through measurable impact: higher throughput, better FLOPS utilization, improved convergence efficiency, or reduced compute cost.

Requirements

  • Strong foundation in machine learning, deep learning, and statistical modeling.
  • Deep understanding of deep learning internals: transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.
  • Proven hands-on experience training or deploying LLMs on multi-GPU and/or multi-node clusters.
  • Ability to read, understand, and critically evaluate academic research papers. Demonstrated ability to translate theoretical ideas into practical, production-level performance improvements.
  • Strong problem-solving skills and ability to work independently on open-ended research problems.
  • Clear written and verbal communication skills in English.

Optional Qualifications

  • MSc or PhD in Computer Science, Electrical Engineering, Mathematics or a related quantitative field.
  • Strong mathematical background, including linear algebra, probability, and optimization.
  • Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.
  • Proficiency with frameworks like DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.
  • Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.