Description
Location: Tel Aviv / Ra'anana
Job Summary
DriveNets is seeking a senior AI Researcher to join its R&D group and lead the frontier of large-scale LLM optimization. You will focus on maximizing performance, scalability, and efficiency of LLM training and inference across massive GPU clusters, bridging deep learning research, distributed systems design, and hardware-aware optimization.
At DriveNets, we treat AI performance as a systems problem. Just as we reinvented networking through disaggregation and software-defined scale, we’re applying the same philosophy to AI infrastructure. Your work will directly influence how large models are deployed, scaled, and optimized across high-density compute environments.
Key Responsibilities
- Conduct cutting-edge research in artificial intelligence and machine learning, from problem formulation to experimental validation.
- Research, design, implement, and evaluate novel algorithms, models, optimization strategies, and architectures for large-scale LLM training and inference (e.g., tensor/pipeline/expert parallelism, quantization, prefill/decode disaggregation, GPU communication optimization).
- Translate research ideas into working prototypes and production-ready solutions.
- Stay up to date with state-of-the-art research, frameworks, and emerging trends in the AI ecosystem.
- Publish research findings internally and externally (papers, technical reports, blog posts, or patents) and present results to internal and external technical audiences.
- Collaborate closely with engineers, product teams, and other researchers to align research with real-world impact.
- Profile distributed training and inference pipelines, identifying algorithmic, memory, and scheduling inefficiencies to inform technical decision-making and long-term research roadmaps.
- Validate research through measurable impact: higher throughput, better FLOPS utilization, improved convergence efficiency, or reduced compute cost.
Requirements
- Strong foundation in machine learning, deep learning, and statistical modeling.
- Deep understanding of deep learning internals: transformer architectures, distributed training paradigms, precision scaling, and optimizer behavior.
- Proven hands-on experience training or deploying LLMs on multi-GPU and/or multi-node clusters.
- Ability to read, understand, and critically evaluate academic research papers. Demonstrated ability to translate theoretical ideas into practical, production-level performance improvements.
- Strong problem-solving skills and ability to work independently on open-ended research problems.
- Clear written and verbal communication skills in English.
Optional Qualifications
- MSc or PhD in Computer Science, Electrical Engineering, Mathematics or a related quantitative field.
- Strong mathematical background, including linear algebra, probability, and optimization.
- Strong grasp of parallel and distributed systems principles, including communication collectives, load balancing, and scaling bottlenecks.
- Proficiency with frameworks like DeepSpeed, Megatron-LM, NeMo, vLLM, SGLang, or equivalent large-scale training ecosystems.
- Understanding of CUDA, Triton, or low-level GPU kernel development, and experience profiling large models across multi-node GPU systems.