DevJobs

Senior Software Engineer - CUDA

Overview
Skills
  • C++ (5 years)
  • Python
  • Docker
  • CUDA
  • Embedded Linux
  • GPU-accelerated services
  • NVIDIA Jetson
  • NVIDIA Nsight
  • ONNX Runtime
  • ROS2
  • TensorRT
  • Triton

Mentee Robotics is redefining humanoid automation with an AI-first approach. We integrate cutting-edge perception, reasoning, and dexterous manipulation into a fully autonomous humanoid robot that continuously adapts and learns. Our flagship product, Menteebot v3, is designed to perform complex tasks with human-like adaptability across industrial, logistics, and retail environments.

We are looking for a Senior Software Engineer to join our software team. In this role, you will own the high-performance software layer that bridges advanced AI models with physical robotic execution. Your work will focus on designing and implementing the core services that run real-time edge AI inference, ensuring that our systems process sensor data and execute commands with minimal latency and maximum reliability.
What You Will Do

  • Design & Optimize: Develop production-grade software in C++ and Python, specifically tailored for real-time inference and low-latency execution.
  • Edge AI Orchestration: Build and maintain the services that deploy and run neural networks directly on the robot’s edge hardware.
  • Sensor Integration: Develop robust pipelines to process high-frequency sensor data streams for real-time robotic perception.
  • Architect for Reliability: Create modular, well-architected components that keep the robot stable in complex, dynamic environments and keep the codebase maintainable.
  • Cross-Functional Collaboration: Partner with AI researchers and hardware engineers to deploy and accelerate deep learning models on the edge.


Requirements:

  • 5+ years of software engineering experience, with a strong focus on modern C++.
  • Proven, hands-on expertise in writing, profiling, and optimizing CUDA code for high-performance edge computing.
  • Deep understanding of modern C++ standards, memory management, concurrency, and parallelism. Extensive Python knowledge is also required.
  • Deep knowledge of developing, debugging, and profiling within embedded Linux environments.
  • Experience building highly reliable, production-grade software.


Advantages:

  • Familiarity with inference frameworks like Triton, TensorRT, or ONNX Runtime.
  • Experience using NVIDIA Nsight to analyze performance in depth and pinpoint execution bottlenecks.
  • Practical experience with the ROS2 ecosystem.
  • Expertise in GPU-accelerated services and zero-copy mechanisms to minimize data transfer overhead.
  • Experience with NVIDIA Jetson or similar embedded edge compute modules.
  • Experience with containerization (Docker) tailored for embedded environments.