Summary
We are looking for a research intern to join us on a research project aimed at publication at a top-tier venue. The intern will design and develop novel agentic systems that leverage large language models and vision-language models to reason over extended video content.
Description
Our team focuses on generative AI applications for videos. You'll work alongside fellow researchers and engineers, leveraging Computer Vision and Agentic Systems technologies to build future Apple products.
Responsibilities
- Design and implement novel LLM-based agentic systems for long-form video understanding, targeting established academic benchmarks
- Collaborate with researchers and engineers on the team to produce a publication-ready contribution
- Benchmark against established evaluation suites and iterate toward state-of-the-art results
Minimum Qualifications
- Currently enrolled in a graduate program (M.Sc. or Ph.D.) in Computer Science, Electrical Engineering, or a related field
- Publications at top-tier venues (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ACL, or EMNLP)
- Strong programming skills in Python and experience with deep learning frameworks (e.g., PyTorch)
- Solid foundation in computer vision, natural language processing, or multimodal learning
- Proficiency with agentic development tools (e.g., Claude Code)
Preferred Qualifications
- Demonstrated expertise working with Large Language Models (LLMs) and Vision-Language Models (VLMs)
- Experience building agentic or multi-component AI systems
- Familiarity with video understanding tasks and benchmarks
- Experience with prompt engineering and optimization techniques
At Apple, we believe accessibility is a fundamental human right. You’ll find that idea reflected in everything here — in our culture, our benefits and our digital tools. By welcoming as many perspectives as possible, we help you build a career where you feel like you belong.
Role Number: 200657779-0865