ML Data Engineer

Skills
  • Python
  • Kubernetes
  • Terraform
  • GCS
  • S3

About Bria:

Bria is pioneering enterprise-grade Visual Generative AI designed to power groundbreaking commercial creativity and extraordinary brand experiences. Our platform offers unmatched capabilities in image generation, editing, and custom-tailored solutions, all built on exclusive, fully licensed data to ensure seamless compliance and brand consistency. Additionally, we embrace openness by providing open-source access to our models, fostering innovation across the AI community.

Our exceptional team of researchers and technologists combines deep technical expertise with a profound understanding of industry needs and is dedicated to pushing the boundaries of visual creativity. At Bria, we're passionate about building amazing technology that's powerful, innovative, and ready to meet the demands of the world's most creative and visionary brands, and we're just getting started.


Position Overview:

We're looking for an ML Data Engineer, reporting to the VP of Research, to build the data pipelines driving our next-generation generative image and video models. This role is central to our mission of training models exclusively on clean, high-quality data.

You'll develop data ingestion pipelines, captioning systems, and high-throughput distributed architectures for large-scale data processing and curation. You'll be responsible for solving some of the toughest challenges in data quality and model performance, from training and shipping quality-scoring models to analyzing large-scale datasets and uncovering new problems.


Key Responsibilities:

  • Build, from scratch, ML infrastructure for large-scale processing of billions of images and videos.
  • Design and implement systems for data ingestion, deduplication, validation, filtering, labelling, and quality scoring.
  • Implement observability and telemetry across the ML data lifecycle.
  • Collaborate with infrastructure teams to develop efficient data pipelines that support large-scale model training, running across thousands of GPUs.
  • Be part of the research team and collaborate closely with research scientists.
  • Work in a fast-moving environment with many known and unknown challenges to tackle.


Requirements:

  • At least 5 years of strong hands-on experience in ML and software engineering, including deploying models (e.g., classifiers, segmentation, quality scoring), with a focus on image, video, or audio modalities.
  • Deep experience in building and scaling data infrastructure for large-scale ML systems, ideally for image/video or multi-modal models.
  • Experience managing large-scale datasets and pipelines in production.
  • Fluency with Python.
  • Understanding of modern cloud infrastructure: Kubernetes, Terraform, S3/GCS, distributed compute.
  • Comfortable operating in environments with ambiguity and evolving priorities.


Why Join BRIA AI?

If you are driven by innovation and motivated by the challenge of shaping the future of visual generative AI, we would love for you to apply. You can make a significant impact in an exciting and rapidly evolving field.

*BRIA AI is an equal opportunity employer that fosters a diverse and inclusive work environment where all ideas and innovations flourish.
