Posted 3mo ago

AI Research Apprentice

@ Origin
Bengaluru, Karnataka, India
Onsite · Full Time
Responsibilities: Design diffusion models, train models, integrate pipelines
Requirements Summary: Strong foundation in deep learning, probabilistic modeling, and computer vision; diffusion models in PyTorch/JAX; multimodal transformers; data-centric AI; Python, PyTorch/Lightning; Git; bonus C++/CUDA; on-device inference and synthetic data tools.
Technical Tools Mentioned: PyTorch, Lightning, Git, C++, CUDA, TensorRT, ONNX Runtime, Isaac Sim, Latent Diffusion, DDPM, BLIP, CLIP, Flamingo, LLaVA
Job Description

About Origin

Origin (previously 10xConstruction) is building general-purpose autonomous robots for US construction to tackle rising costs, safety risks, and labor shortages. Our modular, multi-trade platform combines purpose-built hardware with real-time site intelligence to navigate complex environments and execute tasks with precision. Trained in high-fidelity simulation and already deployed on live sites, our robots deliver 5x faster execution, 250%+ margin expansion, and significant cost savings. Join India's most talent-dense robotics team, with members from IITs, Stanford, UCLA, and more.

About the role

As an AI Research Apprentice, you'll push the frontiers of generative and multimodal learning that power our autonomous robots. You will prototype diffusion-based vision models, vision–language architectures (VLMs/VLAs), and automated data-annotation pipelines that turn raw site footage into training gold.

Key Responsibilities

  • Design and train diffusion-based generative models for realistic, high-resolution synthetic data.
  • Build compact Vision–Language Models (VLMs) to caption, query and retrieve job-site scenes for downstream perception tasks.
  • Develop Vision–Language-Action (VLA) model objectives that link textual work orders with pixel-level segmentation masks.
  • Architect large-scale auto-annotation pipelines that transform unlabeled images and point clouds into high-quality labels with minimal human input.
  • Benchmark model performance on accuracy, latency and memory for deployment on Jetson-class hardware; compress with distillation or LoRA.
  • Collaborate with perception and robotics teams to integrate research prototypes into live ROS 2 stacks.
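To give candidates a feel for the diffusion work in the first responsibility, here is a minimal sketch of the DDPM forward (noising) process that underlies training such generative models. It assumes a linear beta schedule and uses a random array as a stand-in for an image; it is illustrative only, not Origin's actual pipeline.

```python
import numpy as np

# DDPM forward process sketch, assuming a linear beta schedule (Ho et al. style).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # cumulative product, written alpha_bar_t

def q_sample(x0, t, eps):
    """Sample x_t ~ q(x_t | x_0): sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 4))      # stand-in for a clean image patch
eps = rng.standard_normal(x0.shape)   # Gaussian noise the network learns to predict
xt = q_sample(x0, t=T - 1, eps=eps)   # near t = T the sample is almost pure noise
```

During training, the model receives `xt` and the timestep and is optimized to recover `eps` (the usual epsilon-prediction objective); sampling then runs this process in reverse.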