Posted 1d ago

Foundation Model Engineer

@ CommonAI
Cambridge, England, United Kingdom
Hybrid, Full Time
Responsibilities: Design training, build pipelines, fine-tune models
Requirements Summary: Experience training and fine-tuning LLMs/multimodal models; strong evaluation, data quality, and feature engineering; Python and ML frameworks (PyTorch, TensorFlow); production ML pipelines; GPU optimisation; distributed training and MLOps knowledge; applied research/publications helpful.
Technical Tools Mentioned: Python, PyTorch, TensorFlow, GPU, MLOps, CI/CD, Experiment tracking, Model versioning
Job Description

CommonAI CIC is a non-profit membership organisation founded on a belief in collaborative engineering for the safe and responsible development of foundational AI technologies. It is a place where AI startups, enterprises large and small, public sector bodies and academia can share resources and knowledge to co-develop and grow businesses fast.

We are led by experienced founders, investors and engineers who believe that collaborative engineering drives faster AI innovation. We are backed by a mix of UK Government and private funding to design, build and deploy innovative AI systems.

The Opportunity

We’re seeking a highly skilled foundation model engineer who has experience of building, training, evaluating, and deploying LLMs or multimodal models end-to-end.

We are currently building an AI lab with multiple GPU clusters for testing new hardware and software technologies to accelerate machine learning and inference. This exciting role will primarily focus on model development, data pipelines and system performance. You’ll work across the full AI lifecycle, from experimentation to scalable deployment, with a strong emphasis on technical depth and rigour.

What You’ll Do

  • Design and implement end-to-end LLM training pipelines
  • Source and, where appropriate, preprocess datasets for training and evaluation
  • Fine-tune and optimise open-weight models (LLMs, vision, or traditional ML)
  • Build evaluation frameworks and define performance metrics
  • Develop and maintain data pipelines and training workflows
  • Analyse training pipelines and optimise them for latency, cost, and scalability
  • Implement monitoring, logging, and feedback loops for continuous improvement
  • Experiment with modern AI tooling and services to evaluate how they can be applied