Posted 4mo ago

Research Scientist - Speech/Audio Machine Learning

@ Institute of Foundation Models
Paris, Île-de-France, France
Onsite, Full Time
Responsibilities: Architectural Design, Loss Function, Experimental Iteration
Requirements Summary: PhD or MSc in Computer Science or related field; deep learning, signal processing, or computational linguistics; strong publication record.
Job Description
About the Institute of Foundation Models 

We are a dedicated research lab for building, understanding, using, and risk-managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge-driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting-edge foundation model training, alongside world-class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Your strategic and innovative problem-solving skills will be instrumental in establishing MBZUAI as a global hub for high-performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

As a Research Scientist specializing in speech/audio machine learning, you will contribute to the design and training of SoTA end-to-end neural speech models. You will be responsible for developing the core intellectual property, moving beyond cascaded ASR → TTS systems toward native audio-to-audio multimodal architectures.


Key responsibilities
  • Architectural Design: Develop novel neural architectures for low-latency speech-to-speech translation and generation (e.g., diffusion, flow matching, Transformer-based audio LLMs).
  • Loss Function Engineering: Design and implement custom objective functions to optimize prosody (emotions, intelligibility, naturalness). 
  • Experimental Iteration: Conduct large-scale training runs, performing ablation studies on model architecture and tokenization strategies. 
  • Evaluation Frameworks: Establish rigorous internal benchmarks using both objective metrics (WER, MCD) and subjective human-in-the-loop (MOS) testing. 


Qualifications
  • PhD or MSc in Computer Science with a focus on Deep Learning, Signal Processing, or Computational Linguistics. 
  • Record of Research: Published work in top-tier venues (NeurIPS, ICLR, ICASSP, Interspeech).