Posted 2mo ago

Senior Research Engineer

@ Pathos
New York City, New York, United States
$180k-$200k/yr, Hybrid, Full Time
Responsibilities: Design models, build architectures, optimize kernels
Requirements Summary: PhD in CS/ML/Computational Biology or MS with 5+ years, strong ML publications, 3-5 years DL experience, PyTorch/JAX/TensorFlow, CUDA GPU programming.
Technical Tools Mentioned: PyTorch, JAX, TensorFlow, CUDA, Kubernetes, SLURM, DeepSpeed, Megatron, Triton, FSDP, Diffusion Models, GPU Kernel Development, xFormers
Job Description

About the role

We are seeking exceptional Senior Research Engineers to join our mission-critical team building the world's best oncology foundation models. We are an AI-driven drug development company, and these models are the engine that powers everything we do, from predicting patient survival to identifying novel therapeutic targets to optimizing clinical trial design.


In this role, you'll be at the intersection of cutting-edge AI research and real-world drug development. You'll work on foundation models that integrate diverse data modalities (known cancer biology, tumor mechanisms, DNA/RNA sequencing, detailed medical notes, and examination results) to generate insights that directly inform our clinical-stage programs.


You'll participate in both pre-training and post-training of our foundation models, work that requires deep expertise in modern architectures and post-training algorithms such as reinforcement learning. You may also operate at the CUDA level, building custom kernels and understanding performance at the hardware-software interface.


What You'll Do

  • Design, implement, and optimize large-scale oncology foundation models integrating genomic sequences, medical notes, lab results, imaging, and clinical outcomes
  • Build and experiment with modern architectures optimized for biomedical applications
  • Spearhead pre-training and post-training efforts, including RLHF, DPO, RLAIF, and other alignment techniques
  • Write and optimize custom CUDA kernels; profile and resolve performance bottlenecks across the hardware-software interface
  • Maintain and optimize our 1,000+ H200 GPU cluster for reliability, utilization, and performance
  • Build distributed training and inference pipelines, experiment tracking systems, and evaluation frameworks
  • Develop benchmarks that measure real progress on drug development-relevant tasks
  • Collaborate with oncologists, biologists, and clinical development teams to ground model development in real therapeutic questions
  • Contribute to publications in top-tier ML and biomedical venues (NeurIPS, ICML, ICLR, Nature, Cell, etc.)

What We're Looking For

Required

  • Ph.D. in Computer Science, Machine Learning, Computational Biology, or a related field, or an M.S. with 5+ years of relevant industry experience
  • Publication record in machine learning, including multiple first-author papers at top-tier venues
  • 3 to 5 years of hands-on deep learning experience (PyTorch, JAX, or TensorFlow)
  • Strong command of modern architectures: Transformers, attention mechanisms, state-space models, mixture-of-experts
  • Hands-on experience with post-training techniques: RLHF, DPO, PPO, or similar
  • Expert-level GPU programming and CUDA, including custom kernel development and performance profiling
  • Practical experience training or fine-tuning large-scale models (multi-billion parameter) in distributed settings (DeepSpeed, FSDP, Megatron, or similar)
  • Experience managing GPU clusters and ML infrastructure (Kubernetes, SLURM, or equivalent)
  • Strong software engineering fundamentals in Python and C++/CUDA
  • Clear communicator, able to present complex technical work to both engineering and scientific audiences

Preferred

  • Background in oncology, cancer biology, or drug development
  • Experience with biomedical foundation models (AlphaGenome, GeneFormer, Evo2, etc.)
  • Deep knowledge of cancer genomics, tumor biology, or mechanisms of resistance
  • Contributions to ML systems frameworks (FlashAttention, Triton, xFormers, etc.)
  • Experience with multi-modal learning and cross-modal architectures
  • Familiarity with advanced training techniques: synthetic data generation, curriculum learning, data filtering
  • Familiarity with regulatory considerations in healthcare AI (FDA, HIPAA, GxP)
  • Open-source contributions to ML projects or frameworks

Location

This is a hybrid role, requiring 3-4 days per week onsite at our NYC headquarters.