You will work on a custom ML compiler that transforms modern ML and DSP models into highly efficient programs for our accelerator.
This role spans the full compiler stack—from ingesting models and transforming intermediate representations to optimizing execution under tight memory and latency constraints.
What You’ll Do
- Build and maintain model ingestion pipelines (e.g., PyTorch / ONNX → internal IR)
- Implement graph transformations such as:
  - Operator decomposition and canonicalization
  - Shape inference and layout transformations
- Develop and extend intermediate representations (e.g., MLIR dialects)
- Implement optimization passes including:
  - Operator fusion and graph partitioning
  - Basic scheduling and tiling strategies
  - Memory planning and reuse
- Debug correctness and numerical issues across transformations
- Collaborate with hardware and ML teams to improve system performance
Requirements
Required qualifications and experience:
- 2+ years of experience in compilers and/or edge AI
- Proficiency in Python and/or C++
- Experience with at least one of the following:
  - MLIR, LLVM, TVM, XLA, or similar
  - Graph-level transformations or ML model internals
- Understanding of deep learning models (convolutional networks, sequence models, etc.)
- Ability to reason about correctness and performance tradeoffs
Nice to have:
- Experience with optimization techniques (tiling, scheduling, memory reuse)
- Familiarity with ONNX or PyTorch internals
- Exposure to quantization or low-precision computation
- Interest in hardware-aware ML systems
Benefits
- 401(k)
- Medical insurance
- Vision insurance
- Dental insurance
- Commuter benefits
- Disability insurance
- Paid maternity leave
- Paid paternity leave
- Child care support
femtoAI is an equal opportunity employer committed to building a diverse workforce and an inclusive working environment that empowers everyone to do their best work. We do not discriminate on the basis of race, ethnicity, religion, gender, gender identity, sexual orientation, age, marital status, veteran status, or disability status.