GPU Kernel Engineer Jobs in the United States

67 GPU kernel engineer jobs from 26 companies hiring in the United States.

1mo
Staff GPU Kernel Engineer – AI & Deep Learning
San Jose, California, United States
Hybrid · Full Time
AMD
AMD (NASDAQ: AMD): Designs and manufactures computer processors and graphics technology.
0+ YOE · GPU kernel development experience; AI frameworks (PyTorch, vLLM, SGLang); C++, Python; collaboration; BS/MS/PhD in CS/CE/EE.
Python, C++, CUDA, HIP, Assembly, Triton, CK, CUTLASS, PyTorch, SGLang, vLLM
4w
Senior Compute Kernel Architect, GPU Power
Santa Clara, California, United States
$184k-$357k/yr · Onsite · Full Time
NVIDIA
NVIDIA (NASDAQ: NVDA): Designs GPU-accelerated computing and artificial intelligence hardware.
5+ YOE · MS or PhD in CS/EE/CE; 5+ years in GPU kernel development; strong CUDA/C++; GPU profiling; PDN/power-aware design; Python automation; cross-disciplinary collaboration.
CUDA, C++, Python, Nsight Compute, Nsight Systems, nvprof
1mo
GPU Performance Engineer | Experienced Hire
New York, New York, United States
$200k-$300k/yr · Onsite · Full Time
Susquehanna International Group
Susquehanna International Group: Global quantitative trading firm providing proprietary financial market liquidity.
Strong CUDA kernel optimization, C/C++, GPU architecture, numerical stability, low-level systems.
CUDA, C/C++, ONNX Runtime, TensorRT, Triton, TVM
2mo
Senior ML Accelerator Engineer - GPU
Sunnyvale or Washington or Austin or San Francisco or Warren
$129k-$261k/yr · Hybrid · Full Time
General Motors
General Motors (NYSE: GM): Manufactures automobiles and provides vehicle-related financial services.
3+ YOE · 3+ years of relevant experience; strong CUDA GPU programming; kernel development and performance optimization; collaboration across teams.
CUDA, Nsight, CUTLASS, CuTe, GPU, Kernel development
2w
Software Engineer - C++ GPU Performance
Foster City or Boston or San Diego or Seattle
$168k-$239k/yr · Hybrid · Full Time
Zoox
Zoox (NASDAQ: AMZN): Developing autonomous robotaxis for urban ride-hailing services.
3+ YOE · 3+ years experience; BS in Computer Science or related field; strong CUDA, C++, Linux; GPU performance optimization; experience with GPU kernels and performance tooling.
CUDA, Nsight, C++, Linux, TensorRT, XLA, OpenGL, ROCm
1mo
Research Engineer - AI Performance & Kernel Optimization
San Francisco, California, United States
Onsite · Full Time
Zyphra
Zyphra: Develops multimodal AI models and autonomous agent software platforms.
Strong systems mindset; low-level performance intuition; ability to learn new systems quickly; excellent collaboration; eager to optimize performance.
PTX, CUDA, HIP, Triton, GPU kernels
1mo
Principal ML Systems Engineer – AI for Quantum
Milpitas, California, United States
$220k-$245k/yr · Onsite · Full Time
PsiQuantum
PsiQuantum: Builds fault-tolerant quantum computers using silicon photonics technology.
Proficient Python, PyTorch, distributed GPU training, CUDA kernel development, and ML infrastructure; experience with autonomous/agentic AI systems.
Python, PyTorch, CUDA, ROCm, Triton, NCCL, Slurm, GPU, GPU kernel development, ML infrastructure, distributed training
2mo
Distributed Training & Performance Engineer - Vice President
New York, New York, United States
$164k-$260k/yr · Onsite · Full Time
JPMorgan Chase
JPMorgan Chase (NYSE: JPM): Global financial firm providing banking and investment management services.
3+ YOE · Master’s degree with 3+ years or Ph.D. with 1+ years in CS/physics/math/engineering; strong distributed training, GPU programming, kernel optimization; Python and C++ proficiency; experience with PyTorch/JAX.
CUDA, Triton, Nsight, PyTorch, JAX, Python, C++
2mo
Principal Systems Software Engineer
San Francisco, California, United States
$260k-$340k/yr · Onsite · Full Time
Crusoe
Crusoe: Builds and operates sustainable data centers for AI workloads.
12+ YOE · 12+ years designing and shipping core infrastructure; strong Linux kernel, virtualization (KVM/QEMU/Firecracker), and high-performance networking expertise.
Linux kernel, KVM/QEMU/Firecracker, RoCE v2, InfiniBand, NVIDIA GPUs, AMD GPUs, Kubernetes, Slurm, SR-IOV, RDMA, GPU scheduling, memory management, container orchestration
1mo
Software Engineer, Hardware Enablement
United Kingdom or Norway or United States
£115k-£140k/yr · Remote · Full Time
Modular
Modular: Unified software infrastructure and programming language for AI development.
5+ YOE · 5+ years in high-performance computing or compiler engineering; proficient in C++ and heterogeneous programming models (CUDA/OpenCL/SYCL); experience with GPU kernels, ML frameworks (PyTorch at C++ level), and porting to new hardware.
CUDA, OpenCL, SYCL, C++, PyTorch (C++), Mojo, MLIR, LLVM
1mo
ML Framework (MetalLM) Engineer, Graphics, Game and ML
California, United States
Onsite · Full Time
Apple
Apple (NASDAQ: AAPL): Designs and sells consumer electronics, software, and online services.
3+ YOE · 3+ years in C/C++/ObjC; GPU kernel development with Metal/CUDA; distributed training/inference; system programming and architecture.
Metal, CUDA, CuTe, cuTile, Triton, OpenXLA, LLVM
4w
Principal Software Engineer - AI and Simulation
Sunnyvale or Austin
$280k-$350k/yr · Onsite · Full Time
Apptronik
Apptronik: Designs and manufactures humanoid robots for industrial automation.
12+ YOE · Lead embedded AI and simulation development; GPU orchestration, on-device AI, and real-time robotics performance.
C/C++, Linux, Kernel development, HAL, Graphics, GPU programming, Embedded systems, Runtime systems, AI/ML deployment
1mo
Senior Engineer, AI Systems
San Jose, California, United States
$138k-$206k/yr · Onsite · Full Time
Samsung Semiconductor
Samsung Semiconductor (Korea Exchange: 005930): Designs and manufactures memory chips, processors, and sensors.
3+ YOE · Bachelor’s with 5+ years or Master’s with 3+ years or PhD with 0+ years; strong Triton kernel development; LLM fundamentals; accelerator hardware knowledge; Python and systems programming; experience with hardware–software co-design or compiler optimization.
Triton, CUDA, Python, GPUs, accelerator hardware, compilers
5d
AI System Research and Development Engineer - Optimization
Bellevue or Menlo Park
$200k-$288k/yr · Hybrid · Full Time
Snowflake
Snowflake (NYSE: SNOW): Cloud-based platform for data storage, processing, and analytics.
5+ YOE · Design and optimize GPU kernels for LLM training/inference; develop scalable DL systems; profile/benchmark; reduce latency; contribute to agentic frameworks.
PyTorch, TensorFlow, JAX, CUDA, CUTLASS, Triton, cuDNN, nvprof, Nsight
1mo
Software Engineer III, ML Infrastructure, AI and Infrastructure
Mountain View, California, United States
$147k-$211k/yr · Onsite · Full Time
Google
Google (NASDAQ: GOOGL): Provides online search, advertising, cloud computing, and consumer electronics.
2+ YOE · Bachelor's or equivalent; 2y Python/C++; 1y ML infra; 1y low-level programming; HPC experience; preferred MS/PhD; 2y data structures/algorithms; GPU/TPU kernels; compilers.
Python, C++, GPU programming, TPU kernels, Compilers, HPC
1w
System Software Engineer, Graphics & Camera Pipeline
San Jose or Bellevue
$200k-$275k/yr · Onsite · Full Time
Rivet Industries
Rivet Industries: Develops ruggedized wearable systems for defense and industrial workforces.
5+ YOE · 5+ years in system software/graphics/camera pipelines with Linux/Android system-level experience. Strong driver/kernel knowledge; OpenGL/Vulkan/CUDA experience; real-time imaging focus.
OpenGL, Vulkan, CUDA, Linux, Android, AOSP, DMA, GPU interop, kernel, drivers
4d
Embedded Software Engineer - Platform
Sunnyvale, California, United States
$135k-$228k/yr · Onsite · Full Time
Intuitive
Intuitive (NASDAQ: ISRG): Robotic-assisted systems for minimally invasive surgery.
5+ YOE · Bachelor's or Master's in a technical field with 3-5 years experience; strong C/C++ for embedded systems; embedded Linux; hardware interfaces; debugging; teamwork.
C/C++, Embedded Linux, GDB, JTAG, Cross-compilation, Kernel, Device drivers, User space applications, I2C, SPI, UART, USB, Ethernet, CAN, Python, bash scripting, NVIDIA Jetson, CUDA, TensorRT
2mo
Member of Technical Staff, ML Kernels
Santa Clara or Boston
Onsite · Full Time
Netpreme
Netpreme: Develops photonic-electronic memory fabrics for AI infrastructure.
Design, optimize, and benchmark high-performance ML kernels for GPUs; strong CUDA and C++; profiling and debugging experience; independent problem solving.
CUDA, Nsight, nvprof, C++, GPU
1mo
High Performance Computing Software Engineer - Supercomputing
Sunnyvale, California, United States
$150k-$300k/yr · Onsite · Full Time
Institute of Foundation Models
Institute of Foundation Models: Develops open-source frontier-class AI foundation models and research.
Develop and optimize software for large-scale ML workloads (1000+ GPUs); deep Linux kernel and GPU kernel knowledge; distributed libraries (NCCL, MPI, UCX, RCCL, SHARP, Libfabric); ML frameworks (PyTorch, TensorFlow, JAX, MegatronLM); HPC scheduling/orchestration (Slurm, Kubernetes, Pyxis).
NCCL, RCCL, MPI, UCX, SHARP, Libfabric, PyTorch, TensorFlow, JAX, MegatronLM, DeepSpeed, Slurm, Kubernetes
2mo
Senior Embedded Systems & Hardware-in-the-Loop Engineer
Laurel, Maryland, United States
$105k-$290k/yr · Onsite · Full Time
Johns Hopkins University Applied Physics Laboratory
Johns Hopkins University Applied Physics Laboratory: Conducts research and engineering for national security and space.
8+ YOE · BS in computer/electrical engineering or computer science with 8+ years; strong C++ (real-time, multi-threading); FPGA firmware (VHDL/Verilog); embedded Linux (kernel drivers, PetaLinux/Yocto); HWIL test systems; ability to obtain Interim Secret and eventual Secret clearance; US citizenship.
C++, VHDL, Verilog, FPGA, Embedded Linux, Kernel Programming, Device Drivers, PetaLinux, Yocto, CUDA, OpenCV, GStreamer, MATLAB, Python
2w
Senior Embedded Software Engineer – Cyber
San Diego, California, United States
$158k-$237k/yr · Onsite · Full Time
Innoflight
Innoflight: Designs and manufactures compact cyber-secure avionics for space missions.
9+ YOE · Senior embedded software engineer with 9+ years in C/C++, embedded Linux, RTOS, cryptography, secure networking, and hardware integration; requires active U.S. security clearance.
C, C++, Embedded Linux, Linux kernel, RTOS, U-Boot, GRUB, TLS, IPsec, AES, RSA, ECDSA, ECDH, OpenCL, CUDA, SPI, I2C, UART, PCIe, Ethernet, SpaceWire
3w
AI Software Engineer: Intelligent Data Infrastructure (San Jose, CA, US, 95128)
San Jose or Boulder or Pittsburgh or Raleigh
$131k-$195k/yr · Hybrid · Full Time
NetApp
NetApp (NASDAQ: NTAP): Sells enterprise data storage and cloud management software.
8+ YOE · 8+ years in software development; strong Golang, Python, C/C++; Linux kernel, file systems, distributed systems; AI/ML infra knowledge.
Golang, Python, C/C++, Linux, Kubernetes, GPU computing, Distributed systems, Storage systems
3w
AI Software Engineer: Intelligent Data Infrastructure (San Jose, CA, US, 95128)
San Jose or Boulder or Pittsburgh or Raleigh
$131k-$195k/yr · Hybrid · Full Time
NetApp
NetApp (NASDAQ: NTAP): Provides intelligent data infrastructure and management solutions.
8+ YOE · 8+ years software development; expert in Golang, Python, C/C++; Linux kernel and distributed systems; AI/ML infra familiarity; strong problem solving and collaboration.
Golang, Python, C/C++, Linux, Kubernetes, GPU
2w
Expert Software Engineer — Rapid System Prototyping & Integration
Santa Rosa or Santa Clara
$143k-$238k/yr · Onsite · Full Time
Keysight Technologies
Keysight Technologies (NYSE: KEYS): Manufactures hardware and software for electronic test and measurement.
8+ YOE · BS/MS/PhD in CS/EE/CE; 8+ years in system integration and software/firmware for high-speed data streaming; proficient in modern C++ and embedded Linux; strong problem-solving and collaboration.
C++, Linux, Kernel Drivers, DPDK, RDMA, PCIe, FPGA, GPU, SmartNIC, CUDA, GPUDirect, Memory Management, Networking
2w
Expert Software Engineer — Rapid System Prototyping & Integration
Santa Rosa, California, United States
$143k-$238k/yr · Onsite · Full Time
Keysight Technologies
Keysight Technologies (NYSE: KEYS): Provides electronic design, test, and simulation software and hardware.
8+ YOE · BS/MS/PhD in CS/EE/CE; 8+ years system integration and software/firmware development; strong C++ and embedded Linux; experience with high-speed data streaming, Ethernet, PCIe, and FPGA/IP integration.
Modern C++, Embedded Linux, Kernel drivers, Memory-mapped I/O, DPDK, RDMA/RoCE, PCIe, SmartNIC/DPU, SystemVerilog/VHDL, FPGA, Linux, GPU programming, DMA, Ethernet 100G/200G/400G, Python, CI/CD
1w
Software Engineer Intern - Kernels
Burlingame, California, United States
Onsite · All Commitments Available
Quadric
Quadric: Designing licensable processor IP for on-device AI inference.
0+ YOE · Pursuing CS/EE degree; strong C/C++ and Python; understanding of computer architecture; problem solving; clear communication.
C, C++, Python, CUDA, DSP, NEON, Triton
1mo
Senior Embedded Linux Software Platform Engineer – ROS2 Robotics
Arvada, Colorado, United States
$100k-$250k/yr · Onsite · Full Time
AION Robotics
AION Robotics: Developing rugged autonomous ground vehicles for outdoor industrial monitoring.
5+ YOE · 5+ years in build systems, cross-compilation, containerization, and Linux configuration; ROS2; NVIDIA Jetson/CUDA; CI/CD; embedded Linux.
CMake, Docker, CircleCI, GitHub Actions, GitLab CI, Jetson, ROS2, L4T, JetPack, Yocto, CUDA, kernel, systemd
1mo
Member of Technical Staff, Kernels
San Mateo, California, United States
Onsite · Full Time
Inception
Inception: Builds high-speed parallel-processing large language models.
BS/MS/PhD in CS/Engineering or related field; strong GPU programming (CUDA, CuTe, Triton); systems view of ML frameworks; performance optimization; low-precision formats; distributed training; Python and at least one systems language; Docker, Kubernetes, CI/CD.
CUDA, CuTe, Triton, PyTorch, TensorFlow, Docker, Kubernetes, CI/CD, Python, C++, Rust, Go, XLA, TVM
2mo
Manager, Software Engineering-Kernels
Bangalore or Santa Clara
Hybrid · Full Time
d-Matrix
d-Matrix: Designs specialized semiconductor chips for efficient AI inference.
10+ YOE · 5+ Mgmt · 10+ years in computer engineering/math/physics with strong CS/architecture knowledge; proficient in C/C++, Python on Linux; experience with ML operators, HW accelerators, and embedded SIMD; leadership and ownership skills.
C, C++, Python, Linux, CUDA, MLIR, LLVM, TVM, Glow, TensorFlow, PyTorch, Tensilica, Hardware accelerators

HiringCafe market data

GPU Kernel Engineer Jobs: market signals in the United States

These signals come from the live HiringCafe job inventory, not a static article. Use them to compare the market, then open the listings above to apply directly.

Open jobs: 67
Hiring companies: 26
Fresh sample: 44% new in 30d
Salary visibility: 76% show pay

Companies appearing in this market

AMD, NVIDIA, Google, Susquehanna International Group, Apple, Keysight Technologies, Modular, NetApp

Work location mix in visible results

Onsite: 30, Hybrid: 10, Remote: 1