GPU Kernel Engineer Jobs in United States

67 GPU kernel engineer jobs from 26 companies hiring in United States.

1mo

Staff GPU Kernel Engineer – AI & Deep Learning

San Jose, California, United States

Hybrid · Full Time

AMD (NASDAQ: AMD): Designs and manufactures computer processors and graphics technology.

0+ YOE · GPU kernel development experience; AI frameworks (PyTorch, vLLM, SGLang); C++, Python; collaboration; BS/MS/PhD in CS/CE/EE.

Python, C++, CUDA, HIP, Assembly, Triton, CK, CUTLASS, PyTorch, SGLang, vLLM

Senior Compute Kernel Architect, GPU Power

Santa Clara, California, United States

$184k-$357k/yr · Onsite · Full Time

NVIDIA (NASDAQ: NVDA): Designs GPU-accelerated computing and artificial intelligence hardware.

5+ YOE · MS or PhD in CS/EE/CE; 5+ years in GPU kernel development; strong CUDA/C++; GPU profiling; PDN/power-aware design; Python automation; cross-disciplinary collaboration.

CUDA, C++, Python, Nsight Compute, Nsight Systems, nvprof

1mo

GPU Performance Engineer | Experienced Hire

New York, New York, United States

$200k-$300k/yr · Onsite · Full Time

Susquehanna International Group: Global quantitative trading firm providing proprietary financial market liquidity.

Strong CUDA kernel optimization, C/C++, GPU architecture, numerical stability, low-level systems.

CUDA, C/C++, ONNX Runtime, TensorRT, Triton, TVM

2mo

Senior ML Accelerator Engineer - GPU

Sunnyvale or Washington or Austin or San Francisco or Warren

$129k-$261k/yr · Hybrid · Full Time

General Motors (NYSE: GM): Manufactures automobiles and provides vehicle-related financial services.

3+ YOE · 3+ years of relevant experience; strong CUDA GPU programming; kernel development and performance optimization; collaboration across teams.

CUDA, NSight, CUTLASS, CuTe, GPU, Kernel development

Software Engineer - C++ GPU Performance

Foster City or Boston or San Diego or Seattle

$168k-$239k/yr · Hybrid · Full Time

Zoox (NASDAQ: AMZN): Developing autonomous robotaxis for urban ride-hailing services.

3+ YOE · 3+ years experience; BS in Computer Science or related field; strong CUDA, C++, Linux; GPU performance optimization; experience with GPU kernels and performance tooling.

CUDA, Nsight, C++, Linux, TensorRT, XLA, OpenGL, ROCm

1mo

Research Engineer - AI Performance & Kernel Optimization

San Francisco, California, United States

Onsite · Full Time

Zyphra: Develops multimodal AI models and autonomous agent software platforms.

Strong systems mindset; low-level performance intuition; ability to learn new systems quickly; excellent collaboration; eager to optimize performance.

PTX, CUDA, HIP, Triton, GPU kernels

1mo

Principal ML Systems Engineer – AI for Quantum

Milpitas, California, United States

$220k-$245k/yr · Onsite · Full Time

PsiQuantum: Builds fault-tolerant quantum computers using silicon photonics technology.

Proficient Python, PyTorch, distributed GPU training, CUDA kernel development, and ML infrastructure; experience with autonomous/agentic AI systems.

Python, PyTorch, CUDA, ROCm, Triton, NCCL, Slurm, GPU, GPU kernel development, ML infrastructure, distributed training

2mo

Distributed Training & Performance Engineer - Vice President

New York, New York, United States

$164k-$260k/yr · Onsite · Full Time

JPMorgan Chase (NYSE: JPM): Global financial firm providing banking and investment management services.

3+ YOE · Master’s degree with 3+ years or Ph.D. with 1+ years in CS/physics/math/engineering; strong distributed training, GPU programming, kernel optimization; Python and C++ proficiency; experience with PyTorch/JAX.

CUDA, Triton, Nsight, PyTorch, JAX, Python, C++

2mo

Principal Systems Software Engineer

San Francisco, California, United States

$260k-$340k/yr · Onsite · Full Time

Crusoe: Builds and operates sustainable data centers for AI workloads.

12+ YOE · 12+ years designing and shipping core infrastructure; strong Linux kernel, virtualization (KVM/QEMU/Firecracker), and high-performance networking expertise.

Linux kernel, KVM/QEMU/Firecracker, RoCE v2, InfiniBand, NVIDIA GPUs, AMD GPUs, Kubernetes, Slurm, SR-IOV, RDMA, GPU scheduling, memory management, container orchestration

1mo

Software Engineer, Hardware Enablement

United Kingdom or Norway or United States

£115k-£140k/yr · Remote · Full Time

Modular: Unified software infrastructure and programming language for AI development.

5+ YOE · 5+ years in high-performance computing or compiler engineering; proficient in C++ and heterogeneous programming models (CUDA/OpenCL/SYCL); experience with GPU kernels, ML frameworks (PyTorch at C++ level), and porting to new hardware.

CUDA, OpenCL, SYCL, C++, PyTorch (C++), Mojo, MLIR, LLVM

1mo

ML Framework (MetalLM) Engineer, Graphics, Game and ML

California, United States

Onsite · Full Time

Apple (NASDAQ: AAPL): Designs and sells consumer electronics, software, and online services.

3+ YOE · 3+ years in C/C++/ObjC; GPU kernel development with Metal/CUDA; distributed training/inference; system programming and architecture.

Metal, CUDA, CuTe, CuTile, Triton, OpenXLA, LLVM

Principal Software Engineer - AI and Simulation

Sunnyvale or Austin

$280k-$350k/yr · Onsite · Full Time

Apptronik: Designs and manufactures humanoid robots for industrial automation.

12+ YOE · Lead embedded AI and simulation development; GPU orchestration, on-device AI, and real-time robotics performance.

C/C++, Linux, Kernel development, HAL, Graphics, GPU programming, Embedded systems, Runtime systems, AI/ML deployment

1mo

Senior Engineer, AI Systems

San Jose, California, United States

$138k-$206k/yr · Onsite · Full Time

Samsung Semiconductor (Korea Exchange: 005930): Designs and manufactures memory chips, processors, and sensors.

3+ YOE · Bachelor’s with 5+ years or Master’s with 3+ years or PhD with 0+ years; strong Triton kernel development; LLM fundamentals; accelerator hardware knowledge; Python and systems programming; experience with hardware–software co-design or compiler optimization.

Triton, CUDA, Python, GPUs, accelerator hardware, compilers

AI System Research and Development Engineer - Optimization

Bellevue or Menlo Park

$200k-$288k/yr · Hybrid · Full Time

Snowflake (NYSE: SNOW): Cloud-based platform for data storage, processing, and analytics.

5+ YOE · Design and optimize GPU kernels for LLM training/inference; develop scalable DL systems; profile/benchmark; reduce latency; contribute to agentic frameworks.

PyTorch, TensorFlow, JAX, CUDA, CUTLASS, Triton, cuDNN, nvprof, Nsight

1mo

Software Engineer III, ML Infrastructure, AI and Infrastructure

Mountain View, California, United States

$147k-$211k/yr · Onsite · Full Time

Google (NASDAQ: GOOGL): Provides online search, advertising, cloud computing, and consumer electronics.

2+ YOE · Bachelor's or equivalent; 2y Python/C++; 1y ML infra; 1y low-level programming; HPC experience; preferred MS/PhD; 2y data structures/algorithms; GPU/TPU kernels; compilers.

Python, C++, GPU programming, TPU kernels, Compilers, HPC

System Software Engineer, Graphics & Camera Pipeline

San Jose or Bellevue

$200k-$275k/yr · Onsite · Full Time

Rivet Industries: Develops ruggedized wearable systems for defense and industrial workforces.

5+ YOE · 5+ years in system software/graphics/camera pipelines with Linux/Android system level experience. Strong driver/kernel knowledge; OpenGL/Vulkan/CUDA experience; real-time imaging focus.

OpenGL, Vulkan, CUDA, Linux, Android, AOSP, DMA, GPU interop, kernel, drivers

Embedded Software Engineer - Platform

Sunnyvale, California, United States

$135k-$228k/yr · Onsite · Full Time

Intuitive (NASDAQ: ISRG): Robotic-assisted systems for minimally invasive surgery.

5+ YOE · Bachelor's or Master's in a technical field with 3-5 years experience; strong C/C++ for embedded systems; embedded Linux; hardware interfaces; debugging; teamwork.

C/C++, Embedded Linux, GDB, JTAG, Cross-compilation, Kernel, Device drivers, User space applications, i2c, SPI, UART, USB, Ethernet, CAN, Python, bash scripting, NVIDIA Jetson, CUDA, TensorRT

2mo

Member of Technical Staff, ML Kernels

Santa Clara or Boston

Onsite · Full Time

Netpreme: Develops photonic-electronic memory fabrics for AI infrastructure.

Design, optimize, and benchmark high-performance ML kernels for GPUs; strong CUDA and C++; profiling and debugging experience; independent problem solving.

CUDA, Nsight, nvprof, C++, GPU

1mo

High Performance Computing Software Engineer - Supercomputing

Sunnyvale, California, United States

$150k-$300k/yr · Onsite · Full Time

Institute of Foundation Models: Develops open-source frontier-class AI foundation models and research.

Develop and optimize software for large-scale ML workloads (1000+ GPUs); deep Linux kernel and GPU kernel knowledge; distributed libraries (NCCL, MPI, UCX, RCCL, SHARP, Libfabric); ML frameworks (PyTorch, TensorFlow, JAX, MegatronLM); HPC scheduling/orchestration (Slurm, Kubernetes, Pyxis).

NCCL, RCCL, MPI, UCX, SHARP, Libfabric, PyTorch, TensorFlow, JAX, MegatronLM, DeepSpeed, Slurm, Kubernetes

2mo

Senior Embedded Systems & Hardware-in-the-Loop Engineer

Laurel, Maryland, United States

$105k-$290k/yr · Onsite · Full Time

Johns Hopkins University Applied Physics Laboratory: Conducts research and engineering for national security and space.

8+ YOE · BS in computer/electrical engineering or computer science with 8+ years; strong C++ (real-time, multi-threading); FPGA firmware (VHDL/Verilog); embedded Linux (kernel drivers, PetaLinux/Yocto); HWIL test systems; ability to obtain Interim Secret and eventual Secret clearance; US citizenship.

C++, VHDL, Verilog, FPGA, Embedded Linux, Kernel Programming, Device Drivers, PetaLinux, Yocto, CUDA, OpenCV, GStreamer, MATLAB, Python

Senior Embedded Software Engineer – Cyber

San Diego, California, United States

$158k-$237k/yr · Onsite · Full Time

Innoflight: Designs and manufactures compact cyber-secure avionics for space missions.

9+ YOE · Senior embedded software engineer with 9+ years in C/C++, embedded Linux, RTOS, cryptography, secure networking, and hardware integration; requires active U.S. security clearance.

C, C++, Embedded Linux, Linux kernel, RTOS, U-Boot, GRUB, TLS, IPsec, AES, RSA, ECDSA, ECDH, OpenCL, CUDA, SPI, I2C, UART, PCIe, Ethernet, SpaceWire

AI Software Engineer: Intelligent Data Infrastructure (San Jose, CA, US, 95128)

San Jose or Boulder or Pittsburgh or Raleigh

$131k-$195k/yr · Hybrid · Full Time

NetApp (NASDAQ: NTAP): Sells enterprise data storage and cloud management software.

8+ YOE · 8+ years in software development; strong Golang, Python, C/C++; Linux kernel, file systems, distributed systems; AI/ML infra knowledge.

Golang, Python, C/C++, Linux, Kubernetes, GPU computing, Distributed systems, Storage systems

AI Software Engineer: Intelligent Data Infrastructure (San Jose, CA, US, 95128)

San Jose or Boulder or Pittsburgh or Raleigh

$131k-$195k/yr · Hybrid · Full Time

NetApp (NASDAQ: NTAP): Provides intelligent data infrastructure and management solutions.

8+ YOE · 8+ years software development; expert in Golang, Python, C/C++; Linux kernel and distributed systems; AI/ML infra familiarity; strong problem solving and collaboration.

Golang, Python, C/C++, Linux, Kubernetes, GPU

Expert Software Engineer — Rapid System Prototyping & Integration

Santa Rosa or Santa Clara

$143k-$238k/yr · Onsite · Full Time

Keysight Technologies (NYSE: KEYS): Manufactures hardware and software for electronic test and measurement.

8+ YOE · BS/MS/PhD in CS/EE/CE; 8+ years in system integration and software/firmware for high-speed data streaming; proficient in modern C++ and embedded Linux; strong problem-solving and collaboration.

C++, Linux, Kernel Drivers, DPDK, RDMA, PCIe, FPGA, GPU, SmartNIC, CUDA, GPUDirect, Memory Management, Networking

Expert Software Engineer — Rapid System Prototyping & Integration

Santa Rosa, California, United States

$143k-$238k/yr · Onsite · Full Time

Keysight Technologies (NYSE: KEYS): Provides electronic design, test, and simulation software and hardware.

8+ YOE · BS/MS/PhD in CS/EE/CE; 8+ years system integration and software/firmware development; strong C++ and embedded Linux; experience with high-speed data streaming, Ethernet, PCIe, and FPGA/IP integration.

Modern C++, Embedded Linux, Kernel drivers, Memory-mapped I/O, DPDK, RDMA/RoCE, PCIe, SmartNIC/DPU, SystemVerilog/VHDL, FPGA, Linux, GPU programming, DMA, Ethernet 100G/200G/400G, Python, CI/CD

Software Engineer Intern - Kernels

Burlingame, California, United States

Onsite · All Commitments Available

Quadric: Designing licensable processor IP for on-device AI inference.

0+ YOE · Pursuing CS/EE degree; strong C/C++ and Python; understanding of computer architecture; problem solving; clear communication.

C, C++, Python, CUDA, DSP, NEON, Triton

1mo

Senior Embedded Linux Software Platform Engineer – ROS2 Robotics

Arvada, Colorado, United States

$100k-$250k/yr · Onsite · Full Time

AION Robotics: Developing rugged autonomous ground vehicles for outdoor industrial monitoring.

5+ YOE · 5+ years in build systems, cross-compilation, containerization, and Linux configuration; ROS2; NVIDIA Jetson/CUDA; CI/CD; embedded Linux.

CMake, Docker, CircleCI, GitHub Actions, GitLab CI, Jetson, ROS2, L4T, Jetpack, Yocto, CUDA, kernel, systemd

1mo

Member of Technical Staff, Kernels

San Mateo, California, United States

Onsite · Full Time

Inception: Builds high-speed parallel-processing large language models.

BS/MS/PhD in CS/Engineering or related field; strong GPU programming (CUDA, CuTe, Triton); systems view of ML frameworks; performance optimization; low-precision formats; distributed training; Python and at least one systems language; Docker, Kubernetes, CI/CD.

CUDA, CuTe, Triton, PyTorch, TensorFlow, Docker, Kubernetes, CI/CD, Python, C++, Rust, Go, XLA, TVM

2mo

Manager, Software Engineering-Kernels

Bangalore or Santa Clara

Hybrid · Full Time

d-Matrix: Designs specialized semiconductor chips for efficient AI inference.

10+ YOE · 5+ Mgmt · 10+ years in computer engineering/math/physics with strong CS/architecture knowledge; proficient in C/C++, Python on Linux; experience with ML operators, HW accelerators, and embedded SIMD; leadership and ownership skills.

C, C++, Python, Linux, CUDA, MLIR, LLVM, TVM, Glow, TensorFlow, PyTorch, Tensilica, Hardware accelerators

HiringCafe market data

GPU Kernel Engineer Jobs market signals in United States

These signals come from live HiringCafe job inventory, not a static article. Use them to compare the market, then open listings above to apply directly.

Open jobs

Hiring companies

Fresh sample: 44% new in 30d

Salary visibility: 76% show pay

Companies appearing in this market

AMD · NVIDIA · Google · Susquehanna International Group · Apple · Keysight Technologies · Modular · NetApp

Work location mix in visible results

Onsite: 30 · Hybrid: 10 · Remote: 1
