Posted 10mo ago

Graphics Processing Unit (GPU) Engineer - Top Secret/SCI

@ Sunayu
Bethesda, Maryland, United States
Hybrid | Full Time
Responsibilities: Architect GPU, Integrate OS, Optimize drivers
Requirements Summary: Senior GPU engineer with 10+ years in GPU architecture, OS integration, and Linux-based performance optimization.
Technical Tools Mentioned: CUDA, OpenCL, Linux, Python, BASH, Ansible, Puppet, Salt, Terraform, GPU debugging tools, Prometheus, Grafana, Slurm
Job Description

Location: Bethesda, MD

Category: Systems Engineer 

Travel Required: No

Remote Type: Onsite

Clearance: Top Secret/SCI



Sunayu, LLC is looking for a highly skilled Systems Engineer with deep expertise in operating systems, hardware, GPUs, and high-speed networking. In this role, you will design, develop, and optimize GPU clusters that power enterprise AI for mission customers.

This is a 100% on-site position. All work must be performed at the customer site in Bethesda at the Intelligence Community Campus.


Primary Responsibilities 

  • GPU Cluster Engineering: Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power efficiency, and feature requirements.
  • Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates.
  • Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
  • Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt.
  • Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards.

   

Basic Qualifications 

  • Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree.
  • 10+ years of relevant systems engineering experience
  • Experience managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s).
  • Knowledge of enterprise server components (storage/network controllers, HBA, SSDs).
  • Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle, and Rocky).
  • Excellent problem-solving skills and the ability to collaborate within a team.
  • Candidate must, at a minimum, meet DoD 8140/8570 IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP, along with an appropriate computing environment (CE) certification). An IAT Level III certification is also acceptable (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).

Clearance

  • Due to the nature of the government contracts we support, US Citizenship is required.
  • TS/SCI clearance with polygraph required, or TS/SCI clearance and a willingness to obtain a polygraph prior to starting.


Preferred Qualifications 

  • Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow).
  • Familiarity with GPU virtualization and cloud computing.
  • Experience with Prometheus/Grafana for monitoring.
  • Knowledge of distributed resource scheduling systems such as Slurm (preferred) or LSF.