Posted 8h ago

Head of Infrastructure, Stealth Edge AI Co

@ Montauk Capital
New York City, New York, United States
Hybrid, Full Time
Responsibilities: Own design, deploy hardware, tune clusters
Requirements Summary: Lead hardware and infrastructure buildout for edge AI compute; manage GPU infrastructure, data center deployments, and supply chain.
Technical Tools Mentioned: Linux, IPMI, Redfish, PXE, BMC, GPU, GPU scheduling, Automation, Monitoring, OOB management
Job Description

Head of Infrastructure

Full Time, NYC Area Preferred

About Montauk Capital

Montauk Capital builds and backs companies at the forefront of the Electron Economy, the generational shift towards electrified, intelligent technologies reshaping industries and driving unprecedented demand for energy. Our team combines deep investing acumen with decades of operating experience to give founders the strategic clarity and hands-on support that accelerate the building of enduring companies of consequence.

About Stealth Edge AI Co

Co-founded by Montauk Capital, Stealth Edge AI Co is a pre-seed venture specializing in modular, metro-edge AI capabilities. By leveraging existing infrastructure for inference deployment, Edge AI provides low-latency, SLA-guaranteed performance across diverse GPU SKUs and colocation environments. Our technology intelligently routes traffic based on demand proximity and real-world network limitations, bypassing the heavy power and infrastructure requirements of traditional hyperscalers. Currently initiating operations with pilot nodes in NYC, we are executing a city-by-city expansion strategy with plans for a broader multi-metro rollout.

About the Role

We’re building the automation, orchestration, and monitoring layer that unifies disparate metro edge GPU nodes into a single software-managed compute platform. You’ll own the definition, design, and implementation of the hardware and infrastructure buildout, setting strategy across edge data center requirements, GPU selection, supply chain, technical implementation, operational maintenance, and deployment as we scale. You’ll take the foundational groundwork and execute across the entire hardware and infrastructure side of our company, transforming our roadmap into production-scale compute for AI inference.

You’ll ensure the GPU clusters deliver on customer requirements and are highly available, and you’ll be the hands-on expert for the hardware side of our business. Most importantly, you’ll turn our high-level plans into real technical execution and play a key role in supply chain decisions about infrastructure and how we deploy, scale, and support it.

What You’ll Do

  • Own GPU infrastructure design and implementation details from planning through deployment

  • Own hardware selection, configuration, and deployment across early compute infrastructure

  • Help turn early technical groundwork into a functioning deployed system

  • Own the GPU roadmap we use to entice customers and build partnerships

  • Deploy, operate, and tune GPU clusters for both bare-metal workloads and our internal software stack

  • Own resilient networking implementation from each site to the cluster, including a robust OOB network for constant monitoring and management

  • Manage deployments at production scale

  • Interface with site ops on power, cooling, and connectivity

  • Build the automation and monitoring stack for distributed edge nodes

  • Own the supply chain for all infrastructure gear

  • Manage third-party hardware vendors on provisioning, maintenance, and break-fix support

What You’ll Bring

You’re a strong infrastructure engineer experienced with hardware deployment, data center environments, GPU selection, and systems setup and design. You can manage the implementation details end to end and take ownership of the entire process. If AI infrastructure is your jam and you've built systems in production, we want to talk.

  • Strong infrastructure engineering experience and systems-level technical judgment

  • Experience deploying or managing compute infrastructure in real-world environments

  • Experience with data center, hardware, or GPU-based systems implementation

  • Experience owning GPU provisioning, hardware selection, and systems configuration

  • GPU scheduling and orchestration specifics: GPU type awareness, memory management, topology considerations, placement strategies for multi-GPU jobs, and fragmentation minimization

  • Bare-metal provisioning lifecycle: IPMI/Redfish, BMC-based remote management, PXE boot, and automated OS deployment workflows

  • On-board storage

  • Observability stack: distributed configuration and troubleshooting, plus monitoring, alerting, and tracing

  • Deployment planning, hardware configuration, and operational troubleshooting

  • Linux systems depth: RHEL/Ubuntu, low-level troubleshooting, shell scripting

  • Security and operational best practices for bare metal

  • Deployment tooling at production scale

  • Networking fundamentals for inference workloads and OOB management

  • Startup / 0→1 DNA: You ship fast and communicate clearly.

Why Join Us

  • Category-Defining Opportunity: Solving the AI inference bottleneck without the burden of power and infrastructure constraints

  • Massive Market Opportunity: AI spending projected to exceed hundreds of billions of dollars annually, with 54 GW of AI inference demand expected by 2030

  • Studio Support: Leverage Montauk Capital's resources, network, and operational expertise during critical early stages

  • Competitive compensation + equity: True ownership over what you build