Head of Infrastructure
Full Time, NYC Area Preferred
About Montauk Capital
Montauk Capital builds and backs companies at the forefront of the Electron Economy, the generational shift toward electrified, intelligent technologies reshaping industries and driving unprecedented demand for energy. Our team combines deep investing acumen with decades of operating experience to give founders the strategic clarity and hands-on support needed to build enduring companies of consequence.
About Stealth Edge AI Co
Co-founded by Montauk Capital, Stealth Edge AI Co is a pre-seed venture specializing in modular, metro-edge AI capabilities. By leveraging existing infrastructure for inference deployment, Edge AI provides low-latency, SLA-guaranteed performance across diverse GPU SKUs and colocation environments. Our technology intelligently routes traffic based on demand proximity and real-world network limitations, bypassing the heavy power and infrastructure requirements of traditional hyperscalers. We are launching operations with pilot nodes in NYC and executing a city-by-city expansion toward a broader multi-metro rollout.
About the Role
We’re building the automation, orchestration, and monitoring layer that unifies disparate metro-edge GPU nodes into a single software-managed compute platform. You’ll own the definition, design, and implementation of the hardware and infrastructure buildout, executing strategy across edge data center requirements, GPU selection, supply chain, technical implementation, operational maintenance, and deployment as we scale. You’ll take the foundational groundwork and run the entire hardware and infrastructure side of our company, transforming our roadmap into production-scale compute for AI inference.
You’ll ensure our GPU clusters meet customer requirements and remain highly available, and you’ll be the hands-on expert for the hardware side of our business. Most importantly, you’ll turn our high-level plans into real technical execution and play a key role in supply chain decisions about infrastructure and how we deploy, scale, and support it.
What You’ll Do
Own GPU infrastructure design and implementation details from planning through deployment
Own hardware selection, configuration, and deployment across early compute infrastructure
Help turn early technical groundwork into a functioning deployed system
Own the GPU roadmap we use to entice customers and build partnerships
Deploy, operate, and tune GPU clusters for both bare-metal workloads and our internal software stack
Own resilient networking implementation from each site to the cluster, including a robust OOB network for constant monitoring and management
Manage deployments at production scale
Interface with site ops on power, cooling, and connectivity
Build the automation and monitoring stack for distributed edge nodes
Own the supply chain for all infrastructure gear
Manage third-party hardware vendors on provisioning, maintenance, and break-fix support
What You’ll Bring
You’re a strong infrastructure engineer experienced with hardware deployment, data center environments, GPU selection, and systems setup and design. You can manage implementation details end to end and take ownership of the entire process. If AI infrastructure is your jam and you've built systems in production, we want to talk.
Strong infrastructure engineering experience and systems-level technical judgment
Experience deploying or managing compute infrastructure in real-world environments
Experience with data center, hardware, or GPU-based systems implementation
Experience owning GPU provisioning, hardware selection, and systems configuration
GPU scheduling and orchestration specifics: GPU type awareness, memory management, topology considerations, placement strategies for multi-GPU jobs, and fragmentation minimization
Bare-metal provisioning lifecycle: IPMI/Redfish, BMC-based remote management, PXE boot, and automated OS deployment workflows
On-board storage configuration
Observability stack: distributed configuration and troubleshooting, plus monitoring, alerting, and tracing
Deployment planning, hardware configuration, and operational troubleshooting
Linux systems depth: RHEL/Ubuntu, low-level troubleshooting, shell scripting
Security and operational best practices for bare metal
Deployment tooling at production scale
Networking fundamentals for inference workloads and OOB management
Startup / 0→1 DNA: You ship fast and communicate clearly.
Why Join Us
Category-Defining Opportunity: Solving the AI inference bottleneck without the burden of power and infrastructure constraints
Massive Market Opportunity: AI spending projected to exceed hundreds of billions of dollars annually, with 54 GW of AI inference demand expected by 2030
Studio Support: Leverage Montauk Capital's resources, network, and operational expertise during critical early stages
Competitive compensation + equity: True ownership over what you build