Posted 1mo ago

Senior Site Reliability Engineer

@ Satsuma
Austin or North America
Remote, Full Time
Responsibilities: Own infrastructure, build pipelines, define SLOs
Requirements Summary: 5-8 years in SRE/DevOps; cloud providers; Kubernetes, Terraform; observability tools; scripting; AI-assisted development; on-call experience; startup/SaaS background; API gateway or commerce stacks; MCP AI infra.
Technical Tools Mentioned: Kubernetes, Terraform, Datadog, Grafana, Copilot, Claude Code, MCP
Job Description

About Satsuma

Satsuma is a commerce iPaaS that builds merchant-specific APIs, MCP Servers, and MCP Apps, enabling retailers to connect their full commerce stack once and deploy branded shopping experiences across every AI channel. We work with enterprise retailers and move fast. Our infra has to match.

The role

We're looking for a Senior SRE to own the reliability, scalability, and operational posture of Satsuma's multi-cloud infrastructure. You'll be the person who keeps things running, builds the systems that prevent fires, and makes on-call not terrible.

This is an infra-first role, but we're an AI-native company, and we expect you to use AI-assisted development (Claude Code) as a core part of your workflow: writing tooling, automating runbooks, and building internal utilities.

What you'll do

  • Own infrastructure across AWS, GCP, and Azure environments
  • Build and maintain CI/CD pipelines, observability stacks, and incident response workflows
  • Define and enforce SLOs/SLIs; lead postmortems
  • Author and maintain IaC (Terraform preferred)
  • Write internal tooling and automation using AI-assisted development workflows
  • Partner closely with engineering on reliability reviews and architecture decisions