
Agentic Ops Engineer

@ Zywave
United States
Remote · Full Time
Responsibilities: monitoring agent performance, engineering prompts, diagnosing escalation causes
Requirements Summary: 5+ years software engineering; strong fundamentals in systems thinking and debugging; hands-on experience with LLM APIs (prompt design, chain-of-thought, tool use, function calling); ability to diagnose complex issues.
Technical Tools Mentioned: LLM APIs, prompt engineering, dashboards, observability tooling
Job Description

Role Summary

The Agentic Ops Engineer operates as a cross-team specialist responsible for the health, reliability, and continuous improvement of AI agent output across the organization. They are not embedded in any single team’s daily sprint cycle. Instead, they maintain a bird’s-eye view of how agents perform across the organization’s three product engineering teams, identifying systemic patterns, diagnosing recurring failure modes, and tuning the shared prompt and tooling infrastructure that every team depends on.

When a Product Engineer hits a wall with agent output quality—whether it’s a spec the agent can’t parse, a prompt structure that produces inconsistent results, or a workflow that degrades over time—the Agentic Ops Engineer is the person they pull in.

Core Responsibilities

  1. Agent Performance Monitoring & Pattern Detection

    • Continuously monitor agent output quality, latency, and failure rates across all three product teams.

    • Identify recurring patterns such as “agents keep struggling with this type of spec” or “this prompt structure consistently produces higher-quality output.”

    • Build and maintain dashboards and alerting systems that surface degradation before teams feel it (a minimal sketch of such a check follows this list).

    • Conduct periodic reviews of agent interaction logs to flag systemic issues and emerging trends.
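
A minimal sketch of the kind of degradation check those alerts could be built on, assuming a hypothetical daily-metrics feed; the DayMetrics fields and the 10% / 2x thresholds are illustrative assumptions, not part of any specified stack:

```python
# Hypothetical sketch: flag agent output degradation before teams feel it.
# The metric fields and thresholds below are illustrative assumptions.
from dataclasses import dataclass
from statistics import mean

@dataclass
class DayMetrics:
    acceptance_rate: float  # fraction of agent outputs accepted without revision
    failure_rate: float     # fraction of runs that errored or were abandoned

def degradation_alerts(history: list[DayMetrics], today: DayMetrics) -> list[str]:
    """Compare today's numbers against a trailing baseline; return alert strings."""
    baseline_acceptance = mean(d.acceptance_rate for d in history)
    baseline_failure = mean(d.failure_rate for d in history)
    alerts = []
    if today.acceptance_rate < 0.9 * baseline_acceptance:
        alerts.append(f"acceptance rate {today.acceptance_rate:.0%} is >10% below baseline")
    if today.failure_rate > 2 * baseline_failure:
        alerts.append(f"failure rate {today.failure_rate:.0%} more than doubled vs. baseline")
    return alerts
```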

  2. Prompt Engineering & System Tuning

    • Own the shared prompt infrastructure: templates, system prompts, few-shot libraries, and chain-of-thought scaffolding used across teams (see the sketch after this list).

    • Iterate on prompt structures based on observed failure modes and A/B performance data.

    • Develop and maintain a prompt playbook documenting what works, what doesn’t, and why.

    • Evaluate and integrate new model capabilities, versioning changes, and API updates as they roll out from providers.
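
One way shared, versioned prompt infrastructure like this is often structured, sketched under assumptions — the PromptTemplate class and the spec-parser example are hypothetical, not an existing internal system:

```python
# Hypothetical sketch of a shared, versioned prompt template with a few-shot library.
from dataclasses import dataclass, field

@dataclass
class PromptTemplate:
    name: str
    version: str  # pinned by teams; bumped whenever the prompt is iterated on
    system_prompt: str
    few_shot_examples: list[tuple[str, str]] = field(default_factory=list)

    def render(self, task: str) -> list[dict]:
        """Assemble a provider-agnostic chat message list for an LLM API call."""
        messages = [{"role": "system", "content": self.system_prompt}]
        for user_msg, assistant_msg in self.few_shot_examples:
            messages.append({"role": "user", "content": user_msg})
            messages.append({"role": "assistant", "content": assistant_msg})
        messages.append({"role": "user", "content": task})
        return messages

# Illustrative template a product team might pull from the shared library.
SPEC_PARSER = PromptTemplate(
    name="spec-parser",
    version="2.3.0",
    system_prompt="You convert product specs into structured work items as JSON.",
    few_shot_examples=[
        ("Spec: add a CSV export button to the reports page",
         '{"type": "feature", "component": "reports", "summary": "CSV export"}'),
    ],
)
```

Versioning each template this way is what makes the A/B comparisons above tractable: two versions can be replayed against the same tasks and their outputs compared.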

  3. Escalation Support & Embedded Problem-Solving

    • Serve as the on-call specialist when a Product Engineer encounters persistent agent output quality issues.

    • Diagnose root causes: Is it the prompt? The spec format? The model’s limitations? A context window issue?

    • Pair with Product Engineers to rapidly prototype and test fixes, then roll improvements back into shared systems.

    • Maintain a knowledge base of resolved issues and their solutions to reduce repeat escalations.

  4. Tooling, Evaluation & Infrastructure

    • Build and maintain evaluation harnesses, benchmarks, and regression test suites for agent workflows (a sketch of this idea follows the list).

    • Develop internal tooling for prompt version control, output comparison, and automated quality scoring.

    • Collaborate with platform/infra teams to optimize agent execution pipelines (caching, context management, token budgets).

    • Establish and track key metrics: output acceptance rate, revision frequency, time-to-resolution on escalations.
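
A minimal sketch of the regression-suite idea, assuming hypothetical run_agent and score_output callables that stand in for real model calls and automated graders:

```python
# Hypothetical sketch: replay a fixed benchmark through two prompt versions
# and fail if the candidate's average quality score regresses the baseline.
from typing import Callable

def regression_check(
    benchmark: list[str],
    run_agent: Callable[[str, str], str],       # (prompt_version, task) -> output
    score_output: Callable[[str, str], float],  # (task, output) -> score in [0, 1]
    baseline: str,
    candidate: str,
    tolerance: float = 0.02,  # allowed score drop before the check fails
) -> bool:
    """Return True if the candidate prompt version does not regress the baseline."""
    def avg_score(version: str) -> float:
        scores = [score_output(task, run_agent(version, task)) for task in benchmark]
        return sum(scores) / len(scores)

    return avg_score(candidate) >= avg_score(baseline) - tolerance
```

Wired into CI, a check like this catches prompt regressions the way unit tests catch code regressions — the reliability-for-non-deterministic-systems framing the Preferred qualifications describe.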

  5. Knowledge Sharing & Team Enablement

    • Run regular cross-team syncs to share findings, patterns, and updated best practices.

    • Produce internal documentation, guidelines, and training materials on working effectively with agents.

    • Coach Product Engineers on prompt construction, spec formatting, and debugging agent behavior.

    • Serve as the organizational point of contact for agent-related decisions (model selection, provider evaluation, capability assessments).

Requirements

Qualifications

Required

  • 5+ years of software engineering experience with strong fundamentals in systems thinking and debugging.

  • Hands-on experience building with LLM APIs (prompt design, chain-of-thought, tool use, function calling).

  • Demonstrated ability to diagnose and resolve complex, cross-cutting technical issues.

  • Strong analytical skills: comfortable building dashboards, writing queries, and reasoning about statistical patterns in output quality.

  • Excellent written and verbal communication—this role lives on documentation, cross-team clarity, and knowledge transfer.

Preferred

  • Experience with prompt evaluation frameworks, LLM observability tools (e.g., LangSmith, Braintrust, Humanloop), or building internal evaluation harnesses.

  • Background in developer tooling, platform engineering, or SRE/DevOps with an understanding of reliability principles applied to non-deterministic systems.

  • Familiarity with multiple LLM providers and models; able to reason about trade-offs in capability, cost, and latency.

  • Experience working cross-functionally across multiple product teams without direct authority.

Success Metrics (First 6 Months)

  1. Agent output acceptance rate increases measurably across all three teams (baseline established in month 1).

  2. Median escalation resolution time drops by 40%+ as patterns are documented and systemic fixes are applied.

  3. Prompt playbook and evaluation harness are live and actively used by all three teams.

  4. Repeat escalation rate for known failure modes trends toward zero as systemic fixes propagate.

  5. Cross-team visibility into agent performance is self-serve via dashboards and a monthly report cadence.

What This Role Is Not

  • Not a team lead or manager — this is an IC role with cross-cutting influence, not authority.

  • Not a data scientist — although analytical skills are essential, the focus is operational, not research.

  • Not an ML engineer — you’re tuning how the organization uses models, not training or fine-tuning them.

  • Not a project manager — you don’t own team roadmaps; you own the health of the agent layer beneath them.