Vijil is looking for a senior backend engineer (10+ years of experience) to own the platform end-to-end: cloud infrastructure, CI/CD, production operations, and the AI agents and libraries that make up the product. You will work across multiple codebases on a small team, ship to customers every week, and be the engineer others turn to when something breaks or a new service needs to go out. Two things matter most: deep, hands-on cloud development experience, and real production experience with agentic AI systems.

Description

About Us

Vijil is a venture-funded AI software startup on a mission to help organizations build and operate intelligent agents that people can trust. We are building tools and services to enable AI developers to continuously improve the security and safety of custom large language models.

About the Role

You will own Vijil's backend: the platform services that run the product, the pipelines that ship them, the infrastructure they run on, and the AI agents and libraries we build on top. You will work across multiple codebases on a small team, and you will be responsible for a production environment that customers depend on. When something breaks at 9pm, you are the person other engineers come to. When a new service needs to ship, you are the person who decides how.

Day-to-day:

- Own production end-to-end: cloud infrastructure, deployments, observability, on-call.

- Own CI/CD: build, test, release, and image promotion across every repository we ship.

- Build, deploy, and maintain the AI agents and libraries that make up our product.

- Write production-quality backend code in Python.

- Set the engineering bar: clean architecture, real tests against real systems, no silent failures.

- Cut through cross-repo work that no one else has the context to finish.

This is not a research role. You will work alongside applied scientists who ship algorithms; your job is to make those algorithms run reliably for paying customers.

About You

You have spent the last decade shipping backend systems and running them in production. You have built or operated AI agents that other software relies on, so you know what these systems look like when they go wrong and what it takes to keep them running. You tend to read code before you write it, and you debug by tracing through the call chain rather than guessing. You hold strong architectural opinions and put them in writing, but once the team has decided, you build what was decided. You are comfortable working across many repositories at once because you have done it before. You are looking for a place where your work reaches customers quickly and your judgment carries weight from day one.

Minimum Qualifications

- 10+ years of professional software engineering, primarily backend.

- Core cloud development competency. You have designed, deployed, and operated production services on a major cloud. You understand networking, identity (IAM or equivalent), storage (S3 or equivalent), and compute (EC2/EKS or equivalent) at the level needed to debug them under pressure. This is non-negotiable.

- Hands-on experience building or operating agentic AI systems in production: LLM-backed agents that call tools, coordinate with other agents, and run as services. You can speak concretely about what failed in production and how you fixed it. This is non-negotiable.

- Strong Python, including modern async, type hints, and production frameworks such as FastAPI and Pydantic.

- Ownership of CI/CD pipelines you built or rebuilt yourself (GitHub Actions, AWS CodeBuild, or equivalent).

- Container-based deployment and orchestration at production depth: Docker, Kubernetes, and Helm.

- Working fluency with relational databases (PostgreSQL preferred), including writing migrations and debugging slow queries.

- Disciplined Git workflow, code review hygiene, and semantic commit history. GitHub link appreciated.

- Strong written communication. You write design docs that other engineers actually read.

Preferred Qualifications

- Familiarity with agent interoperability standards such as MCP (Model Context Protocol) or A2A (Agent-to-Agent).

- Experience with agent development frameworks such as Google ADK, LangGraph, or equivalent.

- Production observability experience with OpenTelemetry or equivalent.

- Experience operating a multi-service platform owned by a small team.

- Open-source maintainer or contributor history.

- Prior work on AI infrastructure, agent platforms, or evaluation systems.

- Comfort working directly with applied scientists and translating research into production code.

Benefits

- Competitive base pay and equity.

- Health, dental, and vision insurance plans.

- Flexible time-off policy.

- Paid parental and family leave.

- Immigration sponsorship.

How to Apply

Send your resume and links to relevant work — repositories, systems you have operated, design docs you are proud of — to [email protected]. Tell us about one production incident you owned end-to-end: what broke, how you found it, what you changed.

Vijil is an equal opportunity employer and complies with all applicable federal, state, and local laws pertaining to fair employment practices.

About the Company

Vijil is an AI software startup on a mission to help organizations build and operate autonomous agents that they can trust. Founded by technologists who built the deep learning platform in Amazon SageMaker at AWS and established responsible AI practices at Twitch and Splunk, Vijil is helping AI developers build trust into open-source large language models. If you're passionate about improving the explainability, security, and safety of neural networks that will be used in business-critical applications over the next decade, you'll enjoy making an impact at Vijil. Join us today to grow with the company, its products, and its customers.

Senior Software Development Engineer
