We are seeking a Senior MLOps Engineer to design, build, and maintain the infrastructure and pipelines that operationalize AI and machine learning systems at scale. This role bridges model development and production deployment, ensuring that ML and GenAI workloads are reliable, observable, cost-efficient, and continuously improving across enterprise environments.
Key Responsibilities
- Design and implement end-to-end ML pipelines covering data ingestion, feature engineering, model training, evaluation, and deployment.
- Build and manage CI/CD pipelines for ML models, including automated testing, validation, and rollback mechanisms.
- Architect and maintain model serving infrastructure for real-time and batch inference workloads, including LLM and agentic AI deployments.
- Implement model monitoring, drift detection, and alerting systems to ensure production model health and reliability.
- Manage experiment tracking, model versioning, and artifact registries to enable reproducibility and governance.
- Optimize compute costs and inference latency across GPU/CPU workloads on cloud platforms (AWS, Azure, or GCP).
- Containerize and orchestrate ML workloads using Docker and Kubernetes.
- Automate data pipeline workflows and feature store management for training and inference.
- Collaborate with AI Engineers, Data Scientists, and Platform teams to streamline the path from prototype to production.
- Establish and enforce MLOps best practices, standards, and documentation across the engineering organization.