Job Description
Key Responsibilities:
Monitor platform health and data feeds daily to ensure continuity and accuracy of
operations
Respond to system alerts, investigate incidents, and perform root cause analysis to prevent
recurrence
Architect and implement scalable microservices in Python and Go
Design, build, and maintain GitLab CI/CD pipelines with AI-integrated workflow
automation
Integrate Snowflake Cortex and Azure AI Foundry capabilities into data and platform
workflows
Manage deployments and operations via Kubernetes CLI and GitOps workflows
Integrate with AWS services including S3, Lambda, SQS, and EC2
Implement secure authentication and authorization using Microsoft Entra ID
Manage and rotate secrets, credentials, and access keys per security best practices
Perform regular patching, upgrades, and maintenance with minimal service disruption
Participate in on-call rotations to support production systems
Required Qualifications:
6+ years of combined experience in DevOps, Site Reliability Engineering, or platform
engineering
Advanced proficiency in Python and Go
Strong experience with Kubernetes CLI, GitOps workflows, and GitLab CI/CD
Hands-on experience with Azure and/or AWS cloud services
Working knowledge of Snowflake Cortex or equivalent AI/ML-integrated data platform
capabilities
Familiarity with LLM API consumption and prompt engineering for automation use cases
Familiarity with Microsoft Entra ID for identity and access management
Active proficiency with AI-assisted development tools (GitHub Copilot, Cursor, or
equivalent)
Proficiency in Python for data manipulation (Pandas, Snowpark, Polars, or PySpark)
Mandatory Skills:
Experience with Helm, Jinja-based templating, and Kubernetes networking
Familiarity with Azure AI Foundry and ML model deployment pipelines
Experience integrating ML models into cloud-native services
Knowledge of observability stacks: Prometheus, Grafana, Splun