Job overview and responsibilities
The Senior DevOps Engineer will drive the development of our AWS-based cloud platform in close collaboration with application teams. This role will lead key initiatives including an Infrastructure as Code (IaC) library, centralized CI/CD pipelines, service meshes and API gateways, reliability engineering, and platform security. The ideal candidate brings a strong background in software development, deep expertise in AWS cloud technologies, and proven leadership in technical delivery.
- Lead cloud platform development efforts across IaC libraries (AWS CDK) and CI/CD pipelines (GitHub Actions).
- Implement and maintain monitoring, logging, and alerting systems, using tools like Dynatrace, to ensure high availability, performance, and reliability of applications.
- Develop and maintain observability practices and dashboards to track key metrics, error budgets, service-level indicators (SLIs), and service-level objectives (SLOs).
- Own the design and maintenance of the centralized CI/CD frameworks that support multi-team deployments and governance while enabling self-service capabilities for infrastructure provisioning and deployments.
- Collaborate with application teams to define and implement scalable, secure, and reusable cloud architecture patterns.
- Set coding and architecture standards for platform components, ensuring consistency, security, and maintainability.
- Evaluate and integrate new AWS services and tools to continuously improve platform capabilities.
- Partner with leadership to align the platform roadmap with product and organizational goals.
- Lead presentations, Lunch and Learn sessions, and workshops to educate internal application teams on platform capabilities and best practices.
What’s needed to succeed (Minimum Qualifications):
- Bachelor’s degree (computer science, systems analysis, or a related field preferred)
- 4+ years of experience in software engineering, site reliability engineering (SRE), platform engineering, DevOps, or related roles
- Hands-on expertise in AWS services and architecture, with deep knowledge of Infrastructure as Code using AWS CDK, CloudFormation, Pulumi, or Terraform
- Monitoring and Logging: Experience with monitoring and logging tools such as Dynatrace, Datadog, OpenTelemetry, Prometheus, Grafana, ELK Stack, or Splunk
- Scripting and Automation: Proficiency in scripting languages such as TypeScript, JavaScript, or Python for automation tasks
- CI/CD Tools: Extensive experience with CI/CD tools such as GitHub Actions, Harness, Jenkins, GitLab CI, or CircleCI
- Knowledge of cloud-native development and infrastructure automation
- Proven ability to collaborate with software teams and guide them in adopting platform tools and best practices
- Containerization: Strong knowledge of containerization technologies and of the reliability and performance of containerized applications
- Must be legally authorized to work in the United States for any employer without sponsorship
What will help you propel from the pack (Preferred Qualifications):
- Master's degree
- Experience building or contributing to internal platform or developer portals using React and TypeScript
- Proficiency in application development languages (.NET, Java, Python, JavaScript, or Go), with the ability to support cross-functional teams consuming platform tools and libraries
- Strong understanding of cloud security, IAM, cost optimization, and operational excellence in an AWS environment
- Experience defining and implementing error budgets, SLOs, and SLIs for various workload types
- Deep understanding of distributed system patterns (e.g. circuit breakers, retries with exponential backoff)
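
For candidates wondering what the last item looks like in practice, the following is a minimal sketch of retries with exponential backoff, written in Python for illustration only. The function name `retry_with_backoff`, its parameters, and the default values are hypothetical and are not taken from our platform; the random "full jitter" applied to each wait is one common variation, assumed here rather than required by the role.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call operation() until it succeeds or max_attempts is exhausted.

    All names and defaults here are illustrative assumptions.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # The wait ceiling doubles on each failed attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Jitter: wait a random fraction of the ceiling so that many
            # clients retrying at once do not all hit the service together.
            sleep(ceiling * rng())
```

A caller would wrap a call that can fail transiently, for example `retry_with_backoff(lambda: fetch_config())`, where `fetch_config` is a placeholder for any such operation. A production version would usually catch only errors known to be retryable rather than every exception.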