Posted 3w ago

AI Infrastructure Engineer

@ Electronic Arts
Vancouver, British Columbia, Canada
Hybrid, Full Time
Responsibilities: Lead design, Build automation, Architect IaC
Requirements Summary: Design and operate AI-powered, cloud-native infrastructure; implement AIOps; IaC (AWS CDK); CI/CD, observability, security; leadership in engineering teams.
Technical Tools Mentioned: AWS CDK, Datadog, Prometheus, Grafana, OpenTelemetry, Kafka, SNS/SQS, Python, TypeScript
Job Description
EA Experiences group (XO) is dedicated to ensuring great experiences for our growing communities centered around our world-renowned brands, including fan favorites like Apex, Battlefield, EA SPORTS FC, Madden NFL and The Sims, just to name a few. We're a multi-functional group, with world-class expertise building fandoms, driving interactive storytelling, and positioning our franchises at the center of the broader entertainment ecosystem. We inspire, connect, and engage fans through culturally relevant content, intentionally architected journeys across channels, and meaningful fan care. Our goal is to provide valuable, easy experiences that fans love – in our games, around our games, and through innovative adjacent experiences to grow and enrich how fans experience EA as we shape the future of entertainment.

To empower more players and fans in new and amazing ways, we need more innovators to join our world-class team. The future of entertainment is interactive, and you can help lead that future by growing and enriching how hundreds of millions of people (and counting) find joy and belonging, forge friendships, and celebrate their lived experiences through the work we do every single day, together.
AI Infrastructure Engineer
As the AI Infrastructure Engineer, you will design and operate AI-powered, cloud-native infrastructure that is self-monitoring, self-healing, and secure by default. You will lead the implementation of AIOps practices, build infrastructure as software using frameworks like AWS CDK, and establish standards for CI/CD, observability, traceability, and DevSecOps. Working across systems, you will enable reliable event-driven integrations and ensure continuous validation of system health, performance, and security. This is a hands-on leadership role combining deep technical execution with architectural ownership, driving scalable, autonomous, and resilient platform capabilities across the organization.
You will play a key role in transforming our infrastructure from traditional DevOps to AI-driven, autonomous operations. You will define how systems are built, integrated, secured, and operated—enabling teams to move faster while maintaining high reliability, visibility, and control at scale.
Responsibilities
Lead the design and implementation of AI-powered DevOps (AIOps) capabilities, including anomaly detection, predictive alerting, and automated remediation

Build and operate self-monitoring and self-healing systems using event-driven automation

Architect and implement infrastructure as software using programmable IaC frameworks (AWS CDK preferred)

Develop reusable infrastructure patterns, shared libraries, and platform standards across teams

Establish end-to-end observability and traceability across services, pipelines, and data flows

Design and govern CI/CD pipelines that provision, validate, and deploy infrastructure with embedded security and compliance controls (DevSecOps)

Define and implement security best practices, including policy enforcement, identity management, and continuous validation of system posture

Design and support event-driven integration patterns across internal systems, ensuring reliable communication and signal propagation

Define SLIs/SLOs and lead incident response, postmortems, and continuous reliability improvements

Mentor engineers and influence architecture decisions across teams

Qualifications
7+ years of experience in DevOps, Site Reliability Engineering (SRE), platform engineering, or systems engineering

Strong expertise in AWS across compute, storage, networking, and IAM

Experience designing and operating systems supporting both traditional services and AI/ML workloads

Advanced experience with Infrastructure as Code, with emphasis on AWS CDK and reusable infrastructure patterns

Deep experience with observability tools (Datadog, Prometheus, Grafana, OpenTelemetry) and distributed systems debugging

Strong experience building CI/CD pipelines with integrated infrastructure provisioning, testing, and security controls

Experience with event-driven architectures (Kafka, SNS/SQS) and system integrations

Proficiency in Python, TypeScript, or similar programming languages

Strong understanding of security principles, including least privilege, identity controls, and secure system design

Nice to Have
Experience with SageMaker, Bedrock, and AgentCore; Kubernetes (EKS); and container orchestration at scale

Experience with Data Lakehouse platforms (e.g., Databricks) and data pipeline integration

Experience building internal developer platforms or shared infrastructure frameworks

Experience creating custom CDK constructs or platform tooling

Familiarity with policy-as-code and continuous compliance frameworks

Exposure to real-time systems or high-scale consumer platforms

Background in gaming, media, or large-scale consumer ecosystems