Posted 8h ago

Principal Site Reliability Engineer

@ Atlassian
North America
Remote, Full Time
Responsibilities: drive reliability, build platform, mentor engineers
Requirements Summary: 10+ years in Java/Go/Python; 7+ years in public cloud (2+ years on GCP); 7+ years in HA distributed software; strong communication; mentoring
Technical Tools Mentioned: Java, Go, Python, Google Cloud Platform, REST, GraphQL, JVM, Performance Tuning, Monitoring, Runbooks
Job Description

Overview

We are looking for a reliability expert who is passionate about scaling Cloud services to join our growing Site Reliability Engineering (SRE) teams. You stay current with industry trends (particularly those related to reliability), value working with a diverse set of partners, and can both articulate the business impact of a problem and dive deep into the technical solution.

Responsibilities

We'd love it if you brought a deep understanding of modern Cloud infrastructure, programming expertise, operational experience and a desire to change the status quo.

We're looking for an engineer who can analyse and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency. You'll achieve this by crossing team and functional boundaries to advocate for reliability methodologies and will work with a variety of platform, product and SRE teams to both build reliability into our platform and drive adoption of those practices into our products. In other words, you'll be the driving force for change! You will report to a regional Senior Engineering Manager in SRE.

Qualifications

We’ll expect you to have:

  • Expert-level proficiency with 10+ years of experience in one or more prominent languages such as Java, Go or Python.

  • Expert-level proficiency with 7+ years of experience in public cloud offerings (with at least 2 years specifically on GCP).

  • Expert-level proficiency with 7+ years of experience in operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.

  • Excellent communication skills in written and verbal forms, and an ability to communicate complex technical issues to a range of technical and non-technical audiences (management, peers, clients).

  • An ability and desire to mentor and coach engineers.

It would be great, but not mandatory, if you had:

  • Experience in datastores (RDBMS, time-series databases, NoSQL, search, analytics).

  • Experience in microservice architecture.

  • Experience building web services and clients using REST/GraphQL.

  • Expertise in JVM, Garbage Collection, and Performance Tuning.