Posted 1mo ago

Senior DevOps/SRE Architect (Global Leadership Rotation Program)

@ OptiSigns
Taipei, Taiwan
NT$2,000k-NT$3,000k/yr | Onsite | Full Time
Responsibilities: Lead reliability, design infrastructure, automate deployment
Requirements Summary: 10+ years in DevOps/SRE, cloud platforms (AWS/GCP/Azure), Kubernetes, CI/CD, distributed systems, proactive ownership mindset.
Technical Tools Mentioned: AWS, GCP, Azure, Kubernetes, CI/CD, Terraform, Infrastructure as Code
Job Description

About OptiSigns

OptiSigns is a fast-scaling cloud platform powering digital signage for 30,000+ businesses across 100+ countries, with 190,000+ active screens worldwide.

Founded in Houston, Texas, in 2016 and now expanding aggressively in Asia and Europe, we help companies turn ordinary screens into powerful, dynamic communication tools. Our Vietnam engineering team is central to our next phase of growth.

Why This Role

This is not a typical architect role.

We are looking for a senior DevOps/SRE architect based in Taiwan who is willing to relocate to Ho Chi Minh City, Vietnam, and lead our growing engineering hub. This is a hands-on technical leadership role with real ownership over both system scalability and team development. It includes a full relocation package and offers a global career path, including the opportunity to work from our US headquarters as part of our rotation program.

You will lead by example: mentoring engineers, raising the technical bar, and ensuring the team can move fast while building reliable systems.

You will take full responsibility for the reliability and scalability of our global SaaS digital signage platform, working at real-world scale: over 100 million database records, terabyte-scale data storage, and steadily growing global traffic.

What You’ll Do

  • Take full responsibility for production reliability, including uptime, latency, performance, and overall system health
  • Design and manage scalable, resilient cloud infrastructure on platforms such as AWS, GCP, or Azure
  • Develop, optimize, and maintain CI/CD pipelines to ensure dependable and frequent deployments
  • Implement comprehensive observability through monitoring, logging, tracing, and alerting
  • Lead incident management, including root cause analysis and blameless postmortems
  • Enhance system resilience through strategies like redundancy, failover, disaster recovery, and chaos engineering practices
  • Automate infrastructure and operational tasks using Terraform and other Infrastructure as Code (IaC) tooling, as well as custom tools
  • Reduce operational toil and improve scalability through proactive automation
  • Work closely with engineering teams to integrate reliability principles into system architecture and workflows
  • Define and track SLIs and SLOs to balance innovation with system stability