Staff Site Reliability Engineer at Illumio

Summary

Join Illumio, a leader in Zero Trust Segmentation, as a Product Site Reliability Engineer (SRE). This remote role, based in Australia, requires expertise in the AWS and Azure cloud platforms, application performance, and operational excellence. You will investigate and resolve production incidents, monitor application health, develop automation scripts, conduct root cause analyses, and collaborate with cross-functional teams. The ideal candidate has a Bachelor's degree or equivalent experience, 6+ years of relevant SRE experience, and proficiency in both programming and scripting languages. Illumio offers a wide range of benefits, which vary by location and include health insurance, paid time off, retirement savings, and more.

Requirements

Bachelor's degree in Computer Science, Engineering, or related field; or equivalent work experience
6+ years of relevant SRE experience
Strong hands-on experience with AWS and Azure
Familiarity with Kubernetes and containerized environments
Knowledge of networking concepts, such as DNS, load balancing, and firewalls
Proficiency in diagnosing and resolving complex issues in SaaS environments, including performance bottlenecks and application errors
Proficiency in at least one programming language (e.g., Python, Go, Java) and at least one scripting language (e.g., Bash, PowerShell)
Experience with observability tools such as Datadog, New Relic, Prometheus, Grafana, ELK, or Azure Monitor
Familiarity with infrastructure-as-code and configuration management tools such as Ansible, Terraform, or CloudFormation
Knowledge of debugging and optimizing relational databases (e.g., PostgreSQL, MySQL) and caching systems (e.g., Redis, Memcached)
Experience with incident management tools and processes, including conducting root cause analyses (RCAs) and improving on-call processes

Responsibilities

Investigate and resolve production incidents and escalations to minimize downtime and customer impact
Work closely with engineering and support teams to troubleshoot application and infrastructure issues
Proactively monitor application health, performance, and reliability using modern observability tools
Analyze trends in system behavior and suggest performance improvements
Develop and maintain automation scripts and tools to improve operational efficiency and incident resolution
Create and enhance runbooks to streamline troubleshooting and reduce mean time to resolution (MTTR)
Conduct thorough post-incident reviews to identify root causes and implement preventive measures
Drive a culture of continuous improvement by documenting lessons learned and improving system designs
Partner with software engineers, QA, and product teams to improve application stability and user experience
Act as a bridge between development and operations, ensuring smooth and reliable service delivery

Benefits

Medical, Dental, Vision Coverage
Health and Dependent Savings Accounts
Life and Disability Programs
Paid Parental Leave
Voluntary Benefit Programs
Company Sponsored Wellness Program
Wellness Reimbursement Program
Retirement Savings
Equity Opportunities
Paid Time Off and Paid Holidays
Employee Incentive Program