Principal Site Reliability Engineer at Boomi

Summary

Join Boomi as a Principal Site Reliability Engineer and contribute to the development of sophisticated systems and software. You will collaborate with various teams to enhance Boomi offerings, participate in incident detection and remediation, and ensure adherence to SLAs/SLOs. Responsibilities include on-call rotation, process implementation, and driving DR exercises. You will also collaborate on tooling automation, implement observability best practices, and improve system scalability and reliability. The role involves mentoring other engineers and working independently with minimal guidance. Boomi offers a chance to work on world-changing technologies and be part of a fast-growing company.

Requirements

Passionate about SRE, DevOps, automation, and infrastructure platforms
Expert in developing Ansible playbooks and automating infrastructure as code using Terraform and CloudFormation templates
Expert in defining, measuring, and improving reliability metrics (SLOs, SLIs, error budgets)
Strong in implementing observability practices (monitoring, logging, distributed tracing, etc.), preferably using Splunk and New Relic; experience should extend beyond using dashboards to creating them from scratch
Experience conducting and automating DR exercises in AWS to validate RPOs and RTOs
Strong understanding and working experience with AWS components
Ability to design and implement APIs for use by internal teams

Responsibilities

Participate actively in detecting, remediating, and reporting on production incidents, ensuring that SLAs/SLOs are defined and met
Participate in on-call rotation to ensure coverage for planned/unplanned events
Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results
Work with your SRE and Engineering counterparts to drive DR exercises, game days, training, and other response-readiness efforts
Collaborate with Service Engineering organizations to build and automate tooling, implement observability best practices, manage Boomi services in production, and consistently achieve our market-leading SLA
Improve the scalability and reliability of Boomi’s systems in production
Automate the provisioning and maintenance of Boomi’s infrastructure
Work independently with a minimal level of guidance from technical leadership
Mentor other Boomi engineers, including design collaboration and code reviews

Preferred Qualifications

5 to 8 years of related experience in the software engineering industry, including experience supporting large-scale software systems in production
Cloud certification (AWS/Azure/GCP), with experience using services such as compute, containers, and databases
Experience with Ansible, Terraform, and Python
A grasp of cloud-native concepts, containerization best practices, and cloud security awareness is a strong plus
Experience in observability, including creating dashboards for SLAs, SLIs, and SLOs

Remote

DevOps

Principal
