Remote Senior Site Reliability Engineer
Netlify
Job highlights
Summary
Join Netlify's Infrastructure SRE team as a Site Reliability Engineer and play a key role in designing, developing, and delivering solutions that enhance the scalability, availability, and efficiency of our platform. You will manage the full infrastructure lifecycle, participate in on-call rotations, automate tasks, conduct performance tuning and troubleshooting, and participate in disaster recovery planning. The role requires several years of experience in SRE or DevOps, expertise in hyperscale cloud environments, and a strong understanding of network protocols and automation tools. Netlify offers a remote-first, globally distributed work environment with a focus on asynchronous communication and a commitment to diversity and inclusion. The company prioritizes a healthy work-life balance and offers competitive compensation and benefits.
Requirements
- Several years of experience in SRE, DevOps, or related roles
- Proven experience working in hyperscale cloud environments
- Demonstrated ability to lead infrastructure projects
- Strong understanding of network protocols and configurations
- Experience with automation tools (e.g., Ansible, Terraform) and scripting languages (e.g., Python, Bash, Golang)
- Experience automating component deployment across multiple environments using tools like Jenkins, CircleCI, or GitHub Actions
- Proficiency in observability and log analysis techniques to detect and resolve system issues
- Effective communication skills for both technical and non-technical stakeholders
- Familiarity with compliance requirements and frameworks: PCI, ISO 27001, HIPAA, SOC
Responsibilities
- Manage full infrastructure lifecycle from design to decommission, ensuring systems are reliable and efficient
- Participate in an on-call rotation for the compute platform and related systems
- Automate routine tasks and develop tools to improve system efficiency and reduce the time spent on manual intervention
- Conduct system performance tuning and troubleshooting, as well as capacity planning, to ensure system reliability and efficiency
- Participate in the creation and testing of disaster recovery plans
- Monitor and maintain observability systems to ensure issues are identified and resolved proactively
- Educate team members on security best practices and emerging threats
Benefits
- Remote work, flexible hours
- Competitive compensation and benefits
- Equity plan