Remote Senior Site Reliability Engineer

Netlify

💵 $136k-$184k
📍 Remote - Worldwide

Job highlights

Summary

Join Netlify's Infrastructure SRE team as a Site Reliability Engineer and play a key role in designing, developing, and delivering solutions that improve the scalability, availability, and efficiency of our platform. You will manage the full infrastructure lifecycle, take part in on-call rotations, automate routine tasks, handle performance tuning and troubleshooting, and contribute to disaster recovery planning. The role requires several years of experience in SRE or DevOps, expertise in hyperscale cloud environments, and a strong understanding of network protocols and automation tools. Netlify offers a remote-first, globally distributed work environment with a focus on asynchronous communication and a commitment to diversity and inclusion. The company prioritizes a healthy work-life balance and offers competitive compensation and benefits.

Requirements

  • Several years of experience in SRE, DevOps, or related roles
  • Proven experience working in hyperscale cloud environments
  • Demonstrated ability to lead infrastructure projects
  • Strong understanding of network protocols and configurations
  • Experience with automation tools (e.g., Ansible, Terraform) and scripting languages (e.g., Python, Bash, Golang)
  • Experience automating component deployment across multiple environments using tools like Jenkins, CircleCI, or GitHub Actions
  • Proficiency in observability and log analysis techniques to detect and resolve system issues
  • Effective communication skills for both technical and non-technical stakeholders
  • Familiarity with compliance requirements and frameworks: PCI, ISO 27001, HIPAA, SOC

Responsibilities

  • Manage full infrastructure lifecycle from design to decommission, ensuring systems are reliable and efficient
  • Participate in an on-call rotation for the compute platform and related systems
  • Automate routine tasks and develop tools to improve system efficiency and reduce the need for manual intervention
  • Conduct system performance tuning and troubleshooting, as well as capacity planning, to ensure system reliability and efficiency
  • Participate in the creation and testing of disaster recovery plans
  • Monitor and maintain observability systems to ensure issues are identified and resolved proactively
  • Educate team members on security best practices and emerging threats

Benefits

  • Remote work, flexible hours
  • Competitive compensation and benefits
  • Equity plan

Please let Netlify know you found this job on JobsCollider. Thanks! 🙏