Summary
Join the GoDaddy Engineering team as a Senior SRE to build and maintain core platform services. You will be responsible for designing, deploying, and operating scalable, resilient, and secure Linux systems. This role involves diagnosing and solving complex system issues, writing Python automation, championing SRE best practices, and collaborating with cross-functional teams. You will also own incident response, mentor other engineers, and implement infrastructure-as-code and CI/CD pipelines. This is a remote position with occasional office visits. The ideal candidate will have extensive experience managing Linux-based systems and a strong background in Python development and DevOps tools.
Requirements
- 7+ years of hands-on experience managing Linux-based systems in high-scale production environments
- Deep understanding of Linux internals, performance tuning, and system-level debugging
- Strong experience in Python development, especially for automation, tooling, and backend scripting
- Working knowledge of cloud platforms such as AWS, Azure, or GCP
- Familiarity with DevOps tools (Terraform, Ansible, Jenkins, Git) and modern CI/CD workflows
- Solid foundation in observability and monitoring tools like Prometheus, Grafana, ELK, or Datadog
- Experience working with containerized environments (Docker, Kubernetes)
- Clear communicator and strong collaborator across time zones and teams
Responsibilities
- Lead the design, deployment, and operation of scalable, resilient, and secure Linux systems across our global platform
- Diagnose and solve complex system-level issues using your expert knowledge of Linux internals and tools like strace, tcpdump, and systemd
- Write Python automation for provisioning, observability, self-healing, and system performance optimization
- Champion SRE best practices: SLOs, error budgets, and blameless postmortems are part of your toolkit
- Collaborate with cross-functional teams to improve infrastructure architecture and developer experience
- Implement and evolve infrastructure-as-code and CI/CD pipelines using Terraform, Ansible, Git, and more
- Own incident response and lead post-incident analysis with a focus on resilience and continuous improvement
- Mentor other engineers and help drive a culture of reliability, learning, and operational excellence
Preferred Qualifications
- Contributions to open-source projects in the Linux, SRE, or DevOps ecosystem
- Experience with Go, or with scripting languages like Bash
- Exposure to chaos engineering, resilience testing, or advanced fault tolerance strategies
- Passion for improving developer experience and reducing operational toil
Benefits
- Paid time off
- Retirement savings (e.g., 401k, pension schemes)
- Bonus/incentive eligibility
- Equity grants
- Participation in our employee stock purchase plan
- Competitive health benefits
- Other family-friendly benefits including parental leave