Site Reliability Engineer

Strike

📍 Remote - Worldwide

Summary

Strike, a company building a global Bitcoin app, is hiring a Site Reliability Engineer based in Europe. You will lead technical initiatives to improve system reliability, performance, and scalability, and architect resilient solutions grounded in a deep understanding of distributed systems. You will also lead troubleshooting and optimization efforts, build automation frameworks, elevate observability practices, take a leadership role in incident management, and mentor other engineers. This role requires extensive experience in SRE, systems engineering, or software development with a strong operational focus.

Requirements

  • Extensive experience (minimum 5 years) in SRE, systems engineering, or software development with a strong operational focus
  • Demonstrated experience in providing technical leadership, guidance, or mentorship to engineering teams
  • Expert-level practical knowledge of cloud platforms, especially GCP
  • Deep hands-on experience with container orchestration (Kubernetes) and infrastructure-as-code (Terraform, Helm, ArgoCD)
  • Strong command of multiple scripting and programming languages (Python, Go, Bash)
  • Proven expertise in building and leveraging advanced monitoring and observability tools (Prometheus, Grafana, ELK stack)
  • Exceptional analytical, problem-solving, and debugging skills at a senior level
  • Excellent communication, collaboration, and influencing skills

Responsibilities

  • Lead Technical Initiatives: Drive key technical initiatives focused on improving the reliability, performance, and scalability of our critical systems, often leading technical aspects within projects
  • Architect and Implement Advanced Solutions: Design and implement sophisticated resilient and scalable solutions, leveraging your deep understanding of distributed systems
  • Master Troubleshooting and Optimization: Lead complex troubleshooting efforts, identify deep-seated root causes, and implement advanced optimizations
  • Build and Evangelize Automation: Develop and champion the adoption of robust automation frameworks and tools, potentially guiding more junior engineers in their development
  • Elevate Observability Practices: Design and implement comprehensive and insightful monitoring and logging solutions, ensuring actionable insights are available across teams
  • Provide Leadership in Incident Management: Take a leadership role in incident response, providing critical technical direction and mentorship during high-pressure situations
  • Champion Post-Mortem Excellence: Lead and contribute to in-depth blameless post-mortem analyses, driving significant improvements based on learnings
  • Mentor and Guide Team Members: Share your extensive knowledge and experience to mentor and guide other SREs and engineers, fostering their technical growth

Benefits

Compensation for services is location dependent
