Remote Principal Site Reliability Engineer

Gemini

💵 $198k-$247k
📍 Remote - United States

Job highlights

Summary

Join our team as a Principal Site Reliability Engineer and lead Gemini's engineering teams towards modern DevOps practices. You will provide primary operational support, improve reliability, and guide engineering teams onto supported services provided by Platform.

Requirements

  • 10+ years using monitoring, alerting, and automation tooling to understand and remediate performance and health issues in systems at scale
  • Good knowledge of cloud providers such as AWS, GCP, or Azure
  • Expert in an infrastructure as code environment (Terraform), developing automated solutions to solve support and operational issues
  • Experience as a Technical Leader within a team, helping evaluate and make tech decisions for the team
  • Expert working with containerization and orchestration tools such as Nomad, EKS (k8s), Docker, etc.
  • Expert working with Configuration Management such as Ansible, Chef, Puppet
  • Proficient at writing scripts or CLI tools that increase developer productivity in high-level languages like Python, Go, etc.
  • Expert analyzing system and application performance, identifying bottlenecks, and recommending architectural or systemic improvements
  • Experience working with Engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions

Responsibilities

  • Provide primary operational support and engineering for various Gemini services
  • Improve reliability, quality and time-to-market across all Gemini services and offerings
  • Guide engineering teams onto the various supported services provided by Platform
  • Run ongoing performance evaluations and improvements for Gemini systems
  • Provide architecture recommendations and engagement as part of SDLC
  • Create “Production-ready Scorecards” to evaluate the health of systems pre-launch
  • Implement and teach monitoring, alerting, and automated resolution best practices
  • Define SLIs and SLOs with Engineering teams
  • Educate and guide Engineering teams on reliability and resiliency best practices, like statelessness, chaos testing, blue/green deployments, etc.
  • Design, build, and maintain operational tooling and automation that streamline processes and enhance system reliability

Benefits

  • Competitive starting salary
  • A discretionary annual bonus
  • Long-term incentive in the form of a new hire equity grant
  • Comprehensive health plans
  • 401K with company matching
  • Paid Parental Leave
  • Flexible time off
