Staff Site Reliability Engineer
Gemini
Summary
Join Gemini, a global crypto and Web3 platform, as a Staff Site Reliability Engineer. You will play a key role in leading engineering teams toward modern DevOps practices, developing automation and operational tooling, and influencing development practices. Responsibilities include providing operational support, improving reliability, guiding engineering teams, and creating production-ready scorecards. The ideal candidate has 7+ years of experience with monitoring, alerting, and automation tooling, along with expertise in cloud technologies, containerization, and configuration management. Gemini offers a compensation and benefits package that includes a competitive salary, annual bonus, equity grant, comprehensive health plans, 401(k) matching, paid parental leave, and flexible time off.
Requirements
- 7+ years using monitoring, alerting, and automation tooling to understand and remediate performance and health issues in systems at scale
- Good knowledge of cloud providers such as AWS, GCP, or Azure
- Experience in a code-first environment, developing automated solutions to solve support and operational issues
- Experience as a Technical Leader within a team, helping evaluate and make technical decisions for the team
- Experience working with containerization and orchestration tools such as Docker, Nomad, and EKS (Kubernetes)
- Experience working with configuration management tools such as Ansible, Chef, or Puppet
- Experience writing scripts or CLI tools in high-level languages such as Python or Go that increase developer productivity
- Experience analyzing system and application performance, identifying bottlenecks, and recommending architectural or systemic improvements
- Experience working with Engineering teams, teaching, training, and mentoring on how to implement best-practice technical solutions
- Experience working in a code-driven, automation-first public cloud infrastructure (Terraform)
Responsibilities
- Provide primary operational support and engineering for various Gemini services
- Improve reliability, quality and time-to-market across all Gemini services and offerings
- Guide engineering teams in adopting the supported services provided by Platform
- Run ongoing performance evaluations and improvements for Gemini systems
- Provide architecture recommendations and engagement as part of the SDLC
- Create "Production-ready Scorecards" to evaluate the health of systems pre-launch
- Implement and teach monitoring, alerting, and automated resolution best practices
- Define SLIs and SLOs with Engineering teams
- Educate and guide Engineering teams on reliability and resiliency best practices, such as statelessness, chaos testing, and blue/green deployments
- Build operational tooling and automations
Benefits
- Competitive starting salary
- A discretionary annual bonus
- Long-term incentive in the form of a new hire equity grant
- Comprehensive health plans
- 401(k) with company matching
- Paid Parental Leave
- Flexible time off