DFINITY is hiring a
Site Reliability Engineer, Web3 - Switzerland

Site Reliability Engineer

🏢 DFINITY

💵 ~$82k-$120k
📍Switzerland

Summary

DFINITY is looking for a Site Reliability Engineer to ensure the stability of the Internet Computer by creating tools, processes, and frameworks. The role involves collaborating with various teams and participating in on-call duties. It calls for experience with Unix systems, Kubernetes, Rust, observability tools, incident response, and reliability engineering, along with a security background and a track record of community interaction.

Requirements

  • Proven experience in monitoring and maintaining large production systems using tools such as Prometheus, VictoriaMetrics, Elasticsearch, and Grafana
  • Proficiency in managing multiple observability stacks across various availability zones, leveraging Kubernetes for deployment orchestration
  • Extensive experience in designing and developing moderately sized applications (up to ~10K lines of code) in Rust. Skilled in setting up automated testing and CI/CD environments
  • Capable of approaching problems methodically and systemically, especially during troubleshooting
  • Expertise in coordinating incident response across multiple teams, with excellent communication skills to ensure a clear understanding of the situation, next steps, and team responsibilities
  • Preferably, experience in Site Reliability Engineering (SRE) within a crypto environment where decisions are governed by DAOs
  • Experience in building security-sensitive tools and managing security risks in such environments. A background in DevSecOps is highly desirable
  • Proven experience in engaging with community members of large open-source projects. Ideally, the candidate is already active within the ICP community

Responsibilities

  • Design, build, deploy, and maintain services to ensure the high availability and reliability of DFINITY's products and the Internet Computer Protocol (ICP)
  • Automate processes through coding, enhancing efficiency and reducing manual intervention
  • Integrate reliability and operability into the product from the start by participating in design and code reviews, identifying risks, and proposing mitigations
  • Work with engineering and security teams to establish processes that align with the goals of the Internet Computer while remaining operationally feasible and automatable
  • Collaborate with product owners to define Service Level Objectives (SLOs) and implement them in code and observability infrastructure
  • Participate in on-call duties for production services on a 12/7 schedule, split across two sites. On-call duty is approximately 1 week every 6 weeks. Coordinate incident response and ensure resolution, involving engineers from other teams as necessary

Benefits

On-call work is compensated both monetarily and with time off
