Remote Site Reliability Engineer

Alchemy

📍 Remote - Romania

Job highlights

Summary

Join our team of Site Reliability Engineers at Alchemy, where you'll convert manual operational tasks into automated processes while building and maintaining tools and infrastructure. As an SRE, you'll tackle problems methodically, considering system scalability, high availability, latency, and resilience. The role draws on strong experience in operations, networking, infrastructure, software development, observability, and troubleshooting, which makes it one of the most versatile roles anyone can grow into.

Requirements

  • Experience writing efficient code in one or more programming languages (e.g. Python, Golang, Java, Rust)
  • Experience developing software applications and tools from scratch that can be expanded and used by other team members by offering a clear structure, reusable code patterns and guidance
  • Past experience designing and managing the lifecycle of complex systems while taking into account multiple factors such as costs, systems performance, scalability, resilience and disaster recovery
  • Expertise in all aspects of operating Linux-based systems, with a focus on troubleshooting, configuration and monitoring
  • Past experience managing large-scale infrastructures running on bare metal, public and private cloud (e.g. AWS, GCP, Azure) and container-based infrastructure (Kubernetes, OpenShift, Docker, etc.)
  • Understands the internals of protocols across the stack, such as HTTP, DNS, DHCP and routing protocols
  • Leverages programming languages and different automation tools to reduce toil and automate repetitive tasks
  • Past experience with Infrastructure as Code (IaC) tools such as Terraform or Pulumi, and configuration management tools (e.g. Ansible, Puppet, Chef)
  • Experience with one or more CI/CD solutions (e.g. Jenkins, ArgoCD, GitLab pipelines, Spinnaker, Harness) is a must
  • Experience implementing monitoring and logging solutions for infrastructure and applications
  • Must have experience with monitoring and logging tools such as Prometheus, Thanos, Splunk, Grafana, Graphite, Loki, etc.
  • Past experience leading a team is a big plus
  • Has great communication skills and is able to express their ideas to other team members effectively

Responsibilities

  • Design, build, and refactor major software components that improve the availability, resilience, performance and efficiency of our system
  • Takes part in our on-call rotation and responds to infrastructure incidents in accordance with our policy
  • Proactively addresses bugs and bottlenecks in our infrastructure
  • Can define and choose the best SLIs/SLOs in accordance with our system needs
  • Is able to choose the best tools for different problems and can adapt to our ever-changing specifications and growth
  • Addresses issues in our Incident Management process by reducing and fixing noisy alerts and lowering MTTD and MTTR, and is able to support other team members in this area
  • Able to identify and address design bottlenecks in our infrastructure
  • Able to mentor new hires and onboard them to our tools and infrastructure
  • Able to address code complexity and efficiency issues while constantly addressing software bugs
  • Able to support and guide other team members with code-related problems and participate in and offer effective code reviews

Benefits

  • Attractive salary package
  • Opportunity to work with the latest cloud and blockchain technologies
  • Fully remote work or hybrid depending on candidate preferences
  • Token allocation similar to equity packages in traditional companies
  • Growth budget, to be spent at the candidate's discretion
  • Equipment stipend
  • Flexible time away
  • Private Medical Insurance
  • Start-up environment: internal off-site hackathons, access to company-rented hacker house during summer
  • Crypto market investment opportunities and guidance
