Remote Staff Site Reliability Engineer

Vultr

💵 $120k-$135k
📍Remote - Worldwide

Job highlights

Summary

Join Vultr as a Staff Site Reliability Engineer to drive automation, work with cross-functional teams, design state-of-the-art cloud provider solutions, and enhance the resilience and stability of our systems.

Requirements

  • 3+ years of experience in a hands-on SRE role delivering distributed architectures
  • 2+ years working with and maintaining Kubernetes clusters for highly available and regulated environments
  • 2+ years of hands-on experience with a modern Grafana stack, including Mimir, Loki, and Tempo
  • Comfortable working with complex CI/CD pipelines (GitLab/Jenkins), configuration management (Puppet/Salt), and IaC solutions such as Terraform
  • Experience working with observability pipelines or OpenTelemetry is a plus
  • A background in performance optimization for web stacks, including components such as PHP-FPM, Nginx, and MySQL
  • Strong programming skills in Python, Golang, or PHP and an eagerness to pick up new technologies

Responsibilities

  • Collaborate with cross-functional teams to craft and implement a modern observability stack and refine our incident-handling processes
  • Design and contribute to state-of-the-art cloud provider solutions for high-performance computing, AI training, and inference workloads, focusing on Observability and MLOps
  • Enhance the resilience and stability of our systems, as part of the platform team, through thoughtful software improvements, architecture, and automation
  • Contribute to solutions for challenges ranging from low-level hardware issues to high-level distributed application scaling, and everything in between
  • Champion DevOps and SRE principles through automation, thought leadership, and close collaboration within our engineering team
  • Enhance customer experience by improving case handling: strive for proactive responses, rich insights, and automated resolutions
  • Develop robust documentation to streamline the handling of recurring reliability issues, paving the way for junior SREs to take the helm confidently
  • Identify and implement scalable solutions to address technical challenges within our stack, setting new benchmarks for innovation

Job description

Who We Are

Vultr is on a mission to make high-performance cloud computing easy to use, affordable, and locally accessible for businesses and developers around the world. With 32 cloud data center locations around the world, Vultr has served over 1.5 million customers across 185 countries with flexible, scalable, global Cloud Compute, Cloud GPU, Bare Metal, and Cloud Storage solutions. Founded by David Aninowsky and completely bootstrapped, Vultr has become the world’s largest privately held cloud computing company without ever raising equity financing.

Why Vultr

Simply put, Vultr is committed to providing businesses worldwide with the best price-to-performance of any cloud computing platform. Our global reach of data centers and strategic new partnerships provide the foundation to maximize the impact of our existing services, new product improvements, and releases, which, in turn, is a catalyst for your own success. Vultr is taking flight, and this is your opportunity to leave your mark on the future of Cloud Infrastructure!

Vultr Cares

  • A 100% remote work environment + a company-wide virtual get-together
  • 401(k) plan that matches 100% up to 4% with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan + take off your birthday!
  • Commitment matters to Vultr! Increased PTO at 3-year anniversary + 1-month sabbatical at 5-year anniversary + Anniversary Bonus each year
  • $500 first year remote office setup + $400 each year following for new equipment
  • Monthly internet reimbursement up to $75
  • $50 per month for a gym membership

Join Vultr

The Platform team is a central pillar of our growth strategy, and we are looking for a Staff Site Reliability Engineer to help automate our systems. You’ll make an impact working with a cross-functional group of SREs, platform developers, and automation engineers to instill best practices, solve challenging problems across various disciplines, and work on some of the most cutting-edge technology in the industry.

What to expect:

  • Collaborate with cross-functional teams to craft and implement a modern observability stack and refine our incident-handling processes.
  • Design and contribute to state-of-the-art cloud provider solutions for high-performance computing, AI training, and inference workloads, focusing on Observability and MLOps.
  • Enhance the resilience and stability of our systems, as part of the platform team, through thoughtful software improvements, architecture, and automation.
  • Contribute to solutions for challenges ranging from low-level hardware issues to high-level distributed application scaling, and everything in between.
  • Champion DevOps and SRE principles through automation, thought leadership, and close collaboration within our engineering team.
  • Enhance customer experience by improving case handling: strive for proactive responses, rich insights, and automated resolutions.
  • Develop robust documentation to streamline the handling of recurring reliability issues, paving the way for junior SREs to take the helm confidently.
  • Identify and implement scalable solutions to address technical challenges within our stack, setting new benchmarks for innovation.

Our ideal candidate will have:

  • 3+ years of experience in a hands-on SRE role delivering distributed architectures.
  • 2+ years working with and maintaining Kubernetes clusters for highly available and regulated environments.
  • 2+ years of hands-on experience with a modern Grafana stack, including Mimir, Loki, and Tempo.
  • Comfortable working with complex CI/CD pipelines (GitLab/Jenkins), configuration management (Puppet/Salt), and IaC solutions such as Terraform.
  • Experience working with observability pipelines or OpenTelemetry is a plus.
  • A background in performance optimization for web stacks, including components such as PHP-FPM, Nginx, and MySQL.
  • Strong programming skills in Python, Golang, or PHP and an eagerness to pick up new technologies.

Compensation

$120,000 - $135,000 + Bonus

This salary can vary based on location, years of experience, background, and skill set.

Vultr is committed to an inclusive workforce where diversity is celebrated and supported. All employment decisions at Vultr are based on business needs, job requirements, and individual qualifications.

Vultr regards the lawful and correct use of personal information as important to the accomplishment of our objectives, to the success of our operations, and to maintaining confidence between those with whom we deal and ourselves. As such, the use of various key privacy controls enables Vultr’s treatment of personal information to meet current regulatory guidelines and laws.

Workforce members have the right under US state law, where and when applicable, and certain other privacy and data protection laws, as applicable, to fair and equal treatment, to know what personal data we gather and retain and for what purpose, and to access and/or delete such data. You also have the right to opt out of communications from Vultr and approved third parties at any time.
