Remote Site Reliability Engineer

Baseten

πŸ“Remote - United States

Job highlights

Summary

Baseten, a growing ML infrastructure company backed by top-tier investors, is hiring a Site Reliability Engineer. The role involves building and maintaining scalable infrastructure and keeping it reliable and efficient. No prior ML experience is required, but candidates should be open to learning about it.

Requirements

  • Have experience building and maintaining scalable infrastructure
  • Have extensive experience with Kubernetes
  • Know when automation is relevant, e.g. for managing CI/CD pipelines
  • Establish standards and best practices for reliability and performance

Responsibilities

  • Envision and build robust systems and processes that ensure our infrastructure is scalable, reliable, and efficient
  • Automate deployments and monitoring systems
  • Optimize performance and manage incidents

Preferred Qualifications

  • Relevant OSS observability experience (e.g. Prometheus, ELK stack, Grafana stack, OpenTelemetry)

Ideal Candidate

  • Can own products and projects end-to-end
  • Are comfortable with navigating ambiguity and enjoy the journey as much as the destination
  • Are motivated by customer problems and find joy in creating simple, elegant solutions that avoid unnecessary complexity
  • Exercise good judgment about the tradeoffs and tools needed to solve a problem, and don't over-index on trendy tech unless it's the right tool for the job
  • Demonstrate pride, ownership, and accountability for your work and expect the same from your teammates
