Remote Senior Site Reliability Engineer

Collective[i]

💵 $150k-$185k
📍 Remote - United States

Job highlights

Summary

Join Collective[i], a private 100% remote company, as a Senior Site Reliability Engineer and contribute to building a platform for prosperity that helps companies generate sales and people expand their professional connections.

Requirements

  • Proficiency with AWS, Terraform, Packer, Ansible, and container technologies
  • Expertise in AWS services
  • Experience with other cloud providers is a plus
  • Strong knowledge of Ubuntu 24.04, Bash, Python, systemd, Podman, Docker, and auditd
  • Familiarity with GitHub, GitHub Actions, GitHub Container Registry, and Copilot
  • Experience with monitoring and logging tools like Datadog, OpenTelemetry, and Graylog
  • Proficiency in working with databases and platforms such as Snowflake, Okta, Postgres, MongoDB, and Elasticsearch
  • Familiarity with security tools like Snyk, Tenable.io, and 1Password
  • Experience with SOC 2 or other compliance standards is highly desirable

Responsibilities

  • Manage AWS infrastructure across multiple accounts using Terraform, drawing on extensive experience in deployment and automation
  • Utilize Linux and open-source tooling as the foundation of your work, being proficient across various Linux distributions, scripting languages, clustering technologies, database engines, and configuration management tools, with a preference for Ansible
  • Develop and implement containerization strategies, ensuring well-crafted container builds. Must be capable of creating original containers and not just relying on third-party containers from public repositories
  • Assess and apply Kubernetes knowledge selectively, understanding when and why it is appropriate to use. Note that we are not a Kubernetes-focused environment
  • Collaborate closely with development teams, providing support in building and optimizing distributed systems
  • Maintain expertise in Git workflows, including proficiency in CI/CD automation tools such as GitHub Actions
  • Implement and manage monitoring and logging solutions, with hands-on experience in tools like Datadog and OpenTelemetry
  • Proactively manage system stability and reliability to reduce the need for log diving, incident response, root cause analysis, and late-night pages
This job is filled or no longer available
