Collectivei is hiring a
Senior Site Reliability Engineer

Collectivei

💵 $150k-$185k
📍 Remote - United States

Summary

Join Collective[i], a private, 100% remote company, as a Senior Site Reliability Engineer and help build a platform for prosperity that helps companies generate sales and people expand their professional connections.

Requirements

  • Proficiency with AWS, Terraform, Packer, Ansible, and container technologies
  • Expertise in AWS services
  • Experience with other cloud providers is a plus
  • Strong knowledge of Ubuntu 24.04, Bash, Python, systemd, Podman, Docker, and auditd
  • Familiarity with GitHub, GitHub Actions, GitHub Container Registry, and Copilot
  • Experience with monitoring and logging tools like Datadog, OpenTelemetry, and Graylog
  • Proficiency in working with databases and platforms such as Snowflake, Okta, Postgres, MongoDB, and Elasticsearch
  • Familiarity with security tools like Snyk, Tenable.io, and 1Password
  • Experience with SOC 2 or other compliance standards is highly desirable

Responsibilities

  • Manage AWS infrastructure across multiple accounts using Terraform, drawing on extensive experience in deployment and automation
  • Utilize Linux and open-source tooling as the foundation of your work, being proficient across various Linux distributions, scripting languages, clustering technologies, database engines, and configuration management tools, with a preference for Ansible
  • Develop and implement containerization strategies, ensuring well-crafted container builds. Must be capable of creating original containers and not just relying on third-party containers from public repositories
  • Assess and apply Kubernetes knowledge selectively, understanding when and why it is appropriate to use. Note: we are not a Kubernetes-focused environment
  • Collaborate closely with development teams, providing support in building and optimizing distributed systems
  • Maintain expertise in Git workflows, including proficiency in CI/CD automation tools such as GitHub Actions
  • Implement and manage monitoring and logging solutions, with hands-on experience in tools like Datadog and OpenTelemetry
  • Reduce the need for log diving, incident response, root cause analysis, and late-night pages by proactively managing system stability and reliability

Please let Collectivei know you found this job on JobsCollider. Thanks! 🙏