Site Reliability Engineer

Coalition, Inc.

💵 $108k-$163k
📍 Remote - United States

Job highlights

Summary

Join Coalition as a Site Reliability Engineer and play a pivotal role in ensuring the performance, availability, and efficiency of our cloud-based systems. You will design, implement, and manage robust cloud solutions, automate infrastructure, and build developer-friendly platforms. The role involves working on impactful projects, optimizing cloud resources, and improving system observability. You'll also participate in an on-call rotation to maintain system reliability. The position is remote and based in the US or Canada. This is a great opportunity for software engineers transitioning to SRE or for experienced SREs seeking challenging technical problems.

Requirements

  • 3+ years of experience in SRE, DevOps, cloud engineering, or software development roles in a full-stack engineering environment
  • Strong understanding of AWS services (e.g., EC2, S3, RDS, Lambda, VPC) and best practices for building scalable, secure, and cost-effective infrastructure
  • Hands-on experience with IaC tools like Terraform, CloudFormation, or CDK to automate cloud infrastructure deployment and management
  • Experience working with containerization and orchestration tools such as ECS or Kubernetes
  • Experience working with fault-tolerant services and the iterative development of highly available systems
  • Exposure to full-stack monitoring, from system-level metrics to SLOs, along with failure-based testing approaches and monitoring strategies
  • Understanding of CI/CD pipelines (e.g., GitHub Actions, Jenkins, Travis CI, or CircleCI) that accelerate deployments and improve both security and auditability
  • Some knowledge of software engineering design patterns, agile development, and architecture principles
  • Strong analytical and problem-solving skills, with experience in debugging and resolving infrastructure or application issues
  • Ability to work closely with cross-functional teams, effectively communicate complex ideas, and advocate for best practices
  • Bachelor’s or Master’s degree in Computer Science or a related field, or equivalent experience

Responsibilities

  • Design, implement, and manage robust cloud solutions
  • Automate infrastructure
  • Build developer-friendly platforms and paved roads
  • Optimize cloud resources
  • Improve system observability
  • Drive operational excellence across the organization
  • Participate in a low-volume on-call rotation to ensure our systems remain highly reliable and available
  • Isolate, trap, and respond to system failures
  • Develop strategies for continuous monitoring and analysis to minimize downtime and reduce the need for manual intervention
  • Solicit system requirements, then design and implement new platform components leveraging infrastructure or SaaS services

Preferred Qualifications

  • Experience working with HashiCorp Nomad and/or Vault
  • Experience writing libraries and tooling in Go or Python
  • Familiarity with cloud networking concepts (e.g., VPC, DNS, load balancers, NAT) and cloud security principles, including IAM, role-based access control, and encryption
  • Exposure to Kafka or other event streaming platforms

Benefits

  • 100% medical, dental, and vision coverage
  • Flexible PTO policy
  • Annual home office stipend and WeWork access
  • Mental and physical health wellness programs (One Medical, Headspace, Wellhub, and more)
  • Competitive compensation and opportunity for advancement
  • Remote position