Site Reliability Engineer

Coalition, Inc.

๐Ÿ“Remote - Canada

Job highlights

Summary

Join Coalition as a Site Reliability Engineer and play a pivotal role in ensuring the performance, availability, and efficiency of our cloud-based systems. You will design, implement, and manage robust cloud solutions, automate infrastructure, build developer-friendly platforms, and optimize cloud resources. This remote position, based in the US or Canada, involves working on impactful projects, participating in an on-call rotation, and focusing on system failure response and continuous monitoring. The ideal candidate is a software engineer transitioning to SRE or an experienced SRE seeking challenging technical problems. We offer competitive compensation and a remote-first, mission-driven team environment.

Requirements

  • 3+ years of experience in SRE, DevOps, cloud engineering, or software development roles in a full-stack engineering environment
  • Strong understanding of AWS services (e.g., EC2, S3, RDS, Lambda, VPC) and best practices for building scalable, secure, and cost-effective infrastructure
  • Hands-on experience with IaC tools like Terraform, CloudFormation, or CDK to automate cloud infrastructure deployment and management
  • Experience working with containerization and orchestration tools such as ECS or Kubernetes
  • Experience working with fault-tolerant services and the iterative development of highly available systems
  • Exposure to full-stack monitoring from system level metrics to SLOs, failure-based testing approaches, and monitoring strategies
  • Understanding of CI/CD pipelines to accelerate deployments and improve both security and auditability (e.g., GitHub Actions, Jenkins, Travis, or CircleCI)
  • Some knowledge of software engineering design patterns, agile development, and architecture principles
  • Strong analytical and problem-solving skills, with experience in debugging and resolving infrastructure or application issues
  • Ability to work closely with cross-functional teams, effectively communicate complex ideas, and advocate for best practices
  • Bachelor's or Master's degree in Computer Science, related field, or equivalent experience

Responsibilities

  • Design, implement, and manage robust cloud solutions
  • Automate infrastructure
  • Build developer-friendly platforms and paved roads
  • Optimize cloud resources
  • Improve system observability
  • Drive operational excellence across the organization
  • Participate in a low-volume on-call rotation to ensure our systems remain highly reliable and available
  • Isolate, trap, and respond to system failures
  • Develop strategies for continuous monitoring and analysis to minimize downtime and reduce the need for manual intervention
  • Solicit system requirements, then design and implement new platform components leveraging infrastructure or SaaS services

Preferred Qualifications

  • Experience working with HashiCorp Nomad and/or Vault
  • Experience in Go or Python, writing libraries and tooling
  • Familiarity with cloud networking concepts (e.g., VPC, DNS, Load Balancers, NAT) and cloud security principles, including IAM, role-based access control, and encryption
  • Exposure to Kafka or other event streaming platforms

Benefits

  • 100% medical, dental, and vision coverage
  • Flexible PTO
  • Annual home office stipend and WeWork access
  • Mental & physical health wellness programs like Headspace, Lumino, and more!
  • Competitive compensation and opportunity for advancement
