Site Reliability Engineer

Coalition, Inc.

💵 $108k-$163k
📍 Remote - United States

Summary

Join Coalition as a Site Reliability Engineer and play a pivotal role in ensuring the performance, availability, and efficiency of our cloud-based systems. You will design, implement, and manage robust cloud solutions, automate infrastructure, and build developer-friendly platforms. The role involves working on impactful projects, optimizing cloud resources, and improving system observability. You'll also participate in an on-call rotation to maintain system reliability. The position is remote and based in the US or Canada. This is a great opportunity for software engineers transitioning to SRE or experienced SREs seeking challenging technical problems.

Requirements

  • 3+ years of experience in SRE, DevOps, cloud engineering, or software development roles in a full-stack engineering environment
  • Strong understanding of AWS services (e.g., EC2, S3, RDS, Lambda, VPC, etc.) and best practices for building scalable, secure, and cost-effective infrastructure
  • Hands-on experience with IaC tools like Terraform, CloudFormation, or CDK to automate cloud infrastructure deployment and management
  • Experience working with containerization and orchestration tools such as ECS or Kubernetes
  • Experience working with fault tolerant services and the iterative development of highly-available systems
  • Exposure to full-stack monitoring from system level metrics to SLOs, failure-based testing approaches, and monitoring strategies
  • Understanding of CI/CD pipelines to accelerate deployments and improve both security and auditability (e.g., GitHub Actions, Jenkins, Travis, or CircleCI)
  • Some knowledge of software engineering design patterns, agile development, and architecture principles
  • Strong analytical and problem-solving skills, with experience in debugging and resolving infrastructure or application issues
  • Ability to work closely with cross-functional teams, effectively communicate complex ideas, and advocate for best practices
  • Bachelor’s or Master’s degree in Computer Science, related field, or equivalent experience

Responsibilities

  • Design, implement, and manage robust cloud solutions
  • Automate infrastructure
  • Build developer-friendly platforms and paved roads
  • Optimize cloud resources
  • Improve system observability
  • Drive operational excellence across the organization
  • Participate in a low-volume on-call rotation to ensure our systems remain highly reliable and available
  • Isolate, trap, and respond to system failures
  • Develop strategies for continuous monitoring and analysis to minimize downtime and reduce the need for manual intervention
  • Solicit system requirements, then design and implement new platform components leveraging infrastructure or SaaS services

Preferred Qualifications

  • Experience working with HashiCorp Nomad and/or Vault
  • Experience in Go or Python, writing libraries and tooling
  • Familiarity with cloud networking concepts (e.g., VPC, DNS, Load Balancers, NAT) and cloud security principles, including IAM, role-based access control, and encryption
  • Exposure to Kafka or other event streaming platforms

Benefits

  • 100% medical, dental and vision coverage
  • Flexible PTO policy
  • Annual home office stipend and WeWork access
  • Mental and physical wellness programs (One Medical, Headspace, Wellhub, and more)
  • Competitive compensation and opportunity for advancement
  • Remote position

This job is filled or no longer available.