Site Reliability Engineer
closed
Coalition, Inc.
$108k-$163k
Remote - United States
Summary
Join Coalition as a Site Reliability Engineer and play a pivotal role in ensuring the performance, availability, and efficiency of our cloud-based systems. You will design, implement, and manage robust cloud solutions, automate infrastructure, and build developer-friendly platforms. This role involves working on impactful projects, optimizing cloud resources, and improving system observability. You'll also participate in an on-call rotation to maintain system reliability. The position is remote and based in the US or Canada. This is a great opportunity for software engineers transitioning to SRE or experienced SREs seeking challenging technical problems.
Requirements
- 3+ years of experience in SRE, DevOps, cloud engineering, or software development roles in a full-stack engineering environment
- Strong understanding of AWS services (e.g., EC2, S3, RDS, Lambda, VPC) and best practices for building scalable, secure, and cost-effective infrastructure
- Hands-on experience with IaC tools like Terraform, CloudFormation, or CDK to automate cloud infrastructure deployment and management
- Experience working with containerization and orchestration tools such as ECS or Kubernetes
- Experience working with fault-tolerant services and the iterative development of highly available systems
- Exposure to full-stack monitoring from system level metrics to SLOs, failure-based testing approaches, and monitoring strategies
- Understanding of CI/CD pipelines to accelerate deployments and improve both security and auditability (e.g., GitHub Actions, Jenkins, Travis, or CircleCI)
- Some knowledge of software engineering design patterns, agile development, and architecture principles
- Strong analytical and problem-solving skills, with experience in debugging and resolving infrastructure or application issues
- Ability to work closely with cross-functional teams, effectively communicate complex ideas, and advocate for best practices
- Bachelor's or Master's degree in Computer Science, related field, or equivalent experience
Responsibilities
- Design, implement, and manage robust cloud solutions
- Automate infrastructure
- Build developer-friendly platforms and paved roads
- Optimize cloud resources
- Improve system observability
- Drive operational excellence across the organization
- Participate in a low-volume on-call rotation to ensure our systems remain highly reliable and available
- Isolate, trap, and respond to system failure
- Develop strategies for continuous monitoring and analysis to minimize downtime and reduce the need for manual intervention
- Solicit systems requirements, design, and implement new platform components leveraging infrastructure or SaaS services
Preferred Qualifications
- Experience working with HashiCorp Nomad and/or Vault
- Experience in Go or Python, writing libraries and tooling
- Familiarity with cloud networking concepts (e.g., VPC, DNS, Load Balancers, NAT) and cloud security principles, including IAM, role-based access control, and encryption
- Exposure to Kafka or other event streaming platforms
Benefits
- 100% medical, dental and vision coverage
- Flexible PTO policy
- Annual home office stipend and WeWork access
- Mental and physical wellness programs (One Medical, Headspace, Wellhub, and more)
- Competitive compensation and opportunity for advancement
- Remote position
This job is filled or no longer available