Summary

Join Echo360 as a Site Reliability Engineer and play a critical role in ensuring the reliability, scalability, cost efficiency, and security of our cloud infrastructure. You will design and implement automated monitoring and alerting systems, collaborate with development teams, and conduct failure testing. Drawing on your AWS expertise, you will optimize performance, automate infrastructure provisioning, and enforce security best practices. Beyond the technical work, you will take part in incident response, mentorship, and continuous improvement. This fully remote position offers a competitive salary and comprehensive benefits. If you thrive in a fast-paced environment and are passionate about cloud optimization, this is an opportunity to make a significant impact.

Requirements

5+ years of experience as a Site Reliability Engineer or similar role
Strong understanding of AWS cloud services and related technologies, including DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, EKS, ECS, and EC2
Experience with infrastructure automation tools like Ansible, Terraform, or CloudFormation
Experience with monitoring and alerting tools like CloudWatch, DataDog, Prometheus, Grafana, Kibana, and PagerDuty
Experience with GitHub Actions, CI/CD pipelines, and deployment strategies
Strong problem-solving and analytical skills
Excellent communication and collaboration skills
Ability to work independently and take ownership of complex tasks
Passion for technology and a desire to learn and grow

Responsibilities

Ensure service reliability and SLO/SLA adherence in production, preventing incidents by proactively conducting failure testing
Implement automated monitoring and alerting systems for early detection of potential problems
Collaborate with development teams to perform deployments and rollbacks with minimal disruption
Optimize the performance and scalability of our AWS infrastructure, including RDS, DynamoDB, MySQL, S3, CloudSearch, OpenSearch, Kafka, Presto, SES, EKS, ECS, and EC2
Automate infrastructure provisioning and deployment processes using Terraform, CI/CD pipelines, and configuration management tools
Proactively identify and address potential security vulnerabilities, maintaining compliance and applying best practices for IAM and secrets management
Participate in incident response and post-mortem analysis activities to identify root causes and prevent future occurrences
Help onboard and mentor junior team members, sharing your knowledge and expertise
Stay up to date on the latest cloud technologies and best practices for SRE
Participate in a well-structured on-call rotation with other Site Reliability Engineers
Explore new technologies and innovative solutions to improve service quality and speed to market
Participate in technical discussions and deep dives with the other engineering and product teams

Preferred Qualifications

Experience with Jenkins, PostgreSQL, and MongoDB
Experience with cloud cost optimization and with security best practices and tools
Experience working in a fast-paced, agile environment
Experience with Rancher, Cattleprod, and TeamCity is a plus

Benefits

Medical, dental, vision, life & disability insurance
A 401(k) plan with company match
An unlimited PTO policy
Fully remote

Senior Site Reliability Engineer

Echo360

Remote

DevOps

Senior
