Senior Infrastructure Site Reliability Engineer

Crisis Text Line

💵 $135k-$163k
📍 Remote - Worldwide

Job highlights

Summary

Join Crisis Text Line as a Site Reliability Engineer (SRE) and play a vital role in ensuring the optimal performance, availability, and security of our cloud infrastructure. You will design, implement, and maintain our AWS-based infrastructure, collaborating with engineering and operations teams. Responsibilities include leading incident response, automating tasks, and mentoring junior team members. This role requires experience in SRE, AWS services (especially Fargate and CloudWatch), IaC tools, and scripting languages. A Bachelor's degree in a related field or equivalent experience is needed. Crisis Text Line offers a competitive salary, robust benefits package, and a commitment to employee well-being.

Requirements

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred) or equivalent experience
  • Experience in site reliability engineering (SRE) or related roles, with a focus on cloud infrastructure management
  • Hands-on experience with AWS services, particularly AWS Fargate, CloudWatch, and related tools
  • Proficiency with infrastructure as code (IaC) tools such as Terraform or CloudFormation
  • Strong scripting and automation skills using languages such as Python, Bash, or PowerShell
  • Experience with container orchestration platforms such as Kubernetes or Amazon ECS
  • Solid understanding of networking concepts, security best practices, and DevOps principles
  • Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment
  • Reliable high-speed internet required: must have a stable high-speed internet connection to support seamless remote collaboration, virtual meetings, and online job tasks

Responsibilities

  • Lead the design, implementation, and maintenance of highly available, scalable, and secure infrastructure on AWS Fargate
  • Design and maintain CloudWatch alerting and monitoring configurations to proactively identify and resolve potential issues
  • Mentor and guide junior team members, sharing best practices and promoting a culture of excellence
  • Collaborate with cross-functional teams to define and implement best practices for infrastructure as code (IaC), continuous integration/continuous deployment (CI/CD), and site reliability engineering (SRE) methodologies
  • Lead incident response and resolution, including troubleshooting complex system issues and implementing preventive measures to minimize downtime
  • Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention
  • Conduct performance tuning and optimization of infrastructure components to ensure optimal resource utilization and cost efficiency
  • Stay up to date with emerging technologies and industry trends to drive innovation and continuous improvement

Preferred Qualifications

  • AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer) are a plus

Benefits

  • 20 paid holidays, including federal holidays such as Juneteenth and Labor Day; Election Day; a holiday break from December 24 through January 1; 2 renewal days; and 2 floating holidays
  • Flexible paid time off, including 15 vacation days, 3 personal days, and 7 sick days
  • Medical, dental, and vision benefits for the staff member and family at no cost to the employee
  • 403(b) retirement plan (the nonprofit equivalent of a 401(k)): 3% contribution by Crisis Text Line to support building financial wellness, regardless of personal contribution
  • 12 weeks paid parental leave (after 6 months of employment)
  • Student loan repayment (after 2 years of continuous full-time service)
  • Family support through a virtual childcare platform
  • Stipends/Allowances: Mental health (Monthly), Internet Service (Monthly), Professional Development (Annual), Wellness (Annual), Home office setup (One time/First year)
