Summary

Join TextNow, the nation's largest free phone service provider, and become a key member of our Site Reliability Engineering (SRE) team. We're on a mission to democratize phone service, and you'll play a vital role in ensuring the reliability and scalability of our infrastructure. Your responsibilities will encompass designing, building, and maintaining highly available systems, automating infrastructure using tools like Terraform and Ansible, and participating in incident response and on-call support. You'll also focus on performance monitoring and optimization, collaborating with cross-functional teams, and driving continuous improvement initiatives. We offer a strong work-life blend, flexible work arrangements, competitive pay and benefits, and a culture that values collaboration and innovation.

Requirements

Experienced in SRE/DevOps: You have 2+ years of experience in an operationally focused role, such as SRE, DevOps, or Infrastructure Engineering, with a deep understanding of reliability, scalability, and performance optimization
Proficient with Key Technologies: Hands-on experience with AWS, GitHub, Terraform, Ansible, or similar tools to build and manage cloud infrastructure efficiently
Incident Management Expert: You are comfortable handling production incidents, analyzing root causes, and implementing long-term fixes to prevent recurrence
Automation & Observability Focused: Passionate about reducing toil through scripting and automation while ensuring robust observability using logging, metrics, and monitoring tools
Collaborative & Impact-Driven: You enjoy working cross-functionally with engineers, product teams, and leadership to drive meaningful improvements to system reliability

Responsibilities

Ensure System Reliability: Design, build, and maintain scalable, resilient, and highly available systems to support TextNow’s infrastructure and services
Automation & Infrastructure as Code: Develop and maintain automation using Terraform, Ansible, and other tools to enable efficient deployment, scaling, and operations of cloud-based systems (AWS preferred)
Incident Response & On-Call Support: Participate in an on-call rotation, troubleshoot issues, and drive incident resolution to minimize downtime and improve system performance. Conduct post-mortems and implement corrective actions to enhance reliability
Performance Monitoring & Optimization: Implement and improve observability tools, logging, and monitoring solutions to identify and mitigate potential system issues proactively
Collaboration & Cross-Team Engagement: Work closely with software engineers, DevOps, and product teams to align technical efforts with business objectives and improve system reliability from development to production
Continuous Improvement: Identify areas for improvement in architecture, automation, and operational practices. Contribute to the design and implementation of new SRE best practices

Benefits

Strong work-life blend
Flexible work arrangements (WFH, remote, or access to one of our office spaces)
Employee Stock Options
Unlimited vacation
Competitive pay and benefits
Parental leave
Benefits for both physical and mental well-being (wellness credit and L&D credit)
We travel a few times a year for various team events, company-wide off-sites, and more

Site Reliability Engineer

TextNow

Remote

DevOps

Mid-level
