Site Reliability Engineer

Twilio
Summary
Join Twilio as a Site Reliability Engineer on our Data Infrastructure Platform! This role involves designing, building, and optimizing our platform to support various data-driven initiatives. You will collaborate with cross-functional teams, architect scalable solutions, and implement data solutions and infrastructure. The ideal candidate is passionate about leveraging data, possesses strong technical skills, and has experience with modern data technologies. You will be responsible for designing and implementing data streaming solutions, ensuring data quality and security, and staying current with emerging technologies. Mentoring junior engineers and contributing to a culture of continuous learning are also key aspects of this position. This remote role offers competitive pay and benefits.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with a focus on infrastructure or backend systems
- Strong production experience, including operational management, scaling, partitioning strategies, and tuning for performance and reliability
- Hands-on experience with Kubernetes (preferably EKS), including deploying and managing stateful services and operators in Kubernetes environments
- Deep understanding of AWS cloud services, particularly those relevant to data infrastructure (e.g., EC2, EBS, S3, IAM, MSK, CloudWatch, VPC, ALB/NLB)
- Proficiency in infrastructure-as-code tools, such as Terraform or CloudFormation, for managing and automating infrastructure
- Expertise in observability tools (e.g., Prometheus, Grafana, OpenTelemetry, Datadog) to monitor distributed systems and set up alerting for reliability and latency
- Proficient in at least one programming language (e.g., Go, Python, Java, or similar) for building automation, tooling, and contributing to platform services
- Experience designing and implementing incident response processes, SLOs/SLIs, runbooks, and participating in on-call rotations
- Strong understanding of distributed systems principles, including consensus, durability, throughput, and availability tradeoffs
- Proven track record of driving reliability improvements in high-scale, data-intensive systems and collaborating with platform and data engineering teams
- Excellent problem-solving and analytical skills
- Strong verbal & written communication skills, with the ability to work effectively in a cross-functional team environment
Responsibilities
- Design, build, and maintain infrastructure and scalable frameworks to support data ingestion, processing, and analysis
- Collaborate with stakeholders, analysts, and product teams to understand business requirements and translate them into technical solutions
- Architect and implement data streaming solutions using modern data technologies such as Kafka, AWS MSK, Terraform, Hive, Hudi, Presto, Airflow, and cloud-based services like AWS EKS, Lake Formation, Glue, and Athena
- Design and implement frameworks and solutions for performance, reliability, and cost-efficiency
- Ensure data quality, integrity, and security throughout the data lifecycle
- Stay current with emerging big data technologies and best practices
- Mentor early-career engineers and contribute to a culture of continuous learning and improvement
Preferred Qualifications
- Experience with data technologies such as Apache Kafka, AWS MSK, Flink, and ClickHouse
- Bias to action, with the ability to iterate and ship rapidly
- Passion for building data products, with prior projects in this area
Benefits
- Competitive pay
- Generous time off
- Ample parental and wellness leave
- Healthcare
- A retirement savings program