Senior Site Reliability Engineer - Network Operations

Fastly
Summary
Join Fastly's Technical Operations team as a Senior Site Reliability Engineer (Networking) and help build and operate the infrastructure powering the Fastly Edge Cloud Platform. You will build, operate, and maintain the global network, respond to traffic incidents, and develop new methods for monitoring network performance. You will collaborate with partner teams to maintain network performance, advocate for operational stability, mentor team members on global routing complexities, and partner with engineering teams to shape software solutions. This role requires extensive experience with internet protocols (IP, BGP, Anycast, DNS), experience working with Tier 1 ISPs, and experience with monitoring and visibility tools. The position offers a hybrid or remote work location and requires occasional travel.
Requirements
- Extensive experience in the protocols and practices that make up the fabric of the global internet, including IP, BGP, Anycast and DNS
- Experience working with Tier 1 Internet service providers, Internet exchanges and cloud providers
- Ability to analyze traffic patterns across multiple dimensions using flow-based tools
- Experience working with alerting, monitoring and visibility tools (such as Graphite/Grafana, Prometheus, or Splunk)
- Experience in code and design reviews, and scripting ability in a common language such as Python
- Experience with Linux/Unix
- Knowledge across cloud hosting solutions (e.g., GCP, AWS and Azure)
- Knowledge of DevOps practices and CI/CD pipelines (e.g., Git, Jenkins, Ansible)
- Adept at knowledge sharing and creating comprehensive documentation to empower teams
- Able to collaborate with cross-functional teams to shape the technical roadmap, prioritizing initiatives to optimize automation tooling and the network
Responsibilities
- Build, operate, and maintain the continually growing global network footprint of Fastly's Edge Cloud Platform
- Respond to significant traffic incidents and lead network incident response, resolving edge cases and failure scenarios with your expertise in IP routing, particularly BGP
- Develop new methods for monitoring network performance, focusing on the end-user experience, and proactively address potential issues
- Partner in the development and iteration of tools and automation systems that improve how we operate and build the network
- Continually dive deep into performance analytics and work closely with partner teams to maintain a performant global network
- Advocate for the operational stability of the network by identifying opportunities and partnering with engineering teams to shape their roadmaps and software solutions
- Mentor team members on the complexities of global routing, especially in an anycast-heavy environment
Benefits
- We offer a comprehensive benefits package including medical, dental, and vision insurance
- To help support our employees, we offer family planning, mental health support, an Employee Assistance Program, insurance (life, disability, and accident), a flexible vacation policy, and up to 18 days of accrued paid sick leave
- We also offer 401(k) (including company match) and an Employee Stock Purchase Program
- For 2025, we offer 11 paid local holidays and 11 paid company wellness days
