Senior SRE - Edge Engineer

iHerb
Summary
Join iHerb's Infrastructure Engineering team as a Senior Site Reliability Engineer (SRE) specializing in Edge Engineering. You will design, build, and maintain the systems that power iHerb's global edge infrastructure, ensuring high availability, low latency, and robust security for edge services. The role involves scaling edge compute environments, integrating with CDN and DNS providers, and enforcing best practices in observability and incident response. You will collaborate with a global SRE team that supports a multi-cloud infrastructure, working across a range of technologies and helping develop the team's skills. The position focuses on edge engineering: developing infrastructure as code and managing globally distributed edge service platforms. You will also lead incident response efforts and mentor team members.
Requirements
- Experience as an SRE or in a similar software engineering role
- Experience with CDN platforms (e.g., Cloudflare, Akamai, Fastly) and edge compute technologies (e.g., Cloudflare Workers, Lambda@Edge)
- Strong understanding of Anycast routing, DNS traffic management, HTTP/3, TLS offload, and edge caching
- Experience with Kubernetes and Helm: provisioning, troubleshooting, managing
- Experience with Terraform or other infrastructure-as-code (IaC) tools
- Familiarity with virtual edge products like Cloudflare
- Proficiency with open-source and commercial observability tooling such as Prometheus, Grafana, Jaeger, Datadog, and New Relic
- Ability to program (structured and/or OO) in one or more high-level languages, such as Python, Java, or Go
- Systematic approach to problem solving, and a proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Strong communication skills and a sense of ownership and drive
Responsibilities
- Develop and support infrastructure-as-code controls and patterns for DNS, cache rules, page rules, WAF (Web Application Firewall), and serverless edge compute (e.g., Cloudflare Workers or Lambda@Edge)
- Contribute to and support edge-adjacent data architecture like DynamoDB Global Tables and Cloudflare KV/Durable Objects
- Architect and manage globally distributed edge service platforms leveraging CDN and edge compute capabilities
- Optimize latency, performance, and availability by configuring intelligent caching, traffic steering, and edge-routing strategies
- Collaborate with security teams to implement WAF rules and policies, DDoS protection, bot mitigation, and TLS termination at the edge layer
- Design observability systems to monitor edge latency, health, and traffic patterns, including real-time alerting and historical analysis
- Develop and maintain edge automation and infrastructure-as-code pipelines for deployments, testing, and versioning
- Lead edge-centric incident response efforts, including root cause analysis, blameless postmortems, and resiliency improvements
- Support development teams in leveraging SLIs and SLOs to improve availability and performance
- Collaborate with the ProdOps team on SOPs for incident management and escalation
- Partner with application, network, and platform teams to integrate edge architecture with core systems and services
- Champion and establish KPIs to measure the Edge Engineering practice's success and improvement
- Provide guidance to other team members on managing end-to-end availability and performance of mission-critical applications
- Build automation to prevent problem recurrence, and build automated responses
- Mentor and partner with other team members to design techniques and standards, and to cultivate innovation and collaboration across multiple teams
- Manage individual project priorities, deadlines, and deliverables
- Participate in operational support on-call rotation
Preferred Qualifications
- BS or MS in Computer Science, Engineering, or related field
- 2-3 years of experience working with Cloudflare or other virtual edge products
- 3-5 years of experience working with AWS or GCP
- 2-3 years of experience supporting a Kubernetes environment
- E-commerce experience with high-traffic websites
- Experience with Service Mesh and API Gateways (e.g., Consul, Envoy, Istio)
- Familiarity with CI/CD systems (e.g., Spinnaker, Jenkins, Harness) and containerization (e.g., Docker, Kubernetes)
Benefits
- Medical, dental, vision, and basic life insurance programs
- 401(k) plan
- Time Off and Paid Sick Leave
- Paid holidays
- Restricted Stock Units
- Annual bonuses
