Site Reliability Engineer

NBCUniversal

💵 $110k-$145k
📍 Remote - United States

Summary

Join NBCUniversal's Video Streaming Engineering team as a Site Reliability Engineer, focusing on live channel distribution. You will be part of a 24x7 team supporting and maintaining distribution systems, diagnosing and preventing on-air issues. Responsibilities include investigating broadcast system issues, collaborating with vendors, creating documentation, assisting with deployments, and participating in on-air support. This fully remote position requires a BS in Engineering/Computer Science or related field, 5+ years of DevOps/SRE experience, and extensive experience with AWS, containerization, CI/CD, and Linux system administration. The ideal candidate will also possess experience in the media and entertainment industry and 24x7 production environments. Competitive salary and benefits are offered.

Requirements

  • BS in Engineering/Computer Science or related field
  • A passion for investigating issues, driving towards resolutions and effective problem solving
  • 5+ years of DevOps/SRE experience in the technology sector delivering production-quality software or software-defined infrastructure in a high-traffic environment running on a “cloud hosting” platform (AWS preferred)
  • 5+ years of experience in a support/analysis role
  • Experience with deployment automation within AWS-hosted services (CloudFormation, Terraform, Ansible)
  • Familiarity with containerization and orchestration services such as Kubernetes and Docker
  • Familiarity with CI/CD orchestration tools (e.g., GitHub Actions or Jenkins)
  • Experience with CI/CD build and deployment practices
  • 5+ years of Linux System Administration
  • 5+ years of experience coding in Go, Python, Ruby, Java, or shell languages
  • Experience in designing, analyzing and building automation and tools for large scale systems
  • Professional experience using modern log/metric aggregation software (e.g., CloudWatch, Elasticsearch + Kibana, Splunk, Grafana)
  • Experience and comfort with continuous delivery/frequent releases of code to production
  • A methodical and logical approach to reasoning about problems and system interactivity
  • Willingness and ability to prioritize business needs to meet short-term demands
  • Working knowledge of the OSI model, comfortable troubleshooting networking issues
  • An unwillingness to tolerate user-facing downtime

Responsibilities

  • Investigate issues within broadcast systems and their integration points to find the root cause of problems or systemic issues
  • As a Level 2 resource, drive and own investigations related to Broadcast issues and report back findings in a timely manner to leadership and operations
  • Follow up with team members and third-party vendors when issues cannot be resolved, and press vendors for root cause and solutions where possible
  • Create comprehensive documentation for each encountered issue, explaining the root cause and the steps for effective resolution
  • Assist in the deployment and testing of vendor patches or fixes in both the Development and Production environments, through completion and to the satisfaction of the Operations team
  • Assist in the design, analysis, or evaluation of assigned projects using sound engineering principles and adhering to business standards, practices, procedures, and product / program requirements
  • Support and participate in On-air systems integration and on-air rollout
  • Provide 24x7 On-Air systems support and daily operations support; some on-call support may be required from time to time during on-air rollout and special broadcast events
  • Attend daily maintenance and operations review calls to report to leadership and Operations on findings from new and open issues, their potential fixes, and the planned deployments of those fixes

Preferred Qualifications

  • 3+ years of experience in the Media & Entertainment industry
  • 3+ years of experience in 24x7 production environments
  • 3+ years of experience supporting IT/Broadcast Systems
  • 5+ years of customer facing experience
  • Experience with Live TV Broadcasting, OTT Streaming, codecs and ARQ technologies a plus

Benefits

  • Medical, dental and vision insurance
  • 401(k)
  • Paid leave
  • Tuition reimbursement
  • A variety of other discounts and perks
