Senior NOC Engineer

OfferUp

πŸ“Remote - Chile

Summary

Join OfferUp's Network Operations Center (NOC) night team as a Senior Operations Engineer. Maintain the operational health of complex cloud infrastructure systems using advanced monitoring tools. Prevent and mitigate customer impact by monitoring, identifying, triaging, and resolving issues. Collaborate with engineering teams on escalations and participate in continuous improvement initiatives. This role requires at least 5 years of experience in highly-scaled environments, including 3 years in a SOC or NOC. The position is only available outside the US.

Requirements

  • A proven track record - At least 5 years of success with highly scaled internet/mobile application environments, including 3 years working in a Security Operations Center (SOC) or Network Operations Center (NOC)
  • Sense of urgency - You rapidly acknowledge and engage on alerts, maintaining our excellent team SLAs
  • Knowledge of Incident and Problem Management and ITSM tools (like Jira, Zendesk, Confluence)
  • Hunger to continue learning new technologies - You have UNIX/Linux and cloud system administration experience and are eager to expand that skill set
  • Ability to think critically and strategically in a fast-paced, customer-centric environment
  • Expert-level proficiency in industry-leading tools for infrastructure and application monitoring (like AWS CloudWatch, Datadog, Splunk, Cloudflare)
  • Strong communication skills with the ability to convey complex technical issues to both technical and non-technical stakeholders (English is required)
  • Customer obsessed with technical curiosity - You are skilled at breaking down complex technical issues. You enjoy using available tools and data not only to fix issues for our customers but also to prevent them from happening again
  • Ability to work in a fast-paced environment and adapt to changing priorities
  • Bachelor's degree in Information Systems, or equivalent experience
  • Excellent high-speed connectivity from home
  • AWS CCP Certification required
  • Excellent communication skills, both written and spoken (fluency in English required)

Responsibilities

  • Provide first response and act as a reference for the team for the monitoring, troubleshooting, and resolution of complex incidents within the cloud infrastructure, working closely with other Engineering teams
  • Develop and implement best practices for monitoring, response and fulfillment of our Incident, Change and Service Request queues
  • Analyze system logs and performance metrics to identify issues and improve overall system reliability
  • Collaborate with engineering teams to optimize and introduce new monitoring solutions; coordinate incident response when service impacts occur and support the Post-Mortem efforts to prevent recurrence
  • Maintain a solid understanding of cloud infrastructure and services, enhancing your technical skills over time

Preferred Qualifications

ITIL Foundation is a plus
