Remote Site Reliability Engineer

Restaurant365

💵 $100k-$130k
📍 Remote - Worldwide

Job highlights

Summary

Join Restaurant365 as a Site Reliability Engineer (SRE) to assist in supporting, enhancing, and maintaining our infrastructure and cloud services. The ideal candidate will have extensive experience with SRE methodologies and processes, automation expertise, and strong scripting skills.

Requirements

  • Extensive experience with SRE methodologies and processes
  • Automation expert with coding skills and a mindset to automate manual/repetitive tasks with PowerShell, Bash, Perl, PHP, or containers
  • Extensive scripting experience with Terraform, YAML, Ansible, and Python
  • Automation experience in public cloud environments, with a strong understanding of infrastructure as code
  • Experience in continuous deployment and lifecycle management using tools such as GitLab, Git, and Stash
  • Linux engineering skills and working knowledge of Windows
  • Working experience with Nginx and Apache Tomcat
  • Azure or AWS: 2+ years of hands-on administration and automation of Azure or AWS services (Azure AKS, Azure Functions, Azure Blob, AWS ECS, AWS EKS, Lambda, S3, ALB/ELB, etc.)
  • Experience with Windows and Linux
  • Ability to effectively prioritize and execute tasks in a high velocity environment
  • Minimum of 2 years of related experience with a bachelor's degree; or equivalent work experience
  • Strong written, oral, and interpersonal communication skills

Responsibilities

  • Responding to production incidents and determining how we can prevent them in the future
  • Triaging and troubleshooting production issues to ensure reliability and performance
  • Identifying and automating manual processes
  • Continuously evolving our monitoring tools and platform
  • Promoting and applying best practices for building scalable and reliable services across engineering
  • Developing and maintaining technical documentation/diagrams, runbooks, and procedures
  • Provide ‘Always On’ support for a 24x7 online environment by participating in an on-call rotation, responding to production incidents, and contributing to root cause analysis and problem management
  • Automate public cloud environments using tools such as Terraform, Ansible, and CloudFormation
  • Work within strict time frames following change management protocols to provide maximum uptime
  • Implement, review, and adhere to security policies along with working with audit teams
  • Research and remediate system vulnerabilities
  • Interact and coordinate with architects, developers, vendors, and internal business partners
  • Maintain documentation of all cloud infrastructure components
  • Maintain a solid working knowledge of current infrastructure and future trends
  • Other duties as assigned

Preferred Qualifications

  • AWS or Azure cloud certification is preferred
  • Experience with Jira, Prometheus, Grafana, ELK, and Site24x7 is preferred; Nagios is a bonus!

Benefits

  • Comprehensive medical benefits
  • 401k + matching
  • Equity Option Grant
  • Unlimited PTO + Company holidays
  • Wellness initiatives
