Summary

Join Restaurant365 as a Site Reliability Engineer (SRE) to assist in supporting, enhancing, and maintaining our infrastructure and cloud services. The ideal candidate will have extensive experience with SRE methodologies and processes, automation expertise, and strong scripting skills.

Requirements

Extensive experience with SRE methodologies and processes
Automation expert with coding skills and a mindset to automate manual/repetitive tasks with PowerShell, Bash, Perl, PHP, or containers
Extensive scripting experience with Terraform, YAML, Ansible, and Python
Automation experience in public cloud environments, with a strong understanding of infrastructure as code
Experience in continuous deployment and lifecycle management using tools such as GitLab, Git, and Stash
Linux engineering skills and working knowledge of Windows
Working experience with Nginx and Apache Tomcat
Azure or AWS: 2+ years of hands-on administration and automation of various Azure or AWS services (Azure AKS, Azure Functions, Azure Blob, AWS ECS, AWS EKS, Lambda, S3, ALB/ELB, etc.)
Experience with Windows and Linux
Ability to effectively prioritize and execute tasks in a high-velocity environment
Minimum of 2 years of related experience with a bachelor's degree; or equivalent work experience
Strong written, oral, and interpersonal communication skills

Responsibilities

Responding to production incidents and determining how we can prevent them in the future
Triaging and troubleshooting production issues to ensure reliability and performance
Identifying and automating manual processes
Continuously evolving our monitoring tools and platform
Promoting and applying best practices for building scalable and reliable services across engineering
Developing and maintaining technical documentation/diagrams, runbooks, and procedures
Provide ‘Always On’ support for a 24x7 online environment by participating in an on-call rotation, responding to production incidents, and contributing to root cause analysis and problem management
Automate public cloud environments using tools such as Terraform, Ansible, and CloudFormation
Work within strict time frames following change management protocols to provide maximum uptime
Implement, review, and adhere to security policies along with working with audit teams
Research and remediate system vulnerabilities
Interact and coordinate with architects, developers, vendors, and internal business partners
Maintain documentation of all cloud infrastructure components
Maintain a solid working knowledge of current infrastructure and future trends
Other duties as assigned

Preferred Qualifications

AWS or Azure cloud certification is preferred
Experience with Jira, Prometheus, Grafana, ELK, or Site24x7 is preferred; Nagios is a bonus

Benefits

Comprehensive medical benefits
401k + matching
Equity Option Grant
Unlimited PTO + Company holidays
Wellness initiatives

Remote Site Reliability Engineer

Restaurant365

Remote

DevOps

Mid-level
