Summary

Join Appspace's Cloud Operations team as a Site Reliability Engineer and play a key role in maintaining our cloud platform, which includes Kubernetes, microservices, and various databases. You will automate maintenance tasks, deploy new features, troubleshoot issues, and collaborate with other teams. This mission-critical role requires strong experience with Python, shell scripting, Kubernetes, and Helm. The position offers flexible work schedules, remote work opportunities, and a range of benefits, including a competitive salary, medical coverage, and paid parental leave. The role also requires on-call coverage every weekend during a limited window of US daytime hours. This is an opportunity to expand your skills at a rapidly growing company.

Requirements

Learn new technologies quickly and have a desire to be a lifelong learner
Communicate well and adapt to working with others across different countries and cultures
Have a strong background in containers, Kubernetes, Helm, Linux, and Python, plus some experience with Windows Server and macOS
Possess solid troubleshooting experience and the ability to reason through a process workflow to identify faults
Be flexible about occasionally attending “off-hour” meetings
Be open to quarterly travel (up to 5% of the time)

Responsibilities

Automate maintenance tasks for the Cloud Platform
Deploy new features and releases of software into Kubernetes via Helm
Troubleshoot performance issues and errors, either resolving the cause or forwarding findings to Engineering
Handle request tickets from other teams to support their needs and prepare for releases
Monitor application performance, uptime, and cloud infrastructure performance, proactively addressing negative trends
Lead, participate in, or execute incident management: identify root causes, resolve issues, and prevent recurrence
Configure, monitor, research, and evaluate workload performance on Google Cloud Platform and Microsoft Azure
Collaborate with Development and Quality Assurance teams to address issues
Document new processes and procedures, and update existing ones

Preferred Qualifications

Have experience with build pipeline tools, the Atlassian suite (JIRA, Confluence, Bitbucket, Bamboo), Git, and Octopus
Have experience with monitoring and alerting platforms, especially Stackdriver
Have experience with HashiCorp Terraform
Have experience with IIS
Have experience administering MySQL and MongoDB
Have experience administering message brokers such as RabbitMQ
Have experience with Google Cloud Platform, Google Kubernetes Engine, Google Compute Engine, and Google Cloud Storage (comparable experience with AWS or Azure will be considered)

Benefits

Competitive salary; medical, dental, and vision coverage; disability coverage; employer-paid life insurance; mental health resources; a 401(k) plan; and a fully paid parental leave program (US-based team members)
Generous PTO
Flexible work schedules
Remote work opportunities
Paid company holidays
Appspace Quiet Fridays (no non-essential internal meetings)
A casual dress code

Site Reliability Engineer

Appspace

Remote

DevOps

Mid-level
