Site Reliability Engineer

Appspace
Summary
Join Appspace's Cloud Operations team as a Site Reliability Engineer and play a key role in maintaining our cloud platform, which includes Kubernetes, Microservices, and various databases. You will automate maintenance tasks, deploy new features, troubleshoot issues, and collaborate with other teams. This mission-critical role requires strong experience in Python, shell scripting, Kubernetes, and Helm. The position offers flexible work schedules, remote work opportunities, and a variety of benefits, including competitive salaries, medical coverage, and paid parental leave. On-call coverage is required weekly during a limited window of US daytime hours over the weekend. This is an opportunity to grow your capabilities within a rapidly growing company.
Requirements
- Learn new technologies quickly and have a desire to be a lifelong learner
- Communicate well and adapt to working with others across different countries and cultures
- Have a strong background in Containers, Kubernetes, Helm, Linux, and Python coding, plus some experience with Windows Server OS and macOS
- Possess solid troubleshooting experience and the ability to reason through a process workflow to identify faults
- Be flexible on occasionally attending "off-hour" meetings
- Be open to quarterly travel up to 5%
Responsibilities
- Automate maintenance tasks for the Cloud Platform
- Deploy new features and releases of software into Kubernetes via Helm
- Troubleshoot performance issues or errors, resolving the cause or forwarding research to Engineering
- Action Request Tickets from other teams to support their needs and prepare for releases
- Monitor application performance, uptime, and cloud infrastructure performance, proactively addressing negative trends
- Lead, participate in, or execute incident management, ascertaining root cause, resolving issues, and preventing recurrence
- Configure, monitor, research, and evaluate workload performance on Google Cloud Platform and Microsoft Azure Clouds
- Collaborate with Development and Quality Assurance teams to address issues
- Document new processes and procedures and update existing ones
Preferred Qualifications
- Have experience with Build pipeline tools and the Atlassian suite (JIRA, Confluence, Bitbucket/Git, Bamboo, Octopus)
- Have experience with monitoring and alerting platforms, especially Stackdriver
- Have experience with HashiCorp Terraform
- Have experience with IIS
- Have experience with administering MySQL & MongoDB
- Have experience with administering message brokering systems like RabbitMQ
- Have experience with Google Cloud Platform, Google Kubernetes Engine, Google Compute Engine, and Google Storage (comparable experience with AWS or Azure will be considered)
Benefits
- Competitive salaries, medical, dental and vision coverage, disability coverage, employer paid life insurance, mental health resources, 401(k) plan and a fully paid parental leave program (US based team members)
- Generous PTO
- Flexible work schedules
- Remote work opportunities
- Paid company holidays
- Appspace Quiet Fridays (No non-essential internal meetings scheduled)
- A casual dress work environment
