Alia Services is hiring a Site Reliability Engineer
~$133k-$179k
Remote - Worldwide
Summary
This is a Site Reliability Engineer position at Trax Retail, a fast-growing retail merchandising company. The role involves managing and maintaining the cloud infrastructure, applying DevOps principles, and ensuring stable releases. The ideal candidate has extensive experience with Linux-based server operating systems, cloud infrastructure, databases, CI/CD systems, container orchestration, monitoring systems, network protocols, web service fundamentals, MySQL database performance tuning, security systems, and programming.
Requirements
- 5+ years of experience managing Linux-based Server Operating Systems
- 5+ years of experience managing cloud infrastructure (GCP, AWS, or Azure)
- 5+ years of experience managing large high-performance databases and data processing jobs for business critical reporting applications
- 5+ years of experience managing environments using Infrastructure and Configuration-as-Code (Terraform/CloudFormation/Puppet/Chef/etc.)
- 5+ years of experience with CI/CD and test automation systems (Jenkins/GitLab/Argo/Helm/etc.)
- Excellent written and verbal communication skills and ability to communicate with stakeholders across the business
- Knowledge of monitoring systems including host/OS metrics, logging, and web application performance, using both SaaS products (Datadog/New Relic/etc.) and open-source solutions (syslog/Loki/Grafana/etc.)
- Knowledge of container orchestration systems such as Kubernetes, including autoscaling, service mesh, rollout strategies, and cost management
- Knowledge of network protocols, including TCP/IP, HTTP/S, DNS, DHCP, and NAT
- Thorough understanding of web service fundamentals, such as caching, CDNs, load balancing, and traffic shaping
- MySQL database performance tuning and high-availability experience
- Experience with security systems, including WAF, firewall rules, public key infrastructure, and cryptography
- Experience writing code in any programming language
- Experience writing optimized SQL queries
Responsibilities
- Implement cost-effective and scalable solutions to complex cloud infrastructure problems
- Maintain the reliability of our cloud infrastructure while simultaneously improving and upgrading it
- Perform low-level analysis and debugging of problems in both containerized and VM-based Linux workloads
- Automate manual processes to improve developer productivity
- Ensure stable and reliable releases by maintaining and improving our CI/CD systems
- Be an advocate for DevOps best practices in both the Infrastructure team and across the organization
- Manage and participate in a rotating on-call team that is responsible for handling high-priority bugs and issues
Preferred Qualifications
- Production experience with Google Cloud Platform (GCP)
- Ability to code modern, containerized web applications
- Strong understanding of the Python programming language
- Ability to perform low-level network debugging, including packet analysis and an understanding of the Linux network stack
Benefits
- Full-time positions with the potential for overtime
- 100% remote positions
- Competitive compensation package
- An inclusive, fast-paced, and exciting culture offering accelerated professional growth
- 1-on-1 coaching with feedback sessions, mentorships, and leadership development programs
- Opportunities for cross-functional development
This job is filled or no longer available