Alia Services is hiring a
Site Reliability Engineer

Alia Services

💵 ~$133k-$179k
📍 Remote - Worldwide

Summary

This is a Site Reliability Engineer position at Trax Retail, a fast-growing retail merchandising company. The role involves managing and maintaining the cloud infrastructure, implementing DevOps principles, and ensuring stable releases. The ideal candidate has extensive experience with Linux-based server operating systems, cloud infrastructure, databases, CI/CD systems, container orchestration, monitoring systems, network protocols, web service fundamentals, MySQL database performance tuning, security systems, and programming.

Requirements

  • 5+ years of experience managing Linux-based server operating systems
  • 5+ years of experience managing cloud infrastructure (GCP, AWS, or Azure)
  • 5+ years of experience managing large, high-performance databases and data processing jobs for business-critical reporting applications
  • 5+ years of experience managing environments using Infrastructure and Configuration-as-Code (Terraform/CloudFormation/Puppet/Chef/etc.)
  • 5+ years of experience with CI/CD and test automation systems (Jenkins/GitLab/Argo/Helm/etc.)
  • Excellent written and verbal communication skills and ability to communicate with stakeholders across the business
  • Knowledge of monitoring systems including host/OS metrics, logging, and web application performance, using both SaaS products (Datadog/New Relic/etc.) and open-source solutions (syslog/Loki/Grafana/etc.)
  • Knowledge of container orchestration systems such as Kubernetes, including autoscaling, service mesh, rollout strategies, and cost management
  • Knowledge of network protocols, including TCP/IP, HTTP/S, DNS, DHCP, and NAT
  • Thorough understanding of web service fundamentals, such as caching, CDNs, load balancing, and traffic shaping
  • MySQL database performance tuning and high-availability experience
  • Experience with security systems, including WAF, firewall rules, public key infrastructure, and cryptography
  • Experience writing code in any programming language
  • Experience writing optimized SQL queries

Responsibilities

  • Implement cost-effective and scalable solutions to complex cloud infrastructure problems
  • Maintain the reliability of our cloud infrastructure while simultaneously improving and upgrading it
  • Perform low-level analysis and debugging of problems in both containerized and VM-based Linux workloads
  • Automate manual processes to improve developer productivity
  • Ensure stable and reliable releases by maintaining and improving our CI/CD systems
  • Be an advocate for DevOps best practices in both the Infrastructure team and across the organization
  • Manage and participate in a rotating on-call team that is responsible for handling high-priority bugs and issues

Preferred Qualifications

  • Production experience with Google Cloud Platform (GCP)
  • Ability to code modern, containerized web applications
  • Strong understanding of the Python programming language
  • Ability to perform low-level network debugging, including packet analysis and an understanding of the Linux network stack

Benefits

  • Full-time positions with the potential for overtime
  • 100% remote positions
  • Competitive compensation package
  • An inclusive, fast-paced, and exciting culture offering accelerated professional growth
  • 1-on-1 coaching with feedback sessions, mentorships, and leadership development programs
  • Opportunities for cross-functional development

This job is filled or no longer available.