Staff Site Reliability Engineer

Acquia

πŸ“Remote - Costa Rica

Summary

Join Acquia as a Staff Site Reliability Engineer and help design, implement, and maintain CI/CD pipelines, cloud infrastructure, and monitoring solutions. As a key player on the team, you will work closely with engineering teams to ensure they have the infrastructure they need to deploy rapidly, safely, and reliably.

Requirements

  • BS in Computer Science or a comparable field of study, or equivalent practical experience
  • Experience working with one or more of: Go, Python, Ruby, PHP, Java, or JavaScript
  • Experience with Unix/Linux systems administration using the CLI
  • Fundamental understanding of TCP/UDP networking concepts
  • Solid oral and written communication skills
  • CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments
  • Kubernetes Guru: Strong understanding of, and experience with, Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production
  • Cloud Mastery: Proficient in at least one major cloud provider (AWS, GCP, or Azure). Experience with multi-cloud or hybrid-cloud architecture is a plus
  • IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure
  • Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems
  • Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening
  • Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure
  • Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization

Responsibilities

  • Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools
  • Build and manage scalable, reliable infrastructure using Terraform and other IaC tools, along with Kubernetes
  • Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance
  • Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers
  • Champion the culture of DevOps across teams: promote best practices, encourage adoption of new technologies, and drive a continuous learning mindset within the engineering teams
  • Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals
  • Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management
  • Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production

Preferred Qualifications

  • 8-13 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment
  • Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools
  • Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure
  • Strong scripting skills in Python, Go, or Bash
  • Experience with service mesh architectures like Istio or Linkerd is a plus
  • SRE Certification (or equivalent experience) is a bonus
  • Certified Kubernetes Administrator (CKA) is preferred
  • A passion for automation, observability, and reliability

This job is filled or no longer available.