Remote Staff Site Reliability Engineer

Acquia

📍Remote - Costa Rica

Job highlights

Summary

Join Acquia as a Staff Site Reliability Engineer and contribute to designing, implementing, and maintaining CI/CD pipelines, cloud infrastructure, and monitoring solutions. As a key player in the team, you will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.

Requirements

  • BS in Computer Science or a comparable field of study, or equivalent practical experience
  • Experience working with one or more of: Go, Python, Ruby, PHP, Java, or JavaScript
  • Experience with Unix/Linux systems administration using the CLI
  • Fundamental understanding of TCP/UDP networking concepts
  • Solid oral and written communication skills
  • CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments
  • Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production
  • Cloud Mastery: Proficiency in at least one major cloud provider (AWS, GCP, or Azure). Experience with multi-cloud or hybrid-cloud architecture is a plus
  • IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure
  • Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems
  • Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening
  • Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure
  • Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization

Responsibilities

  • Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools
  • Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools
  • Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance
  • Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers
  • Champion a DevOps culture across engineering teams: promote best practices, encourage adoption of new technologies, and foster a continuous-learning mindset
  • Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals
  • Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management
  • Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production

Preferred Qualifications

  • 8-13 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment
  • Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools
  • Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure
  • Strong scripting skills in Python, Go, or Bash
  • Experience with service mesh architectures like Istio or Linkerd is a plus
  • SRE Certification (or equivalent experience) is a bonus
  • Certified Kubernetes Administrator (CKA) is preferred
  • A passion for automation, observability, and reliability
