Summary
Join Acquia as a Staff Site Reliability Engineer and contribute to designing, implementing, and maintaining CI/CD pipelines, cloud infrastructure, and monitoring solutions. As a key player in the team, you will work closely with engineering teams to ensure they have the right infrastructure in place to deploy rapidly, safely, and reliably.
Requirements
- BS in Computer Science or a comparable field of study, or equivalent practical experience
- Experience working with one or more of: Go, Python, Ruby, PHP, Java, or JavaScript
- Experience with Unix/Linux systems administration using the CLI
- Fundamental understanding of TCP/UDP networking concepts
- Solid oral and written communication skills
- CI/CD Expertise: Extensive hands-on experience with CI/CD tools such as ArgoCD, Jenkins, CircleCI, or GitLab CI. Ability to design and implement pipelines that ensure rapid, reliable deployments
- Kubernetes Guru: Strong understanding and experience with Kubernetes, Helm, and container orchestration. Ability to scale and manage microservices in production
- Cloud Mastery: Proficient in at least one major cloud provider: AWS, GCP, or Azure. Experience with multi-cloud or hybrid-cloud architecture is a plus
- IaC Champion: Proficiency in Terraform, Ansible, or CloudFormation to manage infrastructure as code. Familiarity with GitOps workflows and version-controlled infrastructure
- Monitoring & Observability: Strong experience with monitoring tools like Prometheus, Grafana, Datadog, ELK, or New Relic. Ability to build custom dashboards and alerting systems
- Security-Focused: Deep understanding of security best practices in DevOps, including container security, CI/CD pipeline security, and cloud infrastructure hardening
- Problem Solver: Excellent troubleshooting skills with the ability to diagnose issues across a variety of environments, from code to infrastructure
- Collaboration Skills: Ability to work effectively in cross-functional teams, influencing peers and driving adoption of best practices across the organization
Responsibilities
- Design, build, and optimize continuous integration and continuous deployment (CI/CD) pipelines using ArgoCD, Jenkins, or similar tools
- Build and manage scalable, reliable infrastructure using Terraform, Kubernetes, and other IaC tools
- Architect and manage cloud environments (AWS, GCP, or Azure), focusing on cost optimization, scalability, and performance
- Implement comprehensive monitoring solutions using Prometheus, Grafana, ELK, and Datadog to detect and resolve performance bottlenecks before they impact customers
- Champion the culture of DevOps across teams: promote best practices, encourage adoption of new technologies, and drive a continuous learning mindset within the engineering teams
- Focus on building systems that are resilient by design, automating processes that improve reliability, and implementing Service Level Objectives (SLOs) to align engineering efforts with operational goals
- Collaborate with security teams to implement robust security practices, from container security to infrastructure hardening. Automate security checks within the pipeline for compliance and vulnerability management
- Work hand-in-hand with product development teams to understand their needs, integrate CI/CD practices into their workflows, and provide a fast, reliable, and secure path from code to production
Preferred Qualifications
- 8-13 years of hands-on experience as a DevOps Engineer, SRE, or related role in a cloud-native environment
- Deep knowledge of CI/CD pipelines, especially using ArgoCD or similar tools
- Proven expertise in cloud platforms (AWS, GCP, Azure), with experience building and managing scalable, reliable infrastructure
- Strong scripting skills in Python, Go, or Bash
- Experience with service mesh architectures like Istio or Linkerd is a plus
- SRE Certification (or equivalent experience) is a bonus
- Certified Kubernetes Administrator (CKA) is preferred
- A passion for automation, observability, and reliability