Senior Site Reliability Engineer

Juniper Square

💵 $140k-$185k
📍 Remote - United States

Summary

Join Juniper Square as a Senior Site Reliability Engineer and contribute to scaling, securing, and improving our cloud infrastructure. You will work with modern cloud-native technologies, automate infrastructure management, and enhance system reliability. Collaborate with software engineers and the platform team to build and maintain self-service tools. This role demands high ownership, a bias for action, and strong problem-solving skills. We offer a variety of work arrangements, from fully remote to working in one of our physical offices. The position includes a competitive salary and a comprehensive benefits package.

Requirements

  • 5+ years of experience in SRE, DevOps, or Infrastructure Engineering with a proven track record of ownership and initiative
  • Strong experience with Kubernetes, Helm, and CNIs, including networking and security
  • Proficiency in AWS services such as RDS, Aurora, IAM, VPC, EKS, and EC2
  • Experience in PostgreSQL administration, including performance tuning and high availability in RDS/Aurora
  • Hands-on experience with GitHub Actions and ArgoCD for secure and scalable CI/CD automation
  • Strong background in Infrastructure as Code (IaC) with Crossplane and Terraform
  • Deep understanding of observability and monitoring with Datadog
  • Experience with Kyverno for Kubernetes policy-based security enforcement
  • Proficiency in Python and Bash scripting for automation and system management
  • Strong understanding of CI/CD security best practices and ability to implement controls for securing deployments
  • Self-starter mentality: actively seeks out and fixes problems without waiting for assignments
  • High ownership and accountability: takes initiative in driving improvements and following through to resolution
  • Strong problem-solving mindset: identifies bottlenecks, inefficiencies, and risks, then delivers scalable solutions
  • Excellent communication skills: documents processes in Confluence, collaborates cross-functionally, and influences engineering teams toward operational excellence

Responsibilities

  • Own reliability and scalability initiatives: identify, prioritize, and implement solutions before issues escalate
  • Participate in an on-call rotation, responding to incidents, performing root cause analysis, and driving long-term fixes
  • Design, deploy, and manage Kubernetes clusters using Helm charts, Cilium, and Karpenter to optimize performance and cost
  • Architect and maintain AWS infrastructure with a focus on RDS/Aurora PostgreSQL, networking, and scaling best practices
  • Implement GitHub Actions CI/CD pipelines, integrating security best practices and automation
  • Define and enforce policy-based security for Kubernetes using Kyverno
  • Automate infrastructure provisioning with Crossplane and Terraform to ensure consistency and scalability
  • Enhance observability and monitoring using Datadog to proactively detect and resolve issues
  • Improve security and reliability by identifying risks in CI/CD, cloud environments, and Kubernetes, then implementing necessary safeguards
  • Lead post-incident reviews, drive lessons learned into long-term improvements, and document best practices in Confluence

Preferred Qualifications

  • Deep experience with GitHub Actions for CI/CD automation, with a focus on security best practices
  • Extensive knowledge of Helm charts for managing Kubernetes applications
  • Strong experience in PostgreSQL, including optimization and high availability in RDS/Aurora
  • Experience with NoSQL databases and best practices for scaling and performance
  • Proven ability to influence engineering culture toward automation, self-service, and operational excellence
  • Experience with Karpenter for Kubernetes autoscaling
  • Previous experience with cost optimization strategies in AWS environments
  • Experience with Atlassian tools (Jira, Confluence) for tracking incidents and documentation
  • Strong experience with, and a passion for, applying AI to SRE and DevOps work

Benefits

  • Health, dental, and vision care for you and your family
  • Life insurance
  • Mental wellness coverage
  • Fertility and growing family support
  • Flex Time Off in addition to company-paid holidays
  • Paid family leave, medical leave, and bereavement leave policies
  • Retirement savings plans
  • Allowance to customize your work and technology setup at home
  • Annual professional development stipend
