Senior Site Reliability Engineer

Veza

📍 Remote - United States

Summary

Join Veza as a Principal Site Reliability Engineer (SRE) and contribute to the smooth operation and performance of its critical infrastructure and services. You will be responsible for managing solutions, responding to incidents, collaborating with engineers, monitoring infrastructure, automating tasks, and implementing improvements. This role requires 10+ years of experience in SRE, 5+ years with cloud platforms and automation tools, particularly in AWS, and strong experience with Kubernetes, Terraform, Helm, Linux, AWS networking, and GitOps. You will also take part in on-call rotations and maintain documentation. Veza offers competitive salary and equity packages, a 401(k) retirement plan, pre-tax benefits, flexible medical, dental, and vision benefits, parental leave, flexible time off, and a home setup and monthly connectivity stipend.

Requirements

  • BS degree in Computer Science or related field
  • 10+ years of experience in Site Reliability Engineering
  • 5+ years of experience working with cloud platforms and cloud automation tools, especially in AWS
  • Strong experience with Kubernetes, Terraform, Helm, Linux, and AWS networking
  • Experience with the GitOps model for deployment
  • Experience with monitoring and alerting tools (e.g., Prometheus, Grafana)

Responsibilities

  • Manage solutions (rollouts, availability, metrics) for Cloud Prem and SaaS customers
  • Respond to and diagnose system incidents in a timely and efficient manner, minimizing downtime and impact on customers
  • Collaborate with other engineers to establish root causes and implement effective resolutions
  • Work closely with all other areas of Engineering to ensure that the solutions and designs pursued align with overall deployment and availability objectives
  • Proactively monitor, maintain, measure, and improve the health and performance of our infrastructure and services, working with the other service teams
  • Automate routine administrative tasks such as system configuration, tenant management, backups and deployments
  • Identify and implement operational/technical improvements to ensure ongoing system security, reliability and efficiency
  • Develop and implement automated solutions to streamline operational tasks and reduce manual workload
  • Participate in the on-call rotation to address critical incidents outside of regular business hours
  • Document processes for support, and create, maintain, and execute runbooks for identified situations
  • Manage and iteratively improve Veza’s CI/CD pipeline

Preferred Qualifications

  • Bazel experience a plus
  • Understanding of software configuration best practices
  • Ability to wear multiple hats in a fast-paced environment
  • Hands-on, “can do” attitude and a bias for action
  • Low ego and high intellectual curiosity
  • Demonstrated ability to manage large, complex projects and drive consensus among stakeholders, with strong positive outcomes for the company
  • Demonstrated ability to drive process and architectural change, delivering strong positive outcomes

Benefits

  • Competitive salary and equity packages
  • 401(k) retirement plan
  • Pre-tax health care, dependent care, and commuter benefits (FSA)
  • Flexible medical, dental, and vision benefits
  • Parental leave
  • Flexible time off
  • Home setup and monthly connectivity stipend
