Site Reliability Developer

WatchGuard Technologies

📍 Remote - Spain

Summary

Join WatchGuard's Site Reliability Engineering (SRE) team and contribute to the reliability and security of our production cloud environments. As an SRE, you will collaborate with development teams, ensuring smooth production operations and leading large-scale event responses. You will define operational and security policies and guide development teams in establishing and monitoring service level agreements. A typical day involves working with application teams in AWS, Azure, and hybrid cloud environments, driving operational excellence, championing security best practices, and participating in on-call rotations to troubleshoot production issues. You will leverage your programming skills for automation and debugging, and share your knowledge through documentation and presentations. WatchGuard offers a flexible work philosophy, allowing for a blend of office and remote work.

Requirements

  • You are a customer-focused, data-driven developer who has a passion for delivering the best customer experience possible
  • You enjoy the thrill of coordinating and troubleshooting production issues and want to proactively find and fix issues
  • You understand cloud technologies, automation, everything-as-code, networking, microservice architectures, object-oriented design, and SRE and DevOps cultures; you are proficient in Python, Java, or Go and eager to learn other languages
  • You come with proven knowledge of software engineering best practices for the full software development lifecycle including coding standards, code reviews, security, source control management, build processes, automated testing, deployment, monitoring, chaos engineering, and automated self-healing operations
  • You also know tools and technologies such as CloudFormation, Terraform, New Relic, Lambda, Serverless, Elasticsearch, Docker, Kubernetes, Spark, Flink, Jenkins, GitHub, Artifactory, and Jira
  • You are able to lead production incident response and postmortems through your strong analytical and problem-solving abilities as well as verbal and written communication skills

Responsibilities

  • Ensuring smooth production operations with development teams and leading large-scale event response
  • Defining operational and security policies, standards, and processes for our development teams to follow
  • Guiding our development teams through the process of establishing, monitoring, and achieving their service level agreements through the definition of service level indicators and objectives
  • Working side-by-side with our application teams in production AWS, Azure, and hybrid cloud environments to ensure proper monitoring, security, reliability, automation, and support are in place
  • Driving an operational excellence culture throughout WatchGuard with the simplification, automation, analysis, and evolution of our activities and processes
  • Championing security and operational best practices so that our development teams across the globe come to know you as a cloud expert
  • Striving to provide the best possible customer experience even when things go wrong by participating in our on-call rotation and then coordinating and leading the production troubleshooting efforts
  • Using your programming skills to develop automation or assist with debugging and fixing complex production issues
  • Being curious, learning new things, and then sharing your knowledge through documentation, presentations, and guidance to other teams

Benefits

  • Parental leave
  • Family care resources
  • Flexible work arrangements
