Lightspeed is hiring a
Senior Site Reliability Engineer

Lightspeed

💵 ~$150k-$222k
📍 Remote - United Kingdom

Summary

Join our NuOrder by Lightspeed team as a Staff Site Reliability Engineer and help build software solutions that help merchants grow their business. You will be part of a team responsible for cross-cutting concerns such as cloud infrastructure, reliability, and incident management, and for providing our growing development teams with the infrastructure and tools they need to scale.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or equivalent real-world experience
  • 6+ years of experience in site reliability engineering, systems administration, and/or software engineering
  • Expertise in container orchestration platforms, specifically Kubernetes
  • Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis)
  • Familiarity with network protocols and IP networking, along with experience in network troubleshooting
  • Proficiency in at least one programming or scripting language, such as Bash, Python, or Go
  • Proven track record of managing large-scale infrastructure in cloud environments like Google Cloud, AWS, or Azure
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack)
  • Strong understanding of security best practices
  • Excellent problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues
  • Excellent communication skills for effective collaboration with cross-functional teams

Responsibilities

  • Design, build, and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL, and BigQuery
  • Develop and manage CI/CD pipelines for efficient deployment and release using various technologies (GitLab, GitHub, Helm, Terraform, etc.)
  • Work closely with development teams to provide tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLIs and SLOs), and managing error budgets
  • Build platform solutions and apply software engineering principles to improve software reliability and accelerate delivery
  • Support the incident management process and conduct post-mortem analysis to prevent future outages
  • Mentor junior SREs and developers, offering guidance on best practices in cloud architecture, data management, and software development
  • Manage infrastructure changes through infrastructure as code (IaC) using Terraform
  • Participate in the on-call rotation
  • Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices to improve product quality and team efficiency

Benefits

  • Work in a talented global team with strong role growth opportunities
  • Flexible working policy
  • Lightspeed share scheme (we are all owners)
  • Company pension program
  • Private medical insurance
  • Health and wellness benefit
  • Online mental health platform, plus counseling and coaching services
  • Paid leave and assistance for new parents
  • Language classes & LinkedIn Learning license
  • Volunteer day

Please let Lightspeed know you found this job on JobsCollider. Thanks! 🙏