Senior Site Reliability Engineer at Lightspeed

Summary

Join our NuOrder by Lightspeed team as a Senior Site Reliability Engineer and help build software that enables merchants to grow their business. You will be part of a team responsible for cross-cutting concerns such as cloud infrastructure, reliability, and incident management, and you will support our growing development teams with the infrastructure and tools they need to scale.

Requirements

Bachelor’s degree in Computer Science, Engineering, or equivalent real-world experience
6+ years of experience in site reliability engineering, systems administration, and/or software engineering
Expertise in container orchestration platforms, specifically Kubernetes
Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis)
Familiarity with network protocols and IP networking, along with experience in network troubleshooting
Proficiency in at least one programming or scripting language, such as Bash, Python, or Go
Proven track record of managing large-scale infrastructure in cloud environments like Google Cloud, AWS, or Azure
Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack)
Strong understanding of security best practices
Excellent problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues
Excellent communication skills for effective collaboration with cross-functional teams

Responsibilities

Design, build, and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL, and BigQuery
Develop and manage CI/CD pipelines for efficient deployment and release using various technologies (GitLab, GitHub, Helm, Terraform, etc.)
Work closely with development teams to provide tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets
Build platform solutions and apply software engineering principles to improve software reliability and accelerate delivery
Support the incident management process and conduct post-mortem analysis to prevent future outages
Mentor junior SREs and developers, offering guidance on best practices in cloud architecture, data management, and software development
Manage infrastructure changes through infrastructure as code (IaC) using Terraform
Participate in the on-call rotation
Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices to improve product quality and team efficiency

Benefits

Work in a talented global team with strong role growth opportunities
Flexible working policy
Lightspeed share scheme (we are all owners)
Company pension program
Private medical insurance
Health and wellness benefit
Mental health online platform and counseling & coaching services
Paid leave and assistance for new parents
Language classes & LinkedIn Learning license
Volunteer day

Remote

DevOps

Senior
