Remote Senior Site Reliability Engineer
Lightspeed
Remote - United Kingdom
Job highlights
Summary
Join our NuOrder by Lightspeed team as a Staff Site Reliability Engineer and contribute to building software solutions that help merchants grow their business. You will be part of a team that handles cross-cutting concerns such as cloud infrastructure, reliability, and incident management, and that supports our growing development teams with the infrastructure and tools they need to scale.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent real-world experience
- 6+ years of experience in site reliability engineering, systems administration, and/or software engineering
- Expertise in container orchestration platforms, specifically Kubernetes
- Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis)
- Familiarity with network protocols and IP networking, along with experience in network troubleshooting
- Proficiency in at least one programming or scripting language, such as Bash, Python, or Go
- Proven track record of managing large-scale infrastructure in cloud environments like Google Cloud, AWS, or Azure
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack)
- Strong understanding of security best practices
- Excellent problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues
- Excellent communication skills for effective collaboration with cross-functional teams
Responsibilities
- Design, build, and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL, and BigQuery
- Develop and manage CI/CD pipelines for efficient deployment and release using various technologies (GitLab, GitHub, Helm, Terraform, etc.)
- Work closely with development teams to provide tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets
- Build platform solutions and apply software engineering principles to improve software reliability and accelerate delivery
- Support the incident management process and conduct post-mortem analysis to prevent future outages
- Mentor junior SREs and developers, offering guidance on best practices in cloud architecture, data management, and software development
- Manage infrastructure changes through infrastructure as code (IaC) using Terraform
- Participate in the on-call rotation
- Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices to improve product quality and team efficiency
Benefits
- Work in a talented global team with strong role growth opportunities
- Flexible working policy
- Lightspeed share scheme (we are all owners)
- Company pension program
- Private medical insurance
- Health and wellness benefit
- Mental health online platform and counseling & coaching services
- Paid leave and assistance for new parents
- Language classes & LinkedIn Learning license
- Volunteer day
This job is filled or no longer available