Summary
Join Lightspeed Retail's growing team as a Senior SRE, contributing to the stability and scalability of their POS systems infrastructure. You will be responsible for designing, implementing, and managing Kubernetes clusters, ensuring high availability and reliability. The role requires expertise in cloud platforms, CI/CD pipelines, containers, and Infrastructure as Code. You will act as a subject matter expert and incident lead, driving continuous improvement in software delivery processes. This position offers a flexible work culture, amazing benefits, and significant growth opportunities within a fast-paced, high-growth company.
Requirements
- A passion for scalability, reliability and observability and a desire to share that passion with others in a positive, solutions-oriented way
- Comfortable with leading projects which require coordination and collaboration with other development teams to reach a common goal
- A desire to quickly grow your ability to champion process changes in the pursuit of the SRE mandate
- Proven track record of driving optimization of cloud services, including but not limited to data pipelines, storage, databases, caching layers, cores, and memory
- Understanding of different types of SLAs/SLOs and of resource contracts, such as reserved instances and savings plans
- Analytical mindset: live by the metrics, deeply understand data and use it to drive technical decisions
- Good understanding of Agile development and continuous delivery best practices, software engineering tools, processes, methods and testing
- Primary ownership of customer-facing, zero-downtime production environments using the following toolsets:
  - Major cloud platforms (Amazon Web Services, Google Cloud Platform, Azure)
  - CI/CD pipelines (CircleCI, Jenkins, GitHub, ArgoCD, Helm)
  - Containers (Docker, Kubernetes, EKS, AKS, GKE & Linux Systems)
  - Infrastructure as Code (Terraform)
  - Programming or scripting languages (Bash, Python, Ruby, Java, Golang, etc.)
- You are a problem solver and critical thinker who does not shy away from tackling complexity
- You have a strong will to learn, grow and get out of your comfort zone
- You have great energy and passion for technology
- You can express yourself flawlessly in English
- You have strong interpersonal skills
- You are a team player and a bar raiser
Responsibilities
- Be an active member of the Retail Platform team, responsible for the observability, scalability and reliability of the Retail Platform
- Design and implement Kubernetes clusters for various use cases, ensuring scalability, reliability, and security
- Configure and manage Kubernetes clusters, including nodes, networking, and storage
- Perform updates to multi-platform Kubernetes clusters in critical production environments
- Act as both a subject matter expert and an incident lead during the incident response process
- Initiate and contribute to continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team to empower and accelerate product development
- Obsess over reliability and help teams deliver reliable software
- Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
- Provide timely assistance and remediation during critical situations and production incidents to help resolve service problems (you will be on call for periods of time)
Benefits
- Amazing benefits & perks, including equity for all Lightspeeders
- Constant development of both your skill set and business acumen with limitless growth opportunities
- Lots of autonomy, flexible work culture
- Innovation time to explore and learn at work
- Shaping the company by joining cultural & technical committees
- Tons of growth opportunities into technical or people management roles
- Opportunity to join a fast-paced, high-growth company
- Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story
- Lightspeed equity scheme (we are all owners)
- Flexible paid time off and remote work policies
- Health insurance
- Contributions to your pension plan (RRSP)
- Health and wellness benefit of $500 per year
- Paid leave and assistance for new parents
- Mental health online platform and counseling & coaching services
- Training opportunities to grow your skills and career
- Volunteer day
- Fully stocked kitchen (hot and cold beverages, meals served)
- Happy hours to build your relationships with colleagues after work