Staff Site Reliability Engineer

Thinkific
Summary
Join Thinkific as a Staff Site Reliability Engineer (SRE) and contribute to the performance, reliability, and security of our systems. Collaborate with engineers and stakeholders to improve our platform. As a core contributor, you will own a technical domain, contribute to project planning and execution, define system requirements, and champion operational excellence. You will also mentor other engineers, participate in on-call rotations, and promote a culture of continuous improvement. This role requires 6+ years of experience in software or infrastructure engineering, including experience with production services, infrastructure as code, and cloud-native development. Thinkific offers a competitive compensation package, flexible paid time off, comprehensive health benefits, flexible work options, and professional development opportunities.
Requirements
- Has 6+ years of experience in the software or infrastructure engineering profession, including time spent in a reliability or platform-focused role
- Has experience owning services in production, and feels comfortable with infrastructure as code, container orchestration, and cloud-native development practices
- Understands the operational needs of complex distributed systems and has experience with monitoring, observability, incident management, and system hardening
- Writes infrastructure code using tools like Terraform, with an eye toward security, modularity, and collaboration
- Has experience with languages like Ruby, Python, or Bash, and is proficient in working with relational databases such as Postgres or AWS Aurora, as well as non-relational databases
- Can identify root causes of complex issues across multiple systems and work with stakeholders to develop resilient solutions
- Has experience with messaging and queueing systems like SNS, SQS, or Sidekiq and understands patterns for asynchronous processing and fault tolerance
- Enjoys collaborating across teams, sharing knowledge, and helping shape the team’s technical roadmap
- Is a thoughtful communicator who proactively shares context, feedback, and plans with their team and stakeholders
- Brings a continuous improvement mindset by seeking out opportunities to streamline workflows, reduce toil, and enable team success
- Loves to learn and grow. They’ve found (and keep looking for) ways to level up their skills in this field, whether that’s through formal education, gaining professional experience, or maybe even building their own business
Responsibilities
- Own a technical domain within our system and be accountable for operations and SLOs related to performance, reliability, and security, as well as architectural evolution and technical documentation aligned with broader strategy
- Contribute to the planning and execution of technical projects within and across your team, helping ensure that initiatives are well-scoped, aligned with organizational priorities, and effectively delivered
- Partner with product managers, designers, and other engineers to define system requirements, propose implementation strategies, and make tradeoffs visible
- Champion operational excellence, observability, and incident response across your team and adjacent services
- Write high-quality, maintainable, and efficient code with a focus on long-term scalability and performance
- Share your expertise by mentoring other engineers, supporting code reviews, and guiding others through architectural and debugging challenges
- Promote a culture of continuous improvement by encouraging experimentation, learning from failure, and driving engineering best practices in reliability, performance, and software quality
- Participate in our on-call rotation and incident response processes to help maintain a high level of service reliability
Preferred Qualifications
- Experience working with AWS services and infrastructure at scale
- Knowledge of networking fundamentals, such as load balancing and TLS, and related cloud services such as Cloudflare
Benefits
- A competitive compensation package including base salary, equity, team-wide bonuses, and an Employee Share Purchase Plan
- Flexible Paid Time Off to maintain mental and physical health. Our team is encouraged to take a minimum of 4 weeks of vacation, plus Thinker Holidays (extended long weekends in the summer) and time off for the December holiday season
- Health Benefits and Wellness: Comprehensive benefits starting on Day 1 include health, vision, and dental coverage for you and your family, $3,000 for mental health care, a short-term health plan, and an additional health or personal spending account. Plus, family-friendly benefits include generous parental leave top-ups for up to 32 weeks, as well as fertility coverage and personalized return-to-work options
- Flexible Work. Choose to work from home anywhere in Canada, at our Vancouver HQ, at a co-working space, or anywhere there’s Wi-Fi for a change of scenery
- Learning & Growth. An annual $1,500 USD Learn and Grow fund for conferences, seminars, or courses, plus training, mentorship, coaching, and internal promotion opportunities
- A home office setup so you’re ready to succeed, with a company-owned MacBook Pro and a budget to order a desk, chair, or any accessories to help you work comfortably and productively