Cloud Site Reliability Engineer
Smile Digital Health
Summary
Join Smile Digital Health's Cloud Hosting Services team as a Cloud SRE, supporting the building, operating, and automating of infrastructure services for SaaS-based solutions on Azure/AWS. You will bridge development and operations, applying a software engineering mindset to system administration. Responsibilities include collaborating with security teams, developing multi-tenant approaches, cost tracking, documentation, and maintaining relationships with cloud providers. You will also ensure SLAs are met, create automation tools, participate in on-call rotations, and provide customer support. Approximately 50% of your time will be spent on deployment and infrastructure, with the remaining time allocated to patching, documentation, customer interaction, and solution development. Smile Digital Health offers a remote work environment, flexible time off, competitive salary and benefits, and various professional development opportunities.
Requirements
- Demonstrated expertise in cloud service providers and best practices around implementation and configuration, preferably managing Azure on behalf of multiple teams for a company that delivers SaaS products
- Experience with Kubernetes, OpenShift, Kafka, and the Elastic Stack
- Proven experience with Security and Compliance (SOC 2, HIPAA, ISO 27001) best practices and how to implement controls that support high-velocity software delivery teams
- Proficiency in Terraform, Ansible or Chef
- Expertise in troubleshooting, support escalation, on-call process optimization, and documenting knowledge
- Passion for infrastructure as code, automation, and developing solutions that help developers move quickly and safely
- Familiarity with infrastructure management and operations lifecycle concepts and ecosystem
- Experience operating and maintaining production systems in a Linux and public cloud environment
- Prior experience working in high-performance or distributed systems (we strive to hire at a variety of experience levels)
- Working knowledge of industry best practices with regard to information security
- Previous experience building or maintaining a large-scale cloud service
- Proven ability to prioritize and track multiple projects in parallel
- Proven ability to be highly responsive and customer-focused
Responsibilities
- Collaborate with our Security Operations teams to help define and implement best practices around Cloud Service Provider configuration for AWS, Azure and other cloud providers
- Develop, implement and coordinate a multi-tenant approach to service offerings such as DB, container platform, authentication, certificates, and product registries
- Develop and maintain cost/utilization tracking and attribution processes for all Cloud Service Providers
- Create documentation around Cloud Service Provider offerings detailing use cases, best practices, and implementation details
- Develop and maintain technical relationships with our core Cloud Service Providers
- Implement and maintain a secure and scalable infrastructure platform for delivering Cloud Services applications
- Ensure that internal and external SLAs are met or exceeded, and that system-centric KPIs are continuously monitored and improved
- Create tools for automating deployment, monitoring and operations of the overall platform
- Participate in an on-call rotation to provide application support, incident management, and troubleshooting
- Provide ongoing maintenance and support of internal tools, and improve system health and reliability
- Assist customers with on-premises deployments when needed
- Ongoing compliance with organizational policies, procedures and practices (such as, but not limited to, security policies) is a requirement of the employment or contractual agreement
- Comply with the privacy, security and confidentiality policies
Benefits
- Remote Work Environment
- Flexible Time Away From Work Policy including PTO, Personal and Sick Days
- Competitive Salary and Health/Medical Benefits
- RRSP/TFSA/401K Employee Contribution
- Life and Disability
- Employee Assistance Program
- FHIR Study Program and Skillsoft Learning
- Super HAPI Fun Club