Site Reliability Engineer

Aetion

📍Remote - Canada

Summary

Join Aetion's engineering team as a Site Reliability Engineer and play a critical role in scaling our cloud-based, containerized infrastructure. You will own Aetion's infrastructure, support day-to-day operations, and contribute to engineering projects focused on improving infrastructure and automation. This role requires extensive experience in systems engineering, DevOps, and software development, particularly with AWS and Kubernetes. You will collaborate with cross-functional teams to ensure seamless experiences for end-users and stakeholders. Aetion's benefits package includes 25 vacation days, a daily lunch stipend, sabbatical opportunities, professional development, and comprehensive health coverage.

Requirements

  • Hold a Bachelor's Degree in Computer Science, Engineering, or a related field, or equivalent experience
  • Have 5+ years of experience in Systems Engineering, DevOps, or developing distributed systems with strong knowledge of cloud architecture, particularly AWS and Kubernetes
  • Have 5+ years of experience in software development, with proficiency in at least one of Python or Java. Proficiency in Unix-based shell scripting is required
  • Have 3+ years of experience with tools and languages such as Pulumi, Terraform, Ansible, or GitHub Actions (GHA)
  • Have 5+ years of experience with cloud platforms (AWS required)
  • Have 3+ years of experience with Docker and Kubernetes
  • Have experience with proactive threat prevention, incident response, and implementing compliance programs
  • Have experience working with SQL databases, big data platforms, and supporting big data pipelines
  • Have in-depth experience solving complex issues on Linux systems and/or within the JVM
  • Demonstrate empathy for end-users and a strong service mindset when supporting day-to-day operations, ensuring a positive experience for stakeholders
  • Possess a strong understanding of cloud infrastructure design with a focus on security, reliability, and scalability
  • Have detailed knowledge of configuration, implementation, and maintenance of CI/CD pipelines and tooling (e.g., GitHub Actions or Jenkins)
  • Possess strong English language skills, both written and verbal, with the ability to communicate effectively across teams (e.g., commercial and science/analytics teams)
  • Be able to prioritize, communicate effectively, design for repeatability and scalability, take ownership, and dig into how technology works under the hood
  • Have the flexibility to improve existing systems and to innovate on new capabilities
  • Be collaborative, open-minded, and able to quickly grasp complex concepts to contribute to the team’s overall effectiveness

Responsibilities

  • Perform delivery and production support tasks, including monitoring, troubleshooting, and resolving infrastructure and application issues to ensure system reliability and uptime
  • Continually streamline automation and processes to improve operational maturity and efficiency
  • Provision, configure, and maintain Aetion’s infrastructure with a focus on simplicity, innovation, automation, reliability, scalability, security, cost-effectiveness, and ease of support
  • Build and maintain Aetion’s development and deployment pipelines, supporting CI/CD as well as long-term stable testing and release cycles
  • Collaborate with cross-functional teams to provide timely and effective production support, ensuring a seamless experience for end-users and internal stakeholders
  • Develop automation frameworks to support other development teams and reduce manual intervention in operational tasks
  • Effectively contribute to complex engineering projects while balancing operational responsibilities

Preferred Qualifications

  • Experience with debugging, tracing, and profiling Java applications
  • Experience provisioning and operating SQL databases and big data platforms (Spark)
  • Experience in the healthcare or banking industry, or other fields where information security is a concern
  • Privacy (HIPAA, GDPR) and security (SOC 2, HITRUST) certifications
  • Experience with Google Cloud Platform
  • Experience with lean and agile ways of working

Benefits

  • 25 vacation days
  • Daily in-office lunch stipend (and a fully stocked kitchen!)
  • Sabbatical opportunity after five years of employment
  • Commitment to professional development opportunities, with access to the Skillsoft learning experience platform
  • Employee-led initiatives including annual company-wide innovation day & DEI resource groups
  • Comprehensive private health coverage with out-of-network reimbursement options
  • Peer & company recognition programs
  • Mental health and wellness benefits
  • Monthly educational lunch-and-learn sessions
