Summary

Join Milk Moovement as a Software Reliability Engineer (SRE) and contribute to the smooth operation of our dairy industry platform. You will proactively monitor and resolve platform issues, implement monitoring solutions, investigate performance anomalies, and refine our incident response process. This critical role ensures high system availability and performance through collaboration with various teams. The ideal candidate possesses at least 3 years of SRE or DevOps experience with a focus on reliability, along with expertise in log aggregation, cloud-deployed applications, and incident management platforms. Milk Moovement offers a remote work environment, flexible hours, and unique perks.

Requirements

Strong experience with log aggregation and monitoring solutions (Datadog, Splunk, ELK)
Experience monitoring cloud-deployed applications (AWS, GCP, Azure)
Familiarity with configuring incident management platforms (Squadcast, PagerDuty)
Experience using IaC for deployment and management (Terraform, CloudFormation, CDK)
Proficiency in JavaScript or Python for automation and debugging
Extensive experience in troubleshooting & triaging performance issues and incidents
At least 3 years of prior SRE or DevOps experience, with a focus on the reliability side

Responsibilities

Implement and maintain monitoring solutions using Datadog, focusing on proactive detection and resolution of platform issues
Develop alerting mechanisms that trigger based on symptoms rather than just outages, ensuring early detection of problems
Analyze system metrics, logs, and performance data to identify trends and potential reliability concerns
Lead incident response efforts, including triaging, troubleshooting, and post-mortem analysis for continuous improvement
Manage and optimize logging and monitoring infrastructure to ensure observability across all services
Work closely with development teams to ensure features are deployed with minimal impact on platform reliability
Participate in on-call rotations and incident management workflows, ensuring rapid issue response and resolution
Assist in cloud engineering tasks where necessary, particularly in reliability-focused automation and infrastructure improvements

Preferred Qualifications

Datadog certification or extensive experience configuring and tuning monitoring solutions
Related AWS certifications or ample experience administering AWS environments
Proficiency building internal tooling and APIs leveraging serverless infrastructure (Lambda)
Experience working with container-based services (Docker, ECS, Kubernetes)
Working knowledge of both SQL and NoSQL databases, including troubleshooting and performance tuning (MongoDB, PostgreSQL, DynamoDB)
Familiarity with CI/CD processes and automation frameworks

Benefits

Remote work environment - work from home or from one of our hubs in Halifax and St. John’s
Flexible hours - night owl or early riser? No problem
Tools - need the latest and greatest software to perform more efficiently? Ask and you shall receive
Quarterly guest speakers - from shark trainers and graffiti artists to astronomers and sandwich aficionados. The more unique, the better

Software Reliability Engineer

Milk Moovement

Remote

Software Development

Mid-level
