Senior Site Reliability Engineer
Nylas
Job highlights
Summary
Join Nylas' Site Reliability Engineering (SRE) team and ensure the reliable and efficient operation of our products, serving billions of API calls daily. You will support the engineering team, maintain and scale infrastructure across AWS and GCP, configure alerts and dashboards, manage CI/CD pipelines, and participate in on-call rotations. This role requires extensive experience in production engineering, Linux, cloud services, and automation. Nylas offers excellent benefits, including extended healthcare, unlimited PTO, RRSP contributions, an education stipend, cell phone reimbursement, and fully paid parental leave.
Requirements
- Experience: Minimum of 5 years in production engineering, with hands-on experience in managing and scaling Linux-based production servers
- Communication and Empathy: Exceptional communication skills and a strong empathetic approach, understanding that effective teamwork and problem-solving require more than just technical skills
- Linux Proficiency: Advanced proficiency in navigating the Linux command line
- Logging and Observability: Demonstrated experience with platforms like New Relic, Coralogix, Grafana, and Prometheus
- Configuration Management: Experience in automating systems using modern tools such as Chef, Ansible, or Puppet
- Containerization and Orchestration: Proven track record of deploying and managing services using Kubernetes and Docker
- Cloud Services: Practical experience with major cloud services like AWS, GCP, or Azure, focusing on deploying and maintaining scalable applications
- Programming Skills: Capability to write reliable code in at least one programming language such as Python, GoLang, or JavaScript
- Learning Agility: Ability to rapidly learn and adapt to new technologies and frameworks
- Automation and Infrastructure: Passion for building modern, scalable infrastructure and automating routine tasks to improve efficiency and reliability
Responsibilities
- Support our engineering team with best practices and provision new infrastructure as necessary
- Maintain and scale a legacy system in AWS with Ansible, Python, MySQL, and Terraform
- Maintain our new infrastructure in GCP with Kubernetes, Helm, ArgoCD, Terraform, GoLang, OpenSearch, Spanner, and Redis
- Configure and adjust alerts and dashboards in New Relic and Coralogix, leveraging Fluent-Bit and OpenTelemetry
- Manage and improve our CI/CD pipelines using ArgoCD and Helm
- Take part in an on-call rotation and assist in debugging and resolving incidents
Preferred Qualifications
Candidates with expertise in tuning alerts and synthetics and in creating comprehensive health dashboards and reports will be preferred
Benefits
- Healthcare: Extended healthcare coverage for you and your family
- Unlimited Paid Time Off (PTO): We take this very seriously as we care about the well-being of our employees
- RRSP with 3% employer contribution
- Education Stipend: $1,250 annual education & development benefit
- Cell Phone: $60 per month stipend towards your cell phone bill
- Fully Paid Parental Leave: 12 weeks of parental leave (maternity & paternity)