Senior Site Reliability Engineer at Cordial

Summary

Join Cordial as a Site Reliability Engineer to monitor, develop, and scale the platform, ensuring a delightful client experience. You will collaborate with DevOps and Product teams to optimize performance, identify and resolve issues, and implement comprehensive monitoring. Responsibilities include administering and troubleshooting application and network components, designing and deploying Kubernetes manifests, contributing to infrastructure design, debugging code, providing production support, and participating in on-call rotations. The ideal candidate possesses extensive experience in Unix/Linux systems, AWS, Kubernetes, and various monitoring tools. Cordial offers a competitive salary, equity, bonus, robust benefits, and perks such as wellness stipends and education reimbursements.

Requirements

5+ years of Unix/Linux systems and network administration (DNS, IPsec, VPN, load balancing, process tracing)
Experience with AWS (we use EC2, EKS)
Experience deploying and/or maintaining Kubernetes/EKS clusters
Hands-on experience writing and maintaining custom Helm charts
Experience working with one or more service meshes (AWS App Mesh, Istio, Linkerd)
Experience with monitoring, logging and alerting tools
Previous experience in an SRE and/or DevOps role
Development experience in PHP
Extensive experience with Docker/containers & Kubernetes
Experience with HashiCorp products such as Consul and Vault
Comfortable working in a globally distributed team across time zones
Strong teamwork and communication skills
A genuine desire to learn new technologies and grow
Fluent in verbal and written English
Experience with large-scale distributed systems
Proficiency in infrastructure as code (IaC) tools (e.g., Terraform, CloudFormation)
Understanding of observability principles and tools (e.g., Prometheus, Grafana, ELK stack, distributed tracing)
Familiarity with CI/CD pipelines (e.g., Jenkins, GitLab CI, ArgoCD)
A strong grasp of networking fundamentals
Knowledge of security best practices in a cloud environment

Responsibilities

Use your knowledge of web, app, network, server, storage, and security technologies to administer, monitor, and troubleshoot application and network components in our cloud-based environment (we are AWS-hosted and make extensive use of Kubernetes, Consul, and Vault clusters)
Help design, author, deploy, and monitor manifests for our multiple Kubernetes clusters, Helm charts/repos, and service mesh configurations
Actively contribute to platform Infrastructure Design and Implementation discussions
Use your software engineering skills to trace/debug code and identify root causes of production data corruption and/or performance issues
Provide production support for the Product Development teams
Participate in an on-call rotation
Work with the team to develop and deploy monitoring and alerting architecture, and implement monitoring/logging solutions
Troubleshoot complex issues promptly to maintain the performance and stability of our production application environment
Help build out SLOs and document and monitor SLAs

Benefits

$135,000.00-$170,000.00 annually
Equity and bonus
Robust benefit plan (medical/dental/vision/life)
401k match
Flexible time off
Monthly wellness and cell phone stipends
Yearly childcare and continuing education reimbursements

Remote

DevOps

Senior
