Senior Site Reliability Engineer

Lumin Digital
Summary
Join Lumin Digital as a Senior Site Reliability Engineer (SRE) and help ensure the reliability, availability, and scalability of our applications. You will focus on automation, maintaining Service Level Objectives (SLOs), and collaborating with Software Engineers. The ideal candidate excels at problem-solving, automation, and building resilient systems. The role involves designing, implementing, and managing CI/CD pipelines; monitoring and resolving issues; and collaborating on feature design and deployment. You will also participate in capacity planning, perform change management, generate uptime and SLO reports, and take part in SRE scrum team activities. The position requires a strong understanding of networking and cloud platforms (AWS preferred), as well as experience with technologies such as Terraform, Kubernetes, and Docker.
Requirements
- Exceptional full-stack troubleshooting skills, with a focus on resolving system-level issues
- Expertise in at least one configuration management system (e.g., Chef, Ansible, Puppet)
- Strong understanding of networking protocols and components such as HTTP, DNS, TCP/IP, and load balancing
- Experience with cloud hosting platforms, with AWS preferred (Google Cloud and Azure also valued)
- Hands-on experience with Terraform, Kubernetes, and containerization technologies like Docker
- Solid understanding of CI/CD workflows and the ability to architect robust pipelines
- Familiarity with monitoring and alerting strategies, including self-healing and escalation processes
- Commitment to improving on-call experiences by creating resilient and automated systems
- Strong problem-solving skills with a focus on automation and operational efficiency
- Security mindset with a focus on protecting data integrity and resilience
- Excellent written and verbal communication skills
- Proven ability to work within an agile scrum team
- Ability to participate in a 24x7 on-call rotation
- 2+ years of experience as a software engineer, with C#, Angular, or JavaScript preferred
- Bachelor's degree or higher in Computer Science, or equivalent experience
Responsibilities
- Design, implement, and manage CI/CD pipelines to improve deployment efficiency
- Monitor and resolve issues in all environments, ensuring SLO and uptime targets are consistently met
- Collaborate with Software Engineers to address SRE concerns during feature design and deployment
- Participate in capacity planning and demand forecasting to proactively address performance bottlenecks and scalability needs
- Perform change management to maintain system stability and minimize disruptions
- Generate uptime and SLO reports for internal review and leadership visibility
- Engage in SRE scrum team activities to drive agile development processes
- Ensure security best practices are followed, safeguarding data integrity and system resilience
- Perform other duties as assigned
Preferred Qualifications
- AWS certifications such as SysOps Administrator or Solutions Architect
- Experience with Amazon RDS, EKS, and CloudWatch
