Senior Site Reliability Engineer


Lumin Digital

💵 $170k-$200k
📍 Remote - United States

Job highlights

Summary

Join Lumin Digital as a Senior Site Reliability Engineer (SRE) and help ensure the availability, scalability, and performance of our digital banking platform. You will apply a deep understanding of both development and operations, using automation to improve reliability. You will collaborate with software engineers to implement best practices and make sure Service Level Objectives (SLOs) are met. Responsibilities include developing and managing CI/CD pipelines, monitoring and troubleshooting issues, collaborating with development and security teams, and engaging in capacity planning. You will also provide performance metrics and implement monitoring and alerting strategies. This role includes participation in a 24x7 on-call rotation.

Requirements

  • Strong problem-solving skills with an operations mindset and an ability to anticipate issues in large-scale systems
  • Proficiency with configuration management tools such as Chef, Ansible, or Puppet
  • Knowledge of standard networking protocols and components (HTTP, DNS, TCP/IP, ICMP)
  • Expertise in AWS or other cloud hosting environments, with a security-focused approach to data integrity and availability
  • Hands-on experience with containerization and orchestration technologies, including Docker and Kubernetes
  • Advanced understanding of Terraform, CI/CD architecture, and the ability to automate workflows
  • Ability to respond to incidents during off hours

Responsibilities

  • Develop and manage CI/CD pipelines, ensuring efficient deployment and system updates
  • Monitor and troubleshoot application and infrastructure issues across all environments, proactively ensuring SLOs and uptime requirements are met
  • Collaborate with development and security teams to integrate best practices and ensure system resilience
  • Engage in capacity planning and demand forecasting to anticipate performance bottlenecks and proactively scale the environment
  • Manage change and configuration, ensuring stability and consistency across deployments
  • Provide metrics to track system performance and identify areas for improvement
  • Implement monitoring and alerting strategies that promote automation, self-healing, and effective incident response
  • Participate in a 24x7 on-call rotation to support system reliability and availability
  • Perform other duties as assigned

Benefits

$170,000 - $200,000 a year


Please let Lumin Digital know you found this job on JobsCollider. Thanks! 🙏