Cloud Operations Engineer

Lumin Digital
Summary
Join Lumin Digital as a Cloud Operations Engineer and play a key role in supporting the Operations Center. You will manage incident response, enhance proactive monitoring, and drive process automation, focusing on improving Incident Command practices and reducing on-call toil. The ideal candidate excels at collaboration and communication while maintaining a commitment to continuous improvement. Responsibilities include performing operational tasks to ensure the reliability and availability of cloud services, triaging and resolving incidents, developing and maintaining process automation, and collaborating with cross-functional teams. This role requires a Bachelor’s degree or equivalent experience and 3–5 years of experience in cloud operations or related fields. Lumin Digital offers a collaborative and innovative work environment.
Requirements
- Bachelor’s degree or higher in a relevant field, or equivalent experience
- 3–5 years of experience in cloud operations, site reliability engineering (SRE), DevOps, or related technical roles
- Proven track record of managing and resolving production incidents in a high-availability environment
- Hands-on experience automating operational processes and workflows using scripting languages or automation tools (e.g., Python, Terraform, Ansible)
- Experience building, configuring, and maintaining monitoring and alerting systems for cloud infrastructure and applications
- Experience working in cloud-native environments, particularly AWS (preferred) or other major cloud providers (Azure, GCP)
- Familiarity with CI/CD pipelines, infrastructure as code (IaC), and version control systems (e.g., Git)
- Experience collaborating with cross-functional engineering teams to drive incident postmortems and continuous improvement efforts
- Strong problem-solving skills with a detail-oriented approach
- Exceptional written and verbal communication skills
- Proven ability to collaborate across teams to achieve shared goals
- Demonstrated cultural alignment with Lumin Digital’s values, including humility, ownership, and integrity
- Experience with monitoring platforms such as CloudWatch, Splunk, Grafana, or Azure Monitor
- Familiarity with automation and orchestration tools
Responsibilities
- Perform operational tasks to ensure the reliability and availability of Lumin Digital’s cloud services
- Triage and resolve incidents by gathering logs, identifying root causes, and implementing solutions
- Develop and maintain process automation to reduce manual tasks and increase efficiency
- Implement and manage proactive monitoring of both critical and non-critical services
- Collaborate with cross-functional teams to improve Incident Command practices and workflows
- Identify opportunities to enhance operational processes and drive improvements
- Perform other duties as assigned
Preferred Qualifications
- Hands-on experience working in a cloud environment, with AWS preferred over other cloud providers
- Proficiency with Atlassian tools or similar platforms
- Experience with process automation tools and techniques
Benefits
Salary: $100,000–$125,000 a year

