Summary

Join MetroStar as a Site Reliability Engineer and lead the design, implementation, and management of highly scalable, highly available systems. Collaborate with cross-functional teams to optimize system performance, and implement monitoring and alerting strategies. Drive automation initiatives and participate in on-call rotations to keep services running without interruption. This role requires an active Secret (or higher) U.S. Government security clearance, a Bachelor's degree, and at least 3 years of experience in a similar role. Strong experience with cloud technologies, infrastructure as code, and multiple programming languages is essential. MetroStar offers a generous benefits package, professional growth opportunities, and a supportive work environment.

Requirements

Active Secret U.S. Government security clearance or higher
Bachelor’s degree in Computer Science, Information Technology, or a related field
Minimum of 3 years of professional experience in a Site Reliability Engineering role or similar capacity
Strong experience with cloud technologies (e.g., AWS, Azure, GCP) and infrastructure as code (e.g., Terraform, Ansible)
Proficiency in leading, managing, and engineering incident and outage response processes
Strong engineering experience with networking protocols and services (e.g., TCP/IP, DNS, HTTP/HTTPS, load balancing)
Proficiency in programming and scripting languages (e.g., Python, Go, Bash) and robotic process automation (RPA) tools (e.g., Blue Prism, UiPath) to automate tasks and develop tools
Deep understanding of containerization and orchestration technologies (e.g., Kubernetes, Docker)
Expertise in implementing and managing monitoring and logging solutions (e.g., Splunk, Prometheus, Grafana, ELK stack)
Familiarity with CI/CD pipeline development and management (e.g., GitLab CI, Azure DevOps, AWS Lambda, Jenkins)
Proven track record of designing, building, and maintaining highly available and scalable systems
Expert proficiency in developing automated functional, regression, and performance tests, and in establishing automated testing standards for development teams
Experience facilitating change and configuration management processes to drive reliability
Strong problem-solving skills, with the ability to diagnose complex issues and implement effective solutions
Excellent communication skills, with the ability to collaborate effectively across diverse teams

Responsibilities

Collaborate with cross-functional teams to identify performance bottlenecks, troubleshoot complex issues, and optimize system performance to meet defined service level objectives
Design and implement monitoring, alerting, and incident response strategies to proactively identify and mitigate potential issues, ensuring uninterrupted service availability
Drive automation initiatives to streamline deployment, configuration management, and infrastructure provisioning processes
Develop and maintain comprehensive documentation for system configurations, processes, and procedures
Participate in on-call rotations and respond to incidents, working diligently to resolve issues and prevent recurrence

Benefits

Generous benefits package
Professional growth
Valuable time to recharge

Site Reliability Engineer

MetroStar

Remote

DevOps

Mid-level