Remote Senior Site Reliability Engineer at Tech Holding

Summary

Tech Holding is seeking a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of critical infrastructure and applications. The role involves cross-team collaboration, incident management, SLA definition, automation, and mentorship. Required qualifications include 5-8 years of SRE experience; proficiency with GCP, monitoring and alerting tools, and scripting languages; a strong grasp of incident management best practices; solid communication and problem-solving skills; and a passion for building reliable systems.

Requirements

5-8 years of experience as a Site Reliability Engineer (SRE) or in a related role
Experience with Google Cloud Platform (GCP)
Proven experience with monitoring tools like Prometheus and Grafana
Strong understanding of incident management best practices
Experience with alerting tools like PagerDuty
Experience with scripting languages like Python or Bash for automation
Excellent communication and collaboration skills
Ability to work independently and as part of a team
Strong problem-solving and analytical skills
Passion for building reliable and scalable systems

Responsibilities

Ensure the reliability, scalability, and performance of critical infrastructure and applications
Partner with development teams to implement best practices for building reliable and scalable systems
Stay up-to-date on the latest SRE trends and technologies
Design, implement, and maintain robust monitoring solutions using tools like Prometheus and Grafana
Develop and configure alerts within tools like PagerDuty to ensure timely notification of potential issues
Analyze and troubleshoot issues using collected application and infrastructure metrics
Lead incident response, ensuring timely resolution and minimizing downtime
Document and communicate incident details effectively to stakeholders
Conduct post-incident reviews to identify root causes and implement preventative measures
Collaborate with product and engineering teams to define clear and measurable SLAs for SaaS offerings
Establish Service Level Objectives (SLOs) for key metrics based on SLA requirements
Define Service Level Indicators (SLIs) to track progress towards achieving SLOs
Monitor SLO compliance and proactively identify potential SLA breaches
Identify opportunities for automation to improve efficiency and reliability
Develop and implement automation scripts in languages like Python or Bash
Automate routine tasks and incident response workflows
Act as a liaison between SRE, Product, Security, Application Engineering, and Customer Operations teams
Facilitate communication and information sharing across teams to ensure smooth operations
Work collaboratively to define and implement solutions that meet the needs of all stakeholders

Preferred Qualifications

Experience with container orchestration platforms like Kubernetes
Experience with chaos engineering principles
Experience with configuration management tools like Ansible or Chef

Benefits

Remote Work Opportunities
Flexible Work Hours
