Summary
Join Encora as a Senior SRE (Azure) and be responsible for designing, developing, and maintaining high-quality software solutions. You will collaborate with cross-functional teams, lead technical projects, mentor junior engineers, and continuously improve software development practices. This remote position, based in Peru, Colombia, Costa Rica, or Bolivia, requires 5+ years of IT experience and 3+ years in a Site Reliability Engineering, DevOps, or similar role. Proficiency in Azure Cloud, Terraform, and DevOps practices is essential. The role covers system reliability, performance monitoring, automation, incident management, and capacity planning. Encora is an equal opportunity employer.
Requirements
- Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent work experience
- 5+ years of experience in IT
- 3+ years of experience in a Site Reliability Engineering, DevOps, or similar role
- Proficiency in SRE practices
- Proficiency in Terraform IaC
- Proficiency in DevOps processes and the SDLC
- Proficiency in Azure Cloud
- Strong understanding of containerization and orchestration technologies like Docker and Kubernetes
- Knowledge of configuration management tools such as Ansible, Puppet, or Chef
- Familiarity with CI/CD tools and processes
- Experience working with GitHub, using GitHub Flow & GitHub Actions
- Experience with monitoring and observability tools like Prometheus, Grafana, Datadog, or similar
- Strong knowledge of networking concepts and protocols
- Excellent problem-solving and troubleshooting skills
- Critical and out-of-the-box thinking
- Strong communication and collaboration skills
- Ability to work in a fast-paced, dynamic environment
- Detail-oriented with a focus on quality and reliability
- Proactive attitude and ability to collaborate with multiple teams
Responsibilities
- Design, implement, and maintain systems that are highly available, resilient, and scalable
- Define and implement SLIs to track uptime and support the SRE team's success
- Develop and implement monitoring, alerting, and incident response strategies to ensure system health and performance
- Automate repetitive tasks to improve efficiency and reduce manual intervention using tools and scripts
- Respond to and resolve incidents, conducting post-mortem analyses to identify and address root causes
- Work closely with development teams to ensure that applications are designed with reliability and scalability in mind
- Perform capacity planning and demand forecasting to ensure systems can handle future growth
- Create and maintain detailed documentation of system architecture, processes, and procedures, consolidating a strong knowledge base
- Continuously seek ways to improve system reliability, performance, and overall efficiency. Constantly measure processes and team performance, look for gaps, and help the customer succeed through ongoing refinement
Preferred Qualifications
- Knowledge of SRE best practices and methodologies
- Experience working in support services and on shift rotations
- Familiarity with security best practices in a cloud environment, and knowledge of Zero Trust security
- Knowledge of Microsoft Cloud Adoption Framework
Benefits
- Remote work