Site Reliability Engineer at Axiom Software Solutions Limited

Summary

Join us as a Site Reliability Engineer (ex-Fidelity experience) in a remote contract position. Design, implement, and manage Kubernetes environments, and build and maintain scalable infrastructure using infrastructure as code. Develop comprehensive monitoring solutions, analyze system performance, and implement improvements. Implement and maintain CI/CD pipelines, conduct incident response and root cause analysis, and create automation tools leveraging AI/ML where applicable. Collaborate with development teams to enhance application reliability and performance. This role requires strong expertise in Kubernetes, Linux/Unix, and database administration, along with programming skills in Python, Go, Java, or Node.js.

Requirements

5-7 years of experience in SRE or DevOps roles
Strong expertise with Kubernetes ecosystem and container orchestration
Deep understanding of Linux/Unix operating systems and performance analysis tools (NMON, etc.)
Experience with log analysis, monitoring systems, and observability tools
Proficiency in database administration and performance tuning (Oracle, SQL Server)
Strong programming skills in at least one of: Python, Go, Java, or Node.js
Experience developing automation tools and frameworks
Proven track record of proactive problem identification and resolution

Responsibilities

Design, implement, and manage Kubernetes environments, covering deployment, configuration, monitoring, and troubleshooting
Build and maintain scalable and reliable infrastructure using infrastructure as code principles
Develop comprehensive monitoring solutions and implement alerting strategies
Analyze system performance bottlenecks and implement improvements
Implement and maintain CI/CD pipelines for seamless deployments
Conduct incident response and root cause analysis, and implement preventative measures
Create and enhance automation tools leveraging AI/ML where applicable
Collaborate with development teams to improve application reliability and performance

Preferred Qualifications

Experience with AI/ML integration into operational workflows
Cloud platform experience (AWS, GCP, Azure)
Knowledge of service mesh technologies
Experience with distributed systems architecture
Familiarity with security best practices and compliance requirements

Remote

DevOps

Mid-level
