Summary

Join our engineering team as a Site Reliability Engineer (SRE) supporting critical application deployments in a global environment. You will apply your expertise in cloud provisioning, infrastructure as code (IaC), and container orchestration to keep services reliable, scalable, and performant, and you will collaborate with development teams to design and implement robust infrastructure on AWS, Azure, and containerized platforms. The role requires a self-starter with a thoughtful, strategic approach to problem-solving. You will also have the opportunity to mentor junior engineers.

Requirements

Have a Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
Have proven work experience as a Site Reliability Engineer, DevOps Engineer, or in a similar role within a high-availability environment
Have strong experience with AWS and Azure cloud services, including a deep understanding of cloud architecture and services
Have expertise in Infrastructure as Code (IaC) using Terraform (HCL) and AWS CloudFormation
Have experience with AWS CDK for programmatic management of cloud resources, primarily using TypeScript
Have hands-on experience with container orchestration technologies, particularly Kubernetes
Have familiarity with version control systems (e.g., Git) and CI/CD pipelines for efficient code deployment
Have knowledge of monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack) to ensure system observability
Have strong experience with SQL databases and AWS DynamoDB, focusing on performance tuning and optimization
Have proven ability to design and manage RESTful APIs, ensuring their reliability and scalability
Have excellent troubleshooting skills, with a proactive approach to resolving complex technical issues
Have strong communication and teamwork skills, enabling effective collaboration across cross-functional teams
Have a curious and open-minded attitude, committed to challenging the status quo and exploring innovative solutions

Responsibilities

Design, implement, and manage cloud infrastructure in AWS and Azure, ensuring alignment with best practices and organizational standards
Utilize Terraform (HCL), AWS CDK, and AWS CloudFormation for scalable and maintainable IaC, enabling safe and efficient infrastructure builds, changes, and versioning
Deploy, manage, and provide ongoing support for containerized applications using Kubernetes, including Amazon EKS (Elastic Kubernetes Service) and Azure Kubernetes Service (AKS), ensuring their reliability, availability, and performance
Monitor application performance and system health through observability tools (e.g., Prometheus, Grafana, ELK stack), proactively identifying and resolving issues to ensure high availability and rapid incident response
Implement security best practices, managing Identity and Access Management (IAM) policies across cloud environments. Utilize technologies such as OpenID Connect (OIDC), OAuth2, and SAML Single Sign-On (SSO) to ensure secure authentication and authorization across services
Manage and optimize database systems, including SQL databases and AWS DynamoDB, ensuring high availability, performance tuning, and data security
Automate manual processes to enhance operational efficiency, employing Continuous Integration/Continuous Deployment (CI/CD) best practices for efficient code deployment
Demonstrate proficient scripting skills in languages such as Java, TypeScript, and Python to automate tasks and manage configurations
Implement and maintain load balancing solutions to ensure optimal distribution of application traffic and high availability
Collaborate with software engineering teams to design, develop, and maintain robust systems and solutions, including RESTful APIs, ensuring seamless integration across platforms
Conduct comprehensive post-mortem analyses following incidents, identifying root causes and recommending improvements to enhance system reliability and performance
Mentor and guide junior engineers, fostering a culture of knowledge sharing and continuous improvement within the engineering team

Preferred Qualifications

Have experience with networking concepts and troubleshooting in cloud environments
Have knowledge of security best practices in cloud computing
Have contributed to open-source projects or written technical articles or blog posts to share knowledge with the community
Have familiarity with service mesh technologies
Have exposure to Agile methodologies and project management tools
Have financial services domain knowledge

Benefits

Open to hybrid or remote working

Site Reliability Engineer

Delta Capita

Remote

DevOps

Mid-level
