Senior Site Reliability Engineer

Invisible Technologies
Summary
Join Invisible Technologies, a leading AI training and scaling partner, as a Senior Site Reliability Engineer. You will play a key role in ensuring the availability, performance, and scalability of production systems for one of our core products. As an owner, you will deploy, configure, and manage cloud-based infrastructure, optimizing for performance and cost efficiency. You will design and maintain monitoring systems, define service level objectives, and build automated solutions. Collaboration with engineering teams to improve application reliability is crucial. The role requires a strong understanding of cloud architecture, experience with Kubernetes and Terraform, and expertise in relational databases and security principles. Compensation includes a salary range of $68,000-$80,000 USD, plus bonuses and equity for roles above entry level. Invisible is a remote-first organization.
Requirements
- Strong understanding of cloud architecture including expertise with major cloud providers (GCP, AWS, Azure)
- Understanding of the underlying networking and security considerations when developing the architecture of our deployment environments
- Strong understanding of relational databases (PostgreSQL), and comfort optimizing them and advising the broader engineering team on optimization techniques so that the data layer of our deployed services runs smoothly
- Strong understanding of authentication and authorization principles such as IAM, Security Groups, and RBAC
- Understanding of software engineering fundamentals, practices, and patterns with distributed cloud services
- Strong experience with production systems troubleshooting and optimization
- Experience with Kubernetes, with the ability to point to deployments you have architected or managed
- Strong understanding of the operating model of Kubernetes, with the ability to explain the requirements for designing deployments for new applications
- Experience with infrastructure as code tools such as Terraform or CloudFormation
Responsibilities
- Ensure the availability, performance, and scalability of production systems
- Deploy, configure, automate, and manage cloud-based infrastructure using tools like Kubernetes, Terraform, and Argo
- Identify and resolve system bottlenecks, optimizing for performance and cost efficiency across engineering teams
- Design, support, and manage deployment pipelines to enable world-class delivery of applications
- Design, develop, and maintain comprehensive monitoring and observability systems using Datadog and Sentry
- Define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure reliability and performance
- Design and implement automated solutions to reduce manual operational tasks
- Build tools for system provisioning, monitoring, deployment, and scaling
- Collaborate closely with engineering teams to improve application reliability, resilience, and maturity
Benefits
Bonuses and equity are included in offers above entry level.