Summary
Join Emi Labs, a Y Combinator-backed startup on a mission to improve access to professional opportunities for frontline workers. We're building Emi, an AI recruitment assistant, and need a skilled Site Reliability Engineer (SRE) to centralize our microservices telemetry and improve our monitoring and observability stack. You'll be a key player in streamlining our tooling and ensuring high uptime for our services. This role requires significant experience with microservices, distributed systems, Kubernetes, and observability platforms. The ideal candidate is a proactive problem-solver with a strong DevOps mindset and excellent teamwork skills. Emi Labs offers a dynamic startup environment with ample growth opportunities.
Requirements
- 4 years of experience working with microservices and distributed systems: tracing, load balancing, concurrency, event-driven architecture patterns
- 4 years of experience working as an SRE or DevOps engineer in cloud solution architecture
- 3 years of experience administering Kubernetes clusters in EKS/AKS/GKE
- 2 years of experience administering any observability platform
- Fluent in scripting, Linux, Docker and CI pipelines
- Experience dealing with production incidents, troubleshooting and remediation
- Familiar with any modern Infrastructure as Code solution; Terraform is a plus
- Advanced English level
Responsibilities
- Be a key player in the project of centralizing our microservices telemetry (metrics, traces and logs) in a single platform
- Become the technical owner of our monitoring and observability stack
- Move fast, streamlining our tooling to unlock the full potential of our development teams
- Identify, propose, and implement improvements in our cloud infrastructure and Kubernetes clusters
- Ensure the best possible experience for our developers and the highest uptime rate for our services
Preferred Qualifications
- Experience with OpenTelemetry, Grafana, Sentry, and/or New Relic
- Eager to train developers and new colleagues in the DevOps mindset
- Strong background in designing backend systems in any high-level language
- Enterprise integrations experience: webhooks, public APIs
- Application performance optimization: SQL queries, profiling, heap memory, garbage collection
- Deployment strategies: rolling update, blue/green, canary
- FinOps experience
- Experience designing and implementing Disaster Recovery Plans