Member of Technical Staff

Reka AI
Summary
Join Reka, a globally distributed foundation model startup, as a Member of Technical Staff on Infrastructure. You will be responsible for the reliability, performance, and scalability of our compute infrastructure, and you will design, build, and maintain the tools that keep our systems running smoothly. You will monitor system performance, troubleshoot issues, and implement solutions; collaborate with engineering and research teams to ensure the infrastructure meets their needs; and manage machine and storage resources efficiently while implementing cost-reduction strategies. Reka offers an elite team, cutting-edge infrastructure, a massive market opportunity, an inclusive culture, and visa support.
Requirements
- Experience managing and troubleshooting large-scale distributed systems
- Strong scripting and automation skills (e.g., Python, Bash)
- Experience with containerization and orchestration technologies (e.g., Docker, Kubernetes)
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana)
- A deep understanding of cloud computing platforms (e.g., AWS, GCP, Azure)
Responsibilities
- Be responsible for the reliability, performance, and scalability of our compute infrastructure
- Design, build, and maintain the tools that keep our systems running smoothly
- Monitor system performance, troubleshoot issues, and implement solutions to prevent future problems
- Collaborate with engineering and research teams to ensure our infrastructure meets their needs
- Manage machine and storage resources efficiently, and implement strategies to reduce infrastructure costs
Preferred Qualifications
- Experience with HPC/GPU cluster management tools (e.g., Slurm, GPU monitoring tools, distributed file systems)
- The ability to build effectively in a fast-paced environment with some uncertainty
Benefits
Visa Support: We provide visa assistance, including H-1B and OPT transfers, for US employees, to ensure a smooth transition and to support your career with us.