Senior Cloud Performance Engineer

ClickHouse
Summary
Join ClickHouse's Cloud Performance Engineering team and build the cloud-native ClickHouse Cloud Platform. This role requires 6+ years of experience in building and operating scalable, fault-tolerant, distributed systems, proficiency in languages like Go, C/C++, or Java, and expertise with cloud infrastructure (preferably Kubernetes and a public cloud provider). You will benchmark system performance, analyze database performance, optimize capacity, troubleshoot errors, and collaborate with various teams. The ideal candidate is a strong problem solver with excellent communication skills and a passion for efficiency and scalability. ClickHouse offers a remote-first work environment, healthcare contributions, stock options, flexible time off, a home office setup allowance, and opportunities for international mobility.
Requirements
- 6+ years of relevant software development industry experience building and operating scalable, fault-tolerant, distributed systems
- Software development experience in Go, C/C++, Java, or similar
- Experience with concurrency, multithreading, and the deployment of distributed system architectures
- Experience developing cloud infrastructure services, preferably with Kubernetes
- Experience leading and shipping large-scope technical projects in collaboration with multiple experienced engineers
- Expertise with a public cloud provider (AWS, GCP, Azure) and its infrastructure-as-a-service offering (e.g. EC2)
- Excellent communication skills and the ability to work well within a team and across engineering teams
- Strong problem-solving and solid production debugging skills
- Passionate about efficiency, availability, scalability and data governance
- Thrive in a fast-paced environment and see yourself as a partner to the business, with the shared goal of moving it forward
- High level of responsibility, ownership, and accountability
Responsibilities
- Benchmark system performance, analyze database performance, and size and optimize capacity
- Troubleshoot and debug application and server errors and logs, and triage accordingly
- Recommend configuration tuning/optimizations for performance bottlenecks
- Work closely with the ClickHouse core development, cloud, and security teams to improve the performance of ClickHouse Cloud
- Plan, enable, and drive Chaos initiatives across Engineering teams, based upon internal priorities
- Develop, deploy and manage tools to systematically run chaos experiments and measure impact
- Enjoy working on, and gaining a deep understanding of, large scale distributed systems
- Study the problems in the software resilience, operational, and delivery spaces
- Extend our entire backend to enable Chaos Engineering techniques in the system
- Observe running systems, and determine/prioritize innovative ways to disrupt them
Benefits
- Flexible work environment - ClickHouse is a distributed company offering remote-first work to all employees
- Healthcare - Employer contributions towards your healthcare
- Equity in the company - Every new team member who joins our company receives stock options
- Time off - Flexible time off in the US, generous entitlement in all countries
- A $500 home office setup if you're a remote employee
- Employee-driven international mobility - we enable you to relocate internationally if you wish (within certain countries and timelines and subject to role requirements, time zones and work permit considerations)
- Cash compensation and a stock options grant