Summary
Join Uniswap Labs as a Senior Site Reliability Engineer and play a critical role in architecting, building, and maintaining our infrastructure. You will lead initiatives to improve the reliability, availability, performance, and scalability of our services, working closely with development, product, and security teams and keeping stakeholders informed. This role requires experience in automation, operational excellence, and driving system reliability at scale. The position can be partially or fully remote.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 5+ years of experience in site reliability engineering, DevOps, or a related field
- Strong understanding of reliability engineering principles, practices, and tools
- Proficiency in monitoring and alerting tools (e.g., Prometheus, Grafana, Nagios)
- Experience with cloud platforms (AWS, Azure, GCP) and with containers and container orchestration (Docker, Kubernetes)
- Proficiency in scripting and automation tools, such as Python, Bash, Ansible, or Terraform
- Excellent problem-solving skills and the ability to work under pressure in a fast-paced environment
- Strong communication and interpersonal skills, with the ability to influence and lead teams
Responsibilities
- Design, implement, and maintain systems and processes that enhance the reliability, availability, and performance of our services
- Design, implement, and maintain CI/CD tools and processes to increase reliability
- Design, implement, and maintain cloud constructs to increase reliability
- Develop and manage monitoring, alerting, and incident response strategies to minimize downtime and ensure rapid recovery from incidents
- Conduct root cause analysis of system failures and implement preventative measures
- Optimize system performance and automate repetitive tasks to improve operational efficiency
- Work closely with software engineering, infrastructure, and product teams to integrate reliability practices into the development lifecycle
- Advocate for SRE best practices and foster a culture of reliability and operational excellence across the organization
- Communicate effectively with stakeholders, providing regular updates on reliability metrics, incidents, and improvement initiatives
- Stay abreast of the latest industry trends and technologies in SRE, reliability, and performance
- Continuously evaluate and improve existing systems and processes to enhance reliability and efficiency
- Drive the adoption of new tools and technologies that can improve operational capabilities
Preferred Qualifications
- Experience with continuous integration and continuous deployment (CI/CD) practices and tools
- Knowledge of configuration management tools (e.g., Puppet, Chef)
- Experience with database management and optimization
- Familiarity with compliance frameworks and security best practices
- Relevant certifications such as AWS Certified DevOps Engineer, Google Cloud Professional Cloud DevOps Engineer, or equivalent
Benefits
- Company-paid medical, dental, & vision for you and your dependents
- Gym subsidy
- 401(k) with 4% employer contribution
- Annual $1,500 education stipend
- Unlimited and encouraged time off
- Up to 16 weeks paid parental leave
- Home office setup stipend for remote employees
- Daily lunches at NY headquarters