Senior Infrastructure Engineer

Tripadvisor
Summary
Join Tripadvisor's Technical Operations team as an Operations Engineer and become a force multiplier for engineering and operations teams. You will build stability in production environments, empower end users through workflow and infrastructure automation, and ensure service uptime predictability. Responsibilities include maintaining consistent deployment solutions across diverse environments, participating in on-call duties, and resolving incidents to prevent recurrence. The ideal candidate possesses experience in highly available production environments, proficiency in coding/scripting (Python), and practical experience with containerization and orchestration (Kubernetes). A strong understanding of Linux, web servers, configuration management, and hybrid-cloud environments is also required. This role offers the opportunity to contribute to the evolution of Tripadvisor's infrastructure and engineering.
Requirements
- Experience working in a highly available, dynamic production environment
- Proficiency in coding/scripting languages such as Python, with a drive to automate and remove manual process bottlenecks to increase efficiency
- Practical experience with containerization and orchestration (e.g., Kubernetes)
- Strong understanding of Linux (Red Hat/CentOS) and web servers (Tomcat/Apache)
- Experience with configuration management (e.g., Puppet, Ansible) and infrastructure-as-code (e.g., Terraform, CDK)
- Experience working in hybrid-cloud environments (AWS preferred) as well as on-premises data centers
- Familiarity with database technologies (PostgreSQL preferred)
- Excellent problem-solving skills, with the ability to act on high-level business needs and adapt to changing requirements
- Strong knowledge of networking fundamentals
Responsibilities
- Build stability: Strengthen our production environments, implement best practices, and, when appropriate, drive change across existing workflows
- Empower end users: Collaborate across engineering and operations teams to improve automation of workflows and infrastructure
- Ensure predictability: Establish SLAs for service uptime and build the necessary telemetry and alerting to reach them
- Maintain consistency: Develop and maintain solutions for deploying critical production infrastructure across diverse environments
- Practice accountability: Participate in periodic on-call duties and ensure that incident root causes are identified, debugged, and resolved to prevent recurrence