Summary
Join Zuora's Cloud Engineering team as a technical leader, driving customer-impacting incident mitigation and cloud reliability strategy execution. You will leverage your expertise in cloud technologies, system design, and automation to achieve transformative results in availability and cost. This hybrid role requires 10-12 years of experience in architecting and operating large-scale cloud infrastructure. Responsibilities include providing technical leadership, driving observability and automation, mentoring junior engineers, and improving operational efficiency. Zuora offers competitive compensation, benefits including health insurance, generous time off, parental leave, and professional development opportunities.
Requirements
- 10-12 years of experience architecting, designing, implementing, and operating large-scale private or public cloud infrastructure hosting customer-facing software applications
- In-depth experience with AWS cloud services (EC2, ECS, EKS, S3, ElastiCache, RDS, etc.)
- In-depth experience troubleshooting availability and performance issues across the technology stack
- Strong experience in monitoring and observability
- Strong experience in capacity management of diverse infrastructure and platform components
- In-depth experience automating implementation and operational activities using CI/CD pipelines and microservices architecture
- Excellent communication, problem-solving, and critical-thinking skills; outcome-driven and willing to challenge the status quo
- Good experience with agile practices
- Good programming knowledge of Python or any modern language
- Experience with automation sub-systems (Shell, Ansible, Puppet, Jenkins, Terraform, ECS, Kubernetes)
- Experience in infrastructure services (DNS, Mail Relays, NTP, CDN, LBs, SSL Certificates)
- Experience with database environments such as Oracle and MySQL
- Experience with data refresh and data sync services and the technologies that implement them
- Good knowledge of messaging queues, APIs, and application servers
- Experience using the ELK stack and Grafana
- Knowledge of API gateways and caching technologies
Responsibilities
- Serve as an escalation point, providing technical leadership and driving customer-impacting incidents to faster mitigation
- Drive initiatives that execute our cloud reliability strategy and achieve transformative results in availability, time to mitigate, customer-detected and customer-impacting incidents, and cost
- Drive full-stack observability and diagnostic automation
- Provide technical leadership to transform the team to run operations with AI and automation, and nurture partnerships
- Help hire and mentor junior engineers
- Drive self-service automation
- Own and be accountable for efficiency improvements in operations
- Provide technical leadership to drive quality in problem management
- Lead operational readiness reviews, define the controls, and automate them
- Ideate and act on initiatives to operationalize cost efficiencies and improve customer experience
- Develop the team's delivery roadmap, drive PI planning, and own delivery of the roadmap through the sprints
- Partner across product engineering, product management, support, and field teams to improve productivity and experience
Benefits
- Competitive compensation, variable bonus and performance reward opportunities, company equity and retirement programs
- Medical, dental and vision insurance
- Generous, flexible time off
- Paid holidays, "wellness" days and company-wide end-of-year break
- 6 months fully paid parental leave
- Learning & Development stipend
- Opportunities to volunteer and give back, including charitable donation match
- Free resources and support for your mental wellbeing