Site Reliability Engineer

Zimperium
Summary
Join Zimperium, a leader in enterprise mobile security, as a Senior Site Reliability Engineer (SRE) and contribute to building a robust engineering discipline. You will focus on optimizing existing systems, building infrastructure, and automating tasks to ensure reliable and scalable applications. As an SRE, you will be responsible for designing, coding, testing, and delivering software to automate manual operational work, troubleshooting priority incidents, and collaborating with development teams to ensure software reliability and scalability. This role requires expertise in at least one technology stack, proficiency in one or more technology domains, and hands-on experience with various technologies like Kubernetes, Docker, and Continuous Delivery tools. You will also be involved in mentoring junior developers and participating in 24x7 support coverage as needed.
Requirements
- Expertise in at least one technology stack, including designing, coding, testing, and delivering software
- Proficiency in one or more technology domains; may be a cross-domain expert able to solve complex, mission-critical problems within a business or across the company
- Working knowledge of infrastructure components (e.g., routers, load balancers, cloud products, container systems, compute, storage, and networks)
- Excellent debugging and troubleshooting skills
- Prior experience in DevOps and/or application development teams
- Hands-on experience with large-scale software development, preferably in Java, Python, or a scripting language
- Hands-on experience with Kubernetes, Docker, and Docker Swarm style deployments
- Exposure to Datadog monitoring
- Hands-on experience with Continuous Delivery tools
- Hands-on experience with Unix: Linux and Solaris
- Exposure to orchestration and configuration management tools for applications
- Experience with infrastructure components utilized in data warehousing or big data environments
- Excellent written and oral communication skills, appropriately scaled for senior technical and senior business audiences
- Ability to work and effectively prioritize in a highly dynamic work environment that includes a global focus
Responsibilities
- Design, code, test and deliver software to automate manual operational work
- Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
- Engage with development teams throughout the software life cycle to help build software for reliability and scale, minimizing later refactoring or changes
- Identify application patterns and analytics in support of better service level objectives
- Design self-healing and resiliency patterns
- Design automated software and product upgrades, change management, and release management solutions
- Participate in 24x7 support coverage as needed
- Mentor and guide junior developers