Summary
Join Health Gorilla as the Director, Infrastructure, reporting to the EVP of Engineering. You will set the strategic direction for the platform infrastructure, encompassing hardware, software, servers, services, databases, storage, and networks. Ensuring 24x7 reliability across the SaaS platform is critical. This role demands building and maintaining a best-in-class operations function and fostering a cloud-first (AWS) and DevOps/SRE mindset. You will lead the platform infrastructure, operations, and IT teams across cloud and physical assets, manage incident response, drive observability and reliability, and establish relationships with external vendors.
Requirements
- 5+ years of experience in cloud-based platform infrastructure across DevOps and SRE roles, including 2+ years managing technical teams
- Hands-on experience in some combination of AWS, Kubernetes, Docker, Mongo, Postgres, Redis, Elasticsearch, ActiveMQ, Beanstalk, Ansible, Terraform, and Linux
- Extensive experience with Kubernetes tooling and workloads, including understanding of scaling, monitoring, alerting, disaster recovery, networking, API gateways and service meshes
- 5+ years of experience in Core Java
- Hands-on experience with cloud-based CI/CD platforms (GitHub Actions) and observability platforms (New Relic, Grafana + Prometheus)
- Experience with DBA operations: scaling, migrating, restoring, and maintaining very large production databases with near-zero downtime
- Experience with scripting languages (sh, Python) and a deep understanding of running software in production environments under heavy workloads
- Experience working closely with SecOps and Security Teams in implementing and driving cybersecurity policies and efforts across infrastructure assets
- Experience working cross-functionally with business, support, and software engineering teams to deliver a robust platform and resolve client reliability and implementation issues
- Hands-on experience procuring and managing third-party external vendors / services
- Experience working with a Managed Service Provider / IT contracting to provide help desk support and security oversight
- Experience working in a fast-paced start-up environment with quickly evolving requirements and needs
Responsibilities
- Lead the platform infrastructure, ops and IT organizations across all cloud-based and physical assets
- Maintain and evolve the platform infrastructure strategy that continuously supports business growth and scaling needs of our clients
- Direct, manage and individually contribute to the DevOps, SRE, DBA and IT teams & domains
- Be a technical role model and provide ongoing technical leadership across all domains you are responsible for: regularly pair with and support direct reports in execution, and personally lead technical analysis and implementation strategies for new solutions
- Participate in and manage the 24/7 on-call incident response efforts as well as drive observability and reliability efforts across our full technical stack to deliver 99.9% or higher uptime
- Establish relationships and partnerships with external vendors to support all cloud-based operations
- Establish service level agreements (SLAs) and operating level agreements (OLAs) aligned with our goals and customer needs
Benefits
- New hire stock option grant
- 401(k) plan with discretionary matching (historically matched at 3%)
- Unlimited PTO plus 12 Holidays
- Medical, dental, and vision insurance for you and your family
- Short- and long-term disability insurance and life insurance
- Optional: Pet Insurance, Legal Services, and Credit Monitoring
- Mental health and wellness support
- Paid parental leave (up to 12 weeks)
- Monthly stipend for phone and internet
- Use of an Apple Laptop plus a $400 stipend for additional equipment