Site Reliability Engineer

Nextiva
Summary
Join Nextiva as a Site Reliability Engineer to enhance, support, and troubleshoot our SaaS platform. You will provide critical support for compliance-driven environments, contributing to initiatives related to FedRAMP authorization, security hardening, and industry-specific compliance standards. This role requires triaging, troubleshooting, and fixing production problems; designing, developing, and improving monitoring systems; identifying and automating tasks; and writing software to improve system reliability. You will collaborate with compliance and security teams and ensure platform reliability for regulated customer environments. The ideal candidate is a generalist comfortable working between development and systems, with experience in compliance-focused environments and strong communication skills.
Requirements
- Bachelor's degree in Computer Science or related field, or equivalent work experience
- Bilingual Spanish and English
- 0–2 years of software development experience
- 0–2 years of Linux system administration experience
- 0–2 years of performance engineering experience
- Experience working with RESTful APIs
- Experience troubleshooting complex systems
- Experience working with source control systems (e.g., Git)
- Familiarity with containerization and orchestration (e.g., Docker, Kubernetes)
- Familiarity with front-end technologies
- Familiarity with application performance monitoring tools
- Familiarity with relational databases and SQL
- Familiarity with microservices and distributed system design
- Ability to clearly communicate technical concepts
- Working knowledge of general SRE concepts and DevOps principles
- Understanding of or experience supporting regulated environments and public sector clients
Responsibilities
- Triage, troubleshoot, and fix production problems in every layer of the stack
- Design, develop, improve, and tune logging, monitoring, and alerting systems
- Identify manual tasks, document fixes via runbooks, and drive automation
- Write software to improve the reliability and recoverability of production systems
- Perform and automate system administration tasks
- Participate in on-call rotation supporting production systems
- Collaborate with compliance and security teams to meet standards for FedRAMP, HIPAA, and other regulatory frameworks
- Ensure platform reliability and availability for regulated customer environments, including healthcare and government sectors
- Support infrastructure and deployments aligned with the needs of SLED and federal clients
Preferred Qualifications
- Experience in or exposure to compliance-focused environments (e.g., FedRAMP, HIPAA, CJIS, SOC 2)
- Datadog
- Atlassian Suite (Jira, Confluence, Bitbucket)
- Java/Spring
- Python
- JavaScript/React
- SQL
- Ansible
- Jenkins
- Tomcat
- Git
- Redis
- RabbitMQ
- Splunk/Kibana
- Terraform
Benefits
- Comprehensive medical coverage, including dental care
- Life and disability insurance
- PTO and paid sick time per the CBA, plus paid parental leave
- Private pension plan available
- Employee Assistance Program and comprehensive wellness initiatives
- Access to ongoing learning and development opportunities and career advancement