Site Reliability Engineer - Observability

Second Front Systems

💵 $160k-$180k
📍 Remote - Worldwide

Summary

Join Second Front Systems' Product team as a Senior Site Reliability Engineer and contribute to the deployment of emerging technology in national security. You will deploy, maintain, and scale observability infrastructure across multiple DoD networks, working with Kubernetes-based platforms and BigBang charts. You will collaborate with teammates to implement infrastructure that delivers unique capabilities for commercial and government customers, including the Department of Defense. The ideal candidate has deep DevSecOps and Kubernetes experience, with a focus on security in highly regulated environments. This is a fast-growing team working at the intersection of technology and national security. U.S. citizenship is required due to government contract requirements.

Requirements

  • 5+ years of Site Reliability Engineering or DevOps experience
  • Deep experience with Kubernetes administration, troubleshooting, and scaling
  • Hands-on experience deploying and maintaining observability tools (Prometheus, Grafana, Mimir/Cortex)
  • Strong understanding of Helm charts, GitOps practices, and CNCF tooling
  • Experience with service mesh technologies (Istio preferred)
  • Proven ability to debug complex distributed systems and networking issues
  • Understanding of authentication systems and security in regulated environments
  • Ability to work independently and collaborate with team members in a remote environment

Responsibilities

  • Deploy and maintain the observability stack (Grafana, Mimir, Prometheus) across multiple customer clusters and DoD networks
  • Build Helm chart abstractions and automation to streamline monitoring deployments for new customers
  • Troubleshoot and debug complex Kubernetes issues, networking problems, and monitoring stack failures
  • Configure and maintain BigBang charts and DoD Platform One integrations
  • Design and implement infrastructure automation using tools like Pulumi, ArgoCD, and Flux
  • Work with Istio service mesh and Keycloak for authentication in secure environments
  • Monitor and optimize performance of monitoring infrastructure across multiple environments
  • Collaborate with security teams to ensure compliance with NIST requirements and DoD standards
  • Participate in on-call rotation and incident response for production environments

Preferred Qualifications

  • Active security clearance or ability to obtain a Secret-level security clearance
  • Previous experience with DoD software deployments and Platform One
  • Experience with BigBang charts and Iron Bank containers
  • Experience working in national security or highly regulated environments
  • Familiarity with compliance frameworks (NIST, FedRAMP, etc.)
  • Experience with infrastructure as code (Pulumi, Terraform)

Benefits

  • Competitive salary
  • 100% healthcare, vision, and dental coverage
  • 401(k) + 3% company contribution
  • Wellness perks (fitness classes, mental health resources)
  • Equity incentive plan
  • Tech + office supplies stipend
  • Annual professional development stipend
  • Flexible paid time off + federal holidays off
  • Parental leave
  • Work from anywhere
  • Referral bonus
