Senior Site Reliability Engineer at Blackpoint Cyber

Summary

Join Blackpoint Cyber, a leading cybersecurity company experiencing rapid growth, as a Senior Site Reliability Engineer (SRE). You will play a key role in designing, implementing, and maintaining our infrastructure and CI/CD pipelines. This position requires expertise in cloud infrastructure, automation, and technologies such as Terraform, AWS, Kafka, and Kubernetes. You will collaborate with cross-functional teams to ensure system reliability and efficiency. The ideal candidate has extensive SRE experience and a strong understanding of cloud security and scalability. Blackpoint Cyber offers competitive benefits, including health insurance, a 401(k) plan, and discretionary time off.

Requirements

8+ years of proven experience as a Senior SRE or in a similar role, with a strong focus on cloud infrastructure and automation
Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt
Deep knowledge of AWS cloud services and best practices for designing secure and scalable architectures
Hands-on experience with Confluent Cloud and Kafka for distributed data streaming
Strong experience with Redis for caching and Amazon RDS for data storage
Strong experience with OpenSearch, Elasticsearch, or ChaosSearch
Proficiency in monitoring and alerting using Prometheus, Grafana, Alertmanager, and OpsGenie
Experience with LaunchDarkly for feature flag management
Extensive experience managing Kubernetes clusters, including package management with Helm, deployment with ArgoCD, and service mesh configurations using Istio
Familiarity with Kustomize for Kubernetes resource configuration
Excellent problem-solving skills with the ability to troubleshoot complex systems in production
Strong communication and collaboration skills, with experience working in agile environments

Responsibilities

Design, build, and maintain highly scalable infrastructure using Terraform and Terragrunt to automate cloud resource provisioning
Manage cloud environments, particularly in AWS, ensuring cost optimization, security, and high availability
Work with Confluent Cloud and Kafka to manage and scale our data streaming platforms
Deploy and manage Redis instances for caching and real-time data processing
Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and OpsGenie to ensure system reliability
Enable feature flag management and controlled rollouts using LaunchDarkly
Manage Kubernetes clusters with Helm, ArgoCD, Istio, and Kustomize to support continuous delivery and infrastructure-as-code practices
Collaborate with development teams to ensure seamless integration of new services and features into our infrastructure
Troubleshoot and resolve complex system issues, ensuring high performance and uptime
Continuously improve automation tools, processes, and methodologies to enhance system scalability and maintainability
Stay up to date with emerging SRE practices and technologies so the organization can adopt relevant advancements

Preferred Qualifications

Experience with multi-cloud environments (e.g., GCP, Azure)
Familiarity with security best practices in cloud and containerized environments
Knowledge of serverless architectures and CI/CD tools such as Jenkins and GitHub Actions
Some development experience in Node.js, Python, or Go

Benefits

Competitive health, vision, dental, and life insurance plans
A robust 401(k) plan
Discretionary Time Off
Other minor perks


Remote · DevOps · Senior
