Senior Site Reliability Engineer

Blackpoint Cyber

πŸ“Remote - Canada

Summary

Join Blackpoint Cyber, a leading and rapidly growing cybersecurity company, as a Senior Site Reliability Engineer (SRE). You will play a key role in designing, implementing, and maintaining our infrastructure and CI/CD pipelines. This position requires expertise in cloud infrastructure and automation, and in technologies such as Terraform, AWS, Kafka, and Kubernetes. You will collaborate with cross-functional teams to keep our systems reliable and efficient. The ideal candidate has extensive SRE experience and a strong understanding of cloud security and scalability. Blackpoint Cyber offers competitive benefits, including health insurance, a 401k plan, and discretionary time off.

Requirements

  • 8+ years of proven experience as a Senior SRE or in a similar role, with a strong focus on cloud infrastructure and automation
  • Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt
  • Deep knowledge of AWS cloud services and best practices for designing secure and scalable architectures
  • Hands-on experience with Confluent Cloud and Kafka for distributed data streaming
  • Strong experience with Redis for caching and Amazon RDS for data storage
  • Strong experience with OpenSearch/Elasticsearch/ChaosSearch
  • Proficiency in monitoring and alerting using Prometheus, Grafana, Alertmanager, and OpsGenie
  • Experience with LaunchDarkly for feature flag management
  • Extensive experience managing Kubernetes clusters, including package management with Helm, deployment with ArgoCD, and service mesh configuration with Istio
  • Familiarity with Kustomize for Kubernetes resource configuration
  • Excellent problem-solving skills with the ability to troubleshoot complex systems in production
  • Strong communication and collaboration skills, with experience working in agile environments

Responsibilities

  • Design, build, and maintain highly scalable infrastructure using Terraform and Terragrunt to automate cloud resource provisioning
  • Manage cloud environments, particularly AWS, ensuring cost optimization, security, and high availability
  • Work with Confluent Cloud and Kafka to manage and scale our data streaming platforms
  • Deploy and manage Redis instances for caching and real-time data processing
  • Implement and maintain monitoring and alerting solutions using Prometheus, Grafana, Alertmanager, and OpsGenie to ensure system reliability
  • Enable feature flag management and controlled rollouts using LaunchDarkly
  • Manage Kubernetes clusters with Helm, ArgoCD, Istio, and Kustomize to support continuous delivery and infrastructure-as-code practices
  • Collaborate with development teams to ensure seamless integration of new services and features into our infrastructure
  • Troubleshoot and resolve complex system issues, ensuring high performance and uptime
  • Continuously improve automation tools, processes, and methodologies to enhance system scalability and maintainability
  • Stay up to date with emerging SRE trends and technologies so the organization can take advantage of the latest advancements

Preferred Qualifications

  • Experience with multi-cloud environments (e.g., GCP, Azure)
  • Familiarity with security best practices in cloud and containerized environments
  • Knowledge of serverless architectures and CI/CD tools such as Jenkins and GitHub Actions
  • Some development experience in Node.js, Python, or Go

Benefits

  • Competitive health, vision, dental, and life insurance plans
  • A robust 401k plan
  • Discretionary Time Off
  • Other minor perks

This job has been filled or is no longer available.