United States
Site Reliability Engineer

Blackpoint Cyber
Remote - Australia
Summary
Join Blackpoint Cyber, a leading cybersecurity firm, as a Site Reliability Engineer (SRE). You will design, build, and scale robust infrastructure, CI/CD pipelines, and build systems. You will collaborate with cross-functional teams to improve system reliability, performance, and automation, and champion a culture of innovation and continuous improvement. The role requires expertise in cloud infrastructure and automation, and in technologies such as Terraform, Kubernetes, and Kafka. The ideal candidate has strong problem-solving and communication skills and experience in agile environments. Blackpoint Cyber offers competitive benefits for eligible US employees.
Requirements
- 4+ years of proven experience as a Site Reliability Engineer or in a similar role, with a strong focus on cloud infrastructure and automation
- Excellent problem-solving skills with the ability to troubleshoot complex systems in production
- Strong communication and collaboration skills, with experience working in agile environments
- Expertise in Infrastructure as Code (IaC) using Terraform and Terragrunt
- Deep knowledge of AWS cloud services and best practices for designing secure and scalable architectures
- Hands-on experience with Confluent Cloud and Kafka for distributed data streaming
- Strong experience with Redis for caching and RDS for data storage
- Strong experience with OpenSearch, Elasticsearch, or ChaosSearch
- Proficiency in monitoring and alerting using Prometheus, Grafana, and Alertmanager
- Extensive experience managing Kubernetes clusters, including package management with Helm, deployment with ArgoCD, and service mesh configurations using Istio
- Familiarity with Kustomize for Kubernetes resource configuration
- Development experience in Node.js, Python, or Go
Responsibilities
- Design, build, and maintain highly scalable infrastructure using Terraform and Terragrunt to automate cloud resource provisioning
- Manage and optimize AWS cloud environments for cost-efficiency, security, and high availability
- Continuously improve infrastructure automation tools and methodologies to support scalability and maintainability
- Manage and scale Kafka and Confluent Cloud platforms for real-time data streaming
- Deploy and maintain Redis instances to support caching and real-time data processing workloads
- Implement and maintain robust monitoring and alerting systems using Prometheus, Grafana, Alertmanager, and Opsgenie to ensure system reliability and visibility
- Troubleshoot and resolve complex system issues, ensuring optimal performance and uptime
- Manage Kubernetes clusters using tools like Helm, ArgoCD, Istio, and Kustomize to support modern infrastructure-as-code and continuous delivery practices
- Enable feature flag management and safe, controlled rollouts using LaunchDarkly
- Work closely with development teams to seamlessly integrate new features and services into the infrastructure
- Foster a culture of continuous improvement by regularly evaluating and adopting emerging SRE tools, technologies, and best practices
Preferred Qualifications
- Experience with multi-cloud environments (e.g., GCP, Azure)
- Familiarity with security and compliance best practices in cloud and containerized environments
- Knowledge of serverless architectures and CI/CD tools such as Jenkins and/or GitHub Actions
Benefits
- Competitive Health, Vision, Dental, and Life Insurance plans
- A robust 401k plan
- Discretionary Time Off