New Zealand
Site Reliability Engineer
![Aurora Labs Logo](https://cdn.jobscollider.com/logo/aurora-labs-bd78.webp)
Aurora Labs
Remote - Worldwide
Summary
Join Aurora's infrastructure team as a Site Reliability Engineer and Software Engineer, contributing to the smooth operation of our high-performance blockchain networks. This role is a blend of reliability engineering (80%) and software engineering (20%). You will ensure high availability and failure tolerance of our infrastructure, automate configurations, and design cloud-agnostic solutions. Software engineering responsibilities involve developing sidecars, CLI tools, and processing engines. The ideal candidate is a seasoned reliability engineer with backend system experience, proficiency in Golang, and a strong understanding of SRE principles. Experience with Kubernetes and various streaming/pubsub systems is a plus.
Requirements
- Strong emphasis on SRE as an engineering subject area, with proficiency in Golang
- Proven track record as a backend internet services software developer
- Knowledge of SDLC, including continuous integration and testing methodologies
- Understanding of core internet infrastructure services, including DNS, HTTP, server virtualization, and server monitoring, in critical, large-scale distributed systems
- Understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts, with a keen eye for opportunities to eliminate toil through code and process improvements
- Excellent verbal and written communication skills in English
Responsibilities
- Ensure high availability and failure tolerance of our infrastructure
- Automate configuration and maintenance of software components such as K8s, NATS, InfluxDB, Postgres, and Cloudflare, using tools such as Ansible, Terraform, Helm, and Kubernetes operators
- Design and implement cloud-agnostic solutions that do not rely exclusively on specific cloud vendors
- Automate the management of validator and RPC nodes
- Optimize the latency and throughput of the pub-sub infrastructure
- Incident management, monitoring, distributed tracing and recovery automation
- Develop sidecars that implement cloud-agnostic infrastructure abstractions for developers
- Develop CLI tools for pubsub and streaming infrastructure operations
- Develop a time series processing engine for our transaction simulation engine
- Develop indexers and blockchain event aggregation pipelines for monitoring purposes
Preferred Qualifications
- Experience developing within the Kubernetes ecosystem, including the operator framework, controllers, and CRDs
- Experience with streaming and pubsub systems such as NATS, Apache Kafka, Apache Pulsar
- Hardware bootstrap and associated security
- Structured or unstructured storage and caching
- Automating operations processes via services and tools
- Configuration management and fleet orchestration via Puppet, Chef, Ansible, or others
- Cloud Services (AWS S3/EC2/CloudFront or equivalent)