Site Reliability Engineer at Syndica

Summary

Syndica, a Web3 RPC infrastructure company, is hiring a Site Reliability Engineer. The candidate should have 5+ years of DevOps or SRE experience, proficiency in scripting languages and at least one modern programming language, and experience with Kubernetes, web protocols, information security, automation tools, capacity planning, and monitoring tools.

Requirements

Great collaborator with 5+ years of experience in a DevOps or SRE role
Proficiency in scripting languages (Python, Shell) and experience with at least one modern programming language (Go, Rust, TypeScript, etc.)
Experience deploying large-scale systems reliably
Experience using Kubernetes
Working knowledge of web and network protocols and standards (HTTP, TLS, DNS, etc.)
Working knowledge of information security issues
Experience writing automation tools and eagerness to 'automate all the things'
Commitment to implementing reliability and security best practices
Capacity planning experience, including resource optimization and load testing
Systematic problem-solving approach, combined with a strong sense of ownership and drive

Responsibilities

Administer overall site availability, security, latency, and system health
Provision, install, configure, operate, and maintain services, system software, and related infrastructure
Develop comprehensive monitoring solutions that provide full visibility into the different system components using tools like Kubernetes, Prometheus, Grafana, ELK, Datadog, New Relic, etc.
Enable the development team to release code quickly and reliably by ensuring full observability of systems and automated detection of performance and integration issues
Formulate technical performance measures and implement them using queries, logs, code instrumentation and other analytics tools
Design dashboards and visualizations that effectively convey technical measures
Troubleshoot issues at multiple layers of the deployment (hardware, operating environment, network, and application), conduct root cause analysis, and make recommendations based on your findings
Work with development teams to ensure best practices for scalability, reliability, and security are designed and implemented from the start
Forecast changes in demand and capacity to establish appropriate scalability plans and drive decisions on the right-sizing of servers, storage and other resources
Design and perform high-throughput stress testing to determine system capacity limits and identify points of failure
Troubleshoot critical customer issues related to Syndica’s RPC, APIs, and App Deployments

Preferred Qualifications

Experience with Prometheus/Grafana for metrics aggregation/visualization and other monitoring and alerting tools
Experience with infrastructure-as-code tools such as Terraform, Ansible, or Chef
Experience building and managing virtualized systems (KVM, OVM, containers/Docker) and ability to read and understand source code
Knowledge of one or more load testing tools (K6, Locust, JMeter, etc.)
Experience with configuration of CI/CD pipelines

Remote

DevOps

Senior
