Senior Infrastructure Engineer

Raya

📍 Remote - Worldwide

Summary

Join Raya as a Senior Infrastructure Engineer and build robust, scalable infrastructure using modern cloud platforms, Kubernetes, and infrastructure-as-code. Drive technological advancements within Raya's Infrastructure Platform, ensuring reliability, security, and scalability for product teams. Lead Kubernetes strategy, optimize infrastructure performance, and implement monitoring solutions. Participate in incident response and contribute to system evolution through incremental enhancements. Integrate AI tools to automate tasks and collaborate with security teams for secure infrastructure design. Mentor other engineers and contribute to engineering excellence. Raya offers comprehensive benefits including medical/dental coverage, a food delivery budget, equity, unlimited vacation, paid parental leave, and a travel stipend.

Requirements

  • A BS/MS in Computer Science, Engineering, Systems Administration, or a related technical field (professional experience can be substituted for candidates with non-engineering educational backgrounds)
  • 6-8+ years of hands-on experience with infrastructure engineering, with a track record of designing and implementing scalable infrastructure solutions
  • Strong expertise in Kubernetes and Docker, with experience designing and managing production container orchestration environments
  • Demonstrated expertise in AWS and infrastructure-as-code tools (Terraform, CloudFormation, Pulumi, Ansible)
  • Experience with performance tuning and optimization of both infrastructure and applications
  • Experience with monitoring and observability tools (Datadog, Prometheus, Grafana)
  • Proficiency in scripting and automation (Python, Bash, Go, Ruby)
  • Experience working with and incrementally improving established infrastructure environments
  • Strong collaborative instincts, emphasizing open communication, transparency, and cross-team interaction

Responsibilities

  • Infrastructure Leadership: Design and build major new infrastructure components and platforms to support Raya's growing needs
  • Kubernetes & Container Orchestration: Lead our Kubernetes strategy, designing and implementing container orchestration solutions that optimize for various application workloads
  • Performance Optimization: Design and optimize infrastructure for maximum application performance, focusing on memory management, resource allocation, network traffic optimization, and system efficiency
  • Reliability Engineering: Implement SLOs, monitoring, and observability solutions to ensure high reliability of our platform
  • Cloud Engineering: Apply your in-depth knowledge of AWS to design scalable, resilient architectures across multiple regions
  • Incident Response: Participate in on-call rotations and lead complex infrastructure incident resolution and post-incident analysis
  • System Evolution: Thoughtfully improve existing infrastructure through incremental enhancements while respecting operational constraints
  • Deployment Automation: Enhance our CI/CD pipelines and deployment strategies to enable faster, safer releases
  • AI-Enhanced Workflows: Integrate AI tools and capabilities into infrastructure workflows to automate complex tasks, enhance decision-making, and maximize operational efficiency
  • Infrastructure Security: Collaborate with security teams to implement secure-by-design infrastructure
  • Cost Optimization: Design cost-effective infrastructure solutions and implement optimization strategies
  • Team Mentorship: Contribute to engineering excellence by mentoring other infrastructure engineers

Preferred Qualifications

  • Background in SRE (Site Reliability Engineering) practices
  • Experience using AI tools to enhance infrastructure workflows, automate tasks, and improve operational efficiency
  • Knowledge of database administration and optimization (PostgreSQL, MongoDB, Redis, Elasticsearch)
  • Experience with multi-regional/global infrastructure deployment and operations
  • Track record of successfully modernizing legacy infrastructure components
  • Strong understanding of Node.js performance characteristics and experience optimizing infrastructure for Node.js workloads, including memory management, CPU utilization patterns, and scaling considerations
  • Proficiency in application profiling and performance analysis tools
  • Experience with network infrastructure and security
  • Experience with infrastructure security and compliance controls
  • Understanding of cost optimization strategies in cloud environments
  • Experience with service mesh technologies (Istio, Linkerd, Consul)
  • Experience with other cloud platforms (GCP, Azure) in addition to AWS
  • Familiarity with disaster recovery planning and implementation

Benefits

  • Comprehensive medical and dental coverage
  • $50 a day food delivery budget
  • Equity-based employment
  • Learning opportunities
  • Unlimited vacation
  • 12 weeks paid parental leave
  • $1,000 a year to travel somewhere in the world you've never been
