Remote Senior Infrastructure Engineer, Site Reliability Engineer

Flex

๐Ÿ“Remote - United States

Job highlights

Summary

Join our dynamic Infrastructure Team as a Senior Infrastructure Engineer and help us advance our mission. You will be part of the team responsible for creating a sustainable platform that ensures the effectiveness, reliability, and scalability of our systems.

Requirements

  • Proven experience in building, scaling and monitoring cloud infrastructure on AWS, especially EKS, S3, RDS, API Gateway, Load Balancers, VPC, Lambdas, DocumentDB and DynamoDB
  • Proven experience using Terraform to update and maintain cloud infrastructure
  • Proven experience with containerized applications, Kubernetes, and microservice deployments
  • Strong knowledge of GitHub Actions and CI/CD best practices
  • Experience with developer productivity tools: designing CI/CD workflows, building internal tools, and creating self-service solutions to streamline software development
  • Knowledge of monitoring and observability tools and frameworks, with working knowledge of Datadog being a plus
  • Familiarity with networking concepts (DNS, load balancing, firewalls, VPNs)
  • Strong collaboration skills with the ability to work effectively across teams and communicate technical ideas clearly
  • Experience writing and reading code in at least one industry-standard language, such as Java, Python, or TypeScript

Responsibilities

  • Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions optimizing for performance, resilience, and cost
  • Ensure infrastructure aligns with business requirements and industry standards
  • Leverage Terraform to automate infrastructure provisioning and configurations
  • Implement SRE principles to improve system reliability and reduce downtime
  • Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes to remove friction
  • Develop and maintain robust monitoring and alerting systems to proactively identify and resolve issues
  • Lead incident responses, manage on-call rotations, and facilitate post-incident reviews to drive continuous improvement and resilience
  • Automate everything: drive adoption of Infrastructure as Code (IaC) and build automated pipelines for testing, monitoring, and deployments
  • Use your excellent written and verbal communication skills to inform teams about upcoming changes and how those changes affect them

Benefits

  • Competitive pay
  • 100% company-paid medical, dental, and vision
  • 401(k) + company stock options
  • Unlimited paid time off with a PTO minimum + 13 company-paid holidays
  • Parental leave
  • Flex Cares Program: Non-profit company match + pet adoption coverage
  • Free Flex subscription
