Remote Senior Infrastructure Engineer, Site Reliability Engineer
Flex
Remote - United States
Please let Flex know you found this job on JobsCollider. Thanks!
Job highlights
Summary
Join our dynamic Infrastructure Team as a Senior Infrastructure Engineer to help us keep our mission growing. You will be part of the team responsible for creating a sustainable platform that ensures the effectiveness, reliability and scalability of our systems.
Requirements
- Proven experience in building, scaling and monitoring cloud infrastructure on AWS, especially EKS, S3, RDS, API Gateway, Load Balancers, VPC, Lambdas, DocumentDB and DynamoDB
- Proven experience using Terraform to update and maintain cloud infrastructure
- Proven experience with containerized applications, Kubernetes, and microservice deployments
- Strong knowledge of GitHub Actions and CI/CD best practices
- Experience with developer productivity tools: designing CI/CD workflows, building internal tools, and creating self-service solutions to streamline software development
- Knowledge of monitoring and observability tools and frameworks, with working knowledge of Datadog being a plus
- Familiarity with networking concepts (DNS, load balancing, firewalls, VPNs)
- Strong collaboration skills with the ability to work effectively across teams and communicate technical ideas clearly
- Experience writing and reading code in at least one industry-standard language, such as Java, Python, or TypeScript
Responsibilities
- Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions optimizing for performance, resilience, and cost
- Ensure infrastructure aligns with business requirements and industry standards
- Leverage Terraform to automate infrastructure provisioning and configurations
- Implement SRE principles to improve system reliability and reduce downtime
- Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes to remove friction
- Develop and maintain robust monitoring and alerting systems to proactively identify and resolve issues
- Lead incident responses, manage on-call rotations, and facilitate post-incident reviews to drive continuous improvement and resilience
- Automate everything: drive adoption of Infrastructure as Code (IaC) and build automated pipelines for testing, monitoring, and deployments
- Use your excellent written and verbal communication skills to create communications about upcoming changes and how they affect teams
Benefits
- Competitive pay
- 100% company-paid medical, dental, and vision
- 401(k) + company stock options
- Unlimited paid time off with a PTO minimum + 13 company-paid holidays
- Parental leave
- Flex Cares Program: Non-profit company match + pet adoption coverage
- Free Flex subscription