Summary
Join Flex, a NYC-based FinTech company revolutionizing rent payments, as a Senior Infrastructure Engineer. You will be a key member of a small team responsible for building and maintaining scalable and reliable infrastructure on AWS and GCP. This remote role requires at least 5 years of cloud infrastructure experience and proficiency in technologies like Terraform, Kubernetes, and CI/CD. You will collaborate with service engineering teams, automate processes, and ensure optimal system performance. Flex offers a competitive compensation package, including comprehensive health insurance, 401(k), unlimited PTO, and parental leave.
Requirements
- Proven experience in building, scaling and monitoring cloud infrastructure on AWS, especially EKS, S3, RDS, API Gateway, Load Balancers, VPC, Lambdas, DocumentDB and DynamoDB
- Proven experience using Terraform to update and maintain cloud infrastructure
- Proven experience with containerized applications, Kubernetes, and microservice deployments
- Strong knowledge of GitHub Actions and CI/CD best practices
- Experience with developer productivity tools: designing CI/CD workflows, building internal tools, and creating self-service solutions to streamline software development
- Knowledge of monitoring and observability tools and frameworks, with working knowledge of Datadog being a plus
- Familiarity with networking concepts (DNS, load balancing, firewalls, VPNs)
- Strong collaboration skills with the ability to work effectively across teams and communicate technical ideas clearly
- Experience writing and reading code in at least one industry-standard language, such as Java, Python, or TypeScript
- Minimum of 5 years of cloud infrastructure experience
Responsibilities
- Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions optimizing for performance, resilience, and cost
- Ensure infrastructure aligns with business requirements and industry standards
- Leverage Terraform to automate infrastructure provisioning and configurations
- Implement SRE principles to improve system reliability and reduce downtime
- Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes to remove friction
- Develop and maintain robust monitoring and alerting systems to proactively identify and resolve issues
- Lead incident responses, manage on-call rotations, and facilitate post-incident reviews to drive continuous improvement and resilience
- Automate everything: drive adoption of Infrastructure as Code (IaC) and build automated pipelines for testing, monitoring, and deployments
- Leverage your excellent written and verbal communication skills to communicate upcoming changes and how they affect teams
Benefits
- Competitive pay
- 100% company-paid medical, dental, and vision
- 401(k) + company equity
- Unlimited paid time off with a PTO minimum + 13 company-paid holidays
- Parental leave
- Flex Cares Program: Non-profit company match + pet adoption coverage
- Free Flex subscription