DevOps / Site Reliability Engineer
MoneyLion
Malaysia
Job highlights
Summary
Join our SRE/DevOps team, where we design, implement, and maintain a secure, scalable infrastructure platform for delivering MoneyLion's applications.
Requirements
- Exposure to cloud IaaS (AWS, GCP, or similar)
- Linux administration (CoreOS, or any Linux in general)
- Linux containers, orchestration (Docker, Kubernetes), and immutable infrastructure
- Familiarity with Infrastructure-as-Code principles and technologies such as Terraform or CloudFormation
- Ability to learn quickly, think critically, and make snap judgements based on measured data in high-pressure situations
- Strong communicator with the ability to guide teams in troubleshooting and tuning production performance issues
- Comfortable writing tools for day-to-day operational use in Go (or willing to learn)
- Working knowledge of industry best practices with regard to information security
Responsibilities
- Provide or develop the tooling that allows individual Product Teams to be autonomous, via a shared Kubernetes platform, Codefresh CI/CD, and self-service infrastructure resources via Atlantis/Terraform
- Participate in a 24/7 on-call rotation that supports our production Kubernetes platform running in AWS
- Continuously improve our resiliency by developing self-healing, self-assembling infrastructure and by proactively running load tests and Chaos Engineering experiments
- Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences
- Maintain day-to-day vigilance with regard to security while helping to enhance the intrinsic security of the overall production system
- Own internal and external SLAs, ensuring they meet or exceed expectations, and continuously monitor and improve system-centric KPIs
- Provide consultation and support for Product Teams in achieving their OKRs: Availability and Service Excellence
- Handle day-to-day duties: on-boarding, off-boarding, managing resource access permissions, and maintaining shared tooling such as CI/CD, including artifact repositories
- Review architecture across teams, ensuring best practices are propagated company-wide
Preferred Qualifications
- Have prior experience working on high-performance, highly available distributed systems
- Are able to knowledgeably address performance and security in complex multi-team scenarios
- Are familiar with microservices architectures and able to understand the trade-offs
- Have practical knowledge of event streaming and experience in designing systems to leverage SQS, Kafka, Kinesis correctly
- Have good knowledge of the HashiCorp stack, especially Vault
Benefits
- Competitive salary packages
- Comprehensive medical, dental, vision and life insurance benefits
- Wellness perks
- Paid parental leave
- Generous Paid Time Off
- Learning and Development resources
- Flexible working hours
This job is filled or no longer available