DevOps / Site Reliability Engineer

MoneyLion

πŸ“Malaysia

Job highlights

Summary

Join us in the SRE/DevOps team where we design, implement, and maintain a secure and scalable infrastructure platform for delivering MoneyLion's applications.

Requirements

  • Exposure to cloud IaaS (AWS, GCP, or another relevant provider)
  • Linux administration (CoreOS, or any Linux in general)
  • Linux containers, orchestration (Docker, Kubernetes), and immutable infrastructure
  • Familiarity with Infrastructure-as-Code principles and technologies like Terraform or CloudFormation
  • Ability to learn quickly, think critically, and make snap judgements based on measured data in high-pressure situations
  • Strong communication skills and the ability to guide teams in troubleshooting and tuning production performance issues
  • Comfortable writing tools in Go for day-to-day operational use, or willing to learn
  • Working knowledge of industry best practices with regard to information security

Responsibilities

  • Provide or develop the tooling that allows individual Product Teams to be autonomous, via a shared Kubernetes platform, Codefresh CI/CD, and self-service infrastructure resources through Atlantis/Terraform
  • Participate in a 24/7 on-call rotation that supports our production Kubernetes platform running in AWS
  • Work to constantly improve our resiliency by developing self-healing, self-assembling infrastructure and by proactively running load tests and Chaos Engineering experiments
  • Dive into problems with an eye to both immediate remediation and the follow-through changes and automation that will prevent future occurrences
  • Maintain day-to-day vigilance with regard to security while helping to enhance the intrinsic security of the overall production system
  • Own internal and external SLAs, ensuring that they meet or exceed expectations and that system-centric KPIs are continuously monitored and improved
  • Provide consultation and support for Product Teams in achieving their OKRs: Availability and Service Excellence
  • Handle day-to-day duties: on-boarding, off-boarding, managing resource access permissions, and maintaining shared tooling such as CI/CD, including artifact repositories
  • Review architecture across teams, ensuring best practices are propagated company-wide

Preferred Qualifications

  • Have prior experience working on high-performance, highly available distributed systems
  • Are able to knowledgeably address performance and security in complex multi-team scenarios
  • Are familiar with microservices architectures and able to understand the trade-offs
  • Have practical knowledge of event streaming and experience in designing systems that use SQS, Kafka, or Kinesis correctly
  • Have good knowledge of the HashiCorp stack, especially Vault

Benefits

  • Competitive salary packages
  • Comprehensive medical, dental, vision and life insurance benefits
  • Wellness perks
  • Paid parental leave
  • Generous Paid Time Off
  • Learning and Development resources
  • Flexible working hours
This job is filled or no longer available