Summary
Join Multi Media LLC as a remote Senior Site Reliability Engineer and enhance our infrastructure resilience and optimize system performance. You will analyze complex systems, perform software engineering and patching, and make infrastructure modifications in both data center and cloud environments. Responsibilities include predictive failure analysis, disaster planning, creating automation tools, collaborating with engineering teams, and managing databases. You will also handle incident response and reporting. This role requires a STEM degree, strong problem-solving skills, and proficiency in programming languages and Unix/Linux environments. Experience with DevOps tools and database administration is essential.
Requirements
- STEM degree and relevant experience as a Site Reliability Engineer
- Exceptional problem-solving skills
- High proficiency in at least one language such as C, C++, Java, Python, or Go
- High proficiency in Unix/Linux environment, excellent knowledge of internals (e.g., filesystems, system calls)
- Networking knowledge (e.g., routing, switching, TCP stack) for both metal and cloud (VPC, Security Groups) environments
- Experience in database administration and configuration
- Experience with DevOps tools such as Terraform, Ansible, Docker, Kubernetes
- On-call response to monitoring and alerting of core website functions as needed
Responsibilities
- Performance analysis to identify sources of instability using data from APM and distributed telemetry tools
- Analyze complex systems to identify operational surprises and minimize downtime
- Software engineering and patching to incrementally improve performance, scalability, and reliability
- Infrastructure modifications in both a data center metal environment with advanced routing/switching and in the public cloud
- Predictive failure analysis and disaster planning
- Author new tools and automation to streamline the DevOps pipeline
- Collaborate with other engineering teams
- Database and key-value store administration and configuration with a focus on uptime and performance
- Incident response and postmortem reports
Benefits
- Fully remote option and flexible work schedule
- We share success: our bonus program scales with company performance, offering up to 20-30% in achievable bonuses, with potential for 90%!
- Health, Vision, Dental, and Life Insurance for you and any dependents, with policy premiums covered by the Company
- 401k plan with 5% matching
- Long- and short-term disability insurance
- Unlimited PTO
- Annual Year-End Company Closure
- 12 Paid Holidays
- $125/week food and grocery stipend via Sharebite
- Employee wellness programs via Holisticly
- EAP and Employee Recognition Programs