Summary

Join Nas Company, a media and tech company focused on connecting people online and offline, as a Senior Site Reliability Engineer (SRE) and lead its reliability, observability, and infrastructure initiatives. This is a remote role based in a Latin America time zone: you will be the primary on-call engineer during Asia-based off-hours (approximately 8:00 PM – 9:00 AM GMT+8), keeping the platform stable and performant. As a seasoned, autonomous engineer, you will work mostly asynchronously, collaborating with the team through documentation and chat, with one weekly synchronous meeting to align with the broader engineering group. You will report directly to the Head of Engineering (Edwin Candinegara) and will have minimal working-hour overlap with the core engineering team in Singapore and India, so strong communication and independent decision-making are essential.

Requirements

4+ years in a Site Reliability Engineer, DevOps, or similar role, with a track record of maintaining and scaling web infrastructure
Proficiency with monitoring and observability tools such as Prometheus, Grafana, Datadog, and AWS CloudWatch
You know how to instrument applications and set up alerts that catch issues early
Strong hands-on experience with Amazon Web Services (AWS) and managing cloud resources
Familiarity with MongoDB Atlas (managed MongoDB) and deployment platforms like Vercel
Comfortable automating infrastructure (Infrastructure as Code, CI/CD pipelines) and managing deployments
Exposure to modern web development stacks
Our environment includes Node.js/Python backends, Next.js frontends, Redis caching, and a Flutter mobile app
Excellent grasp of CI/CD concepts and tools
Experience implementing build pipelines, continuous integration, and automated deployments
Knowledge of Docker, container orchestration, and version control workflows
Strong analytical and problem-solving skills
Able to debug complex issues across distributed systems and find root causes
Outstanding communication skills in a remote, asynchronous setting
You can document your work and decisions clearly
Highly self-driven and able to make sound decisions independently, especially during the hours when other team members are offline

Responsibilities

Independently monitor, maintain, and improve our AWS infrastructure and deployment pipelines during off-hours to ensure smooth operations even when others are offline
Ensure high availability, reliability, and uptime of all platform services (web, backend, and mobile) by proactively managing system health and responding to incidents swiftly
Implement robust observability solutions: set up monitoring dashboards, logging, and real-time alerting across all systems (web applications, backend services, mobile API) using tools such as Prometheus, Grafana, Datadog, and AWS CloudWatch
Continuously monitor AWS and related infrastructure performance
Optimize resource usage and configurations for improved performance and cost efficiency (e.g., right-sizing instances, caching improvements, query optimization)
Work closely with product and engineering teams in an asynchronous manner
Document your insights, decisions, and progress clearly so team members in other timezones can follow along and contribute
Proactively identify and resolve production issues
Act as the first responder to any system incidents during your shift, performing root cause analysis and restoring service
Communicate incidents and fixes to the team, and update runbooks for future reference
Develop and maintain internal SRE documentation, runbooks, and playbooks
Ensure that troubleshooting guides, deployment processes, and escalation protocols are well-documented and easy to follow for the entire engineering team

Preferred Qualifications

Direct coding in our stack (Node.js/Python backends, Next.js frontends, Redis caching, and a Flutter mobile app) is not mandatory, but understanding how these components work is a plus
Experience with incident response and post-mortem analysis is highly valued

Benefits

We care about your health
Our comprehensive medical insurance plans are tailored to each region, ensuring you have the coverage you need
Your well-being matters
We provide a monthly fund of $100 to spend on activities that bring you joy and promote your self-care
We care about your mental health too!
We provide a monthly fund of $150 to spend on therapy or career coaching
You’ll be entitled to paid time off based on your region, in line with company policy
Unwind, bond, and collaborate at our annual company retreats
A time for the entire company to come together to rejuvenate and bond
As part of our team, you may have the opportunity to own a piece of the company through stock options or profit sharing, at the company's discretion, aligning your success with ours
