Senior SRE

Nas Company

📍 Remote - Argentina

Summary

Join Nas Company, a media and tech company focused on connecting people online and offline, as a Senior Site Reliability Engineer (SRE) and lead its reliability, observability, and infrastructure initiatives. This remote role, based in a Latin America time zone, makes you the primary on-call engineer during Asia-based off-hours (approximately 8:00 PM – 9:00 AM GMT+8), responsible for platform stability and performance. As a seasoned, autonomous engineer, you will work mostly asynchronously, collaborating with the team via documentation and chat, with one weekly synchronous meeting to align with the broader engineering group. You will report directly to the Head of Engineering (Edwin Candinegara) and will have minimal working-hour overlap with the core engineering team in Singapore/India, so strong communication and independent decision-making are essential.

Requirements

  • 4+ years in a Site Reliability Engineer, DevOps, or similar role, with a track record of maintaining and scaling web infrastructure
  • Proficiency with monitoring and observability tools such as Prometheus, Grafana, Datadog, and AWS CloudWatch; you know how to instrument applications and set up alerts that catch issues early
  • Strong hands-on experience with Amazon Web Services (AWS) and managing cloud resources
  • Familiarity with MongoDB Atlas (managed MongoDB) and deployment platforms like Vercel
  • Comfortable automating infrastructure (Infrastructure as Code, CI/CD pipelines) and managing deployments
  • Exposure to modern web development stacks: our environment includes Node.js/Python backends, Next.js frontends, Redis caching, and a Flutter mobile app
  • Excellent grasp of CI/CD concepts and tools, with experience implementing build pipelines, continuous integration, and automated deployments
  • Knowledge of Docker, container orchestration, and version control workflows
  • Strong analytical and problem-solving skills: able to debug complex issues across distributed systems and find root causes
  • Outstanding communication skills in a remote, asynchronous setting; you can document your work and decisions clearly
  • Highly self-driven and able to make sound decisions independently, especially during the hours when other team members are offline

Responsibilities

  • Independently monitor, maintain, and improve our AWS infrastructure and deployment pipelines during off-hours to ensure smooth operations even when others are offline
  • Ensure high availability, reliability, and uptime of all platform services (web, backend, and mobile) by proactively managing system health and responding to incidents swiftly
  • Implement robust observability solutions: set up monitoring dashboards, logging, and real-time alerting across all systems (web applications, backend services, mobile API) using tools like Prometheus, Grafana, Datadog, and AWS CloudWatch
  • Continuously monitor AWS and related infrastructure performance
  • Optimize resource usage and configurations for improved performance and cost efficiency (e.g., right-sizing instances, caching improvements, query optimization)
  • Work closely with product and engineering teams in an asynchronous manner
  • Document your insights, decisions, and progress clearly so team members in other timezones can follow along and contribute
  • Proactively identify and resolve production issues
  • Act as the first responder to any system incidents during your shift, performing root cause analysis and restoring service
  • Communicate incidents and fixes to the team, and update runbooks for future reference
  • Develop and maintain internal SRE documentation, runbooks, and playbooks
  • Ensure that troubleshooting guides, deployment processes, and escalation protocols are well-documented and easy to follow for the entire engineering team

Preferred Qualifications

  • Direct coding in the web and mobile stack (Node.js/Python, Next.js, Redis, Flutter) is not mandatory, but understanding how these components work is a plus
  • Experience with incident response and post-mortem analysis is highly valued

Benefits

  • We care about your health: our comprehensive medical insurance plans are tailored to each region, ensuring you have the coverage you need
  • Your well-being matters: we provide a monthly fund of $100 to spend on activities that bring you joy and promote your self-care
  • We care about your mental health too: we provide a monthly fund of $150 to spend on therapy or career coaching
  • You’ll be entitled to paid time off based on your region, in line with the company policy
  • Unwind, bond, and collaborate at our annual company retreats, a time for the entire company to come together and rejuvenate
  • As part of our team, you’ll have the opportunity to own a piece of the company through stock options or profit sharing, at the company’s discretion, aligning your success with ours
