Senior DevOps Engineer

Natera

💵 $146k-$183k
📍 Remote - United States

Summary

Join Natera as a Senior DevOps Engineer focused on Observability, where you will establish observability standards, lead automation efforts, and mentor engineers. You will design and manage a code-driven Datadog observability platform, ensuring end-to-end visibility into applications and infrastructure. This role demands expertise in Datadog and cost-effective observability at scale. You will collaborate with various teams to standardize monitoring and logging practices. This hands-on position requires managing all configurations through Terraform, APIs, and CI/CD workflows. Mentorship and internal training program development are also key responsibilities.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, Mathematics, Physics or a related technical field
  • 5+ years of experience in DevOps, Site Reliability Engineering, or related roles with a strong focus on observability and infrastructure as code
  • Hands-on experience managing and scaling Datadog programmatically using code-based workflows (e.g. Terraform, APIs, CI/CD)
  • Deep expertise in Datadog including APM, logs, metrics, tracing, dashboards and audit trails
  • Proven experience integrating Datadog observability into CI/CD pipelines (e.g. GitLab CI, AWS CodePipeline, GitHub Actions)
  • Solid understanding of AWS services and best practices for monitoring services on Kubernetes infrastructure

Responsibilities

  • Own and define observability standards for Java applications, Kubernetes workloads and cloud infrastructure
  • Configure and manage the Datadog platform using Terraform and Infrastructure-as-Code (IaC) best practices
  • Drive adoption of structured JSON logging, distributed tracing and custom metrics across Java and Python services
  • Optimize Datadog usage through cost governance, log filtering, sampling strategies and automated reporting
  • Collaborate closely with Java developers and platform engineers to standardize instrumentation and alerting
  • Troubleshoot and resolve issues with missing or misconfigured logs, metrics and traces, working with developers to ensure proper instrumentation and data flow into Datadog
  • Lead incident response efforts using Datadog insights for actionable alerting, root cause analysis (RCA) and reliability improvements
  • Serve as the primary point of contact for Datadog-related requests, supporting internal teams with onboarding, integration and usage questions
  • Continuously audit and tune monitors for alert quality, reducing false positives and improving actionable signal detection
  • Maintain clear internal documentation on Datadog usage, standards, integrations and IaC workflows
  • Evaluate and propose improvements to the observability stack, including new Datadog features, OpenTelemetry adoption and future architecture changes
  • Mentor engineers and develop internal training programs on Datadog, observability-as-code and modern log pipeline architecture

Preferred Qualifications

  • Strong background in Java or Python application development
  • Familiarity with other observability and monitoring tools (e.g., ELK, Prometheus, Grafana, OpenTelemetry, New Relic, Dynatrace, Splunk, Sysdig)

Benefits

  • Comprehensive medical, dental, vision, life and disability plans for eligible employees and their dependents
  • Free testing in addition to fertility care benefits
  • Pregnancy and baby bonding leave
  • 401(k) benefits
  • Commuter benefits
  • A generous employee referral program
