Observability Engineer

ASCENDING
Summary
Join our client's Enterprise Monitoring team as a Senior Observability Engineer and help maintain the reliability, scalability, and availability of its log management and metrics/observability platforms. This fully remote, long-term contract position (2+ years) is open to US citizens only. You will be responsible for maintaining performance KPIs, defining SLOs, and deploying monitoring and alerting systems. The role involves designing, configuring, and maintaining large-scale log aggregation solutions, setting up ingestion pipelines, and automating tasks. You will work with tools like ELK, Dynatrace, Prometheus, OTEL, and Grafana, and should be proficient in scripting languages such as Python and Bash or PowerShell. This position demands a strong background in monitoring tools and data pipeline design.
Requirements
- BS/MS in CS/engineering or equivalent, OR 5+ years of experience
- 3+ years of experience working directly with monitoring tools as an Admin, SME, or Architect, preferably with Dynatrace and/or ELK
- Hands-on experience designing data pipelines using Filebeat, Logstash, and/or Fluent Bit/Fluentd
- Expert-level skill with either Dynatrace (managed, cloud, and offline, with the full scope of best practices and setup as it relates to ActiveGate, cloud, on-prem, and custom with workflows) or Elastic (on-prem and cloud, with best practices around the platform)
- Fluent in writing scripts in Python and either Bash or PowerShell to automate tasks
- Experience with Terraform and Ansible, including syntax, best practices, and managing complex configurations across multiple commercial and government clouds to build and manage infrastructure and applications
- Very good working knowledge of Linux
- Highly self-motivated and directed
- Good analytical and problem-solving/troubleshooting abilities
Responsibilities
- Deploy and maintain monitoring and alerting
- Design, configure, and maintain large-scale log aggregation solutions
- Set up and manage ingestion pipelines and data transformations
- Have the mindset of "automate any task"
- Build and maintain robust monitoring systems using tools like ELK, Dynatrace, Prometheus, OTEL, and Grafana to detect potential issues early and trigger alerts for timely response
- Maintain associated documentation as it applies to our audit and certification requirements
- Participate in troubleshooting, capacity planning, and performance analysis activities
- Research new monitoring requirements and, in many cases, write code to meet them
- Apply strong expertise in setting up monitoring policies, rules, and templates, and write scripts to accomplish monitoring requirements
Preferred Qualifications
- Knowledge of SNMP, tcpdump, and tracing
- Knowledge of AIOps platforms
- Other scripting experience (JavaScript, Java, PowerShell, or others)
