Site Reliability Engineer - III
closed
Rackspace Technology
Summary
Join Rackspace's Professional Services Center of Excellence and contribute to building next-generation applications for our customers. You will work with customers to implement observability solutions using tools like Datadog, New Relic, AppDynamics, or Dynatrace. Responsibilities include building and maintaining scalable systems, developing monitoring tools, and collaborating with development teams. You will proactively analyze data to perform anomaly detection and capacity planning. This role requires a Bachelor's degree in engineering/computer science or equivalent and senior-level experience in Site Reliability Engineering and DevOps. Rackspace offers a remote work environment and is committed to equal employment opportunity.
Requirements
- Bachelor's degree in engineering/computer science or equivalent
- Senior-level experience with Site Reliability Engineering, DevOps, code-level application support and troubleshooting, AWS infrastructure design, implementation and optimization, and automation for deployment, scaling and reliability
- Experience with observability tools such as Splunk, Datadog, or SignalFx
- Experience deploying, maintaining and supporting software applications/services in the AWS ecosystem
- Proactive approach to identifying problems and solutions
- Experience writing code in one or more interpreted languages such as Python, PHP, Perl, Ruby, or Linux shell
- Experience with Terraform or CloudFormation scripting
- Experience with configuration management tools like Ansible, Chef or Puppet
- Experience with standard software development best practices and tools such as code repositories (Git preferred)
- Experience executing in an agile software development environment
- Good understanding of pricing/cost models across AWS services, especially compute, storage, and database offerings
- A clear understanding of network and system management solutions
- Excellent organizational and project management skills
- Excellent communication, critical thinking & analytical skills
Responsibilities
- Work with customers to implement observability solutions
- Build and maintain scalable systems and robust automation that supports engineering goals
- Develop and maintain monitoring tools, alerts, and dashboards to provide visibility into system health and performance
- Proactively gather and analyze both metric and log data from systems and applications to perform anomaly detection, performance tuning, capacity planning and fault isolation
- Collaborate with development teams to implement and deploy new features and enhancements, ensuring they meet reliability, security and performance standards
- Collaborate with team members to document and share solutions
- Maintain a deep understanding of the customer's business as well as their technical environment
- Identify performance bottlenecks and anomalous system behavior, and resolve the root cause of service issues
Benefits
Remote work, flexible hours