Senior DevOps Engineer
Thrive Market
💵 $150k-$190k
📍Remote - United States
Please let Thrive Market know you found this job on JobsCollider. Thanks! 🙏
Job highlights
Summary
Join Thrive Market's Platform team as a Senior DevOps Engineer and drive our cloud infrastructure and continuous delivery of applications/services. You will architect and administer AWS cloud infrastructure, automate infrastructure provisioning, and ensure system reliability and performance. The ideal candidate possesses extensive experience in software development, SRE, cloud infrastructure, and service operations. You will leverage your expertise in CI/CD pipelines, automation, security, and incident management to enhance our platform. Thrive Market offers a competitive salary, comprehensive health benefits, flexible PTO, and a supportive work environment.
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience)
- 8+ years of experience in DevOps, SRE, system administration, or software development
- Extensive experience building and maintaining complex cloud infrastructure on AWS
- Experience applying SRE/DevOps/Platform Engineering best practices to continually enhance system reliability, scalability, and operational efficiency. This includes leveraging automated tooling, implementing robust observability frameworks, and proactively reducing technical debt
- Extensive experience in managing and administering AWS Aurora databases (Postgres and MySQL) and expertise in troubleshooting complex database issues. Extensive experience troubleshooting production problems and leading multiple teams to resolve large-scale production issues
- Deep knowledge and experience with Docker and Kubernetes
- Experience using and maintaining IaC using Terraform
- Extensive experience in developing applications using at least one of the following programming languages: Python, Bash, or Go
- Strong experience in managing Linux-based infrastructure, preferably Debian/Ubuntu
- Proficiency with modern continuous integration and deployment tools such as Tekton and GitHub Actions
- Knowledge of A/B, in-place, rolling, and phased deployment methodologies
- Understanding of monitoring and systems tools like Logstash, Grafana, Prometheus, New Relic, etc.
- Good understanding of networking fundamentals and protocols
- Good critical thinking and problem-solving skills
- Sense of ownership and pride in your performance and its impact on the organization's success
- Effective interpersonal, collaboration, and communication skills - we work collaboratively and want to bring in folks who want to work together across teams toward a common goal
- Ability to independently execute projects promptly
Responsibilities
- Implement and manage continuous integration/continuous deployment (CI/CD) pipelines for the company
- Identify and remove bottlenecks in the software delivery process
- Promote best practices for software deployment
- Architect, improve and administer AWS cloud infrastructure and services
- Leverage AWS-managed services wherever possible
- Automate the provisioning, configuration, and monitoring of infrastructure, using Infrastructure as Code techniques with Terraform, Docker, and Kubernetes
- Manage and optimize large-scale, cloud-hosted MySQL, Redis, and Elasticsearch DBs
- Ensure system reliability, availability, and performance
- Lead efforts for disaster recovery, capacity expansion, and system upgrades
- Write and support scripts and automation using Python and Bash
- Enable DevOps engineers and application developers to streamline processes and automate their work whenever possible
- Develop and maintain automation for infrastructure provisioning, configuration management, and deployment
- Automate all aspects of the software lifecycle
- Conduct security audits, vulnerability assessments, and system hardening initiatives, including maintaining PCI and SOX compliance
- Ensure that systems and processes adhere to industry best practices for security and compliance
- Implement and manage monitoring tools to ensure system health and performance
- Lead incident response efforts and post-incident reviews to learn from failures and to mitigate and prevent future occurrences
- Manage Jira ticket creation, grooming, and ticket/epic management, and keep documentation up to date
Preferred Qualifications
- Experience in a startup environment and larger, scaled organizations is a plus!
- AWS certification (AWS Certified Solutions Architect or similar) is a plus
- Proficiency with Atlassian Jira and Confluence for project management and documentation is a plus
- Able to define and enforce best practices across a large organization
Benefits
- Comprehensive health benefits (medical, dental, vision, life and disability)
- Competitive salary (DOE) + equity
- 401k plan
- 9 Days of Observed Holidays
- Flexible Paid Time Off
- Subsidized ClassPass Membership with access to fitness classes and wellness and beauty experiences
- Ability to work in our beautiful co-working space at WeWork in Playa Vista and other locations
- Free Thrive Market membership with exclusive employee discount
- Coverage for Life Coaching & Therapy Sessions on our holistic mental health and well-being platform