Senior Site Reliability Engineer

SWORD Health
Summary
Join Sword Health, a Forbes Best Startup Employer, as a Senior Site Reliability Engineer (SRE). You will play a crucial role in maintaining the health and uptime of our services, collaborating with development teams to build and operate scalable, resilient systems. Responsibilities include monitoring and incident management, automation and tooling, performance optimization, security and compliance, documentation, and database management. The role calls for proficiency in programming languages such as Python, Go, or JavaScript, along with experience in cloud platforms, Linux/Unix systems, containerization, monitoring, and CI/CD tooling. Sword Health offers a stimulating environment, career development, a competitive salary, flexible hours, unlimited vacation, access to a health and well-being program, and remote or hybrid work options (Portugal only). Candidates need a valid EU visa; the position is based in Portugal, and relocation assistance is not provided.
Requirements
- Proficiency in programming languages such as Python, Go, or JavaScript
- 5+ years of experience with cloud platforms such as AWS, Google Cloud, or Azure
- Strong understanding of Linux/Unix systems and networking
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes)
- Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack)
- Knowledge of CI/CD pipelines and tools (e.g., Jenkins, GitLab CI)
- Proficiency with relational and NoSQL databases (e.g., MySQL, PostgreSQL, Redis, Elasticsearch)
- Willingness to collaborate and share knowledge with colleagues to drive collective success
- Ownership of your work and accountability for outcomes
Responsibilities
- Develop and maintain monitoring and alerting solutions
- Respond to incidents, troubleshoot issues, and perform root cause analysis
- Automate repetitive tasks and improve deployment processes
- Develop and maintain tools to support infrastructure and applications
- Analyze system performance and implement optimizations to improve efficiency and reduce latency
- Ensure systems are secure and compliant with relevant standards and regulations
- Maintain comprehensive documentation of systems and processes
- Share knowledge and best practices with team members
- Ensure the reliability, performance, and scalability of databases
- Perform database optimization, maintenance, and troubleshooting
Preferred Qualifications
- A passion for exploring new technologies and methodologies to improve reliability and performance
- Ability to anticipate potential issues and implement preventive measures
- A dedication to learning and growing in your role, staying updated with industry trends and best practices
Benefits
- A stimulating, fast-paced environment with lots of room for creativity
- A bright future at a promising high-tech startup company
- Career development and growth, with a competitive salary
- The opportunity to work with a talented team and to add real value to an innovative solution with the potential to change the future of healthcare
- A flexible environment where you control your own hours and can work remotely, with unlimited vacation
- Access to our health and well-being program (digital therapist sessions)
- Remote or hybrid work policy (Portugal only)
- Comprehensive health, dental and vision insurance
- Life and AD&D Insurance
- Financial advisory services
- Supplemental Insurance Benefits (Accident, Hospital and Critical Illness)
- Health Savings Account
- Equity shares
- Discretionary PTO plan
- Parental leave
- 401(k)
- Flexible working hours
- Remote-first company
- Paid company holidays
- Free digital therapist for you and your family