Summary
Join Kustomer's Foundation team as a Site Reliability Engineer and build systems used company-wide. You'll maintain cloud infrastructure security, plan capacity, manage software lifecycles, optimize CI/CD, and enhance developer productivity. This role requires extensive experience in large-scale web application management and AWS infrastructure. You will collaborate with cross-functional teams, lead system migrations, and implement security best practices. Kustomer offers competitive salaries, stock options, and comprehensive benefits, including healthcare coverage and generous vacation policies.
Requirements
- Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience
- 8+ years of experience building and managing large-scale, highly available, distributed web applications
- A working understanding of a high-level programming language such as Go, Python, JavaScript, or Bash
- Strong AWS experience managing infrastructure in a secure, highly available, automated fashion (VPC, ELB, Containers, Auto Scaling)
- Strong background in Linux/Unix, networking, HTTP/2, DNS, REST, etc.
- Experience with managing large databases and Lucene-based search systems such as Elasticsearch
- Experience with infrastructure as code and managing Terraform configurations in a sustainable and scalable way
- Experience with observability tools (ELK/Prometheus/Coralogix/distributed tracing)
Responsibilities
- Analyze, design, develop, maintain, and improve infrastructure to expand its automation capabilities
- Automate the deployment of testing, staging, and production environments
- Improve the efficiency of development testing
- Measure, report on, and drive improvements in scalability, performance, and availability
- Support the cloud developer environments and their iterative improvement
- Lead, plan, and execute large-scale system migrations
- Participate in cross-team initiatives to drive engineering best practices
- Conduct code, architecture, and infrastructure reviews across the platform
- Provide education and support to the engineering team in systems architecture design
- Stay involved in initiatives around on-call rotations, application performance monitoring, and continuous integration and delivery pipelines
- Lead various scalability initiatives across the platform and infrastructure
- Implement and enforce change management best practices
- Collaborate with the InfoSec team to drive compliance, observability, and automation for the security of our platform
- Work closely with the Security team to optimize infrastructure in order to satisfy compliance requirements
- Manage secrets and automated key rotations
- Manage security vulnerabilities and upgrade schedules for EOL (End of Life) software
- Manage CDN, firewall rules, and other tools to mitigate attacks and threats
Preferred Qualifications
- You have GitHub activity showing thoughtful, relevant contributions
- You have a working knowledge of writing code and scripts in more than one language
- You have experience developing internal tools for others
- You have experience creating SLAs, SLOs, and SLIs
Benefits
- Competitive salaries and stock options
- 100% healthcare coverage
- 401(k)
- Wi-Fi and mobile reimbursement
- A generous vacation policy
- Pension
- Supplemental health insurance
- Other perks