
Lead Platform Engineer

TetraScience
Summary
Join TetraScience as a Lead Platform Engineer and play a critical role in evolving and scaling our cloud-native Tetra data platform. You will partner with engineering, data, product, and AI teams to architect the platform infrastructure, design scalable distributed systems, anticipate and mitigate scaling challenges, and ensure platform performance, reliability, and cost-efficiency. The role requires expertise in distributed systems and large-scale design, and the ability to turn scalability goals into technical strategies. It also involves building and maintaining infrastructure-as-code, enhancing observability and monitoring, and championing best practices in distributed systems design. This is a challenging role for an engineer passionate about solving complex scalability problems.
Requirements
- 10+ years of hands-on experience in software and infrastructure engineering, with a proven track record of designing, building, and scaling distributed, cloud-native systems in production environments
- Demonstrated experience as a technical leader or architect, making key decisions on system design, scalability, performance, and cost optimization
- Strong proficiency in API-first design, including REST, GraphQL, and OpenAPI specifications, and in designing APIs that are scalable, secure, versioned, and extensible
- Strong proficiency in TypeScript and Python, with a focus on building highly performant backend services
- Expertise in AWS cloud services and architecture, including deep experience with core services (e.g., EC2, Lambda, ECS/EKS, IAM, S3) and advanced data and messaging tools such as SQS, Kinesis, Kafka, and EventBridge
- Expert knowledge of infrastructure-as-code frameworks (such as CloudFormation and CDK) and CI/CD pipelines, with strong opinions on production deployment strategy across dozens of platforms
- Solid understanding of observability best practices, including monitoring, alerting, and distributed tracing for SLI/SLO/SLA design
- Ability to articulate ideas clearly, present findings persuasively, and build rapport with clients and team members
- Strong collaboration skills and the ability to partner effectively with cross-functional teams
Responsibilities
- Architect and evolve our cloud-native platform infrastructure to support high-throughput, low-latency data processing patterns and customer-facing features, and design the platform to meet scalability requirements
- Design scalable, distributed systems powering complex capabilities such as authentication & authorization, data lifecycle management, search infrastructure, operational intelligence, and real-time event processing
- Proactively analyze platform performance and scalability; identify potential constraints and define strategies that enable both continuous and step-function growth
- Collaborate with engineering and product teams to deliver infrastructure that supports new services, customer-facing applications, and high-volume data processing workloads
- Build and maintain infrastructure-as-code (e.g., CloudFormation, AWS CDK) to automate, standardize, and secure deployments, supporting online upgrades and on-demand infrastructure allocation
- Enhance observability and monitoring to ensure reliability, cost efficiency, and rapid incident response
- Champion best practices in distributed systems design, scalability, and performance optimization, and share architectural insights through design reviews and technical documentation
Benefits
- 100% employer-paid benefits for all eligible employees and immediate family members
- Unlimited paid time off (PTO)
- 401(k)
- Flexible working arrangements, including remote work
- Company-paid life insurance and long-term/short-term disability (LTD/STD)
- A culture of continuous improvement where you can grow your career and get coaching