Principal Data Platform Engineer

Stitch Fix
Summary
Join Stitch Fix's Data Platform team as a Principal Data Platform Engineer to evolve the company's modern data stack and support the next wave of AI/ML-powered products. You will play a key role in making the data platform more scalable, reliable, and easier to navigate. Responsibilities include evolving the core data stack (Spark, Trino, Iceberg, Kafka, Flink), defining and implementing observability standards, supporting critical data pipelines, and unifying workflows from data ingestion to model serving. You will also design and build foundational AI data capabilities and work with cross-functional teams to create well-documented, AI-friendly datasets. This is a high-impact individual contributor role that requires significant experience in cloud-scale data infrastructure and AI-focused systems.
Requirements
- You have 8+ years of experience building cloud-scale data infrastructure or ML platforms
- You've contributed to AI-focused systems such as LLM APIs, RAG/GraphRAG, vector search, or knowledge graphs
- You've worked with vector and graph databases to support intelligent querying and semantic discovery
- You're hands-on with Spark, SQL, and Python and/or Scala, and you have strong experience building scalable APIs and services
- You understand streaming systems such as Kafka or Flink and how to design for real-time insights
- You have experience with orchestration systems, CI/CD, and production monitoring
- You're interested in defining data product standards, governance-by-design, and AI observability metrics
- You thrive in cross-functional, collaborative environments and navigate evolving priorities effectively
- You're detail-oriented, curious, and passionate about building infrastructure people love to use
Responsibilities
- Evolve our core data stack (Spark, Trino, Iceberg, Kafka, Flink) to meet the scale and latency demands of AI workloads
- Define and implement observability, lineage, and access standards for our data products and AI applications
- Support critical data pipelines while creating abstractions that simplify end-user interactions
- Unify workflows from data ingestion to model serving, ensuring a shared foundation for feature stores, ML observability, and semantic modeling
- Design and build foundational AI data capabilities supporting LLM orchestration, retrieval-augmented generation (RAG/GraphRAG), semantic data layers, and vectorized search
- Work with teams to build well-documented, AI-friendly datasets that power analytics, personalization, and better business decisions
Benefits
- We offer comprehensive compensation packages and inclusive health and wellness benefits
- This role offers a competitive salary, benefits, and equity
- The position is eligible for medical, dental, vision, and other benefits
- This position is eligible for new hire and ongoing grants of restricted stock units, depending on employee and company performance