Principal Data Engineer, Architect

Scribd
Summary
Join Scribd's data engineering team and lead the design and development of a robust data architecture. You will serve as a data and analytics solution architect, leading initiatives encompassing data warehousing, data pipeline development, and data modeling. Shape Scribd's data strategy, guiding stakeholders in data consumption and action. Leverage your expertise in architecting, designing, and developing batch and real-time streaming infrastructure and workloads to establish data modeling, integration, processing, and delivery standards. Translate business requirements into technical specifications and collaborate with cross-functional teams. This role offers a significant impact on improving Scribd's core data layer, working with a massive user base across three distinct brands.
Requirements
- 7+ years of experience in data engineering, with a strong background in data architecture, data modeling, and data management, building and scaling robust data systems for complex business domains
- Expertise in Scala or Python, with a deep understanding and hands-on experience in Spark for designing, optimizing, and scaling large-scale data processing pipelines, and proficiency in at least one SQL dialect
- Experience with data lake technologies (e.g., Databricks, Delta Lake), data storage formats (Parquet, Avro), query engines (such as Photon, Spark SQL), and both real-time streaming and batch processing, or equivalent technologies and frameworks
Responsibilities
- Lead the design and development of a robust data architecture that guides data modeling, integration, processing, and delivery standards enabling modern data product development at Scribd
- Serve as a data and analytics solution architect, leading architecture initiatives encompassing data warehousing, data pipeline development, data integrations, and data modeling
- Shape Scribd's data strategy, guiding stakeholders in how they consume and act on data
- Establish standards for data modeling, integration, processing, and delivery
- Translate business requirements into technical specifications
- Work with the Data Science, Analytics, and other Engineering and Business teams to design cohesive data models, database schemas, data storage solutions, and consumption strategies and patterns
- Increase "customer satisfaction" for internal customers of Scribd data
Preferred Qualifications
- Experience with and working knowledge of streaming platforms, typically based around Kafka
- Strong grasp of AWS data platform services and their strengths/weaknesses
- Hands-on experience implementing data pipelines for data ingestion and transformation to support analytics and ML pipelines
- Strong experience communicating asynchronously using collaboration tools like Jira, Slack, etc.
- Experience using automation and CI/CD tooling like Git, GitHub, Docker, Jenkins, Terraform, etc.
- Experience developing standards for database design and implementation of various strategic data architecture initiatives around data quality, data management policies/standards, data governance, privacy and metadata management
- Working experience integrating with BI frameworks like Qlik, ThoughtSpot, Looker, Tableau, etc.
Benefits
- Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
- 12 weeks paid parental leave
- Short-term/long-term disability plans
- 401k/RSP matching
- Tuition Reimbursement
- Learning & Development programs
- Quarterly stipend for Wellness, Connectivity & Comfort
- Mental Health support & resources
- Free subscription to Scribd + gift memberships for friends & family
- Referral Bonuses
- Book Benefit
- Sabbaticals
- Company-wide events
- Team engagement budgets
- Vacation & Personal Days
- Paid Holidays (+ winter break)
- Flexible Sick Time
- Volunteer Day
- Company-wide Diversity, Equity, & Inclusion programs