Principal Data Engineer, Architect at Scribd

Summary

Join Scribd's data engineering team and lead the design and development of a robust data architecture. As a data and analytics solution architect, you will lead initiatives spanning data warehousing, data pipeline development, and data modeling. You will shape Scribd’s data strategy, guiding stakeholders in how they consume and act on data. You will draw on your expertise in architecting, designing, and developing batch and real-time streaming infrastructure and workloads to establish standards for data modeling, integration, processing, and delivery. You will also translate business requirements into technical specifications and collaborate with cross-functional teams. This role has a significant impact on Scribd's core data layer, which supports a massive user base across three distinct brands.

Requirements

7+ years of experience in data engineering, with a strong background in data architecture, data modeling, and data management, including building and scaling robust data systems for complex business domains
Expertise in Scala or Python; deep understanding of, and hands-on experience with, Spark for designing, optimizing, and scaling large-scale data processing pipelines; and proficiency in at least one SQL dialect
Experience with data lake technologies (e.g., Databricks, Delta Lake), data storage formats (Parquet, Avro), query engines (such as Photon, Spark SQL), and both real-time streaming and batch processing, or equivalent technologies and frameworks

Responsibilities

Lead the design and development of a robust data architecture that guides data modeling, integration, processing, and delivery standards enabling modern data product development at Scribd
Serve as a data and analytics solution architect, leading architecture initiatives encompassing data warehousing, data pipeline development, data integrations, and data modeling
Shape Scribd’s data strategy, guiding stakeholders in how they consume and act on data
Establish standards for data modeling, integration, processing, and delivery
Translate business requirements into technical specifications
Work with the Data Science, Analytics, and other Engineering and Business teams to design cohesive data models, database schemas, data storage solutions, and data consumption strategies and patterns
Increase satisfaction among internal consumers of Scribd data

Preferred Qualifications

Working knowledge of streaming platforms, such as those built around Kafka
Strong grasp of AWS data platform services and their strengths and weaknesses
Hands-on experience implementing data pipelines for ingestion and transformation that support analytics and ML pipelines
Strong experience communicating asynchronously using collaboration tools such as Jira and Slack
Experience with automation and CI/CD tooling such as Git, GitHub, Docker, Jenkins, and Terraform
Experience developing standards for database design and implementing strategic data architecture initiatives around data quality, data management policies and standards, data governance, privacy, and metadata management
Working experience integrating with BI tools such as Qlik, ThoughtSpot, Looker, and Tableau

Benefits

Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
12 weeks paid parental leave
Short-term/long-term disability plans
401k/RSP matching
Tuition Reimbursement
Learning & Development programs
Quarterly stipend for Wellness, Connectivity & Comfort
Mental Health support & resources
Free subscription to Scribd + gift memberships for friends & family
Referral Bonuses
Book Benefit
Sabbaticals
Company wide events
Team engagement budgets
Vacation & Personal Days
Paid Holidays (+ winter break)
Flexible Sick Time
Volunteer Day
Company-wide Diversity, Equity, & Inclusion programs

Location: Remote