Data Engineer II at Scribd - United States, Canada

Summary

Join Scribd's ML Data Engineering team as a Software Engineer II and contribute to building and optimizing large-scale data processing systems. You will design, develop, and maintain data pipelines for metadata extraction and enrichment, working with diverse datasets and collaborating with cross-functional teams. The role requires proficiency in data engineering, software development, and scalable systems, along with experience with technologies such as Python, Spark, and AWS. Scribd offers competitive compensation, a comprehensive benefits package, and flexible work arrangements through Scribd Flex, which requires occasional in-person attendance. The company values a culture of GRIT, emphasizing goal setting, results achievement, innovation, and teamwork. Scribd is committed to equal employment opportunity and encourages applications from diverse backgrounds.

Requirements

5+ years of experience as a professional software engineer (post-qualification)
Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar
Hands-on experience with real-time data processing
Experience with frameworks such as Apache Spark, Databricks, or similar tools for large-scale data processing
Experience working with systems at scale
Experience maintaining and deploying infrastructure on a public cloud provider (AWS, Azure, or Google Cloud)
Hands-on experience with building, deploying, and optimizing solutions using ECS, EKS, or AWS Lambda
Proven ability to test and optimize systems for performance and scalability
Bachelor’s degree in CS or equivalent professional experience

Responsibilities

Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types
Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions
Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images
Optimize and refactor existing systems for performance, scalability, and reliability
Ensure data accuracy, integrity, and quality through automated validation and monitoring
Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase
Manage and maintain data pipelines, security, and infrastructure

Preferred Qualifications

Bonus points if you have experience working with Machine Learning systems

Benefits

Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
12 weeks paid parental leave
Short-term/long-term disability plans
401k/RSP matching
Onboarding stipend for home office peripherals + accessories
Tuition Reimbursement
Learning & Development programs
Quarterly stipend for Wellness, Connectivity & Comfort
Mental Health support & resources
Free subscription to Scribd + gift memberships for friends & family
Referral Bonuses
Book Benefit
Sabbaticals
Company-wide events
Team engagement budgets
Vacation & Personal Days
Paid Holidays (+ winter break)
Flexible Sick Time
Volunteer Day
Company-wide Employee Resource Groups and programs that foster an inclusive and diverse workplace

Tags: Remote, Data, Mid-level