
Data Engineer II

Scribd
Summary
Join Scribd's ML Data Engineering team as a Software Engineer II and help build and optimize large-scale data processing systems. You will design, develop, and maintain data pipelines for metadata extraction and enrichment, working with diverse datasets and collaborating with cross-functional teams. The role requires proficiency in data engineering, software development, and scalable systems, along with experience with technologies such as Python, Spark, and AWS. Scribd offers competitive compensation, a comprehensive benefits package, and flexible work arrangements through Scribd Flex, which requires occasional in-person attendance. The company values a culture of GRIT, emphasizing goal setting, achieving results, innovation, and teamwork. Scribd is committed to equal employment opportunity and encourages applications from people of diverse backgrounds.
Requirements
- 5+ years of experience as a professional software engineer (post-qualification)
- Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar
- Hands-on experience with real-time data processing
- Experience with frameworks such as Apache Spark, Databricks, or similar tools for large-scale data processing
- Experience working with systems at scale
- Experience maintaining and deploying infrastructure on a public cloud provider (AWS, Azure, or Google Cloud)
- Hands-on experience building, deploying, and optimizing solutions using Amazon ECS, Amazon EKS, or AWS Lambda
- Proven ability to test and optimize systems for performance and scalability
- Bachelor's degree in CS or equivalent professional experience
Responsibilities
- Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types
- Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions
- Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images
- Optimize and refactor existing systems for performance, scalability, and reliability
- Ensure data accuracy, integrity, and quality through automated validation and monitoring
- Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase
- Manage and maintain data pipelines, security, and infrastructure
Preferred Qualifications
Bonus points if you have experience working with machine learning systems
Benefits
- Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
- 12 weeks paid parental leave
- Short-term/long-term disability plans
- 401k/RSP matching
- Onboarding stipend for home office peripherals + accessories
- Tuition Reimbursement
- Learning & Development programs
- Quarterly stipend for Wellness, Connectivity & Comfort
- Mental Health support & resources
- Free subscription to Scribd + gift memberships for friends & family
- Referral Bonuses
- Book Benefit
- Sabbaticals
- Company-wide events
- Team engagement budgets
- Vacation & Personal Days
- Paid Holidays (+ winter break)
- Flexible Sick Time
- Volunteer Day
- Company-wide Employee Resource Groups and programs that foster an inclusive and diverse workplace
