Data Engineer II

closed

Scribd

💵 $94k-$196k
📍 Remote - United States, Canada

Summary

Join Scribd's ML Data Engineering team as a Software Engineer II and help build and optimize large-scale data processing systems. You will design, develop, and maintain data pipelines for metadata extraction and enrichment, working with diverse datasets and collaborating with cross-functional teams. The role requires proficiency in data engineering, software development, and scalable systems, plus experience with technologies such as Python, Spark, and AWS. Scribd offers competitive compensation, comprehensive benefits, and flexible work arrangements through Scribd Flex, which requires occasional in-person attendance. The company's culture centers on GRIT: goal setting, results, innovation, and teamwork. Scribd is committed to equal employment opportunity and encourages applicants from diverse backgrounds.

Requirements

  • 5+ years of experience as a professional software engineer (post qualifications)
  • Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar
  • Hands-on experience with real-time data processing
  • Experience in frameworks like Apache Spark, Databricks, or similar tools for large-scale data processing
  • Experience working with systems at scale
  • Experience maintaining and deploying infrastructure on a public cloud provider (AWS, Azure, or Google Cloud)
  • Hands-on experience building, deploying, and optimizing solutions using ECS, EKS, or AWS Lambda
  • Proven ability to test and optimize systems for performance and scalability
  • Bachelor's in CS or equivalent professional experience

Responsibilities

  • Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types
  • Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions
  • Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images
  • Optimize and refactor existing systems for performance, scalability, and reliability
  • Ensure data accuracy, integrity, and quality through automated validation and monitoring
  • Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase
  • Manage and maintain data pipelines, security, and infrastructure

Preferred Qualifications

Bonus points if you have experience working with Machine Learning systems

Benefits

  • Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
  • 12 weeks paid parental leave
  • Short-term/long-term disability plans
  • 401k/RSP matching
  • Onboarding stipend for home office peripherals + accessories
  • Tuition Reimbursement
  • Learning & Development programs
  • Quarterly stipend for Wellness, Connectivity & Comfort
  • Mental Health support & resources
  • Free subscription to Scribd + gift memberships for friends & family
  • Referral Bonuses
  • Book Benefit
  • Sabbaticals
  • Company-wide events
  • Team engagement budgets
  • Vacation & Personal Days
  • Paid Holidays (+ winter break)
  • Flexible Sick Time
  • Volunteer Day
  • Company-wide Employee Resource Groups and programs that foster an inclusive and diverse workplace
This job is filled or no longer available
