Data Engineer II
closed
Scribd
Summary
Join Scribd's ML Data Engineering team as a Software Engineer II and help build and optimize large-scale data processing systems. You will design, develop, and maintain data pipelines for metadata extraction and enrichment, working with diverse datasets and collaborating with cross-functional teams. The role requires proficiency in data engineering, software development, and scalable systems, along with experience with technologies such as Python, Spark, and AWS. Scribd offers competitive compensation, a comprehensive benefits package, and flexible work arrangements through Scribd Flex, which requires occasional in-person attendance. The company values a culture of GRIT, emphasizing goal setting, results achievement, innovation, and teamwork. Scribd is committed to equal employment opportunity and encourages applications from diverse backgrounds.
Requirements
- 5+ years of experience as a professional software engineer (post qualifications)
 - Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar
 - Hands-on experience with real-time data processing
 - Experience in frameworks like Apache Spark, Databricks, or similar tools for large-scale data processing
 - Experience working with systems at scale
 - Experience maintaining and deploying infrastructure on a public cloud provider (AWS, Azure, or Google Cloud)
 - Hands-on experience building, deploying, and optimizing solutions using ECS, EKS, or AWS Lambda
 - Proven ability to test and optimize systems for performance and scalability
 - Bachelor's in CS or equivalent professional experience
 
Responsibilities
- Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types
 - Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions
 - Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images
 - Optimize and refactor existing systems for performance, scalability, and reliability
 - Ensure data accuracy, integrity, and quality through automated validation and monitoring
 - Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase
 - Manage and maintain data pipelines, security, and infrastructure
 
Preferred Qualifications
Bonus points if you have experience working with Machine Learning systems
Benefits
- Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
 - 12 weeks paid parental leave
 - Short-term/long-term disability plans
 - 401k/RSP matching
 - Onboarding stipend for home office peripherals + accessories
 - Tuition Reimbursement
 - Learning & Development programs
 - Quarterly stipend for Wellness, Connectivity & Comfort
 - Mental Health support & resources
 - Free subscription to Scribd + gift memberships for friends & family
 - Referral Bonuses
 - Book Benefit
 - Sabbaticals
 - Company-wide events
 - Team engagement budgets
 - Vacation & Personal Days
 - Paid Holidays (+ winter break)
 - Flexible Sick Time
 - Volunteer Day
 - Company-wide Employee Resource Groups and programs that foster an inclusive and diverse workplace
 