ML Data Engineer II
Scribd
Job highlights
Summary
Join Scribd's ML Data Engineering team as a Software Engineer II and help build and optimize large-scale data processing systems. You will design, develop, and maintain data pipelines for metadata extraction and enrichment, working with diverse datasets and collaborating with cross-functional teams. The role requires strong data engineering and software development skills, experience with large-scale data processing frameworks, and proficiency in a programming language such as Python, Scala, or Ruby. Scribd offers a flexible work environment through Scribd Flex, with occasional in-person attendance, and a culture that prioritizes collaboration and empowers employees to take action. Compensation includes a competitive salary, equity ownership, and a comprehensive benefits package.
Requirements
- 3+ years of experience as a professional software engineer
- Proficient in one or more programming languages, such as Python, Ruby, Scala, or similar
- Hands-on experience with data processing frameworks like Apache Spark, Databricks, or similar tools for large-scale data processing
- Experience working with systems at scale
- Experience working with a public cloud provider (AWS, Azure, or Google Cloud)
- Hands-on experience building, deploying, and optimizing solutions using ECS, EKS, or AWS Lambda
- Proven ability to test and optimize systems for performance and scalability
- Bachelor's in CS or equivalent professional experience
Responsibilities
- Design and develop data pipelines to extract, enrich, and process metadata from millions of documents, images, and other content types
- Collaborate with cross-functional teams, including ML engineers and product managers, to deliver scalable, efficient, and reliable metadata solutions
- Build and maintain systems that operate at a massive scale, handling hundreds of millions of documents and billions of images
- Optimize and refactor existing systems for performance, scalability, and reliability
- Ensure data accuracy, integrity, and quality through automated validation and monitoring
- Participate in code reviews, ensuring best practices are followed and maintaining high-quality standards in the codebase
- Manage and maintain data pipelines, security, and infrastructure
Preferred Qualifications
Bonus points if you have experience working with Machine Learning systems
Benefits
- Healthcare Insurance Coverage (Medical/Dental/Vision): 100% paid for employees
- 12 weeks paid parental leave
- Short-term/long-term disability plans
- 401k/RSP matching
- Tuition Reimbursement
- Learning & Development programs
- Quarterly stipend for Wellness, Connectivity & Comfort
- Mental Health support & resources
- Free subscription to Scribd + gift memberships for friends & family
- Referral Bonuses
- Book Benefit
- Sabbaticals
- Company-wide events
- Team engagement budgets
- Vacation & Personal Days
- Paid Holidays (+ winter break)
- Flexible Sick Time
- Volunteer Day
- Company-wide Diversity, Equity, & Inclusion programs
- Competitive equity ownership