AI Data Operations Labeling Engineer

N-Power Medicine

💵 $155k-$183k
📍Remote - Worldwide

Summary

Join N-Power Medicine as an AI Data Operations/Labeling Engineer and play a critical role in building and optimizing our data labeling pipeline. You will ensure the creation of high-quality labeled datasets from diverse healthcare sources, working closely with AI scientists, engineers, and subject matter experts to power the development and validation of advanced AI/ML models. This is a hybrid or remote role; a Bay Area location is preferred but not required. You will design and implement efficient data labeling workflows, manage data labeling platforms, and collaborate with stakeholders to create labeling guidelines. You will also track labeler performance, implement quality control measures, and develop data pipelines for data preprocessing and integration. The role requires strong proficiency in Python and SQL, along with experience in data engineering and AI.

Requirements

  • BS/MSc in engineering, computer science, applied mathematics, physics, statistics, or a related field of study
  • 5+ years of relevant experience in AI-oriented engineering or data science, including proven experience with data labeling platforms and tools
  • Strong proficiency in Python and SQL
  • Strong project management and organizational skills, with an understanding of data workflows
  • Solid understanding of data engineering principles and practices, particularly within distributed data processing environments
  • Experience with scalable data processing frameworks
  • Ability to see problems and solutions from a ‘holistic’ point of view and communicate how specific solutions bring business value
  • Excellent ability to see the big picture while decomposing complex solutions into incremental steps
  • Insight into how to create sustainable, reusable, and properly modular code
  • Excellent written, verbal, interpersonal, and communication skills
  • Generous, curious, and humble

Responsibilities

  • Design and implement efficient data labeling workflows that integrate with data processing pipelines
  • Write appropriate statistical design and analysis plans for curating AI labeling datasets, so that labeling effort translates into improvements in model analytical performance or validation
  • Select and manage data labeling platforms and tools
  • Collaborate with stakeholders and SMEs to craft detailed labeling guidelines
  • Work with the TPM to recruit, train, and manage human labelers (internal or external)
  • Track labeler performance and provide feedback
  • Implement robust quality control measures to ensure labeling accuracy and consistency
  • Develop data pipelines for data preprocessing and labeled data integration, utilizing scalable data processing frameworks
  • Manage data storage and versioning for labeled datasets
  • Develop and refine tools to automate labeling tasks and improve efficiency
  • Integrate labeling platforms with other AI/ML tools
  • Create and implement quality assurance procedures for labeling
  • Collaborate closely with AI data scientists and engineers to understand labeling requirements
  • Understand the principles of AI models that explicitly learn from human feedback and assist humans in evaluating AI model output accuracy
  • Apply safeguards and protections in line with HIPAA and applicable privacy laws, and adhere to relevant compliance, quality, security, privacy, legal, and ethical standards for the use of AI

Preferred Qualifications

  • Experience with and/or knowledge of the healthcare data domain in at least one of: electronic medical records (EHR/EMR), clinical imaging, clinical workflows, electronic data capture (EDC), and/or clinical research/trials
  • Polyglot coder with hands-on development experience across additional modern programming languages (e.g., R, JavaScript, C++, …)
  • Familiarity with and documented experience in one or more deep learning frameworks (e.g., PyTorch, TensorFlow, MLflow, etc.)
  • Hands-on experience with modern, web-based compute and storage services (e.g., Databricks, AWS, Google Colab, Microsoft Azure, etc.)
  • Familiarity with the software development life cycle (SDLC)
  • Experience in creating detailed labeling guidelines and quality assurance processes
  • Ability to effectively manage and motivate human labelers

Benefits

  • Equity at hire
  • Discretionary annual bonus
  • Company benefits
  • 401(k) plan
  • Other great company “perks”
  • Hybrid or remote role
