Master Thesis in Visual Representation of Technical Diagrams for LLMs

Bosch

๐Ÿ“Remote - Germany

Summary

Join Bosch and shape the future by contributing to a master's thesis focused on processing and representing technical diagrams for integration with Large Language Models (LLMs). You will research optimal methods for diagram comprehension, design a pipeline for creating a synthetic dataset linking diagrams to textual descriptions, and rigorously test the trained model on real-world diagrams. The project involves comparative analysis of state-of-the-art multi-modal models and investigating the potential of an abstract markup syntax for technical diagrams. The thesis is a 6-month position, with flexible work options available in Germany. Enrollment at a university is required.

Requirements

  • Master's studies in Computer Science, Data Science, or a comparable field
  • Proficiency in Python
  • A self-driven, proactive and solution-oriented individual with an independent work ethic
  • Strong passion for developing the best possible solutions
  • Fluent in English
  • Enrollment at university

Responsibilities

  • Research optimal methods for processing and representing technical diagrams available solely as graphics for integration with Large Language Models (LLMs)
  • Explore the challenges and limitations of current multi-modal LLMs when handling visual data and propose novel approaches for diagram comprehension
  • Design and implement a pipeline for creating a synthetic dataset that links technical diagrams to textual descriptions in both directions (diagram-to-text and text-to-diagram)
  • Investigate how variations in synthetic dataset quality and structure influence model training and performance
  • Conduct rigorous testing of the trained model on real-world technical diagrams from diverse domains, such as engineering, biology or software development
  • Develop metrics to assess the model's ability to interpret, explain, and generate meaningful outputs based on diagram inputs
  • Perform a comparative analysis of state-of-the-art multi-modal models to determine their strengths and limitations in handling technical diagrams, and identify key factors (e.g., model architecture, training data) that influence performance on diagram-related tasks
  • Investigate the potential of an abstract markup syntax to serve as a universal intermediate representation for technical diagrams, and evaluate how such a syntax could improve model interpretability, training efficiency, and generalization across domains
  • Prototype and test possible syntaxes, assessing their practicality and alignment with existing diagramming standards

Preferred Qualifications

  • Some prior experience in building RAG systems or chatbots

Benefits

  • Flexible work from home in Germany
  • Work at the Bosch location in Abstatt
