Master Thesis in Visual Representation of Technical Diagrams for LLMs

Bosch

📍Remote - Germany

Summary

Join Bosch and shape the future by contributing to a master's thesis focused on processing and representing technical diagrams for integration with Large Language Models (LLMs). You will research optimal methods for diagram comprehension, design a pipeline for creating a synthetic dataset linking diagrams to textual descriptions, and rigorously test the trained model on real-world diagrams. The project involves comparative analysis of state-of-the-art multi-modal models and investigating the potential of an abstract markup syntax for technical diagrams. The thesis is a 6-month position, with flexible work options available in Germany. Enrollment at a university is required.

Requirements

  • Master's studies in Computer Science, Data Science, or a comparable field
  • Proficiency in Python
  • A self-driven, proactive and solution-oriented individual with an independent work ethic
  • Strong passion for developing the best possible solutions
  • Fluent in English
  • Enrollment at a university

Responsibilities

  • Research optimal methods for processing and representing technical diagrams available solely as graphics for integration with Large Language Models (LLMs)
  • Explore the challenges and limitations of current multi-modal LLMs when handling visual data and propose novel approaches for diagram comprehension
  • Design and implement a pipeline for creating a synthetic dataset that links technical diagrams to textual descriptions in both directions (diagram-to-text and text-to-diagram)
  • Investigate how variations in synthetic dataset quality and structure influence model training and performance
  • Conduct rigorous testing of the trained model on real-world technical diagrams from diverse domains, such as engineering, biology or software development
  • Develop metrics to assess the model's ability to interpret, explain, and generate meaningful outputs based on diagram inputs
  • Perform a comparative analysis of state-of-the-art multi-modal models to determine their strengths and limitations in handling technical diagrams, and identify key factors (e.g., model architecture, training data) that influence performance on diagram-related tasks
  • Investigate the potential of an abstract markup syntax to serve as a universal intermediate representation for technical diagrams, and evaluate how such a syntax could improve model interpretability, training efficiency, and generalization across domains
  • Prototype and test possible syntaxes, assessing their practicality and alignment with existing diagramming standards

Preferred Qualifications

  • Some prior experience in building retrieval-augmented generation (RAG) systems or chatbots

Benefits

  • Flexible work from home in Germany
  • Work at the Bosch location in Abstatt

This job is filled or no longer available.