Publication | Journal of Computing and Information Science in Engineering 2023

What’s In A Name?

Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files

The natural language names designers use in CAD software are a valuable source of semantic knowledge.

This work investigates the value of the natural language part and document names that users provide when they create CAD models. As a first step towards multi-modal text-CAD learning, our results show that Large Language Models can leverage this noisy text data to predict part-part and part-whole relationships, with direct applications in automation and recommendation features such as part re-use, auto-complete, assembly categorization, smart tool suggestions, and library part recommendations.

Published in the Journal of Computing and Information Science in Engineering.

Abstract

Pete Meltzer, Joseph Lambourne, Daniele Grandi

Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks, from searching design repositories to constructing engineering knowledge bases. In this work, we propose that the natural language names designers use in Computer-Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as with other CAD and engineering-related tasks. In particular, we extract and clean a large corpus of natural language part, feature, and document names and use it to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts performance on all tasks, demonstrating the value of text data that has until now been largely ignored. We also identify key limitations of using LLMs with text data alone, and our findings provide strong motivation for further work on multi-modal text-geometry models. To aid and encourage further work in this area, we make all our data and code publicly available.
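
To make the idea of extracting and cleaning user-provided names more concrete, here is a minimal Python sketch. It is not the authors' pipeline: the cleaning rules, the list of default names, and the helper names `clean_name` and `part_pairs` are all assumptions chosen for illustration. It shows how noisy part names from one assembly might be normalized and turned into part-part pairs of the kind a language model could be evaluated on.

```python
import re
from itertools import combinations
from typing import List, Optional, Tuple

# Hypothetical pattern for software-generated default names (an assumption,
# not the cleaning rules used in the paper).
DEFAULT_NAME = re.compile(r"^(part|body|component|untitled)\s*\d*$")


def clean_name(raw: str) -> Optional[str]:
    """Normalize a user-provided name; return None if nothing meaningful is left."""
    # Drop a trailing CAD file extension (illustrative list only).
    name = re.sub(r"\.(step|stp|sldprt|ipt|f3d)$", "", raw.strip(), flags=re.IGNORECASE)
    # Treat underscores and hyphens as word separators.
    name = re.sub(r"[_\-]+", " ", name)
    # Remove a trailing copy marker such as "(1)" or a version tag such as "v2".
    name = re.sub(r"\s*(\(\d+\)|\bv\d+)$", "", name, flags=re.IGNORECASE)
    # Collapse whitespace and lowercase.
    name = re.sub(r"\s+", " ", name).strip().lower()
    if not name or DEFAULT_NAME.match(name):
        return None
    return name


def part_pairs(assembly_parts: List[str]) -> List[Tuple[str, str]]:
    """Return all unordered pairs of distinct cleaned part names in one assembly."""
    cleaned = sorted({c for c in map(clean_name, assembly_parts) if c})
    return list(combinations(cleaned, 2))


if __name__ == "__main__":
    names = ["Front_Wheel v2.step", "Part1", "Brake-Disc (1)", "front wheel"]
    print(part_pairs(names))
```

In this sketch, default names such as "Part1" are discarded because they carry no semantic signal, and duplicates that differ only in case, separators, or version suffix collapse to a single name before pairs are formed.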

Associated Researchers

Pete Meltzer

Manager, Machine Learning Engineering

Joseph George Lambourne

Senior Principal AI Research Scientist

Daniele Grandi

Principal Research Scientist
