FAIR Data Fund use case: Mining figures from scientific publications

We interview the author, Artur Schweidtmann, about his project funded by the 4TU.ResearchData FAIR Data Fund.

What is your project about?

The goal of our research is to make figures from peer-reviewed, open-access scientific publications FAIR (findable, accessible, interoperable, and reusable). Figures appear in almost all scientific publications and contain a lot of information. For example, graphs commonly show relevant correlations between measurements, and engineering diagrams show how the equipment in a plant is connected. However, this information is difficult to find. We automatically extract figures from scientific publications and classify them. In the future, we will make them accessible to the scientific community via a FAIR data platform, which will allow scientists to search directly for relevant figures. In the end, you can think of it as a clever search engine for scientific images. Moreover, the collected figures have great potential as training data for machine learning models in different domains.
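To make the idea of searching directly for figures more concrete, here is a minimal Python sketch. It is only an illustration and not the project's actual software or data model: the record fields, the `search_figures` helper, and the example entries (including the placeholder DOIs) are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FigureRecord:
    """Hypothetical metadata describing one mined figure."""
    source_doi: str    # identifier of the open-access source publication
    figure_label: str  # label used in the publication, e.g. "Figure 2"
    caption: str       # caption text extracted alongside the image
    figure_type: str   # class assigned by a classifier, e.g. "flowsheet"
    license: str       # reuse licence of the source publication


def search_figures(records, figure_type=None, keyword=None):
    """Return the records that match an optional type and caption keyword."""
    results = []
    for record in records:
        if figure_type is not None and record.figure_type != figure_type:
            continue
        if keyword is not None and keyword.lower() not in record.caption.lower():
            continue
        results.append(record)
    return results


# Made-up records; the DOIs are placeholders, not real publications.
EXAMPLE_RECORDS = [
    FigureRecord("10.0000/example.1", "Figure 1",
                 "Process flowsheet of a distillation column",
                 "flowsheet", "CC-BY-4.0"),
    FigureRecord("10.0000/example.2", "Figure 3",
                 "Yield as a function of reactor temperature",
                 "graph", "CC-BY-4.0"),
    FigureRecord("10.0000/example.3", "Figure 2",
                 "Flowsheet of the methanol synthesis process",
                 "flowsheet", "CC-BY-4.0"),
]

if __name__ == "__main__":
    # Find all flowsheets whose caption mentions methanol.
    for hit in search_figures(EXAMPLE_RECORDS, "flowsheet", "methanol"):
        print(hit.source_doi, hit.figure_label, hit.caption)
```

Storing the source identifier and licence with every figure is one simple way to keep each image traceable to its publication and clear about reuse, which is the kind of information a FAIR platform would need to expose.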

What are some key results that you can share?

We have developed software that can automatically extract images and classify them. This work increased the robustness of the approach and contributed to a publication [1], in which we automatically identified over 1,000 figures that show a specific type of (chemical) engineering diagram called a “flowsheet”. In the future, we will extend this approach to other types of figures and create an open platform where everyone can access the images. Moreover, we envision training machine learning algorithms on the mined flowsheet data to ultimately support the design of sustainable chemical processes.

[1] Balhorn, L. S., Gao, Q., Goldstein, D., & Schweidtmann, A. M. (2022). Flowsheet recognition using deep convolutional neural networks. In Computer Aided Chemical Engineering (Vol. 49, pp. 1567-1572). Elsevier.

How has the FAIR Data Fund helped you with your project? What is the added value?

The FAIR Data Fund has helped us significantly by co-financing a student assistant and an external software developer. This allowed us to further develop the algorithms and improve the robustness and quality of our code.
