Spotlight on Data Curation
Authors: Madeleine de Smaele, Jan van der Heul
Data curation is the process in which a data curator reviews a dataset and its accompanying documentation to identify ways to improve its findability, accessibility, interoperability, and reusability (FAIR). Now that our data curator Jan will soon be retiring, this is a wonderful opportunity to highlight the work he has carried out over many years: work that is often a bit hidden, but no less important.
A data curator is a specialist who examines the structure, context, and documentation of the data, ensuring that it meets the quality standards of the repository and follows established best practices. Importantly, the curator evaluates the presentation and usability of the dataset, not the scientific value of its content.
Each dataset deposited at 4TU.ResearchData undergoes a review, after which the curator provides the researcher with feedback and suggested improvements. Some changes—such as removing personally identifiable information—are required before the dataset can be accepted into the repository. Others, such as enhancing documentation or adding metadata, are strongly recommended to support long-term reuse.
All communication with the researcher takes place through personal contact, which helps the curator understand user needs and continuously improve the curation workflow and publication process. If a dataset cannot be accepted—for example, because it falls outside the scope of the repository—the depositor receives a personal message explaining the reason for the rejection and suggesting a more suitable repository for their dataset.
Jan says: “My personal contact with researchers guides them through the process and helps them learn how to publish better quality FAIR data. It’s gratifying to receive their positive feedback once I’ve helped them succeed in publishing their data.”
However, data curation doesn’t necessarily start when a dataset is deposited at 4TU.ResearchData. Even before the actual deposit or upload, the data curator offers assistance and advice on how to improve the quality of the dataset.
4TU.ResearchData considers curation important to its role as a trusted data repository, and we also see a clear awareness among users that curated data is far more likely to be reused in the future. It is all too common to encounter a potentially valuable dataset that lacks documentation, has unclear variable definitions, or is missing essential files. Data curation helps address these issues by supporting both the researchers who share their data and the users who may want to reuse it. Through careful review and improvement of documentation, structure, and metadata, curation strengthens the overall quality and usability of research data.
Jan has dedicated almost 15 years to data curation, quietly shaping the foundation for trustworthy data in our repository. His knowledge, reliability, and commitment to making data FAIR have had a lasting impact. He leaves behind not just well-curated data, but a legacy we will continue to build upon. Thank you, Jan. We wish you all the best in this next chapter.
- The 4TU.ResearchData Team
