Spotlight on Data Curation

Authors: Madeleine de Smaele, Jan van der Heul

Data curation is the process in which a data curator reviews a dataset and its accompanying documentation to identify ways to improve its findability, accessibility, interoperability, and reusability (FAIR). Now that our data curator Jan will soon be retiring, this is a wonderful opportunity to highlight the work he has carried out over many years: work that is often a bit hidden, but no less important.

A data curator is a specialist who examines the structure, context, and documentation of the data, ensuring that it meets the quality standards of the repository and follows established best practices. Importantly, the curator evaluates the presentation and usability of the dataset, not the scientific value of its content.

Each dataset deposited at 4TU.ResearchData undergoes a review, after which the curator provides the researcher with feedback and suggested improvements. Some changes, such as removing personally identifiable information, are required before the dataset can be accepted into the repository. Others, such as enhancing documentation or adding metadata, are strongly recommended to support long-term reuse.

All communication with the researcher takes place through personal contact, which helps the curator understand user needs and continuously improve the curation workflow and publication process. If a dataset cannot be accepted, for example because it falls outside the scope of the repository, the depositor receives a personal message explaining the reason for the rejection and suggesting a more suitable repository for their dataset.

Jan says: “My personal contact with researchers guides them through the process and helps them learn how to publish better quality FAIR data. It’s gratifying to receive their positive feedback once I’ve helped them succeed in publishing their data.”

However, data curation does not necessarily start when a dataset is deposited in 4TU.ResearchData. Even before the actual deposit or upload, the data curator offers assistance and advice on how to improve the quality of the dataset.

4TU.ResearchData, as a trusted data repository, considers curation important, and we also clearly see an awareness among users that curated data is far more likely to be reused in the future. It is all too common to encounter a potentially valuable dataset that lacks documentation, has unclear variable definitions, or is missing essential files. Data curation helps address these issues by supporting both the researchers who share their data and the users who may want to reuse it. Through careful review and improvement of documentation, structure, and metadata, curation strengthens the overall quality and usability of research data.

Jan has dedicated almost 15 years to data curation, quietly shaping the foundation for trustworthy data in our repository. His knowledge, reliability, and commitment to making data FAIR have had a lasting impact. He leaves behind not just well-curated data, but a legacy we will continue to build upon. Thank you, Jan, and we wish you all the best in this next chapter.
- The 4TU.ResearchData Team

Image: The FAIR principles. The Turing Way project illustration by Scriberia. Used under a CC-BY 4.0 licence. DOI: 10.5281/zenodo.3332807
