The Europeana Working Group on Datasheets for Datasets was established in September 2022. It aims to provide a missing link for cultural heritage institutions and research centres that manage digital collections. But what exactly are datasheets?
Datasheets for datasets were developed in response to the lack of a standardised process for documenting datasets within the machine learning community. This gap is a major obstacle to the reuse and transparency of data. In a cultural context, datasheets should standardise how cultural and natural heritage datasets are described, in a format that is both human-readable and machine-readable. A dataset in this context could be, for example, a group of digitised artworks, a corpus of digitised books, a collection of newspapers, or a combination of these if the items have been grouped and used for research activities or machine learning. Establishing datasheets to describe such datasets helps make the data compliant with the FAIR principles (findable, accessible, interoperable, reusable) and enables efficient reuse, for example in AI applications.
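To make the idea of a description that is both human-readable and machine-readable more concrete, here is a minimal sketch in Python. It is purely illustrative: the class name, the field names and the example values are assumptions made for this sketch and are not taken from the Working Group's Datasheet Template.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Datasheet:
    """Hypothetical, simplified datasheet; fields are illustrative only."""
    title: str
    description: str
    item_types: list[str] = field(default_factory=list)
    licence: str = ""
    intended_uses: list[str] = field(default_factory=list)


def to_json(sheet: Datasheet) -> str:
    """Machine-readable view: serialise the datasheet as JSON."""
    return json.dumps(asdict(sheet), indent=2, ensure_ascii=False)


def to_text(sheet: Datasheet) -> str:
    """Human-readable view: one labelled line per field."""
    lines = [
        f"Title: {sheet.title}",
        f"Description: {sheet.description}",
        f"Item types: {', '.join(sheet.item_types)}",
        f"Licence: {sheet.licence}",
        f"Intended uses: {', '.join(sheet.intended_uses)}",
    ]
    return "\n".join(lines)


# Invented example values, for illustration only.
example = Datasheet(
    title="Digitised newspapers (sample)",
    description="A collection of digitised newspaper pages grouped for research.",
    item_types=["newspaper page"],
    licence="CC0",
    intended_uses=["text mining", "machine learning"],
)

print(to_text(example))
print(to_json(example))
```

The same record feeds both views, which is the point of the sketch: a person can read the text summary, while software can parse the JSON output without any manual interpretation.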
Additionally, in the context of the common European Dataspace for Cultural Heritage, a ‘Collections as Data’ workflow was developed, in which data documentation is addressed as one of the ten steps suggested for curating datasets.
The Working Group released Version 1 of the Datasheet Template in September 2023, and Version 2 followed two years later. If you want to learn more about the datasheets concept or how to get involved, you can read the news report on the data space website.
