FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation
Pellegrino, Maria Angela; Esposito, Pasquale; Tuozzo, Gabriele
2025
Abstract
The Linguistic Linked Open Data (LLOD) Cloud has emerged as a cornerstone of linguistic research, fostering dataset sharing and data reuse. Leveraging Semantic Web technologies, LLOD provides a rich tapestry of interconnected linguistic datasets that underpin advancements in both linguistics and Natural Language Processing. However, the ecosystem faces challenges related to data accessibility, interoperability, and reuse. This article evaluates the compliance of LLOD datasets with the FAIR principles, i.e., Findability, Accessibility, Interoperability, and Reusability, to assess their quality. A systematic literature review was conducted, identifying 69 linguistic datasets published over the last decade (2014–2024) using Semantic Web technologies. The datasets were evaluated through KGHeartBeat, an automated framework that assesses linked data quality. The analysis focused on an alignment between the FAIR principles and quality dimensions, including Accessibility and Trust, revealing that LLOD datasets are only partially findable and accessible, with scarce interlinking and limited use of open licenses, which inhibits broader reuse. More specifically, the mapping proposed in this article is a novel and actionable alignment between quality dimensions and the FAIR principles, providing a structured framework for improving dataset compliance. The findings emphasize the need for enhanced accessibility, improved interlinking, and more widespread adoption of open licensing to maximize the value of LLOD for research and applications.