FAIRness of the Linguistic Linked Open Data Cloud: an Empirical Investigation

Pellegrino, Maria Angela; Esposito, Pasquale; Tuozzo, Gabriele
2025

Abstract

The Linguistic Linked Open Data (LLOD) Cloud has emerged as a cornerstone of linguistic research, fostering dataset sharing and data reuse. Built on Semantic Web technologies, LLOD provides interconnected linguistic datasets that underpin advances in both linguistics and Natural Language Processing. However, the ecosystem faces challenges related to data accessibility, interoperability, and reuse. This article evaluates the compliance of LLOD datasets with the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) to assess their quality. A systematic literature review identified 69 linguistic datasets published with Semantic Web technologies over the last decade (2014–2024). The datasets were evaluated with KGHeartBeat, an automated framework that assesses linked data quality. The analysis relies on an alignment between the FAIR principles and quality dimensions, including Accessibility and Trust, and reveals that LLOD datasets are only partially findable and accessible, with scarce interlinking and limited use of open licenses, which inhibits broader reuse. The mapping proposed in this article is a novel and actionable alignment between quality dimensions and the FAIR principles, providing a structured framework for improving dataset compliance. The findings emphasize the need for enhanced accessibility, improved interlinking, and more widespread adoption of open licensing to maximize the value of LLOD for research and applications.
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4919648