Navigating the LOD Subclouds: Assessing Linked Open Data Quality by Domain

Tuozzo G.
2025

Abstract

The Linked Open Data (LOD) Cloud, a catalog of datasets that adhere to Linked Data principles, is a critical resource for Semantic Web applications. It is organized into domains known as subclouds. While the overall quality of the LOD Cloud has been examined extensively in prior research, the quality of its individual subclouds has received limited attention. This study addresses this gap by evaluating the quality of the LOD Cloud by domain, using the November 2024 snapshot as a reference. The assessment employs KGHeartBeat, a Knowledge Graph (KG) quality assessment tool that evaluates six quality categories: Accessibility, Contextual, Intrinsic, Dataset-Dynamicity, Trust, and Representational. The analysis covers 1,289 Linked Data (LD) datasets spanning nine subclouds, including Cross Domain, Government, Life Sciences, and User Generated. Results reveal significant quality variation across domains, likely influenced by the priorities of each domain's community. For example, the Linguistic subcloud excels in Representational quality, while the Life Sciences subcloud is strongest in the Intrinsic category. In contrast, the User Generated subcloud faces notable challenges in the Trust and Representational categories. A general trend across most subclouds is an improvement in Licensing information and Data Dump availability, contrasted by a decline in the availability of VoID files and SPARQL endpoints. This study improves our understanding of how domain-specific factors affect dataset quality and identifies areas that require targeted effort to achieve consistent quality improvements across the LOD Cloud.
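Comparing quality by domain amounts to grouping per-dataset scores by subcloud and summarizing each quality category within a group. The sketch below illustrates that idea under stated assumptions. The record layout, the use of a plain arithmetic mean, the function name `mean_by_subcloud`, and the toy values are all hypothetical. They are neither KGHeartBeat's API nor data from the study.

```python
from collections import defaultdict
from statistics import mean


def mean_by_subcloud(records):
    """Average quality scores per (subcloud, category) pair.

    Hypothetical record layout: each record is a dict with "subcloud",
    "category" and a numeric "score" (assumed here to lie in [0, 1]).
    """
    buckets = defaultdict(list)
    for rec in records:
        # Collect every score that belongs to the same domain and category.
        buckets[(rec["subcloud"], rec["category"])].append(rec["score"])
    # Summarize each bucket with its arithmetic mean (an illustrative choice).
    return {key: mean(values) for key, values in buckets.items()}


if __name__ == "__main__":
    # Toy records for illustration only; these are not results from the study.
    toy = [
        {"dataset": "ds-a", "subcloud": "Linguistic", "category": "Representational", "score": 0.8},
        {"dataset": "ds-b", "subcloud": "Linguistic", "category": "Representational", "score": 0.6},
        {"dataset": "ds-c", "subcloud": "Life Sciences", "category": "Intrinsic", "score": 0.9},
    ]
    for (subcloud, category), avg in sorted(mean_by_subcloud(toy).items()):
        print(f"{subcloud:15} {category:18} {avg:.2f}")
```

A mean is only one possible summary. A median, or the share of datasets that pass a check (for example, SPARQL endpoint availability), may be more robust when a subcloud contains a few outliers.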
ISBN: 979-8-4007-1331-6
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4919645