Moving from Tabular Knowledge Graph Quality Assessment to RDF Triples Leveraging ChatGPT

Tuozzo G.
2024

Abstract

Data quality assessment is a multifaceted challenge involving various dimensions such as accessibility, interlinking, and completeness. These dimensions are domain-dependent and can be aggregated into a score between 0 and 1, facilitating dataset ranking based on quality. Achieving effective representation and explanation of these rankings poses significant challenges akin to those in machine learning, where interpretability and understandability are crucial. In the domain of natural language processing, data interpretation is a critical yet complex process, often requiring domain expertise and significant resources. Large Language Models (LLMs) offer promise in automating annotation tasks, ensuring consistency, and adapting to specific domains. Leveraging such models for knowledge representation tasks necessitates adept prompt engineering. This study experiments with state-of-the-art prompt engineering methods, particularly using GPT-3.5, for representing knowledge related to dataset quality. By exploring techniques to extract RDF triples from textual data without predefined labels or constraints, this work aims to enhance the interpretability and understanding of dataset quality assessment results while verifying the feasibility of automatic knowledge representation leveraging LLMs.
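As a rough illustration of the kind of pipeline the abstract describes (prompt an LLM over quality-assessment text, then parse the reply into RDF triples), the following Python sketch shows one plausible setup. The prompt wording, the pipe-separated output format, and the `parse_triples` helper are illustrative assumptions, not the study's actual method.

```python
# Sketch of an open-ended triple-extraction prompt and a parser for the
# model's reply. The prompt text and the "subject | predicate | object"
# line format are assumptions for illustration, not the thesis's exact setup.

PROMPT_TEMPLATE = (
    "Extract RDF triples from the following dataset quality assessment text. "
    "Do not use predefined labels or constraints; output one triple per line "
    "as 'subject | predicate | object'.\n\nText: {text}"
)

def parse_triples(reply: str) -> list[tuple[str, str, str]]:
    """Parse a pipe-separated LLM reply into (subject, predicate, object) tuples."""
    triples = []
    for line in reply.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# Hypothetical model reply for a dataset ranked by an aggregated quality score:
reply = "datasetA | hasCompletenessScore | 0.82\ndatasetA | rankedAbove | datasetB"
print(parse_triples(reply))
```

In practice, `PROMPT_TEMPLATE.format(text=...)` would be sent to a chat-completion endpoint (e.g. GPT-3.5) and the returned text fed to `parse_triples`; only the parsing step is shown here because it runs without API access.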
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4919647
Warning: the data displayed have not been validated by the university.

Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: n/a