A Linguistic Semantic Text-Mining for Multiword Units

Alberto Postiglione; Mario Monteleone
2020-01-01

Abstract

A central problem in Information Retrieval (IR) is the processing of multi-word units (MWUs), which are often treated as impromptu, statistically tractable word combinations. Several linguistic studies show instead that MWUs, and compound nouns in particular, are usually units of “fixed meaning” with specific formal, morphological, grammatical and semantic characteristics. Moreover, they can be processed as electronic dictionary entries, which makes them useful for semantic-based IR. In this paper, we present a prototype of a semantic-based IR system that identifies multi-word units in English (or other natural-language) texts and associates each of them with one or more semantic domains. This allows automatic translation of MWUs from English (or another natural language) into Italian and vice versa. The system can be integrated into sites and portals, is suitable for any type of text, and provides relevant support for interactive Semantic Web (SW) platforms.
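
As a rough illustration of the dictionary-based idea described in the abstract, the following minimal sketch looks up multi-word units in a text by longest match against a small electronic dictionary, returning each unit with its semantic domains and an Italian translation. The dictionary entries, the names `Entry`, `MWU_DICTIONARY` and `find_mwus`, and the longest-match strategy are all assumptions made for this example; the abstract does not specify how the prototype is implemented.

```python
import re
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One dictionary entry: semantic domains plus an Italian translation."""
    domains: Tuple[str, ...]
    italian: str


# Hypothetical toy dictionary; the entries are illustrative only and are
# not taken from the paper's actual electronic dictionaries.
MWU_DICTIONARY: Dict[Tuple[str, ...], Entry] = {
    ("credit", "card"): Entry(("economics",), "carta di credito"),
    ("hard", "disk"): Entry(("computing",), "disco rigido"),
    ("operating", "system"): Entry(("computing",), "sistema operativo"),
}

# Longest entry in the dictionary, measured in tokens.
MAX_LEN = max(len(key) for key in MWU_DICTIONARY)


def find_mwus(text: str) -> List[Tuple[str, Entry]]:
    """Return the dictionary MWUs found in text, preferring the longest match."""
    tokens = re.findall(r"[a-z]+", text.lower())
    found: List[Tuple[str, Entry]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token units.
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            key = tuple(tokens[i:i + n])
            entry = MWU_DICTIONARY.get(key)
            if entry is not None:
                found.append((" ".join(key), entry))
                i += n  # skip past the matched unit
                break
        else:
            i += 1  # no unit starts at this token
    return found


if __name__ == "__main__":
    for unit, entry in find_mwus("The operating system wrote to the hard disk."):
        print(unit, entry.domains, entry.italian)
```

A real system along these lines would also need to handle inflected forms (for example plurals of compound nouns) and units that belong to several domains, which this sketch leaves out.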
Year: 2020
ISBN: 9788833691015

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4751357