Semantic-based Bilingual Text-Mining

POSTIGLIONE, Alberto; MONTELEONE, Mario
2016-01-01

Abstract

One of the most relevant problems in Information Retrieval (IR) software is the correct processing of complex lexical units, today also known as multiword units (MWUs). These shortcomings stem from the fact that such units are often treated as extemporaneous combinations of words, retrievable by means of statistical routines. On the contrary, several linguistic studies since the 1960s have shown that MWUs, and mainly compound nouns, are usually “fixed-meaning” units with specific formal, morphological, grammatical and semantic characteristics. Furthermore, these units can be processed as electronic dictionary entries, thus becoming concrete lingware tools for achieving efficient semantic-based IR. In this paper, we present a prototype version of B-TRI, an automatic semantic-based IR software application which aims to locate and display multiword units/compound words occurring in English texts (or in texts in any other natural language), to assign each of them to its appropriate semantic domain, and to translate them from English (or from any other natural language) into Italian and vice versa, without any human intervention. B-TRI is currently configured as stand-alone software, which can be integrated into websites and portals for online use. The analytical procedure described here will prove appropriate for any type of digitized text and will also provide relevant support for building and implementing Semantic Web (SW) interactive platforms.
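The abstract contrasts statistical retrieval of MWUs with treating them as electronic dictionary entries. The following is a minimal sketch of the dictionary-based idea only, assuming a toy in-memory lexicon: the names `MWUEntry`, `MWU_DICT` and `find_mwus` are hypothetical, the entries are illustrative and not taken from B-TRI's actual lexicon, and greedy longest-match lookup is a common choice assumed here, not the authors' documented method. Each recognized unit is returned with a semantic domain and an Italian equivalent.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MWUEntry:
    domain: str   # semantic domain label (hypothetical tag set)
    italian: str  # Italian equivalent of the English MWU


# Hypothetical toy dictionary: keys are lowercase token tuples.
# Entries are illustrative only, not B-TRI's real lexicon.
MWU_DICT = {
    ("information", "retrieval"): MWUEntry("Computer Science", "recupero dell'informazione"),
    ("semantic", "web"): MWUEntry("Computer Science", "web semantico"),
    ("compound", "noun"): MWUEntry("Linguistics", "nome composto"),
    ("electronic", "dictionary"): MWUEntry("Linguistics", "dizionario elettronico"),
}

# Longest MWU (in tokens) present in the dictionary.
MAX_LEN = max(len(key) for key in MWU_DICT)


def tokenize(text):
    """Split text into word tokens, keeping hyphenated words together."""
    return re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", text)


def find_mwus(text):
    """Return (surface form, domain, Italian) for each dictionary MWU found.

    Uses greedy longest match: at each position, try the longest
    candidate span first and skip past any unit that is recognized.
    """
    tokens = tokenize(text)
    lowered = [t.lower() for t in tokens]
    matches = []
    i = 0
    while i < len(tokens):
        # Try spans from MAX_LEN tokens down to 2 tokens (MWUs have 2+ words).
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            entry = MWU_DICT.get(tuple(lowered[i:i + n]))
            if entry is not None:
                matches.append((" ".join(tokens[i:i + n]), entry.domain, entry.italian))
                i += n
                break
        else:
            # No MWU starts at this token; move on.
            i += 1
    return matches


if __name__ == "__main__":
    sample = "Information Retrieval tools for the Semantic Web rarely treat a compound noun as a unit."
    for unit in find_mwus(sample):
        print(unit)
```

On the sample sentence this reports "Information Retrieval", "Semantic Web" and "compound noun", each with its domain and Italian equivalent. Because the units are looked up as whole dictionary entries rather than scored statistically, an unlisted combination such as "retrieval tools" is simply not reported; translating in the Italian-to-English direction would need a reverse index over the same entries.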
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4687553