CATALOGA: a Software for Semantic-Based Terminological Data Mining

Elia, Annibale; Postiglione, Alberto; Monteleone, Mario

doi:10.1109/CCP.2011.42

http://www.computer.org/portal/web/csdl/abs/proceedings/ccp/2011/4528/00/pccp201100toc.htm
http://www.computer.org/csdl/proceedings/ccp/2011/4528/00/4528a153-abs.html

This paper focuses on Cataloga, a software package based on the theoretical and practical analytical framework of Lexicon-Grammar. Cataloga embeds a lingware module built on compressed terminological electronic dictionaries. We show how Cataloga can achieve efficient data mining and information retrieval by associating a lexical ontology with terminology-based automatic textual analysis. We also show that accurate data compression is necessary to build efficient textual analysis software. To this end, we discuss the creation and functioning of a software tool for semantic-based terminological data mining, in which Italian electronic dictionaries of simple and compound words play a crucial role. Lexicon-Grammar is one of the most fruitful and consistent methods for natural language formalisation and automatic textual analysis. It was established by the French linguist Maurice Gross in the 1960s, and was subsequently developed for and applied to Italian by Annibale Elia, Emilio D’Agostino and Maurizio Martinelli. Essentially, Lexicon-Grammar establishes sets of morphosyntactic and statistical analytic rules for reading and parsing large textual corpora. The analytical procedure described here is appropriate for any type of digitised text and can provide relevant support for building and implementing interactive Semantic Web (SW) platforms.
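The abstract does not describe Cataloga's internals. As a rough illustration of how a terminological dictionary of simple and compound words can drive text annotation, the following Python sketch stores entries in a token-level trie (prefix tree) and tags text by greedy longest match, so that a compound such as "carta di credito" is recognised as a single term rather than as its parts. The class name `TermDictionary`, the example entries and tags such as `N:finance` are hypothetical. Note that a trie shares only common prefixes, whereas the compressed dictionaries used in Lexicon-Grammar tools are often stored as minimised automata that also share suffixes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    # Children keyed by token; a compound word is a path of several tokens.
    children: dict = field(default_factory=dict)
    # Tag stored on the node where a dictionary entry ends, otherwise None.
    tag: Optional[str] = None


class TermDictionary:
    """Hypothetical token-level trie: entries sharing leading words share nodes."""

    def __init__(self) -> None:
        self.root = Node()

    def add(self, entry: str, tag: str) -> None:
        # Walk (and create) one node per token of the entry, then tag the last node.
        node = self.root
        for token in entry.lower().split():
            node = node.children.setdefault(token, Node())
        node.tag = tag

    def annotate(self, text: str) -> list[tuple[str, str]]:
        """Greedy longest-match annotation: a compound wins over its parts."""
        tokens = text.lower().split()
        result = []
        i = 0
        while i < len(tokens):
            node = self.root
            best_end, best_tag = None, None
            j = i
            # Follow the trie as far as the text allows, remembering the
            # longest prefix that ends on a complete dictionary entry.
            while j < len(tokens) and tokens[j] in node.children:
                node = node.children[tokens[j]]
                j += 1
                if node.tag is not None:
                    best_end, best_tag = j, node.tag
            if best_end is None:
                # Token not in the dictionary: emit it untagged.
                result.append((tokens[i], "UNKNOWN"))
                i += 1
            else:
                result.append((" ".join(tokens[i:best_end]), best_tag))
                i = best_end
        return result


# Hypothetical Italian entries with illustrative domain tags.
d = TermDictionary()
d.add("carta", "N:generic")
d.add("carta di credito", "N:finance")
d.add("credito", "N:finance")
d.add("di", "PREP")

print(d.annotate("la carta di credito"))
# → [('la', 'UNKNOWN'), ('carta di credito', 'N:finance')]
```

Longest match is only one possible disambiguation strategy; a full Lexicon-Grammar analysis would also apply morphosyntactic rules before committing to a reading.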
