Manually Constructed Lexicons and Grammars for NLP: Building Lingware for Smart Information Retrieval Systems

Elia, Annibale; Vellutino, Daniela; Monteleone, Mario; Marano, Federica
2010-01-01

Abstract

In this study, we deal with the creation of lingware for Natural Language Processing (NLP) applications, composed of terminological electronic dictionaries of multi-word expressions (in machine-readable form) and of local grammars (in the form of finite-state automata and transducers). Both parts of this lingware were built and applied according to Lexicon-Grammar (LG) formalization principles and methods; Lexicon-Grammar is the research framework practised by the “Maurice Gross Group” of the Communication Sciences Department at the University of Salerno (Italy). The electronic dictionaries, namely the compound-word dictionaries dealt with in this paper, include entries from the knowledge domain of European Community Information and have been extracted from 15 institutional glossaries produced within the framework of European Community lexicon policies, or built according to these policies. In addition, a text corpus (3,071,610 tokens) has been created from European Community government information, with texts extracted from European Community government Web sites. The finite-state automata and transducers were constructed manually using NLP software. Together, dictionaries and grammars will form the basis of a Smart Information Retrieval System, an application which will automatically recognize a given set of frequently asked questions (FAQs) on European Community Information, previously formalized as syntactic patterns inside local grammars.
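The grammars described in the abstract are built with dedicated LG-oriented NLP software; as a rough, hypothetical sketch of the recognition step, the Python fragment below approximates one formalized FAQ as a finite-state-style pattern applied to text tagged with a toy compound-word dictionary. All entries, tags, and sample questions are invented for illustration and are not taken from the paper.

import re

# Hypothetical compound-word dictionary entries (machine-readable form),
# each multi-word term mapped to a simplified category tag.
compound_dictionary = {
    "european social fund": "N+EU_FUNDING",
    "structural funds": "N+EU_FUNDING",
    "cohesion policy": "N+EU_POLICY",
}

# A toy "local grammar": a finite-state-style pattern over words and tags,
# approximating one formalized FAQ such as "How can I apply for <EU funding>?"
faq_pattern = re.compile(
    r"\bhow (?:can|do) (?:i|we) apply (?:for|to) (?:the )?<N\+EU_FUNDING>",
    re.IGNORECASE,
)

def tag_compounds(text: str) -> str:
    """Replace known multi-word expressions with their dictionary tags."""
    tagged = text.lower()
    for term, tag in compound_dictionary.items():
        tagged = tagged.replace(term, f"<{tag}>")
    return tagged

def matches_faq(question: str) -> bool:
    """Return True if the tagged question matches the formalized FAQ pattern."""
    return bool(faq_pattern.search(tag_compounds(question)))

print(matches_faq("How can I apply for the European Social Fund?"))  # True
print(matches_faq("When was the European Social Fund created?"))     # False

In the actual system, such patterns are expressed as finite-state automata and transducers rather than regular expressions, and the dictionaries cover the full glossary-derived terminology; the sketch only illustrates the principle of matching formalized question patterns against dictionary-tagged text.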
2010
ISBN: 9788675890805


Use this identifier to cite or link to this document: https://hdl.handle.net/11386/3000961
