In this paper we use Nooj to solve a recognition and translation task on medical terms with a morphosemantic approach. The Medical domain is characterized by a huge number of different terms that appear in corpora with very low frequencies. For this reason, machine learning or statistical approaches do not achieve good results on this domain. In our work we apply a morpho-semantic approach that take advantage from a number of Italian and English word-formation strategies for the automatic analysis of Italian words and for the generation of Italian/English bilingual lexicons in the medical sub-code. Using Nooj we built a series of Italian and bilingual dictionaries of morphemes, a set of morphological grammars that specify how morphemes combine with each other, a syntactic grammar for the recognition of compound terms and a Finite State Transducer (FST) for the translation of medical terms based on morphemes. This approach produces as output: a categorized Italian electronic dictionary of medical simple words, provided with labels specifying the meaning of each term; a Thesaurus of simples and compounds medical terms, organized in 22 medical subcategories; A an Italian/English translation of medical terms.
Morpheme-Based Recognition and Translation of Medical Terms
MAISTO, ALESSANDRO;GUARASCI, RAFFAELE
2015-01-01
Abstract
In this paper we use Nooj to solve a recognition and translation task on medical terms with a morphosemantic approach. The Medical domain is characterized by a huge number of different terms that appear in corpora with very low frequencies. For this reason, machine learning or statistical approaches do not achieve good results on this domain. In our work we apply a morpho-semantic approach that take advantage from a number of Italian and English word-formation strategies for the automatic analysis of Italian words and for the generation of Italian/English bilingual lexicons in the medical sub-code. Using Nooj we built a series of Italian and bilingual dictionaries of morphemes, a set of morphological grammars that specify how morphemes combine with each other, a syntactic grammar for the recognition of compound terms and a Finite State Transducer (FST) for the translation of medical terms based on morphemes. This approach produces as output: a categorized Italian electronic dictionary of medical simple words, provided with labels specifying the meaning of each term; a Thesaurus of simples and compounds medical terms, organized in 22 medical subcategories; A an Italian/English translation of medical terms.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.