Morphosemantic strategies for the automatic enrichment of Italian lexical databases in the medical domain
Pelosi, Serena (Conceptualization); Maisto, Alessandro (Methodology); Elia, Annibale (Validation)
2017-01-01
Abstract
Because clinical documents convey important information and the healthcare system produces large quantities of raw text, extracting and managing meaningful data from real text occurrences has become a key challenge in NLP research. In this paper we analyze a corpus of 5000 medical diagnoses with linguistic and computational tools that can access the semantics of the words and sentences it contains. Our morphosemantic method is grounded on a list of neoclassical formative elements pertaining to the medical domain, which has been used for the automatic creation and population of medical lexical resources. The outcomes of this work are automatically built electronic dictionaries and thesauri, and an annotated corpus for NLP in the medical domain.
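To make the idea of a morphosemantic analysis concrete, the following is a minimal sketch of how an Italian medical term might be decomposed into neoclassical formative elements and glossed. It is not the authors' implementation: the `PREFIXES` and `SUFFIXES` tables and the `analyze` function are hypothetical, and the handful of entries are illustrative stand-ins for the much larger list of formative elements the abstract refers to.

```python
# Illustrative tables only (assumption): a tiny sample of neoclassical
# formative elements used in Italian medical vocabulary, with English glosses.
PREFIXES = {
    "gastr": "stomach",
    "cardi": "heart",
    "epat": "liver",
    "nefr": "kidney",
}
SUFFIXES = {
    "ite": "inflammation",
    "patia": "disease",
    "algia": "pain",
    "ectomia": "surgical removal",
    "logia": "study",
}


def analyze(word):
    """Split a word into a known prefix and suffix and return their glosses.

    Returns a (prefix_gloss, suffix_gloss) tuple, or None when the word
    cannot be decomposed with the tables above.
    """
    word = word.lower()
    # Try longer suffixes first so that a longer match wins over a shorter one.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if not word.endswith(suffix):
            continue
        stem = word[: -len(suffix)]
        # Accept the stem as is, or with the linking vowel "o" removed
        # (as in cardi-o-patia).
        for candidate in (stem, stem.rstrip("o")):
            if candidate in PREFIXES:
                return PREFIXES[candidate], SUFFIXES[suffix]
    return None


if __name__ == "__main__":
    for term in ("gastrite", "cardiopatia", "nefrectomia", "frattura"):
        print(term, "->", analyze(term))
```

Under these assumptions, a term such as "cardiopatia" is glossed from its parts ("heart" plus "disease"), which is the kind of information that could be used to assign semantic labels to dictionary entries; a term with no matching elements, such as "frattura", is left unanalyzed.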