In this work, we present two modules for an open-source Python library for the analysis of the Italian language: a POS tagger based on the Averaged Perceptron Tagger, and a lemmatizer based on the vast collection of linguistic data held by the Department of Politics and Communication Science of the University of Salerno. While the Averaged Perceptron Tagger algorithm is mostly applied to English in well-known Python libraries such as NLTK and spaCy, the lemmatizer is an entirely original module that relies on a large electronic dictionary annotated with syntactic, morphological, and semantic tags. We present our approach and a preliminary experiment in which we compare the results of our modules with those of TreeTagger, another widely used POS tagger and lemmatizer.