A computational intelligence text-based detection system of music plagiarism

Roberto De Prisco;Delfina Malandrino;Rocco Zaccagnino
2017

Abstract

Plagiarism has been a much-debated issue in recent years, especially in fields that generate large amounts of money, such as music. However, existing mechanisms for detecting plagiarism, i.e., copying the work of others and trying to pass it off as one's own, mainly apply superficial, brute-force string-matching techniques. Such well-known metrics, widely used to discover similarities in text documents, do not work well for discovering similarities in music compositions. This is because many text-similarity algorithms are based on the well-known bag-of-words representation of text. Despite its popularity, the bag-of-words representation ignores the semantics of the words, which is a weakness when the text represents a piece of music: there, the semantics of the words (sequences of notes) is fundamental to detecting similarities. In fact, despite the widespread belief that a few notes in common between two songs are enough to decide whether plagiarism has occurred, the analysis of similarities is a very complex process. In this work, we provide novel perspectives in the field of automatic music plagiarism detection. Specifically, we show how the advantages of a textual representation of music (simplicity, readability) can be exploited by a plagiarism detection system based on two computational intelligence modules: an unsupervised machine learning algorithm to retrieve similar melodies, and a fuzzy deep analyzer to discover plagiarism.
Given a dataset of melodies and a suspicious melody, our system performs three steps: (1) transforming the suspicious melody into a text representation; (2) retrieving a list of similar melodies by using a machine learning algorithm that learns, in an unsupervised way, fixed-length feature representations from variable-length pieces of text; and (3) analyzing and comparing the suspicious melody with this subset of similar scores by using a fuzzy degree of similarity, which ranges from 0 for melodies that are musically entirely different to 1 for identical melodies. The effectiveness of our approach was assessed with tests performed on a large dataset of ascertained plagiarism cases. Results show that it reaches an accuracy of 96.4%.
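
The three steps can be illustrated with a minimal Python sketch. The abstract does not name the text encoding, the learning algorithm, or the fuzzy measure, so everything here is an assumption made for illustration and is not the authors' method: melodies are encoded as signed pitch intervals, hashed n-gram counts stand in for the learned fixed-length representations, and a sequence-matching ratio stands in for the fuzzy degree of similarity. All function names and the example melodies are hypothetical.

```python
import zlib
from difflib import SequenceMatcher
from math import sqrt


def melody_to_text(pitches):
    """Step 1 (assumed encoding): MIDI pitches -> string of signed interval tokens."""
    # Intervals instead of absolute pitches make the text transposition-invariant.
    return " ".join(f"{b - a:+d}" for a, b in zip(pitches, pitches[1:]))


def fixed_length_vector(text, n=3, dim=256):
    """Step 2 stand-in: hash token n-grams into a vector of fixed length `dim`."""
    tokens = text.split()
    vec = [0.0] * dim
    for i in range(len(tokens) - n + 1):
        gram = " ".join(tokens[i:i + n])
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_text, dataset_texts, k=5):
    """Step 2: names of the k dataset melodies whose vectors are closest to the query."""
    q = fixed_length_vector(query_text)
    ranked = sorted(
        dataset_texts,
        key=lambda name: cosine(q, fixed_length_vector(dataset_texts[name])),
        reverse=True,
    )
    return ranked[:k]


def similarity_degree(text_a, text_b):
    """Step 3 stand-in: order-aware degree in [0, 1]; 1.0 for identical token sequences."""
    return SequenceMatcher(None, text_a.split(), text_b.split()).ratio()


# Hypothetical melodies as MIDI pitch numbers.
original = [60, 62, 64, 65, 67, 65, 64, 62]
transposed = [p + 5 for p in original]           # same tune, shifted up a fourth
unrelated = [60, 67, 59, 72, 61, 70, 58, 66]     # shares no interval with `original`

dataset = {
    "original": melody_to_text(original),
    "unrelated": melody_to_text(unrelated),
}
suspicious = melody_to_text(transposed)

for name in retrieve(suspicious, dataset, k=2):
    print(name, similarity_degree(suspicious, dataset[name]))
```

Unlike a plain bag-of-words count, both the n-gram vectors and the sequence-matching ratio depend on the order of the interval tokens, which is the property the abstract argues is essential for music. A real system would replace both stand-ins with the learned representation and the fuzzy analysis described in the paper.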
ISBN: 978-1-5386-1107-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4714136