The bibliographic archives used to study scientific collaboration can affect both the derived bibliometric indicators and the structure of the co-authorship network. Indeed, the most widely used international databases may not cover all kinds of works, especially for disciplines whose scientific production has a more national orientation. In such cases, integrating high-impact journal databases with specialized and local bibliographic archives may be the best compromise for obtaining good coverage of the entire research output of the scientists involved in a specific field. To carry out this task, two main challenges have to be addressed: 1) how to combine information by identifying and linking duplicate records, i.e. record linkage, and 2) how to deal with author name disambiguation, i.e. synonyms and polysemes. In this study, we discuss the main issues and practical considerations that arise when these two problems are addressed in order to improve the quality of co-authorship data for network analysis. Specifically, the bibliographic archives used in De Stefano et al. are joined to obtain a unified co-authorship network, based on both top international and nationally oriented scientific production of Italian academic statisticians. To this aim, in the first step a semi-automatic method was adopted to merge three bibliographic archives. In the second step, due to the lack of training data, a modified version of the techniques described in Strotmann et al. was used for author name disambiguation and provided promising results. Once we have assessed how well the two procedures perform in achieving high-quality results in the constructed co-authorship network, further statistical analyses will be devoted to identifying the co-authorship characteristics of the emerging groups of statisticians under analysis.
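As a purely illustrative aid, the record linkage step mentioned above can be sketched as follows. The abstract does not specify the semi-automatic method actually used, so everything here is an assumption: the record fields (`title`, `year`), the helper names (`normalize`, `is_duplicate`, `link_records`), the 0.9 similarity threshold, and the choice of a plain string-similarity ratio from the Python standard library. In a semi-automatic workflow, the candidate pairs returned by such a routine would presumably still be reviewed by hand.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def is_duplicate(rec_a, rec_b, threshold=0.9):
    """Flag two records as the same work (hypothetical rule):
    identical year and near-identical normalized title."""
    if rec_a["year"] != rec_b["year"]:
        return False
    ratio = SequenceMatcher(
        None, normalize(rec_a["title"]), normalize(rec_b["title"])
    ).ratio()
    return ratio >= threshold


def link_records(archive_a, archive_b, threshold=0.9):
    """Return index pairs (i, j) of candidate duplicates across two archives."""
    return [
        (i, j)
        for i, a in enumerate(archive_a)
        for j, b in enumerate(archive_b)
        if is_duplicate(a, b, threshold)
    ]
```

Comparing every pair of records is quadratic in the archive sizes; for larger archives one would typically first group records by a cheap key such as publication year and compare only within groups.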
|Title:||Improving co-authorship network structure by combining different data sources: issues and practical considerations|
|Publication date:||2015|
|Appears in types:||4.2 Abstract|