GitHub currently hosts more than 100 million public repositories. This has made it very popular to conduct Mining Software Repositories (MSR) studies. Researchers have been exploiting the information stored in GitHub (e.g., commits, pull requests, or issues) to investigate both developer- and project-related aspects. GitHub provides the REST API to make queries without cloning repositories. In this tool-demo paper, we highlight some issues we noticed when conducting an MSR study on GitHub by using the REST API and present G-Repo: a tool developed to support researchers when tackling these issues able to ease the creation of datasets for MSR studies. Also, we provide a manually-annotated dataset with information about the kind and the (spoken) languages of 1, 500 repositories hosted on GitHub. A video showing the functioning of G-Repo is available at: https://youtu.be/mb9CIALBFZk.
G-Repo: A Tool to Support MSR Studies on GitHub
Romano S.;Scanniello G.
2021-01-01
Abstract
GitHub currently hosts more than 100 million public repositories. This has made it very popular to conduct Mining Software Repositories (MSR) studies. Researchers have been exploiting the information stored in GitHub (e.g., commits, pull requests, or issues) to investigate both developer- and project-related aspects. GitHub provides the REST API to make queries without cloning repositories. In this tool-demo paper, we highlight some issues we noticed when conducting an MSR study on GitHub by using the REST API and present G-Repo: a tool developed to support researchers when tackling these issues able to ease the creation of datasets for MSR studies. Also, we provide a manually-annotated dataset with information about the kind and the (spoken) languages of 1, 500 repositories hosted on GitHub. A video showing the functioning of G-Repo is available at: https://youtu.be/mb9CIALBFZk.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.