Research Data

C4DM has produced several hundred refereed research publications in the last decade, but only a small number of associated data sets have been published. Several reasons could be cited for this, including a failure to understand the benefits of data publishing, the lack of suitable infrastructure for managing and preserving research data, and the lack of impetus from the main funding bodies (EPSRC and EU) and from journals, which do not require data publication. Growing awareness of the principles of Reproducible Research as applied to digital music research has led to isolated examples of published research data, such as the automatic chord transcriptions, but this is the exception rather than the rule.

Output from automatic feature extraction algorithms is usually summarised for paper publications as a set of statistics describing the extent to which the output concurs with a set of manual annotations. This is a useful measure of progress in algorithm development, and is used, for example, in the MIREX series of international evaluations of music information retrieval algorithms. Such statistics do not, however, tell the whole story: they fail to inform readers which musical examples were analysed successfully and which examples the algorithm failed on. Characterising successes and failures is necessary in order to gain insight into how systems can be improved. Further, a difference in the mean performance of two algorithms is less revealing than a piecewise comparison of their performance, from the viewpoint of both informal and formal statistical analysis. Other reasons for data publication are to allow verification of published results and testing of reimplementations of published algorithms. Finally, algorithm outputs are needed by researchers who wish to assess others' work using criteria or metrics different from those appearing in the original publications.
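
The point about mean versus piecewise comparison can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the track names, the per-track scores and the function name paired_summary are hypothetical and are not taken from any C4DM or MIREX data set, and the sign test is just one simple choice of formal paired analysis.

```python
from math import comb
from statistics import mean

# Hypothetical per-track accuracy scores for two algorithms (illustrative only).
scores_a = {"track01": 0.80, "track02": 0.82, "track03": 0.78,
            "track04": 0.81, "track05": 0.30}
scores_b = {"track01": 0.70, "track02": 0.71, "track03": 0.69,
            "track04": 0.72, "track05": 0.68}


def paired_summary(a, b):
    """Compare two algorithms track by track (assumes identical track keys)."""
    diffs = {track: a[track] - b[track] for track in sorted(a)}
    wins_a = sum(d > 0 for d in diffs.values())
    wins_b = sum(d < 0 for d in diffs.values())
    n = wins_a + wins_b  # ties are dropped, as is usual for a sign test
    k = min(wins_a, wins_b)
    # Two-sided sign test: under the null hypothesis, wins ~ Binomial(n, 0.5).
    p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2 ** n) if n else 1.0
    return {
        "mean_difference": mean(a.values()) - mean(b.values()),
        "wins_a": wins_a,
        "wins_b": wins_b,
        "worst_track_for_a": min(diffs, key=diffs.get),
        "sign_test_p": p_value,
    }


summary = paired_summary(scores_a, scores_b)
for key, value in summary.items():
    print(f"{key}: {value}")
```

With these made-up scores the two means are almost identical, yet the first algorithm is better on four of the five tracks and fails badly on one. That pattern is visible only when per-track outputs are available, which is the kind of information that published summary statistics omit.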