Data at the Centre for Digital Music

The Centre for Digital Music (C4DM) at Queen Mary University of London is one of the leading research centres in audio and music technology and signal processing. C4DM uses a variety of data as research inputs, most obviously audio datasets, and produces several types of data as research outputs. These outputs include:

  1. manually annotated feature data (“reference annotations”) such as expert chord and key transcriptions of existing music recordings, which are used as comparative data for evaluating research work, and
  2. automatically produced annotations such as those accompanying the publication of methods for audio feature analysis.

C4DM also publishes some “data sets” which are actually software services calculating data dynamically or on demand, such as those provided by the SAWA (Sonic Annotator Web Application) feature extractor and the JISC-funded LinkedBrainz project (linkedbrainz.c4dmpresents.org).

With this project, we aim to improve the efficiency and sustainability of storage and publication efforts for those data sets produced as research outputs. In this proposal we will not handle audio data (which generally cannot be published because of copyright restrictions) or data that is generated dynamically, e.g. by web services. We believe this restriction is appropriate given the short timescale of the present proposal and its interaction with ongoing work funded by other grants (namely the Sound Software project described below). We anticipate that outcomes from this work will prove relevant for other types of research data. In subsequent work in the Sound Software project, we intend to build on this project to cover further data types.

The two classes of research data to be considered here are therefore:

  1. reference annotations produced within C4DM as discrete pieces of work;
  2. data accompanying research publications, such as the results of experiments, evaluation data, or representative outputs of an algorithm.

The common theme is that these data originate in C4DM and are to be published externally.