DAFx 2012

Following on from the pre-DAFx Software Bootcamp, the DAFx conference itself ran from 17th to 21st September. It gathers international researchers in audio effects and related fields - the programme covered spectral processing of audio, physical modelling, time and pitch scaling, auralisation, spatial processing, virtual analogue devices, sound synthesis, touchscreen devices, auditory scene analysis, feature extraction, source separation and other algorithms.

On Monday 17th September, we presented a 90-minute tutorial in conjunction with the Sound Software project on "More Effective Software and Data in Audio Research". Professor Mark Plumbley, the head of the Centre for Digital Music, opened by introducing reproducible research and the need for Open Access publications; I then discussed research data management; Chris Cannam showed how to write effective research software; and Professor Plumbley wrapped up the session with some pointers on what research groups can do to enable researchers to create reproducible research.

A 30-minute talk on data management can cover very little ground: just the most basic content soon fills the time slot, so decisions have to be made about what is most essential. In addition, the lack of thematic repositories within the discipline means there are few practical recommendations that can be given to researchers regarding data publication. On that basis, I aimed for an overview of data management considerations with one core message: preserve your data through backups.

The presentation was structured as a 4-step program to effective data management:

  • Preserve - make sure that your data gets backed up and archive it after your research
  • Document - make sure your data is understandable by others and your future self
  • Organise - use a folder structure and appropriate file names
  • Publish - make your data available for future use

Whilst introducing these themes, I raised relevant issues for each - e.g. infrastructure (and the lack of it!) and appropriate licensing of research data (suggesting CC0).

In order to motivate people regarding data management (particularly the basic general day-to-day backup of data), I included some evidence of the risk to data:

  • In a 2010 Ponemon report for Intel, 10.8% of laptops in US education and research were found to be lost before the end of their useful life - approximately 3 years on average, so the duration of a fairly quick PhD. A 2011 report on EU laptops produced very similar figures
  • In 2011, the US PC World website surveyed 63,000 of its readers on laptop reliability and found that over 22% of laptops had significant problems during their lifetime, and 18% of those (so about 1 in 25 laptops) had hard disk problems
  • The BBC Domesday project served as an example of how obsolescence can make data inaccessible unless serious efforts are made to recover it. The original 1986 project stored its data on obscure media in obscure formats (not that there were many open formats in 1986!) and relied on specific hardware (BBC Master computers and laserdiscs!). The data was, however, successfully recovered using hardware emulation and the original masters of the materials.

Professor Plumbley's wrap-up considered Open Access publication and the need for the heads of research groups to provide the necessary plans and tools to:

  • encourage Open Access;
  • create disaster recovery plans;
  • promote institutional data policies;
  • build data management plans;
  • provide basic development skills to researchers;
  • provide version control facilities.

As the tutorial was open to all attendees of the conference, around 80-100 people were present. During the rest of the conference I discussed data management informally with various people, from the UK and abroad.

In general, people seemed to accept data publication as a good thing, although some researchers felt that it meant giving away the opportunity for follow-on research. The difference between supporting data for publications, intermediate results and primary datasets should probably be incorporated in the training materials - probably around a discussion of "what is research data?".

Other researchers who had expected the session to be irrelevant to their needs were surprised to find that we presented possible solutions to their research data management problems - e.g. the idea of setting up a relatively lightweight research-group-specific repository rather than an entire institutional data management infrastructure.

The "why not back up to Dropbox?" question came up. My gut feeling is that institutional backups are better for both the researcher and the institution, but it's a question I need to look into in more detail. Overall, it felt as though more evidence of the value of data management would have helped when presenting it to this group of researchers - I now need to find some of that evidence.

I hope to incorporate points raised at DAFx in the tutorial for ISMIR at the start of October, and will use them to update the online materials for SoDaMaT.

Slides and handouts from the tutorial are available from the Sound Software website.