DSpace test repository: first user feedback

One of the main goals of our project is to test a pilot dataset repository for the Centre for Digital Music (C4DM). After surveying the user requirements and selecting DSpace from several software options, we installed it and started customising it. Once we felt the system was ready for some user testing, we selected five "power users" from among those who had been interviewed or had answered the online questionnaire and who had datasets ready to be published, and asked them to submit their data. Of the five users, three (A, B, and C) have given us detailed feedback so far.

Registration went smoothly. We created a "sandbox" sub-community for each user in which to experiment with submissions. In the email we sent them, we gave a brief description of the structure of DSpace (i.e. communities, collections, and items) and some directions on how to submit a dataset.

A point raised by users A and B was that, in general, more detailed explanations are needed. While users A and C understood the structure of DSpace (Communities, Collections, Items) and the Item = Dataset logic (see this post as well), user B initially tried to use a Collection to store a dataset, because he associated an Item with a File and wanted to create a dataset with multiple files. Once he reached the submission page, he quickly realised from the available metadata fields that this was not how it is meant to work.

Other parts that users felt needed more explanation were:

  • the general structure of the DSpace repository (communities, collections, items). User B suggested it might be a good idea to use more meaningful names, such as Dataset instead of Item;
  • the metadata fields in the Collection creation and Item submission processes (User B);
  • the purpose of the "Related resources" field (User A), used to store publications or other datasets that are related to the submitted one (e.g. subsets, cited by, ...);
  • the format to use in the "Citation" field (user A suggested BibTeX);
  • the way to add multiple keywords (user A suggested using comma-separated values, but in reality each new keyword is entered by clicking the "Add" button);
  • the Creative Commons licenses. User A had difficulty choosing the most suitable one, since he was not familiar with them. User C complained that there are two separate licenses (the "publication agreement" for the repository and the license for the dataset), which is confusing. Furthermore, the license page comes at the very end of the process, while he thinks it should be the first thing the user agrees to.
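For illustration, DSpace stores item metadata as qualified Dublin Core, so the fields listed above roughly correspond to key/value pairs like the ones sketched below. The exact qualifiers depend on how the submission form is configured, so both the field names and the values here should be read as assumptions, not as our actual configuration:

```python
# Illustrative only: a sketch of how one dataset submission's metadata
# might look as qualified Dublin Core key/value pairs. Field names and
# values are hypothetical examples.
dataset_metadata = {
    "dc.title": "Example dataset",                       # the Item's title
    "dc.subject": ["keyword1", "keyword2", "keyword3"],  # one entry per "Add" click
    "dc.identifier.citation": "...",                     # the "Citation" field (format unclear to users)
    "dc.relation": ["http://example.org/related-paper"], # "Related resources": papers, subsets, ...
    "dc.rights": "Creative Commons Attribution (CC BY)", # the dataset license
}

# Every value is either a single string or a list of strings.
for key, value in dataset_metadata.items():
    assert isinstance(value, (str, list))
```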

Furthermore, all three users would like a progress bar on the file upload page. This would be especially useful when uploading large files: in one case, the upload of a large file failed at first, and not knowing how much data had already been transferred was very frustrating.

In the email we sent to the users, we suggested using a ZIP archive to avoid having to upload many files one at a time through the web interface. For User C, this is a "suboptimal" solution (see this related post).

Editing metadata after a submission has been published is not as user-friendly as the submission itself, according to User A. User C also wanted to change the license afterwards, but that was not possible. This is in fact conceptually correct: once an Item is published for long-term preservation, it should not change (unless there are mistakes to correct). However, this point should be made clear to users in advance.

To summarise, users need at least some basic training, explaining the purpose of such a repository, the way it is structured, and how to use it.

Finally, the more intensive use of the system by several users who were not administrators also brought to light issues with access permissions. We were in fact already aware of the (to us, at least) strange way in which access control is managed in DSpace. Authentication is discussed in more detail here.