Collections in Institutional Repositories
July 26, 2006 at 4:51 pm | In Thoughts | Tags: Institutional repositories
I stumbled across this post today on the Disruptive Library Technology Jester blog and found it very interesting. In our experiments with all of the repository solutions that the RUBRIC project has been investigating, we’ve also found that the concept of a “collection” is very difficult to manage. The problem stems from the fact that different people have different expectations about what a “collection” should be.
Take DSpace, for example. DSpace lets you set up very clearly defined collections, and it even goes one step further with “communities”, which are in essence collections of collections. This works well when you have a limited number of collections, all with very strict definitions of what should and should not be included in them. The troubles start when an item needs to be in more than one collection. As an example, consider a user who wants a thesis in a “Thesis” collection containing all theses from the institution, and the same thesis in the “Business Faculty Collection” because it was written while they were part of that faculty.
Sure, it is possible to map an item from one collection into another. But if you later change your mind and delete the item from the wrong collection, things tend to get very tricky, very quickly. Not to mention the issues that come up if you want to move large groups of items between collections. Steve Thomas outlines in his blog how he worked out how to do it. There is one other thing I discovered as part of my data migration work. When you export a collection of items out of DSpace, there is no metadata associated with each item that says which collection it came from. In essence, the relationship between the collection and the item is stored separately from the item, and so is lost when the item is outside the context of the repository. That may not seem like such a big deal, but when you’re focused on migrating large numbers of items from one repository to another, as I was, having all the metadata associated with an item travel with that item becomes very important.
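To make the export problem a little more concrete, here is a minimal sketch in Python. It is not DSpace's actual schema or API: the `items` dictionary, the `memberships` list, and the `export_item` helper are all hypothetical names invented for illustration. The idea is simply that if membership lives apart from the item, an export has to copy it into the item's own record, or it is lost.

```python
# Hypothetical in-memory model, invented for this sketch.
# It does not reflect DSpace's real storage layout or export tools.
items = {
    "thesis-001": {"title": "An Example Thesis", "author": "A. Student"},
}

# Membership is held apart from the items, as (collection, item id) pairs.
memberships = [
    ("Theses", "thesis-001"),
    ("Business Faculty Collection", "thesis-001"),
]


def export_item(item_id, items, memberships):
    """Return a copy of the item's metadata with its collections embedded."""
    record = dict(items[item_id])  # copy, so the stored item is untouched
    record["collections"] = sorted(
        collection for collection, member in memberships if member == item_id
    )
    return record


exported = export_item("thesis-001", items, memberships)
```

With something along these lines, the exported record carries its own list of collections, so the target repository can rebuild the memberships without needing access to the source repository.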
When we were exploring the VITAL repository from VTLS, which uses Fedora, we were interested in the RDF relationships that were becoming available for associating objects with one another. The VITAL software lets us express relationships between objects in RDF quite easily. The primary issue we discovered was that, by default, a “collection” of items didn’t look much different from an ordinary item. Our users had become accustomed to the DSpace way of presenting a “collection”.
Taking this one step further, it’s possible to imagine items that are made up of a number of smaller items, each of which is only a component of the larger item. Consider an item that is made up of a thesis, a dataset, a few movie files of experiments, and maybe even an audio recording of a presentation related to the thesis. Each individual component is interesting, but when combined they produce a much richer item. Creating an object like that in some institutional repositories would be a very interesting challenge. The ability to define relationships with RDF makes it even more interesting, if not necessarily less challenging.
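As a rough illustration of how relationships of this kind might describe both a compound item and its collections, here is a small Python sketch that uses plain (subject, predicate, object) tuples in place of real RDF. The identifiers and the predicate names `isPartOf` and `isMemberOfCollection` are chosen for illustration only, and this is not how VITAL or Fedora stores or queries its relationships.

```python
# Illustrative triples: (subject, predicate, object).
# Identifiers and predicate names are assumptions made for this sketch.
triples = [
    ("item:thesis", "isPartOf", "item:compound-1"),
    ("item:dataset", "isPartOf", "item:compound-1"),
    ("item:movie-1", "isPartOf", "item:compound-1"),
    ("item:audio", "isPartOf", "item:compound-1"),
    ("item:compound-1", "isMemberOfCollection", "collection:theses"),
    ("item:compound-1", "isMemberOfCollection", "collection:business-faculty"),
]


def subjects_of(triples, predicate, obj):
    """Everything that points at `obj` through `predicate`."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def objects_of(triples, subject, predicate):
    """Everything that `subject` points at through `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]


# The components that make up the compound item.
parts = subjects_of(triples, "isPartOf", "item:compound-1")

# The collections the compound item belongs to (more than one is fine).
collections = objects_of(triples, "item:compound-1", "isMemberOfCollection")
```

In this sketch, belonging to a second collection is just one more triple, and a "collection" and a "compound item" are both nothing more than things that other things point at. That flexibility is the appeal, and it is also why they can end up looking so alike to users.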
I think both Steve and Peter make very interesting points. The things that have stuck in my mind are:
- The term collection is one that gets used a great deal, and is really overused.
- If we’re going to use these new ways of expressing relationships, such as RDF, we’re going to need to be careful in how we implement them, and also manage user expectations.
- No matter how the collection, or aggregation, is achieved, keeping the structure as simple as possible is a very good thing.
These are exciting times in this area of library technology, and I look forward to watching it evolve and, hopefully, participating in the evolution.



