Tuesday, May 26, 2009

Open Repositories 2009

Open Repositories (OR) continues to be the best conference I attend each year, and OR 09 didn't disappoint. The diversity of the presentations (and platforms) and the amount of talent and intellectual capital aggregated in one place facilitate discussion and ideas that might not happen otherwise. OR has "big ideas", cool software, and offers a preview of what the future holds for "repository" technology.

ORE Implementations

One of the nascent technologies at OR 08 was the OAI-ORE specification, which wasn't even in final draft at that point in time. Workshops on ORE had taken place independently prior to OR 08, and there was a workshop held for ORE at OR 08, but even as Herbert was presenting at OR 08, the draft was not final. At OR 09, there were a number of software stacks that leveraged or implemented ORE, including but not limited to ICE-TheOREm, LORE, and DSpace. At the risk of oversimplifying these technologies, I must make an attempt at summarizing their use of ORE.

ICE-TheOREm (presented by Jim Downing and Peter Sefton) integrates electronic thesis management with repository deposit. A thesis is broken down into its components (chapters, data sets) and an ORE object is used to represent the thesis. The ORE object is the "thing" that is exchanged between the thesis management system and the repository. The repository, upon receipt of the ORE object, dereferences the components of the thesis and ingests them into the repository, maintaining the semantics of the aggregation. One of the interesting things ICE allows is embargoes on specific portions of the thesis (say, a chapter or paragraph). Disaggregating thesis content in this way may allow more open access: a thesis that would normally be entirely embargoed can remain mostly open, with access restricted only to the embargoed chapter or paragraph.
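To make the embargo idea concrete, here is a minimal sketch of a thesis modeled as an aggregation of components, each with an optional embargo date. This is not ICE-TheOREm code; the class names, fields, and example.org URIs are all hypothetical and exist only to illustrate the per-component access decision.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Component:
    """One part of a thesis (hypothetical model, not the ICE-TheOREm schema)."""
    uri: str
    title: str
    embargo_until: Optional[date] = None  # None means no embargo

    def is_open(self, today: date) -> bool:
        # A component is open if it was never embargoed or the embargo has lapsed.
        return self.embargo_until is None or today >= self.embargo_until


@dataclass
class ThesisAggregation:
    """The aggregation that stands for the thesis as a whole."""
    uri: str
    components: List[Component] = field(default_factory=list)

    def open_components(self, today: date) -> List[Component]:
        return [c for c in self.components if c.is_open(today)]


# Illustrative data only: one embargoed chapter out of three components.
thesis = ThesisAggregation(
    uri="http://example.org/thesis/1",
    components=[
        Component("http://example.org/thesis/1/ch1", "Introduction"),
        Component("http://example.org/thesis/1/ch2", "Results",
                  embargo_until=date(2011, 1, 1)),
        Component("http://example.org/thesis/1/data", "Data set"),
    ],
)

open_titles = [c.title for c in thesis.open_components(date(2009, 5, 26))]
```

With the embargo attached to the chapter rather than to the thesis, two of the three components stay accessible while the embargo is in force.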

LORE (Literature Object Re-use and Exchange) is a fascinating Firefox browser plugin, allowing ORE object graphs to be created, visualized, edited, and annotated in the browser and saved back out to the repository. LORE objects encapsulate FRBR bibliographic data with digital resources. LORE is not released yet, but will be released under GPL v3. I imagine there are ways this plugin could be used in the Rose scholarly community. LORE leverages Sesame 2 to store ORE objects. When stored in a repository (in this case a Fedora repository) the ORE ReM is serialized as RDF/XML. The domain ontology is configurable, allowing it to be adapted for other disciplines. Future developments include a rules engine to infer object relationships, and the ability to attach license information to the objects in order to facilitate re-use.
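For readers who haven't seen a Resource Map on the wire, the sketch below builds a bare-bones one and serializes it as RDF/XML using only the Python standard library. It is not how LORE does it. The function name and the example.org URIs are made up, and the output is deliberately incomplete: a conforming ReM also carries metadata such as a creator and a modification date, which I've left out to keep the structure visible. The only ORE terms used are `ore:describes` and `ore:aggregates`.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"


def resource_map_to_rdfxml(rem_uri, aggregation_uri, resource_uris):
    """Serialize a minimal Resource Map (hypothetical helper, not a LORE API)."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("ore", ORE)

    root = ET.Element(f"{{{RDF}}}RDF")

    # The Resource Map describes exactly one Aggregation.
    rem = ET.SubElement(root, f"{{{RDF}}}Description",
                        {f"{{{RDF}}}about": rem_uri})
    ET.SubElement(rem, f"{{{ORE}}}describes",
                  {f"{{{RDF}}}resource": aggregation_uri})

    # The Aggregation lists the resources it aggregates.
    agg = ET.SubElement(root, f"{{{RDF}}}Description",
                        {f"{{{RDF}}}about": aggregation_uri})
    for uri in resource_uris:
        ET.SubElement(agg, f"{{{ORE}}}aggregates",
                      {f"{{{RDF}}}resource": uri})

    return ET.tostring(root, encoding="unicode")


# Illustrative URIs only.
rdfxml = resource_map_to_rdfxml(
    "http://example.org/rem/1",
    "http://example.org/aggregation/1",
    ["http://example.org/article.pdf", "http://example.org/dataset.csv"],
)
print(rdfxml)
```

The point to notice is the indirection: the ReM is one resource, the aggregation it describes is another, and the aggregated resources are referenced by URI rather than embedded.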

Finally, the Texas Digital Library machine implemented ORE on top of DSpace, allowing DSpace objects to be exposed as ORE Resource Maps (ReMs). The DSpace data model was mapped to ORE, and a crosswalk was written from DSpace DC to ORE. This way members of the TDL DSpace federation can expose their collections using ORE via OAI-PMH. The federating server, with some modifications, harvests the ORE ReMs, and is able to present "real" DSpace collections, or it can present a collection as a view on an ORE aggregation. The TDL work allows for three levels of aggregation ranging from metadata only, to metadata with references to the items (with the bits stored on the member server), to metadata with references to the items (with the bits stored on the federating server).
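The three levels are easier to keep straight when written down as data. The sketch below is my own restatement of them, not TDL code; the enum members, the function, and the dictionary keys are hypothetical names chosen for illustration.

```python
from enum import Enum


class AggregationLevel(Enum):
    """The three levels described above (names are my own, not TDL's)."""
    METADATA_ONLY = 1
    REFERENCES_BITS_ON_MEMBER = 2
    REFERENCES_BITS_ON_FEDERATION = 3


def harvest_plan(level: AggregationLevel) -> dict:
    """Return what the federating server would do at a given level."""
    return {
        # Metadata is harvested at every level.
        "harvest_metadata": True,
        # Levels 2 and 3 also carry references to the items.
        "reference_items": level is not AggregationLevel.METADATA_ONLY,
        # Only at level 3 do the bits live on the federating server.
        "bits_on_federating_server":
            level is AggregationLevel.REFERENCES_BITS_ON_FEDERATION,
    }


plans = {level.name: harvest_plan(level) for level in AggregationLevel}
```

The only difference between the second and third levels is where the bits end up, which is why a single flag is enough to tell them apart.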

Nascent Tech

The nascent technology this year is the DuraSpace initiative, which promotes a web-based approach to repositories: web APIs, cloud storage (DuraCloud), etc. One of the things that the DSpace Foundation brings to the DuraSpace table is the large existing install base of DSpace 1.x repositories. With its soup-to-nuts approach to repository implementation and one-size-fits-all data model, DSpace 1.x hits a sweet spot for many institutions, including museums and other cultural heritage organizations. If DuraSpace can provide hosting solutions for these folks, there seems to be a potential source of untapped revenue. It was encouraging to hear that prototypes of DuraCloud exist, with a formal release "winterish" 2009.

The second nascent technology this year is DSpace 2 (DS2). DSpace 2 is a ground-up, clean-room rewrite of DSpace in an effort to modernize its architecture: make the data model flexible, make the architecture pluggable, and enable re-use of shared components between web developers (Cocoon blocks) and repository developers (Akubra) alike. The architecture is solid, but the implementation is neither feature compatible (with DSpace 1.x) nor feature complete. The future of DS2 is cloudy in my mind, but I also didn't attend any of the roadmap sessions where that may have been resolved.

Cool Stuff

Vireo is an ETD submission and management system developed by the TDL machine, and designed for use by the 18 (?) schools of the TDL (labs demo). It is implemented using Manakin, and requires three additions to the DSpace database schema. Otherwise the DSpace codebase remains untouched. Vireo deserves a more complete review, but I'm running out of steam. It is Shibboleth aware, can be configured for different schools' workflows, and has all kinds of ajaxy goodie bits. It doesn't prescribe a workflow, so the system is quite flexible. However, it does gently nudge users and administrators in the right direction.

Matt Zumwalt of MediaShelf presented on ActiveFedora, which is a Ruby API over the Fedora API, allowing rapid development and prototyping of lightweight user interfaces on top of Fedora.


There were some themes throughout the conference: data modeling - approaches to atomistic modeling.

Lowering barriers to entry: on the user side (reducing the resistance of the wire, appropriate incentives, proper licensing of digital content) and the developer side.

Repository tech seems to be moving further towards the service/infrastructure layer, if only because more and more interactive applications are being built on top of it, enabled/abstracted by technologies like OAI-ORE and SWORD. The end user doesn't know, and wouldn't care, that various services like search, storage, or deposit are being handled by the repository.
