Monday, June 1, 2009

Open Repositories 2009

In mid-May, I attended the annual Open Repositories meeting for the third time. I noted after attending this conference for the first time in 2007 that it was the most practical conference that I had attended in some time. And the last two years have done nothing to disabuse me of that notion. In fact, if there were anything that I would complain about, it would be that the conference is a bit overwhelming because it is so information rich.

I should note that Sayeed Choudhury and Elliot Metsger also attended this meeting and have already blogged about it.

Data Curation, Archiving, and Preservation

Because of the Data Conservancy (our DataNet project currently in the start-up phase) and our DataPub project currently underway, curation of and long-term access to data is of key importance to the Digital Research & Curation Center (DRCC) and the Sheridan Libraries in general. Many of the presentations covered issues of interest in this area. I'll highlight a few of them below.

As Sayeed mentioned in his post, Michael Witt of Purdue spoke about research into the development of data curation profiles. This work is a collaboration between Purdue and UIUC's Graduate School of Library and Information Science. Their approach is based on discussions with researchers and employs an initial unstructured interview to get the conversation started. One of the most interesting findings thus far relates to issues of data sharing (with whom, after what activities). Michael presented an earlier version of this work at a Sun Preservation and Archiving Special Interest Group (PASIG) meeting. More information can be found on the project site.

John Kunze of the California Digital Library and our own Sayeed Choudhury both spoke in a session devoted to the recommended NSF DataNet projects. John spoke about the Data Observation Network for Earth (DataONE) project, led by the University of New Mexico. Sayeed spoke about our project, the Data Conservancy. My focus was on the IT and data frameworks of the two projects. The approaches are different in many ways, and it will be interesting to work together to establish the kinds of data management partnerships envisioned by NSF in the creation of the DataNet program.

In addition to the talks, Sayeed and I pulled together a birds of a feather session, which he was unfortunately unable to attend. I was there to represent the Data Conservancy's process and approach. John Kunze and Stephen Abrams, both of whom I was fortunate enough to wrangle at the last minute, represented the perspective and approach of DataONE.

Simple Web Service Offering Repository Deposit (SWORD) and the Open Archives Initiative Object Reuse and Exchange (OAI-ORE, or ORE for short) are two relatively recent developments meant to, respectively, reduce the burden of content deposit and improve the description and exchange of resource aggregations (think compound/complex objects) on the Web. We are employing both of these technologies in our DataPub (curating published data) project. Elliot has done a nice job of highlighting some of the ORE presentations in his post, so I will just add a few comments about the SWORD talks.

Pablo Fernicola gave a presentation describing work on an authoring add-in for Microsoft Word on the Windows platform. The add-in, currently in beta, will support ORE, SWORD, and the Publishing tagset of the NLM DTD. We have been working with Pablo on the ORE components of the add-in. This technology will allow an author to create a document, link it with data and rich media, describe the relationships among these components, and submit the package to a repository -- all without leaving Microsoft Word. While other approaches will be needed for other authoring environments (e.g., LaTeX), these tools go a long way toward lowering the barriers to contributing and reusing content.

Adrian Stevenson and Julie Allinson shared a talk describing ongoing work in the second phase of development (SWORD2) and some of the history behind the development of the original SWORD protocol specification and implementations. It is now possible to deposit content into a properly configured Fedora, DSpace, or EPrints repository through Facebook, a web client, and a desktop client (among others). As I mentioned previously, Microsoft Word will soon support SWORD deposit via an add-in.
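For readers who have not seen a SWORD deposit, here is a minimal sketch of what a client might send. It is only an illustration, not the code of any of the clients mentioned above. It assumes the SWORD 1.x convention of an HTTP POST of a zipped package to a collection URI, with the packaging format named in an X-Packaging header. The collection URI, file name, user name, and the build_sword_deposit helper are all hypothetical, and the request is built but never sent.

```python
import urllib.request

# Packaging identifier assumed here for a DSpace-style METS package;
# a real repository advertises the formats it accepts in its service document.
ASSUMED_PACKAGING = "http://purl.org/net/sword-types/METSDSpaceSIP"


def build_sword_deposit(collection_uri, package_bytes, filename,
                        packaging=ASSUMED_PACKAGING, on_behalf_of=None):
    """Build (but do not send) an HTTP POST resembling a SWORD 1.x deposit."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": packaging,
    }
    if on_behalf_of:
        # Mediated deposit: the client deposits on behalf of another user.
        headers["X-On-Behalf-Of"] = on_behalf_of
    return urllib.request.Request(
        collection_uri, data=package_bytes, headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Hypothetical collection URI and placeholder package content.
    request = build_sword_deposit(
        "https://repository.example.org/sword/deposit/collection-1",
        b"placeholder zip bytes",
        "article-package.zip",
        on_behalf_of="some-author",
    )
    print(request.get_method(), request.full_url)
    for name, value in request.header_items():
        print(f"{name}: {value}")
```

Actually sending the request would also require credentials and a repository configured to accept SWORD deposits, which is why the sketch stops at building it.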

Repository Challenge

The Repository Challenge started last year at the Southampton Open Repositories meeting and was organized by David Flanders, then of the JISC-funded Common Repositories Interface Group, with the goal of getting "developers working in small teams to try to quickly pull together established platforms and services to demonstrate how to achieve real-life, user-relevant scenarios and services."

This year's Challenge was again organized by Flanders, now of JISC proper.

The Repository Challenge winner this year was Tim Donohue of UIUC. Tim used JavaScript (JS) to implement a system he called "Mention It". This JS library allows a web page designer to embed into a web page an aggregation of mentions of a specified string on Twitter, FriendFeed, Technorati, and Google Blog Search. Among many other uses, this would allow repository developers to embed the display of mentions for a digital object by specifying its splash page, item, or Handle URI.

The runner-up, Rebecca Sutton Koesar of Emory, created FedoraFS, which combined a Fedora Commons repository with FUSE (Filesystem in Userspace) to support access to repository content as if it were stored in regular files. For example, a PDF file stored as a datastream within a Fedora digital object could be opened with a standard desktop PDF viewer. Her entry video is available on Vimeo.
