Thursday, December 31, 2009

2009 Fall Preservation and Archiving Special Interest Group

The Preservation and Archiving Special Interest Group (PASIG) comprises educational and commercial institutions that meet semi-annually. Sun Microsystems and Stanford University are the main sponsors of PASIG, so the fall meetings have occurred in San Francisco (with the spring meetings rotating throughout the world). The Sheridan Libraries DataNet award was still recent news during the 2009 Fall meeting, so there was a great deal of interest regarding our initial plans.

I gave a presentation about the Data Conservancy (the name of our DataNet award) that focused on the proposal elements and initial technical architecture. The most important point I made was that there are significant unknowns and research questions related to data curation infrastructure, particularly storage systems. It is tempting to assume that storage is storage and one can simply deposit large amounts of complex scientific data onto storage systems to complete the curation process, but nothing could be further from the truth. Storage is a necessary but far from sufficient condition.

Even at the storage layer, there are fundamental questions regarding failure rates, large-scale system performance, interfaces with preservation and access systems, etc. There are several vendors within the storage hardware and software sector, so the ability to span different vendors is an important consideration as well. More recently, cloud-based storage (and associated services) has become more available as an option.

One of the key questions for the Data Conservancy will be comparative advantage. That is, how do we leverage the expertise and capabilities of partners, including commercial storage providers, while focusing on our core areas of expertise such as preservation policies, processes and actions? This type of question could be explored through PASIG.

Both educational and commercial institutions have presented their technology offerings for preservation and archiving through PASIG, which has been helpful in terms of developing a better understanding of the landscape. However, it is my hope that PASIG (and other digital preservation meetings) start to address the "ecological" view of infrastructure development. No single institution will develop the capacity to curate all scientific data, so it's critical for our community to consider the distributed nature of curation, including storage systems. As we build out our local node of infrastructure at Johns Hopkins through the Data Conservancy, we will undoubtedly need to find ways to integrate and connect with other nodes of infrastructure.

Wednesday, December 30, 2009

2009 Fall Digital Library Federation Forum

The most recent DLF Forum occurred November 11-12 in Long Beach, CA. As others have observed, this Forum represented a departure from previous Forums. Roy Tennant described the Forum in one of his blog posts. Roy was one of the members of an extraordinary planning committee that demonstrated professionalism of the highest order. It was an honor for me to work with this planning group.

The Forum came at a time of transition for DLF specifically and libraries more generally. Libraries continue to grapple with budget cuts, some of which seem quite severe, and with travel restrictions or embargoes that resulted in fewer attendees than at previous Forums. For these reasons, the planning committee and CLIR adopted a different approach for this Forum. The theme was "Strategies for Innovation" with a subtheme of "Getting Results." The Forum provided an opportunity for reflection on lessons learned and consideration of strategies for moving forward.

Perhaps what struck me the most about this Forum was the passion, energy, and thoughtful dialogue of every participant. I can honestly say I've never attended a meeting or forum where everyone was so focused and engaged over two days without digressing into what could have been many pathways for chaos. It was a daunting challenge to organize and implement such a Forum and our community rose to this challenge. In addition to the people in the room, there was an extremely rich "back channel" conversation through Second Life and Twitter. On several occasions, those of us in the room found ourselves responding to comments from our colleagues who participated from afar.

While there were many worthwhile observations, we tried to encapsulate the overarching ideas through the following principles:
  • Libraries must transform themselves to better support the mission of our institutions
  • Innovation is an essential component for transformation
  • Innovation comes in many varieties
  • Successful innovation cannot happen without effective people, processes, practices and technologies
This last point is one that has continued to bounce around my head since the Forum. I worry a great deal that libraries do not have "industrial strength" capabilities in terms of people, processes, practices and technologies. I have no doubt regarding our service orientation and commitment, but I believe we have a long way to go in terms of developing infrastructure -- both people and technology -- that can meet the needs of our scholars.

Perhaps this is something the to-be-hired DLF Program Officer will think about.

Tuesday, December 29, 2009

CNI Fall 2009 Membership Meeting

One of the most interesting presentations at the CNI meeting featured our own Sayeed Choudhury reporting on plans for the Data Conservancy, the DataNet project that he is leading. I will leave it to Sayeed to report on that, but I sat in on a couple of other good talks.

CLIR Postdoctoral Fellows program
Several current and former CLIR Postdoctoral Fellows talked about their experience working in libraries for the first time. This program attempts to bring recent Ph.D. recipients into the library to work on innovative new ways of integrating academic libraries into the teaching and research roles of the university. Gabrielle Dean of the Sheridan Libraries talked about some of the practical benefits of the program for the fellows. These include:
  • a chance to pursue a new career path
  • interesting things to work on and further develop your CV
  • interaction with other Fellows--this often becomes a long-term association
  • broader view of academia; scholars sometimes get so focused on their research that they don't see the bigger picture
Several library representatives who have hosted Fellows talked about the benefits of the program. The Fellows have instant credibility with the faculty and graduate students as a result of their recent research. There was much agreement that both the library and the Fellow benefit from this association. Several agreed that there needs to be a tangible project for the Fellow and a clear plan to integrate them into the library.

Institutional Repository at UC
Catherine Mitchell from the University of California talked about their IR known as eScholarship. They have decided to stop focusing so much on having faculty submit their already-published work, and instead, to play a larger role in publishing. They recently formed a committee comprising faculty and librarians to gather data about the publishing landscape at UC. Some of the key points they found were:
  • few faculty understood the term "open access" or "institutional repository"
  • the university needs to play a larger role in publishing, not just access
  • campus-based journal and monograph publishing needs more support (peer review, distribution, etc.)
  • multimedia publishing and data sets need support
This study and subsequent conversations with more faculty led to the following change in orientation for eScholarship:
  • not calling themselves a "repository" any more
  • they will focus on providing a compelling set of services for faculty rather than trying to get them on board with supporting open access or the institutional repository movement
  • librarians need to learn how to speak to users in a way that will catch their interest
  • eScholarship will be "rebranded" and focus on providing a publishing platform for faculty journals and monographs. This includes providing a clear distinction between peer-reviewed publications and others
  • new services such as the ability to see a rendering of the PDF before downloading it and tracking item "views" as well as downloads
These changes have brought them success in their new way of defining success (that is, value rather than just numbers). They have increased their journal publishing from 27 titles to 37 in a few months. Participating research units have increased by 10%. eScholarship has seven full-time employees at the California Digital Library plus multiple liaisons at each campus.

Monday, December 21, 2009

CNI 2009 Fall Membership Meeting

I spent a lot of the CNI 2009 Fall Membership Meeting in discussions with people about our current data publishing project and work planning for Year 1 activities for the Data Conservancy. I did manage, however, to make it to a number of the sessions.

Here are a few highlights from the CNI 2009 Fall Membership Meeting:

- A team from Los Alamos National Laboratory (LANL) presented their work thus far on a framework for annotation of scholarly (and other) resources. Dubbed the Open Annotation Collaboration, the work has its foundations in the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) work, in which the principals were heavily involved. This framework will allow annotation for a wide variety of applications and will provide strong support for annotation of immutable objects.

- A team with members from LANL and Old Dominion University presented on Memento, a system for viewing the web of the past. The system takes advantage of OAI-ORE and a facility of Web Architecture known as content negotiation to provide more seamless interaction with services that provide archived versions of their content (e.g., Internet Archive, Wikipedia). In addition to developing a new way of describing the relationships between the different temporal versions of a resource, the project has developed an application programming interface (API) that allows the various archives to expose their archived content in a consistent manner.

- The meeting ended with a talk from Bernard Frischer, the Director of the Virtual World Heritage Laboratory at the University of Virginia. He talked about using new 3D tools to support the work of humanists. He argued that most humanists rely heavily on 2D objects (printed text on the page) and would benefit tremendously from the availability of two more dimensions -- the third spatial dimension and the temporal (or time) dimension -- to support their research, teaching, and learning. He showed various examples of new tools and how they might help. The last thing that he showed us was a 3D animation of gladiators fighting, from which it was clear that his work would benefit from engagement with the gaming industry. The kinds of animation he showed are already available in video games.
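
For readers curious about what the content negotiation in the Memento item above looks like in practice, here is a minimal sketch in Python. It only builds the request header a client might send to ask for the version of a page as it existed at a given time. The header name `Accept-Datetime` is the one used in the later Memento specification (RFC 7089) and is an assumption relative to what was presented at CNI; the function name is hypothetical and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def datetime_negotiation_headers(when):
    """Build request headers asking for a resource as it existed near `when`.

    Hypothetical helper for illustration; the header name follows RFC 7089,
    which may differ from what the project used at the time of the talk.
    """
    # HTTP dates are expressed in GMT, so normalize to UTC first.
    when_utc = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# Ask for a page as it looked at noon (UTC) on the first day of the DLF Forum.
headers = datetime_negotiation_headers(
    datetime(2009, 11, 11, 12, 0, tzinfo=timezone.utc)
)
print(headers)
```

A client would attach these headers to an ordinary GET request; a server or archive that supports datetime negotiation could then respond with, or redirect to, the archived version closest to the requested time.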

Monday, December 14, 2009

Data Conservancy Presentation at Educause

At the 2009 National Educause Conference, I presented a talk about the Data Conservancy, our NSF DataNet-funded data curation infrastructure development effort. Rather than write about what I presented, I will point you to the recording for this session at Educause: