On March 22nd and 23rd I attended and presented at the ASIS&T Research Data Access & Preservation Summit in New Orleans. This is a small but focused meeting with about a hundred attendees. Participation was highly encouraged, with lots of questions, spontaneous discussion, and spur-of-the-moment lightning talks. Everyone there was focused on issues related to managing, curating, and preserving data. Overall there was considerable discussion throughout the two days centering on solutions for addressing data management requirements, with many institutions launching or using traditional institutional repositories to address these requirements, seemingly in the absence of other options. The other thread throughout the two days was the need to create training in data management and curation for experienced librarians who are being asked to help support these endeavors.
Here are some highlights from the panels, which were centered on the following five themes.
Data management plans and policies - Suzanne Allard with DataOne emphasized the need to focus on developing the tools scientists need in order to shift the culture. In particular, she pointed out their continued focus and work on interoperability and the need for everyone to work together on this. RUCore at Rutgers, built on Fedora, was showcased as one model for data management. They've integrated with the researcher's workflow system, created discipline-specific portals, and are developing tools such as RUAnalytics that let researchers annotate video, for example. The Texas Digital Library is also looking at workflow and has integrated their repository with the Open Journal System for their community.
Data Citation - It was noted that ASIS&T will soon be publishing best practices for citing data. DataCite's Mark Martin spoke to the confusion around citing data, e.g. no real acknowledgement, processed and packed forms, which mirror was used, which edition or version of the data, where to get the data, etc. The idea of citation landing pages prompted a heated discussion, ranging from concerns about whether you'd need landing pages for subsets of collections to others noting that this starts to sound like a MARC record system. Paul Uhlir from the National Academy of Sciences spoke about philosophical differences between the US and Europe in the approach to developing a citation style.
Curation Services and Models - David Minor of the University of California San Diego spoke to their experiences with high performance computing and storage management and recent pilots in data curation, ranging from observational data of the human brain to archeology to geological collections. Michael Witt explained how Purdue's PURR is being set up to support data management and the process they plan to put in place for managing data. I was one of the panelists for the Curation Services and Models panel and presented on the services we've been developing and building at Johns Hopkins. There was good interest in the work we're doing, particularly in how we've scoped and modeled service provision within the JHU DMS.
Sustainability - The ArXiv.org presentation pointed out how their community is expanding into other scientific domains and the success of this resource. Models for financially sustaining ArXiv.org are being reviewed, including a potential membership model. Dryad's Peggy Schaeffer talked about their financial model, which involves working with publishers and charging them a fee every time an author deposits data in Dryad in association with a publication. Fees are kept very low, but so is the storage allocated for that fee.
Training Data Management Practitioners - Kirk Borne of George Mason University shared his experiences with training high school students in data mining and the importance of building these skills in young people. Peter Fox talked about the Computer Science Programs at Rensselaer Polytechnic Institute and the need to think about application themes by domain. Jian Qin of Syracuse University talked about their new online training opportunities in data services. She noted that data literacy is important not just for science students but also for librarians.
Besides the panels, there was a lively poster session and lots of opportunities to network and learn from others. The models for data management and service provision range widely, as the needs of researchers and communities differ from institution to institution. - Barbara Pralle