Sunday, December 11, 2011

7th International Data Curation Conference

Last week I attended the 7th International Data Curation Conference (IDCC, Twitter #idcc11) in Bristol, UK. The theme of this year's meeting was "Public? Private? Personal? navigating the open data landscape".

In addition to a large number of very stimulating presentations and discussions throughout, the meeting was bracketed by two interesting keynotes.

The opening keynote was delivered by Ewan McIntosh of NoTosh. He talked about moving away from the often dry messages of research, research data, and the data management that supports reuse, and toward stories that demonstrate the impact of the research and data. He talked about the work that his company has done with some young students to develop TED-like talks. Some of the lessons learned from this activity, he feels, are more broadly applicable to improving data reuse: 1. tell a story 2. create curiosity 3. create wonder 4. solve pain 5. create reasons to trade data.

The closing keynote was given by Natasa Milic-Frayling of Microsoft Research Cambridge (MSRC). Natasa talked about the Microsoft team's interaction with a working research team. The big lessons from their work relate to the disconnect between how clean and precise we would like the data management processes to be and how messy they are in real life.  The lab team's pragmatism is not typically mirrored by those of us who work in the realm of data management.  A couple of examples:
  • Though the researchers used electronic notebooks and many steps in their process were captured therein, only more informal summaries of these processes fed into next steps in the data provenance chain.
  • New technologies were brought to bear on the research team's workflow.  These technologies were not perfectly aligned with the needs of the project and there were some compromises made to achieve workable integrations.  These compromises made it difficult to accurately track and verify the origins of some information that became part of later data products.
Presence in the researchers' lab was the only way for the MSRC team to detect these issues. Understanding the lessons of this work will inform the way we engage with researchers in our own institutions.

As I mentioned previously, the conference had many excellent presentations.  Links to presentation slides and other resources, where available, have been added to the online conference program.

Friday, December 2, 2011

Maryland E-Books Summit

by Sue Vazakas

Kristin Bernet, Sharon Morris, and I attended this event in Columbia, MD, on November 29-30, 2011.

Pre-Summit: Publishing 101 and Libraries 2015

This 1/2-day event was intended to give some background about how the publishing industry works and what's going on right now in their world.

(1) Steve Potash, Overdrive
(2) Nora Rawlinson, Publisher, Earlyword: the Publisher/Librarian Connection
(3) Josh Marwell, Director of Sales, HarperCollins

(1) Overdrive provides e-books mostly to public libraries and is getting bigger and bigger. For example, if you go to the Baltimore County Libraries site, find an e-book, and click on the "download" link, you'll be taken to Maryland's Digital e-Library Consortium, "powered by Overdrive." Maryland's site had a growth of 300% over the past 1.5 years.

Interesting fact: Overdrive is adding more and more publishers to its service, and just signed O'Reilly, whose stuff will be DRM-free (as opposed to its stuff on Safari, which we have).

Another of their initiatives: adding a "buy it now" link on their site. Whichever library's portal you go through to get to that link will receive a % of your TOTAL purchase, even if it's not books! NYPL and Philadelphia Free Library are piloting this.

(2) Rawlinson discussed "The Big 6" publishers, which is actually more than 6: Random House, Simon and Schuster, Penguin, HarperCollins, Scholastic, Hachette, Harlequin, Wiley, Holtzbrinck (formerly Macmillan). Each has its intrigues; e.g., HarperCollins' "after 26 checkouts, library must buy it again," and Penguin's November pull-out of all new e-books from all public libraries, strangely citing "security concerns" even though Overdrive has DRM.

Another recent newsworthy event was Amazon's Jeff Bezos, rather than the publishers, setting the prices of bestsellers. But Apple suggested another model: the publisher sets the price and the seller can discount it, which Amazon did. The Kindle Fire loses $3 per sale, but Amazon is probably doing this so that it will sell more e-books. Amazon, the elephant in the room, also has several of its own imprints now.

(3) The HarperCollins speaker, when asked about the 26-checkout policy, said "it's always under review."

He believes that publishers add value by doing these things very well: developing talent, creating products (enhanced e-books and apps as well as print books), maintaining high quality of e-books, helping with e-book discovery, and marketing them, among other things.

They will be putting Espresso print-on-demand machines in some bookstores for their backlist trade publications.

As of now, 71% of their business comes from physical bookstores.

The full-day meeting had about 300 attendees, evenly divided among academics, schools, and public libraries.

(1) Sue Polanka -- eBooks in Libraries: Big Dreams Meet Reality
(2) Robert Miller -- A Digital Library: Too Little, Too Late, or Just in Time
(3) Mary Minow -- eBooks: Buying vs. Licensing
(4) Joseph Sanchez -- The eRoad Less Travelled: Getting Control, Staying Relevant
(5) Eli Neiburger -- Libraries and eBooks in this Century: What to Do Now, What to Do Later

Here are just brief highlights:

Sue Polanka -- Her blog is No Shelf Required. She gave an overview of e-book business models and pros/cons. Recommended Eric Hellman's Glue Jar blog. She has an article in the Nov/Dec 2011 issue of American Libraries about buying e-books.

She discussed other developments, including:

--- The 3M Cloud Library: You pay $10K for the platform, and it even includes a "discovery station" (shown on the site) that shows you book covers, allows searching and browsing, and then lets you download the e-book onto your USB!
--- Freading: Short-term loans for $.50-$2 each, nothing owned, no fees, you buy the same content again as many times as you want to read it. Interesting but needs *lots* more content.
--- Consumer subscription services: These include Amazon Prime, Baen, 24Symbols (Spain), Afictionado (UK). Easy to use and cheaper than buying.

Robert Miller -- Internet Archive

--- JHU should join!
--- They collect discarded books and even pay for shipping!
--- FREE e-books. They've had 10 million downloads per month on 2 million classic e-books!
--- Contact: 415-640-1092 (San Francisco)

Their system respects copyright laws because it's one user at a time. Duplicate books are not a problem, but they suggest choosing local items that are harder to find. His presentation blew everyone away.

Mary Minow -- She's a lawyer and Executive Editor of Stanford's Copyright and Fair Use site. She discussed "first sale," the HathiTrust lawsuit, DRM and DMCA, and the lack of accessibility of almost all e-devices.

Joseph Sanchez -- Fascinating fellow. Instructional designer at a college in Colorado; blog is thebookmyfriend. He gave an overview of the current "information ecology," including points such as

--- We're in the era of "non-expertise" (i.e., crowd-sourcing). Amazon and Apple aren't run by MBAs, just techies. They're both operating at a loss and don't care.
--- Overdrive doesn't manage its own DRM; Adobe Content Server does.
--- CIPA is Colorado Independent Publishers' Association. Sanchez's friend Jamie LaRue (whom Kristin and I heard at last month's NISO e-book conference) *bought* an Adobe CS system for $10K, then asked CIPA for their files so that their *own* consortium could be set up.
--- Patrons don't care about owning; they just want access.

Eli Neiburger -- This fellow was the lightning rod; one of this year's LJ "movers and shakers." He spoke very fast and was funny.

His scenario is that libraries *and* publishers are dead in their current forms. He showed a four-square chart, with combinations of "publishers thrive," "publishers die," "closed," and "open," and then described the different combinations.

The "open/publishers die" combination included "free," "no DRM," "no publisher deals because no publishers," and "libraries collect and store." For this combination, then, libraries need to have the infrastructure to collect and store, which we currently don't. We should store the content OF the community, rather than pay for commercial content FOR the community; e.g., school yearbooks (JHU technical reports), and material that doesn't exist anywhere else.

Other points:

--- Library economics work when each use gets *cheaper*, NOT when the cost stays the same.
--- The price of $9.99 for an e-book is insane. If they were $.99, the authors would get about five times more money.
--- Or, offer them free with something else. Example: in June 2011, 65% of the biggest money-making games were FREE to play.
--- Libraries must provide unique experiences, that you can't get anywhere else. Take half the collection money and spend it on events!
--- The Ann Arbor libraries started a site called Old News, which includes Ann Arbor newspapers that died and were digitized and put up for free, for *their* community and everyone else.
--- They also held "Scanning Day," on which people could bring a box of stuff and get it scanned while they waited. The library got great pictures never seen before.

Bottom line: Libraries must be prepared to pick up the pieces when the publishers (like ice houses and travel agents) are gone. The 21st century library must bring *its* community to the world.

Please let me know if you have questions.

Monday, November 14, 2011

Digital Library Federation Forum 2011

I attended the Digital Library Federation Forum in Baltimore, October 31-November 2, 2011. You can take a look at the program at the DLF website. A few themes that showed up multiple times were data curation, digital humanities, and project management. I will report on the last of these here.

Agile Methodology (Naomi Dushay and Tom Cramer, Stanford University)
The Stanford library has moved to Agile methodology for their software development over the past few years. Their presentation reported on their implementation of Blacklight using Agile. They are committed to testing their software at every turn, but it is difficult to do with many developers. Their solution is to use the Hudson software to keep track of how much of the code has been tested. If you get a bug report, you can see what hasn't been tested and start from there. Posting test coverage can work well as a "shame factor". You don't want to be the one to make the software fail.

Another problem that was identified in the development process is "too many meetings." Developers need blocks of time to code. To help give developers more free blocks of time, Stanford has instituted "developer happy hours" in which no meetings are scheduled. It has greatly helped morale. They also have monthly "dead weeks" in which all standing meetings are cancelled.

A final problem that they are addressing is competing priorities for developers' time. To help combat this, the project managers group meets monthly for iteration planning. Agile is difficult to do in a multi-project shop. Stanford has capped each developer at 2 projects per month. The iteration planning meeting produces a road map for the month--anyone can look at all the projects and see where there are dependencies. Jira is used for weekly sprints and its versioning feature helps a lot.

The Project One-Pager (Tito Sierra, MIT)
Digital projects fail for many reasons including:
  • goals of the project are unclear
  • there is disagreement about the goals of the project
  • the requirements are unrealistic or ambiguous
  • the project is inadequately staffed
  • there is a lack of consensus about project goals
  • the proposed schedule is unrealistic
  • the scope is poorly managed or unconstrained
  • project doesn't make sense on further inspection
Good planning and communication will improve resource allocation, schedule estimation, and scope management. Tito has developed a tool that helps with planning and communication. It is known as the "Project One-Pager", and it enables a shared understanding of the project before it begins. The Project One-Pager comprises six elements that are agreed to by all principals in the project. The elements are:
  1. Project title--a unique name to be used in all communication
  2. Objective statement--a concise high level summary of what the project intends to accomplish
  3. Requirements--a list of outcomes that must be achieved before the project is considered complete
  4. Out of scope--a list of outcomes that the project explicitly WILL NOT address
  5. Team--proposed core team roster with roles
  6. Schedule--a list of high-level milestones with proposed dates
The project manager drafts the initial version and then the team starts the iteration process. When all agree that each element is clearly defined, everyone signs off. Just to make sure there is no misunderstanding, they also have the dean or an appropriate dean sign off. This leads to a clearly defined scope that will not creep out of control.

The benefits of this tool are that it is simple and accessible by all, it creates useful documentation for use externally, and it can reveal fundamental problems at the start when costs are low. It is not used for managing the project throughout its life cycle, but it does offer a good starting point.
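
For anyone who wants to try the idea in a project tracker or script, here is a minimal sketch of the six elements as a small data structure. This is not Tito's tool: the class names, field names, the `extra_approvers` list (standing in for a dean's sign-off), and the example project are all made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Milestone:
    name: str
    due: date


@dataclass
class ProjectOnePager:
    # The six elements from the talk; all field names are illustrative.
    title: str
    objective: str
    requirements: list[str]
    out_of_scope: list[str]
    team: dict[str, str]            # person -> role
    schedule: list[Milestone]
    # Assumption: extra approvers (e.g. a dean) and sign-offs tracked by name.
    extra_approvers: list[str] = field(default_factory=list)
    signoffs: set[str] = field(default_factory=set)

    def missing_elements(self) -> list[str]:
        """Names of the six elements that are still empty."""
        elements = {
            "title": self.title,
            "objective": self.objective,
            "requirements": self.requirements,
            "out_of_scope": self.out_of_scope,
            "team": self.team,
            "schedule": self.schedule,
        }
        return [name for name, value in elements.items() if not value]

    def sign_off(self, person: str) -> None:
        """Record a sign-off; refuse while any element is undefined."""
        missing = self.missing_elements()
        if missing:
            raise ValueError(f"cannot sign off, still missing: {missing}")
        self.signoffs.add(person)

    def is_approved(self) -> bool:
        """True once every team member and extra approver has signed."""
        required = set(self.team) | set(self.extra_approvers)
        return not self.missing_elements() and required <= self.signoffs


# Made-up example data, for illustration only.
pager = ProjectOnePager(
    title="Example Digitization Pilot",
    objective="Put a small sample collection online.",
    requirements=["100 items scanned", "items searchable"],
    out_of_scope=[],                     # not yet agreed
    team={"Alex": "project manager", "Sam": "developer"},
    schedule=[Milestone("Scanning done", date(2012, 3, 1))],
    extra_approvers=["Dean"],
)
print(pager.missing_elements())
```

In this sketch nobody can sign off until the "out of scope" list is filled in, which mirrors the point that the one-pager is meant to surface gaps before work starts rather than to manage the project afterwards.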

Thursday, November 10, 2011

Charleston Conference 2011

I (Robin Sinn) also attended the Charleston Conference. Some of the things I found interesting are below.

  • Several talks focused on e-reference. One question was how to point undergraduate students to an online reference collection. One library actually adds a subject heading to the records for their online encyclopedias, so they can be found with a catalog search. Several vendors are trying to build databases of reference works. Springer has a new product, there's Credo, as well as older efforts like MIT CogNet.
  • Open data, access, and research were discussed. One of the plenary sessions featured Cliff Lynch (CNI) and Lee Dirks (Microsoft Research) talking about the disconnect between research and the science publishing industry, and how technology might be able to help. I was part of a panel discussion on open access. It was interesting to hear about how The Optical Society has two entirely separate backend systems for their open and subscription journals. One of their problems is how to take author payments from non-English speakers who may not have a credit card or other electronic form of payment handy.
  • One of the most interesting stories I heard was during one of the shotgun sessions. Librarians at Texas A&M - Commerce had to weed 40,000 items in a summer. They didn't talk to the faculty; they marketed what would happen with the new space; they created rules for temp hires to use when pulling; and everything went through cataloging, onto pallets, and into closed tractor trailers. They eliminated 41 tons of print in 4 weeks and replaced it with a newly acquired special collection and study space.
  • CrossMark is a new effort by CrossRef that will allow publishers to mark articles with information about retractions, errata, etc. This should help with version of record problems.

Tuesday, November 8, 2011

Charleston Conference 2011

I attended the annual Charleston Conference (Nov 2-5) and this year's theme was "Something's Gotta Give!"
I noticed two overarching themes for the plenary sessions: What we are giving up and how we are evolving, as well as "open." Open data, open research, etc.

Here are a few highlights:
  • I met one-on-one with the Credo Reference CEO, Mike Sweet, who showed me their new product, Literati. It is described as "a simplified, smart approach to managing information literacy" and I think it is definitely something to watch. It seems like they are also selling services, like creating video tutorials for libraries, which is a novel idea for vendors.

  • Linked data was another big point of discussion. For example, I attended a talk on Thursday morning on the semantic web (Highwire founder, Michael Keller) and then one immediately following called "Data Papers in the Network Era" (MacKenzie Smith, Research Director, MIT).

  • I organized a panel called, "Experiences from the Field: Choosing a Discovery Tool for YOUR Unique Library" which brought together five librarians from different types of libraries to explain what their evaluation process was like, why they chose the tool they did and the impact it has had so far. It was a success!

  • In keeping with Discovery, I attended a session from James Madison University on assessing usage statistics from Ebsco's Discovery, which helped to inform our current practices in usage data evaluation for Excelsior's instance of Discovery.

  • I attended a session that explained how 2 librarians from Eastern Michigan University evaluated their Wiley collection to move from a "Big deal" package to purchasing titles on an individual basis. They not only used usage data for each title in the package, but also had one-on-one faculty interviews to see which journals they deem important. It was a neat approach (that seemed like a lot of work), and they ended up saving a lot of money.

  • The closing session went back to the conference theme for this year and was called "The Status Quo Has Got to Go." The speaker, Brad Eden, Dean of Library Services at Valparaiso University, discussed everything libraries are doing wrong and how we can fix it. He ended with many motivational quotes, one of which was "Don't play it safe. This fosters mediocrity, which leads to decay in a competitive environment." (The powerpoint has many references to current articles and links and is worth checking out!)