Sunday, December 11, 2011

7th International Data Curation Conference

Last week I attended the 7th International Data Curation Conference (IDCC, Twitter #idcc11) in Bristol, UK. The theme of this year's meeting was "Public? Private? Personal? navigating the open data landscape".

In addition to a large number of very stimulating presentations and discussions throughout, the meeting was bracketed by two interesting keynotes.

The opening keynote was delivered by Ewan McIntosh of NoTosh. He talked about moving away from the often dry messages of research, research data, and the data management that supports reuse, and toward stories that demonstrate the impact of the research and data. He talked about the work that his company has done with some young students to develop TED-like talks. Some of the lessons learned from this activity, he feels, are more broadly applicable to improving data reuse: (1) tell a story; (2) create curiosity; (3) create wonder; (4) solve pain; (5) create reasons to trade data.

The closing keynote was given by Natasa Milic-Frayling of Microsoft Research Cambridge (MSRC). Natasa talked about the Microsoft team's interaction with a working research team. The big lessons from their work relate to the disconnect between how clean and precise we would like the data management processes to be and how messy they are in real life.  The lab team's pragmatism is not typically mirrored by those of us who work in the realm of data management.  A couple of examples:
  • Though the researchers used electronic notebooks and many steps in their process were captured therein, only more informal summaries of these processes fed into next steps in the data provenance chain.
  • New technologies were brought to bear on the research team's workflow.  These technologies were not perfectly aligned with the needs of the project and there were some compromises made to achieve workable integrations.  These compromises made it difficult to accurately track and verify the origins of some information that became part of later data products.
Presence in the researchers' lab was the only way for the MSRC team to detect these issues. Understanding the lessons of this work will inform the way we engage with researchers in our own institutions.

As I mentioned previously, the conference had many excellent presentations.  Links to presentation slides and other resources, where available, have been added to the online conference program.

Friday, December 2, 2011

Maryland E-Books Summit

by Sue Vazakas

Kristin Bernet, Sharon Morris, and I attended this event in Columbia, MD, on November 29-30, 2011.

Pre-Summit: Publishing 101 and Libraries 2015

This half-day event was intended to give some background about how the publishing industry works and what's going on in its world right now.

(1) Steve Potash, Overdrive
(2) Nora Rawlinson, Publisher, Earlyword: the Publisher/Librarian Connection
(3) Josh Marwell, Director of Sales, HarperCollins

(1) Overdrive provides e-books mostly to public libraries and is getting bigger and bigger. For example, if you go to the Baltimore County Libraries site, find an e-book, and click on the "download" link, you'll be taken to Maryland's Digital e-Library Consortium, "powered by Overdrive." Maryland's site grew 300% over the past 1.5 years.

Interesting fact: Overdrive is adding more and more publishers to its service, and just signed O'Reilly, whose stuff will be DRM-free (as opposed to its stuff on Safari, which we have).

Another of their initiatives: adding a "buy it now" link on their site. Whichever library's portal you go through to get to that link will receive a % of your TOTAL purchase, even if it's not books! NYPL and Philadelphia Free Library are piloting this.

(2) Rawlinson discussed "The Big 6" publishers, which is actually more than 6: Random House, Simon and Schuster, Penguin, HarperCollins, Scholastic, Hachette, Harlequin, Wiley, Holtzbrinck (formerly Macmillan). Each has its intrigues; e.g., HarperCollins' "after 26 checkouts, library must buy it again," and Penguin's November pull-out of all new e-books from all public libraries, strangely citing "security concerns" even though Overdrive has DRM.

Another recent newsworthy event was Amazon's Jeff Bezos pricing the bestsellers instead of the publishers. But Apple suggested another model: the publisher sets the price and the seller can discount it, which Amazon did. Amazon loses $3 on each Kindle Fire sold, but is probably doing this so that it will sell more e-books. Amazon, the elephant in the room, also has several of its own imprints now.

(3) The HarperCollins speaker, when asked about the 26-checkout policy, said "it's always under review."

He believes that publishers add value by doing these things very well: developing talent, creating products (enhanced e-books and apps as well as print books), maintaining high quality of e-books, helping with e-book discovery, and marketing them, among other things.

They will be putting Espresso print-on-demand machines in some bookstores for their backlist trade publications.

As of now, 71% of their business comes from physical bookstores.

The full-day meeting had about 300 attendees, evenly divided among academics, schools, and public libraries.

(1) Sue Polanka -- eBooks in Libraries: Big Dreams Meet Reality
(2) Robert Miller -- A Digital Library: Too Little, Too Late, or Just in Time
(3) Mary Minow -- eBooks: Buying vs. Licensing
(4) Joseph Sanchez -- The eRoad Less Travelled: Getting Control, Staying Relevant
(5) Eli Neiburger -- Libraries and eBooks in this Century: What to Do Now, What to Do Later

Here are just brief highlights:

Sue Polanka -- Her blog is No Shelf Required. She gave an overview of e-book business models and pros/cons. Recommended Eric Hellman's Glue Jar blog. She has an article in the Nov/Dec 2011 issue of American Libraries about buying e-books.

She discussed other developments, including:

--- The 3M Cloud Library: You pay $10K for the platform, and it even includes a "discovery station" (shown on the site) that shows you book covers, allows searching and browsing, and then lets you download the e-book onto your USB!
--- Freading: Short-term loans for $.50-$2 each, nothing owned, no fees, you buy the same content again as many times as you want to read it. Interesting but needs *lots* more content.
--- Consumer subscription services: These include Amazon Prime, Baen, 24Symbols (Spain), Afictionado (UK). Easy to use and cheaper than buying.

Robert Miller -- Internet Archive

--- JHU should join!
--- They collect discarded books and even pay for shipping!
--- FREE e-books. They've had 10 million downloads per month on 2 million classic e-books!
--- Contact: 415-640-1092 (San Francisco)

Their system respects copyright laws because it's one user at a time. Duplicate books are not a problem, but they suggest choosing local items that are harder to find. His presentation blew everyone away.

Mary Minow -- She's a lawyer and Executive Editor of Stanford's Copyright and Fair Use site. She discussed "first sale," the HathiTrust lawsuit, DRM and DMCA, and the lack of accessibility of almost all e-devices.

Joseph Sanchez -- Fascinating fellow. Instructional designer at a college in Colorado; blog is thebookmyfriend. He gave an overview of the current "information ecology," including points such as:

--- We're in the era of "non-expertise" (i.e., crowd-sourcing). Amazon and Apple aren't run by MBAs, just techies. They're both operating at a loss and don't care.
--- Overdrive doesn't manage its own DRM; Adobe Content Server does.
--- CIPA is the Colorado Independent Publishers Association. Sanchez's friend Jamie LaRue (whom Kristin and I heard at last month's NISO e-book conference) *bought* an Adobe CS system for $10K, then asked CIPA for their files so that their *own* consortium could be set up.
--- Patrons don't care about owning; they just want access.

Eli Neiburger -- This fellow was the lightning rod; one of this year's LJ "movers and shakers." He spoke very fast and was funny.

His scenario is that libraries *and* publishers are dead in their current forms. He showed a four-square chart, with combinations of "publishers thrive," "publishers die," "closed," and "open," and then described the different combinations.

The "open/publishers die" combination included "free," "no DRM," "no publisher deals because no publishers," and "libraries collect and store." For this combination, then, libraries need to have the infrastructure to collect and store, which we currently don't. We should store the content OF the community, rather than pay for commercial content FOR the community; e.g., school yearbooks (or, for JHU, technical reports), and material that doesn't exist anywhere else.

Other points:

--- Library economics work when each use gets *cheaper*, NOT when it stays the same.
--- The price of $9.99 for an e-book is insane. If they were $.99, the authors would get about five times more money.
--- Or, offer them free with something else. Example: in June 2011, 65% of the biggest money-making games were FREE to play.
--- Libraries must provide unique experiences that you can't get anywhere else. Take half the collection money and spend it on events!
--- The Ann Arbor libraries started a site called Old News, which includes Ann Arbor newspapers that died and were digitized and put up for free, for *their* community and everyone else.
--- They also held "Scanning Day," on which people could bring a box of stuff and get it scanned while they waited. The library got great pictures never seen before.

Bottom line: Libraries must be prepared to pick up the pieces when the publishers (like ice houses and travel agents) are gone. The 21st century library must bring *its* community to the world.

Please let me know if you have questions.