Tuesday, June 30, 2009

DLF Spring Forum 2009

The forum was smaller than usual, but that had the benefit of making it a more intimate gathering, where you could have conversations with almost everyone. If there was someone with an interesting idea or presentation, you could easily find them and continue the conversation. As with many conferences, some of the most useful parts happen in the hallways and on breaks.

One of the most interesting presentations to me was Melanie Schlosser's “Whose Stuff Is It, Anyway? A Study of Copyright Statements on DLF-Member Digital Library Collections”. Schlosser surveyed the copyright statements presented on items in library digital collections and found that most libraries were presenting incomplete, inaccurate, or misleading statements, leaving users uninformed or misinformed about their legal rights to use the materials. Schlosser suggested that libraries' missions require us to do a better job of providing information on the particular items we 'publish' online, and to use this as an opportunity to educate users about their rights under copyright in general.

This forum for the first time (as far as I'm aware) featured something taken from smaller and more informal conferences: "Lightning Talks," short 5-minute talks that any attendee can sign up to give right at the conference. I gave a presentation on Umlaut, the open source software we use to power our Find It service, explaining why I think it is an important, innovative building block for a powerful user-centered library infrastructure. I am always trying to stir up more interest in Umlaut, because I think the more libraries adopt and contribute to it, the safer our own resource investment in the open source product is.

Another session that got a lot of 'buzz' was the presentation by Clay Redding of LC about the new http://id.loc.gov service. In addition to providing a simple, free online interface to the LC subject authorities (which is in some ways arguably easier for finding subjects for assignment than the for-fee Cataloger's Desktop I used in library school), what's especially exciting about id.loc.gov is that it provides machine access to the same information, theoretically supporting seamless (and free) integration of live LC subject authority lookup into any third-party software whose developers are interested in making use of it. There are still a few quirks to the service, and a few parts of the original LC subject authority records are not represented in it. It's just the beginning of a work in progress (at least we hope LC keeps it progressing). But it's this kind of service that will bring library metadata into the internet era, in my opinion. For a similar work-in-progress service for name authorities (although not presented at DLF), see also the VIAF project at http://viaf.org.

Wednesday, June 24, 2009

SLA 2009 Washington, DC

I spent most of the time talking with publishers and information providers in the exhibition area during the SLA conference. The primary goal was to find out about their Shibboleth enablement and to identify a technical contact person for follow-up questions. The "survey" results are being updated on the Internet2 wiki page at https://spaces.internet2.edu/display/inclibrary/RegistryOfResources

With the conference "guest pass", I was able to sit in on a few sessions, including Colin L. Powell's keynote (I did not know he was that humorous), Meg Smith (The Washington Post) on searching social media sites, and OCLC cataloging services ...

Some observations:
. No ILS vendors exhibited at this year's SLA except SirsiDynix, with 2 staff telling people to contact their customer agent with questions. This is a sign that ILS vendors have no new products or major developments to show off. By comparison, at past SLA conferences ILS vendors were among the major players.

. Information providers are all trying to show off their new Google-like interfaces. IEEE, for example, has developed a search interface for mobile devices.

. Ebooks are catching up. Springer reported that its ebooks have increased 63% every year, and it has designed a "MyCopy" service: with a subscription to the ebook, a customer can request a printed copy of the book for $24 per copy. In the near future, the ebook could be printed at a remote facility using the EPUB standard.

. There were many talks on social networks and their impact on library services. I liked the talk by Meg Smith of the Washington Post because she is able to really use the information from Facebook and other social networks to improve the quality of her work as a journalist. We need the "same" story for librarians.

. OCLC Cataloging Services had a difficult start, at least for the session: only 6 people sat in a room prepared for 100. This service is offered so that a library needs no cataloger; just send books to OCLC and they will do the cataloging for you.

. OCLC cloud computing/OCLC web-scale ILS is still in the clouds. From talking with OCLC staff at the exhibition and presenters at various OCLC sessions, this web-scale ILS is still at the planning stage.

. Last but not least, I won a Kindle 2 ebook reader from the publisher CQ. I returned it to Amazon in exchange for a Kindle DX so that at least I can read some ebooks in Chinese.

-- Foster Zhang, Library systems

Monday, June 22, 2009

Rare Books and Manuscripts June 2009

The RBMS "Preconference" took place in Charlottesville, VA, June 17-20, 2009. The theme of the meeting was "Seas of Change" and there was a special emphasis on "new and emerging voices," but the meeting was also devoted to a look back over RBMS' fifty-year lifespan.

I presented at a seminar called "Finding Common Ground: CLIR Postdoctoral Fellows on Scholarly Engagement with Hidden Special Collections and Archives." Our aim was to show the value of "scholarly engagement": the value of holdings is increased, and research becomes more efficient and exciting, when librarians and scholars collaborate. Specifically, we focused on the opportunities for conversation that arise on the "common ground" of special collections. What do librarians learn about their holdings when they talk with scholars, and what do scholars learn about their research areas when they talk with librarians? We presented several "case studies" that demonstrated the potential of the special collections conversation to lead to new knowledge. But because the session was a seminar, we were also really interested in hearing about the experiences of session attendees. People told many wonderful stories about special collections interactions that led to concrete outcomes like publications and performances, but also less tangible outcomes like creating excitement for primary source research among high school students. Out of these stories, we drew up a list of ideas about how to encourage meaningful, productive conversations in the special collections environment.

I was pleased to discover that the issues our group addressed were also addressed--somewhat differently--in several other sessions. One in particular that seemed to represent, perhaps, the conference's high point was "Public Services and 'Un-Hidden' Collections: What We Know and What We Need to Know," with Shannon Bowen from the University of Wyoming; Jennifer Schaffner of OCLC Research; and Victoria Steele of the New York Public Library (but until recently, head of special collections at UCLA). After presenting evidence of the impact of the "More Product, Less Process" approach to processing backlogs, the presenters made the case that this shift has led to more pressures on public services. That is, as more finding aids and even container lists have become available to researchers, especially online, reference requests, copy orders and other kinds of research services have increased to the point that many libraries are finding it difficult to both process and service special collections. It was recommended that special collections staff keep careful statistics to show their impact quantitatively, but also that other kinds of evidence be amassed. (Hence the connection to my group's "scholarly engagement" session.) A call also went out for a sexier name for "public services"! If anyone has any good suggestions, let me know and I'll pass them along.

--Gabrielle Dean

SLA 2009

Highlights of some of the SLA 2009 sessions I attended follow. If you’re interested in additional details or specifics from these and other sessions please don’t hesitate to get in touch with me at bpralle@jhu.edu.

Biomedical and Life Sciences Division Contributed Papers Breakfast – A) Excellent presentation on the implementation and use of Vivo, an open source expert database technology, developed at Cornell and being implemented at University of Florida. B) The second presentation focused on an evaluation of a broad range of e-book platforms at the University of Toronto. Interesting findings were shared, including the discovery that students prefer Springer over eBrary because they could save single chapters (versus being restricted to viewing only 5 pages at a time).

ROI 2.0 Corporate Librarians – This session was led by George Scotti from Springer. He highlighted key studies that have looked at how to measure ROI (Return on Investment) in libraries, specifically considering key measures of usage, time saved, and impact on decision making. From this research they came up with the following model for measuring ROI:
% of Needs Met X Time saved (in dollars) X Value of service (dollars that would have been spent outsourcing service) X Cost of operating library
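
Read literally, the model above is a product of four terms. The following is only a sketch of that arithmetic: the function name and the figures are hypothetical, and the underlying studies may combine the terms differently (for example, by comparing against operating cost rather than multiplying by it).

```python
def roi_score(needs_met_pct, time_saved_dollars,
              service_value_dollars, operating_cost_dollars):
    """Multiply the four terms exactly as the model above lists them.

    All argument names are illustrative; the session did not define units
    beyond "percent" and "dollars".
    """
    needs_met_fraction = needs_met_pct / 100.0  # "% of Needs Met" as a fraction
    return (needs_met_fraction
            * time_saved_dollars
            * service_value_dollars
            * operating_cost_dollars)


# Made-up inputs, purely to show how the terms combine.
example = roi_score(needs_met_pct=50,
                    time_saved_dollars=2,
                    service_value_dollars=3,
                    operating_cost_dollars=4)
```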

Diversity in Leadership: Generation X – The Changing Paradigm in Knowledge-based Society – I came into the discussion part of this session, which was quite lively and reminded me of a session I attended at the JHU Diversity Conference. Baby boomers were eagerly asking Gen Xers about how they could work better together. The single most compelling quote in the conversation was an Xer observing the following about Millennials, “We (Xers) are all about work life balance but Millennials are about work life integration.”

Translational Medicine Meets the Semantic Web – Olivier Bodenreider of the Lister Hill National Center for Biomedical Communications at NLM spoke on this compelling topic to a completely packed room. He shared an interesting HCLS mash-up on neuroscience resources and went on to explain that you can link data through key resources and shared identifiers. They have developed a system that identifies “triplets” in the data across public datasets; these allow the data to connect, providing bridges across the datasets and a robust view. He noted there are billions of triplets to be found across distributed repositories in the “data cloud” and shared the following demo site: http://skr3.nlm.nih.gov/SemMedDemo/index.jsp
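
To make the idea of bridging datasets through shared identifiers concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the two tiny datasets, the identifiers, and the function names are invented and are not taken from the presentation or the demo site.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements from two made-up datasets.
DATASET_A = [
    ("gene:G1", "associated_with", "disease:D1"),
    ("gene:G2", "associated_with", "disease:D2"),
]
DATASET_B = [
    ("drug:X", "treats", "disease:D1"),
    ("drug:Y", "treats", "disease:D3"),
]


def index_by_object(triples):
    """Group the subjects of a dataset by the object identifier they point to."""
    index = defaultdict(list)
    for subject, _predicate, obj in triples:
        index[obj].append(subject)
    return index


def bridge(triples_a, triples_b):
    """Pair statements from two datasets that share an object identifier."""
    index_b = index_by_object(triples_b)
    links = []
    for subject, _predicate, obj in triples_a:
        for other_subject in index_b.get(obj, []):
            links.append((subject, obj, other_subject))
    return links


# The shared identifier "disease:D1" is what connects the two datasets.
links = bridge(DATASET_A, DATASET_B)
```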

Another session which drew a standing room only crowd was “The New Face of the Special Librarian: Embedded Librarians”. Mary Talley Garcia, currently conducting research into embedded librarianship, defined embedded librarianship as: a) librarian who drives interactions with customers, b) hearing unasked questions, c) generating work. She spoke about the need to reframe skills, do more analysis, upscale end work products, and impact the bottom line. Josh Duberman of the NIH Library went on to share with us their informationist model. Informationists are embedded into both physical and virtual research teams. They do rounds, develop protocols, handle individual instruction, analyze the literature, manage current awareness, help with publication preparation, conduct bibliometric analysis and competitive intelligence, complete research for policy decisions, etc. They view themselves as internal consultants. Lessons learned include: have a high level of subject expertise, find a mentor, be visible, and be flexible. A nice quote at the end of the presentation in response to questions about being virtually embedded was “Embedded-ness is a state of mind. You don’t need a library to be an embedded librarian.” The final speaker was a solo librarian from Suncor who shared how the librarian was embedded into the continuing education and training group. I followed up with Mary Talley Garcia after the meeting and she sent the following link to her second presentation highlighting in greater depth their research: http://embeddedlibrarian.wordpress.com/

60 Sites in 60 Seconds – Too many sites covered in this fast paced session to list here but interesting ones to check out are:
Citebite.com – Paste in a piece of text and a url and go right to that selection in an article.
Drop.io.com – A neat collaboration and file sharing service.
Mashable.com – Biggest blog on social media sites.

Practical Strategies for Improving ROI – This panel of four shared some very specific strategies and examples of how they have communicated the value of their libraries to management. The first speaker, Karen Reczek of Bureau Veritas, rescued her library by preparing a detailed account of all her services and the impact of these services on the organization, and by clarifying whether anyone in the organization could take them over if the library closed. Her advice was to a) reach out and make sure that there’s at least one service that upper management uses in the library, b) be flexible and willing to drop services if they are no longer a priority, c) figure out what information will affect the business of your organization. Steve Lastres of Debevoise & Plimpton manages an integrated Knowledge Management Center and Library in a large law firm. He recommended that the library continually canvass for opportunities to provide information.

In addition I had numerous conversations with vendors about new products and services and networked with many different librarians, uncovering more than can be squeezed into this posting! Again feel free to get in touch with me if there's anything that is of particular interest.

Sunday, June 21, 2009

CERN Workshop on Innovations in Scholarly Communication (OAI6)

Early this year, I was invited to speak at this year's CERN Workshop on Innovations in Scholarly Communication. This year's meeting, nicknamed "OAI6", just finished up and I wanted to offer my perspective on some of the highlights of the meeting.

For related tweets on Twitter, search tag #OAI6 or visit this link.

Herbert Van de Sompel (LANL) opened our session and the meeting with an excellent overview of the scholarly communications landscape. The presentation is rich with useful references and I urge you to take a look if you are interested in this topic. [Link]

In the same session, Rob Sanderson (University of Liverpool, soon to be LANL) introduced us to some impressive, yet simple tools to visualize ORE Resource Maps. These tools were developed as part of the Foresite Project and are available as open source for reuse. [Link]

Also in the same session, I talked about our application of ORE to simplify publishing workflows in our project to capture and link data with publications. [Link]

Later in the program, Johan Bollen (LANL, soon to be UI Bloomington) described the work done by himself, Herbert Van de Sompel, and others to perform quantitative analysis and assessment of article and journal value. The system they developed, called MESUR, looks at a variety of facets, not just impact factor. In fact, the work that they have done seems to show that impact factor is not a good indicator of a journal's usefulness. [Link]

Thursday, June 18, 2009

SLA 2009

SLA 2009, which also celebrated the centennial of the organization, was held June 13-16 in Washington, D.C.

In the interests of brevity, here are the most significant and/or interesting things that I learned. If you'd like to know more, please call or send me a note at svazakas@jhu.edu.

  • Branding -- More publishers are offering branding; that is, the ability to put a JHU or MSEL logo on their pages. My opinion is that we should brand everything possible.

  • Beilstein -- The name "Beilstein" is going away. If/when we renew in January 2010, we will be purchasing access to something called "reaxys" ("re-AX-is"), which is basically the continuation of Beilstein. (Yes, I gave Elsevier some grief about the stupid name.)

  • Patents -- Another way to get patents -- Free Patent Fetcher (I haven't tried it out yet.)

  • Morgan and Claypool, from whom we get the "Synthesis" series of online short e-books, now has a similar series in life sciences, called "Colloquium." These are research-oriented, not clinical. They have free downloadable MARC records.

  • SCOAP3 -- Attended a talk by Salvatore Mele of CERN, who visited JHU early last year. They're up to 63% of the commitments they need.

  • The Future of Print -- SIAM (Soc. Industrial Applied Math) will henceforth produce all new journals e-only.

  • Discount -- SPIE is giving a 10% price rollback in 2010, and fees will freeze at 2009 rate. If we get a 3-year contract ('10 - '12), the price for all three years will be that of 2010.

  • E-books -- SIAM is launching them, SPIE is launching them,

  • Open Access -- ROARMAP shows which countries have or are considering OA policies, and links to them. Harvard's OA person spoke and said that faculty must *regularly* be reminded of how much things cost. Most faculty are unaware of the copyright rights they do have, but they also self-archive without caring whether or not they're allowed to. She mentioned Sally Morris's 2009 article "Journal authors' rights: perception and reality." She had many talks with publishers about OA; watch PLoS1 (Public Lib of Science) for an article about "which publishers make it easy." Also, around September 1, some other Ivy Leagues will begin underwriting fees.

  • Inst. Repositories -- Harvard now has DASH, "Digital Access to Scholarship at Harvard." They had about 40 undergrads doing the grunt work of entering metadata and other data into their repository.

  • Blog Rankings -- Kent Anderson, who edits "The Scholarly Kitchen" blog, said that if your blog is on typepad or wordpress (like SLA's and like ours), you don't get authority for it. That is, "popularity ranking" engines will attribute traffic on our sites to typepad and wordpress rather than to SLA and MSEL. This speaker encouraged everyone to spend the time and the $35 to have blogs moved to our own domains. Also, this is the blog that wrote the phony article for Bentham Science and then blew the whistle when it was accepted.

    [Speaking of poor peer review, I have a handout listing peer-reviewed chemistry journals which accepted articles using Wikipedia in their reference lists.]

  • Web 2.0 - Google is now the web's library; Twitter and Facebook are its coffeeshops. More and more things are "out there." Example: an article about diabetes type 1 appeared in NEJM and a diabetic blogger twittered about it; a physician wrote to ask her if she had read the whole article and she said yes, because NEJM made it free; the doc said he would have sent the article to her if she couldn't get to it herself. Another example: Lance Armstrong twitters, so all his followers -- basically the whole cycling world -- now know about what's going on with him at the same time "Cycling News" does.

  • Data.gov -- Several sessions mentioned data.gov. It's just what it sounds like; check it out.

  • Citation Info -- Great session about WoS, Scopus, and Google Scholar and their citation info. WoS ruled until 2004, and in late '04, both the others started. There were lots of interesting comparison stats, which are supposed to be available somewhere and I'll find them. Note: Scopus lists patents but does not follow who the patents cited. Also, Scopus records go back to 1823, but their citation info goes back only to 1996. Scopus just added 1,450 new arts and humanities journals last month.

    WoS now has Conference Proceedings Citation Index fully integrated. AND they now capture funding and grant data! Lots more new stuff about WoS, too; they're feeling the Scopus heat.

    They did mention Quosa, saying that you can download 50 articles at a time with it. (We have Quosa.)

  • Facebook -- Fascinating Wash Post reporter, who won Pulitzer for her coverage of VA Tech shootings, explained how to use Facebook etc. for doing research. I was riveted.

  • "Summon" -- ProQuest has a new tool they think is the best completely seamless way of getting at all the library's stuff. Here's a picture of what the search pages look like.

    Okay, there's more, but you must be tired of reading by now. So much is going on in our world !!

Wednesday, June 10, 2009

DLF Spring Forum 2009

In early May, I attended the Digital Library Federation Spring Forum 2009. Attendance was extremely low compared to previous DLF Forums that I've attended. The atmosphere was somewhat somber, as DLF prepares to relinquish its independent status and return to the CLIR fold. Chuck Henry, the President of CLIR, talked about this during his opening remarks. CLIR is currently committed to continuing the two-meetings-per-year format used by DLF. He also talked about expanding the mission into new domains, mentioning specifically that CLIR is exploring relationships between spy agencies and humanists through its research programs.

Clay Redding gave a very interesting presentation describing how the Library of Congress is building linked data services for authorities and controlled vocabularies. The first data to be made available through this mechanism is the Library of Congress Subject Headings (LCSH). The idea behind the linked data approach is to provide information that can be acted on by both humans and machines. This is done by creating identifiers for each term and using a consistent knowledge representation to describe it. Additionally, each collection and concept has its own URI, making it a resource. These resources can be related to other URIs or terms using RDF. With this framework in place, existing semantic web tools can be brought to bear, including inferencing and visualization. While LCSH is first out of the gate, LC plans to release many more resources using this approach.
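
As a rough illustration of what "a URI per concept, related to other URIs" makes possible, the sketch below stores a few statements as plain tuples and follows "broader" links to infer a concept's ancestors. The concept URIs under example.org and the labels are invented for this example and are not real LC identifiers; the SKOS property URIs are the standard W3C ones, but treating them as the vocabulary in use here is an assumption.

```python
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"

# Hypothetical base URI; real authority URIs would come from the service itself.
BASE = "http://example.org/authorities/"

TRIPLES = {
    (BASE + "c1", SKOS_PREF_LABEL, "Science"),
    (BASE + "c2", SKOS_PREF_LABEL, "Physics"),
    (BASE + "c3", SKOS_PREF_LABEL, "Optics"),
    (BASE + "c2", SKOS_BROADER, BASE + "c1"),
    (BASE + "c3", SKOS_BROADER, BASE + "c2"),
}


def broader_closure(concept_uri, triples):
    """Return every concept reachable by repeatedly following broader links."""
    found = set()
    frontier = [concept_uri]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == SKOS_BROADER and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


# A very simple "inference": Optics is narrower than Physics, which is
# narrower than Science, so both are ancestors of Optics.
ancestors = broader_closure(BASE + "c3", TRIPLES)
```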

David Ruddy of Cornell also gave a talk about a novel approach to linking resources using OpenURL. The approach employed both institutional- and domain-based OpenURL resolvers, with the idea that the domain-specific resolvers could fill in needed data before passing requests on to the institutional resolvers. While still fairly adherent to the OpenURL 0.1 approach, it is heartening to see work like this that starts to move us toward a more sophisticated approach. I would like to see more applications take fuller advantage of the 2004 NISO OpenURL framework, as has the Djatoka service being developed at LANL and presented at Open Repositories 2009 by Ryan Chute.
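
The two-stage idea (a domain resolver fills in missing citation data, then hands off to the institutional resolver) can be sketched as follows. This is not Cornell's implementation: the function names, the resolver address, and the sample metadata are hypothetical, and the key/value format shown is the 2004 NISO framework's journal format rather than the older 0.1 syntax.

```python
from urllib.parse import urlencode


def enrich(citation, domain_metadata):
    """Fill in fields the incoming citation lacks, without overwriting any."""
    enriched = dict(domain_metadata)
    # Non-empty values from the original request always win.
    enriched.update({key: value for key, value in citation.items() if value})
    return enriched


def to_openurl(resolver_base, citation):
    """Serialize a citation as a key/value OpenURL for the given resolver."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
    }
    for key in sorted(citation):
        params["rft." + key] = citation[key]
    return resolver_base + "?" + urlencode(params)


# Hypothetical request: the incoming citation has a title but no ISSN.
incoming = {"atitle": "An Example Article", "issn": ""}
known_to_domain_resolver = {"issn": "1234-5678", "jtitle": "Example Journal"}

complete = enrich(incoming, known_to_domain_resolver)
url = to_openurl("http://resolver.example.edu/openurl", complete)
```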

Tuesday, June 9, 2009

NISO Assessment and Performance Measures Meeting June 1, 2009 Baltimore.

There were a number of really good speakers that day. Here are some highlights and key points from my perspective.

From Steve Hiller - UWashington
We live in a competitive world. Even in academe there is competition. Key questions we need to ask ourselves:
  • What do we need to know about our customers to succeed?
  • What do our stakeholders need to know?
  • How do we measure our services, programs, resources to fill our customers' needs and stakeholders' expectations?
Libraries have generally collected performance measures in an input/output format (number of checkouts, gate count, number of volumes held, etc.). While these measures are generally easy to document they do not provide real outcome measures. They do not measure value or quality. They do not necessarily measure what is important to either your customers or your stakeholders. We need to be measuring what difference we make to our patrons, to the university, and the research process.

Back to competition. If you take the library's budget, say $25M a year, over ten years that is a quarter of a billion dollars. What kind of return did the university get on that investment? What other things could it have invested in that it didn't?

As an organization we need to be measuring:
  • Libraries' contribution to teaching and research
  • The value of the library to the community
  • Changes in library use and what that means to the community
  • Our organizational effectiveness
  • Our collaborations
We have to keep asking: are we measuring stuff that is easy to measure or important to measure?

Ideas for new metrics:
  • Uniqueness of collections
  • Value of consortia
  • Efficiencies of administration and budget
  • ROI
  • Data access, organization, and preservation
  • Contribution to faculty research
  • Generating new knowledge
  • Student outcomes and student learning
From Susan Gibbons - URochester
  • Libraries' glorious past as the "heart" of the campus is not our future.
  • For the library, there are competing interests at the university level and we need to be articulating the value we are providing.
  • We need both qualitative and quantitative data.
  • The largeness of the collection does not equal its effectiveness or how it makes a difference.
  • Assessment is a shared responsibility throughout the organization. We need to build a culture that values assessment and is customer centric. Assessment has to be local.
  • By the time something is a trend we are too late.
  • They actually walk around occasionally and document how stuff is being used.
  • They have inserted the library into a number of "non-library" areas. The library is part of the writing program and is part of Student Services where librarians become advisers.
URochester libraries recently completed a two-year study on their graduate students and how they went about becoming the next generation of scholars through their dissertation process. Their report is in their IR.

Monday, June 1, 2009

Open Repositories 2009

In mid-May, I attended the annual Open Repositories meeting for the third time. I noted after attending this conference for the first time in 2007 that it was the most practical conference that I had attended in some time. And the last two years have done nothing to disabuse me of that notion. In fact, if there were anything that I would complain about, it would be that the conference is a bit overwhelming because it is so information rich.

I should note that Sayeed Choudhury and Elliot Metsger also attended this meeting and have already blogged about it.

Data Curation, Archiving, and Preservation

Because of the Data Conservancy (our DataNet project currently in the start-up phase) and our DataPub project currently underway, curation of and long-term access to data is of key importance to the Digital Research & Curation Center (DRCC) and the Sheridan Libraries in general. Many of the presentations covered issues of interest in this area. I'll highlight a few of them below.

As Sayeed mentioned in his post, Michael Witt of Purdue spoke about research into the development of data curation profiles. This work is a collaboration between Purdue and UIUC's Graduate School of Library and Information Science. Their approach is based on discussions with researchers and employs an initial unstructured interview to get the conversation started. One of the most interesting findings thus far relates to issues of data sharing (with whom, after what activities). Michael presented an earlier version of this work at a Sun Preservation and Archiving Special Interest Group (PASIG) meeting. More information can be found on the project site.

John Kunze of the California Digital Library and our own Sayeed Choudhury both spoke in a session devoted to the recommended NSF DataNet projects. John spoke about the Data Observation Network for Earth (DataONE) project, led by University of New Mexico. Sayeed spoke about our project, the Data Conservancy. My focus was on the IT and data frameworks of the two projects. The approaches are different in many ways and it will be interesting to work together to establish the kinds of data management partnerships envisioned by NSF in the creation of the DataNet program.

In addition to the talks, Sayeed and I pulled together a birds of a feather session, which he was unfortunately unable to attend. I was there to represent the Data Conservancy's process and approach. John Kunze and Stephen Abrams, both of whom I was fortunate enough to wrangle at the last minute, represented the perspective and approach of DataONE.

Simple Web Service Offering Repository Deposit (SWORD) and the Open Archives Initiative Object Reuse and Exchange (OAI-ORE, or ORE for short) are two relatively recent developments meant to, respectively, reduce the burden of content deposit and improve the description and exchange of resource aggregations (think compound/complex objects) on the Web. We are employing both of these technologies in our DataPub (curating published data) project. Elliot has done a nice job of highlighting some of the ORE presentations in his post, so I will just add a few comments about the SWORD talks.

Pablo Fernicola gave a presentation describing work on an authoring add-in for Microsoft Word on the Windows platform. The add-in, currently in beta, will support ORE, SWORD, and the Publishing tagset of the NLM DTD. We have been working with Pablo on the ORE components of the add-in. This technology will allow an author to create a document, link it with data and rich media, describe the relationships of these components, and submit the package to a repository -- all without leaving Microsoft Word. While other approaches will be needed for other authoring environments (e.g., LaTeX), these tools go a long way toward lowering the barriers to contributing and reusing content.

Adrian Stevenson and Julie Allinson shared a talk describing ongoing work in the second phase of development (SWORD2) and some of the history behind the development of the original SWORD protocol specification and implementations. It is now possible to deposit content into a properly configured Fedora, DSpace, or Eprints repository through Facebook, a web client, and a desktop client (among others). As I mentioned previously, Microsoft Word will soon support SWORD deposit via an add-in.
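
For readers who have not seen one, a SWORD deposit is essentially an HTTP POST of a package to a repository collection. The sketch below only builds such a request and never sends it. The collection address, file name, and packaging identifier are placeholders, and the header names reflect my understanding of the SWORD 1.3 profile, so they should be checked against the specification and the target repository before use.

```python
import urllib.request


def build_sword_deposit(collection_uri, package_bytes, filename, packaging):
    """Build (but do not send) an HTTP POST that deposits a zip package."""
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": "filename=" + filename,
        # Assumed SWORD 1.3 header naming the package format.
        "X-Packaging": packaging,
    }
    return urllib.request.Request(
        collection_uri, data=package_bytes, headers=headers, method="POST"
    )


# Placeholder values only; a real deposit would also need authentication.
request = build_sword_deposit(
    collection_uri="http://repository.example.edu/sword/deposit/collection-1",
    package_bytes=b"PK",  # stand-in for the bytes of a real zip file
    filename="package.zip",
    packaging="http://example.org/packaging/hypothetical",
)
```

Sending the request (for example with urllib.request.urlopen) is deliberately left out, since the response handling depends on the repository.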

Repository Challenge

The Repository Challenge started last year at the Southampton Open Repositories meeting and was organized by David Flanders, then of the JISC-funded Common Repositories Interface Group, with the goal of getting "developers working in small teams to try to quickly pull together established platforms and services to demonstrate how to achieve real-life, user-relevant scenarios and services."

This year's Challenge was again organized by Flanders, now of JISC proper.

The Repository Challenge winner this year was Tim Donohue of UIUC. Tim used JavaScript (JS) to implement a system he called "Mention It". This JS library allows a web page designer to embed into a web page an aggregation of mentions of a specified string on Twitter, FriendFeed, Technorati, and Google Blog Search. Among many other uses, this would allow repository developers to embed the display of mentions for a digital object by specifying its splash page, item, or Handle URI.

The runner-up, Rebecca Sutton Koeser of Emory, created FedoraFS, which combined a Fedora Commons repository with FUSE (Filesystem in Userspace) to support access to repository content as if it were in regular files. For example, a PDF file stored as a datastream within a Fedora digital object could be accessed with a standard desktop PDF viewer. Her entry video is available on vimeo.