Tuesday, May 26, 2009

Cyberinfrastructure Software Sustainability & Reusability Workshop

In late March 2009 I attended the National Science Foundation-sponsored Cyberinfrastructure Software Sustainability and Reusability Workshop. The NSF, which funds millions of dollars' worth of software engineering, is examining what needs to be done to recoup its investment and better support scientific research. (Presentations are hard to link to individually; a list of all presentations is here: http://cisoftwaresustainability.iu-pti.org/nsfworkshop/nsfCalendar/nsfCalendar.html)

One of the most fundamental questions to be answered is what distinguishes Cyberinfrastructure (CI) software from other software. How is CI funded, and how is it sustained beyond the initial funding period (from both a fiscal and a software engineering perspective)? What changes can NSF make to enable CI sustainability? Plenary sessions introduced these topics and set the stage for breakout sessions, where the workshop participants examined these questions interactively and in more detail.

Plenary sessions examined existing models of software sustainability, with the idea that they might be adapted for sustaining CI. Brad Wheeler presented on why software sustainability is a problem and how it manifests itself in the higher-ed/research community. One challenge is the diversity of research domains, which differ in their maturity and their familiarity with software engineering. Brad discussed what various (open source?) software projects across these communities have in common: Code, Coordination, and Community, noting that the models used in one community may not work in another. Different disciplines may need different models to address software sustainability challenges.

Neil Chue Hong (the Director of OMII-UK) gave an excellent presentation on the work done at the Open Middleware Infrastructure Institute, an organization responsible for supporting e-research software in the UK. OMII-UK was funded in two phases. With their first phase of funding, they focused on hardening existing software and writing code to fill gaps in software functionality. Existing software artifacts were ingested into their workflow, hardened, supplemented with additional functionality (if required), packaged and distributed. With their second phase of funding, OMII-UK kept their QA and software ingest workflows, and switched their focus from middleware to user relevance and adoption of the software (an aspect of sustainability). Currently they provide community-developed software and consulting services. They provide foundational services such as centralized help desk support and testing for software under their umbrella. They also provide responsive development: where gaps in a software artifact are identified, they try to fill them (with original development if needed). OMII-UK is addressing many aspects of sustainability in the UK, and their experience should be invaluable to the NSF.

Neil had a good metaphor. We've heard "free as in beer" and "free as in speech". Neil used "free as in puppy": the kids get a puppy for free from an animal shelter or wherever, but you still have to take care of the puppy, with all that entails.

I attended two breakout sessions. The topic of the first session was building inherently reusable software, where the following issues were discussed:
- What is being sustained: the abstractions/APIs or the actual bits? (The abstractions/APIs differ across domains, and many domains, Astronomy being one exception, haven't identified their abstractions or services.)
- Having professional auditors look under the hood at the software build process, testing, and source code management practices
- Identifying metrics to evaluate the health of a software project
- What role does the funding agency play in certifying or auditing software quality?

The second breakout session topic was metrics for software sustainability: what can (or should) be measured in order to describe sustainability, a so-called sustainability score. Disappointingly, this session wasn't able to supply concrete recommendations. General ideas were posited but specifics were left out.

Cliff Lynch wrapped up the meeting with a few observations:
- the need to distinguish between CI software and software
- the assumption that sustainability can only be achieved with open source
- the software lifecycle is too short (OMII-UK says the shelf-life of an artifact is 6 months to 1 year)
- the archival of software is something not well understood (and packaging it up in a VM is a heavy-handed solution)
- increasingly, data is seen as an asset, which raises issues around embargoes on data, human subjects data, privacy, and informed consent
- investing in a stable, backwards-compatible stack is costly. Linux distros releasing every 6 months make it easy to be on the cutting edge for "free", but backwards compatibility isn't cheap.
