Liveblog: BRDI: Discussion with Sponsors

I’m liveblogging the first meeting of the new Board on Research Data and Information today and tomorrow. Standard liveblogging disclaimers apply. The presentation slides are on the meeting site; because they’re online, I’ll focus on what’s not on the slides.

National Science Foundation
Edward Seidel

Data-driven collaborations for complex problems: Affecting every field of science
Science and society being transformed by cyberinfrastructure and data-driven approaches (e.g. Pampers simulating diapers) — cf. Wired, “The End of Science”

NSF vision: national, integrated system to enable new paradigms of science
1. virtual organizations for distributed communities
2. high performance computing (what do we do with the data?)
3. data visualization / interaction
4. education & workforce

any cogent plan must address the phenomenal growth of data in all directions

goal: catalyze development of a system of collections that’s open, extensible, and evolvable; support development of new tools and services
national digital data framework, integrate with national CI

how to build data-driven science? methodologies, culture, education, working cross-disciplines
technological and economic issues: how do we do it? how do we pay for it?
open access: data, software, publications — we need to change the paradigm

rough plan: fill out holes in current CI; workforce development; bring in computational science; new problems: new groups to attack complex problems

Lucy Nowell

current NSF policy on data management is vague — no real consequences. goal: provide for clear, effective, transparent implementation.
concern: if we require sharing, we could damage the science. not everyone within NSF agrees on open data.

data sharing required for replicability and thus integrity of science
we need to accelerate the pace of science to deal with 21st century challenges — “we can’t afford cottage science anymore”

we’re now producing data faster than we can store it
related projects: long-lived repositories project by Bernard Reilly of the Center for Research Libraries; Blue Ribbon Task Force on Sustainable Digital Preservation and Access
DataNet: achieve long-term preservation and access with systems that are economically and technologically sustainable

new international task force on data preservation and access? potential first meeting at the OR meeting in May 2009

Sylvia Spengler

critical role of BRDI: act as a conduit for encouraging the National Academies to actively address these issues

Q: What can this group say when NSF program officers tell us that our science will die if we have to share data?
A: Nowell: When the community owns this, then the program officers who stand in the way will find themselves overwhelmed.

Q: Mike Carroll: Is the choice completely open vs. completely closed? Can we have deposit requirements with limited or delayed access?
A: Nowell: There are a lot of issues, from human subjects data to classified data to questions of data curation.

National Institutes of Health
Elliot Siegel of National Library of Medicine

[P.S. room is over capacity; nice that there's interest here]

Data Sharing Policy: if you request $500,000 or more in direct costs in any single year, your proposal must include a plan for making data available to other researchers or explain why you can’t
also Public Access Policy, GWAS Policy (must submit de-identified data to dbGaP), Clinical Trials Registration and Results Reporting

unique aspects of biomedical data: wide range of uses (clinical and research) and users, privacy

topics of interest:
how to recognize and reward data sharing by researchers
how to embed informatics/data sharing in training
development of a research agenda for managing scientific data and information
promote effective approaches for data sharing

Q: Cathy Wu: Consideration: collaboration with different sectors. Policy issues with EHRs.
A: Collaboration is being promoted by HHS — will be interesting to see how this plays out.

Q: Robert Chen: When government tells external researchers to share, intramural researchers may not always do so. We discovered a group at NIEHS has lots of useful data that they haven’t gotten around to sharing. How to lead by example?
A: I’ve been involved with the tech transfer officers for each institute; they’re responsible for working with intramural researchers to help them do that.

Q: Lesk: What can we do to help you?
A: Once upon a time, we had a problem communicating grant requirements to investigators. That’s less of a problem today, but there’s still an educational role to be played. Public access is the will of Congress and the will of the people; at some point you’ve got to stop debating and focus on getting it done.
Q: NIH doesn’t take as much credit as it could; NIH is really doing a lot to support data availability.

Q: Are there attempts to explore universities taking stewardship of research data? Is that a growing trend?
A: Gregory Farber, National Center for Research Resources: I think it is a growing trend. The idea that data’s going to sit in centralized repositories will become the exception rather than the rule.
A: NIH is looking at a number of different ways to promote this.

Defense Technical Information Center
Phil Casey

Our mission fits well with what the board is trying to do
We primarily work for DOD, but we make it publicly available when possible
Primary medium of exchange is text
About 50% of documents are unclassified, unlimited

Data: taking a few exploratory steps

Potential actions for BRDI:
draft federal policy/responsibilities

National Institute of Standards and Technology
George Arnold

Standard Reference Data Act: allows NIST’s standard reference data to be copyrighted, unlike most government works
NIST databases: some are free, some are sold, others require subscriptions
how do we fund data preservation? if we can charge for access, it’s funding for preservation [This is a worrisome line of thought...]

Q: Robert Chen: NIST – CODATA
A: We’ve been an active participant in CODATA.

Q: Michael Lesk: How do you collaborate with federal agencies which share similar data needs, e.g. data exchange?
A: We haven’t looked at this at a strategic level.

Library of Congress
Peter Young

Library’s science collections must address the digital transformation of scientific research

LOC digital initiatives:

  • American Memory
  • National Digital Information Infrastructure & Preservation Program
  • World Digital Library
  • E-Deposit of Electronic Journals

Potential model for data?
2005 LOC report: LOC should collaborate, but not necessarily as primary curator
eScience Team est. 2009: to develop position papers, collection policies, policy recommendations

A: Board can help LOC understand what our role should be, and help communicate it to Congress.

Q: If a federal agency had responsibility for digital data, I’d think it’d be LOC. Would LOC want this?
A: LOC’s primary audience is Congress — we do what Congress tells us to do.

Institute of Museum and Library Services
Joyce Ray

Has funded repositories since 2002, but is not the primary funder
Funded CDL, FCLA – DAITSS, Alabama Digital Preservation Network, U. of Denver – Collaborative Digitization Program, New Jersey Digital Highway, JHU – National Virtual Observatory, UCLA – Cuneiform Digital Library, MIT – preserving CAD architecture designs

Purdue: developing services for researchers who want to share data

