I’m liveblogging the first meeting of the new Board on Research Data and Information today and tomorrow. Standard liveblogging disclaimers apply. The presentation slides are on the meeting site, so I’ll focus on what’s not on the slides.
National Science Foundation
Data-driven collaborations for complex problems: Affecting every field of science
Science and society being transformed by cyberinfrastructure and data-driven approaches (e.g. Pampers simulating diapers) — cf. Wired, “The End of Science”
NSF vision: national, integrated system to enable new paradigms of science
1. virtual organizations for distributed communities
2. high performance computing (what do we do with the data?)
3. data visualization / interaction
4. education & workforce
any cogent plan must address the phenomenal growth of data in all directions
goal: catalyze development of a system of collections that’s open, extensible, and evolvable; support development of new tools and services
national digital data framework, integrate with national CI
how to build data-driven science? methodologies, culture, education, working cross-disciplines
technological and economic issues: how do we do it? how do we pay for it?
open access: data, software, publications — we need to change the paradigm
rough plan: fill holes in current CI; workforce development; bring in computational science; new problems: new groups to attack complex problems
current NSF policy on data management is vague — no real consequences. goal: provide for clear, effective, transparent implementation.
concern: if we require sharing, we could damage the science. not everyone within NSF agrees on open data.
data sharing required for replicability and thus integrity of science
we need to accelerate the pace of science to deal with 21st century challenges — “we can’t afford cottage science anymore”
we’re now producing data faster than we can store it
related projects: long-lived repositories project by Bernard Reilly of Center for Research Libraries; Blue Ribbon Task Force on Sustainable Digital Preservation and Access
DataNet: achieve long-term preservation and access with systems that are economically and technologically sustainable
new international task force on data preservation and access? potential first meeting at OR meeting in May 2009
critical role of BRDI: act as conduit for encouraging National Academies to actively address these issues
Q: What can this group say when NSF program officers tell us that our science will die if we have to share data?
A: Nowell: When the community owns this, then the program officers who stand in the way will find themselves overwhelmed.
Q: Mike Carroll: Is the choice completely open vs. completely closed? Can we have deposit requirements with limited or delayed access?
A: Nowell: There are a lot of issues: from human subjects data, to classified data, to issues of data curation.
[P.S. room is overcapacity -- nice that there's interest here]
National Institutes of Health
Data Sharing policy: if you get more than $0.5m, your proposal must include a plan for making data available to other researchers or explain why you can’t
also Public Access Policy, GWAS Policy (must submit de-identified data to dbGaP), Clinical Trials Registration and Results Reporting
unique aspects of biomedical data: wide range of uses (clinical and research) and users, privacy
topics of interest:
how to recognize and reward data sharing by researchers
how to embed informatics/data sharing in training
development of a research agenda for managing scientific data and information
promote effective approaches for data sharing
Q: Cathy Wu: Consideration: collaboration with different sectors. Policy issues with EHRs.
A: Collaboration is being promoted by HHS — will be interesting to see how this plays out.
Q: Robert Chen: When government tells external researchers to share, intramural researchers may not always do so. We discovered a group at NIEHS has lots of useful data that they haven’t gotten around to sharing. How to lead by example?
A: I’ve been involved with the tech transfer officers for each institute; they’re responsible for working with intramural researchers to help them do that.
Q: Lesk: What can we do to help you?
A: Once upon a time, had a problem communicating grant requirements to investigators — less of a problem today — but there’s an educational role to be played. Public access is the will of Congress and the will of the people — at some point you’ve got to stop debating and focus on getting it done.
Q: NIH doesn’t take as much credit as it could; NIH is really doing a lot to support data availability.
Q: Attempt to explore universities taking stewardship for research data — is that a growing trend?
A: Gregory Farber, National Center for Research Resources: I think it is a growing trend. The idea that data’s going to sit in centralized repositories will become the exception rather than the rule.
A: NIH is looking at a number of different ways to promote this.
Defense Technical Information Center
Our mission fits well with what the board is trying to do
We primarily work for DOD, but we make it publicly available when possible
Primary medium of exchange is text
About 50% of documents are unclassified, unlimited
Data: taking a few exploratory steps
Potential actions for BRDI:
draft federal policy/responsibilities
National Institute of Standards and Technology
Standard Reference Data Act — allows NIST works to be copyrighted, unlike most gov. works
NIST databases: some free, sells some, subscriptions for others
how do we fund data preservation? if we can charge for access, it’s funding for preservation [This is a worrisome line of thought...]
Q: Robert Chen: NIST – CODATA
A: We’ve been an active participant in CODATA.
Q: Michael Lesk: How do you collaborate with federal agencies which share similar data needs, e.g. data exchange?
A: We haven’t looked at this at a strategic level.
Library of Congress
Library’s science collections must address the digital transformation of scientific research
LOC digital initiatives:
- American Memory
- National Digital Information Infrastructure & Preservation Program
- World Digital Library
- E-Deposit of Electronic Journals
Potential model for data?
2005 LOC report: LOC should collaborate, but not necessarily as primary curator
eScience Team est. 2009: to develop position papers, collection policies, policy recommendations
A: Board can help LOC understand what our role should be, and help communicate it to Congress.
Q: If a federal agency had responsibility for digital data, I’d think it’d be LOC. Would LOC want this?
A: LOC’s primary audience is Congress — we do what Congress tells us to do.
Funding for repositories since 2002; but not the primary funder
Funded CDL, FCLA – DAITSS, Alabama Digital Preservation Network, U. of Denver – Collaborative Digitization Program, New Jersey Digital Highway, JHU – National Virtual Observatory, UCLA – Cuneiform Digital Library, MIT – preserving CAD architecture designs
Purdue: developing services for researchers who want to share data