I’m liveblogging the first meeting of the new Board on Research Data and Information today and yesterday. Standard liveblogging disclaimers apply. The presentation slides are on the meeting site. Because some of the slides are online, I’ll focus on what’s not on the slides.
We want to make a 2-3 page document for policymakers. We want to point out that the last 8 years of neglecting science have coincided with massive growth of data. “You can’t do evidence-based policymaking without evidence.”
We have to communicate the huge expense of waiting to act. There’s a shift in how computing is done and how science is done. Doing it right from the start could save us lots of money.
Maybe we should analyze the stimulus and spending priorities to see how to make it relevant and urgent.
In the stimulus (Senate), there’s $3b for NSF and $2b for increasing the number of people in science. And we have to spend it within 180 days. But people haven’t filed bold new proposals.
Purpose of the note is to get the attention of a few top-level people. Principal purpose is to establish a relationship. Don’t need to cram it with info.
Is there any way that this can influence the billions that NSF will have in 180 days?
We could get attention by saying, we’re drowning in data. But unlikely that NSF will want to give us money.
I disagree — we could give NSF cover for why they should throw money at this. Everyone else will be arguing for why they should get the money.
We’ll do an introductory note, followed up by an overview of what we’re working on. After that, what topics will we look at? I’ve heard a suggestion for studying the economic [and social?] value of data. Second, generally scientists manage their own data or nobody does — what does the right structure look like? Who pays for it, what are the incentive structures, etc.
In government, it’s important to have a clear vision to motivate people throughout the government. We don’t have that here.
There are some disciplinary differences worth mentioning — e.g. data from Antarctica. How to take examples and expand applicability, pushing envelope toward vision of science (and its data) as a global commons?
Getting funds for data storage isn’t hard. It’s getting information for decision-making out of data that matters.
It’s not about the data — it’s about what the data enables. It’s a means to an end.
Focus on evidence-driven policy is good.
What about improving education through the use of scientific data? There are also issues of privacy, confidentiality, interoperability.
There’s a lot happening on confidentiality — maybe best to wait.
Also other work happening on best practices for data lifecycle — wait here? Blue Ribbon Task Force on Sustainable Repositories, report on distributed stewardship.
Then how do we answer the question of “what do I do with my data”?
There’s so much to lifecycle — we’ll probably break it out into sub-topics.
The current budget is to support 2 symposia a year.
What about work on a vision statement?
We could do a public wiki.
No, we can’t let the public write it on a wiki.
Maybe we could get ideas on the wiki?
Sure, we could get contributions on a wiki.
What about a consensus study?
If we’re going to do that, we’ll get separate funding.
What about scientific data for education?
That was a recommendation to NSF in the cyberlearning report. Is that the best use of our time?
BRDI should include the possibility of data and information serving education as well as research, and look for ways to do that. I suggest that we consider that in everything we do.
Is this a good topic for a symposium?
We could use the symposia to get information to decide how to address these topics — to better define what we want to do.
We can look at successes and failures.
We could probably get joint sponsorship for something on data and open learning with NSF, Hewlett, MacArthur.
We should come up with criteria to decide where we want to put our time.
On the question of privacy, et al. — I don’t think it’s all taken care of. There’s a broader issue: how to overcome the barriers to integrating datasets. That’s the way to look at it. If we could generate a how-to document, it’d be very useful.
There are some interesting computer science questions there.
What are some of the barriers we might include in such a study? Privacy and security, but also IP, cultural issues, statutes, rules on data matching.
How-to manual is exactly what we’re looking for.
This is about federal data?
No, this is about data where people want to share something but there are some difficulties.
There are tools now to do anonymous matching from different data sets, pulling data without combining them.
There’s a way to check who’s checked into a hotel, without the hotel sharing its list or you sharing your list of who you’re looking for.
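To make the hotel example concrete, here is a minimal sketch of one known way to do this kind of matching, a Diffie-Hellman-style private set intersection. This is my own illustration, not necessarily the tool the speaker had in mind. The function names (`hash_to_group`, `new_secret`, `blind`, `private_matches`) are made up for the example, both parties are simulated in a single process, and the modulus is a toy choice that is not secure for real use.

```python
import hashlib
import math
import secrets

# Toy modulus: the Mersenne prime 2**127 - 1.
# Assumption for illustration only; far too weak for real deployments.
P = 2**127 - 1


def hash_to_group(name: str) -> int:
    """Map a name to an integer in [1, P-1] via SHA-256."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (P - 1) + 1


def new_secret() -> int:
    """Pick a random exponent coprime to P-1, so blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2  # value in [2, P-2]
        if math.gcd(k, P - 1) == 1:
            return k


def blind(values, key):
    """Raise each value to the secret key modulo P."""
    return [pow(v, key, P) for v in values]


def private_matches(hotel_guests, watch_list):
    """Return the watch-list names that also appear in the hotel's list.

    Only blinded values would cross between the two parties; here both
    sides run in one function purely to show the arithmetic.
    """
    hotel_key = new_secret()   # known only to the hotel
    seeker_key = new_secret()  # known only to the person searching

    # Each side blinds its own hashed names with its own key.
    seeker_once = blind([hash_to_group(n) for n in watch_list], seeker_key)
    hotel_once = blind([hash_to_group(n) for n in hotel_guests], hotel_key)

    # Each side then blinds the other's values with its own key.
    # Because (h**a)**b == (h**b)**a mod P, equal names end up equal.
    seeker_twice = blind(seeker_once, hotel_key)        # done by the hotel
    hotel_twice = set(blind(hotel_once, seeker_key))    # done by the seeker

    return [n for n, tag in zip(watch_list, seeker_twice) if tag in hotel_twice]
```

In this sketch the searcher learns only which of their own names matched, and the hotel never sees the unblinded watch list. A real system would also need a vetted cryptographic group, shuffling of the hotel's blinded list, and protection against a party that deviates from the protocol.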