I try to avoid writing things that may make me sound stupid, but this post falls into that category.
Recently I was reading about efforts related to data sharing: technological infrastructure, curation, educating researchers, and the like. I was struck by the thought that most of the advocacy for data sharing boils down to an exhortation to stick it in a digital repository.
This seems a bit odd considering that much of what propels science is the pressure to publish (written) results (in journals, conferences, monographs, etc.). There is a hierarchy of venues in terms of prestige, which is in turn linked to research funding, promotion, public attention (media coverage, policy influence), etc.
Might the best way to get researchers to share data be to create a similar system for datasets? It might provide a compelling incentive.
Moreover, publishing might provide a compelling incentive for the related task of data curation (making data understandable / usable to others, e.g. through formatting, annotation, etc.). Currently, much data doesn’t see much use outside the lab where it was generated, so researchers have little incentive to spend time “prettying it up” for others (who may find the way it was recorded inscrutable). Even if they are convinced to “share” their data by posting it online, making it useful to others may seem a low priority. If there were pressure to publish the dataset, though, researchers would have an incentive to make the data as intuitively useful to others as practicable, so that reviewers could quickly identify its novelty.
This doesn’t seem so outlandish to me. There are similar efforts to provide publication fora for materials which were traditionally unpublished (we might say undersupplied), such as negative results and experimental techniques.
If you think of it in terms of a CV, the difference is between these lines:
- Created and shared large, valuable dataset which is highly regarded by peers
- Publication in J. Big Useful Datasets, impact factor X
It may be hard for a reviewer to quantify or validate the former; the latter demonstrates that the researcher’s contribution has already been validated and provides built-in metrics to quantify the contribution.
There are other ways to skin the same cat. One option would be to build alternative systems for conferring recognition (e.g. awards, metrics for contributions to shared datasets, etc.). Another would be to make data sharing a more enforceable part of other scientific endeavors, e.g. mandatory as a condition of research funding or of publication (of written results) in a journal. I think multiple approaches will yield the best result. It seems to me that creating “journals” (or some other name) for “publishing” datasets could be a useful way to spur participation.
Has this been done already? What are the drawbacks to this approach?