Science (the mag, not the concept) sez:
Science is driven by data. New technologies… blah… publishers, including Science, have increasingly assumed more responsibility for ensuring that data are archived and available after publication… blah… Science’s policy for some time has been that “all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science” (see www.sciencemag.org/site/feature/contribinfo/)… blah… Science is extending our data access requirement listed above to include computer codes involved in the creation or analysis of data
Well, jolly good. I look forward to them insisting the full code for HadCM3 / HadGEM / whatever is published before accepting any GCM papers using them (which, amusingly, will now include all the papers doing the increasingly fashionable “multi-model” studies using the widely available AR4 data archives).
Come to think of it, it would also prevent S+C (but not RSS?) ever publishing in Science.
* One of James / Jules’s posts pushing the appropriate model journal – Geoscientific Model Development.
* Eli comments on Nature’s policy, which is more nuanced.
* “Devil in the details”, Nature 470, 305-306 (17 February 2011), doi:10.1038/470305b: “To ensure their results are reproducible, analysts should show their workings”. Nice Nature article on genomics trubbles, h/t NB.
It hardly needs to be said that the editors of Science, when writing an editorial entitled “Making Data Maximally Available”, meant the whole thing to be maximally readable, but accidentally forced people through a tedious registration process to get to it. So, as a service to them, I’ll reproduce it here.
Science 11 February 2011:
Vol. 331 no. 6018 p. 649
Making Data Maximally Available
Brooks Hanson¹, Andrew Sugden², and Bruce Alberts³
¹ Brooks Hanson is Deputy Editor for physical sciences at Science.
² Andrew Sugden is Deputy Editor for biological sciences and International Managing Editor at Science.
³ Bruce Alberts is Editor-in-Chief of Science.
Science is driven by data. New technologies have vastly increased the ease of data collection and consequently the amount of data collected, while also enabling data to be independently mined and reanalyzed by others. And society now relies on scientific data of diverse kinds; for example, in responding to disease outbreaks, managing resources, responding to climate change, and improving transportation. It is obvious that making data widely available is an essential element of scientific research. The scientific community strives to meet its basic responsibilities toward transparency, standardization, and data archiving. Yet, as pointed out in a special section of this issue (pp. 692-729), scientists are struggling with the huge amount, complexity, and variety of the data that are now being produced.
Recognizing the long shelf-life of data and their varied applications, and the close relation of data to the integrity of reported results, publishers, including Science, have increasingly assumed more responsibility for ensuring that data are archived and available after publication. Thus, Science and other journals have strengthened their policies regarding data, and as publishing moved online, added supporting online material (SOM) to expand data presentation and availability. But it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.
Science’s policy for some time has been that “all data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science” (see http://www.sciencemag.org/site/feature/contribinfo/). Besides prohibiting references to data in unpublished papers (including those described as “in press”), we have encouraged authors to comply in one of two ways: either by depositing data in public databases that are reliably supported and likely to be maintained or, when such a database is not available, by including their data in the SOM. However, online supplements have too often become unwieldy, and journals are not equipped to curate huge data sets. For very large databases without a plausible home, we have therefore required authors to enter into an archiving agreement, in which the author commits to archive the data on an institutional Web site, with a copy of the data held at Science. But such agreements are only a stopgap solution; more support for permanent, community-maintained archives is badly needed.
To address the growing complexity of data and analyses, Science is extending our data access requirement listed above to include computer codes involved in the creation or analysis of data. To provide credit and reveal data sources more clearly, we will ask authors to produce a single list that combines references from the main paper and the SOM (this complete list will be available in the online version of the paper). And to improve the SOM, we will provide a template to constrain its content to methods and data descriptions, as an aid to reviewers and readers. We will also ask authors to provide a specific statement regarding the availability and curation of data as part of their acknowledgements, requesting that reviewers consider this a responsibility of the authors. We recognize that exceptions may be needed to these general requirements; for example, to preserve the privacy of individuals, or in some cases when data or materials are obtained from third parties, and/or for security reasons. But we expect these exceptions to be rare.
As gatekeepers to publication, journals clearly have an important part to play in making data publicly and permanently available. But the most important steps for improving the way that science is practiced and conveyed must come from the wider scientific community. Scientists play critical roles in the leadership of journals and societies, as reviewers for papers and grants, and as authors themselves. We must all accept that science is data and that data are science, and thus provide for, and justify the need for the support of, much-improved data curation.