Share your science, not just your data
Sharing your science: Going beyond data sharing
A reflection on the different ways to share data described by Goodman et al. (2014)
In my field of research, neuroscience and neuroimaging, data sharing
is very much à la mode. I can see two obvious reasons for sharing data: 1)
providing data to databases such as the Human Connectome Project, the 1000 Functional Connectomes Project, or OpenfMRI allows others to use your data,
which, combined with many other datasets, can foster new discoveries; 2) sharing makes your results
reproducible. Or does it?
Goodman et al. (2014) distinguish three levels of data sharing, depending on what
one wants to achieve: making data usable, inspectable, or fully
reproducible. Note that the levels are nested: if one provides
all the information needed for the data to be reproducible, then the data are ipso facto inspectable
and usable; similarly, if one provides enough information for the data to be
inspectable, they are also usable.
Sharing data to make them usable
Poline et al. (2012) describe several
ways in which sharing data, even with no goal beyond sharing itself, is beneficial: it accelerates
progress in our fundamental understanding of the brain; it improves publication and
data quality (which also relates to making data inspectable); it reduces
the cost of research and increases the return on current research investments;
it fosters neuroimaging research and advances in clinical practice; and it is a requirement for
reproducible science. I think this last point is not entirely true.
Goodman et al. (2014) state
that ‘in general, data reuse is most possible when: 1) data; 2) metadata
(information describing the data); and 3) information about the process of
generating those data, such as code, are all provided’. Indeed, an important
aspect of Poline et al. (2012) that is often overlooked is Electronic Data
Capture (EDC). For data to be usable, they have to come with sufficient metadata. An efficient way
to provide such metadata is to ensure that all the information about how the
data came into existence (why and how) is provided. The W3C
Provenance Group
defines information “provenance” as the sum of all of the processes, people
(institutions or agents), and documents (data included!) that were involved in
generating or otherwise influencing or delivering a piece of information.
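As a minimal sketch of what such provenance metadata might look like in practice, the Python below builds a record describing how a shared file came into existence and can write it as a JSON "sidecar" next to that file. The field names loosely borrow terms from the W3C PROV vocabulary (entity, activity, wasDerivedFrom, wasAttributedTo), but this is a simplified, hypothetical layout, not a validated PROV document; the file names, parameters, and lab name are made up for illustration.

```python
"""Hypothetical sketch: a provenance 'sidecar' for a shared data file.

Field names loosely borrow W3C PROV terms; this is NOT a validated PROV document.
"""
import json
import platform
from datetime import datetime, timezone
from pathlib import Path


def build_provenance(data_file, derived_from, activity, parameters, agents):
    """Return a dict describing how `data_file` came into existence."""
    return {
        "entity": data_file,                   # the data being shared
        "wasDerivedFrom": list(derived_from),  # inputs, e.g. raw scans
        "activity": {
            "name": activity,                  # the processing step
            "parameters": parameters,          # settings needed to rerun it
            "software": {"python": platform.python_version()},
            "endedAtTime": datetime.now(timezone.utc).isoformat(),
        },
        "wasAttributedTo": list(agents),       # people and institutions
    }


def write_sidecar(data_file, record):
    """Write `record` as JSON to '<data_file>.prov.json' and return that path."""
    sidecar = Path(str(data_file) + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


# Usage with hypothetical names (nothing is written to disk here).
record = build_provenance(
    "sub-01_bold_preproc.nii.gz",
    derived_from=["sub-01_bold.nii.gz"],
    activity="motion correction",
    parameters={"reference_volume": 0},
    agents=["Example Lab, Example University"],
)
print(json.dumps(record, indent=2))
```

Keeping the record in a separate file next to the data, rather than inside it, means it survives whatever format the data themselves are stored in.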
Sharing data to be inspectable
If one thinks that sharing data can improve publication and data
quality, then additional information is needed: along with the data, derived
data should be provided. This does not prevent errors in the data analysis,
but, together with a description of the analysis process, it gives an expert
sufficient information to judge the quality of the analysis. Providing segmentation results and
statistical masks allows one to judge whether an analysis produces reasonable-looking data.
Providing unthresholded images (raw statistical maps) allows one to judge the
likelihood of the significant activation clusters (Jernigan
et al., 2003). Providing
scatter plots of any zero-order, partial, or part correlations between
brain activity and other measures
allows one to judge the robustness of the results (Rousselet
& Pernet, 2012).
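To make the point about scatter plots concrete, here is a small Python illustration with made-up numbers (not real brain or behavioural data): nine points with essentially no relationship give a weak Pearson correlation, but adding a single extreme point produces a very strong one, which a scatter plot would reveal at a glance. The rank-based Spearman correlation is used here only as one simple check; it is not presented as the specific robust method recommended by Rousselet & Pernet (2012).

```python
"""Illustration: one extreme point can manufacture a 'strong' correlation.

All numbers are made up; they are not real brain/behaviour measures.
"""
from math import sqrt


def pearson(x, y):
    """Zero-order Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: nine 'participants' with essentially no relationship...
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [5, 3, 6, 2, 4, 7, 1, 5, 3]
# ...plus one extreme participant.
x_out, y_out = x + [30], y + [30]

print(f"Pearson, 9 points:      {pearson(x, y):+.3f}")
print(f"Pearson, with outlier:  {pearson(x_out, y_out):+.3f}")
print(f"Spearman, with outlier: {spearman(x_out, y_out):+.3f}")
```

Running it, the Pearson coefficient goes from weakly negative to above 0.9 once the outlier is added, while the Spearman coefficient stays weak: a reader given only the coefficient cannot tell these situations apart, but a reader given the scatter plot can.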
Sharing your science to be fully reproducible
Being
inspectable allows fair peer review, but for results to be reproducible the analysis
details must be shared too. These details go well beyond what goes into your
article, because there is no way everything can be described there (check the video on Chris Gorgolewski's blog; it's funny and pretty convincing). Ince
et al. (2012) make the
case that ‘anything less than release of actual source code is an indefensible
approach for any scientific results that depend on computation, because not
releasing such code raises needless, and needlessly confusing, roadblocks to
reproducibility’. This means that the analysis code should be published too. Scientists generally think the code they
write is poor, and this is one of the main reasons they give for not
sharing it with others. Yet, as Barnes (2010) puts it, ‘software in all trades is written to be good enough for the job intended.
So if your code is good enough to do the job, then it is good enough to
release’.
Once the experiment and analysis are done and the article is ready, deposit data and code in repositories
before publication so that permanent links can be included in the article. For instance, you can put the data and other scripts on FigShare and link them to a specific version of the software on GitHub.
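One way to make the link between an article and its deposited files verifiable is to include a checksum manifest in the deposit. The sketch below is a hypothetical helper, not a FigShare or GitHub feature: it walks a folder and writes one SHA-256 line per file. The line format (digest, two spaces, relative path) is the one used by GNU sha256sum, so a reader could check a download by running "sha256sum -c MANIFEST.sha256" from inside the folder. The manifest name and the demo files are made up.

```python
"""Hypothetical sketch: a checksum manifest for a data/code deposit."""
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large images fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(folder, manifest_name="MANIFEST.sha256"):
    """Write '<digest>  <relative path>' lines for every file under `folder`."""
    folder = Path(folder)
    lines = []
    # sorted() consumes the whole listing before the manifest is written
    for p in sorted(folder.rglob("*")):
        if p.is_file() and p.name != manifest_name:
            lines.append(f"{sha256_of(p)}  {p.relative_to(folder).as_posix()}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n")
    return lines


# Demo on a throwaway folder with two made-up files.
with tempfile.TemporaryDirectory() as d:
    Path(d, "data").mkdir()
    Path(d, "data", "scores.csv").write_bytes(b"id,score\n1,42\n")
    Path(d, "analysis.py").write_bytes(b"print('hello')\n")
    demo_lines = write_manifest(d)
    for line in demo_lines:
        print(line)
```

Generating the manifest just before uploading, and quoting the repository link in the article, lets anyone confirm that the files they downloaded are exactly the ones the paper was based on.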
Share your science!
Sharing your science is good for science, but it is also good
for authors. Piwowar et al. (2007)
showed that sharing increases trust in, interest in, and citation of your
paper. Openness improved both the code used by the scientists and the ability
of the public to engage with their work (Barnes, 2010). Sharing also forces
good practice for data curation, which is becoming a requirement for
many funding bodies (a plethora of information can be found on the UK Digital Curation Centre website).
Please leave any comments or useful links on 'data' sharing; I'm eager to collect as much information as possible on this. By the way, can anyone see 'dangers' in sharing data, code, etc.?