Share your science, not just your data



Sharing your science: going beyond data sharing

A reflection on Goodman et al. (2014) and the different ways to share data




In my field of research, neuroscience and neuroimaging, data sharing is very much à la mode. I can see two obvious reasons for sharing data: 1) providing data to databases like the Human Connectome Project, the 1000 Functional Connectomes Project, or OpenfMRI allows others to use your data, which, in combination with many other datasets, can foster new discoveries; 2) it allows your results to be reproduced – or does it?

Goodman et al. (2014) distinguish different ways to share data depending on what one wants to achieve: making data usable, inspectable, or fully reproducible. Note that the different levels are inclusive: if one provides all the information needed for full reproducibility, then the data are ipso facto inspectable and usable; similarly, if one provides enough information for the data to be inspectable, they are also usable.

Sharing data to make them usable

Poline et al. (2012) describe several ways in which sharing data for its own sake is beneficial: it accelerates progress in our fundamental understanding of the brain, improves publication and data quality (which also relates to sharing data to make them inspectable), reduces the cost of research and increases the return on current research investments, and fosters neuroimaging research and advances in clinical practice. They also present it as a requirement for reproducible science – but I think this last point is not entirely true.

Goodman et al. (2014) state that ‘in general, data reuse is most possible when: 1) data; 2) metadata (information describing the data); and 3) information about the process of generating those data, such as code, are all provided’. Indeed, an important aspect of Poline et al. (2012) that is often not considered is Electronic Data Capture (EDC). For data to be usable, they have to come with sufficient metadata. An efficient way to provide such metadata is to ensure that all the information about how the data came into existence (why and how) is provided. The W3C Provenance Group defines information “provenance” as the sum of all of the processes, people (institutions or agents), and documents (data included!) that were involved in generating or otherwise influencing or delivering a piece of information.
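
As a concrete illustration, here is a minimal sketch of what such provenance metadata could look like when written next to a data file. The file names, field names and processing steps are hypothetical placeholders, not a formal W3C PROV serialisation; the point is simply that the who, how and with-what travels with the data.

    # Minimal, illustrative sketch: record who/what/how alongside the data so that
    # reuse does not depend on emailing the original authors.
    # All names and values below are hypothetical placeholders.
    import json

    provenance = {
        "dataset": "sub-01_task-motor_bold.nii.gz",       # the shared piece of information
        "generated_by": {
            "people": ["J. Doe (University X)"],           # agents / institutions
            "protocol": "motor task, 3T scanner, TR = 2 s",  # how the data came into existence
            "date": "2014-05-12",
        },
        "processing": [
            {"step": "motion correction", "software": "SPM8", "script": "preprocess.m"},
            {"step": "spatial normalisation", "software": "SPM8", "template": "MNI152"},
        ],
        "related_documents": ["ethics_approval.pdf", "acquisition_protocol.pdf"],
    }

    # Write the metadata next to the data file it describes
    with open("sub-01_task-motor_bold.json", "w") as f:
        json.dump(provenance, f, indent=2)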
 

Sharing data to be inspectable 

If one thinks that sharing data can improve publication and data quality, then additional information is needed. Along with the data, derived data should be provided. Although this does not prevent errors in the data analysis process, together with a description of that process it provides sufficient information for an expert to judge the quality of the analysis. Providing segmentation results and statistical masks allows one to judge whether an analysis produces reasonable-looking data. Providing unthresholded images (raw statistical maps) allows one to judge the plausibility of the significant activation clusters (Jernigan et al., 2003). Providing scatter plots of any zero-order, partial, or part correlations between brain activity and other measures allows one to judge the robustness of the results (Rousselet & Pernet, 2012).
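
To make this concrete, here is a hedged sketch of how such derived outputs might be produced and saved for sharing, assuming nibabel, numpy, scipy and matplotlib are available; all file names and the number of subjects are hypothetical placeholders.

    # Illustrative sketch of derived outputs worth sharing alongside a paper:
    # an unthresholded statistical map plus a scatter plot of the brain-behaviour
    # correlation. File names below are placeholders.
    import nibabel as nib
    import numpy as np
    from scipy.stats import pearsonr
    import matplotlib.pyplot as plt

    # Unthresholded group t-map and the cluster mask used for the reported effect
    tmap = nib.load("group_tstat_unthresholded.nii.gz")
    mask = nib.load("significant_cluster_mask.nii.gz").get_fdata() > 0

    # One value per subject: mean contrast estimate inside the cluster
    subject_imgs = [nib.load(f"con_sub-{i:02d}.nii.gz") for i in range(1, 21)]
    activity = np.array([img.get_fdata()[mask].mean() for img in subject_imgs])
    behaviour = np.loadtxt("reaction_times.txt")       # one behavioural score per subject

    # Scatter plot readers can use to judge the robustness of the correlation
    r, p = pearsonr(activity, behaviour)
    plt.scatter(activity, behaviour)
    plt.xlabel("Mean contrast estimate in cluster")
    plt.ylabel("Reaction time (s)")
    plt.title(f"Brain-behaviour correlation: r = {r:.2f}, p = {p:.3f}")
    plt.savefig("brain_behaviour_scatter.png", dpi=150)  # share with the unthresholded map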

 Sharing your science to be fully reproducible 

Being inspectable allows a fair peer review, but to be reproducible the analysis details must be shared too. This does not correspond to what goes into your article, because there is no way everything is described there (check the video on Chris Gorgolewski's blog - it's funny and pretty convincing). Ince et al. (2012) make the case that ‘anything less than release of actual source code is an indefensible approach for any scientific results that depend on computation, because not releasing such code raises needless, and needlessly confusing, roadblocks to reproducibility’. That means that the analysis code should be published too. Scientists generally think the code they write is poor, and this is one of the main reasons they give for not sharing it with others. Yet, as Barnes (2010) puts it, ‘software in all trades is written to be good enough for the job intended. So if your code is good enough to do the job, then it is good enough to release’.


Once an experiment and its analysis are done and the article is ready, deposit the data and code in repositories before publication so that permanent links can be included in the article. You can put the data and other scripts on FigShare, for instance, and link them to a specific version of the software on GitHub.
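
A small, machine-readable manifest can help keep those permanent links together with the materials themselves. The sketch below is purely illustrative; the DOI, repository URL, tag and commit hash are placeholders to be filled in with the real identifiers once the deposits exist.

    # Hypothetical sketch: tie the archived data (e.g. a FigShare DOI) to the exact
    # version of the analysis code (e.g. a GitHub tag or commit), so the permanent
    # links quoted in the article are also bundled with the materials.
    import json

    manifest = {
        "article": "title and DOI to be added at proof stage",
        "data": {"repository": "FigShare", "doi": "10.6084/m9.figshare.XXXXXXX"},
        "code": {
            "repository": "https://github.com/<user>/<analysis-repo>",
            "tag": "v1.0-paper",
            "commit": "<full commit hash>",
        },
    }

    with open("MANIFEST.json", "w") as f:
        json.dump(manifest, f, indent=2)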

Share your science!

Sharing your science is good for science, but it is also good for authors. Piwowar et al. (2007) showed that this increases trust in, interest in, and citations of your paper. Openness improves both the code used by the scientists and the ability of the public to engage with their work (Barnes, 2010). Overall, sharing data and code encourages good practice for data curation, which is becoming a requirement for many funding bodies (a wealth of information can be found on the UK Digital Curation Centre website).

Please leave any comments or useful links on data sharing; I'm eager to collect as much information on this as possible. By the way, can anyone see 'dangers' in sharing data, code, etc.?


