Material for #PlantBio18 “Reproducibility for Everyone” workshop

By Benjamin Schwessinger

Here we present the extended support material for our ‘Reproducibility for Everyone’ workshop at ASPB #PlantBio18. All material is available under a CC BY 4.0 license.

Please feel free to remix, copy, distribute, use, improve, and snowball.

The extended handout is available as PDF, Word, and Google Doc.

The extended presentation is available as PDF, PPTX, and Google Doc.

Supported by:

ASPB, Addgene, Protocols.io, eLife, CodeOcean, and volunteer labor for the love of better science

The handout is also copied below for ease of use.

Reproducibility Resources & Tools

Data management

Harvard University Data Management page https://datamanagement.hms.har…
Kbroman Lab http://kbroman.org/dataorg/ (Short primer on data storage and handling from Karl Broman)
Purdue Library http://guides.lib.purdue.edu/c… (Short primer on data management and file naming conventions)
DataONE Best Practices https://www.dataone.org/best-p… (Detailed resource on how to handle data throughout its lifecycle)
Mantra https://mantra.edina.ac.uk/ (Free online course for those who handle digital data)
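The file naming primers above recommend names that sort correctly and can be read by both people and scripts. As a minimal sketch of one such convention (the date_project_sample_version layout and the `make_filename` helper are hypothetical, not taken from any of the linked guides), a small Python function can enforce it consistently:

```python
import datetime
import re


def slug(text: str) -> str:
    """Lowercase text and replace each run of non-alphanumerics with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(date: datetime.date, project: str, sample: str,
                  version: int, ext: str) -> str:
    """Build a sortable, machine-readable file name (hypothetical convention)."""
    # ISO 8601 dates (YYYY-MM-DD) sort chronologically as plain text
    return f"{date.isoformat()}_{slug(project)}_{slug(sample)}_v{version:02d}.{ext.lstrip('.')}"


if __name__ == "__main__":
    print(make_filename(datetime.date(2018, 7, 14), "Rust infection", "Leaf A/3", 1, "csv"))
    # 2018-07-14_rust-infection_leaf-a-3_v01.csv
```

Putting the date first means an alphabetical directory listing is also a chronological one, and avoiding spaces and slashes keeps the names safe on every operating system.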

Electronic Lab Notebooks (ELN)

Harvard University ELN guide https://tinyurl.com/Harvard-EL… (Great summary of current ELNs and what they do)
Benchling https://benchling.com/ (free)
Evernote https://evernote.com/ (free and $)
Labguru https://www.labguru.com/ ($)
sciNote https://scinote.net/ (open source, free)
Open Science Framework https://osf.io/ (free)

Code

GitHub https://github.com/ (code repository; free for public repos)
Jupyter Notebooks http://jupyter.org/ (open source web-app for creating & sharing live code, equations, and more)
Code Ocean https://codeocean.com/ (computational reproducibility platform; free to upload, share & publish executable code with a DOI; pay for computing time beyond the free tier)
Conda and Bioconda https://conda.io/docs/ and https://bioconda.github.io/ (An operating-system-independent package and environment manager for the command line)
Docker and BioContainers https://docs.docker.com/ and http://biocontainers.pro (A container ecosystem to package code and data on the command line)
Binder https://mybinder.org/ (A tool that turns your GitHub repository into a Docker image running in the cloud)
Galaxy https://usegalaxy.org/ (A web-based bioinformatics platform with a graphical interface; handling larger datasets requires a local set-up)
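Conda environment files and Docker images capture the software stack most completely (for example, `conda env export > environment.yml`). As a lighter-weight complement, here is a minimal Python sketch that records the interpreter, operating system, and package versions for a README or methods section; the `describe_environment` helper and the example package names are hypothetical:

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Collect interpreter, OS, and package versions (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names are only examples; list the ones your analysis imports
    print(json.dumps(describe_environment(["numpy", "pandas"]), indent=2))
```

Saving this JSON next to your results records what actually ran, but it does not let anyone rebuild the environment; that is what Conda and Docker are for.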

Reagents

Addgene https://www.addgene.org/ (nonprofit plasmid repository)
CiteAb https://www.citeab.com/ (antibody search engine with results sorted by citations)
Quartzy https://www.quartzy.com/ (manage lab inventory)

Methods

Bio-Protocol https://bio-protocol.org/ (A peer-reviewed protocol journal; free to read & publish)
protocols.io http://protocols.io/ (an open access repository of science methods; free to read & publish)

Data

DataDryad http://datadryad.org/ (curated digital repository; free to access, $120 to publish a dataset of up to 20 GB)
Figshare https://figshare.com/ (free digital repository; 5 GB per file limit)
Zenodo https://zenodo.org/ (free digital repository; 50 GB per dataset limit)
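Whichever repository you choose, a checksum manifest lets anyone confirm that a downloaded copy is bit-for-bit identical to what you deposited. A minimal sketch using Python's standard `hashlib`; the `write_manifest` helper and the manifest file name are assumptions, not a requirement of Dryad, Figshare, or Zenodo:

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(folder, manifest_name="MANIFEST-sha256.txt"):
    """Write '<digest>  <relative path>' lines for every file under folder."""
    folder = Path(folder)
    lines = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        if path.name == manifest_name:
            continue  # never checksum the manifest itself
        lines.append(f"{file_checksum(path)}  {path.relative_to(folder).as_posix()}")
    (folder / manifest_name).write_text("\n".join(lines) + "\n")
    return lines
```

The manifest uses the two-space `<digest>  <path>` layout, so on Linux, running `sha256sum -c MANIFEST-sha256.txt` inside the folder should verify it.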

Data Visualization

Beyond Bar Graphs (Free Tools & Resources for Creating More Transparent Figures for Small Datasets) https://tinyurl.com/ecrbeyondb…
Interactive Dotplot Tool http://statistika.mfub.bg.ac.r… (create dotplots, box plots, violin plots, show subgroups or display clusters of non-independent data)
Interactive Linegraph Tool http://statistika.mfub.bg.ac.r… (examine different summary statistics, focus on groups, time points or conditions of interest, examine lines for any individual in the dataset, view change scores)
Other free tools: https://twitter.com/T_Weissger…
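The idea behind the Beyond Bar Graphs resources is that, for small samples, a bar showing only the mean hides the distribution. The tools above solve this graphically; as a plain-text illustration of the same principle, this hypothetical Python helper reports every observation alongside the median and quartiles instead of a single mean:

```python
import statistics


def describe_groups(groups):
    """Summarize each group by its raw values plus median and quartiles."""
    summary = {}
    for name, values in groups.items():
        ordered = sorted(values)
        if len(ordered) < 2:
            raise ValueError(f"group {name!r} needs at least two values")
        # statistics.quantiles uses the 'exclusive' method by default
        q1, _, q3 = statistics.quantiles(ordered, n=4)
        summary[name] = {
            "n": len(ordered),
            "values": ordered,  # report every point, not just a summary
            "median": statistics.median(ordered),
            "q1": q1,
            "q3": q3,
        }
    return summary


if __name__ == "__main__":
    # Made-up measurements, for illustration only
    heights = {"wild_type": [12.1, 13.4, 11.8, 12.9, 14.0],
               "mutant": [9.6, 15.9, 10.2, 15.0, 13.5]}
    for name, s in describe_groups(heights).items():
        print(f"{name}: n={s['n']} median={s['median']} "
              f"IQR=[{s['q1']:.2f}, {s['q3']:.2f}] values={s['values']}")
```

The two made-up groups have the same mean but very different spreads, which a pair of mean bars would hide.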

R

Tutorial – Plotting in R on YouTube

Customized interactive visualizations (Shiny) https://www.frontiersin.org/ar…

ggplot2 https://ggplot2.tidyverse.org/
Claus Wilke blog post http://serialmentor.com/blog/2… (contains several links to his upcoming book about data visualization)

Python

Collection of useful resources https://github.com/schmelling/…
Tutorial – Data Analysis and Visualization in Python
Data Carpentry: An Introduction to Python for Data Analysis and Visualization – Tracy Teal PyCon 2016 Tutorial
PyData Packages (incl. Matplotlib, Seaborn, NumPy, pandas, and many more packages important for data analysis and visualization) https://pydata.org/downloads.h…

Statistical Analysis

Handbook of Biological Statistics! http://www.biostathandbook.com… and http://rcompanion.org/rcompani… (Web pages from John H. McDonald and others from the University of Delaware, with PDF download links to a free book on statistics in biology and its R implementation)
SciPy stats lectures https://tinyurl.com/scipystats (Lecture on statistics in Python using SciPy); see also https://www.statsmodels.org/st… for more statistics in Python
Nature Statistics for Biologists resources https://www.nature.com/collect…
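The resources above cover classical tests in R and Python. To illustrate one resampling approach that makes no normality assumption, here is a minimal permutation test for a difference in group means written with the Python standard library; it is a swapped-in example, not a method taken from the linked handbooks, and recent SciPy versions also provide `scipy.stats.permutation_test` for this purpose:

```python
import random
import statistics


def permutation_test(a, b, n_permutations=10_000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Returns (observed absolute difference, p-value).
    """
    rng = random.Random(seed)  # fixed seed: the same inputs give the same p-value
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction counts the observed labelling as one of the permutations
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up values, for illustration only
    control = [4.1, 3.8, 4.5, 4.0, 3.9]
    treated = [4.9, 5.2, 4.7, 5.0, 5.4]
    diff, p = permutation_test(control, treated)
    print(f"difference in means = {diff:.2f}, p = {p:.4f}")
```

Fixing the random seed is itself a reproducibility habit: anyone rerunning the script gets exactly the same p-value.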

Practical tips for reproducibility

1.    Plan for reproducibility before you start
a.    Write a study plan or protocol and track new versions.
b.    Set up a reproducible project using an electronic lab notebook to organize and track your work. Avoid saving data in proprietary file formats.

2.    Keep track of things
a.    Preregister important study design and analysis information. Free tools and formats that can help with your first registration include AsPredicted, Open Science Framework, and Registered Reports. Clinical trials are registered at ClinicalTrials.gov.
b.    Track changes to your files using version control.
c.    Document everything done by hand in a README file and data dictionary. Karl Broman’s Data Organization module: http://kbroman.org/dataorg/pag…

3.    Report your research transparently
a.    Share your protocols and interventions explicitly and transparently.
b.    Write a transparent report. Guidelines from the EQUATOR Network or processes like Registered Reports can help.

4.    Archive & share your materials
a.    Share and license your research
i.    Data
1.    Avoid supplementary files; license your data and share it through a repository. How to License Research Data: http://www.dcc.ac.uk/resources/how-guides/license-research-data.
ii.    Materials & reagents
1.    License your published materials so they can be reused. Creative Commons License Picker: https://creativecommons.org/ch…
2.    Deposit reagents with repositories like Addgene, and seeds with seed repositories.
iii.    Software
1.    License your code when sharing it through Code Ocean or GitHub. Open Source Initiative, About Open Source Licenses: https://opensource.org/licenses.
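For tip 2c (a README file and data dictionary), the data dictionary can be as simple as a CSV file with one row per column of your dataset. A minimal Python sketch, assuming hypothetical column names from a plant phenotyping table; the helper names are illustrative only:

```python
import csv
import io

# Hypothetical dictionary for an example phenotyping table
DATA_DICTIONARY = [
    {"column": "plant_id", "description": "Unique plant identifier",
     "unit": "", "allowed_values": "P001 to P999"},
    {"column": "genotype", "description": "Line or accession name",
     "unit": "", "allowed_values": "free text"},
    {"column": "height_cm", "description": "Plant height at harvest",
     "unit": "cm", "allowed_values": "> 0"},
]
FIELDS = ["column", "description", "unit", "allowed_values"]


def write_data_dictionary(rows, handle):
    """Write the dictionary as CSV to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def undocumented_columns(data_header, rows):
    """Return dataset columns that the dictionary does not describe."""
    documented = {row["column"] for row in rows}
    return [name for name in data_header if name not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
    print(undocumented_columns(["plant_id", "height_cm", "notes"], DATA_DICTIONARY))
```

To save it next to your data, open the file with `open("data_dictionary.csv", "w", newline="")` and pass that handle instead of the in-memory buffer; running `undocumented_columns` before archiving catches columns you forgot to describe.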

Further reading

●    Ten Simple Rules for Reproducible Computational Research: http://journals.plos.org/plosc…
●    Reproducibility in Science: http://ropensci.github.io/repr…
●    Open Science MOOC: https://opensciencemooc.eu/ and https://opensciencemooc.github…
●    Tools and Resources for Reproducibility Series at protocols.io: goo.gl/r7GKMA
●    Managing Laboratory Notebooks http://colinpurrington.com/tip…
●    General File and Folder Organization https://zapier.com/blog/organi…
●    File Naming Conventions http://www.exadox.com/en/artic…

Example studies

Gene family innovation, conservation and loss on the animal stem lineage
○    Paper: https://doi.org/10.7554/eLife….
○    Protocols: dx.doi.org/10.17504/protocols.io.kwscxees
○    Data: https://doi.org/10.6084/m9.fig…

A robust method for transfection in choanoflagellates illuminates their cell biology and the ancestry of animal septins
○    Paper: https://doi.org/10.1101/343111
○    Protocols: http://www.protocols.io/groups…
○    Constructs: http://www.addgene.org/Nicole_…

Implicating candidate genes at GWAS signals by leveraging topologically associating domains
○    Paper: https://dx.doi.org/10.1038/ejh…
○    Code: https://zenodo.org/record/1639…
○    Docker workflow: https://zenodo.org/record/1665…

mcSCRB-seq: sensitive and powerful single-cell RNA sequencing
○    Protocol: dx.doi.org/10.17504/protocols.io.p9kdr4w
○    Paper: https://doi.org/10.1101/188367
○    Code: https://github.com/cziegenhain…

TransRate: reference-free quality assessment of de novo transcriptome assemblies
○    Paper: https://dx.doi.org/10.1101%2Fg…
○    Code: https://github.com/Blahah/tran…
○    Tutorial: http://hibberdlab.com/transrat…

Genomic insights into members of the candidate phylum Hyd24-12 common in mesophilic anaerobic digesters
○    Paper: https://doi.org/10.1038/ismej….
○    Code: https://github.com/Kirk3gaard/…

Experimenting with Reproducibility: a case study of Robustness in Bioinformatics
○    Paper: https://doi.org/10.1093/gigasc…
○    Code: https://github.com/sje30/waver…

A Bayesian Mixture Modelling Approach For Spatial Proteomics
○    Paper: https://doi.org/10.1101/282269
○    Code: https://github.com/lgatto/2018…