Experimental Reproducibility 101 (Part 1)
This article is the first of three and is based on a workshop called “Reproducibility for all” presented at PlantBio18 by Benjamin Schwessinger, Sonali Roy, and Lenny Teytelman.
It’s late. You’re the last person in the lab now. The security guard, who knows your cat by name, wishes you luck with your experiments as he sets off again on his nightly rounds. You know you should go home, but you’re faced with a problem.
The problem is, you think you’ve made a discovery that is novel and exciting: a morphogen described in the literature as affecting only leaves also changes root architecture!
Wait, you wish that were the problem.
The problem is that the phenotype is statistically significant in exactly two of four experiments. If these data are included in a manuscript, other researchers will lose time, money, and resources trying to replicate a finding that was (perhaps) never real. On the other hand, what if a genuine finding that would add to our knowledge of how root architecture is controlled is falsely dismissed?
Why is it so hard to reproduce findings from that initial screen?
In our recent survey (data here), close to 90% of respondents said that at some point in their careers they’ve had trouble replicating their own experiments, published findings from other labs, or both. Many practices and tools currently available to the modern researcher can help improve the replicability of experiments.
This blog post summarizes resources and pointers that can help you organize, document, analyze, and disseminate all the data needed to run an experimental analysis again and re-create its results. It is based on a slide deck (here) assembled by ten authors.
1. Have a data management plan ready before you start a project.
Before starting a project, consider the Five Ws (and one H) questions. Assuming you already know the answers to the Why (…you are undertaking the project) and Where (…you expect to conduct the research) questions, think about:
- What data will be produced as part of the project?
- When will the activities take place over the course of the project?
- Who will take responsibility for carrying out the planned activities?
- How will each type of data be organized, documented, standardized, stored, protected, shared and archived?
The Data Management Plan Tool can help you draw up a plan tailored to your project’s requirements.
An important consideration is how to name the directories and files that store the data. A clear project directory structure and a consistent naming convention for the files, images, and raw data it contains make it easier to follow a project’s progress. A good naming convention should be easy for both humans and machines to read and follow.
Another good practice is to include metadata files that record how each data file was created.
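As an illustration, here is a minimal Python sketch of one possible convention, assuming a hypothetical pattern of YYYY-MM-DD_project_experiment_sample_repNN.ext. The pattern, the field names, and the example values (rootarch, morphogen-screen, col0) are made up for this example and are not a standard. The point is that a single regular expression can both build and check names, so scripts can parse them as reliably as people can read them.

```python
import re
from datetime import date

# Hypothetical convention (not a standard):
#   YYYY-MM-DD_<project>_<experiment>_<sample>_rep<NN>.<ext>
# Fields may contain only lowercase letters, digits, and hyphens, so the
# underscore is reserved as the separator between fields.
FIELD = r"[a-z0-9-]+"
PATTERN = re.compile(
    rf"(?P<date>\d{{4}}-\d{{2}}-\d{{2}})_(?P<project>{FIELD})_"
    rf"(?P<experiment>{FIELD})_(?P<sample>{FIELD})_rep(?P<rep>\d+)"
    r"\.(?P<ext>[a-z0-9]+)"
)


def build_name(day: date, project: str, experiment: str, sample: str,
               rep: int, ext: str) -> str:
    """Assemble a file name from its fields; raise ValueError if it breaks the convention."""
    name = f"{day.isoformat()}_{project}_{experiment}_{sample}_rep{rep:02d}.{ext}"
    if PATTERN.fullmatch(name) is None:
        raise ValueError(f"fields do not fit the naming convention: {name!r}")
    return name


def parse_name(name: str) -> dict:
    """Split a conforming file name back into its fields; raise ValueError otherwise."""
    match = PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"not a valid file name: {name!r}")
    fields = match.groupdict()
    fields["rep"] = int(fields["rep"])  # replicate number as an integer
    return fields


if __name__ == "__main__":
    name = build_name(date(2018, 7, 14), "rootarch", "morphogen-screen", "col0", 1, "tif")
    print(name)              # 2018-07-14_rootarch_morphogen-screen_col0_rep01.tif
    print(parse_name(name))  # the same fields as a dict
```

Putting an ISO 8601 date first means an alphabetical listing of a folder is also a chronological one, and restricting fields to lowercase letters, digits, and hyphens keeps the underscore free to act as an unambiguous separator.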
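A metadata file can be as simple as a small JSON "sidecar" saved next to each data file. The Python sketch below assumes a hypothetical <file>.meta.json naming scheme and a made-up set of fields (who created the file, which protocol was followed, free-text notes). It also stores a SHA-256 checksum, so you can later check that the data file has not changed since it was documented.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

META_SUFFIX = ".meta.json"  # hypothetical sidecar naming scheme, not a standard


def sha256_of(path: Path) -> str:
    """Checksum a file in 1 MiB chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sidecar_path(data_file: Path) -> Path:
    """Metadata lives next to the data file: sample.csv -> sample.csv.meta.json."""
    return data_file.with_name(data_file.name + META_SUFFIX)


def write_metadata(data_file: Path, created_by: str, protocol: str,
                   notes: str = "") -> Path:
    """Record who made the file, how, and its checksum; return the sidecar path."""
    record = {
        "file": data_file.name,
        "sha256": sha256_of(data_file),
        "size_bytes": data_file.stat().st_size,
        "created_by": created_by,
        "protocol": protocol,
        "notes": notes,
        # When the metadata was written, which may differ from when the data were acquired.
        "documented_at": datetime.now(timezone.utc).isoformat(),
    }
    path = sidecar_path(data_file)
    path.write_text(json.dumps(record, indent=2))
    return path


def verify(data_file: Path) -> bool:
    """Return True if the data file still matches the checksum in its sidecar."""
    record = json.loads(sidecar_path(data_file).read_text())
    return record["sha256"] == sha256_of(data_file)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "root_lengths.csv"
        data.write_text("plant,root_length_mm\n1,42.0\n")  # placeholder values for the demo
        meta = write_metadata(data, created_by="A. Researcher",
                              protocol="hypothetical-growth-protocol-v1")
        print(meta.read_text())
        print("unchanged:", verify(data))
```

Because the sidecar is plain JSON, it stays readable by people opening the folder and by any script that later needs to check or index the data.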
2. Be a 21st century researcher – use an Electronic Lab Notebook (ELN).
Here’s another scenario.
Your collaborator just emailed you asking for the details of the plant growth system you developed last summer. Oh, and he CC’d your boss.
Instead of flipping through the pages of lab notebook number X, you log into your ELN, search by date, and find the embedded Excel file with all the growth measurements. You export your detailed notes, along with the photos of the setup you took with the mobile app, as a single PDF.
And click send.
An electronic lab notebook (ELN) is a software tool that, like a traditional paper notebook, can be used to record all forms of data. It also offers the digital researcher further benefits, such as searchability and shareability. Before committing to any one ELN, though, researchers are strongly encouraged to read through this ELN features matrix and select the one that best suits their needs.
Follow the link to Part 2!