Open Thoughts

February 2012 archive

Nature Editorial about Open Science

February 28, 2012

The case for open computer programs

Does open source software imply reproducible research?

There was a recent Nature editorial expounding the need for open source software in scientific endeavours. It argues that many modern scientific results depend on complex computations, and hence that source code is needed for scientific reproducibility. It is good that a high profile journal has published articles promoting open source software, since it increases visibility. However, some more careful thought is required, as the editorial's message is inaccurate in both directions: open source offers more than reproducibility alone, and having source code does not by itself guarantee reproducibility.

Open source provides more benefits than just reproducibility

Actually, open source provides more than is necessary for reproducibility, since the license provides the right to edit and extend the code, as well as preventing discriminatory restrictions on use. To be pedantic, for reproducibility alone, any runnable software (even a compiled executable) would work.

We've said this before but the message is worth repeating. Open source provides:

  1. reproducibility of scientific results and fair comparison of algorithms;
  2. uncovering problems;
  3. building on existing resources (rather than re-implementing them);
  4. continued, uninterrupted access to scientific tools;
  5. combination of advances;
  6. faster adoption of methods in different disciplines and in industry; and
  7. collaborative emergence of standards.

See our paper for more details.

Having source code does not imply reproducibility

As the editorial observes in the final sentence of its abstract, "The vagaries of hardware, software and natural language will always ensure that exact reproducibility remains uncertain, ...". I've personally spent many frustrating hours trying to get somebody's research code to compile. In fact, one of the most common complaints by reviewers for JMLR's open source track is that they are unable to get submissions to work on their computer. Given the multitude of computing environments, numerical libraries and programming languages, the user of the software very often works in a quite different setup from the authors of the source code. My advice to fledgling authors of machine learning open source software is to provide a "quickstart" tutorial in the README, because everybody is impatient, and nobody will look into fixing your bugs before they are convinced that your code does something useful for them. And yes, even fixing $PATH can be tricky if you don't know exactly how to do it.
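To make the "quickstart" advice concrete, here is a minimal sketch of the kind of README snippet I mean. The project name, directory layout and commands are entirely hypothetical, not taken from any real submission; the only non-illustrative part is the $PATH line, which shows the standard way to make a locally built binary findable.

```shell
# Hypothetical quickstart for a project called "mytool" (illustrative names).

# 1. Build (assumes a conventional Makefile; command shown, not run here):
#    make

# 2. Make the binary findable: prepend its (hypothetical) directory to PATH.
export PATH="$HOME/mytool/bin:$PATH"

# 3. Smoke test on bundled example data (commands shown, not run here):
#    mytool --version
#    mytool examples/small.csv
```

A three-step block like this lets an impatient reviewer see the code do something useful within minutes, before they have any reason to debug your build.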

I guess the bottom line is quite an obvious statement: Good open source software will give you reproducibility and a few other additional benefits.

Tagging Project 'Published in JMLR'

February 3, 2012

I did some minor maintenance work today updating email addresses of certain projects and merging libmlpack and MLPACK.

More importantly, I added the JMLR MLOSS URL to all listed projects that have a corresponding JMLR publication. Since there seemed to be quite a backlog, it might be worthwhile to shed some light on how a project gets flagged 'published in jmlr'.

Naturally, the precondition is a corresponding accepted and already online JMLR MLOSS paper. In addition, you should remind your JMLR action editor to flag the project 'published in jmlr' by giving him the link to the abstract of your publication on the JMLR homepage (the 'abs' link). He will then add this cross-reference, and your project benefits from the increased visibility.

I hope that makes the process more transparent and reduces the backlog - oh, and by the way, we are now counting 29 JMLR-MLOSS submissions - keep it going :-)