Open Thoughts

Scientist vs Inventor

Posted by Cheng Soon Ong on March 18, 2013

Mikio and I are writing a book chapter about "Open Science in Machine Learning", which will appear in a collection titled "Implementing Computational Reproducible Research". Among many other things, we mentioned that machine learning is about inventing new methods for solving problems. Luis Ibanez from Kitware pounced on this statement, and proceeded to give a wonderful argument that we are confusing our roles as scientists with the pressure of being an inventor. The rest of this post is an exact reproduction of Luis' response to our statement.

“... machine learning is concerned with creating new learning methods to perform well on certain application problems.”

The authors discuss the purpose of machine learning, but under the unstated context of “research on machine learning” and the current landscape of research funding. To clarify, the authors imply that novelty is the purpose of machine learning research. More explicitly, that “developing new methods” is the goal of research.

This is a view (not limited to machine learning) that is widespread, and that in practice is reinforced by the requirements of publishing and the pursuit of grant funding. I beg to differ with this view, in the sense that “novelty” is not part of the scientific process at all. Novelty is an artificial condition that has been imposed on scientific workers over the years, due to the need to evaluate performance for the purpose of managing scarce funding resources. The goal of scientific research is to attempt to understand the world by direct observation, crafting of hypotheses, and evaluation of those hypotheses via reproducible experiments.

The pursuit of novelty (real or apparent) is actually a distraction, and it is one of the major obstacles to the practice of reproducible research. By definition, repeating an experiment implies, requires, and demands doing something that is not new. This distracted overrating of novelty is one of the reasons why scientific workers and their institutions have come to consider repeating experiments a “waste of time”, since it takes resources away from doing “new things” that could be published or could lead to new streams of funding. This confusion with “novelty” is also behind the lack of interest in reproducing experiments that have been performed by third parties, since such actions are “just repeating” what someone else did, and are not adding anything “new”. All of these are statements that are detrimental to the true practice of the scientific method.

The confusion is evident when one looks at calls for papers in journals and conferences, or at calls for proposals in funding programs. All of them call for “novelty”; none of them (with a handful of exceptions) call for reproducibility. The net effect is that we have confused two very different professions: (a) scientific researcher, with (b) inventor. Scientific researchers should be committed to the application of the scientific method, and in it, there is no requirement for novelty. The main commitment is to craft reproducible experiments, since we are after the truth, not after the new. Inventors, on the other hand, are in the business of coming up with new devices, and are not committed to understanding the world around us.

Most conferences, journals, and even funding agencies have confused their role of supporting the understanding of the world around us, and have become surrogates for the Patent Office.

In order to make progress in the pursuit of reproducible research, we need to put “novelty” back in its rightful place: a nice extra, a secondary or tertiary feature of scientific research, but not a requirement, nor a driving force at all.


Tom Fawcett (on March 25, 2013, 18:57:17)

I agree with some of his argument, but I'd point out that machine learning isn't exactly a natural science like biology. We don't really study artifacts and phenomena of the natural world (except insofar as datasets can be said to be "artifacts of the natural world"). Hypothesis exploration is not as distinct from invention in our field as it is in others. His claim "we are after the truth, not after the new" sounds like a false dichotomy to me; often to get at the truth you must explore hypotheses (invent new methods) to see how they behave in comparison.

Having said that, I agree that reproducibility should be more important than it currently is.

Perhaps he thinks that the point of machine learning is to come up with a Grand Unified Theory of Adaptive Behavior, in which case I'd agree with him enthusiastically. I don't think we're anywhere close to this yet.
