Experiment Databases for Machine Learning is a large public database of machine learning experiments, as well as a framework for building similar databases for specific goals. It makes it possible to share the thousands of machine learning experiments run every day by providing an XML-based language (ExpML) that fully describes machine learning experiments, keeping them reproducible and annotated, together with an interface that automatically stores all such descriptions in an organized way in predefined databases, allowing a very thorough examination of the stored results. The current publicly available database contains over 500,000 classification and regression experiments and can be explored both through an online interface and through a stand-alone explorer tool offering various visualization techniques. The framework can also be integrated into machine learning toolboxes to automatically stream results to a global (or local) experiment database, or to download experiments that have been run before.
This project aims to bring together the information contained in many machine learning experiments and to organize it in a way that allows everyone to investigate how learning algorithms have performed in previous studies. Meta-level information about the algorithms and datasets, such as properties of the models used or statistical properties of the data, can also be stored to investigate how it affects algorithm behavior. To share such information with the world, a common language, dubbed ExpML, is proposed. It captures the basic structure of various machine learning experiments, including the use of ensemble methods, kernel methods and preprocessed datasets, while remaining open to future extensions. The language also enforces reproducibility by requiring links to the datasets and algorithms used and by storing all details of the experimental setup.
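The page does not say which meta-level properties are recorded. As a minimal sketch, the Python function below computes a few simple statistical properties that are commonly used to characterize classification datasets: size, dimensionality, number of classes, class entropy and majority-class fraction. The function name and this particular selection of properties are assumptions for illustration, not the database's actual meta-feature set.

```python
import math
from collections import Counter


def dataset_meta_features(rows, labels):
    """Compute simple statistical properties of a classification dataset.

    Hypothetical helper for illustration only.
    rows:   list of feature vectors, all of the same length
    labels: list of class labels, one per row
    """
    if not labels or len(rows) != len(labels):
        raise ValueError("need one label per row and at least one row")
    n_instances = len(rows)
    counts = Counter(labels)
    # Shannon entropy of the class distribution, in bits
    class_entropy = -sum(
        (c / n_instances) * math.log2(c / n_instances) for c in counts.values()
    )
    return {
        "n_instances": n_instances,
        "n_attributes": len(rows[0]),
        "n_classes": len(counts),
        "class_entropy": class_entropy,
        # Fraction of instances in the most frequent class (default-accuracy baseline)
        "majority_class_fraction": max(counts.values()) / n_instances,
    }


if __name__ == "__main__":
    rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.9, 3.0]]
    labels = ["a", "a", "b", "b"]
    print(dataset_meta_features(rows, labels))
```

Storing such values next to each experiment is what would allow queries like "how does this algorithm behave on datasets with high class entropy?".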
The project includes a library that allows users to compose experiments as objects, which can then be automatically translated to ExpML and stored in local databases or uploaded to public ones. Furthermore, the library can store unfinished experiments and offers a basic interface to retrieve and run them and to store their results afterward, for use in distributed systems that run large numbers of experiments.
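The workflow described above (compose an experiment as an object, serialize it to XML, store it, and later let a worker pick up unfinished experiments and attach results) can be sketched with Python's standard library. This is a minimal illustration under stated assumptions, not the project's library: the `Experiment` class, the helper functions, the XML element names and the single-table SQLite layout are all hypothetical, and the real ExpML schema is considerably richer.

```python
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict


# Hypothetical experiment object; field names are illustrative, not the library's API.
@dataclass
class Experiment:
    algorithm: str
    parameters: Dict[str, str]
    dataset_url: str  # reproducibility: link to the dataset used
    evaluation: str = "10-fold cross-validation"

    def to_xml(self) -> str:
        """Serialize to an ExpML-like XML string (element names are invented)."""
        root = ET.Element("experiment")
        algo = ET.SubElement(root, "algorithm", name=self.algorithm)
        for name, value in sorted(self.parameters.items()):
            ET.SubElement(algo, "parameter", name=name, value=value)
        ET.SubElement(root, "dataset", url=self.dataset_url)
        ET.SubElement(root, "evaluation", method=self.evaluation)
        return ET.tostring(root, encoding="unicode")


def open_store(path=":memory:"):
    """Open a local store; 'pending' rows are experiments that have not been run yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS experiment ("
        " id INTEGER PRIMARY KEY,"
        " expml TEXT NOT NULL,"
        " status TEXT NOT NULL CHECK (status IN ('pending', 'done')))"
    )
    return conn


def store_pending(conn, exp):
    """Store an unfinished experiment and return its row id."""
    cur = conn.execute(
        "INSERT INTO experiment (expml, status) VALUES (?, 'pending')", (exp.to_xml(),)
    )
    conn.commit()
    return cur.lastrowid


def next_pending(conn):
    """Return (id, expml) of the oldest unfinished experiment, or None."""
    return conn.execute(
        "SELECT id, expml FROM experiment WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()


def store_result(conn, exp_id, accuracy):
    """Attach a result to a stored description and mark the experiment as done."""
    (expml,) = conn.execute(
        "SELECT expml FROM experiment WHERE id = ?", (exp_id,)
    ).fetchone()
    root = ET.fromstring(expml)
    ET.SubElement(root, "result", metric="accuracy", value=str(accuracy))
    conn.execute(
        "UPDATE experiment SET expml = ?, status = 'done' WHERE id = ?",
        (ET.tostring(root, encoding="unicode"), exp_id),
    )
    conn.commit()


if __name__ == "__main__":
    conn = open_store()
    exp = Experiment("DecisionTree", {"min_leaf": "2"}, "http://example.org/data/iris.arff")
    store_pending(conn, exp)
    exp_id, expml = next_pending(conn)
    # A worker would parse `expml` and run the experiment; 0.9 is a placeholder score.
    store_result(conn, exp_id, 0.9)
    print(next_pending(conn))  # None: nothing left to run
```

Keeping the full XML description in the store means a worker needs nothing but that description to rerun the experiment. With many concurrent workers, claiming a pending experiment would also need to be atomic (for example, marking it as running inside the same transaction), which this sketch omits.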
At the moment, the framework fully supports most classification and regression tasks and offers an SQL query interface and visualization techniques for investigating the retrieved data. In the future, we hope to develop more intuitive interfaces that make the stored information more accessible to non-expert users, and to cover more machine learning tasks.
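To illustrate the kind of question an SQL interface over stored experiments can answer, the sketch below ranks algorithms by mean accuracy, keeping only those evaluated on at least a given number of datasets. It assumes a single, heavily simplified table `run(algorithm, dataset, accuracy)` in an in-memory SQLite database; the table, its columns and the toy rows are hypothetical and do not reflect the real database schema or its contents.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the experiment database's real schema.
SCHEMA = """
CREATE TABLE run (
    algorithm TEXT NOT NULL,
    dataset   TEXT NOT NULL,
    accuracy  REAL NOT NULL
);
"""

# Mean accuracy per algorithm, restricted to algorithms run on >= ? datasets.
QUERY = """
SELECT algorithm,
       COUNT(DISTINCT dataset) AS n_datasets,
       AVG(accuracy)           AS mean_accuracy
FROM run
GROUP BY algorithm
HAVING COUNT(DISTINCT dataset) >= ?
ORDER BY mean_accuracy DESC;
"""


def rank_algorithms(conn, min_datasets=1):
    """Return (algorithm, n_datasets, mean_accuracy) rows, best first."""
    return conn.execute(QUERY, (min_datasets,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Toy values for illustration only, not results from the database.
    conn.executemany(
        "INSERT INTO run VALUES (?, ?, ?)",
        [
            ("algo_A", "ds1", 0.80), ("algo_A", "ds2", 0.70),
            ("algo_B", "ds1", 0.90), ("algo_B", "ds2", 0.80),
            ("algo_C", "ds1", 0.95),
        ],
    )
    for row in rank_algorithms(conn, min_datasets=2):
        print(row)
```

Averaging over different sets of datasets can be misleading; a more careful comparison would restrict the query to datasets on which all compared algorithms were run.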
- Changes to previous version:
Initial Announcement on mloss.org.