Modular toolkit for Data Processing (MDP) is a library of widely used data processing algorithms that can be combined according to a pipeline analogy to build more complex data processing software.
The base of available algorithms is steadily increasing and includes, to name but the most common, Principal Component Analysis (PCA and NIPALS), several Independent Component Analysis algorithms (CuBICA, FastICA, TDSEP, and JADE), Slow Feature Analysis, Gaussian Classifiers, Restricted Boltzmann Machine, and Locally Linear Embedding.
From the user's perspective, MDP consists of a collection of supervised and unsupervised learning algorithms, and other data processing units (nodes) that can be combined into data processing sequences (flows) and more complex feed-forward network architectures. Given a set of input data, MDP takes care of successively training or executing all nodes in the network. This allows the user to specify complex algorithms as a series of simpler data processing steps in a natural way.
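The node and flow idea can be sketched in a few lines of plain Python. This is a toy illustration only, not MDP's implementation or API: the class names `ToyNode`, `CenterNode`, `MaxAbsScaleNode`, and `ToyFlow` are hypothetical, and the data is a flat list of floats rather than an array. It shows a flow training each node in turn on the output of the nodes before it.

```python
class ToyNode:
    """Hypothetical minimal node: optionally trainable, always executable."""

    def train(self, xs):
        pass  # nodes without a learning phase ignore training data

    def stop_training(self):
        pass

    def execute(self, xs):
        raise NotImplementedError


class CenterNode(ToyNode):
    """Learns the mean of the training data and subtracts it."""

    def __init__(self):
        self.total = 0.0
        self.count = 0
        self.mean = 0.0

    def train(self, xs):
        self.total += sum(xs)
        self.count += len(xs)

    def stop_training(self):
        self.mean = self.total / self.count

    def execute(self, xs):
        return [x - self.mean for x in xs]


class MaxAbsScaleNode(ToyNode):
    """Learns the largest absolute value and divides by it."""

    def __init__(self):
        self.max_abs = 0.0

    def train(self, xs):
        self.max_abs = max([self.max_abs] + [abs(x) for x in xs])

    def execute(self, xs):
        return [x / self.max_abs for x in xs]


class ToyFlow:
    """Hypothetical sequence of nodes, trained and executed in order."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def train(self, xs):
        for i, node in enumerate(self.nodes):
            # Pass the data through the nodes that are already trained,
            # then train the current node on the result.
            ys = xs
            for previous in self.nodes[:i]:
                ys = previous.execute(ys)
            node.train(ys)
            node.stop_training()

    def execute(self, xs):
        for node in self.nodes:
            xs = node.execute(xs)
        return xs


flow = ToyFlow([CenterNode(), MaxAbsScaleNode()])
data = [1.0, 2.0, 3.0, 6.0]
flow.train(data)
result = flow.execute(data)
```

The user only lists the processing steps; the flow decides which data each node sees during training.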
Particular care has been taken to make computations efficient in terms of speed and memory. To reduce memory requirements, it is possible to perform learning using batches of data, and to define the internal parameters of the nodes to be single precision, which makes the usage of very large data sets possible. Moreover, the 'parallel' subpackage offers a parallel implementation of the basic nodes and flows.
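As a rough illustration of why batch learning keeps memory use low, the sketch below accumulates running sums from one small chunk at a time and never holds the full data set. It uses only the Python standard library and is not MDP code; the `chunks` generator and the variable names are assumptions made for the example. The learned parameters are then stored in a single-precision array to mirror the idea of single-precision node parameters.

```python
import statistics
from array import array


def chunks(n_values, chunk_size):
    """Yield the values 0.0 .. n_values - 1 in small lists, one chunk at a time."""
    for start in range(0, n_values, chunk_size):
        stop = min(start + chunk_size, n_values)
        yield [float(v) for v in range(start, stop)]


# Accumulate sufficient statistics chunk by chunk.
count, total, total_sq = 0, 0.0, 0.0
for chunk in chunks(10, 4):
    count += len(chunk)
    total += sum(chunk)
    total_sq += sum(x * x for x in chunk)

mean = total / count
variance = total_sq / count - mean * mean  # population variance

# Store the learned parameters in single precision (C float).
params = array("f", [mean, variance])

# Reference value computed on the whole data set at once, for comparison.
full_variance = statistics.pvariance([float(v) for v in range(10)])
```

The sum-of-squares formula is used here for brevity; it can lose accuracy when the mean is large relative to the spread, so a production implementation would likely use a more stable update.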
From the developer's perspective, MDP is a framework that makes the implementation of new supervised and unsupervised learning algorithms easy and straightforward. The basic class, 'Node', takes care of tedious tasks like numerical type and dimensionality checking, leaving the developer free to concentrate on the implementation of the learning and execution phases. Because of the common interface, the node then automatically integrates with the rest of the library and can be used in a network together with other nodes. A node can have multiple training phases and even an undetermined number of phases. This allows the implementation of algorithms that need to collect some statistics on the whole input before proceeding with the actual training, and others that need to iterate over a training phase until a convergence criterion is satisfied. The ability to train each phase using chunks of input data is maintained if the chunks are generated with iterators. Moreover, crash recovery is optionally available: in case of failure, the current state of the flow is saved for later inspection.
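A node with more than one training phase might look roughly like the sketch below. It is a plain-Python toy, not MDP's `Node` class: `TwoPhaseNode`, `train_node`, and `make_chunks` are hypothetical names. The first phase collects the mean over the whole input, and the second phase needs that mean to compute the mean absolute deviation. Because the chunks come from a generator function, a fresh iterator can be created for each phase.

```python
class TwoPhaseNode:
    """Hypothetical node with two training phases (illustration only)."""

    def __init__(self):
        self.mean = None
        self.mad = None  # mean absolute deviation
        self._sum = 0.0
        self._count = 0
        # One (train, stop) pair per phase, run in this order.
        self.phases = [
            (self._train_mean, self._stop_mean),
            (self._train_mad, self._stop_mad),
        ]

    def _train_mean(self, chunk):
        self._sum += sum(chunk)
        self._count += len(chunk)

    def _stop_mean(self):
        self.mean = self._sum / self._count
        self._sum, self._count = 0.0, 0  # reset accumulators for phase 2

    def _train_mad(self, chunk):
        self._sum += sum(abs(x - self.mean) for x in chunk)
        self._count += len(chunk)

    def _stop_mad(self):
        self.mad = self._sum / self._count

    def execute(self, chunk):
        return [(x - self.mean) / self.mad for x in chunk]


def train_node(node, make_chunks):
    """Run every training phase over a fresh iterator of chunks."""
    for train, stop in node.phases:
        for chunk in make_chunks():
            train(chunk)
        stop()


def make_chunks():
    """Generator function: each call returns a new iterator over the chunks."""
    data = [2.0, 4.0, 4.0, 6.0, 9.0]
    for i in range(0, len(data), 2):
        yield data[i:i + 2]


node = TwoPhaseNode()
train_node(node, make_chunks)
scaled = node.execute([7.0])
```

Passing a callable that builds the iterator, rather than a single iterator, is what lets the second phase see the data again after the first phase has consumed it.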
MDP has been written in the context of theoretical research in neuroscience, but it has been designed to be helpful in any context where trainable data processing algorithms are used. Its simplicity on the user side, together with the reusability of the implemented nodes, also makes it a valid educational tool.
- Changes to previous version:
Initial Announcement on mloss.org.
Other available revisions
Version 3.3 (October 4, 2012, 15:17:33)
What's new in version 3.3?
- support sklearn versions up to 0.12
- cleanly support reload
- fail gracefully if pp server does not start
- several bug fixes and improvements

Version 3.2 (October 24, 2011, 15:43:59)
What's new in version 3.2?
- improved sklearn wrappers
- update sklearn, shogun, and pp wrappers to new versions
- do not leave temporary files around after testing
- refactoring and cleaning up of HTML exporting features
- improve export of signature and doc-string to public methods
- fixed and updated FastICANode to closely resemble the original Matlab version (thanks to Ben Willmore)
- support for new numpy version
- new NeuralGasNode (thanks to Michael Schmuker)
- several bug fixes and improvements
We recommend that all users upgrade.

Version 3.1 (March 30, 2011, 19:20:57)
This is a bug fix release.

Version 3.0 (January 17, 2011, 16:46:51)
- Python 3 support
- New extensions: caching and gradient
- Automatically generated wrappers for scikits.learn algorithms
- Shogun and libsvm wrappers
- New algorithms: convolution, several classifiers and several user-contributed nodes
- Several new examples on the homepage
- Improved and expanded tutorial
- Several improvements and bug fixes
- New license: MDP goes BSD!

Version 2.6 (May 14, 2010, 19:26:00)
- Several new classifier nodes have been added.
- A new node extension mechanism makes it possible to dynamically add methods or attributes for specific features to node classes, enabling aspect-oriented programming in MDP. Several MDP features (like parallelization) are now based on this mechanism, and users can add their own custom node extensions.
- BiMDP is a large new package in MDP that introduces bidirectional data flows to MDP, including backpropagation and even loops. BiMDP also enables the transportation of additional data in flows via messages.
- BiMDP includes a new flow inspection tool that runs as a graphical debugger in the web browser to step through complex flows. It can be extended by users for the analysis and visualization of intermediate data.
- As usual, tons of bug fixes
The new additions to the library have been thoroughly tested but, as usual after a public release, we especially welcome user feedback and bug reports.

Version 2.5 (June 30, 2009, 14:59:55)
Get a full list of changes at http://mdp-toolkit.sourceforge.net/CHANGES

Version 2.4 (January 23, 2008, 10:20:05)
Initial Announcement on mloss.org.