Python module to ease pattern classification analyses of large datasets. It provides a high-level abstraction of typical processing steps (e.g. data preparation, classification, feature selection, generalization testing), a number of implementations of some popular algorithms (e.g. kNN, Ridge Regressions, Sparse Multinomial Logistic Regression, GPR, RFE, I-RELIEF), and bindings to external ML libraries (libsvm, shogun, R). While it is not limited to neuroimaging data (e.g. FMRI), it is eminently suited for such datasets.
It is an actively developed project, so you might be better off trying it from the version control system. Please see the documentation on how to obtain and "build" it from sources.
- Changes to previous version:
Initial Announcement on mloss.org.
- BibTeX Entry: Download
- Corresponding Paper BibTeX Entry: Download
- URL: Project Homepage
- Supported Operating Systems: Agnostic
- Data Formats: None
- Tags: Shogun, Python, Eeg, Classification, Regression, Support Vector Machines, K Nearest Neighbor Classification, Pca, Rfe, Neuroscience, Fmri, Framework, Gpr, Lars, Smlr, Meg
- Archive: download here
Other available revisions
Version 2.0.0
- 2.0.0 (Mon, Dec 19 2011)
This release aggregates all the changes that occurred between the official releases in the 0.4 series and the various snapshot releases (in the 0.5 and 0.6 series). For a better overview of high-level changes, see the :ref:`release notes for 0.5 <chap_release_notes_0.5>` and :ref:`0.6 <chap_release_notes_0.6>`, as well as the summaries of release candidates below.
Fixes (23 BF commits)
- significance level in the right tail was fixed to include the tested value -- otherwise it resulted in an optimistic bias (or an absurdly high significance in the improbable case where all estimates have the same value)
- compatible with the upcoming IPython 0.12 and renamed sklearn (Fixes #57)
- do not double-train slave classifiers while assessing sensitivities (Fixes #53)
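The right-tail fix can be illustrated with a minimal sketch, assuming the null distribution is available as an array of permutation estimates; `right_tail_p` is a hypothetical helper, not PyMVPA's implementation. Counting null estimates that are greater than *or equal to* the tested value avoids the degenerate case in which all estimates are identical and a strict `>` would report p = 0.

```python
import numpy as np


def right_tail_p(null_estimates, observed, inclusive=True):
    """Fraction of null estimates at least as large as `observed` (right tail).

    inclusive=True counts ties (>=), i.e. the corrected behavior;
    inclusive=False uses a strict > and is biased toward optimistic p-values.
    """
    null = np.asarray(null_estimates, dtype=float)
    if inclusive:
        hits = np.sum(null >= observed)
    else:
        hits = np.sum(null > observed)
    return hits / null.size


# Degenerate case: every null estimate equals the tested value.
null = np.full(1000, 0.5)
print(right_tail_p(null, 0.5, inclusive=False))  # 0.0 -> absurdly "significant"
print(right_tail_p(null, 0.5, inclusive=True))   # 1.0 -> correctly non-significant
```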
Enhancements (30 ENH + 3 NF commits)
- resolving voting ties in kNN based on mean distance, and randomly in SMLR
- ca.estimates now contains dictionaries with votes for each class
- consistent zscoring in :class:
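Resolving kNN voting ties by mean distance can be sketched as follows, assuming Euclidean distance and plain NumPy arrays; `knn_predict` is a hypothetical helper, not PyMVPA's kNN class, and the random tie-breaking in SMLR is not shown. The label with the most votes among the k nearest neighbors wins; among tied labels, the one whose neighbors are closer on average wins.

```python
from collections import defaultdict

import numpy as np


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest neighbors, ties broken by mean distance."""
    train_X = np.asarray(train_X, dtype=float)
    dists = np.linalg.norm(train_X - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]

    # Group the distances of the k nearest neighbors by their label.
    by_label = defaultdict(list)
    for i in nearest:
        by_label[train_y[i]].append(dists[i])

    # Most votes first; among equal vote counts, the smaller mean distance wins.
    return max(by_label, key=lambda lbl: (len(by_label[lbl]), -np.mean(by_label[lbl])))


X = [[0, 0], [1, 0], [3, 0], [4, 0]]
y = ["a", "a", "b", "b"]
print(knn_predict(X, y, [1.6, 0], k=4))  # a (2-2 tie, "a" neighbors are closer on average)
```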
2.0.0~rc5 (Wed, Oct 19 2011)
- Major: to allow easy co-existence of stable PyMVPA 0.4.x and 0.6 development, the mvpa module was renamed to :mod:
- compatible with the new Shogun 1.x series
- compatible with the new h5py 2.x series
- mvpa-prep-fmri -- various compatibility fixes and smoke testing
- tutorial uses :mod:
- better suppression of R warnings when needed
- internal attributes of many classes were exposed as properties
- more unification of __repr__ for many classes
0.6.0~rc4 (Wed, Jun 14 2011)
- Finished transition to :mod:
- Various adjustments in the test batteries (:mod:`nibabel` 1.1.0 compatibility, etc)
- Explicit new argument flatten to from_wizard -- default behavior changed if mapper was provided as well
- __repr__ for some Classifiers and Measures
0.6.0~rc3 (Thu, Apr 12 2011)
- Bugfixes regarding the interaction of FlattenMapper and BoxcarMapper that affected event-related analyses.
- Splitter now handles attribute value None for splitting properly.
- More robust detection of :mod:
- Repeater node to yield a dataset multiple times and Sifter node to exclude some datasets. Consequently, the "nosplitting" mode of Splitter was removed at the same time.
- Added tools/niils -- a little tool to list details (dimensionality, scaling, etc) of files in nibabel-supported formats.
- Numerous documentation fixes.
- Various improvements and increased flexibility of null distribution estimation of Measures.
- All attributes are now reported in sorted order when printing a dataset.
- fmri_dataset now also stores the input image type.
- CrossValidation can now take a custom Splitter instance. Moreover, the default splitter of CrossValidation is more robust in terms of the number and type of created splits for common usage patterns (i.e. together with partitioners).
- CrossValidation takes any custom Node as
- ConfusionMatrix can now be used as an
- LOE(ACC): Linear Order Effect in ACC was added to ConfusionMatrix to detect trends in performance across splits.
- Node's postproc is now accessible as a property.
- RepeatedMeasure has a new 'concat_as' argument that allows results to be concatenated along the feature axis. The default behavior, stacking as multiple samples, is unchanged.
- Searchlight now has the ability to mark the center/seed of an ROI with a feature attribute in the generated datasets.
- args parameter for delayed string formatting. It should reduce the run-time impact of debug() calls in the regular, non -O mode of Python operation.
- String summaries and representations (provided by __repr__) were made more exhaustive and more coherent. Additional properties to access initial constructor arguments were added to a variety of classes.
- New debug target STDOUT to allow attaching metrics (e.g. traceback, timestamps) to regular output printed to stdout
- New set of decorators to help with unittests:
o @nodebug to disable specific debug targets for the duration of the test.
o @reseed_rng to guarantee consistent random data given initial seeding.
o @with_tempfile to provide a tempfile name which gets removed upon completion (test success or failure)
- Dropped daily testing of the maint/0.5 branch -- RIP.
- Collections were provided with adequate
- Dataset was refactored to use
- update-* Makefile rules should automatically fast-forward corresponding
- MVPA_TESTS_VERBOSITY also controls :mod:
- Dataset.__array__ provides the original array instead of a copy (unless dtype is provided)
Also adapts changes from 0.4.6 and 0.4.7 (see corresponding changelogs).
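The args change rests on a general Python idiom: pass the format arguments separately so that the message string is only built when the debug target is active. A minimal sketch, where ACTIVE_TARGETS and debug() are hypothetical stand-ins rather than PyMVPA's debug API. Code guarded by `if __debug__:` is stripped entirely under `python -O`, which is why the saving matters in the regular mode.

```python
import sys

# Hypothetical set of enabled debug targets (not PyMVPA's real configuration).
ACTIVE_TARGETS = {"CLF"}


def debug(target, msg, args=None, stream=sys.stderr):
    """Write a debug line for `target`; `msg % args` is only evaluated if enabled."""
    if target not in ACTIVE_TARGETS:
        return None  # disabled target: no string formatting at all
    text = msg % args if args is not None else msg
    line = "[%s] %s" % (target, text)
    stream.write(line + "\n")
    return line


class Expensive:
    """Object whose string conversion is costly; counts how often it happens."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "<large summary>"


obj = Expensive()
debug("SLC", "state: %s", (obj,))  # disabled: __str__ is never called
debug("CLF", "state: %s", (obj,))  # enabled: formatted exactly once
print(Expensive.calls)  # 1
```

With eager formatting (`debug("SLC", "state: %s" % obj)`) the conversion would run even for disabled targets.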
0.6.0~rc2 (Thu, Mar 3 2011)
Various fixes in the mvpa.atlas module.
0.6.0~rc1 (Thu, Feb 24 2011)
Many, many, many
For an overview of the most drastic changes, see the constantly evolving :ref:`release notes for 0.6 <chap_release_notes_0.6>`.
0.5.0 (sometime in March 2010)
This is a special release, because it was never made available to the general public. A summary of fundamental changes introduced in this development version can be seen in the :ref:`release notes <chap_release_notes_0.5>`.
Most notably, this version was the first to come with a comprehensive two-day workshop/tutorial.
- 0.4.7 (Tue, Mar 07 2011) (Total: 12 commits)
A bugfix release
- Addressed the issue with input NIfTI files having scl_ fields set: it could result in incorrect analyses and incorrect map2nifti-produced NIfTI files. Now input files account for scaling/offset if the scl_ fields direct them to do so. Moreover, those fields get reset upon map2nifti.
- doc/examples/searchlight_minimal.py -- best error is the minimal one
- mvpa.clfs.gnb.GNB can now tolerate training datasets with a single label
- mvpa.clfs.meta.TreeClassifier can have trailing nodes with no classifier assigned
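The scaling in question comes from the NIfTI-1 header: when scl_slope is non-zero, a stored voxel value is mapped to scl_slope * value + scl_inter, while a slope of 0 means no scaling. A minimal NumPy sketch of that rule, where apply_nifti_scaling is a hypothetical helper (libraries such as nibabel normally apply the scaling when loading images, and how PyMVPA resets the fields on output is not shown):

```python
import numpy as np


def apply_nifti_scaling(stored, scl_slope, scl_inter):
    """Map stored voxel values to real values following the NIfTI-1 rule.

    A scl_slope of 0 means "no scaling" in NIfTI-1, so the data are
    returned unchanged (as float) in that case.
    """
    data = np.asarray(stored, dtype=np.float64)
    if scl_slope == 0:
        return data
    return data * scl_slope + scl_inter


raw = np.array([0, 100, 200], dtype=np.int16)
print(apply_nifti_scaling(raw, 0.5, 10.0))  # scaled and offset
print(apply_nifti_scaling(raw, 0.0, 10.0))  # slope 0: left unscaled
```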
0.4.6 (Tue, Feb 01 2011) (Total: 20 commits)
A bugfix release
Fixed (few BF commits):
- Compatibility with numpy 1.5.1 (histogram) and scipy 0.8.0 (workaround for a regression in legendre)
- Compatibility with libsvm 3.0
- Enforce suppression of numpy warnings while running unittests. Also setting verbosity >= 3 enables all warnings (Python, NumPy, and PyMVPA)
- doc/examples/nested_cv.py example (adopted from 0.5)
- Introduced base class :class:`~mvpa.clfs.base.LearnerError` for classifiers' exceptions (adopted from 0.5)
- Adjusted example data to live up to nibabel's warranty of NIfTI standard-compliance
- More robust operation of MC iterations -- skip iterations where the classifier experienced difficulties and raised an exception (e.g. due to degenerate data)
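A minimal sketch of the idea behind more robust Monte-Carlo iterations, assuming a permutation test in which each iteration re-scores the data with shuffled labels; ClassifierFailure, permutation_null and flaky_score are hypothetical names, not PyMVPA classes. Iterations that raise are counted and skipped instead of aborting the whole null-distribution estimate.

```python
import numpy as np


class ClassifierFailure(Exception):
    """Hypothetical stand-in for a classifier that fails on degenerate data."""


def permutation_null(score_fn, labels, n_iter=100, seed=0):
    """Build a null distribution by permuting labels, skipping failed iterations."""
    rng = np.random.default_rng(seed)
    null, skipped = [], 0
    for _ in range(n_iter):
        permuted = rng.permutation(labels)
        try:
            null.append(score_fn(permuted))
        except ClassifierFailure:
            skipped += 1  # skip this iteration rather than aborting the whole run
    return np.asarray(null), skipped


def flaky_score(permuted_labels):
    """Toy score function that 'fails' on some permutations."""
    if permuted_labels[0] == permuted_labels[-1]:
        raise ClassifierFailure("degenerate permutation")
    return float(np.mean(permuted_labels[: len(permuted_labels) // 2]))


null, skipped = permutation_null(flaky_score, [0, 0, 1, 1], n_iter=200)
print(len(null), skipped)
```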
Date: December 22, 2011, 01:36:32
Version 0.4.5
0.4.5 (Fri, Oct 01 2010) (Total: 27 commits)
A bugfix release
- Fixed (13 BF commits):
o Compatible with LIBSVM >= 2.91 (Closes: #583018)
o No string exceptions raised (Python 2.6 compatibility)
o Setting of shrinking parameter in sg interface
o Deducing number of SVs for SVR (LIBSVM)
o Correction of significance in the tails of non-parametric tests
- Miscellaneous:
o Development repository moved to http://github.com/PyMVPA/PyMVPA
Date: October 2, 2010, 16:51:22
Version 0.4.4
0.4.4 (Mon, Feb 2 2010) (Total: 144 commits)
Primarily a bugfix release, probably the last in the 0.4 series, since development for the 0.5 release is leaping forward.
- New functionality (19 NF commits):
o GNB implements Gaussian Naïve Bayes Classifier.
o read_fsl_design() to read FSL FEAT design.fsf files (Contributed by Russell A. Poldrack).
o SequenceStats to provide basic statistics on labels sequence (counter-balancing, autocorrelation).
o New exceptions DegenerateInputError and FailedToTrainError to be thrown by classifiers primarily during training/testing.
o Debug target STATMC to report on progress of Monte-Carlo sampling (during permutation testing).
- Refactored (15 RF commits):
o To prepare users for the 0.5 release, access to states and parameters is now done (internally and in some examples/documentation) via the corresponding collections, not from the top-level object (e.g. clf.states.predictions instead of the soon-to-be-deprecated clf.predictions). That should also lead to improved performance.
o Adopted copy.py from python2.6 (support Ellipsis as well).
- Fixed (38 BF commits):
o GLM output does not depend on the enabled states any more.
o Variety of docstrings fixed and/or improved.
o Do not derive NaN scaling for SVM’s C whenever data is degenerate (it led to SVM training never finishing).
o sg:
  + KRR is optional now – avoids crashing if KRR is not available.
  + tolerance to absent set_precompute_matrix in svmlight in recent shogun versions.
  + support for the recent (present in 0.9.1) API change in exposing debug levels.
o Python 2.4 compatibility issues: kNN and IFS
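For reference, the core of a Gaussian Naïve Bayes classifier fits in a few lines: per-class feature means and variances, class priors, and a decision by maximum log-posterior under the assumption of independent features. TinyGNB is a hypothetical illustration, not PyMVPA's GNB class (whose options are not covered here); var_floor is an assumed guard against zero variance.

```python
import numpy as np


class TinyGNB:
    """Minimal Gaussian Naive Bayes: per-class means/variances plus class priors."""

    def __init__(self, var_floor=1e-9):
        self.var_floor = var_floor  # keeps variances strictly positive

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + self.var_floor
        self.log_priors_ = np.log(np.array([np.mean(y == c) for c in self.classes_]))
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Gaussian log-likelihood per (sample, class), summed over independent features.
        diff2 = (X[:, None, :] - self.means_[None, :, :]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * self.vars_)[None, :, :]
                     + diff2 / self.vars_[None, :, :]).sum(axis=2)
        return self.classes_[np.argmax(ll + self.log_priors_[None, :], axis=1)]


X = [[0.0, 0.1], [0.2, -0.1], [5.0, 5.2], [5.1, 4.9]]
y = ["a", "a", "b", "b"]
print(TinyGNB().fit(X, y).predict([[0.1, 0.0], [5.0, 5.0]]))
```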
Date: February 7, 2010, 16:48:00
Version 0.4.3
Online documentation editor is no longer available due to low demand – please submit changes via email.
Performance (Contributed by Valentin Haenel) (3 OPT commits):
- Further optimized LIBSVM bindings.
- Copy-if-sorted in selectFeatures.
New functionality (25 NF commits):
- ProcrusteanMapper with orthogonal and oblique transformations.
- Ability to generate simple reports using reportlab. See/run examples/match_distribution.py for example.
- TreeClassifier – construct simple hierarchies of classifiers.
- wtf() to report information about the system/PyMVPA to be included in the bug reports.
- Parameter ‘reverse’ to swap training/testing splits in Splitter.
- Example code for the analysis of event-related dataset using ERNiftiDataset.
- toEvents() to create lists of Event.
- mvpa-prep-fmri was extended with plotting of motion correction parameters.
- ColumnData can be explicitly told whether the file contains a header.
- In XMLBasedAtlas (e.g. fsl atlases) it is now possible to provide custom ‘image_file’ to get maps or indexes for the areas given an atlas’s volume registered into subject space.
- Updated included LIBSVM version to 2.89 and provided support for its “silencing”.
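The orthogonal case of the Procrustes problem has a closed-form solution: for source A and target B, with U S V^T the SVD of A^T B, the matrix R = U V^T is the orthogonal matrix minimizing ||A R - B||_F. A NumPy sketch of that textbook solution, not ProcrusteanMapper's code (scaling, translation and the oblique variant are not shown):

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Return the orthogonal matrix R minimizing ||source @ R - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 3))
theta = np.pi / 5
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta), np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
B = A @ R_true  # target is an exact rotation of the source

R = orthogonal_procrustes(A, B)
print(np.round(R, 3))
```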
Refactored (27 RF commits):
- Dataset’s copy() with deep=False allows shallow copying of the dataset.
- FeatureSelectionClassifiers in the warehouse no longer reuse the same classifiers, but use clones.
Fixed (70 BF commits):
- OneWayAnova: previously degrees of freedom were not considered while computing F-scores.
- Majority voting strategy in kNN: it was not working.
- Various fixes to ensure cross-platform building (numpy header locations, etc).
- Stability fixes in ConfusionMatrix.
- idsonboundaries(): samples at the end of the sequence were not handled properly.
- Proper “untraining” of FeatureSelectionClassifiers’ classifiers which use sensitivities: it could lead to various unpleasant side-effects if the same slave classifier was used simultaneously by multiple MetaClassifiers (like TreeClassifier).
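The OneWayAnova fix concerns dividing each sum of squares by its degrees of freedom: with k groups and N samples in total, F = (SS_between / (k - 1)) / (SS_within / (N - k)). A minimal sketch for a single feature, where oneway_anova_f is a hypothetical helper rather than PyMVPA's measure. For the groups [1, 2, 3], [4, 5, 6], [7, 8, 9] it gives F = 27, whereas the ratio of the raw sums of squares would be 54 / 6 = 9.

```python
import numpy as np


def oneway_anova_f(groups):
    """One-way ANOVA F statistic for a list of 1D sample groups (one feature)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    k = len(groups)
    n_total = sum(g.size for g in groups)
    grand_mean = np.concatenate(groups).mean()
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    # Dividing by the degrees of freedom turns sums of squares into mean squares.
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


print(oneway_anova_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 27.0
```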
Date: September 8, 2009, 20:21:12
Version 0.4.1
Initial Announcement on mloss.org.
Date: May 18, 2008, 17:06:05