MLPACK is a scalable C++ machine learning library. Its aim is to make large-scale machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users.
The following methods are provided:
- Density Estimation Trees
- Euclidean Minimum Spanning Trees
- Fast Exact Max-Kernel Search (FastMKS)
- Gaussian Mixture Models (GMMs)
- Hidden Markov Models (HMMs)
- Kernel Principal Components Analysis (KPCA)
- K-Means Clustering
- Least-Angle Regression (LARS/LASSO)
- Local Coordinate Coding
- Locality-Sensitive Hashing (LSH)
- Naive Bayes Classifier
- Neighborhood Components Analysis (NCA)
- Nonnegative Matrix Factorization (NMF)
- Principal Components Analysis (PCA)
- RADICAL (ICA)
- Rank-Approximate Nearest Neighbor (RANN)
- Simple Least-Squares Linear Regression
- Sparse Coding
- Tree-based Neighbor Search (all-k-nearest-neighbors, all-k-furthest-neighbors), using either kd-trees or cover trees
- Tree-based Range Search
Command-line executables are provided for each of these, and the C++ classes which define the methods are highly flexible, extensible, and modular. More information (including documentation, tutorials, and bug reports) is available at http://www.mlpack.org/.
- Changes to previous version:
Minor bugfix so that FastMKS gets built.
- Supported Operating Systems: Platform Independent
- Data Formats: Plain ASCII, ASCII, TXT, HDF, BIN, CSV, XML
- Tags: GMM, HMM, Machine Learning, Sparse, Dual Tree, Fast, Scalable, Tree
Other available revisions
Version 2.0.0 (January 11, 2016, 17:24:35)
- Removed overclustering support from k-means because it is not well-tested, may be buggy, and appears to be unused. If you were using this support, open a bug or get in touch with us; it would not be hard to reimplement it.
- Refactored KMeans to allow different types of Lloyd iterations.
- Added implementations of k-means: Elkan's algorithm, Hamerly's algorithm, Pelleg-Moore's algorithm, and the DTNN (dual-tree nearest neighbor) algorithm.
- Significant acceleration of LRSDP via the use of accu(a % b) instead of trace(a * b).
- Added MatrixCompletion class (matrix_completion), which performs nuclear norm minimization to fill unknown values of an input matrix.
- No more dependence on Boost.Random; now we use C++11 STL random support.
- Added softmax regression, contributed by Siddharth Agrawal and QiaoAn Chen.
- Changed the NeighborSearch, RangeSearch, FastMKS, LSH, and RASearch APIs; these classes now take the query sets in the Search() method, instead of in the constructor.
- Use OpenMP, if available. For now, OpenMP support is only available in the DET training code.
- Added support for predicting new test point values to LARS and the command-line 'lars' program.
- Added serialization support for Perceptron and LogisticRegression.
- Refactored SoftmaxRegression to predict into an arma::Row object, and added a softmax_regression program.
- Refactored LSH to allow loading and saving of models.
- Removed ToString() entirely (#487).
- Added --input_model_file and --output_model_file options to appropriate machine learning algorithms.
- Renamed all executables to start with an "mlpack" prefix (#229).
See also https://mailman.cc.gatech.edu/pipermail/mlpack/2015-December/000706.html for more information.
Version 1.0.12 (January 7, 2015, 19:23:51)
- Switch to 3-clause BSD license.
Version 1.0.11 (December 11, 2014, 18:20:35)
- Proper handling of dimension calculation in PCA.
- Load parameter vectors properly for LinearRegression models.
- Linker fixes for AugLagrangian specializations under Visual Studio.
- Add support for observation weights to LinearRegression.
- MahalanobisDistance<> now takes root of the distance by default and therefore satisfies the triangle inequality (TakeRoot now defaults to true).
- Better handling of optional Armadillo HDF5 dependency.
- Fixes for numerous intermittent test failures.
- math::RandomSeed() now sets the seed for recent (>= 3.930) Armadillo versions.
- Handle Newton method convergence better for SparseCoding::OptimizeDictionary() and make maximum iterations a parameter.
- Known bug: CosineTree construction may fail in some cases on i386 systems (#376).
Version 1.0.10 (August 29, 2014, 21:26:18)
- Bugfix for NeighborSearch regression which caused very slow allknn/allkfn. Speeds are now restored to approximately 1.0.8 levels, with significant improvement for the cover tree (#365).
- Detect dependencies correctly when ARMA_USE_WRAPPER is not defined (i.e. libarmadillo.so does not exist).
- Bugfix for compilation under Visual Studio (#366).
Version 1.0.9 (July 28, 2014, 20:52:10)
- GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
- Check for division by 0 in Forward-Backward Algorithm in HMMs (#314).
- Fix MaxVarianceNewCluster (used when re-initializing clusters for k-means) (#314).
- Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
- Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
- Fix for LRSDP optimizer so that it compiles and can be used (#325).
- CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
- CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
- Removed incorrect PeriodicHRectBound (#30).
- Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
- Fix for centering in kernel PCA (#355).
- Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
- HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
- Added Nyström method for kernel matrix approximation by Marcus Edel.
- Kernel PCA now supports using Nyström method for approximation.
- Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
- The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
- A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
- Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
- Sparse autoencoder added by Siddharth Agrawal.
Version 1.0.8 (January 7, 2014, 05:47:22)
- Memory leak in NeighborSearch index-mapping code fixed.
- GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
- Logistic regression implementation added in methods/logistic_regression.
- Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros.
- Fix typos in allkfn and allkrann output.
Version 1.0.7 (October 4, 2013, 22:24:48)
- Cover tree support for range_search, rank-approximate nearest neighbors, minimum spanning tree calculation, and FastMKS.
- Dual-tree FastMKS implementation and tests added.
- Added collaborative filtering package that can provide recommendations when given users and items.
- Fix for correctness of Kernel PCA.
- Speedups for PCA and Kernel PCA.
- Fix for correctness of Neighborhood Components Analysis (NCA).
- Minor speedups for dual-tree algorithms.
- Fix for Naive Bayes Classifier (nbc).
- Added a ridge regression option to LinearRegression (linear_regression).
- Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints.
- MVU removed because it is known not to work.
- Minor updates and fixes for kernels (in mlpack::kernel).
Version 1.0.6 (June 13, 2013, 21:26:10)
Minor bugfix so that FastMKS gets built.
Version 1.0.5 (May 2, 2013, 07:24:32)
- Speedups of cover tree traversers.
- Addition of rank-approximate nearest neighbor (RANN).
- Addition of fast exact max-kernel search (FastMKS).
- Fix for EM covariance estimation.
- More parameters for GMM estimation.
- Force GMM and GaussianDistribution covariance matrices to be positive definite during training.
- Added a tolerance parameter to the Baum-Welch algorithm for HMM training.
- Fix for compilation with clang.
- Fix for k-furthest-neighbor search.
Version 1.0.4 (February 8, 2013, 22:32:43)
- Force minimum Armadillo version of 2.4.2.
- Added locality-sensitive hashing (LSH).
- Handle size_t support correctly with Armadillo 3.6.2.
- Better tests for SGD and NCA.
- Better output of types to streams.
- Some style fixes.
Version 1.0.3 (September 17, 2012, 01:27:19)
Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.
Version 1.0.2 (August 15, 2012, 20:47:13)
Added density estimation trees, nonnegative matrix factorization, an experimental cover tree implementation, and several bugfixes. See http://trac.research.cc.gatech.edu/fastlab/milestone/mlpack%201.0.2 for a full listing of tickets closed.
Version 1.0.1 (March 20, 2012, 20:59:53)
Added local coordinate coding, sparse coding, kernel PCA, and several bugfixes.
Version 1.0.0 (December 17, 2011, 10:37:05)
Yet another announcement on mloss.org.
Version 0.2 (November 20, 2009, 04:01:36)
Initial Announcement on mloss.org.
Version 0.1 (October 7, 2008, 07:12:37)
Initial Announcement on mloss.org.