Project details for MLPACK

MLPACK 1.0.3

by rcurtin - September 17, 2012, 01:27:19 CET

Overall: 4.5 of 5 stars
Features: 5 of 5 stars
Usability: 5 of 5 stars
Documentation: 4 of 5 stars
(based on 1 vote)

MLPACK is a scalable C++ machine learning library. Its aim is to make large-scale machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users.

The following methods are provided:

  • Density Estimation Trees
  • Euclidean Minimum Spanning Trees
  • Gaussian Mixture Models (GMMs)
  • Hidden Markov Models (HMMs)
  • Kernel Principal Components Analysis (KPCA)
  • K-Means Clustering
  • Least-Angle Regression (LARS/LASSO)
  • Local Coordinate Coding
  • Naive Bayes Classifier
  • Neighborhood Components Analysis (NCA)
  • Nonnegative Matrix Factorization (NMF)
  • Principal Components Analysis (PCA)
  • Simple Least-Squares Linear Regression
  • Sparse Coding
  • Tree-based Neighbor Search (all-k-nearest-neighbors, all-k-furthest-neighbors), using either kd-trees or cover trees
  • Tree-based Range Search

Command-line executables are provided for each of these, and the C++ classes which define the methods are highly flexible, extensible, and modular. More information (including documentation, tutorials, and bug reports) is available at

Changes to previous version:

Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.

Supported Operating Systems: Platform Independent
Data Formats: Plain Ascii, Ascii, Txt, Bin, Csv, Xml
Tags: Gmm, Hmm, Machine Learning, Sparse, Dual Tree, Fast, Scalable, Tree

Other available revisions

Version Changelog Date
  • GMM initialization is now safer and provides a working GMM when constructed with only the dimensionality and number of Gaussians (#314).
  • Check for division by 0 in Forward-Backward Algorithm in HMMs (#314).
  • Fix MaxVarianceNewCluster (used when re-initializing clusters for k-means) (#314).
  • Fixed implementation of Viterbi algorithm in HMM::Predict() (#316).
  • Significant speedups for dual-tree algorithms using the cover tree (#243, #329) including a faster implementation of FastMKS.
  • Fix for LRSDP optimizer so that it compiles and can be used (#325).
  • CF (collaborative filtering) now expects users and items to be zero-indexed, not one-indexed (#324).
  • CF::GetRecommendations() API change: now requires the number of recommendations as the first parameter. The number of users in the local neighborhood should be specified with CF::NumUsersForSimilarity().
  • Removed incorrect PeriodicHRectBound (#30).
  • Refactor LRSDP into LRSDP class and standalone function to be optimized (#318).
  • Fix for centering in kernel PCA (#355).
  • Added simulated annealing (SA) optimizer, contributed by Zhihao Lou.
  • HMMs now support initial state probabilities; these can be set in the constructor, trained, or set manually with HMM::Initial() (#315).
  • Added Nyström method for kernel matrix approximation by Marcus Edel.
  • Kernel PCA now supports using Nyström method for approximation.
  • Ball trees now work with dual-tree algorithms, via the BallBound<> bound structure (#320); fixed by Yash Vadalia.
  • The NMF class is now AMF<>, and supports far more types of factorizations, by Sumedh Ghaisas.
  • A QUIC-SVD implementation has returned, written by Siddharth Agrawal and based on older code from Mudit Gupta.
  • Added perceptron and decision stump by Udit Saxena (these are weak learners for an eventual AdaBoost class).
  • Sparse autoencoder added by Siddharth Agrawal.
July 28, 2014, 20:52:10
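
The switch of CF to zero-indexed users and items means that data sets numbered from one need their IDs shifted before use. The following is a minimal sketch of that shift in plain C++; the Rating struct and the ToZeroIndexed() function are hypothetical names invented for this illustration and are not part of the MLPACK API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One (user, item, rating) observation. The struct and its field names are
// illustrative assumptions, not MLPACK types.
struct Rating
{
  std::size_t user;
  std::size_t item;
  double value;
};

// Return a copy of the ratings with user and item IDs shifted from
// one-indexed to zero-indexed. Every input ID must be at least 1, because
// subtracting 1 from an unsigned 0 would wrap around.
std::vector<Rating> ToZeroIndexed(const std::vector<Rating>& oneIndexed)
{
  std::vector<Rating> result;
  result.reserve(oneIndexed.size());
  for (const Rating& r : oneIndexed)
  {
    assert(r.user >= 1 && r.item >= 1);
    result.push_back({r.user - 1, r.item - 1, r.value});
  }
  return result;
}
```

For instance, a rating stored as user 2, item 3 becomes user 1, item 2 after the call, and the rating value itself is left untouched.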
  • Memory leak in NeighborSearch index-mapping code fixed.
  • GMMs can be trained using the existing model as a starting point by specifying an additional boolean parameter to GMM::Estimate().
  • Logistic regression implementation added in methods/logistic_regression.
  • Version information is now obtainable via mlpack::util::GetVersion() or the __MLPACK_VERSION_MAJOR, __MLPACK_VERSION_MINOR, and __MLPACK_VERSION_PATCH macros.
  • Fix typos in allkfn and allkrann output.
January 7, 2014, 05:47:22
  • Cover tree support for range_search, rank-approximate nearest neighbors, minimum spanning tree calculation, and FastMKS.
  • Dual-tree FastMKS implementation added and tests.
  • Added collaborative filtering package that can provide recommendations when given users and items.
  • Fix for correctness of Kernel PCA.
  • Speedups for PCA and Kernel PCA.
  • Fix for correctness of Neighborhood Components Analysis (NCA).
  • Minor speedups for dual-tree algorithms.
  • Fix for Naive Bayes Classifier (nbc).
  • Added a ridge regression option to LinearRegression (linear_regression).
  • Gaussian Mixture Models (gmm::GMM<>) now support arbitrary covariance matrix constraints.
  • MVU removed because it is known to not work.
  • Minor updates and fixes for kernels (in mlpack::kernel).
October 4, 2013, 22:24:48
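
For readers unfamiliar with the ridge regression option mentioned above, the general idea is to add a penalty lambda * w^2 to the least-squares objective, which shrinks the fitted weights toward zero. The sketch below shows the closed form for the simplest case (one feature, no intercept); it does not use LinearRegression and is not MLPACK's implementation, and the RidgeSlope() name is an assumption made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fit y ~ w * x by minimizing sum_i (y_i - w * x_i)^2 + lambda * w^2.
// Setting the derivative to zero gives
//   w = sum_i (x_i * y_i) / (sum_i (x_i^2) + lambda).
// With lambda = 0 this reduces to ordinary least squares.
double RidgeSlope(const std::vector<double>& x,
                  const std::vector<double>& y,
                  const double lambda)
{
  assert(x.size() == y.size());
  assert(lambda >= 0.0);

  double xy = 0.0; // Accumulates sum of x_i * y_i.
  double xx = 0.0; // Accumulates sum of x_i^2.
  for (std::size_t i = 0; i < x.size(); ++i)
  {
    xy += x[i] * y[i];
    xx += x[i] * x[i];
  }

  // The denominator must be positive for the solution to be defined.
  assert(xx + lambda > 0.0);
  return xy / (xx + lambda);
}
```

With x = {1, 2, 3} and y = {2, 4, 6}, a lambda of 0 recovers the exact slope of 2, while a larger lambda pulls the slope toward 0.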

Minor bugfix so that FastMKS gets built.

June 13, 2013, 21:26:10

Speedups of cover tree traversers; addition of rank-approximate nearest neighbor (RANN); addition of fast exact max-kernel search (FastMKS); fix for EM covariance estimation; more parameters for GMM estimation; force GMM and GaussianDistribution covariance matrices to be positive definite during training; add a tolerance parameter to the Baum-Welch algorithm for HMM training; fix for compilation with clang; fix for k-furthest neighbor search.

May 2, 2013, 07:24:32
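
One common way to force a covariance matrix to be positive definite is to add a growing perturbation to its diagonal until the matrix passes a positive-definiteness check. The sketch below does this for a symmetric 2x2 matrix. It is an assumption about the general technique, not a description of what MLPACK does internally, and the Sym2 type and both function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Symmetric 2x2 matrix [[a, b], [b, d]]. Illustrative type, not from MLPACK.
struct Sym2
{
  double a;
  double b;
  double d;
};

// Sylvester's criterion for a symmetric 2x2 matrix: both leading principal
// minors (a, and the determinant a * d - b^2) must be positive.
bool IsPositiveDefinite(const Sym2& m)
{
  return m.a > 0.0 && (m.a * m.d - m.b * m.b) > 0.0;
}

// Add a perturbation to the diagonal until the matrix is positive definite.
// The perturbation grows tenfold per step so the loop ends quickly for any
// finite input; the off-diagonal entry is never modified.
Sym2 ForcePositiveDefinite(Sym2 m, double perturbation = 1e-10)
{
  assert(perturbation > 0.0);
  assert(std::isfinite(m.a) && std::isfinite(m.b) && std::isfinite(m.d));

  while (!IsPositiveDefinite(m))
  {
    m.a += perturbation;
    m.d += perturbation;
    perturbation *= 10.0;
  }
  return m;
}
```

A matrix that is already positive definite is returned unchanged, so the function is safe to call after every training step.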

Force minimum Armadillo version of 2.4.2; add locality-sensitive hashing (LSH); handle size_t support correctly with Armadillo 3.6.2; better tests for SGD and NCA; better output of types to streams; some style fixes.

February 8, 2013, 22:32:43

Armadillo 3.4.0 includes sparse matrix support internally; MLPACK's internal sparse matrix support has thus been removed.

September 17, 2012, 01:27:19

Added density estimation trees, nonnegative matrix factorization, an experimental cover tree implementation, and several bugfixes. See for a full listing of tickets closed.

August 15, 2012, 20:47:13

Added local coordinate coding, sparse coding, kernel PCA, and several bugfixes.

March 20, 2012, 20:59:53

Yet another announcement on

December 17, 2011, 10:37:05

Initial Announcement on

November 20, 2009, 04:01:36

Initial Announcement on

October 7, 2008, 07:12:37


Eileen (on February 13, 2009, 12:13:23)

having this problem when running fl-build-all

/bin/sh: g++4: not found
make: *** [$FASTLIBPATH/bin/i686_Linux_fast_gcc4_-DDISABLE_DISK_MATRIX/obj/mlpack_allnn_main.o] Error 127

and a whole lot of similar errors

Am I missing something?

fastlab (on February 14, 2009, 03:55:05)

You need to install gcc 4. Which platform are you running on?

Paul Rodriguez (on December 21, 2010, 21:38:24)


I've set up the ccmake configuration options as appropriate but now I'm having trouble with the make command described below,

thanks, Paul Rodriguez

Using a santos linux, on an intel 64 bit processor, when I execute "make install" I get the following error regarding pthread_atfork:

-- A library with BLAS API found.
-- A library with BLAS API found.
-- A library with LAPACK API found.
-- Configuring done
-- Generating done
-- Build files have been written to: /users/sdsc/prodriguez/mlpack-0.2/fastlib/build
[ 2%] Built target template_types
[ 5%] Built target template_types_detect
[ 17%] Built target base
[ 20%] Built target col
[ 23%] Built target file
[ 30%] Built target fx
[ 33%] Built target la
[ 35%] Built target data
[ 35%] Built target tree
[ 43%] Built target math
[ 46%] Built target par
[ 87%] Built target fastlib
[ 89%] Built target otrav_test
[ 92%] Built target col_test
[ 94%] Building CXX object fastlib/data/CMakeFiles/dataset_test.dir/
Linking CXX executable dataset_test
/rmount/usr_apps/compilers/intel/Compiler/11.1/038/lib/intel64/ undefined reference to `pthread_atfork'
collect2: ld returned 1 exit status
make[2]: *** [fastlib/data/dataset_test] Error 1
make[1]: *** [fastlib/data/CMakeFiles/dataset_test.dir/all] Error 2
make: *** [all] Error 2

Andreas Mueller (on March 20, 2012, 13:29:07)

Two comments:

1) I have not found a way to contact the project on the project website. Having to come to mloss and log in to contact the developers seems a bit weird.

2) mlpack does not seem to build with Armadillo in a non-standard location. After trying to feed cmake the correct paths for a while I gave up and installed globally. In particular, setting the paths in the CMake configuration doesn't help much. Would be cool if you could fix that.

Cheers, Andy

Ryan Curtin (on March 20, 2012, 20:22:49)

Hello Andy,

I've clarified a bit to note that the Trac site is where bugs can be filed.

As for finding Armadillo, I have not had a problem doing the following (in this instance, I've got Armadillo 2.99.1 built in /home/ryan/src/armadillo-2.99.1/):

build$ cmake -D ARMADILLO_INCLUDE_DIR=/home/ryan/src/armadillo-2.99.1/build/ -D ARMADILLO_LIBRARY=/home/ryan/src/armadillo-2.99.1/ ../

Did those two variables (ARMADILLO_INCLUDE_DIR and ARMADILLO_LIBRARY) not work for you? If you're still having problems (or have other problems) feel free to file a ticket at
