Project details for Apache Mahout

Apache Mahout 0.8

by gsingers - July 27, 2013, 15:52:32 CET


Apache Mahout is an Apache Software Foundation project with the goal of creating both a community of users and a scalable, Java-based framework consisting of many machine learning algorithm implementations. The project currently has MapReduce-enabled (via Apache Hadoop) implementations of several clustering algorithms (k-Means, Streaming k-Means, Mean-Shift, Fuzzy k-Means, Dirichlet, Canopy), Naïve Bayes and Complementary Naïve Bayes classifiers, Hidden Markov Models, Stochastic Gradient Descent, Latent Dirichlet Allocation, Frequent Patternset Mining, Random Decision Forests, distributed Singular Value Decomposition, distributed collocations, collaborative filtering and more. Mahout also provides extensive linear algebra, statistics and primitive Java collections libraries, along with other tools.

Changes to previous version:

Apache Mahout 0.8 contains, amongst a variety of performance improvements and bug fixes, an implementation of Streaming K-Means, deeper Lucene/Solr integration and new scalable recommender algorithms. For a full description of the newest release, see

Supported Operating Systems: Agnostic
Data Formats: Arff, Lucene, Mahout Vector, Various, Cassandra, Hbase
Tags: Classification, Clustering, K Nearest Neighbor Classification, Genetic Algorithms, Collaborative Filtering, Collocations, Frequent Pattern Mining, Scalable Singular Value Decomposition, Svd, Machine L

Other available revisions

Version Changelog Date

Apache Mahout introduces a new math environment we call Samsara, for its theme of universal renewal. It reflects a fundamental rethinking of how scalable machine learning algorithms are built and customized. Mahout-Samsara is here to help people create their own math while providing some off-the-shelf algorithm implementations. At its core are general linear algebra and statistical operations along with the data structures to support them. You can use it as a library or customize it in Scala with Mahout-specific extensions that look something like R. Mahout-Samsara comes with an interactive shell that runs distributed operations on a Spark cluster. This makes prototyping or task submission much easier and allows users to customize algorithms with a whole new degree of freedom. Mahout Algorithms include many new implementations built for speed on Mahout-Samsara. They run on Spark 1.3+ and some on H2O, which means as much as a 10x speed increase. You'll find robust matrix decomposition algorithms as well as a Naive Bayes classifier and collaborative filtering. The new spark-itemsimilarity enables the next generation of cooccurrence recommenders that can use entire user click streams and context in making recommendations.
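To make the idea of a cooccurrence recommender concrete, here is a minimal single-machine sketch in Python. It is not Mahout code and does not use the spark-itemsimilarity API. The function names and the sample click data are hypothetical, and plain cooccurrence counts stand in for whatever weighting a real recommender would apply.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_histories):
    """Count, for each ordered item pair, how many users interacted with both."""
    counts = defaultdict(int)
    for items in user_histories.values():
        # De-duplicate so a user contributes at most once per pair.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def similar_items(counts, item, top_n=3):
    """Return the items that most often cooccur with `item`."""
    scored = [(other, n) for (a, other), n in counts.items() if a == item]
    # Highest count first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


# Hypothetical click streams, keyed by user id.
histories = {
    "u1": ["tablet", "case", "stylus"],
    "u2": ["tablet", "case"],
    "u3": ["tablet", "stylus", "charger"],
    "u4": ["case", "charger"],
}

counts = cooccurrence_counts(histories)
top = similar_items(counts, "tablet")
print(top)
```

On a cluster the same counting is a distributed matrix product over the user-item interaction matrix, which is the kind of operation a linear algebra environment like the one described above is meant to express.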

November 9, 2015, 16:12:06

Apache Mahout 0.8 contains, amongst a variety of performance improvements and bug fixes, an implementation of Streaming K-Means, deeper Lucene/Solr integration and new scalable recommender algorithms. For a full description of the newest release, see

July 27, 2013, 15:52:32

We are pleased to announce release 0.4 of Mahout. Virtually every corner of the project has changed, and significantly, since 0.3. Developers are invited to use and depend on version 0.4 even as yet more change is to be expected before the next release. Highlights include:

* Model refactoring and CLI changes to improve integration and consistency
* New ClusterEvaluator and CDbwClusterEvaluator offer new ways to evaluate clustering effectiveness
* New Spectral Clustering and MinHash Clustering (still experimental)
* New VectorModelClassifier allows any set of clusters to be used for classification
* Map/Reduce job to compute the pairwise similarities of the rows of a matrix using a customizable similarity measure
* Map/Reduce job to compute the item-item-similarities for item-based collaborative filtering
* RecommenderJob has been evolved to a fully distributed item-based recommender
* Distributed Lanczos SVD implementation
* More support for distributed operations on very large matrices
* Easier access to Mahout operations via the command line
* New HMM based sequence classification from GSoC (currently as sequential version only and still experimental)
* Sequential logistic regression training framework
* New SGD classifier
* Experimental new type of NB classifier, and feature reduction options for existing one
* New vector encoding framework for high speed vectorization without a pre-built dictionary
* Additional elements of supervised model evaluation framework
* Promoted several pieces of old Colt framework to tested status (QR decomposition, in particular)
* Can now save random forests and use them to classify new data
* Many, many small fixes, improvements, refactorings and cleanup
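
One of the highlights above is vectorization without a pre-built dictionary. A common way to get that property is feature hashing, where each token is hashed straight to a vector index. The Python sketch below illustrates that general technique under this assumption. It does not reproduce Mahout's encoder classes, and the function names and bucket count are hypothetical.

```python
import hashlib


def hashed_index(feature, num_buckets):
    """Map a feature string to a bucket index with a stable hash."""
    # hashlib is used instead of the built-in hash(), which is salted per process.
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def encode(tokens, num_buckets=16):
    """Build a fixed-size count vector; no dictionary of known tokens is needed."""
    vector = [0.0] * num_buckets
    for token in tokens:
        # Distinct tokens may collide in one bucket; that is the accepted trade-off.
        vector[hashed_index(token, num_buckets)] += 1.0
    return vector


doc = "the quick brown fox jumps over the lazy dog".split()
vec = encode(doc)
print(vec)
```

Because the index comes from the hash alone, documents can be vectorized in a single pass and in parallel, at the cost of occasional collisions when the bucket count is small.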

November 2, 2010, 04:28:34

Added distributed (Map/Reduce) Singular Value Decomposition and Map/Reduce collocations. New high performance collections and matrix/vector libraries (based on Colt with many enhancements). Many new utilities for converting content to Mahout format. See for more details.

April 19, 2010, 14:20:16

Focus on performance and cleanup of APIs on the way to a 1.0 release. Added several new algorithms (LDA, Frequent Patternset Mining, Random Decision Forests). See 0.2 release announcement:

January 3, 2010, 22:33:30

First official public release

September 30, 2008, 01:58:55

