Somoclu is a C++ tool for training self-organizing maps on large data sets using massively parallel resources. It relies on OpenMP for multicore execution and builds on MPI to distribute the workload across the nodes of a cluster. It can also accelerate training with CUDA if graphics processing units are available. A sparse kernel is included, which is useful for high-dimensional but sparse data, such as the vector spaces common in text mining workflows. Python, Julia, R, and MATLAB interfaces facilitate use in data analysis. The code is released under the GNU GPLv3 licence.
Fast execution by parallelization: OpenMP, MPI, and CUDA are supported.
Python, Julia, R, and MATLAB interfaces for the dense multicore CPU kernel.
Planar and toroid maps.
Rectangular and hexagonal grids.
Gaussian and bubble neighborhood functions.
Both dense and sparse input data are supported.
Large emergent maps of several hundred thousand neurons are feasible.
Integration with Databionic ESOM Tools.
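Several of the features listed above (planar versus toroid maps, Gaussian versus bubble neighborhood functions) can be illustrated with a minimal sketch. This is a textbook formulation for a rectangular grid, not Somoclu's actual code; the function names and parameters below are hypothetical and chosen only for illustration.

```python
import math


def grid_distance(a, b, n_columns, n_rows, toroid=False):
    """Euclidean distance between two nodes (column, row) on a rectangular grid.

    Hypothetical helper for illustration; not part of Somoclu's API.
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    if toroid:
        # On a toroid map the edges wrap around, so take the shorter way.
        dx = min(dx, n_columns - dx)
        dy = min(dy, n_rows - dy)
    return math.hypot(dx, dy)


def gaussian_neighborhood(distance, radius):
    """Weight decays smoothly with grid distance from the best matching unit."""
    return math.exp(-(distance ** 2) / (2.0 * radius ** 2))


def bubble_neighborhood(distance, radius):
    """Weight is 1 inside the radius and 0 outside it."""
    return 1.0 if distance <= radius else 0.0


if __name__ == "__main__":
    # Opposite ends of a 10-column row: far apart on a planar map,
    # adjacent on a toroid map.
    print(grid_distance((0, 0), (9, 0), 10, 10))
    print(grid_distance((0, 0), (9, 0), 10, 10, toroid=True))
```

In this sketch, the only difference between the two map types is how grid distance is measured; the neighborhood function then turns that distance into an update weight for each node.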
Changes to previous version:
- Fixed: macOS build works again.
Other available revisions
Version (date), followed by changelog
1.7.1 (October 2, 2016, 10:48:46)
- Fixed: macOS build works again.
1.7.0 (September 30, 2016, 15:08:49)
- New: Julia interface is available (https://github.com/peterwittek/Somoclu.jl).
- Somoclu object in Python calculates the activation map for all data instances.
- Somoclu object in Python allows plotting the activation map for the training data instances or for a new data instance.
- Somoclu object in Python visualizes the similarity matrix of data points according to their distance to the nodes in the map.
- Fixed: CRAN-friendliness improved.
1.6.2 (August 9, 2016, 14:30:34)
- Changed: In-place codebook updates when compiled without MPI. This improves update speed and substantially cuts memory use.
- Changed: Compatible with Visual Studio 15.
- Fixed: The BMUs returned after training were from before the last epoch. Now another round of BMU search is done.
- Fixed: Training can continue on the same data in the Python wrapper.
- Fixed: GPU memory allocation problem on Windows.
1.6.1 (February 22, 2016, 10:42:47)
- New: Option for PCA initialization is added to the Python interface.
- New: Clustering of the codebook with an arbitrary clustering algorithm in scikit-learn is now possible in the Python interface.
1.6 (January 11, 2016, 09:40:34)
- New: R wrapper integrates with kohonen package.
- New: MATLAB wrapper integrates with somtoolbox.
- New: Better handling of CUDA compilation in the Python interface.
- Changed: Throws an exception if the GPU kernel is requested but the code was compiled without it. The earlier behaviour quietly defaulted to the CPU kernel.
1.5.1 (December 2, 2015, 08:18:27)
- New: Neighborhood function can be chosen between Gaussian and bubble.
- Fixed: R wrapper passes arrays with correct orientation.
- io.cpp is no longer required in the wrappers. An exception is thrown when needed.
1.5 (September 30, 2015, 13:27:52)
- New: Python interface has visual capabilities.
- New: Option for hexagonal grid.
- New: Option for requesting compact support in updating the map.
- New: Python, R, and MATLAB interfaces now allow passing an initial codebook.
- Changed: Reduced memory use in calculating U-matrices.
- Changed: Build system rebuilt and simplified.
1.4.1 (January 28, 2015, 13:19:36)
- Better support for ICC.
- Faster code when compiling with GCC.
- Building instructions and documentation improved.
- Bug fixes: portability for R, using native R random number generator.
1.4 (September 5, 2014, 13:01:14)
- Better Windows support.
- Completed CUDA support for Python and R interfaces.
- Faster compilation by removing unnecessary flags for nvcc.
- Support for CUDA 6.5.
- Bug fixes: R version no longer needs separate code.
1.3.1 (April 10, 2014, 06:41:38)
- Initial Windows support through GCC on Windows.
- Better I/O separation for the Python, R, and MATLAB interfaces.
- Bug fixes: major MPI initialization bug fixed.
1.3 (March 31, 2014, 07:53:05)
- Python, R, and MATLAB interfaces added.
- Learning rate parameter included.
- Linear and exponential cooling strategies added for radius and learning rate.
- CLI interface made more user-friendly.
- Default radius depends on both X and Y of the map.
- Bug fixes: CUDA build without MPI, best matching unit passing without MPI, coordinate order in best matching unit file.
1.2 (December 17, 2013, 04:31:05)
- Massive improvements in OpenMP parallelization.
- MPI libraries are no longer mandatory.
- Best matching units are saved.
- Option for specifying an initial codebook for the map.
- ESOM .lrn input format added.
- Parsing of white-space characters corrected.
- Long-named command line switches for specifying SOM dimensions.
- Fine-grained control of which interim files to save across epochs.
- Option in Makefile for building shared library.
1.1.2 (November 28, 2013, 03:20:22)
Toroid maps were added. Initial radius is exposed as a parameter via the command line interface. Formats of codebook and U-matrix export are compatible with Databionic ESOM Tools for advanced visualisation. Bug fixes: codebook update with a compact support was removed, NaN entry no longer appears in U-matrices.
1.0 (May 14, 2013, 06:21:13)
Initial Announcement on mloss.org.
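The linear and exponential cooling strategies mentioned in the 1.3 changelog entry can be sketched as follows. This is a generic formulation, assuming the value should move from a start value at the first epoch to an end value at the last; the exact formulas Somoclu uses may differ, and the function names are hypothetical.

```python
def linear_cooling(start, end, epoch, n_epochs):
    """Interpolate linearly from start (epoch 0) to end (last epoch)."""
    if n_epochs <= 1:
        return start
    fraction = epoch / (n_epochs - 1)
    return start + (end - start) * fraction


def exponential_cooling(start, end, epoch, n_epochs):
    """Interpolate geometrically from start to end; both must be positive."""
    if n_epochs <= 1:
        return start
    fraction = epoch / (n_epochs - 1)
    return start * (end / start) ** fraction


if __name__ == "__main__":
    # Compare how a radius shrinking from 10 to 1 evolves over 11 epochs.
    for epoch in range(11):
        lin = linear_cooling(10.0, 1.0, epoch, 11)
        exp = exponential_cooling(10.0, 1.0, epoch, 11)
        print(f"epoch {epoch:2d}: linear={lin:.3f} exponential={exp:.3f}")
```

With a decreasing schedule, the exponential variant drops faster in the early epochs and flattens out towards the end, whereas the linear variant shrinks by the same amount every epoch.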