mloss | Project details:SHOGUN

SHOGUN 0.9.1

by sonne - November 16, 2009, 11:02:41 CET

Description:

The SHOGUN machine learning toolbox focuses on large-scale kernel methods, especially Support Vector Machines (SVMs). It comes with a generic interface for SVMs, several SVM and kernel implementations, LinAdd optimizations and Multiple Kernel Learning (MKL) algorithms. SHOGUN also implements a number of linear methods. Input feature objects can be dense, sparse or strings, of type int/short/double/char.

The toolbox not only provides efficient implementations of the most common kernels, like the

Linear,
Polynomial,
Gaussian and
Sigmoid kernels

but also comes with a number of recent string kernels, such as the

Locality Improved,
Fisher,
TOP,
Spectrum,
Weighted Degree Kernel (with shifts).

For the latter, efficient LinAdd optimizations are implemented. SHOGUN also offers the freedom to work with custom pre-computed kernels. One of its key features is the combined kernel, a weighted linear combination of a number of sub-kernels, which need not all operate on the same domain. An optimal sub-kernel weighting can be learned using Multiple Kernel Learning. Currently, SVMs can be used for two-class classification and regression problems. In addition, SHOGUN implements a number of linear methods, such as

Linear Discriminant Analysis (LDA),
Linear Programming Machine (LPM) and
(Kernel) Perceptrons.

SHOGUN also features algorithms to train hidden Markov models.
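The kernels listed above have compact textbook definitions. The following plain-Python sketch writes out the four standard kernels and two of the string kernels (spectrum, and weighted degree without shifts). It is not SHOGUN's implementation and omits the LinAdd optimizations. The function names, the default parameters, the Gaussian parameterization exp(-||x - y||^2 / width) and the weighted degree weights beta_d are assumptions chosen for illustration.

```python
import math
from collections import Counter

# --- Standard kernels on real-valued vectors ---

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, c=1.0):
    # (x . y + c)^degree; c = 0 gives the homogeneous variant
    return (dot(x, y) + c) ** degree

def gaussian_kernel(x, y, width=1.0):
    # exp(-||x - y||^2 / width); other libraries write 2*sigma^2 or 1/gamma for width
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / width)

def sigmoid_kernel(x, y, gamma=1.0, c=0.0):
    # tanh(gamma * x . y + c); not positive semidefinite for every gamma, c
    return math.tanh(gamma * dot(x, y) + c)

# --- String kernels on sequences such as DNA ---

def spectrum_kernel(s, t, k=3):
    """Inner product of the k-mer count vectors of s and t."""
    cs = Counter(s[i:i + k] for i in range(len(s) - k + 1))
    ct = Counter(t[i:i + k] for i in range(len(t) - k + 1))
    return sum(n * ct[kmer] for kmer, n in cs.items())  # absent k-mers count as 0

def weighted_degree_kernel(s, t, degree=3):
    """Weighted degree kernel without shifts, for equal-length strings.

    Counts substrings of length d = 1..degree that match at the same
    position, weighting length d by beta_d = 2(degree-d+1) / (degree(degree+1)).
    """
    if len(s) != len(t):
        raise ValueError("expects strings of equal length")
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        matches = sum(1 for pos in range(len(s) - d + 1)
                      if s[pos:pos + d] == t[pos:pos + d])
        total += beta * matches
    return total
```

For example, `spectrum_kernel("ACGT", "ACGT", k=2)` counts the three shared 2-mers AC, CG and GT.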

The input feature-objects can be

dense
sparse or
strings and of type int/short/double/char

and can be converted into different feature types. Chains of preprocessors (e.g. subtracting the mean) can be attached to each feature object, allowing for on-the-fly preprocessing.
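The idea of a preprocessor chain attached to a feature object can be illustrated in a few lines. In this hypothetical sketch (the class and method names are not SHOGUN's), the stored vectors are never modified; each attached preprocessor is fitted once and then applied, in order, every time a vector is requested.

```python
class MeanSubtraction:
    """Hypothetical preprocessor: subtracts the per-dimension mean seen in fit()."""

    def fit(self, vectors):
        n = len(vectors)
        self.mean = [sum(column) / n for column in zip(*vectors)]
        return self

    def apply(self, vector):
        return [v - m for v, m in zip(vector, self.mean)]


class FeatureStore:
    """Hypothetical feature object: keeps the raw vectors untouched and runs the
    attached chain of preprocessors only when a vector is requested."""

    def __init__(self, vectors):
        self.vectors = vectors
        self.chain = []

    def attach(self, preprocessor):
        # fit the new step on the output of the chain so far, then append it
        current = [self.get_vector(i) for i in range(len(self.vectors))]
        self.chain.append(preprocessor.fit(current))

    def get_vector(self, i):
        vector = self.vectors[i]
        for step in self.chain:  # on-the-fly preprocessing, in attachment order
            vector = step.apply(vector)
        return vector
```

Keeping the raw data untouched means a chain can be changed or extended without reloading the features, at the cost of recomputing the preprocessing on every access.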

SHOGUN is implemented in C++ and interfaces to Matlab(tm), R, Octave and Python.
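The combined kernel and the kernel perceptron mentioned in the description fit together naturally: any valid kernel, including a non-negatively weighted sum of sub-kernels that operate on different domains, can be plugged into a dual perceptron. The plain-Python sketch below illustrates both ideas under stated assumptions: the sub-kernel weights are fixed by hand rather than learned by MKL, the perceptron has no bias term, and all function names are hypothetical rather than SHOGUN's API. LDA and the LPM are not covered.

```python
def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def spectrum_kernel(s, t, k=2):
    """Number of matching k-mer pairs between s and t."""
    kmers_t = [t[i:i + k] for i in range(len(t) - k + 1)]
    return sum(kmers_t.count(s[i:i + k]) for i in range(len(s) - k + 1))

def combined_kernel(sub_kernels, weights):
    """Build k(x, y) = sum_m w_m * k_m(x[m], y[m]).

    Each example is a tuple with one part per sub-kernel, so the parts can
    live in different domains (vectors, strings, ...). Non-negative weights
    keep the sum a valid kernel.
    """
    if len(sub_kernels) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per sub-kernel")

    def kernel(x, y):
        return sum(w * k_m(x_m, y_m)
                   for w, k_m, x_m, y_m in zip(weights, sub_kernels, x, y))
    return kernel

def train_kernel_perceptron(X, y, kernel, max_epochs=100):
    """Dual perceptron without bias; labels are +1/-1.

    alpha[i] counts how often example i triggered an update.
    """
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(zip(X, y)):
            f = sum(a * y_j * kernel(x_j, x_i)
                    for a, y_j, x_j in zip(alpha, y, X) if a)
            if y_i * f <= 0:  # wrong side of, or on, the decision boundary
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # every training example is classified correctly
            break
    return alpha

def predict(X, y, alpha, kernel, x):
    f = sum(a * y_j * kernel(x_j, x) for a, y_j, x_j in zip(alpha, y, X) if a)
    return 1 if f > 0 else -1
```

Here each example could pair a numeric vector with a DNA string, e.g. `([2.0, 1.0], "AAAA")`, and `combined_kernel([linear_kernel, spectrum_kernel], [1.0, 1.0])` scores both parts at once. Learning the weights, as MKL does, would replace the hand-picked weight list.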

Changes to previous version:

This release contains several enhancements, cleanups and bugfixes:

Features

Integrate LaRank.
Memory Mapped Features (for data sets that don't fit into memory).
Compressor module with compression and decompression support for lzo, gzip, bzip2 and lzma.
Compressed String Features with on-the-fly decompression (CDecompressString preproc).
Parallel computation of get_kernel_matrix().
All SHOGUN print output can now be prefixed with the file name and line number (obj.io.enable_file_and_line()).
Chinese documentation, thanks to Elpmis Lee.
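Compressed String Features with on-the-fly decompression trade CPU time for memory: each string is kept compressed and expanded only when it is accessed. The sketch below illustrates that trade-off with Python's standard-library codecs. The class, its methods and the codec table are hypothetical; zlib (the DEFLATE algorithm used by gzip) stands in for gzip, and LZO is left out because the Python standard library has no binding for it.

```python
import bz2
import lzma
import zlib

# Hypothetical codec table; zlib stands in for gzip (same DEFLATE algorithm),
# and LZO is omitted because Python's standard library does not provide it.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}

class CompressedStrings:
    """Hypothetical container: stores each string compressed and decompresses
    it only when it is accessed."""

    def __init__(self, strings, method="zlib"):
        compress, self._decompress = CODECS[method]
        self._blobs = [compress(s.encode("ascii")) for s in strings]

    def __len__(self):
        return len(self._blobs)

    def get_string(self, i):
        # decompression happens here, on every access, so only the
        # compressed bytes stay resident in memory
        return self._decompress(self._blobs[i]).decode("ascii")

    def compressed_size(self):
        return sum(len(b) for b in self._blobs)
```

Highly repetitive sequences compress well, so the memory saving can be large; the price is a decompression on every access, which matters for algorithms that revisit the same strings many times.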

Bugfixes

Fix one-class MKL testing in the static interfaces.
Configure fixes: prevent Octave from writing its history during configure; fail when CPLEX is forcibly enabled but not found; add CPLEX 12 support.
Fix a problem with regression and CombinedKernels employing only Custom kernels.

Cleanup and API Changes

Like SimpleFeatures, String Features now require an additional do_free argument in get_feature_vector, and vectors obtained this way need to be freed using free_feature_vector.

Supported Operating Systems: Cygwin, Linux, Mac OS X

Data Formats: Plain ASCII, SVMlight, FASTA, FASTQ

Tags: Bioinformatics, Large Scale, String Kernel, Kernel, Kernelmachine, Lda, Lpm, Matlab, Mkl, Octave, Python, R, Svm

Comments

Soeren Sonnenburg (on September 12, 2008, 16:14:36): In case you find bugs, feel free to report them at [http://trac.tuebingen.mpg.de/shogun](http://trac.tuebingen.mpg.de/shogun).

Tom Fawcett (on January 3, 2011, 03:20:48): You say, "Some of them come with no less than 10 million training examples, others with 7 billion test examples." I'm not sure what this means. I have problems with mixed symbolic/numeric attributes and the training example sets don't fit in memory. Does SHOGUN require that training examples fit in memory?

Soeren Sonnenburg (on January 14, 2011, 18:12:01): Shogun does not necessarily require examples to be in memory (if you use any of the FileFeatures). However, most algorithms within shogun are batch-type, so using the non-in-memory FileFeatures would probably be very slow. This does not matter for prediction, of course, although the 7 billion test examples mentioned above (predicting gene starts on the whole human genome) were in fact processed in memory: the genome took about 3.5 GB, and a context window of 1200 nt was shifted along that string. In addition, one can compute features (or the feature space) on the fly, potentially saving lots of memory. I'm not sure how big your problem is, but I guess this is better discussed on the shogun mailing list.
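The sliding-window prediction described in this reply can be sketched as a generator that produces one context window at a time instead of materializing all windows up front. This is an illustration under assumptions, not SHOGUN's FileFeatures or on-the-fly feature code; the function name and the step parameter are hypothetical, and in the genome setting the width would be the 1200 nt mentioned above.

```python
def sliding_windows(sequence, width, step=1):
    """Lazily yield (start, window) pairs for a fixed-width context window
    shifted along a long sequence.

    Each window is sliced only when the consumer asks for it, so apart from
    the sequence itself at most one window is held in memory at a time.
    """
    if width <= 0 or step <= 0:
        raise ValueError("width and step must be positive")
    for start in range(0, len(sequence) - width + 1, step):
        yield start, sequence[start:start + width]
```

A caller would score each window with a trained classifier inside a `for start, window in sliding_windows(genome, 1200):` loop, so memory use stays at the sequence plus one window rather than growing with the number of test examples.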

Yuri Hoffmann (on September 14, 2013, 17:12:16): I cannot use the Java interface in Cygwin (already reported on GitHub) or in Debian.
