
 Description:
Stochastic neighbor embedding aims at reconstructing given distance, dissimilarity, or score neighborhood relations in a low-dimensional Euclidean space. This can be regarded as a general approach to multidimensional scaling, but the reconstruction is based on the definition of input (and output) neighborhood probabilities alone. The present implementation makes use of quasi-second-order gradient-based (l)BFGS optimization.
Neighbor relationships in the embedding space ('scatter plots') are estimated as probabilities of Gaussian or Student-t distributions; input probabilities can be derived from Gaussians over pairwise Euclidean input distances or, as in the present case, from novel score neighborhood probabilities that do not require extra settings such as 'Gaussian width' or 'perplexity'. The neighborhood probabilities are estimated as probabilities of input score exceedance, and the probabilities of distances in the embedding are fitted to them by minimizing the KL-divergence. In an experimental version, soft ranks are used for numerical optimization.
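The following is a minimal Python sketch, not the package's Matlab code, of how row-wise neighborhood probabilities in the embedding and the KL-divergence between input and output probabilities can be computed. The function names `output_probabilities` and `kl_divergence`, the unit kernel width, and the toy coordinates are assumptions made for illustration only.

```python
import math

def output_probabilities(Y, kernel="gaussian"):
    """Row-normalized neighborhood probabilities q[i][j] for embedding points Y."""
    n = len(Y)
    Q = []
    for i in range(n):
        weights = []
        for j in range(n):
            if i == j:
                weights.append(0.0)  # a point is not its own neighbor
                continue
            d2 = sum((a - b) ** 2 for a, b in zip(Y[i], Y[j]))
            if kernel == "gaussian":
                weights.append(math.exp(-d2))     # Gaussian kernel (assumed unit width)
            else:
                weights.append(1.0 / (1.0 + d2))  # Student-t kernel, one degree of freedom
        total = sum(weights)
        Q.append([w / total for w in weights])    # each row sums to one (asymmetric form)
    return Q

def kl_divergence(P, Q, eps=1e-12):
    """Sum over rows i of KL(P_i || Q_i); zero entries of P contribute nothing."""
    cost = 0.0
    for p_row, q_row in zip(P, Q):
        for p, q in zip(p_row, q_row):
            if p > 0.0:
                cost += p * math.log(p / max(q, eps))
    return cost

# Toy embedding: two close points and one distant point on a line.
Y = [[0.0, 0.0], [1.0, 0.0], [4.0, 0.0]]
Q_gauss = output_probabilities(Y, "gaussian")
Q_t = output_probabilities(Y, "student")
```

In this toy layout the Student-t kernel assigns noticeably more probability to the distant point than the Gaussian kernel does, which is the heavy-tail property that t-SNE relies on.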
The functionality covers SNE, t-SNE, and soft-rank-based KL minimization. SNE is ordinary (yet well-working) stochastic neighbor embedding; t-SNE tries to avoid the 'crowding' problem by using a Student-t rather than a Gaussian neighborhood density assumption on the output space. The original assumption of input relationship symmetry is no longer used in this package, because point-wise reconstruction is supposed to be more specific than for general neighborhood probability distributions in the input and output space.
As an additional feature, the embedding quality of data points is assessed by the contribution of each embedded point's placement to the cost function, i.e. the sum of absolute KL-divergence gradients caused by individual points.
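As a rough illustration of such a point-wise quality score, the Python sketch below sums the absolute partial derivatives of the KL cost with respect to the coordinates of each embedded point. It is not the package's implementation: the gradient is approximated by central finite differences rather than computed analytically, the output kernel is assumed to be a row-normalized Student-t, P is assumed to have a zero diagonal, and all names (`student_t_rows`, `kl_cost`, `point_contributions`) are hypothetical.

```python
import math

def student_t_rows(Y):
    """Row-normalized Student-t neighborhood probabilities for embedding Y."""
    n = len(Y)
    Q = []
    for i in range(n):
        w = [0.0 if i == j else
             1.0 / (1.0 + sum((a - b) ** 2 for a, b in zip(Y[i], Y[j])))
             for j in range(n)]
        total = sum(w)
        Q.append([v / total for v in w])
    return Q

def kl_cost(P, Y):
    """KL(P || Q(Y)) summed over rows; P is assumed to have a zero diagonal."""
    Q = student_t_rows(Y)
    return sum(p * math.log(p / q)
               for p_row, q_row in zip(P, Q)
               for p, q in zip(p_row, q_row) if p > 0.0)

def point_contributions(P, Y, h=1e-5):
    """Per point: sum over coordinates of |dC/dy|, by central finite differences."""
    scores = []
    for i in range(len(Y)):
        s = 0.0
        for d in range(len(Y[i])):
            old = Y[i][d]
            Y[i][d] = old + h
            up = kl_cost(P, Y)
            Y[i][d] = old - h
            down = kl_cost(P, Y)
            Y[i][d] = old  # restore the coordinate before moving on
            s += abs((up - down) / (2.0 * h))
        scores.append(s)
    return scores

# Target probabilities taken from a reference layout (a unit square).
Y_ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
P = student_t_rows(Y_ref)

# The same layout with the last point deliberately misplaced.
Y_bad = [row[:] for row in Y_ref]
Y_bad[3] = [3.0, 3.0]

scores_ref = point_contributions(P, [row[:] for row in Y_ref])
scores_bad = point_contributions(P, Y_bad)
```

For the reference layout the cost is at its minimum, so all scores are numerically zero; in the perturbed layout the scores become clearly nonzero, indicating points whose placement is in tension with the target probabilities.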
Acknowledgements:
The work on SNE and t-SNE is highly appreciated, in particular "Visualizing Data using t-SNE", JMLR 9, pp. 2579-2605, 2008, and the freely available implementations by Laurens van der Maaten.
The great (l)BFGS optimizer (fminlbfgs.m) by Dirk-Jan Kroon, found at http://www.mathworks.de/matlabcentral/fileexchange/23245 and included here, is STRONGLY acknowledged.
 Changes to previous version:
scoretoprob.m replaced by d2p.m
protein score data set added
trank.m computes (mid/max tied) ranks along columns of matrix
local P neighborhood probability estimation added
experimental soft_rank_SNE added for minimizing KL between probabilities of exceedance in source and embedding space
symmetry option removed, because this was strange in previous version
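To illustrate what column-wise ranking with tie handling looks like, here is a small Python sketch. It does not reproduce trank.m; it assumes that 'mid' means tied values share the mean of their ranks and 'max' means they share the highest rank of the tie group. The names `tied_ranks` and `column_ranks` are hypothetical.

```python
def tied_ranks(values, mode="mid"):
    """1-based ranks; tied values share their mean ("mid") or highest ("max") rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block [start, end] over all entries tied with order[start].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        if mode == "mid":
            shared = (start + end) / 2.0 + 1.0  # mean of the 1-based ranks in the block
        else:
            shared = float(end + 1)             # highest 1-based rank in the block
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def column_ranks(matrix, mode="mid"):
    """Apply tied_ranks to every column of a row-major matrix (list of rows)."""
    columns = [tied_ranks(list(col), mode) for col in zip(*matrix)]
    return [list(row) for row in zip(*columns)]

mid = tied_ranks([10, 20, 20, 30], "mid")        # [1.0, 2.5, 2.5, 4.0]
mx = tied_ranks([10, 20, 20, 30], "max")         # [1.0, 3.0, 3.0, 4.0]
by_col = column_ranks([[3, 1], [1, 1], [2, 5]])  # ranks within each column
```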
 Supported Operating Systems: Platform Independent
 Data Formats: Matlab
 Tags: Dimension Reduction, MDS, Multidimensional Scaling, SNE, Stochastic Neighbor Embedding, Neighborhood Probability Estimation
Other available revisions

Version 1.2 (August 20, 2013, 11:02:21). Changelog:
gradient in xsne_fun.m fixed (constant factor m was missing)
symmetry option reintroduced, enabling both symmetric and asymmetric versions of SNE and t-SNE

Version 1.1 (November 23, 2012, 15:10:26). Changelog:
scoretoprob.m replaced by d2p.m
protein score data set added
trank.m computes (mid/max tied) ranks along columns of matrix
local P neighborhood probability estimation added
experimental soft_rank_SNE added for minimizing KL between probabilities of exceedance in source and embedding space
symmetry option removed, because this was strange in previous version

Version 1.0 (July 23, 2012, 12:18:24). Changelog:
Negligible changes for consolidating the code.