Non-parametric topic models implemented using efficient Gibbs sampling. The early theory is given in the cited ECML-PKDD 2011 paper.
Coded in C with no other dependencies. The samplers avoid Chinese restaurant processes and stick breaking, so they are fast: the non-parametric methods run 1-3 times slower than regular LDA with Gibbs sampling, with a marginal increase in memory. Input can be in LdaC format, docword format, or various Matlab-style formats. Implements HDP-LDA à la Teh, Jordan, Beal and Blei (2006), HPYP-LDA, and the symmetric-symmetric, symmetric-asymmetric, asymmetric-symmetric, and asymmetric-asymmetric priors à la Wallach, Mimno and McCallum (2009), with Pitman-Yor or Dirichlet processes. Burstiness modelling à la Doyle and Elkan (2009) can be combined with any of the models above to improve performance further. Hyper-parameters can be fully fitted or fixed at their initial settings.
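As an illustration of the LdaC input format mentioned above, here is a minimal Python sketch of reading one document line. It assumes the usual LDA-C layout, in which each line holds the number of distinct terms followed by `term_index:count` pairs. The function name `parse_ldac_line` is hypothetical and is not part of this package.

```python
def parse_ldac_line(line):
    """Parse one LDA-C style line into a {term_index: count} dict.

    Assumed layout: "<n_terms> <term>:<count> <term>:<count> ..."
    """
    fields = line.split()
    n_terms = int(fields[0])
    counts = {}
    for field in fields[1:]:
        term, count = field.split(":")
        counts[int(term)] = int(count)
    # The leading number should match the number of distinct terms listed.
    if len(counts) != n_terms:
        raise ValueError("declared %d terms but found %d" % (n_terms, len(counts)))
    return counts
```

For example, the line `3 0:2 5:1 7:4` describes a document in which term 0 occurs twice, term 5 once, and term 7 four times.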
Estimation of various vectors (document and topic vectors). Diagnostics, control, restarts, test likelihood via document completion. Coherence calculations on results using PMI and normalised PMI.
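To make the coherence measure concrete, the following Python sketch computes the average normalised PMI over all pairs of a topic's top words. It is only an illustration under stated assumptions: probabilities are estimated from document-level co-occurrence, and the handling of the two edge cases is a convention chosen here. The listing does not say how this package counts co-occurrences, and the function name `npmi_coherence` is hypothetical.

```python
import math
from itertools import combinations

def npmi_coherence(top_words, documents):
    """Average NPMI over all pairs of top_words.

    documents is a list of token lists; probabilities are the fraction
    of documents containing the word(s) (an assumption of this sketch).
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0.0:
            scores.append(-1.0)  # convention: words that never co-occur
        elif p12 == 1.0:
            scores.append(0.0)   # convention: PMI is 0 and the normaliser is 0
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)
```

NPMI ranges from -1 (the words never co-occur) through 0 (independent) to 1 (the words always occur together), which makes scores comparable across word pairs with different frequencies.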
- Changes to previous version:
Added example on using burstiness.