There are a number of well-established methods such as principal components analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. For example, PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the variables that are drivers of systematic variation captured by PCA. Principal components (and other estimates of systematic variation) are directly constructed from the variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting. We introduce a new approach called the jackstraw that allows one to accurately identify variables that are statistically significantly associated with any subset or linear combination of principal components (PCs). The proposed method can greatly simplify complex significance testing problems encountered in modern high-dimensional data analysis and can be utilized to identify the variables significantly associated with latent variables.
- Changes to previous version:
Initial Announcement on mloss.org.
- BibTeX Entry: Download
- Corresponding Paper BibTeX Entry: Download
- URL: Project Homepage
- Supported Operating Systems: Agnostic
- Data Formats: Agnostic, Any Format Supported By R
- Tags: Bioinformatics, Machine Learning, Resampling, Statistics, Principal Component Analysis, Nonparametric Estimation, Unsupervised Learning, Latent Variable Model, Jackstraw, Statistical Learning
- Archive: download here
No one has posted any comments yet. Perhaps you'd like to be the first?
Leave a comment
You must be logged in to post comments.