Our software package hapFabia identifies short identity-by-descent (IBD) segments that are tagged by rare variants in large sequencing data sets. Two haplotypes are IBD if they share a segment that both inherited from a common ancestor. Current IBD methods reliably detect long IBD segments because many minor alleles in the segment are concordant between the two haplotypes. However, many cohort studies contain unrelated individuals who share only short IBD segments. Short IBD segments contain too few minor alleles to distinguish IBD from random allele sharing caused by recurrent mutations. New sequencing techniques improve the situation by providing rare variants, which convey more information on IBD than common variants because random sharing of a minor allele is less likely for rare variants than for common ones.
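The argument about rare variants can be made concrete with a small calculation. The sketch below is not part of hapFabia; the function name `chance_sharing` and the frequencies are illustrative assumptions. It assumes that, without IBD, two haplotypes carry the minor allele independently with probability equal to the minor allele frequency, and that sites are independent of each other.

```python
def chance_sharing(maf: float, n_variants: int = 1) -> float:
    """Probability that two haplotypes share the minor allele by chance.

    Assumes (for illustration only) that each haplotype carries the minor
    allele independently with probability `maf`, and that the `n_variants`
    sites are independent and all have the same frequency.
    """
    per_site = maf ** 2          # both haplotypes carry the minor allele
    return per_site ** n_variants


if __name__ == "__main__":
    # Hypothetical frequencies: one common and one rare variant.
    for label, maf in (("common", 0.30), ("rare", 0.01)):
        print(f"{label:>6} (MAF {maf:.2f}): "
              f"1 site {chance_sharing(maf):.2e}, "
              f"3 sites {chance_sharing(maf, 3):.2e}")
```

Under these assumptions, chance sharing falls with the square of the minor allele frequency, so a few concordant rare minor alleles are stronger evidence of IBD than the same number of concordant common ones.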

Short IBD segments are of interest because (i) they resolve the genetic structure on a fine scale and (ii) they can be assumed to be old. To detect short IBD segments, both the information supplied by rare variants and the information from more than two individuals should be used. These two characteristics are the basis for detecting short IBD segments with HapFABIA.

We propose biclustering to detect very short IBD segments that are shared among multiple individuals. Biclustering simultaneously clusters the rows and columns of a matrix. In particular, it clusters row elements that are similar to each other on a subset of column elements. A genotype matrix has individuals (unphased data) or chromosomes (phased data) as row elements and SNVs as column elements. Entries in the genotype matrix usually count how often the minor allele of a particular SNV is present in a particular individual. Alternatively, minor allele likelihoods or dosages may be used. Individuals that share an IBD segment are similar to each other at the minor alleles of the SNVs that tag the IBD segment (tagSNVs). Therefore, an IBD segment that is shared among individuals corresponds to a bicluster, because these individuals are similar to one another at this segment. Identifying a bicluster means identifying the tagSNVs (column bicluster elements) that tag an IBD segment and, simultaneously, identifying the individuals (row bicluster elements) that possess the IBD segment.
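To show what a bicluster in a genotype matrix looks like, the following toy sketch groups rare SNVs by the set of chromosomes that carry their minor allele. It is not the HapFABIA algorithm and does not use the hapFabia API. The matrix, the function name `naive_biclusters`, and the thresholds `max_carriers` and `min_tag_snvs` are assumptions made for this illustration only, and requiring identical carrier sets is a simplification that ignores genotyping errors and missing data.

```python
from collections import defaultdict

# Toy phased genotype matrix (hypothetical data):
# rows = chromosomes, columns = SNVs,
# entry 1 = chromosome carries the minor allele, 0 = it does not.
genotypes = [
    [1, 0, 1, 1, 0, 0],  # chromosome 0
    [1, 0, 1, 1, 0, 1],  # chromosome 1
    [0, 1, 0, 0, 0, 0],  # chromosome 2
    [1, 0, 1, 1, 0, 0],  # chromosome 3
    [0, 0, 0, 0, 1, 0],  # chromosome 4
    [0, 0, 0, 0, 0, 0],  # chromosome 5
]


def naive_biclusters(matrix, max_carriers=3, min_tag_snvs=2):
    """Group rare SNVs that are carried by exactly the same chromosomes.

    Returns a dict mapping a tuple of row indices (chromosomes) to the list
    of column indices (candidate tagSNVs) whose minor allele is carried by
    exactly those rows. Only SNVs with 2..max_carriers carriers are used,
    and only groups with at least min_tag_snvs SNVs are kept.
    """
    groups = defaultdict(list)
    for col in range(len(matrix[0])):
        carriers = tuple(r for r, row in enumerate(matrix) if row[col] == 1)
        # Private SNVs (one carrier) cannot indicate sharing; SNVs with
        # many carriers are treated as common and skipped.
        if 2 <= len(carriers) <= max_carriers:
            groups[carriers].append(col)
    return {rows: cols for rows, cols in groups.items()
            if len(cols) >= min_tag_snvs}


if __name__ == "__main__":
    for rows, cols in naive_biclusters(genotypes).items():
        print(f"chromosomes {rows} share minor alleles at SNVs {cols}")
```

In this toy matrix, the row set plays the role of the individuals that possess the segment and the column set plays the role of the tagSNVs. SNVs carried by a single chromosome are ignored because they cannot indicate sharing.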

Changes to previous version:

o IBD lengths correction

o improved support for haploid genomes

Other available revisions

Version Changelog Date

o citation update

o plot function improved

December 28, 2013, 17:22:38

o IBD lengths correction

o improved support for haploid genomes

November 8, 2013, 07:45:38

o haplotype VCF files are now possible
o bug fix in vcftoFABIA
o bug fix in split_sparse_matrix
o plot functions accept other arguments via '...'
o plot arguments grid and pairs
o new function 'plotLarger' (adds samples without IBD and borders)
o vcftoFABIA with command line options -s (SNVs_) and -o (output file)
o vcftoFABIA in R with output file name
October 18, 2013, 10:11:31

