New-generation sequencing technologies and the sequencing of increasingly complex datasets demand new, efficient, and specialized sequence analysis algorithms. Often, only the 'novel' sequences in a complex dataset are of interest, and the superfluous sequences need to be removed.

A novel algorithm, FACS (Fast and Accurate Classification of Sequences), is introduced that can accurately and rapidly align sequences to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS is at least three times faster, and more accurate, than BLAT and SSAHA2 in classifying sequences when using references larger than 50 Mbp.
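
As the title of the paper cited below indicates, FACS builds on Bloom filters. The following Python sketch illustrates the general idea only and is not the authors' implementation: the k-mers of a reference are stored in a Bloom filter, and a query sequence is classified as matching the reference when a sufficient fraction of its k-mers are found in the filter. All names and parameters here (`BloomFilter`, `build_reference_filter`, `classify`, the word length `k`, the filter size, the number of hash functions, and the `min_fraction` threshold) are assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter over strings (illustrative, not tuned)."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions from one SHA-256 digest
        # using double hashing: h1 + i * h2 (mod num_bits).
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May return a false positive, never a false negative.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(seq, k):
    """Yield all overlapping words of length k from seq."""
    seq = seq.upper()
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_reference_filter(reference, k=21, **filter_args):
    """Insert every k-mer of the reference into a new Bloom filter."""
    bloom = BloomFilter(**filter_args)
    for word in kmers(reference, k):
        bloom.add(word)
    return bloom


def classify(read, bloom, k=21, min_fraction=0.8):
    """Return True if enough of the read's k-mers are found in the filter."""
    words = list(kmers(read, k))
    if not words:
        return False  # read shorter than k: cannot be classified
    hits = sum(1 for word in words if word in bloom)
    return hits / len(words) >= min_fraction
```

In this sketch, a read copied from the reference is always classified as matching, because a Bloom filter has no false negatives. An unrelated read can be misclassified only through false positives, whose rate depends on the assumed filter size and number of hash functions.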



Henrik Stranneheim, Max Käller, Tobias Allander, Björn Andersson, Lars Arvestad, Joakim Lundeberg, Classification of DNA sequences using Bloom filters. Bioinformatics, 2010 July 1; 26(13):1595-1600. Published online 2010 May 13. doi:10.1093/bioinformatics/btq230