Fast and accurate site frequency spectrum estimation from low coverage sequence data

Abstract

© The Author 2014. Published by Oxford University Press. All rights reserved.

Motivation: The distribution of allele frequencies across polymorphic sites, known as the site frequency spectrum (SFS), is of primary interest in population genetics. It is a complete summary of sequence variation at unlinked sites and, more generally, its shape reflects the underlying population genetic processes. One practical challenge is that inferring the SFS from low coverage sequencing data directly from genotype calls can lead to significant bias. To reduce this bias, previous studies have used a statistical method that estimates the SFS directly from the sequencing data by first computing the site allele frequency (SAF) likelihood for each site with a dynamic programming (DP) algorithm. Although this method produces an accurate SFS, the time needed to compute the SAF likelihood is quadratic in the number of samples sequenced.

Results: To overcome this computational challenge, we propose the 'score-limited DP' algorithm, which computes the SAF likelihood in time linear in the number of genomes. The algorithm works because, in the lower triangular matrix that arises in the DP algorithm, all non-negligible values of the SAF likelihood are concentrated in a few cells around the best-guess allele counts. We show that our score-limited DP algorithm has comparable accuracy to the original DP algorithm but is faster. This speed improvement makes SFS estimation practical when using low coverage next-generation sequencing (NGS) data from a large number of individuals.
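The contrast between the quadratic DP and a band-limited DP can be illustrated with a minimal sketch. The abstract does not state the recurrence, so the code assumes the commonly used one for diploid samples, in which each sample contributes 0, 1 or 2 derived alleles weighted by its genotype likelihoods and the result is normalized by a binomial coefficient. The function names `saf_full` and `saf_banded` and the parameter `half_width` are hypothetical. The fixed-width band around a running best-guess allele count is a simplification chosen for illustration; it is not the authors' score-limited rule.

```python
from math import comb


def saf_full(gls):
    """Quadratic DP over all allele counts (assumed recurrence, see lead-in).

    gls: list of (g0, g1, g2) genotype likelihoods, one tuple per diploid sample.
    """
    h = [1.0]  # h[k]: likelihood of k derived alleles among samples seen so far
    for g0, g1, g2 in gls:
        new = [0.0] * (len(h) + 2)
        for k, v in enumerate(h):
            new[k] += g0 * v
            new[k + 1] += 2.0 * g1 * v  # two orderings of a heterozygote
            new[k + 2] += g2 * v
        h = new
    n2 = len(h) - 1  # total number of chromosomes, 2n
    return [v / comb(n2, k) for k, v in enumerate(h)]


def saf_banded(gls, half_width=4):
    """Band-limited variant: keep only cells near a running best-guess count.

    Illustrative assumption: the band has a fixed half-width and is centred on
    the cumulative count implied by each sample's most likely genotype.
    """
    h = {0: 1.0}  # sparse row: allele count -> likelihood
    center = 0
    for g0, g1, g2 in gls:
        likes = (g0, g1, g2)
        center += max(range(3), key=lambda a: likes[a])
        new = {}
        for k, v in h.items():
            for a, w in ((0, g0), (1, 2.0 * g1), (2, g2)):
                j = k + a
                if abs(j - center) <= half_width:  # drop cells outside the band
                    new[j] = new.get(j, 0.0) + w * v
        h = new
    n2 = 2 * len(gls)
    return [h.get(k, 0.0) / comb(n2, k) for k in range(n2 + 1)]


if __name__ == "__main__":
    # Made-up likelihoods: 20 confident reference homozygotes, 5 heterozygotes.
    gls = [(0.98, 0.01, 0.01)] * 20 + [(0.01, 0.98, 0.01)] * 5
    full = saf_full(gls)
    banded = saf_banded(gls, half_width=4)
    print(max(range(len(full)), key=full.__getitem__))
    print(max(range(len(banded)), key=banded.__getitem__))
```

In this sketch each sample updates at most `2 * half_width + 1` cells of the sparse row, so the DP work grows linearly with the number of samples for a fixed band, whereas `saf_full` touches every allele count at every step.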


Analytics

Added to PP
2017-03-08


Author's Profile

Eun-Ji Han
Seoul National University
