Combinatorial Pooling Enables Selective Sequencing of the Barley Gene Space
Presenter
February 14, 2012
Keywords:
- Sequences and sets
Abstract
We propose a new sequencing protocol that combines recent advances in
combinatorial pooling design and second-generation sequencing
technology to make de novo selective genome sequencing
efficient. We show that combinatorial pooling is a cost-effective
and practical alternative to exhaustive DNA barcoding when dealing
with hundreds or thousands of DNA samples, such as genome-tiling
gene-rich BAC clones. The novelty of the protocol hinges on the
computational ability to efficiently compare hundreds of millions of
short reads and assign them to the correct BAC clones so that the
assembly can be carried out clone-by-clone. Experimental results on
simulated data for the rice genome show that the deconvolution is
extremely accurate (99.57% of the deconvoluted reads are assigned to
the correct BAC), and the resulting BAC assemblies have very high
quality (BACs are covered by contigs over about 77% of their length,
on average). Experimental results on real data for a gene-rich subset of
the barley genome confirm that the deconvolution is accurate (almost
70% of left/right pairs in paired-end reads are assigned to the same
BAC, despite being processed independently) and the BAC assemblies have
good quality (the total length of all assembled contigs is, on average,
about 88% of the estimated BAC length).
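
To illustrate the general idea of pooling and deconvolution described in the abstract, here is a minimal Python sketch. It is a toy under stated assumptions, not the pooling design or the deconvolution algorithm used in this work (the abstract does not specify either). The function names `build_pool_design` and `deconvolute`, the BAC labels, and the choice of giving each BAC a distinct fixed-size subset of pools are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def build_pool_design(bac_ids, num_pools, pools_per_bac):
    """Toy design (an assumption): give each BAC a distinct set of pools.

    Each BAC's set of pools acts as its signature. With num_pools pools and
    pools_per_bac pools per BAC there are C(num_pools, pools_per_bac)
    distinct signatures available.
    """
    signatures = combinations(range(num_pools), pools_per_bac)
    design = {bac: frozenset(sig) for bac, sig in zip(bac_ids, signatures)}
    if len(design) < len(bac_ids):
        raise ValueError("not enough distinct signatures for all BACs")
    return design


def deconvolute(observed_pools, design):
    """Return every BAC whose signature is contained in the observed pools.

    observed_pools is the set of pools in which a given read was seen.
    One candidate means an unambiguous assignment; several candidates mean
    the read could come from more than one BAC (for example, a region
    shared by overlapping clones).
    """
    observed = frozenset(observed_pools)
    return [bac for bac, sig in design.items() if sig <= observed]


if __name__ == "__main__":
    design = build_pool_design(
        ["BAC_A", "BAC_B", "BAC_C", "BAC_D"], num_pools=4, pools_per_bac=2
    )
    # A read seen only in pools 0 and 2 matches exactly one signature.
    print(deconvolute({0, 2}, design))
    # A read seen in pools 0, 1 and 2 is compatible with several BACs.
    print(deconvolute({0, 1, 2}, design))
```

In this toy setting the saving over exhaustive barcoding comes from counting: 7 pools with 3 pools per BAC already provide 35 distinct signatures, so 35 clones could be told apart with 7 sequencing libraries instead of 35 barcodes. A real design would also need to tolerate sequencing errors and overlapping clones, which this sketch does not address.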
Joint work with D. Duma (UCR), M. Alpert (UCR), F. Cordero (U of Torino),
M. Beccuti (U of Torino), P. R. Bhat (UCR and Monsanto), Y. Wu (UCR and Google),
G. Ciardo (UCR), B. Alsaihati (UCR), Y. Ma (UCR), S. Wanamaker (UCR),
J. Resnik (UCR), and T. J. Close (UCR).
Preprint available at http://arxiv.org/abs/1112.4438