Length-agnostic, barcode-directed assembly of gene haplotypes
Barcoding and in silico assembly of mutagenized sequences has expanded the scale at which DNA sequences can be functionally assayed. Using unique barcodes, sequences spanning multiple short high-throughput sequencing reads can be assembled into single alleles (Patwardhan, Hiatt, et al. Nat Biotechnol (2012)). This approach resolves the haplotypes of sequences up to approximately one kilobase, the cluster-generating limit of an Illumina sequencer. Resolving the haplotypes of longer sequences with current techniques requires further molecular biological manipulation, such as the combinatorial removal of internal regions. We are developing a method for barcoded assembly that circumvents this 1 kb limit. We linearize the plasmid containing our barcoded gene of interest upstream of the gene's promoter, then use an exonuclease to digest the linear fragment progressively from each end. After removing the remaining plasmid backbone, we are left with a population of barcoded fragments of different lengths. Recircularization brings the exonuclease-digested end of the gene adjacent to the barcode. Single-ended, short PCR followed by single-stranded ligation then produces molecules that can be clustered and sequenced on an Illumina high-throughput sequencer. Reads that share a barcode derive from the same allele, so they can be grouped by barcode and merged into a full-length haplotype. We will first apply this technique to selections for the function of large, mutagenized yeast transcription factors, such as those encoded by the ADR1 and MSN2 genes. The technique should also generalize to many other applications that require resolved haplotypes of large genes.
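The final merging step can be illustrated computationally. Each read pairs a barcode with a short stretch of the gene that sits next to it after recircularization. Reads with the same barcode therefore tile different windows of one allele. The following is a minimal sketch, not the authors' pipeline, and it rests on several assumptions. Reads are assumed to be already split into `(barcode, insert)` pairs. The wild-type reference sequence is assumed to be known. Mutations are assumed to be substitutions only, so ungapped placement suffices, and errors are resolved by a simple per-position majority vote. The function names `place_read` and `assemble_by_barcode` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def place_read(read: str, reference: str) -> int:
    """Hypothetical helper: return the ungapped offset in `reference` with the
    fewest mismatches to `read` (first offset wins ties). Assumes substitutions only."""
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best_offset, best_mismatches = 0, len(read) + 1
    for offset in range(len(reference) - len(read) + 1):
        window = reference[offset:offset + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches < best_mismatches:
            best_offset, best_mismatches = offset, mismatches
    return best_offset


def assemble_by_barcode(reads: Iterable[Tuple[str, str]], reference: str) -> Dict[str, str]:
    """Hypothetical helper: group (barcode, insert) reads and call one
    consensus haplotype per barcode by majority vote at each position."""
    by_barcode: Dict[str, List[str]] = defaultdict(list)
    for barcode, insert in reads:
        by_barcode[barcode].append(insert)

    haplotypes: Dict[str, str] = {}
    for barcode, inserts in by_barcode.items():
        # One Counter per reference position accumulates the bases observed there.
        pileup = [Counter() for _ in reference]
        for insert in inserts:
            offset = place_read(insert, reference)
            for i, base in enumerate(insert):
                pileup[offset + i][base] += 1
        # Majority base where covered; 'N' marks positions no read reached.
        haplotypes[barcode] = "".join(
            column.most_common(1)[0][0] if column else "N" for column in pileup
        )
    return haplotypes


if __name__ == "__main__":
    reference = "ATGGCTAGCTTACGGA"  # toy wild-type sequence (illustrative only)
    reads = [
        ("BC1", "ATGGCTAG"),  # window starting at position 0
        ("BC1", "AGCGTACG"),  # window starting at position 6; carries T->G at position 9
        ("BC1", "TACGGA"),    # window starting at position 10
        ("BC2", "ATGGCT"),    # partial coverage only
    ]
    for barcode, haplotype in assemble_by_barcode(reads, reference).items():
        print(barcode, haplotype)
    # BC1 ATGGCTAGCGTACGGA
    # BC2 ATGGCTNNNNNNNNNN
```

In this toy example, the three overlapping `BC1` reads reconstruct the full allele, including the substitution at position 9. `BC2` is only partly resolved, and its uncovered positions are reported as `N`. A real implementation would likely need gapped alignment, base-quality weighting, and minimum-coverage thresholds. None of these are specified in the abstract.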