Hairpin Bisulfite PCR DNA Sequence Processing Program
program written by Diane Genereux and Brooks Miner
Purpose of this Program:
We use this program to process raw DNA sequence data obtained from hairpin-bisulfite PCR. The DNA sequence data is returned as a single strand, but because of a hairpin linker, that single strand is actually two complementary strands attached to each other. Our processing program separates the strands and changes the lettering of CpG dyads so that we can rapidly color-code the dyads in MS Word.
For example, we can rapidly turn this:
[image: sequence before processing]
Into this:
[image: color-coded sequence after processing]
How to use this program:
The script is written in Perl, which we run from a terminal window in Mac OS X. It operates on a separate text file, whose name is currently set to "filename" in the first line of the script.
The text file with the sequences for processing has the following format:
>SAMPLE NAME
SEQUENCE
(line break)
>SAMPLE NAME
SEQUENCE
(line break)
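As an illustration of the input format above, here is a minimal sketch of reading such a file's contents. It is written in Python rather than Perl and is not the authors' code. The function name parse_records is hypothetical, and the sketch assumes each sequence sits on a single line directly after its ">" header.

```python
def parse_records(text):
    """Yield (sample_name, sequence) pairs from text in the format above.

    Assumption: each sequence occupies exactly one line after its '>' header.
    """
    name = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # Blank lines separate records; nothing to store.
            continue
        if line.startswith(">"):
            # Header line: remember the sample name without the '>' marker.
            name = line[1:].strip()
        elif name is not None:
            # First non-blank line after a header is taken as the sequence.
            yield name, line
            name = None


example = ">SAMPLE ONE\nACGTTGCA\n\n>SAMPLE TWO\nTTGACGAA\n\n"
for sample_name, sequence in parse_records(example):
    print(sample_name, sequence)
```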
The program performs a series of operations on the sequences. First, each sequence is converted into its complement. Then the sequence is split into two strands (top and bottom). The split occurs at a defined position within the hairpin linker; the program is currently set to perform this split at position 205.
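The complement and split steps can be sketched as follows. This is a Python illustration, not the authors' Perl script. The names complement and split_strands are hypothetical. The sketch assumes that "position 205" means the first 205 characters form one strand, and that the first part is the top strand; the actual script may count or assign the strands differently.

```python
# Base-pairing table for the complement step (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Split position stated in the text; how it is counted is an assumption here.
SPLIT_POSITION = 205


def complement(seq):
    """Return the complement of seq (not reversed)."""
    return seq.translate(COMPLEMENT)


def split_strands(seq, position=SPLIT_POSITION):
    """Split seq into two parts at position.

    Assumption: the first part is the top strand and the rest is the bottom.
    """
    return seq[:position], seq[position:]


top, bottom = split_strands(complement("AACCGGTT"), position=4)
print(top, bottom)  # prints "TTGG CCAA"
```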
Once the strands are separated, the program performs operations on the Cs, Ts, and Gs within CpG dyads. The methylation state of each CpG site (C = methylated cytosine, resistant to bisulfite conversion; T = unmethylated cytosine that was converted to uracil in bisulfite conversion) is encoded with distinct letters, which allows for rapid color-coding in MS Word. The program makes the following changes to the text:
All CpG dyads that have Cs are changed from "CG" to "DH" in the top strand.
All CpG dyads that have Cs are changed from "GC" to "HE" in the bottom strand.
All CpG dyads that have Ts are changed from "TG" to "UH" in the top strand.
All CpG dyads that have Ts are changed from "GT" to "HV" in the bottom strand.
All Cs present outside of the CpG dyads are changed to D in the top strand and E in the bottom strand.
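The substitutions listed above can be sketched as plain string replacements. This is a Python illustration under stated assumptions, not the authors' Perl code. The function names recode_top and recode_bottom are hypothetical, and the order of the replacements (dyads first, remaining Cs last) is an assumption about how the rules combine.

```python
def recode_top(strand):
    """Apply the top-strand letter changes described above."""
    strand = strand.replace("CG", "DH")  # CpG dyad with C
    strand = strand.replace("TG", "UH")  # CpG dyad with T
    return strand.replace("C", "D")      # remaining Cs outside dyads


def recode_bottom(strand):
    """Apply the bottom-strand letter changes described above."""
    strand = strand.replace("GC", "HE")  # CpG dyad with C
    strand = strand.replace("GT", "HV")  # CpG dyad with T
    return strand.replace("C", "E")      # remaining Cs outside dyads


print(recode_top("ACGTTGC"))     # prints "ADHTUHD"
print(recode_bottom("AGCTGTC"))  # prints "AHETHVE"
```

After recoding, each letter introduced by the rules (D, H, U, E, V) can be targeted separately with find-and-replace formatting in a word processor.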
Plain Text Code of this Program*
We would be happy to answer questions about running Perl scripts, color-coding sequences, hairpin-bisulfite PCR, or anything else. We would also appreciate feedback about experiences with our program or a derivative of it, especially any bug reports!
*If you use this program to prepare data for publication, please cite it appropriately. You may contact Diane Genereux or Brooks Miner regarding the citation format.