Johanna Eddy

Guanine-rich nucleic acids have the potential to form G-quadruplex, or G4, DNA. G-rich sequences can form G4 DNA when they become transiently single-stranded, such as during transcription or replication. Key cellular processes are associated with G-rich chromosomal regions, most notably the telomeres, the rDNA, and the immunoglobulin heavy chain (IgH) switch regions. At the IgH switch regions, transcription induces formation of G4 DNA structures that are targeted by factors essential to class switch recombination. G-rich regions can also be sites of unprogrammed genomic instability. Many B-cell lymphomas carry a translocation of proto-oncogenes to the IgH switch region, and the common translocation breakpoints map to G-rich regions that form structures similar to those formed by transcribed G-rich switch regions.
We quantitated the potential for G4 DNA formation ("G4P") of the known human genes, and then correlated gene function with G4P (Eddy and Maizels, 2006). We found that very low and very high G4P correlate with specific functional classes of genes. Notably, tumor suppressor genes have very low G4P and proto-oncogenes have very high G4P. The differences in G4P between tumor suppressor genes and proto-oncogenes do not reflect enrichment for CpG islands or local chromosomal environment. These results show that genomic structure undergoes selection based on gene function. Selection based on G4P could promote genomic stability (or instability) of specific classes of genes, or it could reflect mechanisms for global regulation of gene expression.
To understand how G4P might influence regulation of gene expression, we examined the 2 kb spanning the transcription start sites (TSS) of the human RefSeq genes (Eddy and Maizels, 2008). Upstream of the TSS, much of the G-richness and G4P derives from motifs recognized by the transcription factor SP1 and from CpG dinucleotides, which are sites of regulatory methylation. The simplest interpretation of these results is that transcriptional regulation at sites upstream of the TSS is determined by canonical regulatory mechanisms acting on duplex DNA. Downstream of the TSS, G-richness is concentrated in the first intron, and on the nontemplate strand, in almost half of all known genes. Almost 3000 human genes (16%) contain G4 motifs at the 5′ end of the first intron that cannot be accounted for by known regulatory motifs. These elements could in principle be recognized either as DNA or as RNA, providing structural targets for regulation at the level of transcription or RNA processing. We are investigating this possibility in our ongoing research.
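To make the idea of quantitating G4 potential concrete, the sketch below scores a sequence by the fraction of sliding windows that contain a G4 motif. It is an illustration under stated assumptions, not the procedure described in this text: the motif definition (four runs of at least three guanines separated by loops of 1 to 7 nucleotides), the window and step sizes, and the function name `g4_potential` are all assumed for the example. C-rich matches are counted as well, since they correspond to a G4 motif on the complementary strand.

```python
import re

# Assumed motif: four runs of >=3 G separated by loops of 1-7 nucleotides.
G4_MOTIF = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")
# The same motif on the complementary strand shows up as runs of C.
C4_MOTIF = re.compile(r"C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}[ACGT]{1,7}C{3,}")


def g4_potential(seq, window=100, step=20):
    """Return the fraction of sliding windows containing a G4 motif.

    The window and step sizes are illustrative defaults (assumptions).
    A sequence shorter than one window is scored as a single window.
    """
    seq = seq.upper()
    if len(seq) < window:
        starts = [0]
    else:
        starts = range(0, len(seq) - window + 1, step)

    hits = 0
    total = 0
    for start in starts:
        chunk = seq[start:start + window]
        total += 1
        if G4_MOTIF.search(chunk) or C4_MOTIF.search(chunk):
            hits += 1
    return hits / total


if __name__ == "__main__":
    telomere_like = "GGGTTAGGGTTAGGGTTAGGG"
    print(g4_potential(telomere_like))   # single window containing the motif
    print(g4_potential("AT" * 100))      # no G runs, so no motif
```

Scoring each strand separately (for example, returning the G-rich and C-rich fractions as two numbers) would be a natural variation when strand asymmetry, such as enrichment on the nontemplate strand, is the question of interest.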

