Decoding the living genome: ENCODE

John Stamatoyannopoulos, M.D. ’95, would like to draw a distinction. First, there is the human genome — sequenced for the first time a dozen years ago in an impressive, significant undertaking that shows the basic genetic roadmap of the human body.

The illustration
An ENCODE-enabled map of the core regulatory network of four genes (blue).

Then there is the living genome: the physical packaging of the genome inside cells into chromatin, a complex of DNA and proteins that determines, at the cellular level, how we function, including the ways in which we are vulnerable to disease or respond to the environment or to medication.

Approximately nine years ago, Stamatoyannopoulos, UW associate professor of genome sciences, and his colleagues at UW Medicine, Fred Hutchinson Cancer Research Center and other institutions around the nation embarked on an expedition to map the living genome, a project called ENCODE (Encyclopedia of DNA Elements). The preliminary results are in.

“The living genome is densely packed with information that appears when the DNA molecule takes the form of chromatin in cells,” says Stamatoyannopoulos. “That form is a kind of living machine — like a microprocessor sitting in every cell. It’s sensing and integrating signals from the cell’s environment and adapting. If we can learn how to read that — how to connect the form of the living genome to its function — it would have game-changing implications for how we diagnose, follow and treat a wide range of different diseases.”

The ENCODE project

Researchers have known for years that only about two percent of human DNA — the double helix that contains the instructions that cells use to grow and divide — consists of traditional genes, the instructions that make proteins. ENCODE’s purpose was to explore the remaining 98 percent.

Not that this non-gene territory was completely unexplored. In the 1970s, UW Medicine researchers Mark T. Groudine, M.D., Res. ’79, Ph.D., UW professor of radiation oncology and executive vice president of the Basic Sciences Division at the Hutchinson Center, and the late Harold Weintraub, M.D., Ph.D., (a founding faculty member of the Basic Sciences Division at the center), had discovered a relationship between the physical structure of DNA — how it is packaged in the cell — and how genes are turned on or off, either to create proteins or to stop creating them.

Subsequent work revealed the existence of “instructions” written in the non-gene regions that were responsible for activating or de-activating genes, a process called gene regulation. Conventional technologies made finding these gene-controlling switches difficult and time-consuming, and relatively few such regions had been uncovered prior to ENCODE.

Using powerful new technologies, ENCODE researchers have now mapped these switches, also called regulatory DNA, that are flipped “on” or “off” in different combinations in hundreds of kinds of cells and tissues. Two things surprised Stamatoyannopoulos: the sheer number of switches, which runs into the millions, and the degree to which cells from different parts of the body, such as the heart or the liver, use different combinations of switches. “Each kind of cell appears to be so incredibly specialized that most of the regulatory DNA that it uses is different from other cells,” he says. “The amount of regulatory DNA encoded in the genome is far larger than previously imagined.”

Welcome to the machine

John Stamatoyannopoulos, M.D. ’95, led the multi-year charge for the ENCODE project at UW Medicine. As gene sequencing technology progressed, more data was produced, allowing for a high-resolution look at gene regulation. “We exposed a whole new universe of codes and instructions,” he says.
 
Photo: Clare McLean

ENCODE’s success was boosted by enormous advances in gene-sequencing technology. Unlike researchers who use gene sequencing solely to determine the specific DNA letters in a sample, ENCODE project researchers use sequencers to “read out” the results of biochemical reactions that act on DNA in the cell.

When they first began to explore the territory opened by Groudine and Weintraub, Stamatoyannopoulos’ group used early gene-sequencing technology, which could produce a few hundred sequences at a time. They then moved to microarray technology, which enabled them to examine more of the genome, but still at relatively low resolution. Finally, they re-developed their technology to use massively parallel sequencers, which appeared around 2007 and can produce hundreds of millions of sequences at a time. “If you graphed our data production, it looks like a hockey stick,” says Stamatoyannopoulos. “It puttered away for a while and then took off tremendously.”

The sheer volume of data allowed an incredibly high-resolution look at gene regulation. When the project began, for example, the researchers could examine what was happening at the level of several hundred base pairs, the chemical bases that form DNA. By 2010, they realized that they could “see” activity all the way down to single base pairs. This advance enabled them to detect more than 8.5 million docking slots for different individual regulatory proteins on the DNA of the living genome; these regulatory proteins recognize specific “words” written in the DNA sequence. Many of the words were compatible with known regulatory proteins, but most had not been seen before.

“We exposed a whole new universe of codes and instructions that the genome is using to control genes,” says Stamatoyannopoulos. And the evaluation system they’ve built is applicable to different kinds of cells. In fact, they’ve now produced detailed descriptions of regulatory protein docking for over 50 cell types, and used these data to show how instructions in regulatory DNA direct gene activity.

“Now we can really see how the machine is working at a level that we couldn’t before,” says Stamatoyannopoulos.

Variations in the heart

Nona Sotoodehnia, M.D., Res. ’99, Fel. ’02, Fel. ’03, worked with the ENCODE group on gene regulation related to cardiac tissue.

Epidemiological investigators at the Cardiovascular Health Research Unit (CHRU) are tremendously interested in ENCODE for its potential to inform research into the genetic causes of heart disease. In fact, they participated in the study.

David S. Siscovick, M.D., Res. ’79, Fel. ’81, UW professor of medicine and epidemiology and the CHRU’s co-director, explains that their stock in trade consists of mining data from large studies. One example, CHARGE (Cohorts for Heart and Aging Research in Genetic Epidemiology), brings together data from several large studies involving thousands of people. “CHARGE is a major contributor in identifying genetic associations in large populations,” he says. “And ENCODE is going to help us interpret some of the findings from our studies.”

ENCODE is already doing just that. Another member of the CHRU, Nona Sotoodehnia, M.D., Res. ’99, Fel. ’02, Fel. ’03, is a cardiologist and the lead for the CHARGE working group on electrocardiograms (ECG). She has a vested interest in learning more about the genetic bases of heart diseases, with a focus on life-threatening heart rhythm problems.

“Why does one person clasp their chest and have sudden cardiac death where another with similar age and demographic factors does not?” asks Sotoodehnia, UW associate professor of medicine in the Division of Cardiology. “What puts someone at increased risk of sudden cardiac death? What factors help someone survive?”

In conducting studies of heart disease, she and her CHARGE colleagues found that few of the genetic associations they discovered for the condition were in the coding regions of the DNA. Rather, they fell into the non-gene territories. When the researchers cross-referenced their data with those collected by ENCODE, they found that genetic associations with cardiac processes and diseases were concentrated in the regions Stamatoyannopoulos’ group identified as important for gene regulation in heart tissue: the regulatory DNA.

“ENCODE is like a roadmap,” says Sotoodehnia. “It helps us navigate the vast unknown landscape of non-coding genomic regions.”

The network vs. the gene

With the completion of the Human Genome Project a dozen years ago, hundreds of research projects were initiated to connect specific genes to specific diseases. The result, says Stamatoyannopoulos, was that researchers found thousands of DNA variations associated with disease. And, as Sotoodehnia’s cohort found, only about five percent of the variations were associated with the genes themselves.

As Stamatoyannopoulos puts it, there usually won’t be a simple solution to illness: one gene for X disease, or another gene for Y disease. Rather, ENCODE ushers in an era in which many diseases are understood to be the result of complex interactions among regulatory DNA regions and the genes they switch on and off.

Adding to the complexity is proximity, or lack of it. It seems logical to assume that genes would be affected by the regulatory regions closest to them. To examine this assumption, Stamatoyannopoulos’ group developed an approach for connecting regulatory regions to the genes they control. Surprisingly, they found that the great majority of genes were far away from their controlling regulatory regions.

This finding has major implications for piecing together the genetic causes of disease. For example, the group found that when genetic studies had connected a disease-associated DNA mutation to a specific gene, the gene they flagged was often the wrong one. Once the map was straightened out, many findings emerged. Researchers realized, for instance, that 25 percent of all the genetic changes associated with diseases that affect the immune system (conditions like asthma, multiple sclerosis and lupus) were affecting the same regulatory systems. One implication: medicine used to treat one condition might be useful in individuals with another condition.

“Most of the signal that seems to be coming out of these genetic studies is pointing to changes in the regulatory DNA rather than changes in the proteins — it’s basically how the cell puts all that together,” says Stamatoyannopoulos. “The way we need to start thinking in terms of diseases now is these larger networks.”

Questions and answers

“While ENCODE is an incredibly important scientific advance…it has raised many questions that will need to be addressed before the findings will be ready for translation into clinical care,” says Siscovick. It also will take time for scientists to figure out exactly how to incorporate the findings from ENCODE in their research.

Stamatoyannopoulos notes that the process has already begun. With ENCODE data widely available, he says, “you can now pull up vastly more information about [certain] genes than [a single lab] could ever generate in a reasonable amount of time.” He notes that the clamor from scientists wanting data on other cell types has begun, too. The brain has barely been investigated. Cancers have a unique-looking regulatory landscape, yet little data on cancers have been produced so far. Important body control systems such as the endocrine system have hardly been explored.

Interesting challenges await, diagnostics and therapeutics are sure to follow, and much more research is on the way. “It’s coming,” Stamatoyannopoulos says. “And it’s going to come at an accelerating pace.”

Note

Researchers around the country participated in ENCODE. To read more, see their findings, which were published extensively in journals such as Nature, Science and Cell.

UW Medicine UW School of Medicine

Box 358045, Seattle, WA 98195-8045 206.685.1875 | medalum@uw.edu

Seattle, Washington