The Cytochrome P450 superfamily

The cytochrome P450 (Cyp450) family in vertebrates has several features that make it an ideal test case for one of my favorite hypotheses about the evolution of duplicate genes. The hypothesis is that genes can be divided into those that perform core organismal functions and those that perform adaptive (or peripheral) functions, and that these two classes can be recognized from their evolutionary patterns, independent of their specific biochemical roles. More specifically, genes that perform peripheral functions are much more likely than core genes to undergo rapid birth-death evolution and positive selection. It has long been appreciated that genes performing core functions tend to be more conserved in amino acid sequence over long evolutionary time periods. New findings suggest that this tendency is robust and general, that it extends to duplication stability, and that peripheral genes have markedly higher rates of duplication and deletion. In addition, it is increasingly clear that peripheral genes make up the large majority of genes in most organisms and are likely to underlie many aspects of evolution. Although my analysis is restricted to multicellular eukaryotes, there are many parallels with the evolution of genomic islands in bacteria (such as pathogenicity islands).

Here are the reasons the vertebrate Cyp450 family is such a good test case:

Cyp450 family nomenclature

Current Cyp450 naming in mammals is systematic and based on sequence similarity: by convention, members of the same family share roughly 40% or more amino acid identity, and members of the same subfamily roughly 55% or more. Genes and proteins are named "Cyp" followed by a family number, a subfamily letter, and a number identifying the individual gene within the subfamily. For example, Cyp3A5 and Cyp3A7 are two genes in the 3A subfamily that encode closely related proteins. Members of the Cyp2, Cyp3, and Cyp5 families are primarily involved in xenobiotic detoxification, whereas members of all the other families are primarily involved in core metabolic functions such as steroid biosynthesis. Gene and protein names are the same, except that the protein name is written in all capitals (CYP3A5).
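Because the naming scheme is so regular, names can be split mechanically into their parts, which makes it easy to group close paralogs such as Cyp3A5 and Cyp3A7 by subfamily. The following is a minimal sketch; `parse_cyp_name` and `CypName` are hypothetical names, not part of any established tool, and the pattern assumes only the simple "Cyp" + family number + subfamily letter(s) + gene number form described above (pseudogene suffixes and other variants are not handled).

```python
import re
from typing import NamedTuple

# Hypothetical pattern: "Cyp" + family number + subfamily letter(s) + gene number.
# Pseudogene suffixes and other naming variants are deliberately not handled.
_CYP_NAME = re.compile(r"^cyp(\d+)([a-z]+)(\d+)$", re.IGNORECASE)


class CypName(NamedTuple):
    family: int
    subfamily: str
    gene: int

    @property
    def subfamily_label(self) -> str:
        # e.g. "3A" for Cyp3A5: the level at which close paralogs group
        return f"{self.family}{self.subfamily}"


def parse_cyp_name(name: str) -> CypName:
    """Parse names such as 'Cyp3A5', 'CYP3A5' or 'Cyp26a1'."""
    match = _CYP_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a systematic Cyp450 name: {name!r}")
    family, subfamily, gene = match.groups()
    # Upper-case the subfamily so gene-style (Cyp3a5) and
    # protein-style (CYP3A5) names compare equal.
    return CypName(int(family), subfamily.upper(), int(gene))


if __name__ == "__main__":
    a = parse_cyp_name("Cyp3A5")
    b = parse_cyp_name("CYP3A7")
    print(a)  # CypName(family=3, subfamily='A', gene=5)
    print(a.subfamily_label == b.subfamily_label)  # True: both in subfamily 3A
```

Multi-digit family numbers (for example Cyp26A1) are handled because the family digits are matched greedily before the subfamily letters.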

Duplication patterns in the Cyp450 family


A maximum-likelihood protein tree of all known members of the Cyp450 superfamily from two fish, a frog, chicken, mouse, rat, cow, dog, macaque, chimpanzee, and human shows that CYP450 enzymes acting on endogenous substrates are phylogenetically stable, with few or no gene duplications or deletions during vertebrate evolution. In contrast, CYP450 enzymes known to act on exogenous substrates (xenobiotic detoxifiers) are unstable, with frequent gene duplications in all lineages. A few additional enzymes, whose functions are not yet known, are encoded by genes in either the stable or the unstable class; we predict that these will likewise divide into enzymes with endogenous and exogenous functions. Proteins in the tree are color-coded by organism, which makes the large lineage-specific expansions easy to see. Annotations on the tree concerning enzyme activity and other functional features are derived from a wide variety of publications and public genome resources.

Color key:
Homo sapiens (human), Pan troglodytes (chimpanzee), and Macaca mulatta (rhesus macaque) are in shades of blue.
Canis familiaris (dog) and Bos taurus (cow) are in shades of purple.
Mus musculus (mouse) and Rattus norvegicus (rat) are in shades of pink.
Gallus gallus (chicken) is in gold.
Xenopus tropicalis (frog) is in olive yellow.
Danio rerio (zebrafish) and Takifugu rubripes (fugu) are in shades of green.

Notes: At high magnification, the individual Ensembl gene and protein names and other annotations can be read. At lower magnification, stable genes appear as rainbow-like blocks of many colors (roughly one protein per species), and unstable genes appear as solid blocks in shades of a single color (many paralogs from one lineage). Teleost fish underwent a whole-genome duplication, which is apparent from the two copies of many otherwise stable genes in the two fish species. Occasional proteins are missing from the tree, especially for chimp, macaque, and cow; this probably reflects incomplete genome sequence and annotation rather than true absence of the gene.
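The visual distinction between stable and unstable genes can be approximated by a simple count-based rule applied to each clade of the tree. This is a hypothetical sketch, not the procedure used to build or annotate the tree: it assumes clade boundaries are already given, counts proteins per species within a clade, allows up to two copies in the two fish species (because of their genome duplication), ignores species with no copy (since absences may be annotation gaps), and calls a clade unstable if any species exceeds its allowance. The function names and the input lists in the usage example are illustrative only, not data taken from the tree.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumption: the two teleost species may carry two copies of an
# otherwise stable gene because of their whole-genome duplication.
FISH = {"Danio rerio", "Takifugu rubripes"}


def expanded_species(species_of_members: Iterable[str]) -> Dict[str, int]:
    """Return species whose copy number in one clade exceeds the allowance.

    `species_of_members` lists the species of each protein in the clade
    (one entry per protein).
    """
    counts = Counter(species_of_members)
    expanded = {}
    for species, n in counts.items():
        allowed = 2 if species in FISH else 1
        if n > allowed:
            expanded[species] = n
    return expanded


def classify_clade(species_of_members: Iterable[str]) -> str:
    # Species with no protein in the clade are simply absent from the
    # counts, so possible annotation gaps never make a clade "unstable".
    return "unstable" if expanded_species(species_of_members) else "stable"


if __name__ == "__main__":
    # Toy inputs, not taken from the tree.
    one_per_species = ["Homo sapiens", "Mus musculus", "Gallus gallus",
                       "Danio rerio", "Danio rerio"]
    mouse_expansion = ["Mus musculus"] * 4 + ["Homo sapiens"]
    print(classify_clade(one_per_species))    # stable
    print(classify_clade(mouse_expansion))    # unstable
    print(expanded_species(mouse_expansion))  # {'Mus musculus': 4}
```

A more rigorous analysis would reconcile the gene tree with the species tree to count individual duplication and loss events; the copy-count rule above only approximates that.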


Thomas lab index page