The yeast genome contains 6270 genes, whose protein products range from
about 25 amino acids to nearly 5000. Proteins fold into three-dimensional
structures, and their function depends on that structure. Larger proteins
of known structure display a modular design in which parts of the amino
acid chain fold into independent structures called domains. Statistics
from the Protein Data Bank show that the average length of a domain is
about 163 amino acids for all-alpha or all-beta proteins. Clearly, numerous
proteins in yeast have more than one domain.
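
The arithmetic behind that last statement can be sketched in a few lines. This is only a back-of-the-envelope estimate, assuming that domains tile the chain at the quoted average length; the function name and the sample lengths are illustrative, and real domain counts depend on the actual structure.

```python
AVG_DOMAIN_LENGTH = 163  # average domain length quoted in the text (amino acids)


def estimated_domain_count(protein_length: int,
                           avg_domain_length: int = AVG_DOMAIN_LENGTH) -> int:
    """Rough estimate of how many average-sized domains fit in a chain.

    Assumption (not from the source): domains cover the chain end to end
    and every protein has at least one domain.
    """
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return max(1, round(protein_length / avg_domain_length))


# Illustrative lengths spanning the range quoted for yeast proteins.
for length in (25, 163, 500, 5000):
    print(length, estimated_domain_count(length))
```

Under this assumption, any protein much longer than about 250 amino acids is already expected to contain more than one domain.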

Often, a specific function can be ascribed to a single domain, and domains
of the same amino acid chain can have different functions. Hence, to
describe the function of a protein, its domain configuration has to be
known. Identifying these domains from sequence alone is far from
straightforward. Our lab has developed a domain-parsing algorithm, GINZU,
that makes use of several computational methods and databases to divide
the target protein into domains. GINZU also assigns structural homologues
to domains when the domain is identified by PSI-BLAST (Position-Specific
Iterated BLAST) or ORFeus, a threading/fold recognition server.

The domains that have no structural homologue are subjected to ab initio
protein folding with Rosetta, which most often results in a putative SCOP
(Structural Classification of Proteins) superfamily assignment. Rosetta
is one of the best ab initio algorithms today, as has been demonstrated
in the Critical Assessment of Techniques for Protein Structure Prediction
(CASP).
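
The routing described above can be sketched as follows. This is not GINZU's implementation, only a minimal illustration of the decision the text describes: domains found by PSI-BLAST or ORFeus receive a structural homologue, and the rest go to Rosetta. The `Domain` class, the `detected_by` label, the `next_step` function, and the example coordinates are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Detection methods that, per the text, come with a structural homologue.
HOMOLOGY_METHODS = {"PSI-BLAST", "ORFeus"}


@dataclass
class Domain:
    start: int                         # first residue, 1-based, inclusive
    end: int                           # last residue, inclusive
    detected_by: Optional[str] = None  # hypothetical label; None = no homologue found


def next_step(domain: Domain) -> str:
    """Decide what happens to a parsed domain."""
    if domain.detected_by in HOMOLOGY_METHODS:
        return "assign structural homologue"
    return "ab initio folding with Rosetta"


# Made-up three-domain protein for illustration.
domains = [
    Domain(1, 150, "PSI-BLAST"),
    Domain(151, 320, "ORFeus"),
    Domain(321, 480),
]
plan = {(d.start, d.end): next_step(d) for d in domains}
for region, step in plan.items():
    print(region, step)
```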