Computational Biology
Interpreting large, complex genomic data sets requires sophisticated
analytical tools. The field of machine learning has contributed an
increasing variety of such tools, including probabilistic methods such
as hidden Markov models and Bayesian networks as well as nonparametric
statistical techniques such as the support vector machine. To be
useful, such methods must produce accurate, interpretable results.
They must also handle many different types of data, sometimes
simultaneously, and they must scale to very large data sets.
In the context of yeast genomics and proteomics, we have applied the
support vector machine (SVM) algorithm to many different problems.
The SVM is a classification algorithm, functionally similar to decision
trees and artificial neural networks. The SVM is supervised, in the
sense that it relies upon an annotated set of training data for the
learning phase. During the subsequent prediction phase, the SVM
predicts classifications for unannotated test data. From a
learning-theoretic perspective, the SVM has a strong foundation. Empirically,
the algorithm has been used to obtain state-of-the-art performance in
applications as diverse as handwriting recognition and natural
language processing. Here are problems that we have addressed using
SVMs: