Classifying Force Spectroscopy of DNA Pulling Measurements Using Supervised and Unsupervised Machine Learning Methods

Abstract

Dynamic force spectroscopy (DFS) measurements on biomolecules typically require classifying thousands of repeated force spectra prior to data analysis. Here, we study classification of atomic force microscope-based DFS measurements using machine-learning algorithms in order to automate selection of successful force curves. Notably, we collect a data set that has a testable positive signal using photoswitch-modified DNA before and after illumination with UV (365 nm) light. We generate a feature set consisting of six properties of force distance curves to train supervised models and use principal component analysis (PCA) for an unsupervised model. For supervised classification, we train random forest models for binary and multiclass classification of force distance curves. Random forest models predict successful pulls with an accuracy of 94% and classify them into five classes with an accuracy of 90%. The unsupervised method using Gaussian mixture models (GMM) reaches an accuracy of approximately 80% for binary classification.

Publication
JOURNAL OF CHEMICAL INFORMATION AND MODELING
David Ginger
David Ginger
B. Seymour Rabinovitch Endowed Chair in Chemistry

David Ginger is the the B. Seymour Rabinovitch Endowed Chair in Chemistry at the University of Washington, and the PI of the ginger group