Predicting Intrinsically Disordered Protein Feature States Using Machine Learning and Deep Neural Network Algorithms

Brynn Biddle, Department of Biochemistry & Cellular and Molecular Biology, University of Tennessee, Knoxville, TN 37996
Debsindhu Bhowmik, Computational Sciences and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37830

Identifying accurate representations of full protein structures is challenging because of their complexity and high atom counts. The atoms of a growing peptide chain can adopt a different orientation at each time step. Experimental techniques determine average protein structures, but they cannot capture an accurate structure at every possible time step. Molecular dynamics simulations make it possible to explore molecular processes in full detail at a given time under specific conditions. One drawback is the limited sample size imposed by the time these simulations require. This limitation can be overcome by using machine learning and deep learning techniques to predict future states of the protein. This approach is especially useful for modeling intrinsically disordered proteins, which do not adopt an ordered secondary or tertiary structure. β-catenin is an intrinsically disordered protein in humans that is involved in gene transcription and cell-to-cell adhesion. It is a biologically relevant target for study because overexpressed or mutated β-catenin is linked to colon cancer. Using previously obtained molecular dynamics simulation data, we applied different transformation algorithms to one trajectory of four β-catenin systems. This allowed us to find one-dimensional embeddings and to examine the underlying embeddings and their associated loss. We implemented a Time-lagged Autoencoder, a Time-lagged Variational Autoencoder, Principal Component Analysis (PCA), and Time-structure Based Independent Component Analysis (tICA) to study the protein. We featurized the backbone dihedral angles phi (φ) and psi (ψ), then centered these angles and scaled them by their respective interquartile ranges. We also analyzed the resulting discrete states with Markov State Models and Macrostate Models. By implementing and comparing both linear and non-linear algorithms, we aim to determine the best approach for predicting the structure of β-catenin at specific time states.
Predicting how the protein may orient under specific conditions can inform biomedical research aimed at preventing the overexpression or mutation of intrinsically disordered proteins, and thus at preventing the diseases associated with them.
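
The abstract names its analysis steps without showing how they fit together. The following is a minimal sketch, not the authors' code, assuming a single trajectory stored as a NumPy array of shape (n_frames, n_features) and using synthetic data in place of the β-catenin phi/psi angles. It shows interquartile-range scaling (centering on the median is an assumption here), a linear tICA projection obtained from the generalized eigenproblem C(τ)v = λC(0)v, and a Markov state model transition matrix estimated from row-normalized transition counts. The function names robust_scale, tica, and msm_transition_matrix, the lag of 10 frames, and the three-state quantile binning are hypothetical illustrative choices; the sketch also treats the dihedrals as ordinary real values and ignores their periodicity.

```python
import numpy as np
from scipy.linalg import eigh


def robust_scale(angles):
    """Center each feature on its median and divide by its interquartile range (IQR)."""
    median = np.median(angles, axis=0)
    q75, q25 = np.percentile(angles, [75, 25], axis=0)
    iqr = q75 - q25
    iqr[iqr == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (angles - median) / iqr


def tica(x, lag, n_components=1):
    """Linear tICA: solve C(lag) v = lambda C(0) v using symmetrized covariances."""
    x = x - x.mean(axis=0)
    x0, xt = x[:-lag], x[lag:]
    n = len(x0)
    c0 = (x0.T @ x0 + xt.T @ xt) / (2 * n)    # instantaneous covariance
    ctau = (x0.T @ xt + xt.T @ x0) / (2 * n)  # symmetrized time-lagged covariance
    evals, evecs = eigh(ctau, c0)             # eigenvalues come back in ascending order
    order = np.argsort(evals)[::-1][:n_components]  # keep the slowest components
    return x @ evecs[:, order], evals[order]


def msm_transition_matrix(dtraj, n_states, lag):
    """Row-normalized transition counts between discrete states at the given lag."""
    counts = np.zeros((n_states, n_states))
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # unvisited states keep an all-zero row
    return counts / row_sums


# Synthetic stand-in for four phi/psi features: one slow oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(5000)
slow = np.sin(2 * np.pi * t / 1000)
raw = np.column_stack([slow + 0.1 * rng.normal(size=t.size)]
                      + [rng.normal(size=t.size) for _ in range(3)])

lag = 10  # hypothetical lag time, in frames
scaled = robust_scale(raw)
proj, evals = tica(scaled, lag)

# Discretize the 1-D tICA coordinate into three equally populated states.
edges = np.quantile(proj[:, 0], [1 / 3, 2 / 3])
dtraj = np.digitize(proj[:, 0], edges)
T = msm_transition_matrix(dtraj, n_states=3, lag=lag)
print("slowest tICA eigenvalue:", evals[0])
print(T.round(3))
```

Replacing eigh(ctau, c0) with np.linalg.eigh(c0) would instead yield principal components of the same data, the linear baseline without a time lag. The time-lagged autoencoders replace the linear projection with a trained neural network and are not sketched here. Established packages such as PyEMMA, deeptime, and MSMBuilder provide tested estimators for tICA and MSMs, including reversible MSM estimation, which this sketch omits.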

Additional Abstract Information

Presenter: Brynn Biddle

Institution: University of Tennessee at Knoxville

Type: Poster

Subject: Biology

Status: Approved

Time and Location

Session: Poster 3
Date/Time: Mon 4:30pm-5:30pm
Session Number: 3133