Computer Science and Software Engineering Capstone Presentations

Summer Quarter

August 21, 2020

Jerad Broc Hardman

"Data Collection and Filtering for DeepTracer"

(UWB CSS Faculty Research)

Faculty Advisor: Dr. Dong Si

Abstract

DeepTracer is a program created to predict the protein structures of viruses to aid in the creation of vaccines and treatments. The DeepTracer model is trained using Cryo-Electron Microscopy images, which are stored as density values in a 3-dimensional matrix. DeepTracer also uses the solved model structure (PDB) of the same protein and its atom coordinates as the ground truth for training. My contribution to DeepTracer was to automate the collection of any new data from the EMDataResource databank and add it to our current dataset. I also created a filtering process that checks the correlation of the density map to the solved model structure and removes any poorly correlated pairs from the training dataset. By filtering out pairs with poor correlation, we will be able to train the DeepTracer model more accurately. The filtering was done using two different methods. The first method consisted of finding the location of each CA (alpha carbon) atom in the density map and then removing any density values around it. This leaves only the density values that do not line up with the solved model structure, which lets us know when the density map is larger than the solved model structure. The second method consisted of checking the location of each CA atom and deciding whether it lies within the high-density values along each axis of the density map. By knowing the total percentage of atoms that are positioned outside of the density map, we can determine whether the solved model structure is larger than the density map. Using these two techniques together, we can look at the calculations from both tests and determine whether the map and the model are offset from each other. If both results are 100%, there is no overlap between the density map and the solved model structure. To assess the accuracy of the filtering process, I visually classified 100 maps as good or bad. These classifications were then compared to the results from the filtering process. In total, 74 of the 100 map classifications agreed between visual inspection and the algorithm. With more testing, adjusting the threshold parameter may improve these results.
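The two filtering checks described above can be illustrated with a minimal NumPy sketch. This is not the DeepTracer code: the function names, the `threshold` and `radius` parameters, the use of a bounding box to stand in for "the high-density values along each axis", and the assumption that CA coordinates have already been converted to voxel indices are all hypothetical choices made for illustration.

```python
import numpy as np

def unexplained_density_fraction(density, ca_voxels, threshold, radius):
    """Method 1 (sketch): fraction of high-density voxels left after
    removing a sphere of `radius` voxels around every CA atom.
    A value near 1.0 suggests the map is larger than the model.
    All parameter names are hypothetical."""
    high = density > threshold
    total = high.sum()
    if total == 0:
        return 0.0
    remaining = high.copy()
    r = int(np.ceil(radius))
    shape = np.array(density.shape)
    for center in np.asarray(ca_voxels, dtype=int).reshape(-1, 3):
        lo = np.maximum(center - r, 0)
        hi = np.minimum(center + r + 1, shape)
        if np.any(lo >= hi):
            continue  # the sphere falls entirely outside the map
        # Open grids covering only the local box around this atom
        i, j, k = np.ogrid[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]]
        ball = ((i - center[0]) ** 2 + (j - center[1]) ** 2
                + (k - center[2]) ** 2) <= radius ** 2
        remaining[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2]] &= ~ball
    return float(remaining.sum() / total)

def atoms_outside_fraction(density, ca_voxels, threshold):
    """Method 2 (sketch): fraction of CA atoms lying outside the
    per-axis extent (bounding box) of the high-density voxels.
    A value near 1.0 suggests the model is larger than the map."""
    ca = np.asarray(ca_voxels).reshape(-1, 3)
    if len(ca) == 0:
        return 0.0
    high = np.argwhere(density > threshold)
    if len(high) == 0:
        return 1.0
    lo, hi = high.min(axis=0), high.max(axis=0)
    inside = np.all((ca >= lo) & (ca <= hi), axis=1)
    return float(1.0 - inside.mean())

# Toy example: a 3x3x3 block of density in a 10x10x10 map
toy_map = np.zeros((10, 10, 10))
toy_map[2:5, 2:5, 2:5] = 1.0
aligned = [(3, 3, 3)]   # atom sits in the middle of the density
offset = [(8, 8, 8)]    # atom sits far away from the density

for name, atoms in [("aligned", aligned), ("offset", offset)]:
    m1 = unexplained_density_fraction(toy_map, atoms, threshold=0.5, radius=2.0)
    m2 = atoms_outside_fraction(toy_map, atoms, threshold=0.5)
    print(name, m1, m2)
```

In this toy case the aligned atom gives 0% on both checks, while the offset atom gives 100% on both, which matches the "no overlap" condition described in the abstract. The cutoffs that separate a good pair from a bad one would still need to be tuned against visually classified maps.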

Updated August 20, 2020, 3:11