Capstone schedule
Computer Science and Software Engineering Capstone Presentations
Summer Quarter
August 21, 2020
Jerad Broc Hardman, "Data Collection and Filtering for DeepTracer" (UWB CSS Faculty Research)
Faculty Advisor: Dr. Dong Si
Abstract: DeepTracer is a program created to predict the
protein structures of viruses to aid in the creation of vaccines and
treatments. The DeepTracer model is trained using Cryo-Electron Microscopy
images, which are stored as density values in a 3-dimensional matrix.
DeepTracer also uses the solved model structure (PDB) of the same protein and
its atom coordinates as the ground truth for training. My contribution to
DeepTracer was to automate the collection of any new data from the
EMDataResource databank and add it to our current dataset. I also created a
filtering process to check the correlation of the density map to the solved
model structure. The filtering process will then remove any poorly correlated
pairs from the training dataset. By filtering out pairs with poor
correlation, we will be able to more accurately train the DeepTracer model.
The filtering was done using two different methods. The first method
consisted of finding the location of each CA (alpha carbon) atom in the
density map and then removing the density values around it. This leaves
only density values that do not line up with the solved model
structure. This technique lets us know when the density map is larger than
the solved model structure. The second method consisted of checking the
location of each CA atom and deciding whether it lies within the high-density
values of each axis of the density map. By knowing the total percentage of
atoms that are positioned outside of the density map, we can determine if the
solved model structure is larger than the density map. Using these
two techniques together, we can compare the results of both tests and
determine whether the map and model are offset from each other. If both
results are 100%, there is no overlap between the density map and the solved
model structure. To assess the accuracy of the filtering process, I visually
classified 100 maps as good or bad. These classifications were then compared
to the results from the filtering process. In total, 74 of the 100
classifications matched between visual inspection and the algorithm.
Adjusting the threshold parameter with more testing may improve these
results.
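The two filtering checks described in the abstract can be illustrated with a minimal sketch. This is not the DeepTracer code: the function names, the cubic mask of `radius` voxels around each CA atom, the single density `threshold`, and the use of voxel indices for atom positions are all assumptions made for illustration. The first function reports the percentage of high-density voxels left after masking around every CA atom; the second reports the percentage of CA atoms that fall outside the per-axis extent of the high-density region.

```python
from itertools import product


def high_density_voxels(density, threshold):
    """Return the set of (x, y, z) indices whose density exceeds threshold."""
    return {
        (x, y, z)
        for x, plane in enumerate(density)
        for y, row in enumerate(plane)
        for z, value in enumerate(row)
        if value > threshold
    }


def unexplained_density_pct(density, ca_voxels, threshold, radius=1):
    """Method 1 (sketch): percent of high-density voxels that remain after
    removing a cube of `radius` voxels around each CA atom (assumed shape)."""
    voxels = high_density_voxels(density, threshold)
    if not voxels:
        return 0.0
    offsets = list(product(range(-radius, radius + 1), repeat=3))
    masked = {
        (x + dx, y + dy, z + dz)
        for (x, y, z) in ca_voxels
        for (dx, dy, dz) in offsets
    }
    return 100.0 * len(voxels - masked) / len(voxels)


def atoms_outside_pct(density, ca_voxels, threshold):
    """Method 2 (sketch): percent of CA atoms lying outside the range of
    high-density indices on at least one axis."""
    if not ca_voxels:
        return 0.0
    voxels = high_density_voxels(density, threshold)
    if not voxels:
        return 100.0
    # Per-axis (min, max) index of the high-density region.
    bounds = [
        (min(v[axis] for v in voxels), max(v[axis] for v in voxels))
        for axis in range(3)
    ]
    outside = sum(
        1
        for atom in ca_voxels
        if any(not (lo <= atom[axis] <= hi) for axis, (lo, hi) in enumerate(bounds))
    )
    return 100.0 * outside / len(ca_voxels)


# Toy 6x6x6 map with a 2x2x2 block of density at indices 1..2 on each axis.
size = 6
density = [
    [
        [1.0 if all(1 <= i <= 2 for i in (x, y, z)) else 0.0 for z in range(size)]
        for y in range(size)
    ]
    for x in range(size)
]
aligned = [(1, 1, 1), (2, 2, 2)]  # hypothetical CA positions inside the block
shifted = [(4, 4, 4), (5, 5, 5)]  # hypothetical CA positions offset from it

for label, atoms in (("aligned", aligned), ("shifted", shifted)):
    leftover = unexplained_density_pct(density, atoms, threshold=0.5)
    outside = atoms_outside_pct(density, atoms, threshold=0.5)
    print(label, leftover, outside)
```

In this toy case the aligned pair scores 0% on both checks, while the shifted pair scores 100% on both, which corresponds to the no-overlap situation the abstract describes. How the two percentages are turned into a good or bad label (the threshold parameter mentioned in the abstract) is left out because the abstract does not specify it.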
Updated August 20, 2020, 3:11