TY - GEN
T1 - On-line hierarchy of general linear models for selecting and ranking the best predicted protein structures
AU - Girgis, Hani Zakaria
AU - Corso, Jason J.
AU - Fischer, Daniel
PY - 2009
Y1 - 2009
AB - To predict the three-dimensional structure of proteins, many computational methods sample the conformational space, generating a large number of candidate structures. Such methods then rank the generated structures using a variety of model quality assessment programs to obtain a small set of structures that are most likely to resemble the unknown, experimentally determined structure. Model quality assessment programs suffer from two main limitations: (i) the rank-one structure is not always the best predicted structure; for example, the best predicted structure could be ranked tenth; and (ii) no single assessment method can correctly rank the predicted structures for all target proteins. However, because at least some of the methods often achieve a good ranking, a model quality assessment method based on a consensus of several model quality assessment methods is likely to perform better. We have devised the STPdata algorithm, a consensus method based on five model quality assessment programs. We have applied it to build an on-line "custom-trained" hierarchy of general linear models to select and rank the best predicted structures. By "custom-trained", we mean that for each target protein the STPdata algorithm trains a unique model on data related to that target protein. To evaluate our method, we participated in CASP8 as human predictors. In CASP8, the STPdata algorithm trained 128 hierarchical models, one for each of the 128 target proteins. Based on the official CASP8 results, our method outperformed the best server by 6% and ranked fourth among human predictors. Our CASP results are based purely on computational methods, without any human intervention.
UR - http://www.scopus.com/inward/record.url?scp=77951011412&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77951011412&partnerID=8YFLogxK
U2 - 10.1109/IEMBS.2009.5332706
DO - 10.1109/IEMBS.2009.5332706
M3 - Conference contribution
C2 - 19963875
AN - SCOPUS:77951011412
SN - 9781424432967
T3 - Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society: Engineering the Future of Biomedicine, EMBC 2009
SP - 4949
EP - 4953
BT - Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society
T2 - 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society: Engineering the Future of Biomedicine, EMBC 2009
Y2 - 2 September 2009 through 6 September 2009
ER -