TY - JOUR
T1 - Predicting malaria interactome classifications from time-course transcriptomic data along the intraerythrocytic developmental cycle
AU - Mitrofanova, Antonina
AU - Kleinberg, Samantha
AU - Carlton, Jane
AU - Kasif, Simon
AU - Mishra, Bud
PY - 2010/7
Y1 - 2010/7
N2 - Objective: Even though a vaccine for malaria infections has been under intense study for many years, it has resisted several different lines of attack attempted by biologists. More than half of Plasmodium proteins still remain uncharacterized and therefore cannot be used in clinical trials. The task is further complicated by the metamorphic life-cycle of the parasite, which allows for rapid evolutionary changes and diversity among related strains, thus making precise targeting of the appropriate proteins for vaccination a technical challenge. We propose an automated method for predicting functions for the malaria parasite, which capitalizes on the importance of the intraerythrocytic developmental cycle data and expression changes during its five phases, as determined computationally by our segmentation algorithm. Materials and methods: Our method combines temporal gene expression profiles with protein-protein interaction data, sequence similarity scores, and metabolic pathway information to produce a set of predicted protein functions that can be used as targets for vaccine development. We use a Bayesian approach, which assigns a probability of having (or not having) a particular function to each protein, given the various sources of evidence. In our method, each data source is represented by either a functional linkage graph or a categorical feature vector. Results and conclusions: The methods are tested on Plasmodium falciparum, the species responsible for the deadliest malaria infections. The algorithm was able to assign meaningful functions to 628 out of 1439 previously unannotated proteins, which are first-choice candidates for experimental vaccine research. We conclude that analyzing time-course gene expression profiles in separate phases leads to much higher prediction accuracy when compared with Pearson correlation coefficients computed across the time course as a whole. Additionally, we demonstrate that temporal expression profiles alone are able to improve the predictive power of the integrated data.
KW - Bayesian probabilistic approach
KW - Intraerythrocytic developmental cycle
KW - N-terminal host targeting motif
KW - Pexel
KW - Plasmodium falciparum
KW - Protein function prediction
KW - Red blood cell membrane proteins
KW - Time-course gene expression data
UR - http://www.scopus.com/inward/record.url?scp=77954315665&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954315665&partnerID=8YFLogxK
U2 - 10.1016/j.artmed.2010.04.013
DO - 10.1016/j.artmed.2010.04.013
M3 - Article
C2 - 20580212
AN - SCOPUS:77954315665
SN - 0933-3657
VL - 49
SP - 167
EP - 176
JO - Artificial Intelligence in Medicine
JF - Artificial Intelligence in Medicine
IS - 3
ER -