TY - JOUR
T1 - Detecting clinically related content in online patient posts
AU - VanDam, Courtland
AU - Kanthawala, Shaheen
AU - Pratt, Wanda
AU - Chai, Joyce
AU - Huh, Jina
N1 - Publisher Copyright:
© 2017
PY - 2017/11
Y1 - 2017/11
N2 - Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, a large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also help patients in need of clinical expertise get proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. We found that our classifier performed the best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s. Classification took a fraction of a second. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that, once common errors are properly addressed, our model can feasibly scale to other forums for identifying posts containing clinical topics.
AB - Patients with chronic health conditions use online health communities to seek support and information to help manage their condition. For clinically related topics, patients can benefit from getting opinions from clinical experts, and many are concerned about misinformation and biased information being spread online. However, a large volume of community posts makes it challenging for moderators and clinical experts, if there are any, to provide necessary information. Automatically identifying forum posts that need validated clinical resources can help online health communities efficiently manage content exchange. This automation can also help patients in need of clinical expertise get proper help. We present our results on testing text classification models that efficiently and accurately identify community posts containing clinical topics. We annotated 1817 posts comprising 4966 sentences from an existing online diabetes community. We found that our classifier performed the best (F-measure: 0.83, Precision: 0.79, Recall: 0.86) when using the Naïve Bayes algorithm with unigrams, bigrams, trigrams, and MetaMap Semantic Types. Training took 5 s. Classification took a fraction of a second. We applied our classifier to another online diabetes community, and the results were: F-measure: 0.63, Precision: 0.57, Recall: 0.71. Our results show that, once common errors are properly addressed, our model can feasibly scale to other forums for identifying posts containing clinical topics.
KW - Classification
KW - Clinical topic
KW - Diabetes
KW - Health information seeking
KW - Human-computer interaction
KW - Online health communities
KW - Patient
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85031277894&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031277894&partnerID=8YFLogxK
U2 - 10.1016/j.jbi.2017.09.015
DO - 10.1016/j.jbi.2017.09.015
M3 - Article
C2 - 28986329
AN - SCOPUS:85031277894
SN - 1532-0464
VL - 75
SP - 96
EP - 106
JO - Journal of Biomedical Informatics
JF - Journal of Biomedical Informatics
ER -