TY - GEN
T1 - Empirical analysis of multi-task learning for reducing identity bias in toxic comment detection
AU - Vaidya, Ameya
AU - Mai, Feng
AU - Ning, Yue
N1 - Publisher Copyright:
Copyright © 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
PY - 2020
Y1 - 2020
N2 - With the recent rise of toxicity in online conversations on social media platforms, using modern machine learning algorithms for toxic comment detection has become a central focus of many online applications. Researchers and companies have developed a variety of models to identify toxicity in online conversations, reviews, or comments, with mixed success. However, many existing approaches have learned to incorrectly associate non-toxic comments that contain certain trigger words (e.g. gay, lesbian, black, muslim) with toxicity. In this paper, we evaluate several state-of-the-art models with the specific focus of reducing model bias towards these commonly attacked identity groups. We propose a multi-task learning model with an attention layer that jointly learns to predict the toxicity of a comment as well as the identities present in the comment in order to reduce this bias. We then compare our model to an array of shallow and deep-learning models using metrics designed specifically to test for unintended model bias within these identity groups.
AB - With the recent rise of toxicity in online conversations on social media platforms, using modern machine learning algorithms for toxic comment detection has become a central focus of many online applications. Researchers and companies have developed a variety of models to identify toxicity in online conversations, reviews, or comments, with mixed success. However, many existing approaches have learned to incorrectly associate non-toxic comments that contain certain trigger words (e.g. gay, lesbian, black, muslim) with toxicity. In this paper, we evaluate several state-of-the-art models with the specific focus of reducing model bias towards these commonly attacked identity groups. We propose a multi-task learning model with an attention layer that jointly learns to predict the toxicity of a comment as well as the identities present in the comment in order to reduce this bias. We then compare our model to an array of shallow and deep-learning models using metrics designed specifically to test for unintended model bias within these identity groups.
UR - http://www.scopus.com/inward/record.url?scp=85099543453&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85099543453&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85099543453
T3 - Proceedings of the 14th International AAAI Conference on Web and Social Media, ICWSM 2020
SP - 683
EP - 693
BT - Proceedings of the 14th International AAAI Conference on Web and Social Media, ICWSM 2020
T2 - 14th International AAAI Conference on Web and Social Media, ICWSM 2020
Y2 - 8 June 2020 through 11 June 2020
ER -