TY - JOUR
T1 - Mitigating Large Language Model Bias
T2 - Automated Dataset Augmentation and Prejudice Quantification
AU - Mondal, Devam
AU - Lipizzi, Carlo
N1 - Publisher Copyright: © 2024 by the authors.
PY - 2024/6
Y1 - 2024/6
N2 - Despite the growing capabilities of large language models, concerns exist about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers that can be useful in a variety of industries, especially ones that are “restricted” and have limited data. We consider that bias can occur due to intrinsic model architecture and dataset quality. The two aspects are evaluated using two different metrics we created. We show that our dataset augmentation algorithm reduces bias as measured by our metrics. Our code can be found on an online GitHub repository.
AB - Despite the growing capabilities of large language models, concerns exist about the biases they develop. In this paper, we propose a novel, automated mechanism for debiasing through specified dataset augmentation in the lens of bias producers that can be useful in a variety of industries, especially ones that are “restricted” and have limited data. We consider that bias can occur due to intrinsic model architecture and dataset quality. The two aspects are evaluated using two different metrics we created. We show that our dataset augmentation algorithm reduces bias as measured by our metrics. Our code can be found on an online GitHub repository.
KW - computational social science
KW - dataset augmentation
KW - large language models
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85196788823&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85196788823&partnerID=8YFLogxK
U2 - 10.3390/computers13060141
DO - 10.3390/computers13060141
M3 - Article
AN - SCOPUS:85196788823
VL - 13
JO - Computers
JF - Computers
IS - 6
M1 - 141
ER -