TY - JOUR
T1 - When Positive Sentiment is not so Positive
T2 - Textual Analytics and Bank Failures
AU - Gupta, Aparna
AU - Lu, Cheng
AU - Simaan, Majeed
AU - Zaki, Mohammed J.
N1 - Publisher Copyright: © The Author(s) 2025.
PY - 2025
Y1 - 2025
AB - We examine U.S. publicly traded bank holding companies (BHCs) that failed during the 2007–2009 global financial crisis. Using consolidated data at the BHC level and 10-K filings, we investigate the determinants of bank failures during this period using nonlinear machine learning (ML). The in-sample analysis demonstrates that 90% of the failed banks can be classified during 2007–2009. In addition, our sensitivity analysis for interpretable ML shows that net tone is among the top five important features. However, the power of tone/text is less evident when we consider predictive (out-of-sample) analysis. While nonlinear ML models such as random forest and support vector regressions benefit from textual data in forming predictions, linear models that rely on actuarial data attain a similar or even better performance. Overall, our paper demonstrates that the least complex linear models use conventional financial ratios efficiently in predicting the failure of publicly traded banks, deeming more complex ML algorithms with 10-K textual data redundant. Our findings remain robust, even when incorporating large language models such as FinBERT (Huang et al. in Contemp Account Res 40(2):806–841, 2023).
KW - Interpretability
KW - Large language models
KW - Machine learning
KW - Predictive analytics
KW - Risk management
UR - https://www.scopus.com/pages/publications/105014285070
UR - https://www.scopus.com/pages/publications/105014285070#tab=citedBy
U2 - 10.1007/s10614-025-10969-2
DO - 10.1007/s10614-025-10969-2
M3 - Article
AN - SCOPUS:105014285070
SN - 0927-7099
JO - Computational Economics
JF - Computational Economics
ER -