When Positive Sentiment is not so Positive: Textual Analytics and Bank Failures

Research output: Contribution to journal › Article › peer-review

Abstract

We examine U.S. publicly traded bank holding companies (BHCs) that failed during the 2007–2009 global financial crisis. Using consolidated data at the BHC level and 10-K filings, we investigate the determinants of bank failures during this period using nonlinear machine learning (ML). The in-sample analysis demonstrates that 90% of the failed banks can be classified during 2007–2009. In addition, our sensitivity analysis for interpretable ML shows that net tone is among the top five important features. However, the power of tone/text is less evident when we consider predictive (out-of-sample) analysis. While nonlinear ML models such as random forest and support vector regressions benefit from textual data in forming predictions, linear models that rely on actuarial data attain a similar or even better performance. Overall, our paper demonstrates that the least complex linear models use conventional financial ratios efficiently in predicting the failure of publicly traded banks, rendering more complex ML algorithms with 10-K textual data redundant. Our findings remain robust, even when incorporating large language models such as FinBERT (Huang et al. in Contemp Account Res 40(2):806–841, 2023).

Original language: English
Journal: Computational Economics
State: Accepted/In press - 2025

Keywords

  • Interpretability
  • Large language models
  • Machine learning
  • Predictive analytics
  • Risk management
