TY - JOUR
T1 - Techniques to avoid pitfalls in empirical modeling
AU - Wojciechowski, Edward
AU - Vaccari, David A.
PY - 1999
Y1 - 1999
N2 - The development of a mathematical model that adequately captures and describes the interactions among the various system components is critical to the understanding and control of physical, chemical, or biological phenomena. This often involves developing a multivariate model that will be used to forecast future events. Once the model has been proposed, it must be validated to check its adequacy in terms of its ability to forecast future events. However, such empirical models are subject to a number of pitfalls, including overfitting, chance correlation, extrapolation, and lack of parsimony. In this paper, we describe the application of techniques to avoid these problems. The techniques described here are stratified data sampling, cross-validation, summed independent variables, and the use of constraints on model complexity. Although most of these techniques can be applied to any type of data model (e.g., linear, polynomial, non-linear, or artificial neural network), we have studied their application to polynomial autoregressive models with exogenous variables (PARX). By using these techniques, we are able to validate parsimonious models with reduced risk of overfitting, extrapolation, or chance correlation. As applied to PARX models, we were able to develop higher-order polynomials that significantly reduce forecast errors relative to traditional linear autoregressive models.
AB - The development of a mathematical model that adequately captures and describes the interactions among the various system components is critical to the understanding and control of physical, chemical, or biological phenomena. This often involves developing a multivariate model that will be used to forecast future events. Once the model has been proposed, it must be validated to check its adequacy in terms of its ability to forecast future events. However, such empirical models are subject to a number of pitfalls, including overfitting, chance correlation, extrapolation, and lack of parsimony. In this paper, we describe the application of techniques to avoid these problems. The techniques described here are stratified data sampling, cross-validation, summed independent variables, and the use of constraints on model complexity. Although most of these techniques can be applied to any type of data model (e.g., linear, polynomial, non-linear, or artificial neural network), we have studied their application to polynomial autoregressive models with exogenous variables (PARX). By using these techniques, we are able to validate parsimonious models with reduced risk of overfitting, extrapolation, or chance correlation. As applied to PARX models, we were able to develop higher-order polynomials that significantly reduce forecast errors relative to traditional linear autoregressive models.
UR - http://www.scopus.com/inward/record.url?scp=85072485917&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85072485917&partnerID=8YFLogxK
U2 - 10.4271/1999-01-2045
DO - 10.4271/1999-01-2045
M3 - Conference article
AN - SCOPUS:85072485917
SN - 0148-7191
JO - SAE Technical Papers
JF - SAE Technical Papers
T2 - 29th International Conference on Environmental Systems
Y2 - 12 July 1999 through 15 July 1999
ER -