Ockham’s Razor: A Good Shave (?) for Regression Analysis

Occam's (or Ockham's) razor is a principle attributed to William of Ockham, a 14th-century logician. It states that explanatory entities should not be multiplied beyond necessity. In a statistical context, when two competing models fit the data equally well, Occam’s razor recommends that we ‘shave away all but what is necessary’. The concept of parsimony rests on Occam’s razor: the model with fewer parameters is to be preferred to the one with more.

The principle of Occam’s razor finds one of its applications in regression analysis, where one of the most important issues is which predictor variables to include in the model. A common question arises: ‘Are all these predictor variables needed, or will a model with fewer predictor variables be just as accurate?’ When the number of predictor variables is large relative to the number of observations, models tend to overfit the data. An overfit model fits not only the underlying relationship but also the noise, or unexplained variation, in the data.
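
A minimal sketch of this effect, on simulated data, is given below. Everything in it is an assumption made for illustration: the data-generating model (one real predictor plus 24 pure-noise columns), the sample sizes, and the helper names `fit_ols`, `mse` and `overfitting_demo`. It fits ordinary least squares with 1 and with 25 predictors on only 30 training observations and compares the error on the training data with the error on fresh data.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def mse(X, y, coef):
    """Mean squared prediction error of a fitted model on (X, y)."""
    design = np.column_stack([np.ones(len(y)), X])
    return float(np.mean((y - design @ coef) ** 2))


def overfitting_demo(n_train=30, n_test=1000, n_noise=24, seed=0):
    """Return {n_predictors: (train_mse, test_mse)} for a small and a large model."""
    rng = np.random.default_rng(seed)
    n = n_train + n_test
    X = rng.normal(size=(n, 1 + n_noise))
    # Assumed truth: only the first column affects the outcome.
    y = 2.0 * X[:, 0] + rng.normal(size=n)
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]

    results = {}
    for k in (1, 1 + n_noise):
        coef = fit_ols(X_tr[:, :k], y_tr)
        results[k] = (mse(X_tr[:, :k], y_tr, coef), mse(X_te[:, :k], y_te, coef))
    return results


results = overfitting_demo()
for k, (train_mse, test_mse) in results.items():
    print(f"{k:2d} predictors: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The larger model always looks at least as good on the training data, because it has more freedom to chase the noise, yet it should predict new observations noticeably worse.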

To avoid overfitting, we would therefore like to have fewer predictors in our model. But how minimal can we get? A model with too many variables will have low precision, whereas a model with too few variables will be biased. A simple example of underfitting is summarizing an outcome measure by its mean alone: this leaves all of the variance in the outcome measure unexplained, so any decision based on it is biased.

In model building, the researcher can avoid overfitting by applying Occam's razor, or the parsimony principle. However, it is also true that historical arguments for parsimony may have been put forward for reasons that were theological rather than purely scientific. There is abundant evidence that eliminating variables solely in the name of parsimony may not be a productive approach. Individual variables in a model can be statistically insignificant on their own yet collectively important as predictors.
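
The last point can be illustrated with a small simulation. This is a sketch under stated assumptions, not a general result: the two predictors `x1` and `x2`, their near-perfect correlation, and the coefficients are all invented for the example. Instead of significance tests, it compares residual sums of squares (RSS): dropping either predictor alone barely changes the fit, which is why each one tends to look unimportant by itself, while dropping both loses most of the explained variation.

```python
import numpy as np


def rss(X, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)


rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # x2 is almost a copy of x1
y = x1 + x2 + rng.normal(size=n)     # assumed truth: both predictors matter

rss_full = rss(np.column_stack([x1, x2]), y)
rss_without_x1 = rss(x2.reshape(-1, 1), y)
rss_without_x2 = rss(x1.reshape(-1, 1), y)
# With no predictors the fitted value is just the mean of y.
rss_without_both = float(np.sum((y - y.mean()) ** 2))

print(f"full model:    RSS = {rss_full:.1f}")
print(f"drop x1 only:  RSS = {rss_without_x1:.1f}")
print(f"drop x2 only:  RSS = {rss_without_x2:.1f}")
print(f"drop both:     RSS = {rss_without_both:.1f}")
```

A one-variable-at-a-time elimination rule could discard `x1` and then, having forgotten why, treat `x2` as equally dispensable, so it helps to judge related variables as a group.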

A good statistical model needs a careful balance between bias and precision, and therefore involves a trade-off between simplicity and closeness of fit.
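
This balance is commonly formalized as the bias-variance decomposition of expected squared prediction error. The notation below is an assumption introduced here, not taken from the text above: the outcome is written as y = f(x) + ε with noise of mean zero and variance σ², f̂ is the model fitted on a random training sample, and (x₀, y₀) is a new observation.

```latex
\mathbb{E}\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\sigma^2}_{\text{irreducible noise}}
  + \underbrace{\big(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{bias}^2}
  + \underbrace{\operatorname{Var}\big[\hat{f}(x_0)\big]}_{\text{variance}}
```

Models that are too simple tend to inflate the bias term, while models with many predictors tend to inflate the variance term, that is, they lose precision.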

How can one achieve this trade-off? Every research problem is different in nature; however, some general steps are:

- Define the research problem well. It is crucial to identify the right outcome measure and the set of potential predictor variables.
- Get enough good-quality data relevant to the research problem. Check that the data have been collected in an appropriate manner.
- Avoid considering unnecessary predictor variables. The maximum number of variables the data will support can be checked with a rule of thumb. For example, a general rule of thumb is that logistic and Cox models should be used with a minimum of 10 outcome events per predictor variable. However, this rule can sometimes be too conservative.
- Some variables can be eliminated if the literature suggests that they are unimportant, or if a large number of their observations are missing.
- If two or more predictor variables are highly correlated, consider removing one of them from the model, because they supply largely redundant information.
- A good way to choose between a too-simple and a too-complex model is to use information-theoretic approaches such as AIC (Akaike's Information Criterion), BIC (Bayesian Information Criterion, also called the Schwarz Information Criterion) or MDL (Minimum Description Length). Put simply, given a set of candidate models for the data, the preferred model is the one with the minimum value of the chosen criterion.
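
Two of the steps above, the events-per-variable check and selection by information criterion, can be sketched as follows. This is an illustration under stated assumptions rather than a recommended procedure: the helper names `max_predictors_by_epv` and `gaussian_aic_bic`, the simulated data and the three candidate models are all hypothetical. The criteria are computed for a linear model with Gaussian errors, using AIC = n ln(RSS/n) + 2k and BIC = n ln(RSS/n) + k ln(n), with additive constants that are the same for every model dropped.

```python
import math

import numpy as np


def max_predictors_by_epv(n_events, epv=10):
    """Largest number of predictors allowed by an events-per-variable rule of thumb."""
    return n_events // epv


def gaussian_aic_bic(X, y):
    """(AIC, BIC) of an OLS fit with Gaussian errors, up to a shared constant."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    rss = float(resid @ resid)
    k = design.shape[1] + 1                # coefficients plus the error variance
    fit_term = n * math.log(rss / n)       # -2 * maximized log-likelihood + constant
    return fit_term + 2 * k, fit_term + k * math.log(n)


rng = np.random.default_rng(2)
n = 200
X = rng.normal(size=(n, 21))
y = 1.5 * X[:, 0] + rng.normal(size=n)     # assumed truth: only column 0 matters

# Each candidate uses the first k columns of X.
candidates = {
    "intercept only": 0,
    "true predictor only": 1,
    "true predictor plus 20 noise columns": 21,
}
scores = {name: gaussian_aic_bic(X[:, :k], y) for name, k in candidates.items()}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])

for name, (aic, bic) in scores.items():
    print(f"{name:38s} AIC = {aic:8.1f}  BIC = {bic:8.1f}")
print("preferred by AIC:", best_by_aic)
print("preferred by BIC:", best_by_bic)
print("predictors supported by 87 events at 10 per variable:", max_predictors_by_epv(87))
```

Only differences in AIC or BIC between models fitted to the same data are meaningful, and because BIC penalizes each extra parameter by ln(n) rather than 2, it leans toward smaller models once n exceeds about 7.
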
Statistical modeling is an important technique, and appropriate modeling calls for careful thought about the nature of the problem and the data. As a remark often attributed to Albert Einstein puts it, “Things should be kept as simple as possible, but no simpler.”

References:
- Akaike, H., 1973. Information theory and an extension of the maximum likelihood principle. In: B. N. Petrov and F. Csáki, eds. Second International Symposium on Information Theory. Budapest: Akadémiai Kiadó: 267–281.
- Burnham, K. P. and Anderson, D. R., 2002. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. 2nd ed. New York: Springer-Verlag.
- Schwarz, G., 1978. Estimating the dimension of a model. Annals of Statistics 6: 461–464.
- Vittinghoff, E. and McCulloch, C. E., 2007. Relaxing the rule of ten events per variable in logistic and Cox regression. Am. J. Epidemiol. 165: 710–718.
- Wears, R. L. and Lewis, R. J., 1999. Statistical models and Occam's razor. Acad. Emerg. Med. 6: 93–94.
