Have you developed a robust model?

The most critical stage of the model development process.

Dr Amar Saxena
4 min read · Mar 27, 2023

You have developed a model. Congratulations!

This model will have been developed on historical, static data. Before you present it to the business team, there is one critical stage still left. And this is the most critical stage of the model development process.

Will your model perform as expected when put in a live environment?

Is your answer in the affirmative? If so, then why?

But why would my model not perform in a live environment?

One of the problems is over-fitting. We can always get a model that fits the training data extremely well.

These models will have excellent accuracy and separation characteristics.

But when you implement it on new data — it fails.
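As a minimal sketch of this failure mode, assuming scikit-learn and synthetic data (the dataset and model choice are illustrative, not the author's setup): an unconstrained decision tree fits noisy training data almost perfectly, yet does noticeably worse on data it has not seen.

```python
# Illustrative over-fitting demo on synthetic data (assumed setup, not the
# author's actual model): a fully grown tree memorizes training noise.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise, so perfect training fit means memorization
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)  # near-perfect on training data
test_acc = tree.score(X_test, y_test)     # much weaker on unseen data
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The gap between the two numbers is exactly what validation is designed to catch.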

So, it is essential to ensure that the model developed is robust and will work well in a live environment. This is the process of model validation.

Which dataset to use for validation?

The sample available for model development is split randomly into 2 groups –

- Training sample, and

- Validation sample (aka testing sample).

The training sample is used for developing the model — training the model on the data. The validation sample is kept untouched — it is not used at all during model development, so it remains in its raw form. It is used for checking whether the model developed on the training data is good or not. The performance of the model on the training sample should be replicated on the validation sample as well.
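The random split described above can be sketched with scikit-learn's `train_test_split` (the 70/30 ratio and the synthetic data are illustrative assumptions):

```python
# Illustrative random split into training and validation samples.
# The data here is a synthetic stand-in for the modelling dataset.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))        # stand-in for the model features
y = rng.integers(0, 2, size=500)     # stand-in for the target flag

# The validation sample is set aside before any cleaning or modelling;
# stratify keeps the target rate similar in both samples.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.3, random_state=1, stratify=y)
print(len(X_train), len(X_valid))
```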

As you may have observed, the validation sample has been drawn randomly from the total population available for the model development process. And this is an issue. Since the validation data comes from the same time period as the training data, the characteristics of the variables are likely to be similar. What this means is that in-time validation cannot detect the impact of a structural change. For example, if the growth rate of the economy has increased, then the likelihood of delinquency will reduce. So, will a model developed when the delinquency rates were high still give good results?

While it is not possible to account for structural changes in their entirety, an alternative is to use an Out-Of-Time (OOT) sample for validation. Say we have data up to Dec 2022 for developing a model. The team will take at least a month to discuss the model and possibly another month to implement it. So new data has come in during these 2 months. This data can also be used to test whether the model performance remains stable.
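Carving out the OOT sample is then a date-based filter rather than a random split. A minimal pandas sketch, with an assumed `obs_date` column and the Dec 2022 cut-off from the example:

```python
# Illustrative OOT split: rows up to the cut-off form the development
# sample; later rows form the out-of-time validation sample.
# Column names and dates are assumed for illustration.
import pandas as pd

df = pd.DataFrame({
    "obs_date": pd.to_datetime(
        ["2022-06-30", "2022-09-30", "2022-12-31",
         "2023-01-31", "2023-02-28"]),
    "target": [0, 1, 0, 1, 0],
})

cutoff = pd.Timestamp("2022-12-31")
dev_sample = df[df["obs_date"] <= cutoff]  # training + in-time validation
oot_sample = df[df["obs_date"] > cutoff]   # Jan-Feb 2023: OOT validation
print(len(dev_sample), len(oot_sample))
```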

So, in short, the total data available is split randomly into 2 groups — training and validation. We develop the model on the training dataset. We then test whether the model performance is stable on the validation dataset. This is In-Time Validation. Besides this, we also test on the out-of-time sample — the OOT validation.

What is the validation process?

There are 2 parts to the validation process –

a. Test the final model on the validation datasets (both in-time and out-of-time).

The finalized model is being validated (i.e. tested) here.

Primary check — does the model's performance on the validation datasets remain similar to its performance at development?

Model performance here means metrics like accuracy statistics and separation statistics.
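This primary check can be sketched as follows: score a model on both samples and compare accuracy, AUC/Gini and the KS statistic side by side (the logistic model, synthetic data and metric choices are illustrative assumptions):

```python
# Illustrative validation check: the same fitted model is scored on the
# training and validation samples, and the metrics are compared.
from scipy.stats import ks_2samp
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=7)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3,
                                          random_state=7)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

def performance(X, y, model):
    score = model.predict_proba(X)[:, 1]
    auc = roc_auc_score(y, score)
    # KS: maximum gap between the score distributions of goods and bads
    ks = ks_2samp(score[y == 1], score[y == 0]).statistic
    return {"accuracy": model.score(X, y), "auc": auc,
            "gini": 2 * auc - 1, "ks": ks}

train_perf = performance(X_tr, y_tr, model)
valid_perf = performance(X_va, y_va, model)
# The two sets of metrics should be close; a large gap signals over-fitting.
```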

b. Use the variables in the final model and redevelop the model on the validation datasets.

There are quite a few aspects to check here –

i. The sign of the independent variables should be the same as in the final model.

If the sign of a variable in the validation exercise differs from its sign in the developed model — it is a BIG issue. The only alternative is to drop the variable — and use a variable collinear with it.

ii. The coefficients of the independent variables should be similar.

iii. The independent variables should remain important even on the validation datasets. There can be situations where variables lose their importance on the validation datasets. This implies that we must improve the model's robustness before finalizing the model.

iv. Is the importance of the variables remaining stable?

Check the t-stat to gauge the importance of each variable: the higher the t-stat, the greater the importance of that independent variable.

Generally, the importance of the variables will reduce somewhat on the validation sample. However, the relative importance of the variables should remain similar. At the least, there should not be a drastic change in the order of importance — like the most important variable suddenly becoming not important at all.

v. And of course, the model performance.

The performance should remain like that found at the model development stage. The accuracy and separation characteristics should remain similar.

Summary

There are 2 validation samples — in-time and out-of-time. And there are 2 tests to be conducted on both the samples. Pictorially,

[Figure: the two validation samples, with the two tests run on each]

--

Dr Amar Saxena

Amar is a senior DS professional with over 25 years of experience — as a corporate leader, a consultant, a faculty member and a trainer. He holds a PhD from IIM-Ahmedabad.