

We have come a long way in data science, and yet there is still lots and lots of ground to cover. Although it is fairly easy to lie with statistics, I would like to point out that it is much easier to lie without them! In the old days, “data mining” used to have a bad reputation because “if you torture data for long enough, they will confess to anything.” One problem that is fairly well understood, though, is “overfitting” of models. Not least because so many colleagues (me too) have been stung by it! Overfitting occurs when idiosyncrasies in the training data become part of the model you use in production.
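
To make that concrete, here is a minimal sketch of overfitting, using scikit-learn and a synthetic dataset (both are illustrative choices of mine, not part of any particular project): an unconstrained decision tree scores almost perfectly on the data it was trained on, yet does noticeably worse on held-out data.

```python
# Minimal overfitting sketch (assumes scikit-learn is installed; the dataset
# and model are arbitrary illustrative choices).
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# A small, noisy dataset: its idiosyncrasies (noise) are easy to memorize.
X, y = make_regression(n_samples=200, n_features=10, noise=25.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

# A fully grown decision tree can reproduce the training set almost exactly...
model = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# ...but the noise it memorized does not generalize to unseen data.
print(f"Train R^2: {model.score(X_train, y_train):.2f}")  # close to 1.00
print(f"Test  R^2: {model.score(X_test, y_test):.2f}")    # substantially lower
```

The gap between the two scores is the overfitting: the tree has absorbed noise that tells you nothing about new observations.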

Consequently, the final model winds up performing worse than you expected based on the “fit” to the training data. Predictive modeling can be an amazing tool, but if you point the gun down, it is eminently possible to shoot yourself in the foot. In extreme cases, these models might even perform worse than not using a model at all… When overfitting causes your model to suggest misleading relations in the data, and you find out only after deployment, money flies out the window. Not only do you damage business objectives, but your stakeholders will lose confidence in the value data science may offer. From my experience, there are two common reasons why overfitting occurs.
