There are three parts to this answer.
- What overfitting and underfitting are
- Why they occur
- How to overcome each of them
Overfitting is the result of overtraining the model, while underfitting is the result of keeping the model too simple; both lead to high generalization error. Overtraining effectively produces a more complex model, and a complex model learns an irregular decision boundary that tries to accommodate every training example (some of which may be noise), which is what overfitting means. In practice, an overfit model shows low training error but high test error, whereas an underfit model shows high error on both.
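A minimal sketch of both failure modes, assuming a toy regression problem that is not from the answer: noisy samples of sin(2πx), fit with polynomials of degree 1, 3 and 15 using NumPy's `Polynomial.fit`. Polynomial degree stands in for model complexity here; the helpers `make_data` and `mse` are hypothetical.

```python
import numpy as np
from numpy.polynomial import Polynomial

def make_data(n, rng, noise=0.2):
    # Hypothetical toy problem: noisy samples of sin(2*pi*x) on [0, 1].
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
    return x, y

def mse(model, x, y):
    # Mean squared error of a callable model on (x, y).
    return float(np.mean((model(x) - y) ** 2))

rng = np.random.default_rng(0)
x_train, y_train = make_data(20, rng)
x_test, y_test = make_data(500, rng)

results = {}
for degree in (1, 3, 15):
    # Least-squares polynomial fit: higher degree = more parameters = more complex model.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  test MSE={results[degree][1]:.3f}")
```

Typically the degree-1 fit has high training and test error (underfitting), while the degree-15 fit has the lowest training error but a test error well above it (overfitting). Exact numbers depend on the random seed.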
Overfitting can be caused by
- Noise in the data, which the model fits (prioritizes) during training.
- Too little data compared to what is required for a model that generalizes.
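To illustrate the "too little data" cause, the sketch below keeps the model fixed (a degree-9 polynomial, an arbitrary choice) and only changes the number of noisy training points. It measures error against the noise-free function on a grid inside the training range, so extrapolation does not dominate the comparison. The toy data and the helpers `true_fn` and `fit_error` are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

def true_fn(x):
    # Hypothetical noise-free target function.
    return np.sin(2 * np.pi * x)

def fit_error(n_train, degree=9, noise=0.2, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, n_train)
    y = true_fn(x) + rng.normal(0.0, noise, n_train)
    model = Polynomial.fit(x, y, deg=degree)
    # Compare the fit to the true function only inside the training range.
    grid = np.linspace(x.min(), x.max(), 1000)
    return float(np.mean((model(grid) - true_fn(grid)) ** 2))

print("20 points:  ", fit_error(20))
print("2000 points:", fit_error(2000))
```

With many more points, the same flexible model has far less room to chase the noise in any individual sample.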
Underfitting, which is essentially the opposite of overfitting, occurs due to
- A model that is too simple or has too few parameters.
- Excessive regularization, often applied to control overfitting.
- Too few features, or poor features, used in training.
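The over-regularization cause can be seen with ridge regression, i.e. an L2 penalty (one common form of regularization, chosen here as an assumption). As the penalty strength `lam` grows, even a flexible degree-9 polynomial is pushed toward a near-constant prediction, and its training error climbs toward the variance of the targets. The closed-form solver and the helpers `poly_features` and `ridge_fit` are a hypothetical sketch, not a library API.

```python
import numpy as np

def poly_features(x, degree):
    # Columns x, x**2, ..., x**degree; the intercept is handled separately.
    return np.vander(x, degree + 1, increasing=True)[:, 1:]

def ridge_fit(X, y, lam):
    # Closed-form ridge regression on centered data, intercept left unpenalized:
    # w = (Xc^T Xc + lam * I)^-1 Xc^T yc
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 30)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 30)
X = poly_features(x, degree=9)

train_mse = {}
for lam in (1e-6, 1e-3, 1e-1, 10.0, 1e4):
    w, b = ridge_fit(X, y, lam)
    train_mse[lam] = float(np.mean((X @ w + b - y) ** 2))
    print(f"lambda={lam:g}  train MSE={train_mse[lam]:.3f}")
```

A high training error is the signature of underfitting: the model cannot even fit the data it was trained on.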
Overfitting can be overcome by
- Using a simpler model, either by reducing the number of parameters or by choosing a simpler model altogether, e.g. a linear model instead of a high-degree polynomial. Regularization (e.g. an L1 or L2 penalty on the weights) has a similar effect: it penalizes complexity instead of removing parameters outright.
- Using more training data.
- Reducing noise in the data by removing outliers or dealing with missing values.
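For the outlier-removal bullet, one simple heuristic (an assumption, not something the answer prescribes) is the modified z-score built from the median and the median absolute deviation (MAD), dropping points whose score exceeds 3.5. The sketch applies it to the target values of a toy dataset with three injected gross errors; `robust_z_scores` and `drop_outliers` are hypothetical helpers. A univariate filter like this only catches extreme values, not points that are unusual relative to the trend.

```python
import numpy as np

def robust_z_scores(values):
    # Modified z-score: 0.6745 * (v - median) / MAD (assumes MAD > 0).
    median = np.median(values)
    mad = np.median(np.abs(values - median))
    return 0.6745 * (values - median) / mad

def drop_outliers(x, y, threshold=3.5):
    # Keep points whose target has a modified z-score within the threshold.
    keep = np.abs(robust_z_scores(y)) <= threshold
    return x[keep], y[keep], np.flatnonzero(~keep)

rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 50)
# Inject a few gross errors (hypothetical corrupted labels) at known positions.
outlier_idx = np.array([5, 17, 33])
y[outlier_idx] += 10.0

x_clean, y_clean, dropped = drop_outliers(x, y)
print("dropped indices:", dropped)
```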
Underfitting can be overcome by
- Selecting a more expressive model, either by increasing the number of parameters or by making the model more complex.
- Extracting better features during feature engineering.
- Reducing the regularization strength, since regularizing too aggressively to control overfitting can, in extreme cases, lead to underfitting.
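Better features can fix underfitting without changing the learning algorithm. In this sketch (the toy data and the feature choice are assumptions), an ordinary least-squares linear model on the raw input `x` underfits sin(2πx), while the same linear model on the engineered features x, x², x³ fits it far better; `train_mse` is a hypothetical helper.

```python
import numpy as np

def train_mse(features, y):
    # Ordinary least squares with an explicit intercept column.
    A = np.column_stack([np.ones(len(y)), features])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((A @ coef - y) ** 2))

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, 100)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 100)

raw = x[:, None]                                  # linear model on x only
engineered = np.column_stack([x, x ** 2, x ** 3])  # same model, richer features

mse_raw = train_mse(raw, y)
mse_engineered = train_mse(engineered, y)
print(f"raw features: {mse_raw:.3f}  engineered features: {mse_engineered:.3f}")
```

The model stays linear in its inputs; only the representation changed, which is what feature engineering buys you.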