Every supervised learning model suffers from bias error, variance error, or both. If you want more intuition about bias and variance error, please refer to this page, as it will help in understanding this post. Once you know which of the two (bias or variance) is holding your model back, it becomes easier to choose the next step. This decision-making is part of the error analysis of a machine learning model. Error analysis involves manual or automated analysis to find the direction that leads to the greatest improvement. We will cover two major parts of error analysis here:
- Find out whether to reduce bias or variance error
  - Compare training and testing error (or validation error) with the Bayes error
- Find out reasons for bad performance from random mislabeled samples
  - Manually analyse some 100 random mislabeled samples
  - List down the issues responsible for mislabeling
  - Prioritise issues based on the proportion of samples contributing to various issues
Continue reading for a complete explanation of the above points.
Reduce bias or variance error
When a model performs poorly on the training data or the testing data, bias error, variance error, or both might be the cause.
Bayes error is the lowest error achievable on unseen data, so a model’s testing error cannot, in expectation, fall below it (a training error below Bayes error is a sign of overfitting). We therefore use it as the reference point for deciding what to reduce first. In practice, Bayes error is usually approximated by human-level performance, or human error, which works well for tasks that humans are good at. Read this post for more explanation of Bayes error and its best approximation.
Suppose that, for some supervised learning task, we have the following errors:
Training error – 10%
Testing error – 12%
Bayes error (the lowest possible error) – 5%
Here the gap between training error and Bayes error, also called avoidable bias, is 5%. The gap between testing error and training error, which reflects variance, is 2%. The 7% gap between testing error and Bayes error is the sum of the two, so on its own it does not tell us which error dominates. Because avoidable bias (5%) is larger than variance (2%), reducing bias offers more room for improvement in this case. If variance were the larger of the two, we would reduce it first, by collecting more training data or applying regularisation techniques.
Consider a second set of errors:
Training error – 12%
Testing error – 10%
Bayes error – 5%
Here the training error is higher than the testing error, so there is no sign of a variance problem, while the avoidable bias (the gap between training error and Bayes error) is 7%. We therefore choose to reduce bias error. This is accomplished by making the model more complex, for example by:
- Adding more layers in the neural network or
- Increasing the degree in polynomial regression or
- Decreasing the value of k in k-NN algorithm
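The comparison against Bayes error can be sketched in a few lines of Python. This is a minimal illustration, not a definitive procedure: the function name `diagnose` is hypothetical, errors are given in percent, and it assumes the common convention that avoidable bias is training error minus Bayes error and variance is testing error minus training error.

```python
def diagnose(train_err, test_err, bayes_err):
    """Return (avoidable_bias, variance, focus) for errors given in percent.

    Assumption: avoidable bias = training error - Bayes error,
    and variance = testing error - training error.
    """
    avoidable_bias = train_err - bayes_err
    variance = test_err - train_err
    # Tackle whichever gap is larger; ties go to bias.
    focus = "bias" if avoidable_bias >= variance else "variance"
    return avoidable_bias, variance, focus


# The two sets of errors discussed above, with Bayes error at 5%.
for train, test in [(10.0, 12.0), (12.0, 10.0)]:
    bias, var, focus = diagnose(train, test, bayes_err=5.0)
    print(f"train={train}% test={test}%: "
          f"avoidable bias={bias}, variance={var} -> reduce {focus} first")
```

A negative variance value, as in the second case, simply means the model did not do worse on the testing data than on the training data.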
Here is a more structured way of performing error analysis, with the dev error playing the role of the testing error discussed above.
A natural follow-up question is: what is Bayes error?
Manual Error Analysis
Consider a sample application: building a model that classifies inputs into two classes. If the model doesn’t perform well, in other words a metric such as accuracy is poor, one possible solution is to collect more data. However, collecting more data might take several months and delay the delivery of the project. Rather than committing to that cost up front, first carry out manual error analysis as follows:
- Select 100-200 mislabeled samples from the dev set. Read here to find out what a dev set is. Note that the dev set is also called the validation set.
- Do the manual error analysis with the help of the table shown below. This table-based analysis is more structured and efficient, as it helps determine what kind of data to collect to improve model performance.
The table below is constructed from 100-200 mislabeled samples from the dev set. The samples are analysed to identify the various issues behind the errors. Instead of collecting data blindly, which can be costly and time-consuming, one can find the most impactful direction from such an analysis.
As shown in the above structured error analysis, incorrect labels account for 6% of the 100 images randomly selected from the mislabeled ones. Note the difference between mislabeled samples and incorrect labels: "mislabeled" is used when the model predicts incorrectly, and "incorrect label" is used when the data itself carries a wrong label, whether in the training set or the dev set.
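The tallying step behind such a table can be sketched as follows. This is only an illustration under stated assumptions: the function name `tally_issues` is hypothetical, and the ten tagged samples and their issue names are made up for the example, not taken from the table.

```python
from collections import Counter


def tally_issues(tagged_samples):
    """Count how often each issue occurs among mislabeled dev samples.

    tagged_samples: list of sets, one per mislabeled sample, holding the
    issue tags a reviewer assigned (a sample may carry several tags).
    Returns (issue, count, percent_of_samples) rows, most frequent first.
    """
    counts = Counter(tag for tags in tagged_samples for tag in tags)
    total = len(tagged_samples)
    return [(issue, n, 100.0 * n / total) for issue, n in counts.most_common()]


# Made-up review of 10 mislabeled samples (illustrative data only).
samples = [
    {"blurry"}, {"blurry"}, {"blurry", "multiple objects"},
    {"multiple objects"}, {"incorrect label"}, {"blurry"},
    {"multiple objects"}, {"blurry"}, {"other"}, {"blurry"},
]

for issue, count, percent in tally_issues(samples):
    print(f"{issue}: {count} samples ({percent:.0f}%)")
```

Because one sample can show several issues, the percentages may add up to more than 100%.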
Now given the above situation depicted in the table, there are two possibilities:
Possibility 1:
Overall dev set error – 10%
Errors due to incorrect labels – 0.6% (6% of 10% = 0.6%)
Errors due to other causes – 9.4%
=> Fix the other causes first.
Possibility 2:
Overall dev set error – 2.0%
Errors due to incorrect labels, say still – 0.6% (30% of 2% = 0.6%)
Errors due to other causes – 1.4%
=> Fix the incorrect label issue first, as 30% of the mislabeled samples are due to incorrect labels.
Therefore, one should not blindly collect more data to improve model performance. Instead, perform the error analysis described above by building a table from a sample of mislabeled dev set examples.
Note that several ideas may lead to an improved model, and they can be pursued in parallel. For example, collecting clearer images to address blurriness and collecting single-object images to address multiple objects in an image can happen simultaneously.
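The arithmetic in the two possibilities above can be reproduced with a short sketch. The helper name `split_dev_error` is hypothetical; it only multiplies the overall dev error by the share of mislabeled samples attributed to one issue.

```python
def split_dev_error(dev_error_pct, issue_share_pct):
    """Split the overall dev error into the part caused by one issue
    and the part caused by everything else (all values in percent)."""
    due_to_issue = dev_error_pct * issue_share_pct / 100.0
    return due_to_issue, dev_error_pct - due_to_issue


# (overall dev error, share of mislabeled samples with incorrect labels)
for dev_error, share in [(10.0, 6.0), (2.0, 30.0)]:
    labels, other = split_dev_error(dev_error, share)
    print(f"dev error {dev_error:.1f}%: "
          f"incorrect labels {labels:.1f}%, other causes {other:.1f}%")
```

The first value is the most the dev error could drop if that issue were fixed completely, which is why the same 0.6% matters little at 10% overall error and a great deal at 2%.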