How to handle incorrectly labeled samples in the training or dev set?

While doing error analysis, you may discover that your dataset contains incorrectly labeled samples. These samples can be present in the training set, the dev set or the test set. Note that the dev set is also called the validation set.

  1. Incorrect labels in the training set: There are two possibilities when incorrect labels exist in the training set:
    1. Labels are incorrect because of random errors. Deep learning algorithms are quite robust to random errors in the training set, so little needs to be done and it is usually fine to leave those samples as they are. Fixing these labels is not required unless the proportion of such mistakes is high.
    2. Labels are incorrect because of a systematic error. When the errors form a pattern in the data (for example, a labeler who consistently tags one class as another), the algorithm may learn that pattern. Learning from wrong labels leads to poor performance on the dev and test sets, so samples with systematic errors should be either removed or corrected.
  2. Incorrect labels in the dev set: It is hard to tell whether the performance on the dev set is affected by incorrect labels in the dev set itself. Hence, one should perform a manual error analysis on a few selected samples, as explained below.
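One rough way to tell the two training-set cases apart is to hand-check a random sample of training labels and see whether the mistakes cluster into a single pattern. The sketch below assumes a hypothetical audit format, a list of `(given_label, correct_label)` pairs, and its example data is made up. There is no fixed cutoff that separates random from systematic errors, so the share printed at the end is only a hint that a closer look is needed.

```python
from collections import Counter


def summarize_label_noise(audit):
    """Summarize a hand-audited sample of training labels.

    `audit` is a list of (given_label, correct_label) pairs (a hypothetical
    format for this sketch). Returns the overall noise rate, the most common
    (given -> correct) error pattern, and that pattern's share of all errors.
    """
    errors = [(given, correct) for given, correct in audit if given != correct]
    noise_rate = len(errors) / len(audit) if audit else 0.0
    if not errors:
        return noise_rate, None, 0.0
    # A single pattern that dominates the errors hints at a systematic cause.
    pair, count = Counter(errors).most_common(1)[0]
    return noise_rate, pair, count / len(errors)


# Made-up audit of 100 samples: most errors are dogs labeled as "cat".
audit = (
    [("dog", "dog")] * 40
    + [("cat", "cat")] * 50
    + [("cat", "dog")] * 8
    + [("dog", "cat")] * 1
    + [("bird", "cat")] * 1
)
rate, pair, share = summarize_label_noise(audit)
print(f"noise rate={rate:.0%}, top pattern={pair}, share of errors={share:.0%}")
```

If the errors were spread thinly over many different pairs, the top pattern's share would be small, which is closer to the random-error case.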
[Figure: Structured and efficient error analysis for better model performance]

As shown in the structured error analysis above, incorrect labels account for 6% of 100 images randomly selected from the mislabeled dev set samples. Note the distinction between "mislabeled" and "incorrectly labeled": "mislabeled" is used when the model predicts the wrong label, while "incorrectly labeled" is used when the ground-truth label in the data is itself wrong, whether in the training set or the dev set.
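The manual review behind the table can be sketched as a simple tally. In this minimal sketch, `tally_error_categories` is a hypothetical helper, each reviewed sample carries a set of hand-assigned tags, and the example keeps only the 6% figure from the text, lumping everything else into an illustrative "other causes" tag.

```python
from collections import Counter


def tally_error_categories(reviews):
    """Return the percentage of reviewed samples carrying each tag.

    `reviews` holds one set of tags per mislabeled dev sample inspected by
    hand (a hypothetical input format). A sample may carry several tags,
    so the percentages can sum to more than 100.
    """
    counts = Counter(tag for tags in reviews for tag in tags)
    return {tag: 100 * n / len(reviews) for tag, n in counts.most_common()}


# Illustrative review of 100 mislabeled dev images: 6 have incorrect
# ground-truth labels; the rest are lumped into one placeholder tag.
reviews = [{"incorrect label"}] * 6 + [{"other causes"}] * 94
print(tally_error_categories(reviews))
```

Allowing several tags per sample is a design choice of this sketch: one image can be both blurry and incorrectly labeled, and counting it in both columns keeps each percentage meaningful on its own.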

Given the situation depicted in the table, there are two possibilities:

  1. Total dev error is 10%, and incorrect labels account for 6% of that 10%, i.e. 0.6% of the entire dev set. The remaining 9.4% comes from other causes, which is far more than 0.6%. Hence, one should focus on the other causes and leave the incorrect labels as they are for now.
  2. Total dev error is 2%, while the error due to incorrect labels is still 0.6%. Incorrect labels now account for 30% of the dev error (0.6% out of 2%), against 70% from other causes. In this case, it is worthwhile to spend effort correcting the labels in the dev set. Correcting dev set labels matters because the purpose of the dev set is to help select the best model among various options; if incorrect labels are not dealt with in this scenario, one might end up choosing a suboptimal classifier.
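The arithmetic behind the two scenarios can be checked in a few lines. The function name `incorrect_label_contribution` is hypothetical; the percentages are the ones used in the two scenarios, written as fractions.

```python
def incorrect_label_contribution(dev_error, incorrect_label_error):
    """Fraction of the total dev error that is explained by incorrect labels."""
    return incorrect_label_error / dev_error


# Scenario 1: dev error is 10%, and 6% of the reviewed errors are incorrect labels.
dev_error_1 = 0.10
incorrect_1 = 0.06 * dev_error_1     # roughly 0.006, i.e. 0.6% of the dev set
other_1 = dev_error_1 - incorrect_1  # roughly 0.094, i.e. 9.4% from other causes

# Scenario 2: the 0.6% from incorrect labels remains, but dev error drops to 2%.
dev_error_2 = 0.02
share_2 = incorrect_label_contribution(dev_error_2, incorrect_1)  # roughly 0.3

print(f"scenario 1: incorrect labels {incorrect_1:.1%}, other causes {other_1:.1%}")
print(f"scenario 2: incorrect labels explain {share_2:.0%} of the dev error")
```

The absolute error from incorrect labels stays the same between the scenarios; only its relative share grows as the other causes are fixed, which is what makes correcting the dev set labels worthwhile in the second case.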
