What are the drawbacks of oversampling the minority class in an imbalanced-class problem in machine learning?

An imbalanced dataset, or imbalanced-class problem, presents various challenges. One of the many possible ways of addressing it is oversampling the minority class. However, oversampling without addressing the following issues can be dangerous: usually, we begin by splitting the entire dataset into training and testing sets. The training set is further split into training…
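The key danger hinted at above is data leakage: if you oversample before splitting, duplicated minority rows can appear in both the training and test sets. A minimal sketch of the safe order of operations, using NumPy and a hypothetical hand-built dataset and split:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced dataset: 95 majority (label 0) and 5 minority (label 1).
X = rng.normal(size=(100, 2))
y = np.array([0] * 95 + [1] * 5)

# Split FIRST (stratified by hand here), THEN oversample only the training
# fold, so duplicated minority rows can never leak into the test set.
train_idx = np.r_[0:76, 95:99]    # 76 majority + 4 minority examples
test_idx = np.r_[76:95, 99:100]   # 19 majority + 1 minority example

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# Oversample the minority class with replacement, in the training fold only.
minority = np.where(y_train == 1)[0]
majority = np.where(y_train == 0)[0]
extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
balanced = np.concatenate([majority, minority, extra])
X_bal, y_bal = X_train[balanced], y_train[balanced]
```

After this, `y_bal` has equal class counts while `X_test` remains untouched by any duplication. Libraries such as imbalanced-learn wrap this pattern, but the ordering constraint is the same.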

Error analysis in supervised machine learning

Every supervised learning problem encounters either bias or variance error. Please refer to this page if you want more intuition about bias and variance error, as it will help in understanding this post. Once you know where (bias or variance) your model is going wrong, it becomes easier to decide the next direction. This…
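The diagnosis described above usually comes down to comparing the training error (bias) against the train-dev gap (variance). A minimal sketch, with illustrative error rates and a hypothetical threshold `gap`:

```python
def diagnose(train_err, dev_err, target_err=0.0, gap=0.02):
    """Rough bias/variance diagnosis from error rates.

    train_err - target_err approximates avoidable bias;
    dev_err - train_err approximates variance (generalization gap).
    The 2% threshold is purely illustrative.
    """
    bias = train_err - target_err
    variance = dev_err - train_err
    issues = []
    if bias > gap:
        issues.append("high bias (underfitting)")
    if variance > gap:
        issues.append("high variance (overfitting)")
    return issues or ["looks fine"]

diagnose(0.15, 0.16)  # -> ["high bias (underfitting)"]
diagnose(0.01, 0.11)  # -> ["high variance (overfitting)"]
```

High bias suggests a bigger model or longer training; high variance suggests more data or regularization.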

How to handle incorrectly labeled samples in the training or dev set?

While doing error analysis, it might be revealed that your dataset has incorrectly labeled samples. These incorrectly labeled samples can be present in the training set, dev set, or test set. Note that the dev set is also called the validation set. Incorrect labels in the training set: there are two possibilities when the incorrect labels exist in…
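A common way to decide whether fixing labels is worth the effort is to tag a sample of misclassified dev examples by cause and see what share of the overall error mislabeling explains. A minimal sketch with hypothetical counts:

```python
# Hypothetical tally from manually reviewing 100 misclassified dev examples.
error_causes = {"mislabeled": 6, "blurry_image": 43, "other": 51}

dev_error = 0.10  # assumed overall dev-set error rate
total = sum(error_causes.values())

# Share of errors caused by bad labels, and the ceiling on improvement
# you could get by fixing them.
mislabel_share = error_causes["mislabeled"] / total  # 0.06
max_gain = dev_error * mislabel_share                # 0.006, i.e. 0.6% absolute
```

Here fixing labels could reduce dev error by at most 0.6 percentage points, so effort is likely better spent on the blurry-image category first.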

What is the best strategy for choosing an evaluation metric?

Any machine learning model has an evaluation stage. There are various possible metrics; however, one of the best strategies is to follow the rules below: application-level tradeoffs influence the ML-level tradeoffs, which in turn lead to multiple metrics. Always have one metric to optimize, and put constraints on the rest. As…
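The "one optimizing metric, constraints on the rest" rule can be sketched as model selection over hypothetical candidates, where accuracy is optimized subject to a latency constraint (all numbers invented for illustration):

```python
# Candidate models with an optimizing metric (accuracy) and a
# satisficing metric (latency, constrained to <= 100 ms).
models = [
    {"name": "A", "accuracy": 0.92, "latency_ms": 80},
    {"name": "B", "accuracy": 0.95, "latency_ms": 150},
    {"name": "C", "accuracy": 0.93, "latency_ms": 95},
]

# First filter by the constraint, then optimize the single metric.
feasible = [m for m in models if m["latency_ms"] <= 100]
best = max(feasible, key=lambda m: m["accuracy"])  # model "C"
```

Model B has the best accuracy but violates the latency constraint, so C wins; collapsing everything into one weighted score would hide that application-level tradeoff.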

Why is named entity recognition hard?

Named entity recognition is the problem of finding and classifying names in text. Consider the sentence “State Bank of India provides good interest rates for the National Public School”. It is hard to work out the boundaries of an entity: for example, we don’t know whether “State Bank of India” or “State Bank” is the entity. Hard…

What is cross entropy loss in deep learning?

Cross-entropy loss, serving as a loss function, is heavily used in deep learning models. It is derived from information theory. To explain cross entropy, let the true probability distribution be p and the computed model probability be q. Then the cross-entropy loss or error H(p, q) is given by:

H(p, q) = −Σₓ p(x) log q(x)

Cross entropy measures how the predicted probability…
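The formula H(p, q) = −Σₓ p(x) log q(x) can be sketched in plain Python; the one-hot target and predicted distribution below are hypothetical, and `eps` is a small constant added to avoid log(0):

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

# One-hot true distribution vs. a softmax-style model prediction.
p = [0.0, 1.0, 0.0]
q = [0.2, 0.7, 0.1]
loss = cross_entropy(p, q)  # -log(0.7), roughly 0.357
```

With a one-hot p, the sum collapses to −log of the probability the model assigned to the true class, which is why this is also called negative log-likelihood loss.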