Imbalanced datasets, or the class imbalance problem, pose various challenges. One of many possible ways to address the problem is to oversample the minority class. However, oversampling without addressing the following issues can be dangerous: Usually, we begin by splitting the entire dataset into a training set and a testing set. The training set is further split into training…
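The excerpt is cut off before it names the specific issues, so the following is only a sketch of one commonly cited precaution, assumed here rather than taken from the post: split first, then oversample the training portion only, so that duplicated minority rows never appear in the held-out data. The function name `split_then_oversample`, its parameters, and the use of plain random duplication are all hypothetical choices for illustration.

```python
import random


def split_then_oversample(labels, test_fraction=0.25, seed=0):
    """Return (balanced_train_indices, test_indices) for a list of class labels.

    Hypothetical helper: the split happens first, and random oversampling
    (duplication with replacement) is applied to the training indices only.
    """
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    rng.shuffle(indices)

    # Hold out the test set before any resampling takes place.
    n_test = int(len(indices) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    # Group the training indices by class label.
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)

    # Duplicate rows of the smaller classes until every class matches the largest.
    target = max(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(members)
        balanced.extend(rng.choices(members, k=target - len(members)))

    rng.shuffle(balanced)
    return balanced, test_idx


# Toy data (made up for illustration): 90 majority rows, 10 minority rows.
labels = [0] * 90 + [1] * 10
train_idx, test_idx = split_then_oversample(labels)
```

Because the test indices are set aside before any duplication, no oversampled row can appear on both sides of the split. The same ordering would apply to a further train/validation split.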
Tag: imbalanced datasets
What are the challenges of imbalanced datasets in machine learning?
Many machine learning problems come with an imbalanced dataset. This may be due either to the nature of the problem itself or to the way the data was collected. For example, applications like fraud detection have relatively few fraudulent transactions compared to normal ones. Such problems fall into the first category. In other…
How do you deal with dataset imbalance in a problem like spam filtering?
Class imbalance is a very common problem when applying ML algorithms. Spam filtering is one such application where class imbalance is apparent: there are many more non-spam emails in a typical inbox than spam emails. The following approaches can be used to address the class imbalance problem. Designing an asymmetric cost function where the cost…
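As a rough illustration of what an asymmetric cost function can look like, here is a minimal sketch of a class-weighted binary log loss. The excerpt is cut off before it says how the costs are assigned, so the function name `weighted_log_loss`, the weights `w_pos` and `w_neg`, and the example values are assumptions, not the post's own formulation.

```python
import math


def weighted_log_loss(y_true, p_pred, w_pos=1.0, w_neg=1.0, eps=1e-12):
    """Mean binary log loss with a separate weight per class.

    y_true: iterable of 0/1 labels.
    p_pred: predicted probabilities of class 1.
    w_pos, w_neg: hypothetical per-class costs; unequal values make
    errors on one class more expensive than errors on the other.
    """
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log(0) never occurs.
        p = min(max(p, eps), 1.0 - eps)
        if y == 1:
            total += -w_pos * math.log(p)
        else:
            total += -w_neg * math.log(1.0 - p)
    return total / len(y_true)


# With w_pos > w_neg, a confidently missed positive costs more than
# an equally confident mistake on a negative.
missed_positive = weighted_log_loss([1], [0.1], w_pos=5.0)
missed_negative = weighted_log_loss([0], [0.9], w_pos=5.0)
```

Which class receives the larger weight is a design decision that depends on the application. With equal weights the function reduces to the ordinary log loss.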