Which would you care more about for a spam filtering problem: precision or recall?

To understand precision and recall, it helps to first define the four possible outcomes of a classification, treating spam as the positive class:

  1. False positives (FP): mail was NOT spam but WAS LABELLED as spam
  2. False negatives (FN): mail WAS spam but was NOT LABELLED as spam
  3. True positives (TP): mail WAS spam and WAS LABELLED as spam
  4. True negatives (TN): mail was NOT spam and was LABELLED as NOT spam
  • Precision is defined as TP / (TP + FP) and recall as TP / (TP + FN).
  • Increasing precision means decreasing FP, and increasing recall means decreasing FN. These goals often conflict, which is the precision-recall tradeoff: labelling more mail as spam catches more spam (fewer FN) but also flags more legitimate mail (more FP).
  • Users do not want to miss important mail, so decreasing FP is the priority. For spam filtering we therefore care more about precision.
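The tradeoff can be illustrated with a minimal Python sketch. The labels and spam scores below are made-up illustrative data, and the helper names `precision_recall` and `classify` are hypothetical, not part of any library. Raising the decision threshold removes the false positive (higher precision) but lets more spam through (lower recall).

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall), treating label 1 (spam) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam labelled as spam
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail labelled as spam
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam that was missed
    # Guard against division by zero when nothing is predicted or nothing is spam.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def classify(scores, threshold):
    """Label a mail as spam (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical data: 1 = spam, 0 = not spam, with assumed classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.70, 0.40, 0.20, 0.10]

for threshold in (0.5, 0.75):
    precision, recall = precision_recall(y_true, classify(scores, threshold))
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

With this data, a threshold of 0.5 gives precision 0.75 and recall 0.75, while a threshold of 0.75 gives precision 1.00 and recall 0.50. A spam filter that favours precision would pick the stricter threshold and accept that some spam reaches the inbox.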

Here are more evaluation metrics used in machine learning, including classification tasks such as spam filtering.
