What are the advantages and disadvantages of using Naive Bayes for spam detection?


Naive Bayes assumes that the features are conditionally independent given the class, an assumption that does not hold in many real-world scenarios. As a result, it can oversimplify the problem by treating correlated features as independent, which leads to subpar performance. The same assumption also makes the model prone to underfitting.
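For reference, the independence assumption can be written out as below. The notation is introduced here for illustration and is not from the original answer: y is the class (spam or not spam) and x_1, ..., x_n are the features, for example the words in a message.

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Each feature contributes its own factor P(x_i | y), regardless of which other features are present. When features are in fact correlated (words that tend to appear together, say), the product counts overlapping evidence more than once, which is one way the oversimplification shows up.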


However, Naive Bayes is very efficient. Training takes a single pass over the data, so it is fast, and it can be parallelized easily. Naive Bayes works well when there is little data and a large number of features, as with bag-of-words representations of text. Because of the independence assumption, the number of parameters is small and fixed with respect to the amount of training data (unlike algorithms such as decision trees), so there is less chance of overfitting.
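The single-pass training can be illustrated with a minimal sketch of a multinomial Naive Bayes classifier over bag-of-words tokens. Everything here is an assumption made for illustration: the function names `train` and `predict`, the add-one (Laplace) smoothing parameter `alpha`, and the toy messages are hypothetical and not from the original answer.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """One pass over (tokens, label) pairs; only counts are accumulated."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # per-class word frequencies
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens, alpha=1.0):
    """Return the class with the highest log-posterior (up to a constant)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        # Smoothed denominator shared by every word likelihood in this class.
        denom = sum(word_counts[label].values()) + alpha * len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][tok] + alpha) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, for illustration only.
toy_docs = [
    ("win free prize now".split(), "spam"),
    ("free money win".split(), "spam"),
    ("meeting schedule for monday".split(), "ham"),
    ("project meeting notes".split(), "ham"),
]
model = train(toy_docs)
print(predict(model, "free prize".split()))
print(predict(model, "monday meeting".split()))
```

Because the model is nothing more than counts, the counts computed on separate shards of the data can be added together, which is why training parallelizes easily. The number of stored parameters depends on the vocabulary size and the number of classes, not on how many messages were seen.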

