Why do ensemble methods have a better chance of producing a good model than an individual model?

Analogy: To understand the reasoning, consider the analogy of tossing a slightly biased coin.

  • A coin has one parameter: the probability of landing heads. Suppose we are told that the coin is slightly biased, with a 51% chance of coming up heads.
    • Over the first few tosses, say a few tens, this bias may not be visible: even out of 100 tosses, the number of heads can be well off from 51.
    • If we toss the coin 1000 times, the counts will generally be closer to 510 heads and 490 tails, giving a majority of heads (about 51% against 49%). The probability of obtaining a majority of heads (# of heads > # of tails) after 1000 tosses is around 75%, and it increases to around 97% with 10,000 tosses.
    • The theory behind this phenomenon is the law of large numbers. Even with a modest 1000 tosses, rather than 10,000 or 100,000, a majority vote over the tosses comes out as heads about 75% of the time.
    • In this analogy, each toss is like a weak model (for example, one trained on a small to medium size training set) that is right only 51% of the time. An ensemble method aggregates the predictions of all the weak models using a majority vote. If the models make independent errors, as the tosses do, a vote among 1000 of them is right about 75% of the time, compared with 51% for a single weak model!
    • This illustrates how an ensemble method can give better results than an individual model.
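
The probabilities quoted above can be checked directly from the binomial distribution. The following is a minimal sketch, not a definitive implementation: the function name `majority_probability` is hypothetical, the tosses are assumed independent, and the sum is done in log space (via `math.lgamma`) only so that the large binomial coefficients do not overflow.

```python
import math

def majority_probability(n_tosses, p_heads=0.51):
    """Exact P(#heads > #tails) for n_tosses independent tosses.

    Assumes 0 < p_heads < 1. Each binomial term is computed in log space
    so that large coefficients such as C(10000, 5000) do not overflow.
    """
    log_p = math.log(p_heads)
    log_q = math.log(1.0 - p_heads)
    total = 0.0
    # A strict majority of heads means k > n_tosses / 2.
    for k in range(n_tosses // 2 + 1, n_tosses + 1):
        log_pmf = (math.lgamma(n_tosses + 1)
                   - math.lgamma(k + 1)
                   - math.lgamma(n_tosses - k + 1)
                   + k * log_p
                   + (n_tosses - k) * log_q)
        total += math.exp(log_pmf)
    return total

for n in (1, 1000, 10000):
    print(n, round(majority_probability(n), 3))
```

With a single toss the result is just the 51% bias. With 1000 tosses a strict majority of heads comes out at roughly 73% (close to 75% if a 500-500 tie is also counted), and with 10,000 tosses at roughly 97 to 98%, in line with the figures above.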

Deeper explanation:

  • Each individual model captures some aspect of the problem, and ideally different models capture largely independent aspects. When the predictions of all these models are combined, the ensemble effectively covers more aspects of the problem than any single model, either because more features are given weight or because more training examples are used.
  • Each individual model is likely to be suboptimal on its own, because it does not approach the problem from every angle. An ensemble of weak models combined, however, can approach the problem from more angles than any single weak model can.
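
To see why it matters that the models look at different aspects of the problem, a small simulation can compare a vote among models whose errors are independent with a vote among models that all make the same mistakes. This is only an illustrative sketch under stated assumptions: the function `vote_accuracy`, the 60% per-model accuracy, and the 101 models are all hypothetical choices, and real models are rarely fully independent or fully identical.

```python
import random

def vote_accuracy(n_models, p_correct, n_trials, independent, seed=0):
    """Estimate how often a majority vote of n_models is correct.

    independent=True:  each model is right or wrong on its own.
    independent=False: all models copy one model, so they are
                       right or wrong together (no diversity).
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        if independent:
            # Count how many models happen to be right on this example.
            correct = sum(rng.random() < p_correct for _ in range(n_models))
        else:
            # Every model gives the same answer as the first one.
            correct = n_models if rng.random() < p_correct else 0
        if correct > n_models / 2:
            wins += 1
    return wins / n_trials

print("independent errors:", vote_accuracy(101, 0.6, 2000, independent=True))
print("identical errors:  ", vote_accuracy(101, 0.6, 2000, independent=False))
```

In this setup the vote among independent models should be right far more often than any single model, while the vote among identical models is no better than one of them. The gain from an ensemble comes from the diversity of its members, not merely from their number.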
