Which of the following data problems is solved using stratified sampling?

  1. Poor Quality data
  2. Less representative data
  3. Less amount of data
  4. All of the above

(2) Stratified sampling is used to ensure that the training and test data contain the different cases found in the real world in the same proportions, which avoids sampling bias. Even with a large amount of data, an algorithm won't generalize well if the dataset doesn't represent all the scenarios. For example, say you're asked to predict which political party will win an election. You collect data samples by talking to various people, but nearly all of them are urban respondents and very few are rural, because rural people are harder to reach through digital channels. This introduces sampling bias: your training data won't represent the entire population that votes in the election.
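The idea of keeping each group's proportion the same in the training and test sets can be sketched in plain Python. This is a minimal illustration, not a reference implementation. It assumes each record carries a stratum label (here a hypothetical `region` field with values `urban`/`rural`). The helper name `stratified_split` and the 700/300 voter counts are made up for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_split(samples, key, test_fraction=0.2, seed=0):
    """Hypothetical helper: split samples into train/test so that each
    stratum (as returned by `key`) keeps the same share in both sets."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for s in samples:
        strata[key(s)].append(s)          # group records by stratum
    train, test = [], []
    for group in strata.values():
        group = group[:]                  # copy before shuffling
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)  # per-stratum test size
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Made-up survey data: 700 urban and 300 rural respondents (70% / 30%).
voters = ([{"id": i, "region": "urban"} for i in range(700)] +
          [{"id": i, "region": "rural"} for i in range(700, 1000)])

train, test = stratified_split(voters, key=lambda v: v["region"], test_fraction=0.2)
print(Counter(v["region"] for v in train))  # urban: 560, rural: 240 (70% / 30%)
print(Counter(v["region"] for v in test))   # urban: 140, rural: 60  (70% / 30%)
```

Because the split is done separately inside each stratum, both sets keep the 70/30 urban/rural ratio. A plain random split could drift away from that ratio. In scikit-learn, the same effect is usually obtained by passing `stratify=` (the label array) to `train_test_split`. Note that stratification can only preserve the proportions of groups that were actually collected. If rural voters are missing from the data entirely, no split will bring them back.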
