Sampling the test set purely at random can introduce sampling bias, because the test set might not be representative of the entire population. For example, when predicting the winning party in an election, suppose 30% of the voters are rural and 70% are urban. If we sample the test set uniformly, these proportions…
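To make the risk concrete, here is a minimal sketch that repeatedly draws a small test set by simple random sampling from a 30% rural / 70% urban population and records how far the rural share drifts. The population size (1,000), the test-set size (20), the seed, and the helper name `rural_share` are all assumptions chosen for illustration, not values from the original example.

```python
import random

# Hypothetical population: 300 rural and 700 urban voters (30% / 70%).
rng = random.Random(42)
population = ["rural"] * 300 + ["urban"] * 700


def rural_share(sample):
    """Fraction of rural voters in a sample."""
    return sum(voter == "rural" for voter in sample) / len(sample)


# Draw 1000 small test sets by simple random sampling (without replacement)
# and record the rural share observed in each one.
shares = [rural_share(rng.sample(population, 20)) for _ in range(1000)]
lowest, highest = min(shares), max(shares)
print(f"rural share across 1000 random test sets: {lowest:.2f} to {highest:.2f}")
```

The spread between the lowest and highest share shows that an individual random test set can end up noticeably above or below the true 30%, and the effect is stronger the smaller the test set is.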

# Tag: train test split

## Which of the following data problems is solved using stratified sampling?

1. Poor quality data
2. Less representative data
3. Less amount of data
4. All of the above

Answer: (2). Stratified sampling is done to ensure that the training and test data contain the same proportions of the different cases present in the real world. This avoids sampling bias. Even if one collects a large amount of data…

## What is stratified sampling and why is it important?

Stratified sampling is a sampling method in which the population is divided into homogeneous subgroups called strata, and the right number of instances is sampled from each stratum. This is important to ensure that the sampled dataset is representative of the entire population. To see why, consider an example of predicting…
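The definition above can be sketched in a few lines of plain Python: group the row indices by stratum, then take the same fraction from every group. This is only an illustrative sketch, not a reference implementation. The function name `stratified_split`, the 20% test fraction, and the 300 rural / 700 urban population are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), sampling each stratum separately."""
    rng = random.Random(seed)

    # Group row indices by their stratum label.
    by_stratum = defaultdict(list)
    for idx, label in enumerate(labels):
        by_stratum[label].append(idx)

    train, test = [], []
    for indices in by_stratum.values():
        rng.shuffle(indices)
        # Take the same fraction from every stratum.
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


# Hypothetical population: 30% rural and 70% urban voters.
labels = ["rural"] * 300 + ["urban"] * 700
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)

rural_share = sum(labels[i] == "rural" for i in test_idx) / len(test_idx)
print(len(test_idx), rural_share)
```

With these assumed numbers, the test set contains 60 rural and 140 urban voters, so the 30/70 split of the population is preserved exactly. Libraries such as scikit-learn offer the same idea through the `stratify` argument of `train_test_split`.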