Just randomly sampling the test set can introduce sampling bias, because the test set might not be representative of the entire population. For example, when predicting the winning party in an election, suppose 30% of voters are rural and 70% are urban. A purely random test set matches these proportions only on average. Any particular draw can drift away from 30/70 by chance, and the drift tends to be larger for small test sets, so the resulting test set may not represent the population. To avoid this, use stratified sampling: divide the population into strata (here rural and urban) and sample from each stratum in proportion to its size, as explained here.
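The idea can be sketched in plain Python. This is a minimal illustration, assuming a hypothetical population of 1,000 voters (300 rural, 700 urban) and a 20% test set. The helper name stratified_test_indices is hypothetical. It draws the same fraction from every stratum, so the test set keeps the 30/70 split by construction.

```python
import random
from collections import Counter


def stratified_test_indices(labels, test_fraction, seed=0):
    """Hypothetical helper: sample test indices so each stratum keeps its share."""
    rng = random.Random(seed)
    # Group sample indices by stratum label.
    by_stratum = {}
    for i, label in enumerate(labels):
        by_stratum.setdefault(label, []).append(i)
    # Draw the same fraction from every stratum.
    test = []
    for idxs in by_stratum.values():
        k = round(len(idxs) * test_fraction)
        test.extend(rng.sample(idxs, k))
    return sorted(test)


# Hypothetical population: 300 rural and 700 urban voters (30% / 70%).
labels = ["rural"] * 300 + ["urban"] * 700
test_idx = stratified_test_indices(labels, test_fraction=0.2)
counts = Counter(labels[i] for i in test_idx)
print(counts["rural"], counts["urban"])  # 60 140
```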
sklearn provides a class for stratified sampling called StratifiedShuffleSplit, which can be imported in Python with "from sklearn.model_selection import StratifiedShuffleSplit". For more on its usage and parameters, visit here.
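A short usage sketch, assuming the same hypothetical 30/70 voter labels and a 20% test set. The values test_size=0.2 and random_state=42 are arbitrary choices for illustration. split(X, y) yields arrays of train and test indices whose class proportions follow y.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical data: 1000 voters, 30% rural (label 0) and 70% urban (label 1).
y = np.array([0] * 300 + [1] * 700)
# Placeholder features; only y drives the stratification.
X = np.arange(len(y)).reshape(-1, 1)

splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, test_idx = next(splitter.split(X, y))

# Compare the share of rural voters in the full data and in each split.
print("population:", (y == 0).mean())
print("train:", (y[train_idx] == 0).mean())
print("test:", (y[test_idx] == 0).mean())
```

With n_splits=1 this acts as a single stratified train/test split. For that case, train_test_split with stratify=y is a shorter alternative.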