How do bias and variance errors get introduced?

Any supervised learning model is the result of optimizing two kinds of error: the error due to model complexity and the training error (the prediction error on examples seen during training).

Fig 1. Setup for a typical supervised learning problem

Example:

The cost function of Ridge Regression (regularized linear regression) with parameters \theta is given by

    \[J(\theta) = \frac{\lambda}{2}\sum_{i=1}^{n}\theta_{i}^2 +MSE(\theta)\]

MSE(\theta) is the mean squared error of the predictions made (by the model with parameters \theta) on the m training samples. MSE(\theta) is given by

    \[MSE(\theta) = \frac{1}{m}\sum_{i=1}^{m}(\hat{y_i} - y_i)^2\]

  where y_i is the actual label and \hat{y_i} is the prediction given by

    \[\hat{y} = \theta_1x_1+\theta_2x_2+...+\theta_nx_n\]

    \[\implies \hat{y_i} = \theta^Tx_i\]

From all the above equations,

    \[\implies J(\theta) = \frac{\lambda}{2}\sum_{i=1}^{n}\theta_{i}^2 + \frac{1}{m}\sum_{i=1}^{m}(\theta^Tx_i - y_i)^2\]

 

OR

    \[J(\theta) = Model\ Complexity\ +\ Training\ Error\]

i.e. the \theta values determine the model complexity, and the training error is determined by the predictions made on the training samples.
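As a rough sketch of this decomposition (assuming NumPy; the function name `ridge_cost` and the toy data are made up for illustration), the cost above can be written directly as "model complexity + training error":

```python
import numpy as np

def ridge_cost(theta, X, y, lam):
    """Ridge cost J(theta): a regularization term (model complexity)
    plus the mean squared error on the training samples (training error)."""
    predictions = X @ theta                        # y_hat_i = theta^T x_i
    mse = np.mean((predictions - y) ** 2)          # training error
    complexity = (lam / 2.0) * np.sum(theta ** 2)  # penalty on the theta values
    return complexity + mse

# Toy data: 100 samples, 3 features (illustrative values only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

theta = np.zeros(3)
print(ridge_cost(theta, X, y, lam=0.1))
```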

Other models:

In a model like k-NN classification, the lower the value of K, the more complex the model; in polynomial regression, the higher the degree of the polynomial, the more complex the model. These are examples of model complexity.
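One quick way to see the effect of K in practice (a sketch using scikit-learn and a synthetic dataset, neither of which the post prescribes) is to vary K and compare training and test accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data for illustration; a real dataset would work the same way.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Smaller K -> more complex model: training accuracy typically rises,
# but the gap to test accuracy (overfitting) tends to grow.
for k in (1, 5, 25, 100):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(k, knn.score(X_train, y_train), knn.score(X_test, y_test))
```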

Training Error:

Prediction error on training examples (called training error above) partly reflects the way the data was collected. Suppose we have to build a model to predict which political party will win an election, and we collect data only from people who have access to the internet. We have introduced bias into the data by not sampling from the entire population. Hence, roughly speaking, training error comes from the data, while model complexity obviously comes from the model.

As we can see from the explanation above and Figure 1, there are two kinds of error in the supervised learning optimization problem. One might ask why we need to penalize model complexity at all, since a more complex model can make better predictions. That is a valid argument, but it is bad for generalisation. Read more on overfitting here.

Error due to model complexity is also called variance error. Error introduced by biases in the data is called bias error. Note that such bias may not be intentional; it can come from the way the data was collected or the features were extracted. Since variance error is related to model complexity, reducing variance error also leads to a less irregular decision boundary.

Understanding Bias and Variance Error: 

Imagine that the model-building task could be repeated on different training datasets (D_1, …, D_n), i.e. we train a new model on a different dataset every time (shown in Figure 2). Fix a test data point Test_1 whose actual value is P. If we evaluate each model's prediction on this point alone, the predictions (P_1, …, P_n) will differ because of the randomness in the model-generation process. That is, every time we train a model M_i on a different dataset D_i, we get a different model and hence a different prediction P_i for the same data point Test_1.

Fig 2. Understanding how bias and variance errors get introduced

Bias Error is the difference between the mean of these predictions and the correct value.

    \[Bias\ Error\ \propto \ \hat{P} - P\]

 

Variance Error is simply the variance of these predictions, i.e. how spread out the predictions are.

    \[Variance\ Error\ \propto \ \frac{1}{n}\sum_{i=1}^{n}(\hat{P} - P_i)^2\]

 

where

    \[\hat{P} = \frac{1}{n}\sum_{i=1}^{n}P_i\]
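These definitions can be simulated directly. Below is a minimal sketch (assuming NumPy, a made-up sine ground truth, and a simple polynomial model; none of these are prescribed by the post) that repeatedly fits a model on fresh datasets and measures the bias and variance of the predictions at one fixed test point:

```python
import numpy as np

rng = np.random.default_rng(42)

def true_function(x):
    return np.sin(x)               # assumed ground truth; P = true_function(test_x)

test_x = 1.0
P = true_function(test_x)

predictions = []                   # P_1, ..., P_n
for _ in range(200):               # n different training datasets D_i
    X = rng.uniform(0, 2 * np.pi, size=30)
    y = true_function(X) + rng.normal(scale=0.3, size=30)
    # Fit a simple model M_i (here: a degree-2 polynomial) on D_i.
    coeffs = np.polyfit(X, y, deg=2)
    predictions.append(np.polyval(coeffs, test_x))   # P_i for Test_1

predictions = np.array(predictions)
P_hat = predictions.mean()                           # mean prediction
bias_error = P_hat - P
variance_error = np.mean((predictions - P_hat) ** 2)
print(f"bias: {bias_error:.3f}, variance: {variance_error:.3f}")
```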

This is the intuition behind bias and variance error. So far, we have shown how bias and variance errors get introduced. Next, we need to understand why there is a tradeoff between them.

Understanding the bias-variance tradeoff:

Figure 3 plots the predictions P_1, …, P_n (dark blue dots) against the correct value P (dark red) at the centre. The predictions may be concentrated (low variance error) yet far from the correct value (high bias error); or they may be centred on the correct value (low bias error) yet spread out (high variance error), as shown in the figure. This follows from the equations for bias and variance error above: for bias error to be low, the mean of the predictions must be close to the actual value (near the centre); for variance error to be low, all the predictions must lie close to the mean of the predictions, i.e. close to one another.

Fig 3. Predictions by each model, with the actual value at the centre

Image Source: Understanding the Bias Variance tradeoff 

So ideally we want low bias and low variance: the predictions should not only be close to the centre but also very close to each other (concentrated). In reality, low bias and low variance together are possible only with an infinite amount of training data, which is impractical. Hence it is difficult to achieve both at once, and there is a trade-off between them. This is called the bias-variance tradeoff.
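To make the tradeoff concrete, here is a rough sketch (same made-up sine setup as the earlier simulation, with polynomial degree standing in for model complexity) that sweeps complexity and prints the squared bias and the variance at one fixed test point:

```python
import numpy as np

rng = np.random.default_rng(0)
test_x, P = 1.0, np.sin(1.0)   # fixed test point and its true value

# Sweep model complexity (polynomial degree): squared bias typically
# shrinks while variance grows -- the tradeoff itself.
for degree in (1, 2, 5, 9):
    preds = []
    for _ in range(200):       # a fresh training dataset each time
        X = rng.uniform(0, 2 * np.pi, size=30)
        y = np.sin(X) + rng.normal(scale=0.3, size=30)
        preds.append(np.polyval(np.polyfit(X, y, deg=degree), test_x))
    preds = np.array(preds)
    print(degree, (preds.mean() - P) ** 2, preds.var())
```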

Note that there are other explanations of the bias-variance tradeoff and these errors. This is just one way to look at bias and variance error, not the only way.
