## Why do ensemble methods have a better chance of giving a better model than an individual model?

Analogy: to understand the reasoning, consider the problem of estimating the parameter of a biased coin. A coin has one parameter: the probability of landing heads. Suppose we are given that the coin is biased, with a 51% chance of coming up heads. For the first few tosses, say on the order of tens,…
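The coin analogy can be checked numerically. The sketch below assumes the analogy follows the usual law-of-large-numbers argument: it computes the exact binomial probability that heads form the majority after a given number of tosses of a 51% coin. The function name `majority_heads_probability` and the toss counts are illustrative choices, not taken from the post.

```python
import math


def majority_heads_probability(n_tosses, p_heads=0.51):
    """Exact probability that more than half of n_tosses come up heads.

    The binomial terms are summed in log space so that large toss
    counts do not overflow.
    """
    log_p = math.log(p_heads)
    log_q = math.log(1.0 - p_heads)
    total = 0.0
    for k in range(n_tosses // 2 + 1, n_tosses + 1):
        log_pmf = (
            math.lgamma(n_tosses + 1)
            - math.lgamma(k + 1)
            - math.lgamma(n_tosses - k + 1)
            + k * log_p
            + (n_tosses - k) * log_q
        )
        total += math.exp(log_pmf)
    return total


# Odd toss counts avoid ties between heads and tails.
for n in (11, 1001, 10001):
    print(n, round(majority_heads_probability(n), 3))
```

With only a handful of tosses the majority is barely better than a single toss, but the probability of a heads majority grows steadily with the number of tosses. An ensemble of many weak, roughly independent models benefits from majority voting in the same way.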

## What is the precision-recall tradeoff?

A tradeoff means that increasing one quantity leads to a decrease in the other. Let us explain this in the context of binary classification, and first define precision and recall. Call one class positive and the other negative. Then TP represents the true positives, which is the number of positive predictions that are actually positive…
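A small sketch makes the tradeoff concrete. It uses the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), and sweeps a decision threshold over made-up classifier scores. The labels, the scores and the helper `precision_recall` are hypothetical, and returning a precision of 1.0 when nothing is predicted positive is just one possible convention.

```python
def precision_recall(y_true, scores, threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1  # true positive
        elif predicted_positive and label == 0:
            fp += 1  # false positive
        elif not predicted_positive and label == 1:
            fn += 1  # false negative
    # Convention (an assumption): precision is 1.0 if nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground-truth labels and classifier scores.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.75, 0.9]

for threshold in (0.2, 0.5, 0.8):
    print(threshold, precision_recall(y_true, scores, threshold))
```

On this toy data, raising the threshold pushes precision up and recall down, which is the tradeoff in question.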

## Why is logistic regression a linear classifier?

There are two parts to this question. First: why is it a classifier when it does regression? Logistic regression is used to estimate the probability that an instance belongs to a particular class. If the estimated probability for an instance is greater than 0.5, then the model predicts that the instance belongs to that…
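The "linear" part can be illustrated with a short sketch. The sigmoid of a score exceeds 0.5 exactly when the score is positive, so thresholding the probability at 0.5 is the same as checking the sign of a linear function of the features. The weights, bias and test points below are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def linear_score(theta, bias, x):
    """Linear function of the features: bias + theta . x"""
    return bias + sum(t * xi for t, xi in zip(theta, x))


def predict_by_probability(theta, bias, x):
    """Predict class 1 when the estimated probability exceeds 0.5."""
    return 1 if sigmoid(linear_score(theta, bias, x)) > 0.5 else 0


def predict_by_sign(theta, bias, x):
    """Predict class 1 when the linear score is positive."""
    return 1 if linear_score(theta, bias, x) > 0 else 0


# Hypothetical model parameters and query points.
theta, bias = [2.0, -1.0], 0.5
points = [[1.0, 1.0], [-1.0, 0.0], [0.0, 3.0], [0.25, 1.0]]

for x in points:
    print(x, predict_by_probability(theta, bias, x), predict_by_sign(theta, bias, x))
```

Both rules agree on every point, so the decision boundary is the hyperplane where the linear score equals zero.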

## How do you detect outliers in data? How do you deal with them?

This is a very common issue in almost any Machine Learning problem. There is no single fixed solution, only heuristics that depend on the problem and the data. There are two types of outliers: univariate and multivariate. Univariate outliers exist when the value of one feature deviates from the other data points on…
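As one example of a univariate heuristic, the sketch below applies the common interquartile-range (IQR) rule: a value is flagged if it lies more than 1.5 IQRs outside the quartiles. This is not necessarily the method the post goes on to describe. The helper name `iqr_outliers`, the multiplier 1.5 and the sample values are assumptions made for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements of a single feature.
feature = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(feature))
```

How to deal with a flagged value (drop it, cap it, or keep it) still depends on the problem and on whether the value is a measurement error or a genuine rare event.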

## How do bias and variance errors get introduced?

Any supervised learning model is the result of optimizing the errors due to model complexity and the training error (the prediction error on examples during training). Example: the Ridge Regression (regularized linear regression) cost function with parameters $\theta$ is given by $J(\theta) = \mathrm{MSE}(\theta) + \alpha \sum_{i=1}^{n} \theta_i^2$, where $\mathrm{MSE}(\theta)$ is the mean squared error of the predictions made (by the model with parameters $\theta$) on training…
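The two competing terms can be sketched in a few lines. The code assumes the usual ridge form, training mean squared error plus a penalty `alpha` times the sum of squared parameters, with no intercept term. The name `alpha`, the function `ridge_cost` and the toy data are illustrative assumptions.

```python
def ridge_cost(theta, X, y, alpha):
    """Training MSE plus an L2 penalty on the parameters (no intercept)."""
    predictions = [sum(t * xi for t, xi in zip(theta, row)) for row in X]
    mse = sum((p - target) ** 2 for p, target in zip(predictions, y)) / len(y)
    penalty = alpha * sum(t ** 2 for t in theta)  # model-complexity term
    return mse + penalty


# Hypothetical data generated by y = 2 * x.
X = [[1.0], [2.0], [3.0]]
y = [2.0, 4.0, 6.0]

print(ridge_cost([2.0], X, y, alpha=0.5))  # perfect fit, cost is penalty only
print(ridge_cost([1.0], X, y, alpha=0.5))  # smaller weight, larger training error
```

A larger `alpha` favours small parameters and a simpler model, at the price of a higher training error; a smaller `alpha` does the opposite.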

## What is the difference between parametric and nonparametric models?

One obvious answer to this question is that parametric models have parameters while nonparametric models do not. But there is a more precise explanation behind this statement. Parametric models have a predetermined number of parameters before training starts. This in turn limits the degrees of freedom of such models. Limited degrees of freedom reduce the…
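A minimal sketch of the distinction, under the assumption that a least-squares line stands in for a parametric model and a 1-nearest-neighbour predictor for a nonparametric one. The function and class names and the data are hypothetical.

```python
def fit_line(xs, ys):
    """Parametric: a least-squares line always has exactly two parameters."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


class NearestNeighbor:
    """Nonparametric: the 'model' is the stored training set itself."""

    def __init__(self, xs, ys):
        self.points = list(zip(xs, ys))

    def predict(self, x):
        # Return the target of the closest stored training point.
        return min(self.points, key=lambda point: abs(point[0] - x))[1]


# Hypothetical training data lying on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

print(fit_line(xs, ys))                 # two numbers, however much data there is
print(len(NearestNeighbor(xs, ys).points))  # grows with the training set
```

The line keeps two parameters whether it is fit on four points or four million, while the nearest-neighbour model grows with the data it is given.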

## Machine Learning Evaluation Metrics

The purpose of any Machine Learning algorithm is to predict the right value/class for unseen data. This is called generalization, and ensuring it can, in general, be very tricky. It can depend on the algorithm being used, for both supervised and unsupervised learning tasks. There are two things to consider in this process: the…

## What is overfitting and underfitting? Why do they occur? How do you overcome them?

There are three parts to this answer: what overfitting and underfitting are, why they occur, and how you can overcome both of them. Overfitting is the result of overtraining the model, while underfitting is the result of keeping the model too simple; both lead to high generalization error. Overtraining leads to a more complex…

## How do you manage not to get overwhelmed by data?

It is important to get comfortable dealing with data as a data scientist. One might have done a PhD and learnt many statistical techniques. However, given a problem, first try to think about how you can solve it, data science or no data science. Try to spend time visualizing the data in a different…

## Is the run-time of an ML algorithm important? How do I evaluate whether the run-time is OK?

Runtime considerations are important for many applications. Typically you should look at both the training time and the prediction time of an ML algorithm. Some common questions to ask include: Training: Do you want to train the algorithm in batch mode? How often do you need to train? If you need to retrain your algorithm every…
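Training time and prediction time can be measured separately with a small timing helper. This is a rough sketch: `timed`, `train` and `predict` are hypothetical stand-ins (here the "model" is just the mean of the data), to be replaced with the real training and prediction calls.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def train(data):
    """Hypothetical stand-in for training: the model is the mean of the data."""
    return sum(data) / len(data)


def predict(model, x):
    """Hypothetical stand-in for prediction: always return the stored mean."""
    return model


model, train_seconds = timed(train, list(range(100_000)))
prediction, predict_seconds = timed(predict, model, 3.0)

print(f"training:   {train_seconds:.6f} s")
print(f"prediction: {predict_seconds:.6f} s")
```

Whether the measured times are acceptable depends on how often retraining is needed and on the latency budget for a single prediction.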