Whether to reduce bias error or variance error?

A supervised learning model’s error can be decomposed into bias error and variance error. See the separate post on bias and variance for an intuitive explanation of both. When a model performs poorly on either the training data or the test data, bias error, variance error, or both may be the cause. Therefore…

How to evaluate word vectors?

Word vectors, whether derived from word2vec, GloVe or co-occurrence statistics, need to be evaluated to measure their quality. This can be done in two major ways, as described below. Intrinsic evaluation is used when word vectors are built or evaluated on a specific, intermediate subtask. Such evaluations are fast to compute…
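A common intrinsic subtask is the word analogy test ("man is to king as woman is to ?"), solved with vector offsets and cosine similarity. The sketch below uses hypothetical, hand-made 3-dimensional vectors so the arithmetic is easy to follow. Real vectors would come from a trained word2vec or GloVe model, and accuracy would be the fraction of analogies in a benchmark answered correctly.

```python
import numpy as np

# Hypothetical toy embeddings, hand-made for illustration only.
# Dimensions (roughly): royalty, gender (+male / -female), fruit-ness.
vectors = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def analogy(a, b, c, vectors):
    """Answer 'a is to b as c is to ?' with the offset vector b - a + c,
    excluding the three query words from the candidates."""
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


print(analogy("man", "king", "woman", vectors))
```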

How do you detect sarcasm?

Sarcasm detection is still an unsolved problem. Some things that make it hard are as follows. Sarcasm is often signalled by tone of voice, which text alone does not carry; this cue is usually missing in traditional text-based NLP approaches. Sarcasm also often requires context to detect. For instance, the sentence ‘I love being rich’ is not necessarily…

What is the most efficient way of serialising machine learning models?

There are three common ways of serialising machine learning models in Python: JSON, Pickle and Joblib. Of these, Joblib is generally the most efficient for scikit-learn models, because it stores large multi-dimensional NumPy arrays efficiently and scikit-learn estimators represent their model parameters as NumPy arrays. Hence it makes sense to use joblib for…
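One concrete benefit of joblib's array handling is memory mapping. An uncompressed joblib file can be loaded with `mmap_mode`, so large arrays are read from disk on demand instead of being copied into memory, which plain `pickle` does not offer. The minimal sketch below assumes a dictionary of hypothetical parameter arrays standing in for a fitted model's internals.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical "model parameters": large NumPy arrays, similar to what
# fitted scikit-learn estimators hold internally.
params = {
    "coef": np.random.default_rng(0).normal(size=(1000, 100)),
    "intercept": np.zeros(100),
}

path = os.path.join(tempfile.mkdtemp(), "params.joblib")
joblib.dump(params, path)  # uncompressed, so arrays can be memory-mapped later

# mmap_mode="r" maps the arrays read-only from disk instead of copying
# them into RAM; mmap_mode has no effect on compressed files.
loaded = joblib.load(path, mmap_mode="r")
print(type(loaded["coef"]).__name__, loaded["coef"].shape)
```

If disk space matters more than load time, `joblib.dump(..., compress=3)` trades memory mapping for a smaller file.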

How do you serialise and deserialise a machine learning model after training?

It is important to serialise a model so that it can later be used to make predictions on unseen data. Think of serialisation simply as storing the machine learning model in a file. That file should contain the model itself, the preprocessing object, the scikit-learn (or other library) version and, if possible, the test accuracy. Instead of maintaining model and preprocessing object…
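A minimal sketch of such a file, assuming joblib as the serialiser and a scikit-learn `Pipeline` to keep the scaler and the model together. The dictionary keys (`pipeline`, `sklearn_version`, `test_accuracy`) are hypothetical names chosen for illustration, and the Iris dataset is used only because it ships with scikit-learn.

```python
import os
import tempfile
import warnings

import joblib
import sklearn
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Preprocessing and model are fitted together so they cannot drift apart.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# Serialise: one file holding the model, its preprocessing, the library
# version and the test accuracy (hypothetical key names).
bundle = {
    "pipeline": pipeline,
    "sklearn_version": sklearn.__version__,
    "test_accuracy": pipeline.score(X_test, y_test),
}
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(bundle, path)

# Deserialise: warn if the runtime scikit-learn differs from training time.
restored = joblib.load(path)
if restored["sklearn_version"] != sklearn.__version__:
    warnings.warn("model was saved with a different scikit-learn version")
predictions = restored["pipeline"].predict(X_test)
```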

How do you deploy machine learning models in production?

Machine learning models can be deployed to production with the following components, in order: an HTTP endpoint that the application, such as a mobile app or a web UI, calls to get a prediction; and a web server running the web application behind that HTTP endpoint, using one of the following options: web hosting frameworks such as Flask and Django, or serverless compute such as AWS…
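A minimal sketch of the first two components using Flask: a `/predict` HTTP endpoint that accepts JSON and returns a prediction. The route name, the `{"features": [...]}` request shape and `predict_one` are hypothetical; `predict_one` stands in for a real model that would normally be loaded once at startup, for example with `joblib.load`.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_one(features):
    # Hypothetical stand-in for a real model loaded at startup, e.g.
    # model = joblib.load("model.joblib"); model.predict([features])[0]
    return int(sum(features) > 1.0)


@app.route("/predict", methods=["POST"])
def predict():
    payload = request.get_json(silent=True)  # None if the body is not valid JSON
    if not payload or "features" not in payload:
        return jsonify(error="expected a JSON body with a 'features' list"), 400
    return jsonify(prediction=predict_one(payload["features"]))
```

During development the app can be started with `app.run()`; in production it would typically sit behind a WSGI server such as Gunicorn.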

Why do ensemble methods have a better chance of producing a better model than an individual model?

Most often, ensemble methods combine weaker models, such as decision trees with low depth or trees that use only a subset of features for splitting. The main reasons ensemble methods work better than individual models are as follows. Each individual model works on some aspect (some of the features) of the dataset. Hence an ensemble is a mixture of many…
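A back-of-the-envelope way to see why combining weak models can help is to compute the accuracy of a majority vote. The sketch below makes a strong, idealized assumption: the n models are binary classifiers whose errors are fully independent, each correct with probability p > 0.5. Real ensemble members are partly correlated, which is why techniques like feature subsampling try to decorrelate them, and the real gain is smaller than this bound suggests.

```python
from math import comb


def majority_vote_accuracy(n_models, p):
    """Probability that a majority of n_models independent binary
    classifiers, each correct with probability p, votes correctly."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    need = n_models // 2 + 1  # correct votes needed for a majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(need, n_models + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6, five independent voters already beat a single model, and accuracy keeps rising as more independent voters are added.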