What is the best strategy for choosing an evaluation metric?

Every machine learning model goes through an evaluation stage. Many metrics are possible, but the rules below make up one of the best strategies for choosing among them:

  1. Application-level tradeoffs drive ML-level tradeoffs, which in turn lead to multiple metrics. Always optimise a single metric and place constraints on the rest.
[Figure: Many applications force you to choose multiple metrics]

As the image above shows, we often end up tracking several metrics, such as accuracy and false positives, or precision and recall. Multiple metrics slow down decision making, because no single number says which model is better.

Another example is shown in the image below:

[Figure: A model with region-wise performance poses a dilemma of which model to choose]

Tradeoffs are not the only cause. Models built for multiple geographies often perform differently in each region. The result is several evaluation results per model, and hence the dilemma of which model to choose.

Hence, always settle on one single metric by using the following strategy:

Optimising and satisficing strategy: optimise just one metric and put constraints on the rest. For example, maximise accuracy while keeping prediction latency under 1000 ms, or maximise accuracy while restricting the number of false positives to 5 per day.

[Figure: Having a single metric allows you to select a model and iterate faster]
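
The optimising and satisficing strategy can be sketched in a few lines of Python. This is only a minimal illustration, not a definitive implementation: the `Candidate` class, the `select_model` function, and the accuracy and latency numbers are all hypothetical and made up for the example. Accuracy plays the role of the optimising metric, and prediction latency is the satisficing metric with the 1000 ms limit mentioned above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical evaluation result for one trained model."""
    name: str
    accuracy: float    # optimising metric: higher is better
    latency_ms: float  # satisficing metric: only needs to stay under a limit


def select_model(candidates: List[Candidate],
                 max_latency_ms: float = 1000.0) -> Optional[Candidate]:
    """Return the most accurate candidate that meets the latency constraint."""
    # Step 1 (satisficing): discard every model that violates the constraint.
    feasible = [c for c in candidates if c.latency_ms <= max_latency_ms]
    if not feasible:
        return None  # no model satisfies the constraint
    # Step 2 (optimising): among the remaining models, maximise one metric.
    return max(feasible, key=lambda c: c.accuracy)


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("model_a", accuracy=0.90, latency_ms=80.0),
    Candidate("model_b", accuracy=0.92, latency_ms=950.0),
    Candidate("model_c", accuracy=0.95, latency_ms=1500.0),
]

best = select_model(candidates)
print(best.name if best else "no feasible model")
```

With these numbers, `model_c` has the highest accuracy but is rejected because it breaks the latency limit, so the choice falls on `model_b`. The decision reduces to one comparison on one metric, which is what makes iteration faster.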

Application-level restrictions also need to be built into the model or its metric. An application like YouTube, for example, should keep adult content out of kids' channels and videos. To accomplish this, the metric should penalise errors on one class more heavily than errors on the other. Please read here for a detailed explanation in the blog.
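
One possible way to penalise one class more than the other is a weighted error rate, sketched below. The function name `weighted_error`, the label convention (1 = adult content, 0 = safe content), and the weight of 10 are assumptions chosen for illustration and do not come from any specific library or from the original post.

```python
from typing import Sequence


def weighted_error(y_true: Sequence[int],
                   y_pred: Sequence[int],
                   adult_weight: float = 10.0) -> float:
    """Weighted error rate in which mistakes on adult examples count more.

    Assumed label convention: 1 = adult content, 0 = safe content.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")
    # Each example is weighted by its true class.
    weights = [adult_weight if t == 1 else 1.0 for t in y_true]
    # Sum the weights of the misclassified examples only.
    wrong = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    return wrong / sum(weights)


y_true = [1, 0, 0, 0]
pred_a = [0, 0, 0, 0]  # lets the adult video through
pred_b = [1, 1, 0, 0]  # blocks the adult video, but also one safe video

print(weighted_error(y_true, pred_a))
print(weighted_error(y_true, pred_b))
```

Both predictions get one example out of four wrong, so plain accuracy cannot tell them apart. Under the weighted metric, `pred_a` scores 10/13 and `pred_b` scores 1/13, so the model that lets adult content through is clearly the worse one.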
