Every machine learning model needs an evaluation stage. Many metrics are possible, but one of the best strategies is to follow the rule below:
- Application-level tradeoffs influence ML-level tradeoffs, which in turn lead to multiple metrics. Always pick one metric to optimise and put constraints on the rest.
As the above image shows, we often end up choosing multiple metrics, like accuracy and false positives, or precision and recall. Multiple metrics slow down the decision-making process.
Another example is shown in the image below:
Beyond tradeoffs, models built for multiple geographies often end up performing differently in each region. The end result is a model with multiple evaluation results, and hence the dilemma of which one to choose.
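One common way to collapse per-region results into a single number is a weighted average, weighting each region by something like its traffic share. A minimal sketch, with hypothetical accuracies and weights:

```python
# Hypothetical per-region accuracies and traffic-share weights.
region_accuracy = {"US": 0.92, "EU": 0.88, "IN": 0.85, "BR": 0.90}
region_weight = {"US": 0.4, "EU": 0.3, "IN": 0.2, "BR": 0.1}  # sums to 1.0

# One single metric: traffic-weighted average accuracy across regions.
single_metric = sum(region_accuracy[r] * region_weight[r] for r in region_accuracy)
print(round(single_metric, 4))  # 0.892
```

Now two candidate models can be compared on one number instead of four.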
Hence, always choose one single metric via the:
Optimising and satisficing strategy: optimise just one metric and put constraints on the rest. For example, improve accuracy while limiting prediction latency to 1000 ms, or restrict false positives to 5 per day while optimising accuracy!
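The latency example above can be sketched as a simple model-selection rule: filter out candidates that violate the satisficing constraint, then optimise accuracy among the rest. The candidate names and numbers here are hypothetical:

```python
# Each candidate: (name, accuracy, latency_ms). Numbers are illustrative.
candidates = [
    ("model_a", 0.91, 1500),  # most accurate, but too slow
    ("model_b", 0.89, 900),
    ("model_c", 0.87, 400),
]

MAX_LATENCY_MS = 1000  # satisficing constraint: a hard cutoff, not optimised

# Keep only candidates that satisfy the latency constraint...
feasible = [c for c in candidates if c[2] <= MAX_LATENCY_MS]

# ...then optimise the single metric (accuracy) among them.
best = max(feasible, key=lambda c: c[1])
print(best[0])  # model_b
```

Note that `model_a` wins on raw accuracy but is rejected outright; the constraint is pass/fail, so only one metric ever needs to be traded off.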
One also needs to incorporate application-level restrictions into the model or the model metric. An application like YouTube should restrict adult content on kids' channels or videos. To accomplish this, the metric should penalise mistakes on one class more heavily than on the other. Please read here for a detailed explanation via blog.
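One way to penalise one class more than the other is a cost-weighted error instead of plain accuracy. A minimal sketch with made-up labels and costs, where letting adult content through (a false negative on the "adult" class) is weighted 10x a false alarm:

```python
# 1 = adult content, 0 = safe content. Labels below are hypothetical.
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

# Illustrative costs: missing adult content on a kids channel is far worse
# than wrongly blocking a safe video.
FN_COST = 10.0  # adult video slips through
FP_COST = 1.0   # safe video wrongly blocked

cost = sum(
    FN_COST if (t == 1 and p == 0)      # missed adult content
    else FP_COST if (t == 0 and p == 1) # false alarm
    else 0.0                            # correct prediction
    for t, p in zip(y_true, y_pred)
)
print(cost)  # 11.0: one 10.0 miss plus one 1.0 false alarm
```

Minimising this single weighted cost pushes the model toward the application's real priority, whereas plain accuracy would treat both mistakes here as equally bad.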