This is a very common issue in almost any machine learning problem. There is no single fixed solution, only heuristics that depend on the problem and the data. There are two types of outliers: univariate and multivariate. A univariate outlier is a data point whose value for a single feature deviates strongly from the other values of that feature. When the data has few dimensions, univariate outliers can be found by plotting the data and looking for points that lie far from most of the rest. One such visualization is the box plot, where outliers appear as individual points beyond the whiskers while the box spans the middle 50% of the data. Multivariate outliers can only be found by looking at the n-dimensional feature space, which is difficult for humans, although bivariate outliers can still be detected using scatter plots. Automated methods for detecting outliers include the Z-score, probabilistic modeling, clustering, and linear regression models.
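The box plot check described above can also be applied numerically, without drawing the plot. The following is a minimal sketch, assuming the common Tukey convention in which the whiskers end 1.5 interquartile ranges beyond the quartiles. The multiplier `k`, the function name `iqr_outliers`, and the sample data are illustrative choices, not part of the original answer.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the box plot whiskers.

    Assumes Tukey's convention: the fences sit k * IQR below the first
    quartile and k * IQR above the third quartile.
    """
    q1, _median, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical single-feature sample with one obviously extreme value.
feature = [10, 12, 11, 13, 12, 11, 10, 95]
print(iqr_outliers(feature))
```

Because the rule is based on quartiles rather than the mean, the extreme value has little influence on where the fences end up.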
The simplest method is the Z-score, which indicates how many standard deviations a data point lies from the mean, assuming a Gaussian distribution. The Z-score is useful for parametric distributions in a low-dimensional space.
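As a rough illustration of the Z-score approach, the sketch below computes z = (x - mean) / standard deviation for each value and flags those whose absolute Z-score exceeds a threshold. The threshold of 3, the function name `zscore_outliers`, and the sample data are assumptions made for this example; the original answer does not specify a cutoff.

```python
from statistics import mean, pstdev

def zscore_outliers(values, threshold=3.0):
    """Return the values whose absolute Z-score exceeds the threshold.

    Assumes the feature is roughly Gaussian, as the Z-score method requires.
    """
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can deviate
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Hypothetical sample: 19 readings close to 50 and one extreme reading.
readings = [48, 52, 50, 49, 51, 50, 47, 53, 50, 49,
            51, 50, 52, 48, 50, 49, 51, 50, 50, 120]
print(zscore_outliers(readings))
```

One caveat with this approach is that the extreme value itself inflates both the mean and the standard deviation, so in very small samples a cutoff of 3 may never be reached.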
DBSCAN is a density-based clustering method that is useful for outlier detection. Points that are not assigned to any cluster, or that form small clusters of their own, are labelled as outliers.
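To make the DBSCAN idea concrete, here is a simplified sketch that identifies only the points DBSCAN would leave unassigned (its noise points), without building the clusters themselves. It relies on the DBSCAN definitions: a core point has at least `min_samples` points (counting itself) within distance `eps`, and a noise point is one that is neither a core point nor within `eps` of a core point. The function name `dbscan_noise`, the parameter values, and the data are hypothetical and chosen only for illustration.

```python
from math import dist

def dbscan_noise(points, eps, min_samples):
    """Return the indices of points that DBSCAN would treat as noise.

    Simplified sketch: it finds noise points only and does not assign
    cluster labels. Neighbor counts include the point itself.
    """
    n = len(points)
    # For every point, collect the indices of all points within eps.
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    # Noise: not a core point and not in the neighborhood of any core point.
    return [
        i for i in range(n)
        if not is_core[i] and not any(is_core[j] for j in neighbors[i])
    ]

# Hypothetical 2-D data: two dense groups and one isolated point.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2),
    (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.2, 5.1),
    (9.0, 0.5),
]
print(dbscan_noise(points, eps=0.5, min_samples=3))
```

In practice a library implementation is usually preferable. For example, scikit-learn's `DBSCAN` assigns the label `-1` to noise points, and the results depend heavily on the choice of `eps` and `min_samples`.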