Feature Engineering (Outliers)

"An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism." [D. Hawkins, Identification of Outliers, Chapman and Hall]
Ways to handle outliers
- Trimming: Removing outliers from the dataset
- Missing Data: Treat outliers as missing data and perform missing data imputation
- Discretisation: Putting outliers into upper or lower bins
- Censoring: Capping, top/bottom coding, winsorization
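The "missing data" route from the list above can be sketched in a few lines. This is only an illustration: the function name `impute_outliers`, the explicit `lower`/`upper` bounds, and the choice of the median as the fill value are assumptions made for the example, not something the list prescribes.

```python
from statistics import median

def impute_outliers(values, lower, upper):
    """Treat values outside [lower, upper] as missing, then fill them in."""
    # Step 1: mark outliers as missing (None).
    masked = [v if lower <= v <= upper else None for v in values]
    # Step 2: impute with the median of the values that were kept.
    # (Raises statistics.StatisticsError if every value was an outlier.)
    fill = median(v for v in masked if v is not None)
    return [fill if v is None else v for v in masked]

ages = [23, 25, 31, 28, 240, 27]
print(impute_outliers(ages, lower=0, upper=120))  # [23, 25, 31, 28, 27, 27]
```

Any other imputation strategy (mean, a model-based estimate) could replace the median in step 2; the point is only that the outlier is first turned into a gap and then filled like any other gap.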
Trimming or Truncation
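Trimming, also known as truncation, removes the outliers from the dataset, so the dataset ends up with fewer rows. The only decision to make is the rule that flags an observation as an outlier, for example its distance from the mean or from the quartiles.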
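As a rough sketch of trimming, the snippet below uses one common rule, the interquartile range (IQR) fences, to decide what counts as an outlier. The rule itself, the multiplier `k=1.5`, and the helper name `trim_outliers` are assumptions chosen for illustration; any other metric could be swapped in.

```python
from statistics import quantiles

def trim_outliers(values, k=1.5):
    """Drop values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Quartiles of the data; needs at least two observations.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Keep only the observations inside the fences.
    return [v for v in values if lower <= v <= upper]

data = [10, 12, 11, 13, 12, 11, 14, 13, 12, 95]
print(trim_outliers(data))  # 95 lies outside the fences and is dropped
```

Note that the result is shorter than the input, which is the defining trait of trimming.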
Censoring or Capping
Censoring, or capping, limits the maximum and/or minimum of a distribution to an arbitrary value. In other words, values above the chosen maximum are replaced by that maximum, and values below the chosen minimum are replaced by that minimum. No observations are removed.
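A minimal sketch of capping, assuming the caps are supplied by hand. The helper name `cap_outliers` and the bounds used in the example are hypothetical; in practice the caps might come from percentiles or from domain knowledge.

```python
def cap_outliers(values, lower=None, upper=None):
    """Replace values below `lower` or above `upper` with the cap itself."""
    capped = []
    for v in values:
        if lower is not None and v < lower:
            v = lower   # bottom coding
        elif upper is not None and v > upper:
            v = upper   # top coding
        capped.append(v)
    return capped

incomes = [10, 12, 11, 95, -40]
print(cap_outliers(incomes, lower=0, upper=20))  # [10, 12, 11, 20, 0]
```

Passing only `upper` (or only `lower`) caps a single tail, and the output always has the same length as the input.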