Feature Engineering (Outliers)

An outlier is an observation which deviates so much from the other observations as to arouse suspicions that it was generated by a different mechanism [D. Hawkins, Identification of Outliers, Chapman and Hall].

Ways to handle outliers

  • Trimming: Removing outliers from the dataset
  • Missing Data: Treat outliers as missing data and perform missing data imputation
  • Discretisation: Putting outliers into upper or lower bins
  • Censoring: Capping, top/bottom coding, winsorization
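The missing-data and discretisation options from the list can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names (`outliers_to_missing`, `impute_median`, `to_bin`), the plausibility limits of 100 and 250, the bin edges, and the sample heights are all made up for the example, and median imputation is just one of several possible imputation choices.

```python
import bisect
import statistics

def outliers_to_missing(values, lower, upper):
    """Replace values outside [lower, upper] with None (treated as missing)."""
    return [v if lower <= v <= upper else None for v in values]

def impute_median(values):
    """Fill None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]

def to_bin(value, edges):
    """Return a bin index; the first and last bins are open-ended,
    so extreme values simply fall into the lowest or highest bin."""
    return bisect.bisect_right(edges, value)

# Hypothetical sample: 999 is an obvious data-entry error.
heights_cm = [158, 162, 165, 170, 171, 175, 180, 999]

# Missing-data approach: mask the outlier, then impute it.
masked = outliers_to_missing(heights_cm, lower=100, upper=250)
imputed = impute_median(masked)

# Discretisation approach: the outlier lands in the top bin.
edges = [160, 170, 180]  # assumed bin edges
bins = [to_bin(h, edges) for h in heights_cm]
```

With discretisation the extreme value is not altered or removed; it only loses its magnitude by sharing the top bin with other large values.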

Trimming or Truncation

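Trimming, also known as truncation, removes the outliers from the dataset. The only decision required is the rule that flags an observation as an outlier, for example a threshold on the distance from the mean or a fence based on the interquartile range.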
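As a minimal sketch of trimming, the snippet below uses the interquartile-range (IQR) rule as the outlier metric. The IQR rule, the multiplier `k=1.5`, the function names, and the sample data are assumptions chosen for illustration; any other metric could be substituted.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def trim(values, k=1.5):
    """Drop every observation that falls outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if lower <= v <= upper]

# Hypothetical sample: 120 is far from the rest of the ages.
ages = [22, 25, 27, 29, 31, 33, 35, 38, 120]
trimmed = trim(ages)
```

Note that trimming shrinks the dataset: here one of nine observations is discarded, which matters when outliers are frequent or the sample is small.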

Censoring or Capping

Censoring, or capping, limits the maximum and/or minimum of a distribution to an arbitrarily chosen value. In other words, values larger than the chosen maximum or smaller than the chosen minimum are replaced by that limit.
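A small sketch of capping, assuming the limits are supplied by hand. The function name `cap`, the limits 0 and 100, and the sample values are hypothetical; in practice the limits might instead come from percentiles or domain knowledge.

```python
def cap(values, lower=None, upper=None):
    """Clamp each value to [lower, upper]; a limit of None means no cap on that side."""
    capped = []
    for v in values:
        if lower is not None:
            v = max(v, lower)  # bottom coding
        if upper is not None:
            v = min(v, upper)  # top coding
        capped.append(v)
    return capped

# Hypothetical sample with one extreme value at each end.
incomes = [-5, 18, 25, 31, 40, 52, 400]
capped = cap(incomes, lower=0, upper=100)
top_only = cap(incomes, upper=100)
```

Unlike trimming, capping keeps every observation, so the number of rows is unchanged; the cost is that the capped values pile up at the limits and distort the tails of the distribution.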