After looking at the data she got from her client, Bailey knew she had a lot of work in front of her.
Many features were categorical, and some had very high cardinality. Bailey planned to use a neural network to process everything, so she had to encode every column.
Which of the following encoding techniques could Bailey use to replace the categorical features of her dataset?
1. Label encoding
2. Ordinal encoding
3. One-hot encoding
4. Target encoding
1, 2, 3, 4
Every option is a valid encoding technique.
Label encoding replaces each category with a consecutive integer starting from 0. We can use label encoding for nominal variables, where order doesn’t matter, for example, “cloudy,” “rainy,” and “sunny.”
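As a minimal sketch of the idea, the plain-Python helper below assigns consecutive integers to the distinct values of a column. The function name `label_encode` and the choice to sort the categories alphabetically (so the mapping is reproducible) are assumptions made for illustration, not part of the original question.

```python
def label_encode(values):
    """Map each distinct category to a consecutive integer starting at 0."""
    # Sorting is an illustrative choice that makes the mapping deterministic.
    categories = sorted(set(values))
    mapping = {category: index for index, category in enumerate(categories)}
    return [mapping[value] for value in values], mapping


weather = ["sunny", "cloudy", "rainy", "sunny"]
weather_codes, weather_mapping = label_encode(weather)
print(weather_mapping)  # {'cloudy': 0, 'rainy': 1, 'sunny': 2}
print(weather_codes)    # [2, 0, 1, 2]
```

The integers carry no meaning beyond identifying the category, which is why this encoding suits variables without a natural order.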
Ordinal encoding works similarly to label encoding, but we use it when the order matters, so the integers are assigned to follow that order. For example, “first,” “second,” and “third.”
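The difference from label encoding can be sketched by passing the category order explicitly. The helper name `ordinal_encode` and its `ordered_categories` parameter are hypothetical; the point is only that the caller, not the data, decides which value maps to which integer.

```python
def ordinal_encode(values, ordered_categories):
    """Replace each category with its position in a caller-supplied order."""
    rank = {category: index for index, category in enumerate(ordered_categories)}
    return [rank[value] for value in values]


places = ["second", "first", "third", "first"]
place_codes = ordinal_encode(places, ["first", "second", "third"])
print(place_codes)  # [1, 0, 2, 0]
```

Because the order is supplied by hand, comparisons such as "first comes before third" survive the encoding as 0 < 2.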
One-hot encoding creates a new binary feature for each unique value of the original categorical variable. For example, a “weather” feature with three values will give us three new features, one for each value of the original “weather” column.
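The “weather” example can be sketched in plain Python as follows. The helper `one_hot_encode` and the `weather_<value>` column naming are assumptions chosen for illustration.

```python
def one_hot_encode(values, prefix):
    """Create one 0/1 column per distinct category in `values`."""
    categories = sorted(set(values))
    return {
        f"{prefix}_{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


weather_column = ["sunny", "cloudy", "rainy", "sunny"]
weather_one_hot = one_hot_encode(weather_column, "weather")
for name, column in weather_one_hot.items():
    print(name, column)
# weather_cloudy [0, 1, 0, 0]
# weather_rainy [0, 0, 1, 0]
# weather_sunny [1, 0, 0, 1]
```

Each row has exactly one 1 across the new columns, and the number of columns grows with the number of distinct values.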
Finally, target encoding helps process categorical features with high cardinality. If we use one-hot encoding on a column with too many different values, we will end up with a wide, sparse representation that is cumbersome to work with. Instead, target encoding replaces the categories of a column with the average target value of all data points belonging to that category.
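A minimal sketch of the averaging step, assuming a hypothetical `city` column and a binary `bought` target that are made up for illustration. The helper `target_encode` is not a library function; in practice the per-category means would usually be computed on the training data only, to avoid leaking the target into the features.

```python
from collections import defaultdict


def target_encode(categories, targets):
    """Replace each category with the mean target of the rows in that category."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for category, target in zip(categories, targets):
        totals[category] += target
        counts[category] += 1
    means = {category: totals[category] / counts[category] for category in totals}
    return [means[category] for category in categories], means


city = ["A", "B", "A", "C", "B", "A"]   # hypothetical categorical column
bought = [1, 0, 0, 1, 1, 1]             # hypothetical binary target
city_encoded, city_means = target_encode(city, bought)
print(city_means)
print(city_encoded)
```

However many distinct cities there are, the result is still a single numeric column, which is what makes this encoding attractive for high-cardinality features.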