Junior suggestions
Amaya leads the team responsible for maintaining a machine learning model that helps run their company’s marketing budget. The model has been running for a long time, and Amaya’s team makes periodic updates and improvements.
But they want more.
The team is looking for a breakthrough that improves the model considerably. Surprisingly, it was the most junior person on the team who came up with two different ideas:
- Take each row separately, and count the missing values across all columns. Then add a new column to the dataset with that number.
- Replace a categorical feature in their dataset with the number of times each of its values appears across all samples.
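To make the two ideas concrete, here is a minimal pandas sketch of what each transformation does mechanically. The toy data and the column names (`channel`, `spend`, `clicks`, `missing_count`, `channel_count`) are hypothetical and are not taken from Amaya's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the columns are illustrative assumptions only.
df = pd.DataFrame({
    "channel": ["email", "search", "email", "social", "email", "search"],
    "spend": [120.0, np.nan, 80.0, 45.0, np.nan, 60.0],
    "clicks": [30.0, 12.0, np.nan, 9.0, np.nan, 15.0],
})
original_columns = ["channel", "spend", "clicks"]

# Idea 1: for each row, count the missing values across all original
# columns and store that count in a new column.
df["missing_count"] = df[original_columns].isna().sum(axis=1)

# Idea 2: replace the categorical feature with the number of times each
# of its values appears across all samples (often called count encoding).
channel_counts = df["channel"].value_counts()
df["channel_count"] = df["channel"].map(channel_counts)
df = df.drop(columns="channel")

print(df)
```

The sketch computes the counts over the whole toy table for simplicity. In a real pipeline the counts would usually be computed on the training split and then reused for validation and production data.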
Which of the following would be your recommendation for Amaya regarding these two ideas?
Amaya shouldn’t consider either of these techniques because they aren’t valid forms of feature engineering.
Amaya should only consider the first technique. The second one is not a valid form of feature engineering.
Amaya should only consider the second technique. The first one is not a valid form of feature engineering.
Amaya should consider both methods.