Baijayanta Roy
1 min readOct 8, 2019


Hi Vikrant Arora, thanks for your queries and comments. I don’t recommend dropping any variable after one-hot encoding as part of feature engineering; as rightly noted in some Stack Overflow comments, this can introduce bias. The intent behind keeping one less variable is simply to have one less variable to deal with when computation is a major challenge during training, since N-1 dummy variables can still encode every category uniquely. A different approach is to combine categories whose counts fall below a threshold into a separate category, say “other”, so that information is not lost. This is also the preferred option when the test data may contain categories that were not present in the training data, since “other” acts as a placeholder for any unseen category. None of these encoding methods is a panacea for every kind of problem; one has to try different methods to see which fits the problem at hand best. A few benchmark results can easily be found through an internet search, but those results are also specific to particular datasets.
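The “other” bucketing idea above can be sketched in a few lines of pandas. This is a minimal illustration, not a library API: the function name `encode_with_other` and the `min_count` parameter are my own invented names for the threshold approach.

```python
import pandas as pd

def encode_with_other(series, min_count=2, other_label="other"):
    """Collapse categories seen fewer than min_count times into other_label,
    then one-hot encode. At prediction time, any unseen category can be
    mapped to other_label, so the encoder has a placeholder column for it."""
    counts = series.value_counts()
    keep = set(counts[counts >= min_count].index)
    collapsed = series.where(series.isin(keep), other_label)
    return pd.get_dummies(collapsed, prefix=series.name)

# "green" appears only once, so it falls below the threshold
colors = pd.Series(["red", "red", "blue", "blue", "green"], name="color")
encoded = encode_with_other(colors, min_count=2)
# columns: color_blue, color_other, color_red ("green" collapsed into "other")
```

For the N-1 variant discussed above, `pd.get_dummies(..., drop_first=True)` drops one dummy column, trading a small loss of explicitness for one fewer variable.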
