Six months of data
Valerie’s company gave her access to their entire dataset from day one.
She used six consecutive months of data: the first five months to train the model and the sixth to test it.
After trying multiple times, Valerie’s model performance on the training data had a large gap with the performance on the test data. The model works well with the training dataset but struggles to keep up with the test data.
Which of the following could be valid reasons for this problem?
Valerie’s optimizer is incorrect for this particular problem.
Valerie is not using the correct loss function for this particular problem.
Valerie’s model doesn’t have enough complexity to capture the relevant signal from the data.
The training data does not come from the same distribution as the test data.