How effective is early stopping when the validation accuracy varies?

I am building a time-series model for forecasting that consists of 2 BiLSTM layers followed by a dense layer. I have a total of 120 products whose values I need to forecast, and a relatively small dataset (monthly data over a period of 2 years, i.e. at most 24 time steps). Looking at the overall validation accuracy, I get this:
At every epoch, I saved the model weights so that I can load any epoch's model at any time in the future.
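In Keras, for example, this per-epoch saving can be done with a ModelCheckpoint callback. Below is a minimal sketch, assuming a toy version of the setup (the layer sizes, input shape and file names are illustrative, and it writes one weight file per epoch to disk rather than keeping them in memory):

```python
# Minimal sketch: save the weights after every epoch so that any epoch can be reloaded later.
# Layer sizes, input shape and file names are assumptions, not the exact setup above.
from tensorflow.keras import layers, models, callbacks

n_steps, n_features = 24, 1  # assumed: 24 monthly time steps, 1 feature per product

model = models.Sequential([
    layers.Bidirectional(layers.LSTM(64, return_sequences=True),
                         input_shape=(n_steps, n_features)),
    layers.Bidirectional(layers.LSTM(32)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# One weight file per epoch; {epoch:03d} keeps the files distinguishable.
ckpt = callbacks.ModelCheckpoint(
    filepath="weights_epoch_{epoch:03d}.h5",
    save_weights_only=True,
    save_freq="epoch",
)
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=150, callbacks=[ckpt])
```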
When I look at the validation accuracy of individual products, I get the following (roughly, for a few products):
For this product, can I use the model saved at epoch ~90 to forecast for it?
And for the following product, can I use the model saved at epoch ~40 for forecasting?
Am I cheating? Please note that the products are quite diverse and their purchasing behavior differs from one to another. To me, following this strategy is equivalent to training 120 models (one per product), while at the same time feeding more data to each model as a bonus, in the hope of doing better per product. Is that a fair assumption?
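A minimal sketch of that per-product selection, assuming you also log each product's validation error at every epoch during training (the dictionary contents and file pattern below are hypothetical and match the checkpoint sketch above):

```python
# Minimal sketch: pick the best epoch per product from hypothetical per-epoch validation errors.
import numpy as np

val_error = {
    "product_A": [0.42, 0.31, 0.18, 0.22],  # hypothetical: one error value per epoch
    "product_B": [0.55, 0.29, 0.33, 0.27],
}
best_epoch = {p: int(np.argmin(e)) + 1 for p, e in val_error.items()}
print(best_epoch)  # e.g. {'product_A': 3, 'product_B': 4}

# At forecast time, reload that product's checkpoint before predicting:
# model.load_weights(f"weights_epoch_{best_epoch['product_A']:03d}.h5")
# forecast_A = model.predict(X_product_A)
```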
Any help is much appreciated!

Related

Huge difference between test and validation accuracy

I have a model (XGBoost) and I'm using it for stock price prediction.
My model works well on a validation set (for example, it makes a small profit over the 3 months of data I used for validation), but it does not generate good predictions on the test dataset (no profit over another 3 months of data).
I'm not sure why that is, and I'm looking for possible hypotheses about why the model is not working properly, so I can design experiments to test them.
So far I have these hypotheses:
the time difference makes the validation dataset unrepresentative (a rolling-origin check of this is sketched after this list)
the parameters are heavily tuned on the validation set, so there is selection bias
there is a problem in the code (normalization, data leakage, noise, ...)
stock price prediction is simply not possible
the model is really behaving randomly and we just selected the best random model on the validation set
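One way to probe the first, second and last hypotheses together is a rolling-origin (walk-forward) evaluation: refit on an expanding window and score only on the period that follows it, repeated over several folds. A minimal sketch with scikit-learn's TimeSeriesSplit, where the DataFrame, feature list and "target" column are assumptions:

```python
# Minimal sketch of rolling-origin evaluation; df, `features` and "target" are
# hypothetical stand-ins for the asker's dataset (rows must be in time order).
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit
from xgboost import XGBRegressor

def walk_forward_mae(df: pd.DataFrame, features, target="target", n_splits=5):
    X, y = df[features].to_numpy(), df[target].to_numpy()
    fold_mae = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = XGBRegressor(n_estimators=200, max_depth=3)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        fold_mae.append(float(np.mean(np.abs(pred - y[test_idx]))))
    return fold_mae  # similar errors across folds argue against luck on one split
```

If the error is stable across folds but still much worse than your tuned validation score, that points more towards selection bias from tuning than towards an unrepresentative period.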

Should I train the AutoML model for a day?

I have 12,000 images spread across 12 categories. I uploaded them and trained for 1 hour (free plan). I am not happy with the average precision of 20%.
If I train for 6 or 12 hours, will I get better precision? If yes, will it be around 70 to 80%?
I am asking this because the training cost is very high and I am not sure if I will get good returns on the investment :)
It's mentioned in the Image Prediction pricing that the free plan runs for one hour and works with only 1,000 images, meaning that your model didn't train on your full 12,000-image dataset. This might explain your low prediction accuracy.
Yes, the price per training hour is high, but you should consider paying so that your model can train on your whole dataset.
I don't know if you'll get 70% or 80% accuracy, because the accuracy of your model generally depends on how long you let it train and on the quality of your training dataset.
Hope this is helpful :D

Alternatives for validating a multiple linear regression time series model

I am using multiple linear regression for sales quantity forecasting in retail. Due to practical issues, I cannot use ARIMA or neural networks.
I split the historical data into training and validation sets. A walk-forward validation method would be computationally quite expensive at this point, so I take the x weeks preceding the current date as my validation set; the time series prior to that is my training set. The problem I notice with this method is that accuracy is far higher during the validation period than for the future predictions. That is, the further we move from the end of the training period, the less accurate the forecast. How best can I control this problem?
Perhaps a smaller validation period will allow the training period to come closer to the current date and hence provide a more accurate forecast, but this hurts the value of the validation.
Another thought is to cheat and feed both the training and validation historical data in during training. As I am not using neural nets, the selected algorithm should not overfit. Please correct me if this assumption is not right.
Any other thoughts or solutions would be most welcome.
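For reference, the degradation described above can be measured directly by grouping validation errors by forecast horizon, i.e. how many weeks past the training cut-off each prediction is. A minimal sketch, where the DataFrame and column names are assumptions:

```python
# Minimal sketch: mean absolute error per forecast horizon (weeks past the training cut-off).
# "df", "week", "actual" and "predicted" are hypothetical names.
import pandas as pd

def error_by_horizon(df: pd.DataFrame, cutoff_week: int) -> pd.Series:
    val = df[df["week"] > cutoff_week].copy()
    val["horizon"] = val["week"] - cutoff_week            # 1 = first week after training
    val["abs_err"] = (val["actual"] - val["predicted"]).abs()
    return val.groupby("horizon")["abs_err"].mean()       # rising values confirm the drift
```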
Thanks and regards, Adeel
If you're not using ARIMA or a DNN, how about using rolling-window regressions to train and test on the historical data?
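A minimal sketch of that rolling-window idea with ordinary least squares, where the window length, DataFrame and column names are assumptions:

```python
# Minimal sketch of rolling-window regression: refit on the most recent `window` weeks,
# predict one step ahead, then slide forward. All names below are hypothetical.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

def rolling_window_mae(df: pd.DataFrame, features, target, window=52):
    preds, actuals = [], []
    for end in range(window, len(df)):
        train = df.iloc[end - window:end]
        test = df.iloc[end:end + 1]                       # the single next week
        model = LinearRegression().fit(train[features], train[target])
        preds.append(float(model.predict(test[features])[0]))
        actuals.append(float(test[target].iloc[0]))
    return float(np.mean(np.abs(np.array(preds) - np.array(actuals))))
```

Because every prediction is made just one step past its own training window, the accuracy measured this way is closer to what the model will achieve in production than a single fixed validation block.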

H2O AutoML: tracking convergence

H2O AutoML stops on a clock. I compared two AutoML runs where one used a subset of what the other had to make the same predictions, and at 3,600 seconds of runtime the fuller model looked better. I repeated this with a 5,000-second re-run, and the subset model looked better. They traded places, and that isn't supposed to happen.
I think it is a convergence issue. Is there any way to track the convergence history of the stacked ensemble learners to determine whether they are relatively stable? We have that for parallel and series CART ensembles, and I don't see why a heterogeneous ensemble wouldn't support the same.
I have plenty of data, and especially with cross-validation, I would prefer not to conclude that the difference was due to the random draws of the training vs. validation sets.
I'm running on relatively high-performance hardware, so I don't think it is a case of "too short a runtime". My "all models" count is between several hundred and a thousand, for what it's worth.
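One way to rule out the split-randomness explanation is to pin the number of CV folds and the seed, then rerun both time budgets and compare leaderboards; if the ranking still flips, the difference is not just the random draws. A minimal sketch with the Python H2O API, where the file name and target column are assumptions:

```python
# Minimal sketch: two AutoML runs with different time budgets but the same seed and
# CV setup, so leaderboard differences are not driven by random train/validation draws.
# "data.csv" and "target" are hypothetical.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
frame = h2o.import_file("data.csv")
predictors = [c for c in frame.columns if c != "target"]

for budget in (3600, 5000):
    aml = H2OAutoML(max_runtime_secs=budget, nfolds=5, seed=42)
    aml.train(x=predictors, y="target", training_frame=frame)
    print(budget, aml.leaderboard.head(rows=5))
```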

Clustering + regression: the right approach or not?

I have the task of predicting how quickly goods will sell (for example, within one category). E.g., the client inputs the price at which he wants his item to be sold, and the algorithm should display that, at that price, the item will be sold within n days. It should have 3 intervals: quick, medium and long sell, as in the picture:
The question: how exactly should I build the algorithm?
My suggestion: use clustering techniques to identify these three price ranges and then solve a regression task within each cluster to predict the number of days. Is this the right approach?
There are two questions here, and I think the answer to each lies in a different domain:
Given an input price, predict how long it will take to sell the item. This is a well-defined prediction problem and can be tackled with ML algorithms, e.g. use your entire dataset to train and test a regression model for prediction.
Translate the prediction into a class: quick-, medium- or slow-sell. This problem is product-oriented: there doesn't seem to be any concrete data allowing you to train a classifier on this translation, and I agree with #anony-mousse that using unsupervised learning might not yield easy-to-use results.
You can either consult your users or a product manager on reasonable thresholds to use (there might be considerations here such as the type of item, the season, etc.), or try getting some additional data in order to train a supervised classifier.
E.g., you could ask your users, post-sale, whether they think the sale was quick, medium or slow. Then you'll have some data to use for thresholding or for classification.
I suggest you simply define thresholds of 10 days and 31 days. Keep it simple.
Because these are values the users will want to understand. If you use clustering, you may end up with thresholds like 0.31415 days or similarly non-intuitive values that you cannot explain to the user anyway.
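Putting the two answers together, a minimal sketch: regress days-to-sell on the listing features, then map the prediction onto the fixed 10-day and 31-day thresholds suggested above (the regressor choice, column names and data layout are assumptions):

```python
# Minimal sketch: predict days-to-sell with any regressor, then bucket the prediction
# into quick / medium / slow with fixed thresholds. Column names are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def train_days_model(df: pd.DataFrame, features, target="days_to_sell"):
    model = GradientBoostingRegressor()
    model.fit(df[features], df[target])
    return model

def sell_speed(days_predicted: float) -> str:
    if days_predicted <= 10:
        return "quick"
    if days_predicted <= 31:
        return "medium"
    return "slow"

# Usage sketch (numeric features assumed, e.g. price, category_id, month):
# model = train_days_model(history, ["price", "category_id", "month"])
# sell_speed(float(model.predict(new_listing)[0]))
```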
