Rolling origin forecast for new values - ARIMA

I've been hearing about the rolling forecast origin, but from what I've seen so far, it is mostly used for cross-validation. I haven't seen anything interesting yet about forecasting new values with it. I wanted to know if it is possible to forecast new values using a rolling forecast origin with ARIMA. My programming language is Python.
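For what it's worth, here is a minimal sketch of the idea with statsmodels' ARIMA: fit on the available history, forecast one step, and append each new observation as it arrives so the origin keeps rolling forward; for genuinely new values with no actuals yet, the last fitted model simply forecasts several steps ahead. The synthetic series, the (1, 1, 1) order, and the full re-fit at every origin are placeholder choices (statsmodels' results.append() can avoid re-fitting from scratch).

```python
# Rolling-origin sketch with statsmodels' ARIMA. Replace the synthetic
# series and the (1, 1, 1) order with your own data and model choice.
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
series = pd.Series(np.cumsum(rng.normal(size=120)))  # placeholder data

train, test = series[:100], series[100:]

history = train.copy()
one_step_forecasts = []
for t in test.index:
    fitted = ARIMA(history, order=(1, 1, 1)).fit()    # re-fit at each origin
    one_step_forecasts.append(fitted.forecast(1).iloc[0])
    history.loc[t] = series.loc[t]                    # roll the origin forward

# For genuinely new values (no actuals yet), the last fitted model just
# forecasts h steps ahead; once each new observation arrives, append it to
# `history` and the origin rolls forward again.
final_fit = ARIMA(history, order=(1, 1, 1)).fit()
print(final_fit.forecast(steps=12))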

Related

CMIP6 future climate monthly timeseries

I am very new to working with future climate datasets. I would like to know if future climate data (total precipitation and min-max temperature) from CMIP6 are available as monthly time series. For example, I would like total precipitation and temperature data for each month of the years between 2022 and 2040. Popular sites such as Chelsa and Worldclim host downscaled future data as monthly averages for select intervals of time (e.g. 12 files representing each of the 12 months between 2022 and 2040). Presumably these were created by averaging the data available for individual months? Are there other options to access downscaled future climate time series data? Any help would be greatly appreciated.
Future climate projections sources
The go-to solution for accessing climate projection data is the ESGF network; for instance, you can find and download CMIP6 projections from this page (and CMIP5 here, Cordex here, etc.).
Alternatively, the Copernicus Climate Data Store may be more user-friendly, though it is less exhaustive. See this page for CMIP6.
In both cases the data are available as NetCDF; using them may require some effort if you are not familiar with this format.
How (not) to use these data
The use and interpretation of these data is not straightforward. In particular, you should definitely not download one dataset, select a year and month and consider the result as a forecast for that period.
Among other issues:
Projections, i.e. simulations of a possible state of the atmosphere at a given point in the future, are not forecasts
There are plenty of climate models and they can yield different results even for similar emissions pathways, so it is risky to pick just one
Climate projections are generally biased and will need to be adjusted before use (a toy illustration of one simple adjustment is sketched below)
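To make the last point a bit more concrete, here is a toy illustration of the simplest possible adjustment, a delta-change (mean-shift) correction on made-up numbers. Real studies use more careful methods such as quantile mapping, often via dedicated packages, but the principle is the same: estimate the model's bias over a historical reference period and correct the projection with it before interpreting its values.

```python
# Toy delta-change bias adjustment; all numbers are synthetic placeholders.
import numpy as np

obs_reference   = np.array([3.1, 2.8, 3.4, 2.9])   # observed monthly means
model_reference = np.array([2.4, 2.2, 2.7, 2.3])   # model over the same period
model_future    = np.array([2.9, 2.6, 3.2, 2.8])   # raw model projection

# Bias estimated over the reference period, applied to the future values.
bias = obs_reference.mean() - model_reference.mean()
adjusted_future = model_future + bias
print(adjusted_future)
```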
You can find in-depth tutorials on how to use climate projections for local impact assessment using Python here. Copernicus also offers free training and online courses.
I'm not sure that SO is the correct platform for this question, but for what it's worth, the Copernicus Climate Data Store has a lot of the core CMIP5 and CMIP6 output at monthly or daily temporal resolution.
Here is the direct link to CMIP6
The nice thing about this site is that if you are already using the API to access other data such as ERA5, it is very quick and easy to grab CMIP data too - it downloads as a zip file of NetCDF files.
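If you already use the CDS API, a download request for monthly CMIP6 data might look roughly like the sketch below. The dataset name and request fields follow the CDS web form for the CMIP6 projections catalogue entry but should be double-checked against the current form; you also need a CDS account and an ~/.cdsapirc credentials file.

```python
# Sketch of downloading monthly CMIP6 data through the Copernicus Climate
# Data Store API. Dataset name and request keys are assumptions taken from
# the CDS web form; verify them in the catalogue before running.
import cdsapi

client = cdsapi.Client()  # reads credentials from ~/.cdsapirc

client.retrieve(
    "projections-cmip6",
    {
        "temporal_resolution": "monthly",
        "experiment": "ssp2_4_5",
        "variable": "precipitation",
        "model": "mpi_esm1_2_lr",
        "year": [str(y) for y in range(2022, 2041)],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "format": "zip",
    },
    "cmip6_pr_monthly.zip",  # arrives as a zip of NetCDF files
)
```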

Alternatives to validate multiple linear regression time series

I am using multiple linear regression for sales quantity forecasting in retail. Due to practical constraints, I cannot use ARIMA or neural networks.
I split the historical data into training and validation sets. Using a walk-forward validation method would be computationally quite expensive at this point, so I take x weeks preceding the current date as my validation set, and the time series prior to x as my training set. The problem I am seeing with this method is that accuracy is far higher during the validation period than for the future predictions. That is, the further we move from the end of the training period, the less accurate the prediction / forecast. How best can I control this problem?
Perhaps a smaller validation period would allow the training period to extend closer to the current date and hence provide a more accurate forecast, but this hurts the value of the validation.
Another thought is to cheat and use both the training and validation history during training. As I am not using neural nets, the selected algorithm should not over-fit. Please correct me if this assumption is wrong.
Any other thoughts or solutions would be most welcome.
Thanks
Regards,
Adeel
If you're not using ARIMA or a DNN, how about using rolling windows of regressions to train and test on the historical data?
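A rough sketch of that idea in Python (scikit-learn, synthetic weekly data; the window and step sizes are arbitrary): fit a linear regression on a fixed-length window, score it on the following block of weeks, then slide both forward.

```python
# Rolling-window (walk-forward) evaluation of a linear regression.
# The data, window length, and step size are placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
n_weeks = 200
X = rng.normal(size=(n_weeks, 3))          # e.g. price, promotion, seasonality
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=n_weeks)

train_len, test_len = 104, 4               # two years of history, 4-week test
scores = []
for start in range(0, n_weeks - train_len - test_len + 1, test_len):
    tr = slice(start, start + train_len)
    te = slice(start + train_len, start + train_len + test_len)
    model = LinearRegression().fit(X[tr], y[tr])
    scores.append(mean_absolute_error(y[te], model.predict(X[te])))

print("MAE per window:", np.round(scores, 3))
print(f"Average MAE: {np.mean(scores):.3f}")
```

Because each window's test block sits right after its training data, the reported error tells you how far ahead the model stays useful, which is exactly the gap you are seeing between validation and live accuracy.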

Historical data on Google Maps Distance Matrix API

Can I get travel time data from the Google Maps Distance Matrix API for past dates, such as November 2017, using driving mode? And if it is possible, how can I do it?
Thank you so much
I've been trying to, but it looks like there is a time threshold after which you just get a ZERO_RESULTS response.
I tried some dates in 2017 and even 2 weeks back with no luck.
You can try it yourself: convert the date you want to epoch time [1] and fill in this request with that time and your API key.
https://maps.googleapis.com/maps/api/distancematrix/json?origins=San%20Francisco,%20CA&destinations=San%20Jose,%20CA&mode=transit&departure_time=(epoch_time)&key=(your_api_key)
[1] https://www.epochconverter.com/
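For reference, the same request from Python with the requests library; YOUR_API_KEY is a placeholder, and for dates too far in the past the element status comes back as ZERO_RESULTS, as noted above.

```python
# Sketch of the same Distance Matrix request from Python.
# YOUR_API_KEY is a placeholder; departure_time is a Unix epoch timestamp.
import time
import requests

params = {
    "origins": "San Francisco, CA",
    "destinations": "San Jose, CA",
    "mode": "transit",
    "departure_time": int(time.time()) + 3600,  # one hour from now
    "key": "YOUR_API_KEY",
}
resp = requests.get(
    "https://maps.googleapis.com/maps/api/distancematrix/json", params=params
)
# Each origin/destination pair appears as an "element" in the response.
print(resp.json()["rows"][0]["elements"][0])
```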
I have been testing, and I believe the limit for historical data is one week, for transit mode only.

Cache Geospatial calculations or calculate on the fly?

I'm a developer on a service vehicle dispatching web app. It's written in .Net 4+, MVC4, using SQL server.
There are 2000+ locations stored in the database as geography data types. Assuming we send resources from location A to location B, the drive time / distance etc. needs to be displayed at some point. If I calculate the distance with SQL Server's STDistance, it will only give me the "as the crow flies" distance. So the system will need to hit a geospatial service like Bing, Google, or ESRI to get the actual drive time or suggested routes. The problem is that this is a core function and will happen a lot.
Should I pre-populate a lookup table with pre-calculated distances or average drive times? The downside is that, even without adding more locations, that's 4 million records to search every time the information is needed.
On top of this, most of the time the destination is not one of our stored geospatial coordinates; it can instead be an address or a long/lat point anywhere on the continent, which makes pre-calculating impossible.
I'm trying to avoid the performance issues of having to hit some geoservices endpoint constantly.
Any suggestions on how best to approach this?
-thanks!
Having looked at these problems before, I'd say you are unlikely to be able to store them all.
It is usually against almost all of the routing providers' terms of service for you to cache the results. You can sometimes negotiate this ability, but it costs a lot.
Given that there is not a fixed set of points you are searching against, doing one calculation gives you little information for the next calculation.
I would say you could maybe store the route for a pair once it has been selected, so you can show that route again if needed. Once the transaction is done, I would remove the route from your DB.
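The app in question is .NET/SQL Server, but the shape of that per-transaction cache is easy to sketch (Python here, purely illustrative; routing_service.get_route is a hypothetical call to whatever provider is used): keep the route keyed by the origin/destination pair while the dispatch is open, and evict it when the transaction closes.

```python
# Illustrative per-transaction route cache. The real app would implement
# this in .NET against SQL Server; routing_service.get_route is a
# hypothetical wrapper around an external routing provider.
class RouteCache:
    def __init__(self, routing_service):
        self.routing_service = routing_service
        self._routes = {}  # (origin, destination) -> route payload

    def get_route(self, origin, destination):
        key = (origin, destination)
        if key not in self._routes:
            # Only hit the external geoservice on a cache miss.
            self._routes[key] = self.routing_service.get_route(origin, destination)
        return self._routes[key]

    def close_transaction(self, origin, destination):
        # Drop the cached route once the dispatch is finished, so nothing
        # is persisted beyond the life of the transaction.
        self._routes.pop((origin, destination), None)
```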
If you really want to cache all of this, or have more control over it, you can use pgRouting (with PostgreSQL) and obtain street data yourself, though I doubt it is worth the effort.

Weather prediction algorithm variety

Currently there's a big 'storm' over the predictions by the Met Office in the UK. They predicted a mild, wet winter, while we have the coldest temperatures on record in Northern Ireland and solid snow on the ground, which is normally rare in December.
It's something I'd love to have a play with (not that I'm claiming I can beat them), but I was wondering what algorithms people are currently working with, and what datasets they base them on.
Possibilities presumably include neural networks that model the inputs with fitness measured by prediction accuracy, complex mathematical models, or even the 'same as yesterday' prediction, which I've heard claimed (although not seen evidence) to be more reliable for single-day prediction (although it obviously drops off after that).
Ideally I'd like to hear from developers in weather centres, or people with access to the supercomputers; it'd be interesting to hear their approaches...
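The 'same as yesterday' idea mentioned in the question is usually called a persistence forecast, and it is at least easy to score yourself; a toy sketch on made-up daily temperatures:

```python
# Toy persistence ("same as yesterday") baseline on made-up temperatures.
import numpy as np

temps = np.array([4.2, 3.8, 5.1, 6.0, 5.7, 2.9, 3.3, 4.0])  # observed degrees C

persistence_forecast = temps[:-1]   # yesterday's value predicts today
actual = temps[1:]
mae = np.mean(np.abs(actual - persistence_forecast))
print(f"Persistence MAE: {mae:.2f} C")
```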
In short, if you intend to build and run your own forecasting model, you will face three major problems:
Access to observations
Development of a mathematical model
Computational power to run your model
Access to observations
As far as I know, access to good meteorological observations costs a lot of money.
You need to have observations from all over the globe and model the state of oceans and atmosphere for the whole planet. Alternatively, you need to obtain so-called lateral boundary conditions from someone who calculates a global model.
Development of a mathematical model
I'm not, and have never been, affiliated with the Met Office, but a couple of years ago I ported and optimized a version of their Unified Model for a supercomputer at our center. Here's how I remember the model.
The Met Office has been developing their Unified Model for the last 20+ years; we're talking about millions of lines of code that contain state-of-the-art ocean/atmospheric models and numerical algorithms. Check out this section of the (outdated) User Guide for a glimpse of the scientific methods used in their model. It's the fruit of, give or take, half a century of well-funded, extensive research by a large community of smart people. If there were a simple solution that consistently gave better results than the complex models, someone would probably have implemented it by now.
To conclude, I guess it's very hard to get even remotely satisfactory results in weather forecasting by building a model from scratch, unless you're an MSc/PhD in atmospheric physics and you've got a couple of years of free time on your hands.
Computational power to run your model
The first forecasting models were run in the middle of the 20th century on machines that cannot match today's cellphones, so, technically, you could calculate something on your PC. However, this type of job is usually done on very, very powerful machines. In fact, 10 systems in the Top500 are dedicated solely to weather forecasting and climate research.
Interesting reads
http://en.wikipedia.org/wiki/Weather_forecasting#How_models_create_forecasts
http://en.wikipedia.org/wiki/Numerical_weather_prediction
http://research.metoffice.gov.uk/research/nwp/numerical/operational/index.html
http://ncas-cms.nerc.ac.uk/html_umdocs/UM55_User_Guide/
UPDATE: It's possible to obtain the source code of the WRF model for free, together with some met data. Note that WRF, the Unified Model, COAMPS, and many other models are written primarily in Fortran.
First off, you can import raw data from http://tgftp.nws.noaa.gov and other weather data sources. The best way for the computer to understand the data is to put it on a map, where each point interacts with its neighbours. The data at each point can represent temperature, pressure, wind speed and direction, cloud coverage, the position of the sun in the sky, visibility, and the last 100 hours of precipitation. You could make predictions, then compare them later to the actual weather as well as to the Weather Service's predictions, and update a climate model for that data point. That way, it could be a self-learning neural network. As far as computation power is concerned: get a Titan, Big Mac!
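As a starting point, pulling a single raw observation from that NOAA server is straightforward; the station path below is an assumption based on how the server is commonly laid out, so check it before relying on it.

```python
# Sketch of fetching one raw METAR observation from the NOAA server above.
# The path /data/observations/metar/stations/<ID>.TXT is an assumption about
# the server's layout; verify it (and pick your own station ID) first.
import urllib.request

station = "KJFK"
url = f"https://tgftp.nws.noaa.gov/data/observations/metar/stations/{station}.TXT"
with urllib.request.urlopen(url) as resp:
    print(resp.read().decode("ascii", errors="replace"))
```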
It seems to be possible to construct a simple forecast model. My watch features a barometer and a thermometer (the latter is not usable at all, because the watch is warmed by my hand). Based solely on those measurements, it has several times warned me of incoming rain (the cloud picture in the upper-left corner) in spite of sunny forecasts from internet sites.
A quick search leads us to the Sager Algorithm, which uses only very simple input data. However, while the implementation claims to be open source, I have failed to locate either the code or scientific papers on the algorithm.
