VAR model in statsmodels (autoregression) - statsmodels

What is the exog_future in VARResults.forecast(y, steps, exog_future=None)?
I have my exog and endog, but in the forecasting function I don't know what to put in exog_future.
I am sure it is not None, because I have an exog set!
Any help, please?

You need to provide values for the exog during the forecast period. If you cannot provide it, then you could include it in the VAR or use a separate forecast for it.
exog could be a deterministic variable like trend or seasonal dummies that we can define based on the time periods or time index.
For some variables we might have a better external estimate, for example if one exog is temperature and we have a weather forecast for the forecast period.
In scenario analysis and with decision variables, we might want to forecast for different given paths of the explanatory variable, for example: what is the forecast if we lower the price by 10% compared to the historical average or trend?

Related

Which machine learning algorithm should I use for sequence prediction?

I have a dataset like the one below. The datetime column is the index, and type is a column holding a sequence. For example, R,C,D,D,D,R,R is a sequence.
start_time type
2019-12-14 09:00:00 RCDDDRR
2019-12-14 10:00:00 CCRD
2019-12-14 11:00:00 DDRRCC
2019-12-14 12:00:00 ?
I want to predict the next sequence at 12:00:00. Which is the best algorithm for predicting the next sequence?
I know that we can use a Markov chain to predict the probable next sequence. However, are there any better algorithms?
Thanks
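For reference, the Markov chain baseline mentioned in the question can be sketched as a first-order model over symbols. This is a minimal sketch under one loud assumption: the hourly sequences are concatenated, so the last symbol of one hour is treated as preceding the first symbol of the next. The output length of 6 is a hypothetical choice, since the question does not say how long the 12:00:00 sequence should be.

```python
import random
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count symbol-to-symbol transitions in the concatenated sequences."""
    counts = defaultdict(Counter)
    stream = "".join(sequences)  # assumption: hours are joined end to start
    for prev, nxt in zip(stream, stream[1:]):
        counts[prev][nxt] += 1
    return counts

def next_symbol_probs(counts, current):
    """Estimated P(next symbol | current symbol) from the transition counts."""
    total = sum(counts[current].values())
    return {sym: n / total for sym, n in counts[current].items()}

def sample_sequence(counts, start, length, seed=0):
    """Generate up to `length` symbols by sampling the transition probabilities."""
    rng = random.Random(seed)
    out, current = [], start
    for _ in range(length):
        probs = next_symbol_probs(counts, current)
        if not probs:  # symbol never seen with a successor
            break
        current = rng.choices(list(probs), weights=list(probs.values()))[0]
        out.append(current)
    return "".join(out)

history = ["RCDDDRR", "CCRD", "DDRRCC"]
counts = fit_transitions(history)
print(next_symbol_probs(counts, history[-1][-1]))  # {'D': 0.25, 'C': 0.5, 'R': 0.25}
print(sample_sequence(counts, history[-1][-1], length=6))
```

Sampling is used instead of always taking the most likely symbol, because repeated argmax tends to get stuck in a loop (here it would produce CCCCCC).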
You can use kNN or SVM for prediction, but first you have to restructure the dataset and define features for the training set.
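To make "define features" concrete, here is a toy sketch with a hand-written k-nearest-neighbours classifier in NumPy. Both the features (count of each symbol plus the sequence length) and the target (the first symbol of the next hour's sequence) are hypothetical choices for illustration; the answer does not specify either, and with only three hours of data this is a demonstration of the data layout, not a usable model.

```python
import numpy as np

SYMBOLS = "RCD"

def features(seq):
    """Hypothetical features: count of each symbol plus the sequence length."""
    return np.array([seq.count(s) for s in SYMBOLS] + [len(seq)], dtype=float)

def knn_predict(X, y, query, k=1):
    """Majority label among the k training rows closest to `query` (Euclidean)."""
    dist = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dist)[:k]
    labels, votes = np.unique(y[nearest], return_counts=True)
    return labels[np.argmax(votes)]

history = ["RCDDDRR", "CCRD", "DDRRCC"]
# Each training pair: features of hour t -> first symbol of hour t+1.
X = np.array([features(s) for s in history[:-1]])
y = np.array([s[0] for s in history[1:]])
print(knn_predict(X, y, features(history[-1]), k=1))  # C
```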
You can also use a method based on deep learning; I think this link can help you:
https://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/
LSTMs have an edge over conventional feed-forward neural networks and plain RNNs in many ways, because they can selectively remember patterns over long durations of time.
Compared with plain RNNs, LSTMs make small modifications to the information through multiplications and additions. In an LSTM, the information flows through a mechanism known as the cell state, which lets the network selectively remember or forget things. The information at a particular cell state has three different dependencies.
Let’s take the example of predicting stock prices for a particular stock. The stock price of today will depend upon:
The trend that the stock has been following in the previous days, maybe a downtrend or an uptrend.
The price of the stock on the previous day, because many traders compare the stock’s previous day price before buying it.
The factors that can affect the price of the stock for today. This can be a new company policy that is being criticized widely, or a drop in the company’s profit, or maybe an unexpected change in the senior leadership of the company.
These dependencies can be generalized to any problem as:
The previous cell state (i.e., the information that was present in the memory after the previous time step).
The previous hidden state (this is the same as the output of the previous cell).
The input at the current time step (i.e., the new information that is being fed in at that moment).
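The three dependencies above map directly onto the inputs of one LSTM step. Below is a minimal NumPy sketch of the standard LSTM cell (input, forget and output gates plus a candidate update); the stacked weight layout and gate order are assumptions of this sketch, and libraries order the gates differently.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    x_t: current input (D,), h_prev: previous hidden state (H,),
    c_prev: previous cell state (H,).
    W: (4H, D), U: (4H, H), b: (4H,), stacked in the assumed order
    [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])            # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])        # forget gate: how much of c_prev to keep
    g = np.tanh(z[2 * H:3 * H])    # candidate values computed from the current input
    o = sigmoid(z[3 * H:4 * H])    # output gate: how much of the cell state to expose
    c_t = f * c_prev + i * g       # cell state changed only by multiply and add
    h_t = o * np.tanh(c_t)         # new hidden state, i.e. the cell's output
    return h_t, c_t

rng = np.random.default_rng(0)
D, H = 3, 2
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(4, D)):  # run a short random input sequence
    h, c = lstm_cell_step(x, h, c, rng.normal(size=(4 * H, D)),
                          rng.normal(size=(4 * H, H)), np.zeros(4 * H))
```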
These links and methods might also help you:
https://www.bioinf.jku.at/publications/older/2604.pdf
https://www.analyticsvidhya.com/blog/2017/12/fundamentals-of-deep-learning-introduction-to-lstm/

How to design an algorithm to match trajectories?

I have a dataset in the form (timestamp, latitude, longitude). I'm going to be given n entries, each of the form (timestamp, latitude, longitude). This is for one user.
User1:(timestamp1,latitude1,longitude1)....(timestamp_n,latitude_n,longitude_n)
Now let's say we have 100 users, each of whom has a set of points (timestamp, latitude, longitude).
I want to know which users might have matching trajectories.
A matching trajectory would be the same route taken, so for a given set of timestamps the latitude and longitude should be the same or close enough, and the timestamps should also be the same or close enough. Close enough could be about 30 seconds for the timestamp and about 200 metres in space. I can do this via a brute-force approach, but I'm looking for better solutions.
You can use a k-d tree or a range tree to index your data. These let you efficiently perform a range query over all three dimensions of your data.
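A minimal sketch of the k-d tree idea using SciPy's cKDTree, under these assumptions: timestamps are in seconds, latitude/longitude are converted to metres with an equirectangular approximation (reasonable over a small area only), and each axis is divided by its tolerance so that a Chebyshev (p=inf) radius of 1 means "within 30 s and within 200 m on each spatial axis". Note this is a box, so the diagonal spatial tolerance is up to about 283 m.

```python
from collections import Counter

import numpy as np
from scipy.spatial import cKDTree

M_PER_DEG = 111_320.0  # approximate metres per degree of latitude

def matching_user_pairs(points, time_tol_s=30.0, dist_tol_m=200.0):
    """points: rows of (user_id, timestamp_s, lat_deg, lon_deg).

    Returns a Counter mapping (user_a, user_b) to the number of point pairs
    that are close in both time and space.
    """
    users = points[:, 0].astype(int)
    t, lat, lon = points[:, 1], points[:, 2], points[:, 3]
    lat0 = np.deg2rad(lat.mean())
    x = lon * M_PER_DEG * np.cos(lat0)   # equirectangular approximation
    y = lat * M_PER_DEG
    # Scale every axis so that its tolerance becomes 1.0.
    scaled = np.column_stack([t / time_tol_s, x / dist_tol_m, y / dist_tol_m])
    tree = cKDTree(scaled)
    pair_counts = Counter()
    for i, j in tree.query_pairs(r=1.0, p=np.inf):
        a, b = int(users[i]), int(users[j])
        if a != b:  # ignore points of the same user
            pair_counts[(min(a, b), max(a, b))] += 1
    return pair_counts

points = np.array([
    [1, 0, 48.0, 11.0],
    [1, 60, 48.001, 11.0],
    [2, 10, 48.0005, 11.0005],
    [2, 70, 48.0012, 11.0003],
    [3, 5, 48.1, 11.1],
])
print(matching_user_pairs(points))  # Counter({(1, 2): 2})
```

How many matching point pairs make two users "the same trajectory" is still a threshold you would have to choose.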
This is quite unrelated to whether the algorithm will still be brute force or not.
What I want to present here is how to measure the difference between 2 paths.
It's just that I think defining precisely how to quantify the difference is important.
If you want something faster, then you can probably approximate this quantity later.
Ok, I think the difference between 2 paths is:
The average distance between 2 users over time.
You should be able to interpolate between 2 given data points to find out where the user is at any given time. Just linear interpolation might suffice.
When I say average over time, one would discretize the time so it is easier to compute.
Let's say:
The average distance between 2 users every 10 seconds period.
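The definition above can be sketched as follows: linearly interpolate each user's latitude and longitude onto a common 10-second grid and average the great-circle (haversine) distance. Restricting the grid to the time window in which both users were observed is an assumption of this sketch.

```python
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(a))

def mean_distance_over_time(track_a, track_b, step_s=10.0):
    """track_*: arrays of rows (timestamp_s, lat, lon), sorted by timestamp.

    Returns the average distance in metres, or None if the tracks never overlap.
    """
    start = max(track_a[0, 0], track_b[0, 0])
    end = min(track_a[-1, 0], track_b[-1, 0])
    if end < start:
        return None  # the two users were never observed at the same time
    grid = np.arange(start, end + 1e-9, step_s)

    def position(track):
        # Linear interpolation of latitude and longitude separately.
        return (np.interp(grid, track[:, 0], track[:, 1]),
                np.interp(grid, track[:, 0], track[:, 2]))

    lat_a, lon_a = position(track_a)
    lat_b, lon_b = position(track_b)
    return float(np.mean(haversine_m(lat_a, lon_a, lat_b, lon_b)))

ta = np.array([[0, 48.0, 11.0], [60, 48.001, 11.001], [120, 48.002, 11.002]])
tb = ta.copy()
tb[:, 1] += 0.001                      # same route, shifted ~111 m north
print(mean_distance_over_time(ta, tb))
```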
Edit: The above suggestion assumed that you care about "timing".
Since you mention the timestamp and all.
If you didn't care about it, you shouldn't have put it into the question in the first place.
Anyway, I kind of imagine that it is possible you want to just look at the path itself.
In that case, you could still use the above definition of path difference
simply by ignoring the actual timestamps and imagining that the users start at the same time at the beginning of their paths.
The travel speed can be set in various ways... such as making both users complete the path at the same time no matter if one path is longer than another, or maybe just let both travel at the same speed.
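The first option above (both users complete the path at the same time) can be sketched by rescaling each track's own time to run from 0 to 1 and then comparing positions at the same normalised time. This sketch assumes each track has at least two distinct timestamps and uses an equirectangular distance approximation, which is only suitable for small areas.

```python
import numpy as np

def path_difference(track_a, track_b, n_samples=100):
    """Average distance in metres between two paths, ignoring real timestamps.

    Each track is an array of rows (timestamp_s, lat, lon) sorted by time.
    Assumption: both users start together and finish together, so each
    track's own time is rescaled to the interval [0, 1].
    """
    s = np.linspace(0.0, 1.0, n_samples)

    def resample(track):
        t = track[:, 0]
        u = (t - t[0]) / (t[-1] - t[0])   # normalised time in [0, 1]
        return np.interp(s, u, track[:, 1]), np.interp(s, u, track[:, 2])

    lat_a, lon_a = resample(track_a)
    lat_b, lon_b = resample(track_b)
    # Equirectangular distance in metres (small-area approximation).
    m_per_deg = 111_320.0
    lat0 = np.radians((lat_a.mean() + lat_b.mean()) / 2)
    dx = (lon_a - lon_b) * m_per_deg * np.cos(lat0)
    dy = (lat_a - lat_b) * m_per_deg
    return float(np.mean(np.hypot(dx, dy)))

# Same route, but user B starts later and travels ten times slower.
a = np.array([[0, 48.0, 11.0], [10, 48.001, 11.001], [20, 48.002, 11.0]])
b = np.array([[100, 48.0, 11.0], [200, 48.001, 11.001], [300, 48.002, 11.0]])
print(path_difference(a, b))  # 0.0
```

The "same speed" alternative would instead resample both paths by distance travelled rather than by normalised time.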
Anyway, it all comes down to defining how you want to measure the path difference.
You need to give more details in the question.

How can I find the nearest point in time series data

I need to calculate the nearest dataPoint in a time series chart to a specific point in the chart.
I obviously cannot use d=sqrt(x*x+y*y), because my x axis is time, so it wouldn't make sense to have an equation that adds distance and time together (x and y need to have the same units). Moreover, visually it may seem right, but it still depends on the scale of the x axis.
So what best logic can I use to find the nearest point?
I can think of using a quadratic form of x (i.e. time) so that my final function can be f(x*x,y), but then it is just a subjective equation.
Does anyone have a better and more logical approach to this? If there is an intuitive, logical approach, I would love it. And if there is a complicated model, I would still like to know about it and explore it.
Thanks
EDIT
To give background: I am polling people to predict where the stock price will be in April (they have to state the exact date when they expect the price to be there). How do I measure their performance?
One intuitive way is by calculating the average absolute change per day.
i.e.
Sum of absolute day-over-day changes / total number of days in the series.
Thereafter I can translate each day into price terms, i.e. the average price change per day.
Thus if the average absolute change per day is, let's say, 2, then a price that is 10 days away can be said to be 20 price points away.
Thereafter I can calculate the distance based on sqrt(x*x+y*y) formula.
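The procedure above can be sketched as follows, assuming a daily series indexed by day offsets. One detail differs slightly from the formula in the question: the average is taken over the number of day-over-day changes (n-1), not the number of days (n). The function name nearest_point is illustrative.

```python
import numpy as np

def nearest_point(day_offsets, prices, guess_day, guess_price):
    """Index of the series point nearest to a (day, price) guess, and its distance.

    Days are converted into price units with the average absolute daily change,
    so both axes share a unit before applying sqrt(dx*dx + dy*dy).
    """
    day_offsets = np.asarray(day_offsets, dtype=float)
    prices = np.asarray(prices, dtype=float)
    avg_abs_change = np.mean(np.abs(np.diff(prices)))  # price units per day
    dx = (day_offsets - guess_day) * avg_abs_change     # days -> price units
    dy = prices - guess_price
    dist = np.sqrt(dx ** 2 + dy ** 2)
    i = int(np.argmin(dist))
    return i, float(dist[i])

# Average absolute change is 2, so one day counts as 2 price points.
print(nearest_point([0, 1, 2, 3], [100, 102, 104, 106], guess_day=2, guess_price=105))
# (2, 1.0)
```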
This can be fine-tuned by using a bell curve (standard deviation and mean) rather than just the mean absolute change per day, but that would make the solution more complicated.

Fuzzy Matching on Date-Type values

I don't have a concrete question; I'm more looking for creative input on a problem.
I want to compare two (most likely unequal) Date values and calculate a ratio of their similarity. For example, comparing 08.01.2013 and 10.01.2013 should give a relatively high value, but comparing 08.01.2013 and 17.04.1998 should give a really low one.
But I'm not sure how exactly I should calculate the similarity. First I thought about turning the Date values into Strings and then using the edit distance on them (the number of single-character operations needed to transform one String into another). This seems like a good idea for some cases and I'll definitely implement it, but I also need an appropriate calculation for something like 31.01.2013 and 02.02.2013.
Why not use the difference in days between two dates as a starting point?
It is "low" for similar dates and "high" for unequal dates, then use arithmetic to obtain a "similarity ratio" which matches your requirements.
Consider a fixed reference date "early enough" in the past if you get stuck.
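One way to sketch this: Python's date.toordinal() counts days from a fixed early reference date (1 January of year 1), so the difference of two ordinals is the difference in days. Mapping that difference to a ratio with 1 / (1 + days / scale_days) is only one possible choice, and scale_days is a hypothetical tuning parameter.

```python
from datetime import date

def date_similarity(a: date, b: date, scale_days: float = 30.0) -> float:
    """Similarity in (0, 1]: 1.0 for equal dates, shrinking as they drift apart."""
    diff_days = abs(a.toordinal() - b.toordinal())  # days since a fixed reference
    return 1.0 / (1.0 + diff_days / scale_days)

print(date_similarity(date(2013, 1, 8), date(2013, 1, 10)))   # close dates: high
print(date_similarity(date(2013, 1, 31), date(2013, 2, 2)))   # also 2 days apart
print(date_similarity(date(2013, 1, 8), date(1998, 4, 17)))   # years apart: low
```

Unlike a string edit distance, this treats 31.01.2013 and 02.02.2013 exactly like 08.01.2013 and 10.01.2013, because both pairs are two days apart.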
The edit distance can be calculated using the Levenshtein distance.
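A standard two-row dynamic-programming implementation of the Levenshtein distance, applied to the date strings from the question, shows its limitation for dates: the pair 31.01.2013 / 02.02.2013 is only two days apart but gets a larger distance than 08.01.2013 / 10.01.2013.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))          # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute if different
        prev = curr
    return prev[-1]

print(levenshtein("08.01.2013", "10.01.2013"))  # 2
print(levenshtein("31.01.2013", "02.02.2013"))  # 3
```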
A change in the year would mean a lot more "distance" than a change in the day.
The usual way to compare dates would be to calculate the distance in days or hours. To do that, you'd convert both dates into a serial day number. Microsoft offers a DateDiff() function for date comparisons and distance calculations.

How to find interesting points in a time series

I have an array of date => value pairs, like this:
"2010-10-12 14:58:36" =>13.4
"2010-10-17 14:58:36" =>12
"2010-10-22 14:58:36" =>17.6
"2010-10-27 14:58:36" =>22
"2010-11-01 14:58:36" =>10
[...]
I use these date-value pairs to draw a graph in JavaScript.
Now I'd like to mark the dates that are "very special".
My problem (and question) is: which aspects should I consider to find those dates?
As a human, I would pick the date "2010-10-17 14:58:36", because "something" must have happened on this date: the value at the next date rises by 5.6 points, which is the biggest step up, followed by another big step up. On the other hand, the date "2010-10-27 14:58:36" is also a "highlight", because
it is the top of all values and
after this date comes the biggest step down.
So as a human, I would choose both dates.
My problem is: what could an algorithm look like?
I tried averaging the values for n dates before and after the current value, which results in an accumulation of those special dates at the beginning and at the end of the graph.
So I tried to find the biggest percentage step up (relative to the previous date), but I'm not sure whether that really finds the specific dates I'm looking for.
How would you tackle the problem?
Thank you.
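For reference, the "human" reasoning in the question can be written down directly: compute the change from each date to the next, then take the date before the biggest rise, the date before the biggest drop, and the date of the maximum. On the five values shown, this picks exactly the two dates the question names. The function name biggest_steps is illustrative.

```python
from datetime import datetime

series = {
    "2010-10-12 14:58:36": 13.4,
    "2010-10-17 14:58:36": 12,
    "2010-10-22 14:58:36": 17.6,
    "2010-10-27 14:58:36": 22,
    "2010-11-01 14:58:36": 10,
}

def biggest_steps(series):
    """Return (date before biggest rise, date before biggest drop, date of max)."""
    dates = sorted(series, key=lambda s: datetime.strptime(s, "%Y-%m-%d %H:%M:%S"))
    values = [series[d] for d in dates]
    steps = [b - a for a, b in zip(values, values[1:])]  # change to the next date
    up = max(range(len(steps)), key=steps.__getitem__)
    down = min(range(len(steps)), key=steps.__getitem__)
    peak = max(range(len(values)), key=values.__getitem__)
    return dates[up], dates[down], dates[peak]

print(biggest_steps(series))
# ('2010-10-17 14:58:36', '2010-10-27 14:58:36', '2010-10-27 14:58:36')
```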
Looks like a financial stock problem :-) You are looking for time series analysis, which is a statistical problem. I'd recommend using the R programming language to play with it (you can do complex statistical things very fast). There are dozens of specialised packages, certainly financial ones too. Once you know what you want, you can implement the solution in any other language.
Just try to google time series analysis r.
EDIT: note that R is very powerful; I'd bet there is a way to use R packages from other languages.
If you have information over a timeline, you could use interpolation.
A Polynomial interpolation will give you an approximated polynomial that goes through the points.
What's nice about this is that you can then use mathematical analysis, which is easy on polynomials, to find interesting points (large gradients, minimum and maximum points, etc.).
You also get an approximation of how the function behaves, so you could extrapolate "future" points and see what may happen in the near future.
Of course, looking into the future isn't very accurate, but forms of interpolation are used in analytics to see trends and behaviours.
And of course, it's easy to plot a polynomial, which is always nice.
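A minimal NumPy sketch of this idea: fit a polynomial with numpy.polyfit, differentiate it with numpy.polyder, and take the real roots of the derivative inside the data range as candidate minima and maxima; the derivative evaluated at each sample gives the gradients. With deg = number of points - 1 the polynomial passes through every point (true interpolation); with many points a high degree oscillates badly, so a lower-degree least-squares fit is usually the safer choice. Converting dates to day offsets is an assumption of this sketch.

```python
import numpy as np

def polynomial_critical_points(x, y, deg):
    """Critical points (p'(x) = 0) of a degree-`deg` polynomial fit, within [min(x), max(x)].

    Also returns the slope p'(x) at each sample, to spot large gradients.
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.polyfit(x, y, deg)          # highest power first
    deriv = np.polyder(coeffs)              # coefficients of p'(x)
    roots = np.roots(deriv)
    real = roots[np.abs(np.imag(roots)) < 1e-9].real
    inside = np.sort(real[(real >= x.min()) & (real <= x.max())])
    slope = np.polyval(deriv, x)
    return inside, slope

# The question's five values, with dates as day offsets from 2010-10-12.
days = [0, 5, 10, 15, 20]
values = [13.4, 12, 17.6, 22, 10]
extrema, slopes = polynomial_critical_points(days, values, deg=len(days) - 1)
print(extrema, slopes)
```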
This is really a question of statistics (http://en.wikipedia.org/wiki/Statistics), of the context of your data and of what you're looking to highlight. For example, the fact that between 12/10 and 17/10 the data moved by -1.4 units may be more useful in some scenarios than a larger positive step change.
You need sample data on which to build a function that can calculate an expected value for any given date, for instance by averaging the values of the day before, the same weekday of the previous week, of the previous month, and so on. After that, decide on a threshold: interesting dates are those for which the real value is outside the expected value ± threshold.
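A simplified sketch of the expected-value-plus-threshold idea: here the expected value is just the mean of the previous `window` values, rather than the day-before / same-weekday / previous-month combination suggested above, which needs much more history than the question shows. Both `window` and `threshold` are hypothetical parameters to tune.

```python
import numpy as np

def flag_unexpected(values, window=3, threshold=5.0):
    """Indices whose value lies outside (mean of previous `window` values) ± threshold."""
    values = np.asarray(values, dtype=float)
    flags = []
    for i in range(window, len(values)):
        expected = values[i - window:i].mean()
        if abs(values[i] - expected) > threshold:
            flags.append(i)
    return flags

values = [13.4, 12, 17.6, 22, 10]   # the values from the question, in date order
print(flag_unexpected(values, window=2, threshold=5.0))  # [3, 4]
```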
