How to use Kalman Filter Theorm to calculate next event date - algorithm

I am having a problem statement which requires that if a particular error/event happens on
1-Jan-2017 and then on
22-Feb-2017,
3-April-2017,
9-July-2017
so i have to predict that when the next event is gonna occur , I am planning to try it with Kalman Filter theorm but it has very statistical terms and on the internet I didn't found any easy explanation or easy programming example for Kalman filter algorithm which estimates the next event dates . Can someone explain in simple terms or any parellel algorithm which can be used for the same purpose ?

Let Ei be the ith event, and let IETi = Ei+1 - Ei be the ith Inter-Event Time, i.e., the time between one event and the next. Then Ei+1 = Ei + IETi — the next event can be forecast from the most recent event based on IET.
Since the past is already determined the only thing random when you're projecting the next event is IET, so E[Ei+1] = Ei + E[IETi] (where E[] denotes a common notation for expected value). You don't need to know the distribution of IET to estimate its expected value, you only need to assume that the IETs are identically distributed. (They don't even need to be independent.) In other words, if IETs are identically distributed then the average of historical IETs is an unbiased estimator of their expected value.
There is a simple Kalman filter estimator to update estimates of an average as you obtain new data. See equations (2) & (3) from this post on math.stackexchange.
Note that this approach just gives a point predictor for the expected value. It won't allow you to make any probability statements about how likely it is the next event happens before or after some specified date. To do that you would need distributional information about the IETs.

Edit: Thanks to #pjs for his remarks. I will update my answer accordingly as soon as I can. However, many authors in the robotics/computer vision communities (e.g. Thrun et al) seem to directly define Kalman filters as Gaussian filters (and for those that are familiar with the computer vision/SLAM litterature, some computer vision works seem to discard standard EKF-based SLAM Since the Gaussian assumption doesn't hold for 3d points). In #pmj's answer, the Gaussian filter is actually nothing more than a running average and doesn't provide covariance (which can in some applications, be considered as the only justification for using a Kalman filter instead of non-linear minimizations on an equivalent cost-function) , so it seems pretty useless without an assumption on the distribution... So I wonder if this is what motivates said authors choices, or if it is just to simplify the discussion.
I think that Kalman filtering has very little to do with what you want to achieve. I will detail why after briefly explaining, in simple terms, what a Kalman filter does.
A Kalman filter estimates the current state x_t of a dynamic system based on all the previous observations, or in more mathematical terms, it models the probability distribution
p(x_t|z_1,...,z_t)
where the z_i are you observations (i.e measurements). Moreover, it is designed with a Gaussian assumption in mind. That is, it assumes that the distribution of your state/errors, including the one above, are Gaussian. Furthermore, it requires a model that links the measurements to the states, something like
z_t=f(x_t)+some_gaussian_noise
and you alos need a transition model, that links the previous state with the current one, e.g.
x_t=g(x_{t-1})+some_gaussian_noise
This comes with the assumption of having a "complete state": the knowledge of the current state is taken to be enough to predict the next one.
So, this is why I think it won't work with your model:
Given the information you've given, I see no sign that you can assume the distribution of the events is Gaussian. Its is probably not.
You don't have any transition equation, I don't even think it is possible to define one for your problem. Moreover, state completion doesn't hold.
Your state, as well as its observations, seem to be discrete, while Kalman filters are designed with continuous parameter spaces in mind.
Unfortunately, you haven't provided much information, so I can only suggest that you model your problem as a Markov chain, which I think you already had thought about.
Hope this helps a bit.

Related

How `vw --audit` internally computes the weights of the features?

In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
From what I understand vowpawabbit tries to fit one linear model to each arm.
So if weights were calculated using an average across all the arms, then they would correlate with getting a reward generally, instead of which features makes the model pick one variant from another.
I am interested know out how they are calculated to see how I can interpret the results obtained. I tried searching its Github repository but could not find anything meaningful.
I am interested know out how they are calculated to see how I can interpret the results obtained.
Unfortunately knowing the first does not lead to knowing the second.
Your question is concerned with contextual bandits, but it is important to note that interpreting model parameters is an issue that also occurs in supervised learning. Machine learning has made progress recently (i.e., my lifetime) largely by focusing concern on quality of predictions rather than meaningfulness of model parameters. In a blog post, Phoebe Wong outlines the issue while being entertaining.
The bottom line is that our models are not causal, so you simply cannot conclude because "the weight of feature X is for arm A is large means that if I were to intervene in the system and increase this feature value that I will get more reward for playing arm A".
We are currently working on tools for model inspection that leverage techniques such as permutation importance that will help you answer questions like "if I were to stop using a particular feature how would the frequency of playing each arm change for the trained policy". We're hoping that is helpful information.
Having said all that, let me try to answer your original question ...
In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
The format is documented here. Assuming you are using --cb (not --cb_adf) then there are a fixed number of arms and so the offset field will increment over the arms. So for an example like
1:2:0.4 |foo bar
with --cb 4 you'll get an audit output with namespace of foo, feature of bar, and offset of 0, 1, 2, and 3.
Interpreting the output when using --cb_adf is possible but difficult to explain succinctly.
From what I understand vowpawabbit tries to fit one linear model to each arm.
Shorter answer: With --cb_type dm, essentially VW independently tries to predict the average reward for each arm using only examples where the policy played that arm. So the weight you get from audit at a particular offset N is analogous to what you would get from a supervised learning model trained to predict reward on a subset of the historical data consisting solely of times the historical policy played arm N. With other --cb_type settings the interpretation is more complicated.
Longer answer: "Linear model" refers to the representation being used. VW can incorporate nonlinearities into the model but let's ignore that for now. "Fit" is where some important details are. VW takes the partial feedback information of a CB problem (partial feedback = "for this example you don't know the reward of the arms not pulled") and reduces it to a full feedback supervised learning problem (full feedback = "for this example you do the reward of all arms"). The --cb_type argument selects the reduction strategy. There are several papers on the topic, a good place to start is Dudik et. al. and then look for papers that cite this paper. In terms of code, ultimately things are grounded here, but the code is written more for performance than intelligibility.

How to deal with asyncronous data in a kalman filter

I'm implementing a Kalman Filter which fuses 3d position data (provided from 2 different computer vision algorithms). I am modeling the problem with a 9-dimensional state vector (position, velocity, and acceleration). However, the data from each sensor does not come at the same time. Since I compute the velocity by considering the time step between the reception of the previous data and the current data point, two consecutive data points can be quite different but separated by only a very small time step, thus making it seem like the position has changed rapidly.
I am wondering if anyone has insight or direction on the best way to approach this problem- will the kalman filter itself be tolerant of this behaviour? Or should I place all data received within a time window into a bin, and perform the update/predict cycle less frequently on a batch of data? The resources I've seen for utilizing kalman filter in object tracking have used only one camera (i.e. synchronous data), so I'm having trouble finding information related to my use case.
Any help is very much appreciated! Thank you!
From all what I got to know from your question and our conversation in the comments let me first shortly describe the issue and suggest the solution.
A quick recap
You have a system with two independent sensors, which take measurements with different rates (30Hz and 5Hz) (and maybe have some time jitter). The good news is that each such measurement is completely sufficient to proceed an update step of your kalman filter. Each measurement have a time stamp.
Another important point is, that the measurements (maybe) have poor precision, so that the change in position looks not plausible.
A possible solution
Define a smallest time interval for calling your kalman filter, so that none of the recieved measurements has to wait too long to be processed. It looks for me like a 100Hz rate could be a good first choice. In this case your dt would be 0.01s.
Design your F and Q matrices based on the chosen dt (they both strongly depend on this value).
In each call without measurement execute the prediction step. As soon as a measurement comes, do update. So your call sequence would look like:
call sequence:
init()
predict()
predict()
predict()
predict()
update(sensor1)
predict()
update(sensor2)
update(sensor1)
predict()
predict()
update(sensor1)
predict()
and so on...
To deal with the precision issue you could use a reference signal (the ground truth). Analyze the error in each sensor reading for each signal (x, y, z) compared to the reference. A kalman filter can work well ONLY with readings, whose error is normally distributed with a zero mean. If you see some systematical offset, may be you can get rid of it. From the observed error you can calculate the standard deviation (and the variance), so you can tell your filter how good the measurements are. It will be your R matrix.
If you don't have a reference you can take some measurements while standing still on the same place. So your reference position would be constant and you could have a look at the dispersion of the readings.
Tune elements of your Q matrix and describe the possible dynamic of your state elements. A smaller Q element for position would tell the filter not to change it too fast. So the (possible) poor performance of your sensors will be partially eliminated (think of a low pass filter as intuition).
I hope it can help you. Please correct me if I understood something wrong.
It would be helpful to see a plot of your sensor readings (and if possible of the reference trajectory).

Binary classification of sensor data

My problem is the following: I need to classify a data stream coming from an sensor. I have managed to get a baseline using the
median of a window and I subtract the values from that baseline (I want to avoid negative peaks, so I only use the absolute value of the difference).
Now I need to distinguish an event (= something triggered the sensor) from the noise near the baseline:
The problem is that I don't know which method to use.
There are several approaches of which I thought of:
sum up the values in a window, if the sum is above a threshold the class should be EVENT ('Integrate and dump')
sum up the differences of the values in a window and get the mean value (which gives something like the first derivative), if the value is positive and above a threshold set class EVENT, set class NO-EVENT otherwise
combination of both
(unfortunately these approaches have the drawback that I need to guess the threshold values and set the window size)
using SVM that learns from manually classified data (but I don't know how to set up this algorithm properly: which features should I look at, like median/mean of a window?, integral?, first derivative?...)
What would you suggest? Are there better/simpler methods to get this task done?
I know there exist a lot of sophisticated algorithms but I'm confused about what could be the best way - please have a litte patience with a newbie who has no machine learning/DSP background :)
Thank you a lot and best regards.
The key to evaluating your heuristic is to develop a model of the behaviour of the system.
For example, what is the model of the physical process you are monitoring? Do you expect your samples, for example, to be correlated in time?
What is the model for the sensor output? Can it be modelled as, for example, a discretized linear function of the voltage? Is there a noise component? Is the magnitude of the noise known or unknown but constant?
Once you've listed your knowledge of the system that you're monitoring, you can then use that to evaluate and decide upon a good classification system. You may then also get an estimate of its accuracy, which is useful for consumers of the output of your classifier.
Edit:
Given the more detailed description, I'd suggest trying some simple models of behaviour that can be tackled using classical techniques before moving to a generic supervised learning heuristic.
For example, suppose:
The baseline, event threshold and noise magnitude are all known a priori.
The underlying process can be modelled as a Markov chain: it has two states (off and on) and the transition times between them are exponentially distributed.
You could then use a hidden Markov Model approach to determine the most likely underlying state at any given time. Even when the noise parameters and thresholds are unknown, you can use the HMM forward-backward training method to train the parameters (e.g. mean, variance of a Gaussian) associated with the output for each state.
If you know even more about the events, you can get by with simpler approaches: for example, if you knew that the event signal always reached a level above the baseline + noise, and that events were always separated in time by an interval larger than the width of the event itself, you could just do a simple threshold test.
Edit:
The classic intro to HMMs is Rabiner's tutorial (a copy can be found here). Relevant also are these errata.
from your description a correctly parameterized moving average might be sufficient
Try to understand the Sensor and its output. Make a model and do a Simulator that provides mock-data that covers expected data with noise and all that stuff
Get lots of real sensor data recorded
visualize the data and verify your assuptions and model
annotate your sensor data i. e. generate ground truth (your simulator shall do that for the mock data)
from what you learned till now propose one or more algorithms
make a test system that can verify your algorithms against ground truth and do regression against previous runs
implement your proposed algorithms and run them against ground truth
try to understand the false positives and false negatives from the recorded data (and try to adapt your simulator to reproduce them)
adapt your algotithm(s)
some other tips
you may implement hysteresis on thresholds to avoid bouncing
you may implement delays to avoid bouncing
beware of delays if implementing debouncers or low pass filters
you may implement multiple algorithms and voting
for testing relative improvements you may do regression tests on large amounts data not annotated. then you check the flipping detections only to find performance increase/decrease

What are good algorithms for detecting abnormality?

Background
Here is the problem:
A black box outputs a new number each day.
Those numbers have been recorded for a period of time.
Detect when a new number from the black box falls outside the pattern of numbers established over the time period.
The numbers are integers, and the time period is a year.
Question
What algorithm will identify a pattern in the numbers?
The pattern might be simple, like always ascending or always descending, or the numbers might fall within a narrow range, and so forth.
Ideas
I have some ideas, but am uncertain as to the best approach, or what solutions already exist:
Machine learning algorithms?
Neural network?
Classify normal and abnormal numbers?
Statistical analysis?
Cluster your data.
If you don't know how many modes your data will have, use something like a Gaussian Mixture Model (GMM) along with a scoring function (e.g., Bayesian Information Criterion (BIC)) so you can automatically detect the likely number of clusters in your data. I recommend this instead of k-means if you have no idea what value k is likely to be. Once you've constructed a GMM for you data for the past year, given a new datapoint x, you can calculate the probability that it was generated by any one of the clusters (modeled by a Gaussian in the GMM). If your new data point has low probability of being generated by any one of your clusters, it is very likely a true outlier.
If this sounds a little too involved, you will be happy to know that the entire GMM + BIC procedure for automatic cluster identification has been implemented for you in the excellent MCLUST package for R. I have used it several times to great success for such problems.
Not only will it allow you to identify outliers, you will have the ability to put a p-value on a point being an outlier if you need this capability (or want it) at some point.
You could try line fitting prediction using linear regression and see how it goes, it would be fairly easy to implement in your language of choice.
After you fitted a line to your data, you could calculate the mean standard deviation along the line.
If the novel point is on the trend line +- the standard deviation, it should not be regarded as an abnormality.
PCA is an other technique that comes to mind, when dealing with this type of data.
You could also look in to unsuperviced learning. This is a machine learning technique that can be used to detect differences in larger data sets.
Sounds like a fun problem! Good luck
There is little magic in all the techniques you mention. I believe you should first try to narrow the typical abnormalities you may encounter, it helps keeping things simple.
Then, you may want to compute derived quantities relevant to those features. For instance: "I want to detect numbers changing abruptly direction" => compute u_{n+1} - u_n, and expect it to have constant sign, or fall in some range. You may want to keep this flexible, and allow your code design to be extensible (Strategy pattern may be worth looking at if you do OOP)
Then, when you have some derived quantities of interest, you do statistical analysis on them. For instance, for a derived quantity A, you assume it should have some distribution P(a, b) (uniform([a, b]), or Beta(a, b), possibly more complex), you put a priori laws on a, b and you ajust them based on successive information. Then, the posterior likelihood of the info provided by the last point added should give you some insight about it being normal or not. Relative entropy between posterior and prior law at each step is a good thing to monitor too. Consult a book on Bayesian methods for more info.
I see little point in complex traditional machine learning stuff (perceptron layers or SVM to cite only them) if you want to detect outliers. These methods work great when classifying data which is known to be reasonably clean.

Predict next event occurrence, based on past occurrences

I'm looking for an algorithm or example material to study for predicting future events based on known patterns. Perhaps there is a name for this, and I just don't know/remember it. Something this general may not exist, but I'm not a master of math or algorithms, so I'm here asking for direction.
An example, as I understand it would be something like this:
A static event occurs on January 1st, February 1st, March 3rd, April 4th. A simple solution would be to average the days/hours/minutes/something between each occurrence, add that number to the last known occurrence, and have the prediction.
What am I asking for, or what should I study?
There is no particular goal in mind, or any specific variables to account for. This is simply a personal thought, and an opportunity for me to learn something new.
I think some topics that might be worth looking into include numerical analysis, specifically interpolation, extrapolation, and regression.
This could be overkill, but Markov chains can lead to some pretty cool pattern recognition stuff. It's better suited to, well, chains of events: the idea is, based on the last N steps in a chain of events, what will happen next?
This is well suited to text: process a large sample of Shakespeare, and you can generate paragraphs full of Shakespeare-like nonsense! Unfortunately, it takes a good deal more data to figure out sparsely-populated events. (Detecting patterns with a period of a month or more would require you to track a chain of at least a full month of data.)
In pseudo-python, here's a rough sketch of a Markov chain builder/prediction script:
n = how_big_a_chain_you_want
def build_map(eventChain):
map = defaultdict(list)
for events in get_all_n_plus_1_item_slices_of(eventChain):
slice = events[:n]
last = events[-1]
map[slice].append(last)
def predict_next_event(whatsHappenedSoFar, map):
slice = whatsHappenedSoFar[-n:]
return random_choice(map[slice])
There is no single 'best' canned solution, it depends on what you need. For instance, you might want to average the values as you say, but using weighted averages where the old values do not contribute as much to the result as the new ones. Or you might try some smoothing. Or you might try to see if the distribution of events fits a well-kjnown distribution (like normal, Poisson, uniform).
If you have a model in mind (such as the events occur regularly), then applying a Kalman filter to the parameters of that model is a common technique.
The only technique I've worked with for trying to do something like that would be training a neural network to predict the next step in the series. That implies interpreting the issue as a problem in pattern classification, which doesn't seem like that great a fit; I have to suspect there are less fuzzy ways of dealing with it.
The task is very similar to language modelling task where given a sequence of history words the model tries to predict a probability distribution over vocabulary for the next word.
There are open source softwares such as SRILM and NLTK that can simply get your sequences as input sentences (each event_id is a word) and do the job.
if you merely want to find the probability of an event occurring after n days given prior data of its frequency, you'll want to fit to an appropriate probability distribution, which generally requires knowing something about the source of the event (maybe it should be poisson distributed, maybe gaussian). if you want to find the probability of an event happening given that prior events happened, you'll want to look at bayesian statistics and how to build a markov chain from that.
You should google Genetic Programming Algorithms
They (sort of like the Neural Networks mentioned by Chaos) will enable you to generate solutions programmatically, then have the program modify itself based on a criteria, and create new solutions which are hopefully closer to accurate.
Neural Networks would have to be trained by you, but with genetic programming, the program will do all the work.
Although it is a hell of a lot of work to get them running in the first place!

Resources