Anomaly detection using TensorFlow Probability

I was looking for a way to do anomaly detection on univariate time series data and thought of using the TensorFlow Probability anomaly_detection module.
I have daily data for 3 months, spread across the 12.5 to 20.0 range.
On every Friday I injected a value between 25.5 and 27.5.
The code worked well and did not flag these values as anomalies.
My question is: if the value for one Friday is instead 15.5, it should be treated as an anomaly, because on Fridays the values are expected to be between 25.5 and 27.5.
Do I need to change any parameter to take seasonality into account?
My code is as below:
# tfp_ad: alias for TensorFlow Probability's STS anomaly-detection module
predictions = tfp_ad.detect_anomalies(data,
                                      anomaly_threshold=0.01,
                                      use_gibbs_predictive_dist=False,
                                      num_warmup_steps=50,
                                      num_samples=100,
                                      jit_compile=False,
                                      seed=None)
Also, what is the importance of the anomaly_threshold parameter?

Related

amCharts 4: How to handle real gaps?

I have date-based data with real gaps in it: not only is the value missing, the date is missing as well. In amCharts 3, with graph.connect = false, the date-based chart produced gaps.
In amCharts 4, series.connect = false only works if at least one data point follows with only the date but without the value.
Is it somehow possible to produce gaps when the whole data point is missing?
The demos with gaps always have at least one data point for the gap.
To continue with David Liang's answer, you have to set connect = false and use autoGapCount. But be careful, as it won't work with XYCharts!
Taken from amcharts:
The feature (available since version 4.2.4) responsible for that is called autoGapCount, and it works like this: if the distance between two adjacent data points is bigger than baseInterval * autoGapCount and connect = false, the line breaks.
The default for this setting is 1.1, which means that if the distance is at least 10% greater than the granularity of the data, we consider it a gap.
Looking at the sample data above, we have daily granularity: the distance between each data point is one day (24 hours). Now, since April 22nd is missing, the distance between the 21st and 23rd becomes two days (48 hours), which satisfies the "gap" criteria (24 hours * 1.1 = 26.4 hours). Boom - a gap.
This allows fine-grained control of what to consider a gap in the data. Say we want a line to break only if there's a distance of three days between data points: we set autoGapCount = 3. Now, if there's a distance of one or two days between two data points, they'll be connected, even if connect = false.
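To make the rule concrete, here is a small Python sketch (illustrative only, not amCharts code; the find_gaps helper is hypothetical) that flags where gaps would appear for a given base interval and autoGapCount:

from datetime import datetime, timedelta

def find_gaps(timestamps, base_interval, auto_gap_count=1.1):
    """Return index pairs (i, i+1) whose spacing exceeds base_interval * auto_gap_count."""
    limit = base_interval * auto_gap_count
    return [(i, i + 1)
            for i in range(len(timestamps) - 1)
            if timestamps[i + 1] - timestamps[i] > limit]

days = [datetime(2019, 4, d) for d in (20, 21, 23, 24)]    # April 22nd is missing
print(find_gaps(days, base_interval=timedelta(hours=24)))  # -> [(1, 2)]: the line breaks after the 21st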

AutoML Vision: Predictions include --other-- field

I have just trained a new model with a binary outcome (elite/non-elite). The model trained well, but when I tested a new image on it in the GUI it returned a third label --other--. I am not sure how/why that has appeared. Any ideas?
When multi-class (single-label) classification is used, there is an assumption that the confidences of all predictions must sum to 1 (as exactly one valid label is assumed). This is achieved by using the softmax function. It normalizes all predictions to sum to 1, which has some drawbacks - for example, if both raw predictions are very low, say "elite" at 0.0001 and "Non_elite" at 0.0002, then after normalization the predictions become 0.333 and 0.666 respectively.
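A tiny numpy sketch of that normalization effect (illustrative only, not AutoML code):

import numpy as np

def softmax(logits):
    """Exponentiate and normalize so the outputs sum to 1."""
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

# Even though both raw scores are tiny and nearly equal, normalization forces
# the confidences to sum to 1, hiding how low the absolute confidence really is.
raw = np.array([0.0001, 0.0002])      # "elite", "Non_elite"
print(raw / raw.sum())                # -> [0.333... 0.666...]
print(softmax(np.log(raw)))           # same result via softmax over log-scores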
To work around that, the AutoML system allows an extra label (--other--) to indicate that none of the allowed predictions seems valid. This label is an implementation detail and shouldn't be returned by the system (it should be filtered out). This should get fixed in the near future.

Gensim Doc2Vec - Why does infer_vector() use alpha?

I am trying to map sentences to vectors in order to make sentences comparable to each other. To test gensim's Doc2Vec model, I downloaded sklearn's newsgroup dataset and trained the model on it.
In order to compare two sentences, I use model.infer_vector(), and I am wondering why two calls with the same sentence give me different vectors:
from gensim.models.doc2vec import Doc2Vec
import numpy as np

model = Doc2Vec(vector_size=100, window=8, min_count=5, workers=6)
model.build_vocab(documents)
epochs = 10
for epoch in range(epochs):
    print("Training epoch %d" % (epoch + 1))
    model.train(documents, total_examples=len(documents), epochs=epochs)
    v1 = model.infer_vector("I feel good")
    v2 = model.infer_vector("I feel good")
    print(np.linalg.norm(v1 - v2))
Output:
Training epoch 1
0.41606528
Training epoch 2
0.43440753
Training epoch 3
0.3203116
Training epoch 4
0.3039317
Training epoch 5
0.68224543
Training epoch 6
0.5862567
Training epoch 7
0.5424634
Training epoch 8
0.7618142
Training epoch 9
0.8170159
Training epoch 10
0.6028216
If I set alpha and min_alpha = 0 I get consistent vectors for "I feel fine" and "I feel good", but the model gives me the same vector in every epoch, so it does not seem to learn anything:
Training epoch 1
0.043668125
Training epoch 2
0.043668125
Training epoch 3
0.043668125
Training epoch 4
0.043668125
Training epoch 5
0.043668125
Training epoch 6
0.043668125
Training epoch 7
0.043668125
Training epoch 8
0.043668125
Training epoch 9
0.043668125
Training epoch 10
0.043668125
So my questions are:
Why do I even have the option to specify a learning rate for inference? I would expect the model to change only during training, not during inference.
If I specify alpha=0 for inference, why does the distance between those two vectors not change during different epochs?
Inference uses an alpha because it is the same iterative adjustment process as training, just limited to updating the one new vector for the one new text example.
So yes, the model's various weights are frozen. But the one new vector's weights (dimensions) start at small random values, just as every other vector also began, and then get incrementally nudged over multiple training cycles to make the vector work better as a doc-vector for predicting the text's words. Then the final new-vector is returned.
Those nudges begin at the larger starting alpha value, and wind up as the negligible min_alpha. With an alpha at 0.0, no training/inference can happen, because every nudge-correction to the updatable weights is multiplied by 0.0 before it's applied, meaning no change happens.
Separate from that, your code has a number of problems that may prevent desirable results:
By calling train() epochs times in a loop, and then also supplying a value larger than 1 for epochs, you're actually performing epochs * epochs total training passes.
Further, by leaving alpha and min_alpha unspecified, each call to train() will descend the effective alpha from its high value to its low value - a sawtooth pattern that's not proper for this kind of stochastic-gradient-descent optimization. (There should be a warning in your logs about this error.)
It's rare to need to call train() multiple times in a loop. Just call it once, with the right epochs value, and it will do the right thing: that many passes, with a smoothly-decaying alpha learning rate.
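For example, a minimal corrected training setup (assuming documents is a list of TaggedDocument objects, as in the usual Doc2Vec workflow) would be:

from gensim.models.doc2vec import Doc2Vec

model = Doc2Vec(vector_size=100, window=8, min_count=5, workers=6, epochs=10)
model.build_vocab(documents)
# A single call to train(); gensim performs the 10 passes and decays alpha itself.
model.train(documents, total_examples=model.corpus_count, epochs=model.epochs)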
Separately, when calling infer_vector():
it needs a list-of-tokens, just like the words property of the training examples that were items in documents - not a string. (By supplying a string, it looks like a list-of-characters, so it will be inferring a doc-vector for the document ['I', ' ', 'f', 'e', 'e', 'l', ' ', 'g', 'o', 'o', 'd'], not ['I', 'feel', 'good'].)
those tokens should be preprocessed the same way as the training documents - for example, if they were lowercased there, they should be lowercased before being passed to infer_vector()
the default number of inference passes (5) is very small, especially for short texts - many report better results with a value in the tens or hundreds
the default inference alpha=0.1 is somewhat large compared to the training default of 0.025; using the training value (especially with more passes) often gives better results
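A sketch of inference adjusted along those lines (recent gensim accepts epochs for infer_vector, older releases named this parameter steps; the lowercased split below assumes the training documents were preprocessed the same way):

import numpy as np

tokens = "I feel good".lower().split()      # list of tokens, preprocessed like the training data
v1 = model.infer_vector(tokens, epochs=50, alpha=0.025)
v2 = model.infer_vector(tokens, epochs=50, alpha=0.025)
print(np.linalg.norm(v1 - v2))              # much smaller now, though rarely exactly 0.0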
Finally, just like the algorithm during training makes use of randomization (to adjust word-prediction context windows, or randomly-sample negative examples, or randomly down-sample highly-frequent words), the inference does as well. So even supplying the exact same tokens won't automatically yield the exact same inferred-vector.
However, if the model has been sufficiently-trained, and the inference is adjusted as above for better results, the vectors for the same text should be very, very close. And because this is a randomized algorithm with some inherent 'jitter' between runs, it's best to make your downstream evaluations and uses tolerant to such small variances. (And, if you're instead seeing large variances, correct other model/inference issues, usually with more data or other parameter adjustments.)
If you want to force determinism, there's some discussion of how to do that in a gensim project issue. But, understanding & tolerating the small variances is often more consistent with the choice of such a randomly-influenced algorithm.

Suitable formula/algorithm for detecting temperature fluctuations

I'm creating an app to monitor water quality. The temperature data is pushed to a Firebase Realtime Database every 2 minutes. The app has two requirements:
1) It should alert the user when the temperature exceeds 33 degrees or drops below 23 degrees - this part is done.
2) It should alert the user when there is a big temperature fluctuation, analysing the data every 30 minutes - this is the part I'm confused about.
I don't know what algorithm to use to detect a big temperature fluctuation over a period of time and alert the user. Can someone help me with this?
For a period of 30 minutes, your app would give you 15 values.
If you want to detect a big change in this data, there is one way to do so.
You can implement the following method:
Calculate the mean and the standard deviation of the values.
Subtract the mean from each value and take the absolute value of the result.
Check whether the absolute value is greater than one standard deviation; if it is, you have a big fluctuation.
See this example for a better understanding:
Let's suppose you have these values for 10 minutes:
25, 27, 24, 35, 28
First Step:
Mean = 27 (approx.)
One standard deviation = 3.8
Second Step: Absolute(Data - Mean)
abs(25-27) = 2
abs(27-27) = 0
abs(24-27) = 3
abs(35-27) = 8
abs(28-27) = 1
Third Step:
Check whether any of the differences is greater than the standard deviation.
abs(35-27) gives 8, which is greater than 3.8.
So, there is a big fluctuation. If all the differences are less than the standard deviation, then there is no big fluctuation.
You can further improve the result by using two or three standard deviations instead of one.
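A minimal Python sketch of this method; the k parameter is the "one, two or three standard deviations" knob mentioned above:

import statistics

def has_big_fluctuation(values, k=1.0):
    """Flag readings that deviate from the mean by more than k standard deviations."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)       # population standard deviation
    outliers = [v for v in values if abs(v - mean) > k * stdev]
    return bool(outliers), outliers

print(has_big_fluctuation([25, 27, 24, 35, 28]))   # -> (True, [35])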
Start by defining what you mean by fluctuation.
You don't say what temperature scale you're using: Fahrenheit, Celsius, Rankine, or Kelvin?
Your sampling rate is a new data value every two minutes. Do you define fluctuation as the absolute value of the difference between the last value and the current value? That's defensible.
If the maximum allowable absolute difference is some multiple of your 33 - 23 = 10 degree range, you're in business.
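If you go with that definition, the check is tiny; here is a sketch where max_delta is a hypothetical threshold you would tune against the 33 - 23 = 10 degree operating range:

def big_jump(previous, current, max_delta=2.5):
    """True when the change between consecutive readings exceeds the allowed delta."""
    return abs(current - previous) > max_delta

print(big_jump(26.0, 31.5))   # -> True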

Some details about adjusting cascaded AdaBoost stage threshold

I have implemented the AdaBoost algorithm and currently I am trying to implement the so-called Cascaded AdaBoost, based on P. Viola and M. Jones' original paper. Unfortunately I have some doubts connected with adjusting the threshold for one stage. As we can read in the original paper, the procedure is described in literally one sentence:
Decrease threshold for the ith classifier until the current cascaded classifier has a detection rate of at least d × D_{i-1} (this also affects F_i)
I am mainly unsure about these things:
What is the threshold? Is it the value of the 0.5 * sum(alpha) expression, or only the 0.5 factor?
What should be the initial value of the threshold? (0.5?)
What does "decrease threshold" mean in detail? Do I need to iteratively select a new threshold, e.g. 0.5, 0.4, 0.3? What is the step of the decrease?
I have tried to search for this information on Google, but unfortunately I could not find anything useful.
Thank you for your help.
I had the exact same doubt and have not found any authoritative source so far. However, this is my best guess on this issue:
1. 0.5 * sum(alpha) is the threshold.
2. The initial value of the threshold is the expression above. Next, try to classify the samples using the intermediate strong classifier (what you currently have). You'll get the scores each of the samples attains, and depending on the current value of the threshold, some of the positive samples will be classified as negative, etc. So, depending on the detection rate desired for this stage (strong classifier), reduce the threshold so that that many positive samples get correctly classified,
e.g.:
say the threshold was 10, and these are the current classifier outputs for the positive training samples:
9.5, 10.5, 10.2, 5.4, 6.7
and I want a detection rate of 80% => 80% of the above 5 samples classified correctly => 4 of the above => set the threshold to 6.7.
Clearly, by changing the threshold, the FP rate also changes, so update that, and if the desired FP rate for the stage is not reached, add another weak classifier to that stage.
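A minimal sketch of that threshold-picking step, assuming positive_scores holds the strong-classifier outputs (sum of alpha-weighted weak-classifier votes) for the positive training samples:

import math

def pick_stage_threshold(positive_scores, target_detection_rate):
    """Lower the stage threshold so that the required fraction of positives still passes."""
    needed = math.ceil(target_detection_rate * len(positive_scores))
    ranked = sorted(positive_scores, reverse=True)
    # The threshold is the score of the last positive sample we must keep.
    return ranked[needed - 1]

print(pick_stage_threshold([9.5, 10.5, 10.2, 5.4, 6.7], 0.8))   # -> 6.7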
I have not done a formal course on AdaBoost etc., but this is my observation based on some research papers I tried to implement. Please correct me if something is wrong. Thanks!
I have found a Master's thesis on real-time face detection by Karim Ayachi (pdf) in which he describes the Viola-Jones face detection method.
As described in Section 5.2 (Creating the Cascade using AdaBoost), we can set the maximal threshold of the strong classifier to sum(alpha) and the minimal threshold to 0, and then find the optimal threshold using binary search (see Table 5.1 for pseudocode).
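A rough sketch of that binary search, assuming evaluate(threshold) returns the cascade's detection rate on a validation set and target_rate is the d × D_{i-1} requirement:

def binary_search_threshold(evaluate, target_rate, alpha_sum, tol=1e-4):
    """Search [0, sum(alpha)] for the largest threshold that still meets the detection-rate target."""
    lo, hi = 0.0, alpha_sum
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if evaluate(mid) >= target_rate:
            lo = mid    # detection rate is high enough; try a larger threshold
        else:
            hi = mid    # too many positives rejected; lower the threshold
    return lo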
Hope this helps!
