Azure anomaly detector parameters - azure-anomaly-detection

I'm trialing the azure anomaly detector API (with C#) for alert conditions in my organisation. Data is formatted and ordered correctly, and the API returns a result as expected, using the last data point technique. What I'm struggling to understand is how the following parameters interact with each other:
Sensitivity
MaxAnomalyRatio
Period
Documentation here only really lists types and vague ranges. Are there any specific, detailed examples showing what these parameters are and how they interact?

This blog article has details on the max anomaly ratio I found very helpful: https://www.smartercode.io/how-azure-anomaly-detection-api-allows-you-to-find-weirdness-inside-crowd-part-2/
[Optional] Maximum Anomaly Ratio: Advanced model parameter; Between 0 and less than 0.5, it tells the maximum percentage of points that can be determined as anomalies. You can think of it as a mechanism to limit the top anomaly candidates.

Related

How `vw --audit` internally computes the weights of the features?

In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
From what I understand vowpawabbit tries to fit one linear model to each arm.
So if weights were calculated using an average across all the arms, then they would correlate with getting a reward generally, instead of which features makes the model pick one variant from another.
I am interested know out how they are calculated to see how I can interpret the results obtained. I tried searching its Github repository but could not find anything meaningful.
I am interested know out how they are calculated to see how I can interpret the results obtained.
Unfortunately knowing the first does not lead to knowing the second.
Your question is concerned with contextual bandits, but it is important to note that interpreting model parameters is an issue that also occurs in supervised learning. Machine learning has made progress recently (i.e., my lifetime) largely by focusing concern on quality of predictions rather than meaningfulness of model parameters. In a blog post, Phoebe Wong outlines the issue while being entertaining.
The bottom line is that our models are not causal, so you simply cannot conclude because "the weight of feature X is for arm A is large means that if I were to intervene in the system and increase this feature value that I will get more reward for playing arm A".
We are currently working on tools for model inspection that leverage techniques such as permutation importance that will help you answer questions like "if I were to stop using a particular feature how would the frequency of playing each arm change for the trained policy". We're hoping that is helpful information.
Having said all that, let me try to answer your original question ...
In vowpawabbit there is an option --audit that prints the weights of the features.
If we have a vw contextual bandit model with four arms, how is this feature weight created?
The format is documented here. Assuming you are using --cb (not --cb_adf) then there are a fixed number of arms and so the offset field will increment over the arms. So for an example like
1:2:0.4 |foo bar
with --cb 4 you'll get an audit output with namespace of foo, feature of bar, and offset of 0, 1, 2, and 3.
Interpreting the output when using --cb_adf is possible but difficult to explain succinctly.
From what I understand vowpawabbit tries to fit one linear model to each arm.
Shorter answer: With --cb_type dm, essentially VW independently tries to predict the average reward for each arm using only examples where the policy played that arm. So the weight you get from audit at a particular offset N is analogous to what you would get from a supervised learning model trained to predict reward on a subset of the historical data consisting solely of times the historical policy played arm N. With other --cb_type settings the interpretation is more complicated.
Longer answer: "Linear model" refers to the representation being used. VW can incorporate nonlinearities into the model but let's ignore that for now. "Fit" is where some important details are. VW takes the partial feedback information of a CB problem (partial feedback = "for this example you don't know the reward of the arms not pulled") and reduces it to a full feedback supervised learning problem (full feedback = "for this example you do the reward of all arms"). The --cb_type argument selects the reduction strategy. There are several papers on the topic, a good place to start is Dudik et. al. and then look for papers that cite this paper. In terms of code, ultimately things are grounded here, but the code is written more for performance than intelligibility.

Time series database - interpolation and filtering

Does any existing time series database contain functionality for interpolation and digital filtering for outlier detection (Hampel, Savitzky-Golay)? Or at least provide interfaces to enable custom query result post-processing?
As far as I know InfluxDB does not offer you anything more than a simple linear interpolation and the basic set of InfluxQL functions.
Looks like, everything more complex than that has to be done by hand with the programming language of your choice. Influx has a number of language clients.
There is an article on anomaly detection with Prometheus, but that looks like an attempt rather then a capability.
However, there is a thing called Kapacitor for InfluxDB. It a quite powerful data processing tools which allows define User Defined Functions (UDF). Here is an article of how to implement custom anomaly detection with Kapacitor.
Akumuli time-series database provides some basic outlier detection, for instance the EWMA and SMA. You can issue a query that will return the difference between the predicted value (given by EWMA or SMA) and the actual value.

Evaluating a specific Information retrieval system with P#1

I am working on a information retrieval system which aims to select the first result and to link it to other database. Indeed, our system is based on a Keyword description of a video and try to interlink the video to a DBpedia entity which has the same meaning of the description. In the step of evaluation, i noticid that the majority of evaluation set the minimum of the precision cut-off to 5, whereas in our system is not suitable. I am thinking to put an interval [1,5]: (P#1,...P#5).Will it be possible? !!
Please provide your suggestions and your reference to some notes.. Thanks..
You can definitely calculate P#1 for a retrieval system, if you have truth labels. (In this case, it sounds like they would be [Video, DBPedia] matching pairs generated by humans).
People generally look at this measure for things like Question-Answering or recommendation systems. The only caveat is that you typically wouldn't use it to train a learning to rank system or any other learning system -- it's not "continuous enough" a near miss (best at rank 2) and a total miss (best at rank 4 million) get equivalent scores, so it can be hard to smoothly improve a system by tuning weights in such a case.
For those kinds of tasks, using Mean Reciprocal Rank is pretty common, if you need something tunable. Also NDCG tends to be okay, too, since it has an exponential discounting factor.
But there's nothing in the definition of precision that prevents you from calculating it at rank 1. It may be more correct to describe it as a "success#1" feature, since you're going to get 0/1 or 1/1 as your two options.

methods of lessening the number of features when machine learning on images

I'm performing machine learning on a 25 x 125 image set. After getting the rgb components it becomes 9375 features per example (and I have about 675). I was trying fminunc and fminsearch and I thought that there was something wrong with my method, because it was 'freezing', but when I decrease the number of features by a factor of 10, it took a while but worked. How can I minimise the number of features, while maintaining the information relevant in the picture? I tried k-means, but I don't see how that helps, as I still have the same number of features, just that there are a lot of redundancy.
You're looking for feature reduction or selection methods. For example see this library:
http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
or see this question
Feature Selection in MATLAB
If you google feature selection/reduction matlab will find many relevant articles/tools. Or you could google some commonly used methods like PCA (principal component analysis).

How can I visualize changes in a large code base quality?

One of the things I’ve been thinking about a lot off and on is how we can use metrics of some kind to measure change, are we going backwards or not? This is in the context of a large, legacy code base which we are improving. Most of the code is C++ with a C heritage. Some new functions and the GUI are written in C#.
To start with, we could at least be checking if the simple complexity level was changing over time in the code. The difficulty is in having a representation – we can maybe do a 3D surface where a 2D map represents the code and we have a heat-map of color representing complexity with the 3D surface bulging in and out to show change.
Once you can generate some matrics of numbers there are a ton of math systems around to take care of stuff like this.
Over time, I'd like to have more sophisticated numbers in there but the same visualisation techniques used to represent change.
I like the idea in Crap4j of focusing on the ratio between complexity and number of unit tests covering that code.
I'd also like to include Uncle Bob's SOLID metrics and some of the Chidamber and Kemerer OO metrics. The hard part is finding tools to generate these for C++. The only option seems to be Krakatau Essential Metrics (I have no objection to paying for tools). My desire to use the CK metrics comes partly from the books Object-Oriented Metrics:Measures of Complexity by Henderson-Sellers and the earlier Object-Oriented Software Metrics.
If we start using a number of these metrics we could end up with ten or so numbers that are varying across time. I'm fairly ignorant of statistics but it seems it could be interesting to track a bunch of such metrics and then pay attention to which ones tend to vary.
Note that a related question is about measuring code quality across a large code base. I'm more interested in measuring the change.
I'd consider using a Kiviat Diagram to represent multiple software metrics dimensions evolving over time. These diagrams represent multiple data points in a concave hull around a centerpoint. Visual inspection will show where a particular metric is going up or down, and one ought to be able to compute an overall ratio of area biased by metric value using some hueristic area computation.
You can also have a glance at NDepend documentation about code metrics. Disclaimer: I am one of the developer of the tool NDepend.
With the Code Rule and Query over LINQ (CQLinq) facility, it is possible to ask for code metric evolution/trending across two different snapshots in time of the code base. For example there is a default rule proposed: Avoid making complex methods even more complex illustrated by the screenshot below:
Several metric trending rules are proposed like:
Avoid decreasing code coverage by[enter link description here]5 tests of types
Types that used to be 100% covered but not anymore
and also, since you mentioned Crap4J the metric C.R.A.P can be written with CQLinq, and the query could be easily tweaked to see the trending in C.R.A.P metric.
Concerning the visualization of code metric, NDepend lets visualize code metrics values through an interactive treemap:
There is a fresh approach for this topic.
E.g.
https://github.com/databricks/koalas/pull/840#issuecomment-536949320
See https://softagram.com/docs/visualizing-code-changes/ for more info or do an image search in search engine using the two keywords: softagram koalas
Disclaimer: I work for Softagram.

Resources