BlueSky Summary Numerical Statistical Analysis - summary

If I want quantiles other than the defaults, I do not get a result. For example, if I am interested in obtaining the 0.65 quantile, there is no result when using Summary + Numerical Statistical Analysis.

The question is that when I use the "Summary" + "Numerical Statistical Analysis" option in BlueSky, the defaults are the quantiles 0, 0.25, 0.5, 0.75 and 1. If you add, for example, the 0.65 quantile, the program does not calculate it; there is no result for the 0.65 quantile.
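As a workaround while the dialog ignores custom quantiles, the 0.65 quantile can be computed directly. Here is a minimal pure-Python sketch using linear interpolation (the same approach as R's default type-7 `quantile()` method, which underlies many statistics GUIs); the sample data is made up for illustration:

```python
def quantile(values, q):
    """Linear-interpolation quantile (type-7 style: index h = (n - 1) * q)."""
    xs = sorted(values)
    h = (len(xs) - 1) * q        # fractional index into the sorted data
    lo = int(h)                  # lower neighbour
    frac = h - lo                # interpolation weight between neighbours
    if lo + 1 < len(xs):
        return xs[lo] + frac * (xs[lo + 1] - xs[lo])
    return xs[lo]

data = list(range(1, 101))       # hypothetical sample: 1, 2, ..., 100
print(quantile(data, 0.65))      # the 0.65 quantile the dialog refuses to compute
```

Different quantile definitions (types 1-9 in R's terminology) interpolate slightly differently, so results may differ in the last decimals from other software.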

Applying weights to KNN dimensions

When doing KNN searches in ES/OS it seems to be recommended to normalize the data in the kNN vectors to prevent single dimensions from overpowering the final scoring.
In my current example I have a 3 dimensional vector where all values are normalized to values between 0 and 1
[0.2, 0.3, 0.2]
From the perspective of Euclidian distance based scoring this seems to give equal weight to all dimensions.
In my particular example I am using an l2 vector:
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
}
However, if I want to give more weight to one of my dimensions (say by a factor of 2), would it be acceptable to single out that dimension and normalize between 0-2 instead of the base range of 0-1?
Example:
[0.2, 0.3, 1.2] // Third dimension is now in the range 0-2
The distance computation for this term would now be (2 * (xi - yi))^2 and lead to bigger diffs compared to the rest. As a result the overall score would be more sensitive to differences in this particular term.
In OS the score is calculated as 1 / (1 + Distance Function) so the higher the value returned from the distance function, the lower the score will be.
Is there a method for deciding what the weighting range should be? Setting the range too high would likely make that dimension too dominant.
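The pre-scaling idea can be checked numerically. A small Python sketch (the vector values and the factor-2 weight below are made up for illustration) showing that scaling a dimension by w multiplies its contribution to the squared L2 distance by w², and hence lowers the 1 / (1 + distance) score:

```python
def sq_l2(a, b):
    """Squared Euclidean (l2) distance, the quantity fed into 1 / (1 + dist)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def apply_weights(vec, weights):
    """Pre-scale each dimension; must be applied to BOTH indexed and query vectors."""
    return [v * w for v, w in zip(vec, weights)]

doc     = [0.2, 0.3, 0.2]    # hypothetical normalized document vector
query   = [0.5, 0.1, 0.9]    # hypothetical normalized query vector
weights = [1.0, 1.0, 2.0]    # boost the third dimension by a factor of 2

plain    = sq_l2(doc, query)
weighted = sq_l2(apply_weights(doc, weights), apply_weights(query, weights))

# The third term's contribution goes from (0.2 - 0.9)^2 to (2 * (0.2 - 0.9))^2, i.e. 4x.
print(plain, weighted)
print(1 / (1 + weighted))    # the corresponding OS-style score
```

Note the quadratic effect: a weight of 2 makes that dimension 4x more influential in the squared distance, which is one reason pushing the range much higher quickly makes it dominant.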

When resizing my photo, which function should I use in MATLAB: floor, round, or ceil?

When resizing my photo, which function should I use in MATLAB among floor, round, and ceil?
myimg is (256, 256).
When the scale factor is 0.8:
256 * 0.8 = 204.8
and then the scaled size of myimg is (204.8, 204.8).
In this case, should I use ceil(204.8), floor(204.8), or round(204.8)?
What should I do?
As the previous commenter outlined it depends on what you need and the use case. Just for anyone looking for functional clarity:
ceil() : Returns the smallest integer greater than or equal to the input value.
round(): Rounds the input to the nearest integer (decimal parts of 0.5 and above round up).
floor() : Returns the largest integer less than or equal to the input value.
Example:
ceil(204.8) → 205
round(204.8) → 205 and round(204.2) → 204
floor(204.8) → 204
Extension:
In this case, if your criteria require an image of at least 80% of the original, I would use ceil(). If you require an image size of no more than 80% of the original, then floor() would suit best. In any other case where the scenario is flexible, round() is a good option, giving the size closest to the 80% target.
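For anyone wanting to try the three behaviours quickly, here is the same comparison sketched in Python (`math.ceil`/`math.floor`/`round` behave like their MATLAB counterparts for this value; note that MATLAB's round() rounds exact halves away from zero while Python rounds them to even, which does not matter for 204.8):

```python
import math

new_size = 256 * 0.8           # 204.8, a non-integer pixel count

print(math.ceil(new_size))     # never smaller than the target size
print(round(new_size))         # nearest integer
print(math.floor(new_size))    # never larger than the target size
```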

Is it fine to have a threshold greater than 1 in roc_curve metrics?

Predicting the probability of class assignment for each chosen sample from the Train_features:
probs = classifier.predict_proba(Train_features)
Choosing the class for which the AUC has to be determined.
preds = probs[:,1]
Calculating false positive rate, true positive rate and the possible thresholds that can clearly separate TP and TN.
fpr, tpr, threshold = metrics.roc_curve(Train_labels, preds)
roc_auc = metrics.auc(fpr, tpr)
print(max(threshold))
Output : 1.97834
The previous answer did not really address your question of why the threshold is > 1, and in fact is misleading when it says the threshold does not have any interpretation.
The range of the threshold should technically be [0, 1] because it is a probability threshold. But scikit-learn prepends an extra entry equal to the highest score plus 1 as the first element of the (decreasing) threshold array, to cover the full range. So if in your example max(threshold) = 1.97834, the very next number in the threshold array should be 0.97834.
See this sklearn GitHub issue thread for an explanation. It's a little funny because somebody thought this was a bug, but it's just how the creators of sklearn decided to define the thresholds.
Finally, because it is a probability threshold, it does have a very useful interpretation: the optimal cutoff is the threshold at which sensitivity + specificity is maximized. In scikit-learn this can be computed like so:
import numpy as np
from sklearn.metrics import roc_curve

fpr_p, tpr_p, thresh = roc_curve(true_labels, pred)
# maximize sensitivity + specificity, i.e. tpr + (1 - fpr), or just tpr - fpr
th_optimal = thresh[np.argmax(tpr_p - fpr_p)]
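The same Youden's J selection can be verified without scikit-learn. A pure-Python sketch with made-up fpr/tpr/threshold arrays mimicking roc_curve output (thresholds decreasing, with the artificial "max score + 1" entry first):

```python
# Hypothetical roc_curve-style output; thresholds[0] is the artificial
# max-score-plus-1 entry that makes max(threshold) exceed 1.
thresh = [1.97834, 0.97834, 0.80, 0.45, 0.20, 0.05]
fpr    = [0.0,     0.0,     0.1,  0.2,  0.6,  1.0]
tpr    = [0.0,     0.4,     0.7,  0.9,  0.95, 1.0]

# Youden's J statistic: maximize tpr + (1 - fpr), i.e. tpr - fpr.
j = [t - f for t, f in zip(tpr, fpr)]
th_optimal = thresh[max(range(len(j)), key=j.__getitem__)]
print(th_optimal)
```

The artificial first threshold can never win this argmax in practice, since both tpr and fpr are 0 there, so the selected cutoff always lies in the genuine [0, 1] range.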
The threshold value does not have any kind of interpretation; what really matters is the shape of the ROC curve. Your classifier performs well if there are thresholds (no matter their values) such that the generated ROC curve lies above the diagonal (better than random guessing). Your classifier has a perfect result (this happens rarely in practice) if the ROC curve passes through the point (0, 1), and the worst result if it passes through (1, 0). A good indicator of the performance of your classifier is the integral of the ROC curve; this indicator is known as the AUC and is bounded between 0 and 1, with 0 for the worst performance and 1 for perfect performance.

Step Based signal to smooth one - How can I interpolate?

I'm programming a sort of audio plugin, and I get an array of values that represent a step-based signal
with these values:
[ 0.27, 0.43, 0.48, 0.51, 0.85, 0.15, 0.48, 0.01, 0.28, 0.84, 0.15, 0.22, 0.11, 0.86, 0.66, 0.92, 0.40, 0.71 ]
I'm looking to transform those values into a larger array of interpolated values that represent a smooth signal, such as a sine wave. Something like this (sorry for my Paint art):
What kind of math should I use here? Inside my development environment (Ruby based), I have the common math functions. But I don't know where to start.
What you want here is a digital filter - specifically a lowpass filter.
There are two types of simple digital filter, Finite Impulse Response and Infinite Impulse Response.
A FIR filter works by summing, with some weighting, the previous n samples of the audio, and using that to generate the output sample. It's called "Finite Impulse Response", because a single impulse in the input can only affect a finite number of output samples.
An IIR filter, in contrast, uses its own previous output in addition to the current sample. It's called "Infinite Impulse Response" because of this feedback property; a single impulse can affect all future samples.
Of the two, the IIR filter is the simplest to implement, and in its most basic form looks like this:
state(N) = state(N - 1) * weighting + sample(N)
output(N) = state(N)
That is, for each input sample, reduce the previous state value by some amount, add the input, and use that as the output. In effect it is an exponentially weighted moving-average filter.
For example, if you set 'weighting' to 0.95, then each output sample is influenced 95% by previous samples and 5% by the current sample, and the output value will shift slowly in response to changing inputs. It will also be scaled up by 20X (1/(1-weighting)), so you should renormalize it accordingly.
Here's how the first few steps would work with your input data:
Start by setting state = 20 * 0.27 = 5.4.
Output state / 20 = 0.27.
Update state = state * 0.95 + 0.43 = 5.56.
Output state / 20 = 0.278.
Update state = state * 0.95 + 0.48 = 5.76.
Output state / 20 = 0.288.
And so forth. If you need more output data points than input data points, repeat your input samples n times before feeding into the filter, or interleave input samples with n zero samples. Both are valid, though they have different impacts on the filtered output.
There is a lot of theory behind digital filter design; in practice for a simple implementation you can probably use this first-order filter, and adjust the weighting value to suit.
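The first-order filter above translates directly into code. A short Python sketch (a Ruby version would be analogous) using the question's data, weighting = 0.95, and the 1/(1 - weighting) renormalization applied to each output:

```python
def smooth(samples, weighting=0.95):
    """First-order IIR lowpass: state = state * weighting + sample, renormalized."""
    gain = 1.0 / (1.0 - weighting)   # steady-state scale-up factor (20x here)
    state = samples[0] * gain        # prime the state so the output starts at samples[0]
    out = []
    for s in samples:
        state = state * weighting + s
        out.append(state / gain)     # renormalize each output sample
    return out

data = [0.27, 0.43, 0.48, 0.51, 0.85, 0.15, 0.48, 0.01, 0.28,
        0.84, 0.15, 0.22, 0.11, 0.86, 0.66, 0.92, 0.40, 0.71]
smoothed = smooth(data)
print([round(x, 3) for x in smoothed[:3]])   # matches the worked steps above
```

To get more output points than input points, repeat each input sample n times before calling smooth(), as described in the answer.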

MATLAB algorithm/workflow needs fixing

So at work I need to analyse videos of beating cells like this one:
http://youtu.be/TxBdkLcO5Do
So I wrote MATLAB code that plots a graph of changes in the picture over time.
Example of the graph's data:
0 0
0.1 87124
0.15 87124
0.2 87124
0.25 85589
0.3 85589
0.35 85589
0.4 85589
0.45 19202
0.5 19202
0.55 19202
0.6 19202
0.65 61303
0.7 61303
0.75 61303
0.8 61303
0.85 56689
0.9 56689
0.95 56689
1 72988
1.05 72988
1.1 72988
1.15 72988
1.2 63871
1.25 63871
**How my code works**
The left column is time (in fractions of a second) and the second column is the amplitude of the picture difference.
I loop over all frames one by one.
Turn the frame into grayscale, calculate a threshold, turn it into binary.
Compare each frame with the frame just before it using imabsdiff.
Store the result in an array indexed by frame number / frame rate.
I plot the array time vs. amplitude-difference and I get my graph. (Is this a good way, or is there a better way to do it, by the way?)
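The loop described above can be sketched in plain Python (the frames below are tiny made-up 2D grayscale arrays; in MATLAB the corresponding functions would be rgb2gray, graythresh/imbinarize and imabsdiff):

```python
def binarize(frame):
    """Threshold a grayscale frame at its mean intensity (stand-in for graythresh)."""
    flat = [p for row in frame for p in row]
    th = sum(flat) / len(flat)
    return [[1 if p > th else 0 for p in row] for row in frame]

def diff_signal(frames, fps):
    """Sum of absolute differences between consecutive binarized frames."""
    signal = []
    prev = binarize(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = binarize(frame)
        diff = sum(abs(a - b) for ra, rb in zip(cur, prev) for a, b in zip(ra, rb))
        signal.append((i / fps, diff))   # (time in seconds, amplitude of change)
        prev = cur
    return signal

# Three hypothetical 2x2 frames: one pixel lights up in frame 1 and stays lit.
frames = [[[0, 0], [0, 0]], [[10, 0], [0, 0]], [[10, 0], [0, 0]]]
print(diff_signal(frames, fps=20))
```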
What I need to do now with this array is find the number of peaks that occur and analyze the frequency, strength and regularity of these peaks:
Frequency = how many peaks occur in the video.
Strength = peak summit value - average of the second column.
Regularity = time between each peak and the next one.
So basically I should create a peaks array in which, for every peak, I add an element holding (time associated with the peak, strength),
and after this I want to print out a report with the number of peaks and their frequency, strength and regularity.
How can I do that?
Suppose you just need a peak finder. The rest should be simple.
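A peak finder over step data like the above can be sketched in pure Python: collapse the repeated plateau values first, then take local maxima, and derive frequency, strength and regularity from the peak list (illustrative only; the data is the question's):

```python
def find_peaks(times, values):
    """Collapse consecutive duplicate values, then return (time, value) local maxima."""
    steps = [(times[0], values[0])]
    for t, v in zip(times[1:], values[1:]):
        if v != steps[-1][1]:
            steps.append((t, v))         # keep the first time of each plateau
    return [steps[i] for i in range(1, len(steps) - 1)
            if steps[i - 1][1] < steps[i][1] > steps[i + 1][1]]

times  = [0, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5, 0.55, 0.6, 0.65,
          0.7, 0.75, 0.8, 0.85, 0.9, 0.95, 1, 1.05, 1.1, 1.15, 1.2, 1.25]
values = [0] + [87124] * 3 + [85589] * 4 + [19202] * 4 + [61303] * 4 \
         + [56689] * 3 + [72988] * 4 + [63871] * 2

peaks = find_peaks(times, values)
mean = sum(values) / len(values)
print("number of peaks:", len(peaks))                     # frequency
print("strengths:", [v - mean for _, v in peaks])         # summit - average
print("intervals:", [round(b[0] - a[0], 2)                # regularity
                     for a, b in zip(peaks, peaks[1:])])
```

For noisy real data you would likely also want a minimum prominence or minimum spacing between peaks; MATLAB's findpeaks offers both.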