create ROC from 10 different thresholds - roc

I have output from svmlight which has x=predictions (0.1,-0.6,1.2, -0.7...), y=actual class {+1,-1}. I want to create an ROC curve for 10 specific thresholds (let t be a vector that contains the 10 threshold values). I checked the ROCR package but I didn't see any option for supplying a threshold vector. I need to calculate the TPR and FPR for each threshold value and plot them. Is there another way to do that? I am new to R programming.

ROCR creates an ROC curve by plotting the TPR and FPR for many different thresholds. This can be done with just one set of predictions and labels because if an observation is classified as positive for one threshold, it will also be classified as positive at a lower threshold. I found this paper to be helpful in explaining ROC curves in more detail.
You can create the plot as follows in ROCR where x is the vector of predictions, and y is the vector of class labels:
pred <- prediction(x,y)
perf <- performance(pred,"tpr","fpr")
plot(perf)
If you want to access the TPR and FPR associated with all the thresholds, you can examine the performance object 'perf':
str(perf)
The following answer shows how to obtain the threshold values in more detail:
https://stackoverflow.com/a/16347508/786220

You can do that with the pROC package. First create the ROC curve (for all thresholds):
myROC <- roc(y, x) # with the x and y you defined in your question
And then you query this curve for the 10 (or any number of) thresholds that you stored in t:
coords(myROC, x = t, input="threshold", ret = c("threshold", "se", "1-sp"))
Sensitivity is your TPR while 1-Specificity is your FPR.
Disclaimer: I am the author of pROC.
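For readers who want to see what such a per-threshold query computes, here is a minimal Python sketch (it is not pROC's implementation). It assumes labels in {+1, -1} as in the question and calls a score positive when it is greater than or equal to the threshold; that convention, the function name rates_at_thresholds, and the example labels are assumptions made for illustration.

```python
import numpy as np

def rates_at_thresholds(scores, labels, thresholds):
    """Return (TPR, FPR) arrays, one entry per threshold."""
    scores = np.asarray(scores, dtype=float)
    positives = np.asarray(labels) == 1          # actual class +1
    negatives = ~positives                       # actual class -1
    thresholds = np.asarray(thresholds, dtype=float)
    # predicted[i, j] is True when score j is called positive at threshold i
    predicted = scores[None, :] >= thresholds[:, None]
    tpr = (predicted & positives).sum(axis=1) / positives.sum()
    fpr = (predicted & negatives).sum(axis=1) / negatives.sum()
    return tpr, fpr

# Hypothetical labels paired with the four scores quoted in the question
scores = [0.1, -0.6, 1.2, -0.7]
labels = [1, -1, 1, -1]
tpr, fpr = rates_at_thresholds(scores, labels, [-1.0, 0.0, 2.0])
print(tpr, fpr)
```

Plotting fpr against tpr for the 10 chosen thresholds then gives the requested points of the ROC curve.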

You can use this function:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

def roc_curve_new(y_true, y_pred, thresholds):
    fpr_list = []
    tpr_list = []
    thresholds_list = []
    for threshold in thresholds:
        thresholds_list.append(threshold)
        # binarize: 1 where the score reaches the threshold, otherwise 0
        y_pred_b = (y_pred >= threshold).astype(int)
        # labels=[0, 1] keeps the matrix 2x2 even if one class is never predicted
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred_b, labels=[0, 1]).ravel()
        # true positive rate
        tpr = tp/(tp+fn)
        # false positive rate
        fpr = fp/(fp+tn)
        fpr_list.append(fpr)
        tpr_list.append(tpr)
    return fpr_list, tpr_list, thresholds_list

thresholds = np.arange(0.1, 1.1, 0.1)
y = np.array([1, 1, 0, 1, 1, 0, 0])
scores = np.array([0.5, 0.4, 0.35, 0.75, 0.55, 0.4, 0.2])
fpr, tpr, _ = roc_curve_new(y, scores, thresholds)
plt.plot(fpr, tpr, '.-', color='b')
plt.plot([0, 1], [0, 1], color="navy", lw=1, linestyle="--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()
It will give you img:


Kalman Filter Convergence

Attached is a simple python Kalman filter example of a free-fall object (g=-9.8m/s^2)
Alas, I have a problem. The state vector x contains both the position and the velocity but the z vector (measurement) contains only the position.
If I set a wrong initial position value, the algorithm converges to the true value even with noisy measurements (see picture below)
However, if I set a wrong initial velocity value, the algorithm does not converge even though the motion model is defined correctly.
Attached is the python code:
kalman.py
In your code I see two problems.
You set the Q-Matrix to zero. It means you trust your model too much and give the filter no chance to improve the estimate through the measurements. Your filter becomes too stiff. You can think of it as a low pass filter with a very big time constant.
In my code I set the Q-Matrix to
Q = np.array([[1,0],[0,0.1]])
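To see why a zero Q makes the filter stiff, it can help to look at a scalar version of the covariance recursion. The sketch below is an illustration only, not the code from the question: it assumes a one-dimensional state with F = H = 1, and the values P0 = 10 and R = 4 are made up. With Q = 0 the gain decays towards zero, so new measurements are almost ignored; with Q > 0 it settles at a positive value.

```python
def gain_after(steps, q, r=4.0, p0=10.0):
    """Kalman gain of a scalar filter (F = H = 1) after `steps` iterations."""
    p = p0
    k = 0.0
    for _ in range(steps):
        p = p + q            # predict: process noise inflates the covariance
        k = p / (p + r)      # gain: how much the measurement is trusted
        p = (1.0 - k) * p    # update: the measurement shrinks the covariance
    return k

print(gain_after(1000, q=0.0))   # very small gain: the filter barely listens
print(gain_after(1000, q=1.0))   # gain stays well above zero
```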
The second issue is your measurement noise. You simulate the noisy measurements with R=100 but tell the filter R=4, so the filter trusts the measurements more than it should. This issue is not really relevant to your question, but it should still be corrected.
Now even if I set the initial velocity to 20, the position estimation works fine.
Here is the estimation for R = 4:
And for R = 100:
UPDATE
The velocity estimate is wrong because you have some mistakes in your matrix operations. Please note that matrix multiplication goes through np.dot(), not through *.
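The difference between the two operators is easy to check on small arrays. In the sketch below the values of F and P are taken from the snippets in this thread (dt = 0.01), and the variable names are chosen here for illustration: on NumPy arrays, * multiplies element by element, while np.dot (or the @ operator) performs the matrix product.

```python
import numpy as np

F = np.array([[1.0, 0.01], [0.0, 1.0]])   # state transition matrix for dt = 0.01
P = np.array([[10.0, 0.0], [0.0, 0.1]])   # initial covariance

elementwise = F * P             # element-by-element product
matrix_product = np.dot(F, P)   # true matrix product, same as F @ P

print(elementwise)
print(matrix_product)
```

The off-diagonal term that couples position and velocity survives only in the matrix product, which is why the velocity estimate depends on using np.dot.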
Here is a correct result for v0 = 20:
Many thanks, Anton.
Attached below is the corrected code for your convenience:
Roi
import numpy as np
import matplotlib.pyplot as plt
%matplotlib notebook
from numpy.linalg import inv
N = 1000 # number of time steps
dt = 0.01 # Sampling time (s)
t = dt*np.arange(N)
F = np.array([[1, dt],[ 0, 1]])# system matrix - state
B = np.array([[-1/2*dt**2],[ -dt]])# system matrix - input
H = np.array([[1, 0]])#; % observation matrix
Q = np.array([[1,0],[0,1]])
u = 9.80665# % input = acceleration due to gravity (m/s^2)
I = np.array([[1,0],[0,1]]) #identity matrix
# Define the initial position and velocity
y0 = 100; # m
v0 = 0; # m/s
G2 = np.array([-1/2*dt**2, -dt])# system matrix - input
# Initialize the state vector (true state)
xt = np.zeros((2, N)) # True state vector
xt[:,0] = [y0,v0]
for k in range(1,N):
    xt[:,k] = np.dot(F,xt[:,k-1]) +G2*u
#Generate the noisy measurement from the true state
R = 4 # % m^2/s^2
v = np.sqrt(R)*np.random.randn(N) #% measurement noise
z = np.dot(H,xt) + v; #% noisy measurement
R2=4
#% Initialize the covariance matrix
P = np.array([[10, 0], [0, 0.1]])# Covariance for initial state error
#% Loop through and perform the Kalman filter equations recursively
x_list =[]
x_kalman= np.array([[117],[290]])
x_list.append(x_kalman)
print(-B*u)
for k in range(1,N):
    x_kalman=np.dot(F,x_kalman) +B*u
    P = np.dot(np.dot(F,P),F.T) +Q
    S=(np.dot(np.dot(H,P),H.T) + R2)
    S2 = inv(S)
    K = np.dot(P,H.T)*S2
    x_kalman = x_kalman +K*((z[:,k]- np.dot(H,x_kalman)))
    P = np.dot((I - K*H),P)
    x_list.append(x_kalman)
x_array = np.array(x_list)
print(x_array.shape)
plt.figure()
plt.plot(t,z[0,:], label="measurement", color='LIME', linewidth=1)
plt.plot(t,x_array[:,0,:],label="kalman",linewidth=5)
plt.plot(t,xt[0,:],linestyle='--', label = "Truth",linewidth=6)
plt.legend(fontsize=30)
plt.grid(True)
plt.xlabel("t[s]")
plt.title("Position Estimation", fontsize=20)
plt.ylabel("$X_t$ = h[m]")
plt.gca().set( ylim=(0, 110))
plt.gca().set(xlim=(0,6))
plt.figure()
#plt.plot(t,z, label="measurment", color='LIME')
plt.plot(t,x_array[:,1,:],label="kalman",linewidth=4)
plt.plot(t,xt[1,:],linestyle='--', label = "Truth",linewidth=2)
plt.legend()
plt.grid(True)
plt.xlabel("t[s]")
plt.title("Velocity Estimation")
plt.ylabel("velocity [m/s]")

Why is my python Butterworth filter smoothing the signal spectrum (in frequency space)?

For a fluctuating time series I use a low pass butterworth filter to exclude high frequency noise in my analysis. It is implemented using scipy.signal butter and filtfilt functions.
def butter_lp(data, N, cutoff, df):
    b, a = butter(N, cutoff/(df*len(data)), btype='lowpass', output='ba')
    y = filtfilt(b, a, data)
    return y

def plot_response(N, cutoff, data, df):
    b, a = butter(N, cutoff/(df*len(data)), btype='lowpass', output='ba')
    w, h = freqz(b, a)
    plt.plot(w * (1/(2*dt)), 20 * log10(abs(h)), label='{0}, {1}'.format(forder,cutoff))
    plt.xscale('log')
    plt.xlabel('Frequency [radians / second]')
    plt.ylabel('Amplitude [dB]')
    plt.legend(loc='best')
    plt.margins(0, 0.1)
    plt.grid(which='both', axis='both')
    plt.axvline(100, color='green') # cutoff frequency
    #plt.show()
For completeness, the code that produces the input 'data' looks something like
filtered["channel{0:02d}".format(ch)] = butter_lp(noiseelim(transpose(xbt[0][6])[ch-1][opint:clint]), forder, lpfreq, df)
Taking order 5 (bearing in mind it's applied twice) and a cutoff of 200 kHz, I see the expected attenuation of the signal above this cutoff frequency, BUT ALSO the fluctuations of the signal (now in a noise floor region of the spectrum) are smoothed. Why? Can I, or should I, avoid it?
[Figure: shared-y plot showing, on the left, the original spectrum including the noise floor and, on the right, the filtered spectrum with a smoothed, attenuating tail in place of the noise floor]
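For readers who want to reproduce this kind of setup on synthetic data, here is a minimal sketch of a zero-phase Butterworth low-pass with scipy.signal. The sampling rate, test frequencies, cutoff, and the helper amplitude_at are all assumptions chosen for illustration, not values from the question. Because filtfilt runs the filter forward and then backward, the magnitude response is applied twice, as the question already notes.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 1000.0                           # sampling rate in Hz (assumed)
t = np.arange(1000) / fs              # one second of samples
low = np.sin(2 * np.pi * 10 * t)      # component below the cutoff
high = np.sin(2 * np.pi * 200 * t)    # component above the cutoff
signal = low + high

# 5th-order low-pass at 50 Hz; passing fs lets butter take the cutoff in Hz
b, a = butter(5, 50.0, btype="lowpass", fs=fs)
filtered = filtfilt(b, a, signal)     # forward-backward filtering, zero phase

def amplitude_at(x, freq_hz):
    """Amplitude of the spectral line closest to freq_hz."""
    spectrum = np.abs(np.fft.rfft(x)) / (len(x) / 2)
    return spectrum[int(round(freq_hz * len(x) / fs))]

print(amplitude_at(signal, 200), amplitude_at(filtered, 200))
print(amplitude_at(signal, 10), amplitude_at(filtered, 10))
```

Comparing the two spectra on a logarithmic axis is one way to check how much of the tail above the cutoff comes from the filter response rather than from the data.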

Hilbert-Peano curve to scan image of arbitrary size

I have written an implementation of Hilbert-Peano space filling curve in Python (from a Matlab one) to flatten my 2D image:
def hilbert_peano(n):
    if n<=0:
        x=0
        y=0
    else:
        [x0, y0] = hilbert_peano(n-1)
        x = (1/2) * np.array([-0.5+y0, -0.5+x0, 0.5+x0, 0.5-y0])
        y = (1/2) * np.array([-0.5+x0, 0.5+y0, 0.5+y0, -0.5-y0])
    return x,y
However, the classical Hilbert-Peano curve only works for multi-dimensional arrays whose side lengths are a power of two (e.g. 256*256 or 512*512 in the case of a 2D array (image)).
Does anybody know how to extend this to an array of arbitrary size?
I had the same problem and have written an algorithm that generates a Hilbert-like curve for rectangles of arbitrary size in 2D and 3D. Example for 55x31: curve55x31
The idea is to recursively apply a Hilbert-like template but avoid odd sizes when halving the domain dimensions. If the dimensions happen to be powers of two, the classic Hilbert curve is generated.
import sys

def sgn(v):
    # sign of v as -1, 0 or 1 (helper assumed here; it is not defined in the snippet as posted)
    return (v > 0) - (v < 0)

def gilbert2d(x, y, ax, ay, bx, by):
    """
    Generalized Hilbert ('gilbert') space-filling curve for arbitrary-sized
    2D rectangular grids.
    """
    w = abs(ax + ay)
    h = abs(bx + by)
    (dax, day) = (sgn(ax), sgn(ay)) # unit major direction
    (dbx, dby) = (sgn(bx), sgn(by)) # unit orthogonal direction
    if h == 1:
        # trivial row fill
        for i in range(0, w):
            print(x, y)
            (x, y) = (x + dax, y + day)
        return
    if w == 1:
        # trivial column fill
        for i in range(0, h):
            print(x, y)
            (x, y) = (x + dbx, y + dby)
        return
    # floor division keeps the halves as integers
    (ax2, ay2) = (ax//2, ay//2)
    (bx2, by2) = (bx//2, by//2)
    w2 = abs(ax2 + ay2)
    h2 = abs(bx2 + by2)
    if 2*w > 3*h:
        if (w2 % 2) and (w > 2):
            # prefer even steps
            (ax2, ay2) = (ax2 + dax, ay2 + day)
        # long case: split in two parts only
        gilbert2d(x, y, ax2, ay2, bx, by)
        gilbert2d(x+ax2, y+ay2, ax-ax2, ay-ay2, bx, by)
    else:
        if (h2 % 2) and (h > 2):
            # prefer even steps
            (bx2, by2) = (bx2 + dbx, by2 + dby)
        # standard case: one step up, one long horizontal, one step down
        gilbert2d(x, y, bx2, by2, ax2, ay2)
        gilbert2d(x+bx2, y+by2, ax, ay, bx-bx2, by-by2)
        gilbert2d(x+(ax-dax)+(bx2-dbx), y+(ay-day)+(by2-dby),
                  -bx2, -by2, -(ax-ax2), -(ay-ay2))

def main():
    width = int(sys.argv[1])
    height = int(sys.argv[2])
    if width >= height:
        gilbert2d(0, 0, width, 0, 0, height)
    else:
        gilbert2d(0, 0, 0, height, width, 0)

if __name__ == "__main__":
    main()
A 3D version and more documentation is available at https://github.com/jakubcerveny/gilbert
I found this page by Lutz Tautenhahn:
"Draw A Space-Filling Curve of Arbitrary Size" (http://lutanho.net/pic2html/draw_sfc.html)
The algorithm doesn't have a name, he doesn't reference anyone else and the sketch suggests he came up with it himself.
I wonder if this is possible for a z order curve and how?
I finally chose, as suggested by Betterdev and because adaptive curves are not that straightforward [1], to compute a bigger curve and then discard the coordinates that fall outside my image shape:
# compute the needed order
order = np.max(np.ceil([np.log2(M), np.log2(N)]))
# Hilbert curve to scan a 2^order * 2^order image
x, y = hilbert_peano(order)
mat = np.zeros((2**order, 2**order))
# curve as a 2D array
mat[x, y] = np.arange(0, x.size, dtype=np.uint)
# clip the curve to the image shape
mat = mat[:M, :N]
# compute new indices (from 0 to M*N)
I = np.argsort(mat.flat)
x_new, y_new = np.meshgrid(np.arange(0, N, dtype=np.uint), np.arange(0, M, dtype=np.uint))
# apply the new order to the grid
x_new = x_new.flat[I]
y_new = y_new.flat[I]
[1] Zhang J., Kamata S. and Ueshige Y., "A Pseudo-Hilbert Scan Algorithm for Arbitrarily-Sized Rectangle Region"
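The "compute a bigger curve, then clip" idea can also be sketched with integer coordinates only. The version below is an illustration under stated assumptions rather than the code used above: it relies on the well-known iterative index-to-coordinate conversion for the Hilbert curve, and the function names d2xy and clipped_hilbert_scan are chosen here.

```python
def d2xy(n, d):
    """Map index d along a Hilbert curve to (x, y) on an n x n grid (n a power of two)."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # flip the quadrant
                x = s - 1 - x
                y = s - 1 - y
            # transpose the quadrant
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def clipped_hilbert_scan(height, width):
    """Visit every pixel of a height x width image in Hilbert order, skipping outside points."""
    side = 1 << (max(height, width) - 1).bit_length()   # next power of two
    scan = []
    for d in range(side * side):
        x, y = d2xy(side, d)
        if x < width and y < height:
            scan.append((y, x))   # (row, column)
    return scan

order = clipped_hilbert_scan(3, 5)
print(order)
```

Note that clipping keeps the visiting order but not the adjacency: consecutive entries of the clipped scan are not guaranteed to be neighbouring pixels, which is the price paid compared with an adaptive curve.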

Why is my performance bad? (Noob scheduling)

I'm mainly a very high level programmer so thinking about things like CPU locality is very new to me.
I'm working on a basic bilinear demosaic (for RGGB sensor data) and I've got the algorithm right (judging by the results) but it's not performing as well as I'd hoped (~210Mpix/s).
Here's my code (the input is a 4640x3472 image with a single channel of RGGB):
def get_bilinear_debayer(input_raw, print_nest=False):
    x, y, c = Var(), Var(), Var()
    # Clamp and move to 32 bit for lots of space for averaging.
    input = Func()
    input[x,y] = cast(
        UInt(32),
        input_raw[
            clamp(x,0,input_raw.width()-1),
            clamp(y,0,input_raw.height()-1)]
    )
    # Interpolate vertically
    vertical = Func()
    vertical[x,y] = (input[x,y-1] + input[x,y+1])/2
    # Interpolate horizontally
    horizontal = Func()
    horizontal[x,y] = (input[x-1,y] + input[x+1,y])/2
    # Interpolate on diagonals
    diagonal_average = Func()
    diagonal_average[x, y] = (
        input[x+1,y-1] +
        input[x+1,y+1] +
        input[x-1,y-1] +
        input[x-1,y+1])/4
    # Interpolate on adjacents
    adjacent_average = Func()
    adjacent_average[x, y] = (horizontal[x,y] + vertical[x,y])/2
    red, green, blue = Func(), Func(), Func()
    # Calculate the red channel
    red[x, y, c] = select(
        # Red photosite
        c == 0, input[x, y],
        # Green photosite
        c == 1, select(x%2 == 0, vertical[x,y],
                       horizontal[x,y]),
        # Blue photosite
        diagonal_average[x,y]
    )
    # Calculate the blue channel
    blue[x, y, c] = select(
        # Blue photosite
        c == 2, input[x, y],
        # Green photosite
        c == 1, select(x%2 == 1, vertical[x,y],
                       horizontal[x,y]),
        # Red photosite
        diagonal_average[x,y]
    )
    # Calculate the green channel
    green[x, y, c] = select(
        # Green photosite
        c == 1, input[x,y],
        # Red/Blue photosite
        adjacent_average[x,y]
    )
    # Switch color interpolator based on requested color.
    # Specify photosite as third argument, calculated as [x, y, z] = (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 2)
    # Happily works out to a sum of x mod 2 and y mod 2.
    debayer = Func()
    debayer[x, y, c] = select(c == 0, red[x, y, x%2 + y%2],
                              c == 1, green[x, y, x%2 + y%2],
                              blue[x, y, x%2 + y%2])
    # Scheduling
    x_outer, y_outer, x_inner, y_inner, tile_index = Var(), Var(), Var(), Var(), Var()
    bits = input_raw.get().type().bits
    output = Func()
    # Cast back to the original colour space
    output[x,y,c] = cast(UInt(bits), debayer[x,y,c])
    # Reorder so that colours are calculated in order (red runs, then green, then blue)
    output.reorder_storage(c, x, y)
    # Tile in 128x128 squares
    output.tile(x, y, x_outer, y_outer, x_inner, y_inner, 128, 128)
    # Vectorize based on colour
    output.bound(c, 0, 3)
    output.vectorize(c)
    # Fuse and parallelize
    output.fuse(x_outer, y_outer, tile_index)
    output.parallel(tile_index)
    # Debugging
    if print_nest:
        output.print_loop_nest()
        debayer.print_loop_nest()
        red.print_loop_nest()
        green.print_loop_nest()
        blue.print_loop_nest()
    return output
Honestly I have no idea what I'm doing here and I'm too new to this to have any clue where or what to look at.
Any advice on how to improve the scheduling is helpful. I'm still learning but feedback is hard to find.
The schedule I have is the best I've been able to do but it's pretty much entirely trial and error.
EDIT: I added an extra 30Mpix/s by doing the whole adjacent average summation in the function directly and by vectorizing on x_inner instead of colour.
EDIT: New schedule:
# Set input bounds.
output.bound(x, 0, (input_raw.width()/2)*2)
output.bound(y, 0, (input_raw.height()/2)*2)
output.bound(c, 0, 3)
# Reorder so that colours are calculated in order (red runs, then green, then blue)
output.reorder_storage(c, x, y)
output.reorder(c, x, y)
# Tile in 128x128 squares
output.tile(x, y, x_outer, y_outer, x_inner, y_inner, 128, 128)
output.unroll(x_inner, 2).unroll(y_inner,2)
# Vectorize based on colour
output.unroll(c)
output.vectorize(c)
# Fuse and parallelize
output.fuse(x_outer, y_outer, tile_index)
output.parallel(tile_index)
EDIT: Final schedule that's now beating (640MP/s) the Intel Performance Primitive benchmark that was run on a CPU twice as powerful as mine:
output = Func()
# Cast back to the original colour space
output[x,y,c] = cast(UInt(bits), debayer[x,y,c])
# Set input bounds.
output.bound(x, 0, (input_raw.width()/2)*2)
output.bound(y, 0, (input_raw.height()/2)*2)
output.bound(c, 0, 3)
# Tile in 128x128 squares
output.tile(x, y, x_outer, y_outer, x_inner, y_inner, 128, 128)
output.unroll(x_inner, 2).unroll(y_inner, 2)
# Vectorize along x_inner
output.vectorize(x_inner, 16)
# Fuse and parallelize
output.fuse(x_outer, y_outer, tile_index)
output.parallel(tile_index)
target = Target()
target.arch = X86
target.os = OSX
target.bits = 64
target.set_feature(AVX)
target.set_feature(AVX2)
target.set_feature(SSE41)
output.compile_jit(target)
Make sure that you are using unroll(c) to make the per-channel select logic optimize away. Unrolling by 2 in x and y will also help:
output.unroll(x, 2).unroll(y,2)
The goal there is to optimize out the select logic between even/odd rows and columns. In order to take full advantage of that, you'll likely also need to tell Halide that the min and extent are a multiple of 2:
output.output_buffer().set_bounds(0,
    (output.output_buffer().min(0) / 2) * 2,
    (output.output_buffer().extent(0) / 2) * 2)
output.output_buffer().set_bounds(1,
    (output.output_buffer().min(1) / 2) * 2,
    (output.output_buffer().extent(1) / 2) * 2)
Though it may be worth stating even more stringent constraints, such as using 128 instead of 2 to assert multiples of the tile size or just hardwiring the min and extent to reflect the actual sensor parameters if you are only supporting a single camera.
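The (n / 2) * 2 pattern used above rounds a value down to a multiple of 2 through integer division. The plain-Python sketch below (not Halide API; the helper name round_down_to_multiple is hypothetical) shows what that rounding would give for the 4640x3472 image from the question, both for 2 and for the tile size of 128.

```python
def round_down_to_multiple(value, multiple):
    """Largest multiple of `multiple` that does not exceed `value`."""
    return (value // multiple) * multiple

width, height = 4640, 3472   # image size quoted in the question
for multiple in (2, 128):
    print(multiple,
          round_down_to_multiple(width, multiple),
          round_down_to_multiple(height, multiple))
```

For this image size the multiple-of-2 constraint changes nothing, whereas rounding to 128 yields 4608x3456, so a constraint that strict would only hold if the output were cropped or padded to match.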

How to calculate the energy and correlation of an image

Could anyone help me calculate the energy and correlation of an image using MATLAB?
I think you are looking for graycomatrix and graycoprops. From the graycoprops documentation, two properties that can be computed:
'Correlation' statistical measure of how correlated a pixel is to its
neighbor over the whole image. Range = [-1 1].
Correlation is 1 or -1 for a perfectly positively or
negatively correlated image. Correlation is NaN for a
constant image.
'Energy' summation of squared elements in the GLCM. Range = [0 1].
Energy is 1 for a constant image.
To compute these properties, first compute the graylevel co-occurrence matrix via graycomatrix, then call graycoprops. For example,
I = imread('circuit.tif');
GLCM = graycomatrix(I,'Offset',[2 0;0 2]);
stats = graycoprops(GLCM,{'correlation','energy'})
You just need to decide on the Offset parameter for graycomatrix. A thorough choice would be offset = [0 1; -1 1; -1 0; -1 -1];
To compute entropy for the GLCMs, you can't use graycoprops, so you'll have to do it yourself:
p = bsxfun(@rdivide,GLCM,sum(sum(GLCM,1),2)); % normalize each GLCM to probs
numGLCMs = size(p,3);
entropyVals = zeros(1,numGLCMs);
for ii=1:numGLCMs,
    pii = p(:,:,ii); % probabilities of the ii-th GLCM (avoids shadowing the constant pi)
    entropyVals(ii) = -sum(pii(pii>0).*log(pii(pii>0)));
end
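To make the definitions concrete, here is a minimal Python/NumPy sketch of the same three quantities. It is not a reimplementation of graycomatrix or graycoprops: it assumes the image already holds integer gray levels in 0..levels-1, uses a single offset, counts pairs in one direction only, and the function names glcm and glcm_stats are chosen here for illustration.

```python
import numpy as np

def glcm(image, levels, dy=0, dx=1):
    """Normalized co-occurrence matrix for one (dy, dx) offset."""
    image = np.asarray(image)
    rows, cols = image.shape
    counts = np.zeros((levels, levels), dtype=float)
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r, c], image[r2, c2]] += 1
    return counts / counts.sum()

def glcm_stats(p):
    """Energy, correlation and entropy of a normalized co-occurrence matrix."""
    i, j = np.indices(p.shape)
    energy = np.sum(p ** 2)                        # sum of squared elements
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    # undefined (NaN) for a constant image, where both deviations are zero
    correlation = np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j)
    nonzero = p[p > 0]
    entropy = -np.sum(nonzero * np.log(nonzero))
    return energy, correlation, entropy

checker = [[0, 1], [1, 0]]   # each pixel differs from its right-hand neighbour
energy, correlation, entropy = glcm_stats(glcm(checker, levels=2))
print(energy, correlation, entropy)
```

On this tiny checkerboard every horizontal pair is either (0, 1) or (1, 0), so the neighbours are perfectly negatively correlated.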
