Probability density and ML mapping

I am trying to solve the below question. Please help me with this.
A pixel in a particular image has a random brightness, $X$. The distribution of $X$ depends on
whether the pixel belongs to an object of interest ($O$) or the background ($B$). If it belongs to
the object, then $X$ has a Laplace distribution,
$$f_X(x \mid O) = \frac{a}{2}\, e^{-a|x|}, \qquad -\infty < x < \infty,$$
where $a = 0.2$. If the pixel belongs to the background, then $X$ is uniformly distributed
between $-10$ and $10$.
Consider the problem of classifying a pixel based on an observation of its brightness, $X = x$.
Find the ML rule for all values of $x$ and illustrate the decision regions on the real line.
Then find the probability of error if the ML decision rule is used, given that the probability that the
pixel belongs to the object of interest is 0.6.
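The ML rule picks whichever conditional density is larger. Since $f_X(x \mid B) = 1/20$ on $[-10, 10]$ and $0$ outside, the interior boundary solves $\frac{a}{2}e^{-a|x|} = \frac{1}{20}$, i.e. $|x| = 5\ln 2 \approx 3.47$. Here is a minimal numerical sketch of that comparison (the grid range is my own choice):

import numpy as np

a = 0.2
f_O = lambda x: (a / 2) * np.exp(-a * np.abs(x))        # Laplace likelihood
f_B = lambda x: np.where(np.abs(x) <= 10, 1 / 20, 0.0)  # uniform on [-10, 10]

x = np.linspace(-15, 15, 3001)
decide_O = f_O(x) > f_B(x)   # ML rule: decide O wherever the O-likelihood wins

# interior boundary: (a/2)exp(-a|x|) = 1/20  =>  |x| = 5*ln(2) ~ 3.47;
# for |x| > 10 the background density is 0, so the rule always decides O there
print(5 * np.log(2))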

Related

How to determine if a pattern of distribution is different from a random/uniform distribution

Here is my case:
Let's say we have 50 polygons (image omitted) and a point set distributed within these 50 polygons, so that each polygon has an associated point density. What I want to test is whether the distribution pattern of this data set (for example, the fluctuations in density across the 50 polygons) is a realization of complete spatial randomness.
The method I use is: in the uniform random case, the number of points in each polygon (ring) follows a binomial distribution, i.e. X ~ B(n, p), where n is the total number of points and p is the probability of each point being inside a particular polygon (p = Area_polygon/Area_semicircle). So for each polygon I can calculate the expected number of points, and from that the expected density. Then I can apply one-way ANOVA to compare two groups: the actual density group and the theoretical density group.
However, I found a problem: when calculating the density, I divide the expected number by the polygon's area. But since the expected number is
E = N(total number) * Area_polygon / Area_total,
the expected density is
D = E / Area_polygon = N(total number) / Area_total,
which means the expected density is the same number for every polygon.
So in that case, is it still suitable to use one-way ANOVA to compare my actual density group to a group within which all numbers are the same?
What if I use counts rather than densities? Or are there other, more suitable tests?
You may want to look up a method called "quadrat test". It is explained in the online help for the function quadrat.test in the R package spatstat and more extensively in the spatstat book. (Disclaimer: I'm a coauthor.)
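In that spirit, here is a minimal Python sketch of the count-based comparison (a Pearson chi-square goodness-of-fit test on counts, in the spirit of the quadrat test); the counts and areas below are made up:

import numpy as np
from scipy.stats import chisquare

counts = np.array([12, 7, 15, 9])        # observed points per polygon (made up)
areas = np.array([2.0, 1.5, 3.0, 1.8])   # polygon areas (made up)

N = counts.sum()
expected = N * areas / areas.sum()       # E_i = N * p_i under spatial randomness
stat, pval = chisquare(counts, f_exp=expected)   # test counts, not densities
print(stat, pval)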

Find sub-pixel maximum on a 2D array

Suppose I have an image and I want to find the 3x3 subarray that has the maximum sum among all 3x3 subarrays.
How do I do that in python efficiently (run as fast as possible)? If you can provide a sample code that would be great.
My specific problem:
I want to extract the location of the center of the blob in this heatmap
I don't want to just take the maximum point, because that makes the coordinate imprecise: the true center of the blob could actually lie between two pixels. It's therefore better to take a weighted average over many points to obtain subpixel precision. For example, if there are two points (x1, y1) and (x2, y2) with values 200 and 100, the average coordinate is x = (200*x1 + 100*x2)/300 and y = (200*y1 + 100*y2)/300.
One of my solutions is to do a convolution operation. But I think it's not efficient enough, because it requires multiplication by the kernel (which contains only ones). I'm looking for a fast implementation, so I can't write the loop myself because I'm not sure it will be fast.
I want to run this algorithm on 50 images every few milliseconds (the images come in as a batch). Concretely, think of these images as the output of a machine learning model that outputs heatmaps. To obtain a coordinate from these heatmaps, I need to do some kind of weighted average between the coordinates with high intensity. My idea is to do a weighted average over a 3x3 area of the image. I am also open to other approaches that could be faster or more elegant.
Looking for the "subarray of shape 3x3 with the maximum sum" is the same as looking for the maximum of an image after it has been filtered with an un-normalized 3x3 box filter. So it boils down to finding efficiently the maximum of an image, which you assume is a (perhaps "noisy") discrete sample of an underlying continuous and smooth signal - hence your desire to find a sub-pixel location.
You really need to split the problem into two parts:
1. Find the pixel location m = (xm, ym) of the maximum value of the image. This requires no more than a visit to every pixel of the image and one comparison per pixel, so it's O(N) and hence optimal as long as you are operating at the native image resolution. In OpenCV it is done with the minMaxLoc function.
2. Apply whatever model of the image you are using to find its (subpixel-interpolated) maximum in a neighborhood of m.
To clarify point (2): you write
I don't want to just get the maximum point because that would cause the coordinate to not be very precise. The true center of the blob could actually be between 2 pixels
While intuitively plausible, this assertion needs to be made more precise in order to be computable. That is, you need to express mathematically what assumptions you make about the image that lead you to search for a "true" maximum between pixel-sampled locations.
A simple example of such an assumption is quadratic smoothness. In this scenario you assume that, in a small (say, 3x3 or 5x5) neighborhood of the "true" maximum location, the image signal z is well approximated by a quadratic:
z = A00 dx^2 + A01 dx dy + A11 dy^2 + A02 dx + A12 dy + A22
where:
dx = x - xm; dy = y - ym
This assumption makes sense if the underlying signal is expected to be continuously differentiable to third order, because of Taylor's theorem. Geometrically, it means that you assume (hope?) that the signal looks like a quadric (an elliptic paraboloid) near its maximum.
You can then evaluate the above equation at each pixel in a neighborhood of m, substituting the actual image values for z, and thus obtain a linear system in the unknowns Aij, with as many equations as there are neighbor pixels (so even a 3x3 neighborhood yields an over-constrained system). Solving the system in the least-squares sense gives you the "optimal" coefficients Aij. The theoretical maximum as predicted by this model is where the first partial derivatives vanish:
del z / del dx = 2 A00 dx + A01 dy + A02 = 0
del z / del dy = A01 dx + 2 A11 dy + A12 = 0
This is a linear system in the two unknowns (dx, dy), and solving it yields the estimated location of the maximum and, through the above equation for z, the predicted image value at the maximum.
In terms of computational cost, all such model estimations are extremely fast, compared with traversing an image of even moderate size.
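Putting the two parts together, here is a minimal sketch in Python with NumPy and OpenCV (the function name subpixel_max is my own; it assumes a single-channel image whose maximum is not on the border):

import cv2
import numpy as np

def subpixel_max(img):
    """Fit z = A00 dx^2 + A01 dx dy + A11 dy^2 + A02 dx + A12 dy + A22
    to the 3x3 neighborhood of the integer maximum, then solve for the
    stationary point of the fitted quadratic."""
    # part 1: integer-pixel maximum (O(N) scan)
    _, _, _, (xm, ym) = cv2.minMaxLoc(img)
    # offsets and values of the 3x3 neighborhood
    dys, dxs = np.mgrid[-1:2, -1:2]
    dx, dy = dxs.ravel().astype(float), dys.ravel().astype(float)
    z = img[ym - 1:ym + 2, xm - 1:xm + 2].ravel().astype(float)
    # part 2: least-squares fit (9 equations, 6 unknowns)
    A = np.column_stack([dx * dx, dx * dy, dy * dy, dx, dy, np.ones_like(dx)])
    A00, A01, A11, A02, A12, _ = np.linalg.lstsq(A, z, rcond=None)[0]
    # vanishing gradient: 2*A00*dx + A01*dy + A02 = 0, A01*dx + 2*A11*dy + A12 = 0
    d = np.linalg.solve([[2 * A00, A01], [A01, 2 * A11]], [-A02, -A12])
    return xm + d[0], ym + d[1]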
I'm sorry, I didn't exactly understand the meaning of your last paragraph, so I just stopped at the point where I had all the coordinates holding the maximum value. I used cv2.filter2D for convolution on a thresholded image, and then np.amax and np.where to find the coordinates with the maximum value.
import cv2
import numpy as np
from timeit import default_timer as timer

img = cv2.imread('blob.png', 0)   # read as grayscale

start = timer()
# binarize: pixels above 240 become 1, everything else 0
_, thresh = cv2.threshold(img, 240, 1, cv2.THRESH_BINARY)
# un-normalized 3x3 box filter: each output pixel holds its 3x3 sum
mask = np.ones((3, 3), np.uint8)
res = cv2.filter2D(thresh, -1, mask)
# coordinates of every pixel attaining the maximum 3x3 sum
result = np.where(res == np.amax(res))
end = timer()
print(end - start)
I don't know whether it is as efficient as you want, but the measured time was 0.0013461999999435648 s.
P.S. The image you have provided had a white border which I had to crop out for this method.
One way is to sub-sample the image to find the neighborhood of the desired point. You can do this by looping not over all pixels but over e.g. every 5th pixel (row = row + 5 and col = col + 5 in the loop). After finding the approximate location, take a specific neighborhood around it and loop over all pixels of that crop to find the exact location.
Based on my knowledge of image processing, to get a reliable result that works for any one blob, follow these steps:
Make the image greyscale if it isn’t already (pixel values 0-255)
Normalise the image so that pixel intensities cover the full range of 0-255
Convert image to binary (a pixel is either 0 or 1) - this can be achieved by thresholding, such as applying the rule that any pixel less than or equal to 127 in intensity is given an intensity of 0 and anything else is given an intensity of 1
Find the weighted average of all the pixels that hold the value of “1”
or
Apply an erosion to the image until you are left with either two pixels or one pixel.
Case 1
If you have two pixels, then you need to find the u and v coordinates of both pixels. The centre of the blob will be the halfway point between the u and v coordinates of the two pixels.
Case 2
If you have one pixel left then that pixel’s co-ordinates is the centre point.
—————
You mentioned achieving this quickly in Python:
Python by design is an interpreted language, so it executes line by line, making it less suitable for highly iterative tasks like image processing. However, you can make use of libraries like OpenCV (https://docs.opencv.org/2.4/index.html), which is written in C++, to mitigate this, apart from making the task at hand a lot easier for you.
OpenCV also provides functions for all the steps I listed above, so you should be able to achieve a reliable solution fairly quickly, though I can't say for sure whether it will hit your target of 50 images every few milliseconds. Another factor to take into account is the size of the images you are processing, since the processing load grows with the number of pixels.
UPDATE
I just found a good article that practically echoes my step-process:
https://www.learnopencv.com/find-center-of-blob-centroid-using-opencv-cpp-python/
More importantly, it also gives the formula for finding the centroid mathematically:
$$c = \frac{1}{n}\sum_{i=1}^{n} x_i$$
which is laid out better in the article than I can manage here.
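For the weighted-average variant the question asks about, here is a minimal sketch (assuming a single-channel grayscale image; the threshold of 127 mirrors the rule above and is otherwise arbitrary):

import numpy as np

def weighted_centroid(img, thresh=127):
    """Intensity-weighted centroid of all pixels above the threshold."""
    ys, xs = np.nonzero(img > thresh)   # surviving pixel coordinates
    w = img[ys, xs].astype(float)       # their intensities as weights
    return (xs * w).sum() / w.sum(), (ys * w).sum() / w.sum()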

Understanding Gradient Descent Algorithm

I'm learning machine learning. I was reading a topic called Linear Regression with One Variable, and I got confused trying to understand the gradient descent algorithm.
Suppose we are given a training set of pairs $(x^{(i)},y^{(i)})$ representing (feature/input variable, target/output variable). Our goal is to create a hypothesis function for this training set which can make predictions.
Hypothesis Function:
$$h_{\theta}(x)=\theta_0 + \theta_1 x$$
Our goal is to choose $(\theta_0,\theta_1)$ so that $h_{\theta}(x)$ best approximates the target values on the training set.
Cost Function:
$$J(\theta_0,\theta_1)=\frac{1}{2m}\sum\limits_{i=1}^m (h_{\theta}(x^{(i)})-y^{(i)})^2$$
$$J(\theta_0,\theta_1)=\frac{1}{2}\times\text{Mean Squared Error}$$
We have to minimize $J(\theta_0,\theta_1)$ to get the values of $(\theta_0,\theta_1)$ to put into our hypothesis function. We can do that by applying the gradient descent algorithm to the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$.
My question is: how do we choose $(\theta_0,\theta_1)$ and plot the surface $(\theta_0,\theta_1,J(\theta_0,\theta_1))$? In the online lecture I was watching, the instructor explained everything else but didn't mention where the plot comes from.
At each iteration you will have some $h_\theta$, and you will calculate the value of $\frac{1}{2m}\sum_{i=1}^m (h_\theta(x^{(i)})-y^{(i)})^2$ over the training set.
At each iteration $h_\theta$ is known, and the pairs $(x^{(i)},y^{(i)})$ of the training set are known, so the above is easy to calculate.
For each iteration you have a new value of $\theta$, and you can calculate the new MSE.
The plot itself will have the iteration number on the x axis and the MSE on the y axis.
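A minimal sketch of that loop (the learning rate and iteration count are made up), recording J at every iteration so it can be plotted against the iteration number:

import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    m = len(x)
    t0, t1 = 0.0, 0.0
    history = []                         # J per iteration, for the plot
    for _ in range(iters):
        h = t0 + t1 * x                  # current hypothesis values
        history.append(np.sum((h - y) ** 2) / (2 * m))
        # simultaneous update with the analytic gradients of J
        g0 = np.sum(h - y) / m
        g1 = np.sum((h - y) * x) / m
        t0, t1 = t0 - alpha * g0, t1 - alpha * g1
    return t0, t1, history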
As a side note: while you can use gradient descent here, there is no need. This cost function is convex and has a unique, well-known minimum: $\theta = (X^TX)^{-1}X^Ty$, where $y$ is the vector of training targets ($n \times 1$ for a training set of size $n$) and $X$ is the $n \times 2$ matrix whose $i$-th row is $X_i=(1,x^{(i)})$.
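And a sketch of the closed-form solution, with made-up data (solving the normal equations rather than forming the inverse explicitly):

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])      # made-up inputs
y = np.array([2.2, 3.9, 6.1, 7.8, 10.2])     # made-up targets

X = np.column_stack([np.ones_like(x), x])    # i-th row is (1, x_i)
theta = np.linalg.solve(X.T @ X, X.T @ y)    # solves (X^T X) theta = X^T y
print(theta)                                  # [theta_0, theta_1]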

Particle Filter Resampling

I implemented a bootstrap particle filter in C++ after reading a few papers, and I first built a 1D mouse tracker, which performed really well. I used a normal Gaussian for weighting in that example.
I extended the algorithm to track a face using two features: local motion and a 32-bin HSV histogram. In this example my weighting function becomes the probability of motion times the probability of the histogram. (Is this correct?)
In case that is correct, I am confused about the resampling function. At the moment my resampling function is as follows:
For each of the N = 50 particles:
Compute the CDF.
Generate a random number X (via a Gaussian).
Update the particle at index X.
Repeat for all N particles.
This is my resampling function at the moment. Note: in the second step I am using a random number from a Gaussian distribution to get the index, while my weighting function is the probability of motion and histogram.
My question is: should I generate the random number using the probabilities of motion and histogram, or is a random number via a Gaussian OK?
In the SIR (Sequential Importance Resampling) particle filter, resampling aims to replicate particles that have gained high weight while removing those with low weight.
So, once you have your particles weighted (typically with the likelihood you have used), one way to do resampling is to build the cumulative distribution of the weights, then generate a random number from a uniform distribution and pick the particle corresponding to that slot of the CDF. This way, a particle with more weight is more likely to be selected.
Also, don't forget to add some noise after generating replicas of particles, otherwise your point-estimate might be biased for a period of time.
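A minimal sketch of that resampling step (multinomial resampling), assuming the particles live in a NumPy array; the jitter scale is arbitrary. Note the draws are uniform, not Gaussian:

import numpy as np

def resample(particles, weights, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                          # normalize the weights
    cdf = np.cumsum(w)                    # cumulative distribution of weights
    u = rng.random(len(w))                # uniform draws in [0, 1)
    idx = np.searchsorted(cdf, u)         # CDF slot for each draw
    new = particles[idx].astype(float)    # heavy particles replicate more often
    # add a little noise so replicated particles don't collapse to a point
    return new + rng.normal(scale=0.01, size=new.shape)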

Is there any algorithm for determining 3d position in such case? (images below)

So first of all I have this image (and of course I have all the points' coordinates in 2D, so I can regenerate the lines and check where they cross each other):
(image omitted; source: narod.ru)
But I also have another image of the same lines (I know they are the same) with new coordinates of my points, as in this image:
(image omitted; source: narod.ru)
So, having the points' coordinates in the first image, how can I determine the plane rotation and Z depth in the second image (assuming the first one's center was at point (0,0,0) with no rotation)?
What you're trying to find is called a projection matrix. Determining precise inverse projection usually requires that you have firmly established coordinates in both source and destination vectors, which the images above aren't going to give you. You can approximate using pixel positions, however.
This thread will give you a basic walkthrough of the techniques you need to use.
Let me say this up front: this problem is hard. There is a reason Dan Story's linked question has not been answered. Let me provide an explanation for people who want to take a stab at it. I hope I'm wrong about how hard it is, though.
I will assume that the 2D screen coordinates and the projection/perspective matrix are known to you. You need to know at least this much (if you don't know the projection matrix, you are essentially using a different camera to look at the world). Let's call each pair of 2D screen coordinates (a_i, b_i), and I will assume the projection matrix has the form
P = [ px 0 0 0 ]
[ 0 py 0 0 ]
[ 0 0 pz pw]
[ 0 0 s 0 ], s = +/-1
Almost any reasonable projection has this form. Working through the rendering pipeline, you find that
a_i = px x_i / (s z_i)
b_i = py y_i / (s z_i)
where (x_i, y_i, z_i) are the original 3D coordinates of the point.
Now, let's assume you know your shape in a set of canonical coordinates (whatever you want), so that the vertices are (x0_i, y0_i, z0_i). We can arrange these as the columns of a matrix C. The actual coordinates of the shape are a rigid transformation of these coordinates. Let's similarly organize the actual coordinates as the columns of a matrix V. Then these are related by
V = R C + v 1^T (*)
where 1^T is a row vector of ones with the right length, R is an orthogonal rotation matrix of the rigid transformation, and v is the offset vector of the transformation.
Now, you have an expression for each column of V from above: the first column is { s a_1 z_1 / px, s b_1 z_1 / py, z_1 } and so on.
You must solve the set of equations (*) for the scalars z_i and for the rigid transformation defined by R and v; a sketch of one way to hand this to a generic solver follows the list of difficulties below.
Difficulties
The equation is nonlinear in the unknowns, involving quotients of R and z_i
We have assumed up to now that you know which 2D coordinates correspond to which vertices of the original shape (if your shape is a square, this is slightly less of a problem).
We assume there is even a solution at all; if there are errors in the 2D data, then it's hard to say how well equation (*) will be satisfied; the transformation will be nonrigid or nonlinear.
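As a hedged sketch of feeding equation (*) to a generic nonlinear least-squares solver, assuming known correspondences and parameterizing R by an axis-angle vector (the packing of params is my own convention):

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residual(params, a, b, C, px, py, s=1.0):
    """Residual of V = R C + v 1^T: params packs an axis-angle
    rotation (3), the offset v (3), and the unknown depths z_1..z_n."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    v = params[3:6]
    z = params[6:]
    # back-project each screen point: x_i = s a_i z_i / px, y_i = s b_i z_i / py
    V = np.vstack([s * a * z / px, s * b * z / py, z])
    return (V - (R @ C + v[:, None])).ravel()

# given correspondences (a_i, b_i) <-> columns of C and an initial guess x0:
# fit = least_squares(residual, x0, args=(a, b, C, px, py))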
It's called (digital) photogrammetry. Start Googling.
If you are really interested in this kind of problem (they are common in computer vision: tracking objects with cameras, etc.), the following book contains a detailed treatment:
Ma, Soatto, Kosecka, Sastry, An Invitation to 3-D Vision, Springer 2004.
Beware: this is an advanced engineering text, and uses many techniques which are mathematical in nature. Skim through the sample chapters featured on the book's web page to get an idea.
