Efficient gather memory access in Halide

Let's say I want to do an operation (e.g. addition) between two images where each pixel in image Img1 has a corresponding pixel in image Img2. The correspondence vector is stored in a tuple Delta. Basically, something like this:
Img(x, y) = Img1(x, y) + Img2(x + Delta[0](x, y), y + Delta[1](x, y));
This is a memory gather operation. What would be the best way to describe such a pattern in Halide, and how should it be scheduled?

There isn't really a great way to schedule that. Gathers are slow, even where gather instructions exist. You probably still want to vectorize it over x so that the addressing math and the loads from Img1 and Delta are done using vectors though. I'd just use the obvious thing:
Img.vectorize(x, 8).parallel(y, 4);
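In case a fuller picture helps, here is a minimal sketch of how that could look end to end, assuming Delta is a two-component (Tuple-valued) Func holding integer offsets and that the image extents are known so the gathered coordinates can be clamped; the width/height parameters and the function name are illustrative only:
#include "Halide.h"
using namespace Halide;

// Sketch: gather-add of Img2 at offsets given by Delta, plus the schedule above.
Func gather_add(Func Img1, Func Img2, Func Delta, Expr width, Expr height) {
    Var x("x"), y("y");
    Func Img("Img");
    // Clamp the gathered coordinates so Halide can bound the access to Img2.
    Expr gx = clamp(x + Delta(x, y)[0], 0, width - 1);
    Expr gy = clamp(y + Delta(x, y)[1], 0, height - 1);
    Img(x, y) = Img1(x, y) + Img2(gx, gy);
    // Vectorize the addressing math and the dense loads from Img1 and Delta;
    // the gather from Img2 itself still lowers to per-lane loads (or hardware
    // gathers where the target has them).
    Img.vectorize(x, 8).parallel(y, 4);
    return Img;
}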

Related

Translate - Rotate - Translate Back using XMMATRIX

If all I know is an object's World matrix (because its x/y/z position is not tracked, which would be easier), how do I go about rotating it around its center?
If I knew the location, it'd be about as simple as something like this:
XMMATRIX world = pMissile->GetWorldMatrix();
XMMATRIX matrixTranslation = XMMatrixTranslationFromVector(pMissile->GetPosition());
XMMATRIX matrixInvTranslations = XMMatrixInverse(nullptr, matrixTranslation);
float rotationAmount = (60 * XMConvertToRadians((float)fElapsedTime / 2.0f));
XMMATRIX missileWorld = world
    * matrixInvTranslations
    * XMMatrixRotationX(rotationAmount)
    * XMMatrixRotationY(rotationAmount)
    * XMMatrixRotationZ(rotationAmount)
    * matrixTranslation;
pMissile->SetWorldMatrix(missileWorld);
Unfortunately, since I don't know the position, I'm not sure what to do. Basically I need to be able to get the "Translate back to the origin" from just the world matrix. Before I start pulling elements out of the matrix, there must be a DirectX or DirectXTK function to do this, no?
Currently I'm decomposing the matrix to get it:
XMVECTOR vectorTranslation, vectorScale, rotationQuat;
XMMatrixDecompose(&vectorScale, &rotationQuat, &vectorTranslation, world);
If that's the right/best way, let me know!
Somewhat tangentially, as you can see I use an inverse of the translation to "move it back" to where it was originally before I translated it to the origin for rotation. A lot of samples skip this - is there something I'm missing in that you don't -need- to translate back at the end?
XMMatrixDecompose is the correct, fully general way to get the elements of an arbitrary transformation matrix. The computation is expensive, so most folks make assumptions about what's in the matrix, because they control it at all points. For example, avoiding non-uniform scaling can really simplify things.
Many games exclusively use rotation and translation and avoid scaling, or at least avoid non-uniform scaling. You can quickly compute the inverse of such a matrix: transpose the upper 3x3 elements, then set the translation row to the negated original x, y, and z transformed by that transposed 3x3.
If you know your matrix only contains a rotation and translation, and never contains scale, then the rotation matrix is just the upper 3x3 elements. As long as your matrix is homogeneous (i.e. the last column is [0 0 0 1]), you can just read the translation out of the last row: world.r[3] should be (x, y, z, 1).
If you are new to DirectXMath, you should consider using the SimpleMath wrapper in the DirectX Tool Kit. It handles the alignment complexities a bit more automatically, and includes handy helpers like Matrix::Translation, which just extracts the x, y, and z of world.r[3].
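For completeness, a minimal sketch of the no-decompose path under that rotation-plus-translation assumption (the function name and the single rotationAmount parameter are just for illustration):
#include <DirectXMath.h>
using namespace DirectX;

// Rotate an object around its own center using only its world matrix,
// assuming the matrix contains rotation and translation but no scale.
XMMATRIX RotateAroundOwnCenter(FXMMATRIX world, float rotationAmount)
{
    // For a homogeneous row-major world matrix, the last row is (x, y, z, 1).
    XMVECTOR position = world.r[3];

    XMMATRIX toOrigin   = XMMatrixTranslationFromVector(XMVectorNegate(position));
    XMMATRIX rotation   = XMMatrixRotationX(rotationAmount)
                        * XMMatrixRotationY(rotationAmount)
                        * XMMatrixRotationZ(rotationAmount);
    XMMATRIX backToSpot = XMMatrixTranslationFromVector(position);

    // Translate to the origin, rotate, then translate back; skipping the last
    // step would leave the object sitting at the origin.
    return world * toOrigin * rotation * backToSpot;
}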

Matching closest image in openCV?

This is the dilemma I'm having. For my application, I need to match an image L1 with the matching one, L2, in a set of images. L1 and L2 are the exact same image, except L1 is much smaller (it will need to be upscaled?) and could be artifacted a little on the edges, but nevertheless they come from the exact same source image. Color DOES matter, in that using color information will remove possible ambiguities between the current image and the one it is to be matched with. Using OpenCV (or perhaps a better alternative?), what is the best way to find the matching image (L2)?
To reiterate, the image to be matched with is not rotated or distorted in anyway, only resized.
I guess there would be a function which rates how close the image to be matched is to each of the images in the provided set; then we choose the one with the highest rating as the match. I'm not sure how to compare the images, though. Any help would be great. Thanks.
Go to GitHub and check out opencv-master\samples\cpp\matcher_simple.cpp (or matching_to_many_images.cpp).
Not only can it satisfy your need, but it also works for images with perspective distortion (e.g. rotation, affine transformation, and illumination variation). Simply put, it's very robust.
But SIFT and SURF are patented, so you might not be able to use them for commercial applications, which sucks. There are many alternatives, though; just google around!
OpenCV has a tutorial on similarity measurement for images.
You will need to upscale L1 before doing the comparison, or downscale L2. If you are comparing against lots of images, it probably makes more sense to scale each candidate down to L1's size, because there are then fewer pixels to compare.
e.g.
cv::Mat L1 = ...;
cv::Mat L2 = ...;
cv::Mat L2small;
cv::resize(L2, L2small, L1.size());
double psnr = getPSNR(L1, L2small);
// where code for getPSNR() is in the tutorial
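As a usage sketch, picking the match out of the whole set then just means keeping the candidate with the highest PSNR. cv::PSNR from the core module can stand in for the tutorial's getPSNR; "candidates" is a placeholder for however your image set is loaded:
#include <vector>
#include <opencv2/opencv.hpp>

// Returns the index of the candidate most similar to L1 (higher PSNR = more similar).
int bestMatchIndex(const cv::Mat& L1, const std::vector<cv::Mat>& candidates)
{
    int best = -1;
    double bestScore = -1.0;
    for (size_t i = 0; i < candidates.size(); ++i) {
        cv::Mat resized;
        cv::resize(candidates[i], resized, L1.size());  // compare at L1's size
        double score = cv::PSNR(L1, resized);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(i);
        }
    }
    return best;
}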
I think you may be able to use something similar to Bag-of-words model used to measure document similarity. Take a look at this: link
I'm reproducing the equation below:
G = X'*X
where X = [x1 x2 ... xn]
In your case, use the normalized histogram of the image as vector xi.
I think you wouldn't have to resize the images in this approach and it would be faster.
EDIT
I tried this in Matlab using some sample images provided in opencv samples:
im1 = imread('baboon.jpg');
im2 = imread('board.jpg');
im3 = imread('fruits.jpg');
im4 = imread('fruits - small.jpg'); % fruits.jpg scaled down 25% using mspaint
% using grayscale for simplicity
gr1 = rgb2gray(im1);
gr2 = rgb2gray(im2);
gr3 = rgb2gray(im3);
gr4 = rgb2gray(im4);
[cnt_baboon, x] = imhist(gr1);
[cnt_board, x] = imhist(gr2);
[cnt_fruits, x] = imhist(gr3);
[cnt_fruits_small, x] = imhist(gr4);
% X: not normalized
X = [cnt_baboon cnt_board cnt_fruits cnt_fruits_small];
H = X'*X;
N = sqrt(diag(H)*diag(H)');
% normalize here via the diagonal (faster than normalizing each histogram beforehand)
G = H./N
The resulting G matrix:
G =
1.0000 0.8460 0.7748 0.7729
0.8460 1.0000 0.8741 0.8686
0.7748 0.8741 1.0000 0.9947
0.7729 0.8686 0.9947 1.0000
You can see that G(3,4) (and G(4,3)) are very close to 1.
I think you are looking for histogram matching. There are built-in functions for comparing histograms, for example the Bhattacharyya distance, and they don't require both of your images to be the same size either.
Just check this link on the OpenCV site:
link
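As a rough sketch of that in OpenCV (C++), assuming single-channel 8-bit images and a reasonably recent OpenCV; HISTCMP_BHATTACHARYYA returns 0 for identical histograms, so the candidate with the smallest distance is the best match, and no resizing is needed:
#include <opencv2/opencv.hpp>

// Bhattacharyya distance between the normalized grayscale histograms of a and b.
double histogramDistance(const cv::Mat& a, const cv::Mat& b)
{
    int channels[] = {0};
    int histSize = 256;
    float range[] = {0, 256};
    const float* ranges[] = {range};

    cv::Mat ha, hb;
    cv::calcHist(&a, 1, channels, cv::Mat(), ha, 1, &histSize, ranges);
    cv::calcHist(&b, 1, channels, cv::Mat(), hb, 1, &histSize, ranges);
    cv::normalize(ha, ha, 1.0, 0.0, cv::NORM_L1);   // make the histograms comparable
    cv::normalize(hb, hb, 1.0, 0.0, cv::NORM_L1);

    return cv::compareHist(ha, hb, cv::HISTCMP_BHATTACHARYYA);
}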

How to average multiple images using Octave and matrix manipulation to reduce noise?

UPDATE
Here is my code that is meant to add up the two matrices using element-by-element addition and then divide by two.
function [ finish ] = stackAndMeanImage (initFrame, finalFrame)
  cd 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
  pkg load image;
  i = initFrame;
  f = finalFrame;
  astr = num2str(i);
  tmp = imread(astr, 'jpg');
  d = f - i
  for a = 1:d
    a
    astr = num2str(i + 1);
    read_tmp = imread(astr, 'jpg');
    read_tmp = rgb2gray(read_tmp);
    tmp = tmp :+ read_tmp;
    tmp = tmp / 2;
  end
  imwrite(tmp, 'meanimage.JPG');
  finish = 'done';
end
Here are two example input images
http://imgur.com/5DR1ccS,AWBEI0d#1
And here is one output image
http://imgur.com/aX6b0kj
I am really confused as to what is happening. I have not implemented what the other answers have said yet though.
OLD
I am working on an image processing project where I am now manually choosing images that are 'empty' or only have the background, so that my algorithm can compute the differences and then do some more analysis. I have a simple piece of code that computes the mean of two images, which I have converted to grayscale matrices, but this only works for two images. When I find the mean of two, then take this mean and find the mean of it versus the next image, and repeat this over and over, I end up with a washed-out white image that is absolutely useless. You can't even see anything.
I found that there is a function in MATLAB called imfuse that is able to average images. I was wondering if anyone knew the process that imfuse uses to combine images; I am happy to implement it in Octave. Or does anyone know of, or has anyone already written, a piece of code that achieves something similar to this? Again, I am not asking anyone to write code for me, just wondering what the process is and whether there are pre-existing functions out there, which I have not found in my research.
Thanks,
AeroVTP
You should not end up with a washed-out image. Instead, you should end up with an image which is, technically speaking, temporally low-pass filtered. What this means is that half of the information content comes from the last image, one quarter from the second-to-last image, one eighth from the third-to-last image, and so on.
Actually, the effect in a moving image is similar to a display with slow response time.
If you are ending up with a white image, you are doing something wrong. nkjt's guess of type challenges is a good one. Another possibility is that you have forgotten to divide by two after summing the two images.
One more thing... If you are doing linear operations (such as averaging) on images, your image intensity scale should be linear. If you just use the RGB values, or grayscale values simply calculated from them, you may get bitten by the nonlinearity of the image data; this nonlinearity comes from gamma correction. (Admittedly, most image processing programs just ignore the problem, as it is not always a big challenge.)
As your project calculates differences of images, you should take this into account. I suggest using linearised floating point values. Unfortunately, the linearisation depends on the source of your image data.
On the other hand, averaging is often the most efficient way of reducing noise, so you are on the right track, assuming the images are similar enough.
However, after having a look at your images, it seems that you may actually want to do something else than to average the image. If I understand your intention correctly, you would like to get rid of the cars in your road cam to give you just the carless background which you could then subtract from the image to get the cars.
If that is what you want to do, you should consider using a median filter instead of averaging. What this means is that you take for example 11 consecutive frames. Then for each pixel you have 11 different values. Now you order (sort) these values and take the middle (6th) one as the background pixel value.
If your road is empty most of the time (at least 6 frames of 11), then the 6th sample will represent the road regardless of the colour of the cars passing your camera.
If you have an empty road, the result from the median filtering is close to averaging. (Averaging is better with Gaussian white noise, but the difference is not very big.) But your averaging will be affected by white or black cars, whereas median filtering is not.
The problem with median filtering is that it is computationally intensive. I am very sorry I speak very broken and ancient Octave, so I cannot give you any useful code. In MatLab or PyLab you would stack, say, 11 images to a M x N x 11 array, and then use a single median command along the depth axis. (When I say intensive, I do not mean it couldn't be done in real time with your data. It can, but it is much more complicated than averaging.)
If you have really a lot of traffic, the road is visible behind the cars less than half of the time. Then the median trick will fail. You will need to take more samples and then find the most typical value, because it is likely to be the road (unless all cars have similar colours). There it will help a lot to use the colour image, as cars look more different from each other in RGB or HSV than in grayscale.
Unfortunately, if you need to resort to this type of processing, the path is slightly slippery and rocky. Average is very easy and fast, median is easy (but not that fast), but then things tend to get rather complicated.
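For what it's worth, here is a rough sketch of that per-pixel temporal median in C++ with OpenCV, assuming an odd number of equally sized 8-bit grayscale frames; in Octave or MATLAB the same thing would be a single median call along the third dimension of an M x N x K stack:
#include <algorithm>
#include <vector>
#include <opencv2/opencv.hpp>

// Background estimate: per-pixel median across a stack of grayscale frames.
cv::Mat temporalMedian(const std::vector<cv::Mat>& frames)
{
    const int rows = frames[0].rows, cols = frames[0].cols;
    const size_t k = frames.size();
    cv::Mat background(rows, cols, CV_8UC1);
    std::vector<uchar> samples(k);

    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            // Collect this pixel across all frames and take the middle value.
            for (size_t i = 0; i < k; ++i)
                samples[i] = frames[i].at<uchar>(y, x);
            std::nth_element(samples.begin(), samples.begin() + k / 2, samples.end());
            background.at<uchar>(y, x) = samples[k / 2];
        }
    }
    return background;
}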
Another BTW came into my mind. If you want to have a rolling average, there is a very simple and effective way to calculate it with an arbitrary length (arbitrary number of frames to average):
# N is the number of images to average
# P[i] are the input frames
# S is a sum accumulator (sum of N frames)

# calculate the sum of the first N frames
S <- 0
I <- 0
while I < N
    S <- S + P[I]
    I <- I + 1

# save_img() saves an averaged image
while there are images to process
    save_img(S / N)
    S <- -P[I-N] + S + P[I]
    I <- I + 1
Of course, you'll probably want to use for-loops, and += and -= operators, but still the idea is there. For each frame you only need one subtraction, one addition, and one division by a constant (which can be modified into a multiplication or even a bitwise shift in some cases if you are in a hurry).
I may have misunderstood your problem, but I think what you're trying to do is the following. Basically, read all images into a matrix and then use mean(). This is provided you are able to fit them all in memory.
function [finish] = stackAndMeanImage (ini_frame, final_frame)
  pkg load image;
  dir_path = 'C:\Users\Disc-1119\Desktop\Internships\Tracking\Octave\highway\highway (6-13-2014 11-13-41 AM)';
  n_frames = final_frame - ini_frame;
  imgs = cell (1, 1, n_frames);
  ## read all images into a cell array
  current_frame = ini_frame;
  for n = 1:n_frames
    fname = fullfile (dir_path, sprintf ("%i", current_frame));
    imgs{n} = rgb2gray (imread (fname, "jpg"));
    current_frame++;
  endfor
  ## create 3D matrix out of all frames and calculate mean across 3rd dimension
  imgs = cell2mat (imgs);
  avg = mean (imgs, 3);
  ## mean returns double precision so we cast it back to uint8 after
  ## rescaling it to range [0 1]. This assumes that images were all
  ## originally uint8, but since they are jpgs, that's a safe assumption
  avg = im2uint8 (avg ./ 255);
  imwrite (avg, fullfile (dir_path, "meanimage.jpg"));
  finish = "done";
endfunction

Determining if an image is more or less similar to a goal image

I'm trying to think of a fast algorithm for the following issue.
Given a goal image G, and two images A and B, determine which of A or B is more similar to G. Note that images A, B, and G are all the same dimension.
By more similar, I mean it looks more like image G overall.
Any ideas for algorithms? I am doing this in Objective-C, and have the capability to scan each and every single pixel in images A, B, and G.
I implemented the following: scan each and every pixel, determine the absolute error in each of red, green, and blue values for A to G and for B to G. The one with the less error is more similar. It works okay, but it is extremely extremely slow.
It is not possible to do better than O(X*Y), where X and Y are the image dimensions, if you want an exact answer, since you need to scan each pixel of the input anyway.
However, one technique you can try is to scan random pixels in the images and accumulate the differences. Once one image looks considerably more similar to G than the other, you can stop.
# X, Y are the image dimensions
sim_A = 0
sim_B = 0
# keep sampling while the running scores are still too close to call
while abs(sim_A - sim_B) < MIN_SEPARATION:
    rand_x = random(X)
    rand_y = random(Y)
    sim_A += dissimilar(img_G, img_A, rand_x, rand_y)   # per-pixel difference from G
    sim_B += dissimilar(img_G, img_B, rand_x, rand_y)
You may try using the SIFT algorithm (Scale-Invariant Feature Transform). Since you mention that you want to find which image is MORE similar to the goal image, I guess this is a good fit. It basically extracts the invariant features of the image (features that don't change with changes in luminous intensity, scale, perspective, etc.) and then creates a feature vector from these. You can then use this feature vector to compare against other images. You may check this and this for further reference.
Ideally you would use a computer vision library, which makes things way simpler (I guess it might be difficult to read and write images in Objective-C without any computer vision library). OpenCV (an open-source computer vision library) is best suited for stuff like this. It has many built-in functions to handle common operations on images and videos.
Hope this helps :)
I would recommend checking out OpenCV, which is an image processing library. I don't think it has Objective-C support, but I think it is a better starting place than writing your own algorithm. Usually better not to reinvent the wheel unless you are doing it for personal practice.
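For example, the per-pixel absolute-error comparison described in the question is only a few lines with OpenCV. A sketch, assuming A, B, and G are already loaded as cv::Mats of the same size and type; the vectorized absdiff and sum routines avoid the slow hand-written pixel loop:
#include <opencv2/opencv.hpp>

// Returns true if A is closer to G than B is (smaller total absolute error).
bool closerToGoal(const cv::Mat& G, const cv::Mat& A, const cv::Mat& B)
{
    cv::Mat diffA, diffB;
    cv::absdiff(G, A, diffA);
    cv::absdiff(G, B, diffB);
    // cv::sum totals each channel separately; add the per-channel totals.
    cv::Scalar sa = cv::sum(diffA), sb = cv::sum(diffB);
    double errA = sa[0] + sa[1] + sa[2];
    double errB = sb[0] + sb[1] + sb[2];
    return errA < errB;
}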
The best way, I found out, is to do the following.
First, invert all pixels in the image to produce the opposite of the image. This is the most dissimilar image.
Then, to compare an image to the target image, compute how far away it is from that most dissimilar image. The farther away it is, the better the match.

Algorithm for antialiasing an image with a sinc filter

I have been reading about ways to do antialiasing, and since it's not processed in real time, antialiasing with signal processing seems ideal, especially against artifacts.
However, what I have read does not mention the step of turning a bitmap image into a signal and back again, so I'm looking for an algorithm or code examples that demonstrate that.
A bitmap image already is a digital signal - it's 2 dimensional and the pixel values are the samples. You can apply a sinc filter to it directly.
The usual way things are handled is to apply your filter independently in both the x and y directions. That way, your overall filter is g(x,y) = f(x) * f(y).
In this kind of situation, g(x,y) is called a separable filter, and the advantage is that, by applying the x- and y- filters separately, a straightforward filter convolution takes O(X Y F) time, where X and Y are the dimensions of the image, and F is the support width of the filter f(). An arbitrary nonseparable filter of the same size (which would have O(F^2) samples) generally requires O(X Y F^2) time...
If you really want to apply a full sinc() (== sin(x)/x) filter to your image, the unlimited support of the sinc() function will make straightforward convolution very slow. In that case, it would be faster to do a 2D FFT of your image, filter in the frequency domain, and transform it back.
In practice, though, most people use windowing or other modification, to get a finite filter that can be practically applied in the spatial domain.
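To make that last point concrete, here is a rough sketch in plain C++ of the practical approach: build a finite, Lanczos-windowed sinc kernel and apply it separably, rows first and then columns. The cutoff parameter (the fraction of the Nyquist rate to keep, e.g. 0.5 before halving the image size), the lobe count a, and the flat float-vector image layout are all illustrative choices, not anything prescribed above:
#include <algorithm>
#include <cmath>
#include <vector>

static double sinc(double x)
{
    static const double pi = 3.14159265358979323846;
    if (x == 0.0) return 1.0;
    return std::sin(pi * x) / (pi * x);
}

// 1D windowed-sinc low-pass kernel, normalized so flat regions keep their level.
static std::vector<double> windowedSinc(double cutoff, int a)
{
    int half = static_cast<int>(std::ceil(a / cutoff));
    std::vector<double> k(2 * half + 1);
    double sum = 0.0;
    for (int i = -half; i <= half; ++i) {
        double t = i * cutoff;                             // position in cutoff units
        double w = std::abs(t) < a ? sinc(t / a) : 0.0;    // Lanczos window
        k[i + half] = cutoff * sinc(t) * w;
        sum += k[i + half];
    }
    for (double& v : k) v /= sum;
    return k;
}

// Apply the same 1D kernel along x and then along y: g(x,y) = f(x) * f(y).
static void filterSeparable(std::vector<float>& img, int w, int h,
                            const std::vector<double>& k)
{
    int half = static_cast<int>(k.size()) / 2;
    std::vector<float> tmp(img.size());
    for (int y = 0; y < h; ++y)                            // horizontal pass
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int i = -half; i <= half; ++i) {
                int xx = std::min(std::max(x + i, 0), w - 1);   // clamp at edges
                acc += k[i + half] * img[y * w + xx];
            }
            tmp[y * w + x] = static_cast<float>(acc);
        }
    for (int y = 0; y < h; ++y)                            // vertical pass
        for (int x = 0; x < w; ++x) {
            double acc = 0.0;
            for (int i = -half; i <= half; ++i) {
                int yy = std::min(std::max(y + i, 0), h - 1);
                acc += k[i + half] * tmp[yy * w + x];
            }
            img[y * w + x] = static_cast<float>(acc);
        }
}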
