In image processing, specifically in fingerprint recognition, I have to apply a two-dimensional low pass filter with a unit integral.
What does this unit integral mean? Also, if I choose a Gaussian filter, what sigma to use?
Unit integral means that the total area of the mask or kernel should be 1. For example, a 3 x 3 averaging filter means that every coefficient in your mask should be 1/9, so that when you sum up all of the elements in the mask the total is 1.
The Gaussian filter inherently has a unit integral / unit area of 1. If you use MATLAB, the fspecial command with the gaussian flag has its mask normalized.
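For example, a quick sanity check in MATLAB (assuming the Image Processing Toolbox is available; the 13 x 13 size and sigma = 2 are just example values):
h = fspecial('gaussian', [13 13], 2);  % 13 x 13 Gaussian mask with sigma = 2
sum(h(:))                              % 1 (up to floating point) -- the mask is already normalized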
However, if you want to create the Gaussian mask yourself, you can use the following equation:
h(x, y) = (1 / (2*pi*sigma^2)) * exp( -(x^2 + y^2) / (2*sigma^2) )
Bear in mind that (x,y) are the locations inside the mask with respect to the centre. As such, if you have a 5 x 5 mask, then at row = 2, col = 2, x = 0 and y = 0. However, the above equation does not generate a unit area of 1. It is theoretically equal to 1 if you integrate over the entire 2D plane. Because we are truncating the Gaussian function, the area is not 1. As such, once you generate all of your coefficients, you need to make sure that the total area is 1 by summing up every single element in the mask. Then, you take this number and divide every single element in your mask by this number. In fact when you generate the Gaussian mask, it's not important to multiply the exponential term by the scale factor in the equation. By ensuring that the sum of the mask is equal to 1, the scale is effectively removed. You can just use the exponential term instead to shave off some calculations.
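As a rough MATLAB sketch of that procedure (sigma = 2 and the 3*sigma half width are just example values; the 3*sigma rule is discussed next):
sigma = 2;                                    % example value
half  = 3*sigma;                              % half width, giving a 13 x 13 mask
[x, y] = meshgrid(-half:half, -half:half);    % coordinates relative to the centre
mask = exp(-(x.^2 + y.^2) / (2*sigma^2));     % exponential term only, as discussed
mask = mask / sum(mask(:));                   % normalize so the mask sums to 1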
In terms of the sigma, that is completely up to you. Usually people go with the 3*sigma half-width rule, so the total width spanning from left to right in 1D is 6*sigma + 1 (including the centre). To figure out what sigma you want specifically, people measure how wide the smallest feature in the image is, set that as the width, then solve for sigma. For example, if that width is 13, then rearranging for sigma in the equation gives you 2. In other words:
13 = 6*sigma + 1
12 = 6*sigma
sigma = 2
As such, you'd set your sigma to 2 and make the mask 13 x 13. For more information about the 3*sigma rule, check out my post on the topic here: By which measures should I set the size of my Gaussian filter in MATLAB?
Once you create that mask, use any convolution method you wish to Gaussian filter your image.
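In MATLAB that step could be as simple as the following (using the mask from the sketch above; img is assumed to be a grayscale image):
smoothed = conv2(double(img), mask, 'same');   % or: imfilter(img, mask, 'replicate')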
Here's another post that may help you if you can use MATLAB.
How to make a Gaussian filter in Matlab
If you need to use another language like C or Java, then you could create a Gaussian mask in the following way:
C / C++
#define WIDTH 13
float sigma = ((float)WIDTH - 1.0f) / 6.0f;
int half_width = (int)(WIDTH / 2.0);
float mask[WIDTH][WIDTH];
float scale = 0.0f;

for (int i = -half_width; i <= half_width; i++) {
    for (int j = -half_width; j <= half_width; j++) {
        mask[i + half_width][j + half_width] = expf( -((float)(i*i + j*j) / (2.0*sigma*sigma)) );
        scale += mask[i + half_width][j + half_width];
    }
}

for (int i = 0; i < WIDTH; i++)
    for (int j = 0; j < WIDTH; j++)
        mask[i][j] /= scale;
Java
int WIDTH = 13;
float sigma = ((float)WIDTH - 1.0f) / 6.0f;
int half_width = (int) Math.floor(WIDTH / 2.0);
float[][] mask = new float[WIDTH][WIDTH];
float scale = 0.0f;

for (int i = -half_width; i <= half_width; i++) {
    for (int j = -half_width; j <= half_width; j++) {
        mask[i + half_width][j + half_width] = (float) Math.exp( -((double)(i*i + j*j) / (2.0*sigma*sigma)) );
        scale += mask[i + half_width][j + half_width];
    }
}

for (int i = 0; i < WIDTH; i++)
    for (int j = 0; j < WIDTH; j++)
        mask[i][j] /= scale;
As noted before, in the code I didn't have to divide by 2*pi*sigma^2. Again, the reason is that when you normalize the kernel, this constant factor gets cancelled out anyway, so there's no need to add any additional overhead when computing the mask coefficients.
I am writing a web app using WebGL where I create a grid of vertices which are assembled into quads.
This is what it looks like, fully textured.
It works fine, but there is a problem. The texture is the following 8x8 image with 2 arrows, one pointing left and one pointing up.
At this point, you may have realized that the texture is flipped in all directions depending on the vertex.
I use the following Dart code to create the vertices. It shouldn't be too hard to follow for JavaScript developers, though.
vertices = new Float32List(verts * ChunkRenderer.FLOATS_PER_VERTEX);

for (int i = 0; i < w + 1; i++) {
  for (int j = 0; j < h + 1; j++) {
    int index = (i + j * (w + 1)) * ChunkRenderer.FLOATS_PER_VERTEX;

    double x = i * 16.0;
    double z = j * 16.0;
    double y = 0.0;

    double r = rand.nextDouble();
    double g = rand.nextDouble();
    double b = rand.nextDouble();

    double u = i % 2 == 0 ? 0.0 : 1.0;
    double v = j % 2 == 0 ? 0.0 : 1.0;

    vertices.setAll(index, [
      x, y, z,
      u, v,
      r, g, b,
    ]);
  }
}
These lines in particular are responsible for setting the UV mapping
double u = i % 2 == 0 ? 0.0 : 1.0;
double v = j % 2 == 0 ? 0.0 : 1.0;
Is there a way to reuse the vertices for each triangle without messing up the textures, or do I really need to duplicate the vertices for each cell?
I like the current setup because I can move one vertex up, for example, and it will create a "spike", if you will. If each cell had its own vertices instead, I'd have to update 4 different vertices to get the same effect (excluding vertices on the edges of the grid, of course).
Thanks
You can provide 4 pairs of (U,V) for each vertex and then decide which one to use.
I don't know your render code so it's hard to provide any example.
I think the most basic form of reuse would be to prepare the mesh data for one quad, and then redraw that quad with a different translation and rotation information for each draw call. This would be significantly less performant than what you're doing, because of the extra draw calls, but would also require less memory.
I have noticed that MTLBuffers with computationally intensive shader functions tend to stop calculating before all threadgroups are done. When I use a MTLComputePipelineState and MTLComputeCommandEncoder to blur an image with very large blur radii, the resulting image is only halfway processed and one can actually see half-finished threadgroups. I did not narrow it down to the exact blur radius, but 16 pixels works fine, while 32 is already too much and not even half the groups are computed.
So are there any limitations on how long a shader function call may take to finish, or anything like that? I just finished reading most of the documentation on how to use the Metal framework and I cannot recall stumbling upon any such statements.
EDIT
Since in my case the problem was not a simple timeout but some internal error, I'm going to add some code.
The most expensive part is the block-matching algorithm that finds matching blocks in two images (i.e. consecutive frames of a movie):
//Exhaustive Search Block-matching algorithm
kernel void naiveMotion(
    texture2d<float,access::read> inputImage1 [[ texture(0) ]],
    texture2d<float,access::read> inputImage2 [[ texture(1) ]],
    texture2d<float,access::write> outputImage [[ texture(2) ]],
    uint2 gid [[ thread_position_in_grid ]]
    )
{
    //area to search for matches
    float searchSize = 10.0;
    int searchRadius = searchSize/2;
    //window size to search in
    int kernelSize = 6;
    int kernelRadius = kernelSize/2;
    //this will store the motion direction
    float2 vector = float2(0.0,0.0);
    float2 maxVector = float2(searchSize,searchSize/2);
    float maxVectorLength = length(maxVector);
    //maximum error caused by noise
    float error = kernelSize*kernelSize*(10.0/255.0);

    for (int y = -searchRadius; y < searchRadius; ++y)
    {
        for (int x = 0; x < searchSize; ++x)
        {
            float diff = 0;
            for (int b = -kernelRadius; b < kernelRadius; ++b)
            {
                for (int a = -kernelRadius; a < kernelRadius; ++a)
                {
                    uint2 textureIndex(gid.x + x + a, gid.y + y + b);
                    float4 targetColor = inputImage2.read(textureIndex).rgba;
                    float4 referenceColor = inputImage1.read(gid).rgba;
                    float targetGray = 0.299*targetColor.r + 0.587*targetColor.g + 0.114*targetColor.b;
                    float referenceGray = 0.299*referenceColor.r + 0.587*referenceColor.g + 0.114*referenceColor.b;
                    diff = diff + abs(targetGray - referenceGray);
                }
            }
            if ( error > diff )
            {
                error = diff;
                //vertical motion is rather irrelevant but negative values can't be stored so just take the absolute value
                vector = float2(x, abs(y));
            }
        }
    }

    float intensity = length(vector)/maxVectorLength;
    outputImage.write(float4(normalize(vector), intensity, 1), gid);
}
I am using that shader on a 960x540px image. With a searchSize of 9 and a kernelSize of 8 the shader runs over the whole image. Change the searchSize to 10 and the shader stops early with error code 1.
I have an image over which I would like to compute a local histogram within a circular neighborhood. The size of the neighborhood is given by a radius. Although the code below does the job, it's computationally expensive. I ran the profiler, and the way I'm accessing the pixels within the circular neighborhoods is already expensive.
Is there any sort of improvement/optimization based maybe on vectorization? Or for instance, storing the neighborhoods as columns?
I found a similar question in this post, and the proposed solution is quite in the spirit of the code below; however, that solution is still not appropriate to my case. Any ideas are really welcome :-) Imagine for the moment that the image is binary, but the method should also ideally work with gray-level images :-)
[rows, cols] = size(img);
hist_img = zeros(rows, cols, 2);
[XX, YY] = meshgrid(1:cols, 1:rows);

for rr = 1:rows
    for cc = 1:cols
        distance = sqrt( (YY - rr).^2 + (XX - cc).^2 );
        mask_radii = (distance <= radius);
        bwresponses = img(mask_radii);
        [nelems, ~] = histc(double(bwresponses), 0:255);
        % do some processing over the histogram
        ...
    end
end
EDIT 1
Given the received feedback, I tried to update the solution. However, it's not yet correct:
radius = sqrt(2.0);
disk = diskfilter(radius);
fun = @(x) histc( x(disk > 0), min(x(:)):max(x(:)) );
output = im2col(im, size(disk), fun);

function disk = diskfilter(radius)
    height = 2*ceil(radius) + 1;
    width = 2*ceil(radius) + 1;
    [XX, YY] = meshgrid(1:width, 1:height);
    dist = sqrt((XX - ceil(width/2)).^2 + (YY - ceil(height/2)).^2);
    disk = (dist <= radius);
end
Following on the technique I described in my answer to a similar question you could try to do the following:
Compute the index offsets from a particular voxel that get you to all the neighbors within a radius
Determine which voxels have all neighbors at least radius away from the edge
Compute the neighbors for all these voxels
Generate your histograms for each neighborhood
It is not hard to vectorize this, but note that
It will be slow when the neighborhood is large
It involves generating an intermediate matrix that is NxM (N = voxels in image, M = voxels in neighborhood) which could get very large
Here is the code:
% generate histograms for neighborhood within radius r
A = rand(200,200,200);
radius = 2.5;
tic
sz=size(A);
[xx yy zz] = meshgrid(1:sz(2), 1:sz(1), 1:sz(3));
center = round(sz/2);
centerPoints = find((xx - center(1)).^2 + (yy - center(2)).^2 + (zz - center(3)).^2 < radius.^2);
centerIndex = sub2ind(sz, center(1), center(2), center(3));
% limit to just the points that are "far enough on the inside":
inside = find(xx > radius+1 & xx < sz(2) - radius & ...
yy > radius + 1 & yy < sz(1) - radius & ...
zz > radius + 1 & zz < sz(3) - radius);
offsets = centerPoints - centerIndex;
allPoints = 1:prod(sz);
insidePoints = allPoints(inside);
indices = bsxfun(@plus, offsets, insidePoints);
hh = histc(A(indices), 0:0.1:1); % <<<< modify to give you the histogram you want
toc
A 2D version of the same code (which might be all you need, and is considerably faster):
% generate histograms for neighborhood within radius r
A = rand(200,200);
radius = 2.5;
tic
sz=size(A);
[xx yy] = meshgrid(1:sz(2), 1:sz(1));
center = round(sz/2);
centerPoints = find((xx - center(1)).^2 + (yy - center(2)).^2 < radius.^2);
centerIndex = sub2ind(sz, center(1), center(2));
% limit to just the points that are "far enough on the inside":
inside = find(xx > radius+1 & xx < sz(2) - radius & ...
yy > radius + 1 & yy < sz(1) - radius);
offsets = centerPoints - centerIndex;
allPoints = 1:prod(sz);
insidePoints = allPoints(inside);
indices = bsxfun(@plus, offsets, insidePoints);
hh = histc(A(indices), 0:0.1:1); % <<<< modify to give you the histogram you want
toc
You're right, I don't think that colfilt can be used here since you're not applying a filter. You'll have to check the correctness, but here's my attempt using im2col and your diskfilter function (I removed the conversion to double so it now outputs logicals):
function circhist
% Example data
im = randi(256,20)-1;
% Ranges - I do this globally for the whole image rather than for each neighborhood
mini = min(im(:));
maxi = max(im(:));
edges = linspace(mini,maxi,20);
% Disk filter
radius = sqrt(2.0);
disk = diskfilter(radius); % Returns logical matrix
% Pad array with -1
im_pad = padarray(im, (size(disk)-1)/2, -1);
% Convert sliding neighborhoods to columns
B = im2col(im_pad, size(disk), 'sliding');
% Get elements from each column that correspond to disk (logical indexing)
C = B(disk(:), :);
% Apply histogram across columns to count number of elements
out = histc(C, edges)
% Display output
figure
imagesc(out)
h = colorbar;
ylabel(h,'Counts');
xlabel('Neighborhood #')
ylabel('Bins')
axis xy
function disk = diskfilter(radius)
height = 2*ceil(radius)+1;
width = 2*ceil(radius)+1;
[XX,YY] = meshgrid(1:width,1:height);
dist = sqrt((XX-ceil(width/2)).^2+(YY-ceil(height/2)).^2);
disk = (dist <= radius);
If you want to set your ranges (edges) based on each neighborhood then you'll need to make sure that the vector is always the same length if you want to build a big matrix (and then the rows of that matrix won't correspond to each other).
You should note that the shape of the disk returned by fspecial is not as circular as what you were using. It's meant to be used as a smoothing/averaging filter, so the edges are fuzzy (anti-aliased). Thus, when you use ~= 0 it will grab more pixels. I'd stick with your own function, which is faster anyway.
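A quick way to see the difference (the radius here is arbitrary; diskfilter is the function defined above):
r = 3;
d1 = fspecial('disk', r) ~= 0;   % thresholded averaging disk: anti-aliased edge pixels included
d2 = diskfilter(r);              % hard-edged logical disk
nnz(d1) - nnz(d2)                % typically positive: fspecial's support grabs more pixels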
You could try processing with the opposite logic (as briefly explained in the comments):
hist = zeros(W + 2*R, H + 2*R, Q);

for i = 1:2*R+1
    for j = 1:2*R+1
        if ((i-R-1)^2 + (j-R-1)^2 < R*R)
            % accumulate a shifted copy of the indicator image for every gray level q
            for q = 0:Q-1
                hist(i:i+W-1, j:j+H-1, q+1) = hist(i:i+W-1, j:j+H-1, q+1) + (image == q);
            end
        end
    end
end
With the code snippet below I create a scene with 100,000 rectangles.
The performance is fine; the view responds with no delays.
QGraphicsScene * scene = new QGraphicsScene;
for (int y = -50000; y < 50000; y++) {
scene->addRect(0, y * 25, 40, 20);
}
...
view->setScene(scene);
And now the 2nd snippet, whose performance is terrible:
for (int y = 0; y < 100000; y++) {
scene->addRect(0, y * 25, 40, 20);
}
For the 1st half of the scene elements the view is slow to respond to mouse and key events, while for the other half it seems to be fine?!?
The former scene has sceneRect (x, y, w, h) = (0, -1250000, 40, 2499995).
The latter scene has sceneRect (x, y, w, h) = (0, 0, 40, 2499995).
I don't know why the sceneRect affects the performance, since the BSP index is based on relative item coordinates.
Am I missing something? I didn't find any information in the documentation,
plus the Qt demo 40000 Chips also distributes the elements around (0, 0), without explaining the reason for that choice.
// Populate scene
int xx = 0;
int nitems = 0;
for (int i = -11000; i < 11000; i += 110) {
++xx;
int yy = 0;
for (int j = -7000; j < 7000; j += 70) {
++yy;
qreal x = (i + 11000) / 22000.0;
qreal y = (j + 7000) / 14000.0;
...
I have a solution for you, but promise not to ask me why this is working, because I really don't know :-)
QGraphicsScene * scene = new QGraphicsScene;

// Define a fake symmetrical scene rectangle
scene->setSceneRect(0, -(25*100000+20), 40, 2 * (25*100000+20) );

for (int y = 0; y < 100000; y++) {
    scene->addRect(0, y * 25, 40, 20);
}

view->setScene(scene);

// Tell the view to display only the actual scene-objects area
view->setSceneRect(0, 0, 40, 25*100000+20);
For the common case, the default index method BspTreeIndex works fine. If your scene uses many animations and you are experiencing slowness, you can disable indexing by calling setItemIndexMethod(NoIndex). (Qt documentation)
You will need to call setItemIndexMethod(QGraphicsScene::NoIndex) before insertion:
scene->setItemIndexMethod(QGraphicsScene::NoIndex);

for (int y = 0; y < 100000; y++) {
    scene->addRect(0, y * 25, 40, 20);
}
//...
It could be due to loss of precision with float. A 32-bit float has a 23-bit mantissa (or significand), a 1-bit sign and an 8-bit exponent. This is like scientific notation: you get 23 significant bits (really 24 due to an implicit leading 1) and a factor of 2^exp, where the exponent can range from -126 to 127 (other values are used to give you things like NaN and Inf). So you can represent really large numbers like 2^24 * 2^127, but the next closest floating point number to such a float is (2^24 - 1) * 2^127, or 170 billion billion billion billion away. If you try to add a smaller amount (like 1000) to such a number it doesn't change, because it has no way to represent that.
This becomes significant in computer graphics because you need some of those significant bits left over for the fractional part. When your scene ranges up to 1250000.0 you can add 0.1 to that and get roughly 1250000.1. If you take 2500000.0 + 0.1 you get 2500000.0. The problem is magnified by any scaling or rotation that occurs. This can lead to obvious visual problems if you actually fly out to those coordinates and look at your scene.
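You can reproduce this in any environment with 32-bit floats; here is a quick check in MATLAB, where single is a 32-bit float (C++ float behaves the same):
eps(single(1250000))                              % 0.125 -> spacing between adjacent floats at this magnitude
eps(single(2500000))                              % 0.25
single(1250000) + single(0.1)                     % 1250000.125, the nearest representable value
single(2500000) + single(0.1) == single(2500000)  % true: adding 0.1 changed nothing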
Why does centering around 0 help? Because there's a separate sign bit in the floating point representation. In floating point there are "more numbers" between (-x,+x) than there are from (0,2x). If I'm right it would also work if you simply scaled your entire scene down by 1/2. This moves the most significant bit down leaving it free for precision on the other end.
Why would this lead to poor performance? I can only speculate without reading the Qt source, but consider a data structure for storing objects by location. What might you have to do differently if two objects touch (or overlap) due to loss of precision that you didn't have to do when they did not overlap?
I'm looking for an algorithm to join objects, for example to combine an apple into a tree in a digital image, and some demo in Matlab. Please show me some materials on that. Thanks for reading and helping me!!!
I'm not sure if I understand your question, but if you are looking to do some image overlapping, as Photoshop layers do, you can use some characteristic of the image to determine the degree of transparency.
For example, consider using two RGB images. Image A will be overlapped by image B. To do it, we'll use image B's brightness to determine the degree of transparency (255 = 100%).
Intensity = pixel / 255;
NewPixel = (PixelA * (1 - Intensity)) + (PixelB * Intensity);
As the intensity is a percentage and each pixel is multiplied by the complement of this percentage, the resulting sum will never overflow past 255 (the maximum gray level).
int WidthA = imageA.Width * channels;
int WidthB = imageB.Width * channels;
int width  = Min(imageA.Width, imageB.Width) * channels;
int height = Min(imageA.Height, imageB.Height);

byte *ptrA = imageA.Buffer;
byte *ptrB = imageB.Buffer;

for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; x += channels, ptrA += channels, ptrB += channels)
    {
        // Take the intensity of the pixel. If RGB (channels = 3), intensity = (R+G+B) / 3.
        // If grayscale, the pixel value is the intensity itself.
        int avg = 0;
        for (int j = 0; j < channels; ++j)
        {
            avg += ptrB[j];
        }

        // Obtain the intensity as a value between 0..100%
        double intensity = (double)(avg / channels) / 255;

        for (int j = 0; j < channels; ++j)
        {
            // Write in image A the resulting pixel, which is obtained by multiplying the
            // image B pixel by (100% - intensity) plus the image A pixel multiplied by the intensity
            ptrA[j] = (byte) ((ptrB[j] * (1.0 - intensity)) + ((intensity) * ptrA[j]));
        }
    }

    // Move both pointers to the start of the next row (the row strides may differ)
    ptrA = imageA.Buffer + ((y + 1) * WidthA);
    ptrB = imageB.Buffer + ((y + 1) * WidthB);
}
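Since the question mentions Matlab, a rough equivalent of the loop above could look like this (the file names are placeholders and the two images are assumed to have the same size; the weighting follows the C-style loop, with image B's brightness as the blending factor):
A = double(imread('apple.png')) / 255;   % image that gets overwritten (image A above)
B = double(imread('tree.png')) / 255;    % image whose brightness drives the blend
w = repmat(mean(B, 3), [1 1 3]);         % per-pixel intensity of B, between 0 and 1
A = B .* (1 - w) + A .* w;               % same weighting as the inner loop above
imshow(A)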
You can also change this algorithm to overlap image A over B, or to place the overlay at a different position. Here I'm assuming that image B's coordinate (0, 0) overlaps image A's coordinate (0, 0).
But once again, I'm not sure if this is what you are looking for.