I am trying to understand the concept of bilinear interpolation. For example, in the case where bilinear interpolation is used to rotate an image (say by 45 degrees) and we then rotate it back by the same amount, is the resulting image the same as the original?
What about when an image is scaled up by a factor c and then scaled down by the same factor c: is the resulting image the same as the original image?
In general, you will not get back the same values. I'll try and explain empirically...
Imagine a white rectangle on a black background, which will only contain values of 255 (white) and 0 (black). When you rotate it, pixels on the edges of the rectangle will fall between pixels of the new image. At that point you will end up interpolating between 0 and 255 and get some entirely new value, say 172. Now you immediately have a problem, because bilinear interpolation has introduced a value that wasn't in the original image, and when you rotate back you will end up interpolating between that new 172 and 255 or 0, which gives you yet another value not present in the original image.
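A quick way to see this numerically; a minimal sketch with scipy.ndimage (the image content and sizes are arbitrary):

import numpy as np
from scipy.ndimage import rotate

# White rectangle on a black background: only values 0 and 255.
img = np.zeros((64, 64))
img[22:42, 22:42] = 255.0

# Rotate 45 degrees and back, with bilinear interpolation (order=1).
once = rotate(img, 45, reshape=False, order=1)
back = rotate(once, -45, reshape=False, order=1)

print(np.unique(img).size)       # 2: just 0 and 255
print(np.unique(once).size)      # many brand-new in-between values
print(np.abs(back - img).max())  # large: the round trip is not lossless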
I hope that helps - it is the reason why you should use Nearest Neighbour interpolation when your pixels represent, say, classes in a supervised classification. You start off with water in class 0 and sand on the beach in class 17 beside it; if you use bilinear interpolation to resize or geo-correct, you can get a result of class 7, which might represent wheat - and you will rarely find wheat growing on a beach!
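The class-label problem is easy to demonstrate the same way; a sketch reusing the water/sand class numbers from above:

import numpy as np
from scipy.ndimage import zoom

# Class map: water (0) with sand (17) beside it.
classes = np.zeros((8, 8))
classes[:, 4:] = 17.0

bilinear = zoom(classes, 4, order=1)  # invents in-between "classes" at the boundary
nearest = zoom(classes, 4, order=0)   # only ever outputs 0 or 17

print(np.unique(bilinear))  # includes values that are no valid class at all
print(np.unique(nearest))   # [ 0. 17.]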
A project I've been working on for the past few months involves calculating the top-surface area of objects captured with a 3D depth camera from a top view.
Workflow of my project:
Capture an image of a group of objects (RGB and depth data) from a top view
Run instance segmentation on the RGB image
Calculate the real area of the segmented mask using the depth data
Some problems with the project:
All the given objects have different shapes
The side of an object, not just its top, becomes visible as the object moves toward the edge of the image.
Because of this, the segmented mask area gradually increases.
As a result, the actual area of an object located at the edge of the image is calculated to be larger than that of an object located in the center.
In the example image, object 1 is located in the middle of the field of view, so only its top is visible, but object 2 is located toward the edge of the field of view, so part of its top is lost and its side is visible.
Because of this, the segmented mask area is larger for objects located on the periphery than for objects located in the center.
I only want to find the area of the top of an object.
Example image of what I want:
Is there a way to geometrically correct the area of an object located at the edge of the image?
I tried to calibrate by multiplying the calculated area by a correction factor based on the angle between vector 1, connecting the center of the camera lens to the center point of the floor, and vector 2, connecting the center of the lens to the center of gravity of the target object. However, I gave up because I couldn't logically justify how much correction was needed.
What I would do is convert your RGB and depth images to a 3D mesh (a surface with bumps) using your camera parameters (FOV, focal length), something like this:
Align already captured rgb and depth images
and then project it onto the ground plane (perpendicular to the camera view direction in the middle of the screen). To obtain the ground plane, simply take three 3D positions on the ground, p0,p1,p2 (forming a triangle), and use the cross product to compute the ground normal:
n = normalize(cross(p1-p0,p2-p1))
Now your plane is defined by p0 and n, so project each 3D coordinate onto it by moving the point along the normal by its signed distance to the ground:
p' = p - n * dot(p-p0,n)
That should eliminate the problem of visible sides at the edges of the FOV. However, you should also take into account that when a side becomes visible, part of the top is hidden as well; to remedy that, you might also find the axis of symmetry, use just the half of the top that is not partially hidden, and multiply the measured half-area by 2 ...
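A minimal numpy sketch of the plane fit and projection above (the point values are placeholders; in practice p0, p1, p2 would come from your depth data):

import numpy as np

def project_to_ground(points, p0, p1, p2):
    """Project Nx3 points onto the plane through p0, p1, p2."""
    n = np.cross(p1 - p0, p2 - p1)
    n /= np.linalg.norm(n)            # unit ground normal
    d = (points - p0) @ n             # signed distance of each point to the plane
    return points - np.outer(d, n)    # p' = p - n * dot(p - p0, n)

# Placeholder ground triangle and one object point 0.3 above the plane.
p0 = np.array([0.0, 0.0, 0.0])
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 1.0, 0.0])
pts = np.array([[0.2, 0.5, 0.3]])
print(project_to_ground(pts, p0, p1, p2))  # [[0.2 0.5 0. ]]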
Accurate computation is virtually hopeless, because you don't see all sides.
Assuming your depth information is available as a range image, you can consider the points inside the segmentation mask of a single chicken, estimate the vertical direction at that point, rotate and project the points to obtain the silhouette.
But as a part of the surface is occluded, you may have to reconstruct it using symmetry.
There is no way to do this accurately for arbitrary objects, since there can be parts of the object that contribute to the "top area", but which the camera cannot see. Since the camera cannot see these parts, you can't tell how big they are.
Since all your objects are known to be chickens, though, you could get a pretty accurate estimate like this:
Use Principal Component Analysis to determine the orientation of each chicken.
Using many objects in many images, fit a polynomial that estimates apparent chicken size as a function of distance from the image center and of orientation relative to the distance vector.
For any given chicken, then, you can divide its apparent size by the estimated average apparent size for its distance and orientation, to get a normalized chicken size measurement.
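For the PCA step, a minimal sketch that estimates a mask's orientation from its pixel coordinates (the blob here is a stand-in for one segmented chicken):

import numpy as np

def mask_orientation(mask):
    """Angle (radians) of the principal axis of a boolean mask."""
    ys, xs = np.nonzero(mask)
    coords = np.column_stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=0)            # center the point cloud
    cov = np.cov(coords, rowvar=False)       # 2x2 covariance of the pixels
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]   # direction of greatest spread
    return np.arctan2(major[1], major[0])

# Placeholder: a wide horizontal blob should come out near 0 degrees.
mask = np.zeros((50, 50), dtype=bool)
mask[20:30, 5:45] = True
print(np.degrees(mask_orientation(mask)))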
Is there a special WebGL trick to check whether a texture contains at least one black RGB pixel, without having to read the pixels back on the CPU?
To me, it seems that checking pixels on the CPU is the only solution. In that case, is there a way, for example, to compress a high-resolution texture down to a 1x1 texture containing a single boolean color value, so that I only have to read back one pixel, for performance reasons?
Thanks!
The only idea I have is, erm... complex, at least.
Let's say we have a texture of dimension N, then:
Make a canvas 1x1 pixel large.
Create an array of N*N points, each point with different [x,y] attributes representing the pixel position it will look up.
In the vertex shader, do a texture lookup based on the point position. If the color of that pixel is not black, move the point outside the clip volume (a vertex shader cannot discard); otherwise set the point position to [0,0].
In the fragment shader, simply draw black (assuming we have a white canvas) for the point.
Then what you have is a 1x1 canvas that is black if there were any black pixels, or white if there weren't any, and you can simply read it back with the CPU.
The bottleneck in this case is the second step, building the point array, and also the CPU-GPU communication. This will run faster only if you need to do a lot of reads in a short time and you know the size of the texture before the application runs, so you can reuse the same point buffer for all textures.
Just an idea; not sure if it would work.
Make a render target and draw your texture into it using a shader that outputs white if the pixel is RGB 0,0,0 and black otherwise. Let's assume the texture is 1024x768 and now has one white pixel. Draw that into a texture half the size in each dimension, 512x384 in this case. With linear filtering on, the worst case of just one white pixel will average down to 0.25 (63). We can do that two more times, first to 256x192, then again to 128x96; that one original white (255) pixel will now be (3). So run the black/white shader again and repeat: 64x48, 32x24, 16x12, run the black/white shader, then 8x6, 4x3, 2x2, run the black/white shader, then 1x1. Now check: if that's not pure black, there was at least one black pixel.
Instead of doing a half-size reduction each time, you could try collapsing those 3 levels into one by averaging a bunch of pixels in a shader. Averaging 15x15 pixels, for example, would still leave something > 0 if only one pixel was white and the rest black. In that case, starting at 1024x768 it would be
1024x768 -> 67x52 -> 5x4 -> 1x1
I have no idea if that would be faster or slower.
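You can sanity-check that arithmetic on the CPU; a numpy sketch of the same reduction (a square 1024x1024 start is assumed so the halving stays exact, and the re-binarize step stands in for the black/white shader passes):

import numpy as np

def halve(img):
    """Average 2x2 blocks, like a linear-filtered half-size draw."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

img = np.zeros((1024, 1024))
img[123, 456] = 255.0                        # the lone white pixel

for step in range(10):                       # 1024 -> 1 in ten halvings
    img = halve(img)
    if step % 3 == 2:                        # every third halving...
        img = np.where(img > 0, 255.0, 0.0)  # ...re-run the black/white shader

print(img.shape, img.max())                  # (1, 1) and nonzero: it survived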
Basically I was trying to achieve this: impose an arbitrary image onto a pre-defined uneven surface (see examples below).
I do not have a lot of experience with image processing or 3D algorithms, so here is the best method I can think of so far:
Predefine a set of coordinates (say, for a 10x10 grid, we have 100 coordinates starting with (0,0), (0,10), (0,20), etc.). There will be 9x9 = 81 cells.
Record the transformation of each individual coordinate on the t-shirt image, e.g. (0,0) becomes (51,31), (0,10) becomes (51,35), etc.
Triangulate the original image into 81x2 = 162 triangles (2 triangles for each cell). Transform each triangle of the image based on the coordinate transformations obtained in step 2 and draw it on the t-shirt image.
Problems/questions I have:
I don't know how to smooth out each triangle so that the image on the t-shirt does not look ragged.
Is there a better way to do it? I want to make sure I'm not reinventing the wheel before I proceed with an implementation.
Thanks!
This is called digital image warping. There was a popular graphics text on it in the 1990s, George Wolberg's Digital Image Warping (which grew out of his thesis). You can also find an article on it from Dr. Dobb's Journal.
Your process is essentially correct. If you work pixel by pixel, rather than trying to use triangles, you'll avoid some of the problems you're facing. Scan across the pixels of the target bitmap, and apply the local transformation, based on the cell you're in, to determine the coordinates of the corresponding pixel in the source bitmap. Copy that pixel over.
For a smoother result, you do your coordinate transformations in floating point and interpolate the pixel values from the source image using something like bilinear interpolation.
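A minimal sketch of that scan-and-pull loop with bilinear sampling (inverse_map here is a placeholder for your per-cell transformation back into the source image):

import numpy as np

def bilinear_sample(src, x, y):
    """Interpolate src at the fractional coordinate (x, y)."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = src[y0, x0] * (1 - fx) + src[y0, x0 + 1] * fx
    bot = src[y0 + 1, x0] * (1 - fx) + src[y0 + 1, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def warp(src, dst_shape, inverse_map):
    """Scan the target; pull each pixel from the source via inverse_map."""
    dst = np.zeros(dst_shape)
    for y in range(dst_shape[0]):
        for x in range(dst_shape[1]):
            sx, sy = inverse_map(x, y)  # target coords -> source coords
            if 0 <= sx < src.shape[1] - 1 and 0 <= sy < src.shape[0] - 1:
                dst[y, x] = bilinear_sample(src, sx, sy)
    return dst

# Placeholder inverse map: a gentle horizontal shear.
src = np.random.rand(100, 100)
out = warp(src, (100, 100), lambda x, y: (x + 0.2 * y, y))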
It's not really a solution to the problem, just a workaround:
If you have a 3D model that represents the T-shirt, you can use DirectX/OpenGL and apply your image as a texture on the t-shirt. Then you can render the picture you want from any point of view.
I have spent more than a week reading about selective color change in an image. It means selecting a color from a color picker, then selecting a part of the image whose color I want to change, and applying the change from the original color to the color chosen in the color picker.
E.g. if I select a blue color in the color picker and I also select a red part of the image, I should be able to change the red color to blue across the whole image.
Another example: if I have an image with red apples and oranges, and I select an apple in the image and a blue color in the color picker, then all the apples should change color from red to blue.
I have some ideas, but of course I need something more concrete on how to do this.
Thank you for reading
As a starting point, consider clustering the colors of your image. If you don't know how many clusters you want, you will need a method to decide whether or not to merge two given clusters. For the moment, let us suppose that we know that number. For example, given the following image at left, I mapped its colors to 3 clusters, whose mean colors are shown in the middle; representing each cluster by its mean color gives the figure at right.
With the output at right, what you now need is a method to replace colors. Suppose the user clicks (a single point) somewhere in your image; then you know the positions in the original image that you will need to modify. For the next image, the user (me) clicked on a point contained in the "orange" cluster, and then clicked on some blue hue. From that, you make a mask representing the points in the "orange" cluster and play with it. I used a simple Gaussian filter followed by a flat 3x5 dilation. Then you replace the hues in the original image according to the produced mask (after the low-pass filtering, its values also serve as an alpha value for compositing the images).
Not perfect at all, but you could have a better clustering than mine and also a much less primitive color-replacement method. I intentionally skipped the details of the clustering method, color space, and so on, because I used only basic k-means on RGB without any pre-processing of the input. So you can consider the results above as a baseline for anything else you can do.
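A baseline sketch of that clustering step with scikit-learn's k-means (k=3 as in the example; random noise stands in for a real photo):

import numpy as np
from sklearn.cluster import KMeans

def cluster_colors(img, k=3):
    """Cluster an HxWx3 RGB image into k mean colors."""
    pixels = img.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    labels = km.labels_.reshape(img.shape[:2])  # per-pixel cluster id
    return labels, km.cluster_centers_          # ids and the k mean colors

img = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)
labels, means = cluster_colors(img)

# "Figure at right": every pixel replaced by its cluster's mean color.
quantized = means[labels].astype(np.uint8)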
Given the image, a selected color, and a target new color - you can't do much that isn't ugly. You also need a range, some amount of variation in color, so you can say one pixel's color is "close enough" while another is clearly "different".
First step of processing: create a mask image, grayscale and varying from 0.0 to 1.0 (or from zero to some maximum value we'll treat as 1.0), the same size as the input image. For each input pixel, test whether its color is sufficiently near the selected color. If it's "the same" or "close enough", put 1.0 in the mask. If it's different, put 0.0. If it's sorta borderline, put an in-between value. Exactly how to do this depends on the details of the image.
This might work best in LAB space, testing for sameness according to the angle of the A,B coordinates relative to their origin.
Once you have the mask, put it aside. Now color-transform the whole image. This might be best done in HSV space. Don't touch the V channel. Add a constant to H, modulo 360 degrees (or mod 256 if H is stored as a byte), and multiply S by a constant, both chosen so that the HSV coordinates of the selected color are moved to the HSV coordinates of the target color. Convert the transformed H and S, with the unchanged V, back to RGB.
Finally, use the mask to blend the original image with the color-transformed one. Apply this to each channel - red, green, blue:
output = (1-mask)*original + mask*transformed
If you're doing it all in byte arrays, 0 is 0.0 and 255 is 1.0, and be careful of overflow and signed/unsigned problems.
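A rough end-to-end sketch with OpenCV, using a hue test instead of the LAB test suggested above (the hue values, tolerance, and filename are placeholder assumptions):

import numpy as np
import cv2

def recolor(img_bgr, selected_hue, target_hue, hue_tol=10):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    h = hsv[..., 0]  # OpenCV stores hue as 0..179 for 8-bit images

    # Mask: 1.0 near the selected hue, 0.0 elsewhere; blur it to get
    # the soft "borderline" values described above.
    diff = np.minimum(np.abs(h - selected_hue), 180 - np.abs(h - selected_hue))
    mask = cv2.GaussianBlur((diff < hue_tol).astype(np.float64), (9, 9), 0)

    # Transform the whole image: rotate hue so the selected color lands
    # on the target color; V is left untouched.
    hsv[..., 0] = (h + (target_hue - selected_hue)) % 180
    transformed = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

    # Per-channel blend: output = (1-mask)*original + mask*transformed
    m = mask[..., None]
    return ((1 - m) * img_bgr + m * transformed).astype(np.uint8)

img = cv2.imread("apples.jpg")                      # placeholder filename
out = recolor(img, selected_hue=0, target_hue=120)  # red -> blue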
I'm successfully drawing the convex polys which make up the following white concave shape.
The orange color is my attempt to add a uniform outline around the white shape. As you can see it's not so uniform. On some edges the orange doesn't show at all.
Evidently using...
glScalef(1.1, 1.1, 0.0);
... to draw a slightly larger orange shape before I drew the white shape wasn't the way to go.
I just have a nagging feeling I'm missing a more simple way to do this.
Note that the white part is going to be mapped with a texture which has areas of transparency, so the orange part needs to be behind the white shapes too, not just surrounding them.
Also, I'm using a parallel projection matrix; that's why glScalef's z is set to 0.0, which reminds me there is no perspective scaling.
Any ideas? Thanks!
Nope, you won't be going anywhere with glScale in this case. Possible options are:
a) construct an extruded polygon from the original one (possibly rounding sharp corners)
b) draw the polygon outline with GL_LINES and set glLineWidth to your desired outline width (in fact you might want to draw the outline at 2x the width first, since the line is centered on the edge and half of it falls inside the shape)
The first approach will generate CPU load, the second one might slow down rendering significantly AFAIK.
You can displace your polygon in the 8 directions of the compass.
You can have a look at this link: http://simonschreibt.de/gat/cell-shading/
It's a nice trick, and might do the job
Unfortunately there is no simple way to get an outline of consistent width - you just have to do the maths:
For each edge: calculate the normal, scale to the desired width, and add to the edge vertices to get a line segment on the new expanded edge
Calculate the intersection of the lines through two adjacent segments to find the expanded vertex positions
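A sketch of exactly those two steps for a convex, counter-clockwise 2D polygon (no corner rounding, and parallel adjacent edges are not handled):

import numpy as np

def expand_polygon(verts, width):
    """Offset each edge outward by width, then intersect adjacent edges."""
    verts = np.asarray(verts, dtype=float)
    n_verts = len(verts)
    out = []
    for i in range(n_verts):
        p_prev, p, p_next = verts[i - 1], verts[i], verts[(i + 1) % n_verts]
        d1, d2 = p - p_prev, p_next - p                      # the two edges at vertex i
        n1 = np.array([d1[1], -d1[0]]) / np.linalg.norm(d1)  # outward normals
        n2 = np.array([d2[1], -d2[0]]) / np.linalg.norm(d2)  # (CCW winding)
        a1, a2 = p_prev + width * n1, p + width * n2         # points on shifted edges
        # Solve a1 + t*d1 = a2 + u*d2 for the expanded corner.
        t, _ = np.linalg.solve(np.column_stack([d1, -d2]), a2 - a1)
        out.append(a1 + t * d1)
    return np.array(out)

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(expand_polygon(square, 0.1))  # unit square grown by 0.1 on every side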
A distinct answer from those offered to date, posted just for interest: if you're on GLES 2.0 and have access to shaders, then you could render the source polygon to a framebuffer with a texture bound as the colour renderbuffer, then do a second pass to write to the screen (so you're using the image of the white polygon as the input texture and running a post-processing pixel shader over every pixel on the screen) with a shader that obeys the following logic for an outline of thickness q:
if the input is white then output a white pixel
if the input pixel is black then sample every pixel within a radius of q from the current pixel; if any one of them is white then output an orange pixel, otherwise output a black pixel
In practice you'd spend an awful lot on texture sampling and probably turn that into the bottleneck. And they'd be mostly dependent reads, which are bad for the pipeline on lots of GPUs, including the PowerVR SGX that powers the overwhelming majority of OpenGL ES 2.0 devices.
EDIT: actually, you could speed this up substantially. If your radius is q, have the hardware generate mipmaps for your framebuffer object and take the first level whose texels cover at least q by q pixels of the source image. You've then essentially got a set of bins that will be pure black if no part of the polygon was in that region and pure white if the area was entirely internal to the polygon. For each output fragment that might be on the border, you can quite possibly jump straight to a conclusion of definitely inside, or definitely outside and beyond the border, based on four samples of the mipmap.
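To illustrate the mip-level arithmetic on the CPU; a numpy sketch where the mip chain is plain 2x2 averaging and the level is chosen so each texel covers at least q by q source pixels (sizes and q are placeholders):

import numpy as np

def mip_chain(mask):
    """Mip levels of a binary polygon mask, by repeated 2x2 averaging."""
    levels = [mask.astype(float)]
    while levels[-1].shape[0] > 1:
        h, w = levels[-1].shape
        levels.append(levels[-1].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3)))
    return levels

mask = np.zeros((256, 256))
mask[60:190, 70:180] = 1.0        # the white polygon

q = 8                             # outline thickness
level = int(np.ceil(np.log2(q)))  # texels now cover >= q x q pixels
m = mip_chain(mask)[level]

# Pure-black texels are definitely outside, pure-white ones definitely
# inside; only the mixed texels can contain the outline.
decided = np.count_nonzero((m == 0) | (m == 1))
print(m.shape, decided, m.size)   # most texels are decided early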