I'm trying to build something like the Liquify filter in Photoshop. I've been reading through image distortion code, but I'm struggling to work out what would create similar effects. The closest reference I could find was the iWarp filter in GIMP, but the code for that isn't commented at all.
I've also looked at places like ImageMagick, but they don't have anything in this area.
Any pointers or a description of algorithms would be greatly appreciated.
Excuse me if I make this sound a little simplistic, I'm not sure how much you know about gfx programming or even what techniques you're using (I'd do it with HLSL myself).
The way I would approach this problem is to generate a texture which contains offsets of x/y coordinates in the r/g channels. Then the output colour of a pixel would be:
Texture inputImage
Texture distortionMap
colour(x,y) = inputImage(x + distortionMap(x, y).R, y + distortionMap(x, y).G)
(To tell the truth this isn't quite right: using the colours as offsets directly means you can only represent positive vectors. It's simple enough to subtract 0.5 so that you can represent negative vectors too.)
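As a rough CPU-side illustration of the lookup above, here is a minimal NumPy sketch. It assumes the distortion map stores values in [0, 1] with 0.5 meaning "no displacement", and it uses nearest-neighbour sampling. The function name `apply_distortion` and the `max_offset` scale factor are made up for this example; a shader version would do the same thing per fragment.

```python
import numpy as np

def apply_distortion(image, distortion_map, max_offset=8.0):
    """Warp `image` using offsets stored in the R/G channels of `distortion_map`.

    `distortion_map` holds values in [0, 1]; 0.5 means "no displacement".
    `max_offset` (an assumed parameter) converts the centred value to pixels.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Subtract 0.5 so that both negative and positive offsets can be encoded.
    dx = (distortion_map[..., 0] - 0.5) * 2.0 * max_offset
    dy = (distortion_map[..., 1] - 0.5) * 2.0 * max_offset
    # Nearest-neighbour lookup, clamped to the image borders.
    src_x = np.clip(np.rint(xs + dx), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(ys + dy), 0, h - 1).astype(int)
    return image[src_y, src_x]
```

A map filled with 0.5 leaves the image unchanged, which is a handy sanity check before painting real offsets into it.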
Now the only remaining problem is how to generate this distortion map, which is a different question altogether. Any image will generate a distortion of some kind, obviously, but working out a proper liquify effect is quite complex, and I'll leave that to someone more qualified.
I think liquify works by altering a grid.
Imagine each pixel is defined by its location on the grid.
Now when the user clicks on a location and moves the mouse, they are changing the grid locations.
The new grid is then projected back into the 2D viewable space of the user.
Check this tutorial about a way to implement the liquify filter with Javascript. Basically, in the tutorial, the effect is done transforming the pixel Cartesian coordinates (x, y) to Polar coordinates (r, α) and then applying Math.sqrt on r.
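I don't have the tutorial's code to hand, so the following is only a sketch of the idea as described: convert each pixel to polar coordinates around a centre, apply a square root to the normalised radius, and sample the source image at the new radius. The function name, the `radius` parameter, and the choice to leave pixels outside the radius untouched are all assumptions made for this example.

```python
import numpy as np

def sqrt_polar_warp(image, cx, cy, radius):
    """Remap pixels within `radius` of (cx, cy) by taking sqrt of the normalised r."""
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dx, dy = xs - cx, ys - cy
    # Cartesian (x, y) to polar (r, alpha).
    r = np.hypot(dx, dy)
    alpha = np.arctan2(dy, dx)
    # Apply sqrt to the normalised radius inside the brush; leave the rest alone.
    inside = r < radius
    r_src = np.where(inside, np.sqrt(r / radius) * radius, r)
    # Back to Cartesian, nearest-neighbour sampling with clamping.
    src_x = np.clip(np.rint(cx + r_src * np.cos(alpha)), 0, w - 1).astype(int)
    src_y = np.clip(np.rint(cy + r_src * np.sin(alpha)), 0, h - 1).astype(int)
    return image[src_y, src_x]
```

Because sqrt(r) is larger than r for normalised radii between 0 and 1, each output pixel reads from further out, so the content appears pulled towards the centre.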
I have been working on a personal project in image processing and robotics where, instead of the robot detecting colors and picking out the object as usual, it tries to detect the holes (resembling different polygons) on the board. For a better understanding of the setup, here is an image:
As you can see, I have to detect these holes, find out their shapes, and then use the robot to fit the object into the holes. I am using a Kinect depth camera to get the depth image. The picture is shown below:
I was unsure how to detect the holes with the camera. Initially I used masking to remove the background and some of the foreground based on the depth measurement, but this did not work: at different orientations of the camera the holes would merge with the board, something like what happens with inRange (the result becomes fully white). Then I came across the adaptiveThreshold function:
adaptiveThreshold(depth1,depth3,255,ADAPTIVE_THRESH_GAUSSIAN_C,THRESH_BINARY,7,-1.0);
I combined this with noise removal using erode, dilate, and Gaussian blur, which detected the holes better, as shown in the picture below. Then I used the cvCanny edge detector to get the edges, but so far the result has not been good, as shown in the picture below. After this I tried various feature detectors (SIFT, SURF, ORB, GoodFeaturesToTrack) and found that ORB gave the best times and detected features. I then tried to get the relative camera pose of a query image by finding its keypoints and matching them, passing the good matches to the findHomography function. The results are shown in the diagram below:
In the end I want to get the relative camera pose between the two images and move the robot to that position using the rotation and translation vectors returned by the solvePnP function.
So is there any other method by which I could improve the quality of the holes detected for the keypoint detection and matching?
I had also tried contour detection and approxPolyDP but the approximated shapes are not really good:
I have tried tweaking the input parameters for the threshold and Canny functions, but this is the best I can get.
Also, is my approach to getting the camera pose correct?
UPDATE: No matter what I tried, I could not get good repeatable features to map. Then I read online that a depth image has low resolution and is only used for things like masking and getting distances. So it hit me that the features were poor because of the low-resolution image with its messy edges. I therefore thought of detecting features on an RGB image and using the depth image only to get the distances of those features. The quality of the features I got was off the charts. It even detected the screws on the board! Here are the keypoints detected using GoodFeaturesToTrack keypoint detection:
I hit another hurdle while getting the distances: the distances of the points were not coming out properly. I searched for possible causes, and it occurred to me after quite a while that there was an offset between the RGB and depth images because of the offset between the two cameras. You can see this in the first two images. I then searched the net for how to compensate for this offset but could not find a working solution.
If any of you could help me compensate for the offset, it would be great!
UPDATE: I could not make good use of the goodFeaturesToTrack function. The function gives the corners as Point2f. To compute descriptors we need keypoints, and converting Point2f to KeyPoint with the code snippet below loses scale and rotation invariance.
for (size_t i = 0; i < corners1.size(); i++)
{
    keypoints_1.push_back(KeyPoint(corners1[i], 1.f));
}
The hideous result from the feature matching is shown below.
I have to start on different feature matching approaches now. I'll post further updates. It would be really helpful if anyone could help in removing the offset problem.
Compensating the difference between image output and the world coordinates:
You should use the good old camera calibration approach to calibrate the camera response and possibly generate a correction matrix for the camera output (in order to convert it into real-world scales).
It's not that complicated once you have printed out a checkerboard template and captured various shots. (For this application you don't need to worry about rotation invariance. Just calibrate the world view against the image array.)
You can find more information here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/own_calib.html
--
Now, since I can't seem to comment on the question, I'd like to ask whether your specific application requires the machine to "find out" the shape of the hole on the fly. If there is a finite number of hole shapes, you could model them mathematically and look for the pixels that support the predefined models in the B/W edge image.
For example, (x)^2+(y)^2-r^2=0 for a circle of radius r centred at the origin, where x and y are the pixel coordinates.
That being said, I believe more clarification is needed regarding the requirements of the application (shape detection).
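To make the "pixels that support a predefined model" idea a bit more concrete, here is a small sketch for the circle case. It counts how many edge pixels lie within a tolerance of a candidate circle; you would try several candidates and keep the best-supported one. The function name `circle_support`, the explicit centre (cx, cy), and the `tol` parameter are assumptions for this example, not part of any library.

```python
import numpy as np

def circle_support(edges, cx, cy, r, tol=1.0):
    """Count edge pixels whose distance from (cx, cy) is within `tol` of `r`.

    `edges` is a binary edge image (non-zero means edge pixel).
    """
    ys, xs = np.nonzero(edges)
    # (x - cx)^2 + (y - cy)^2 - r^2 = 0, relaxed to a band of width 2 * tol.
    dist = np.hypot(xs - cx, ys - cy)
    return int(np.count_nonzero(np.abs(dist - r) <= tol))
```

The same pattern extends to other shapes by swapping in a different distance-to-model function.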
If you're going to detect specific shapes such as the ones in your provided image, then you're better off using a classifier. Delve into Haar classifiers, or better still, look into Bag of Words.
Using BoW, you'll need to train on a dataset consisting of positive and negative samples. Positive samples will contain N unique samples of each shape you want to detect. It's better if N is > 10, and best if it is > 100 with highly varied and unique samples, for robust classifier training.
Negative samples would (obviously) contain things that do not represent your shapes in any way. It's just for checking the accuracy of the classifier.
Also, once you have your classifier trained, you could distribute your classifier data (say, suppose you use SVM).
Here are some links to get you started with Bag of Words:
https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/
Sample code:
http://answers.opencv.org/question/43237/pyopencv_from-and-pyopencv_to-for-keypoint-class/
I have an image processing problem. I have pictures of yarn:
The individual strands are partly (but not completely) aligned. I would like to find the predominant direction in which they are aligned. In the center of the example image, this direction is around 30-34 degrees from horizontal. The result could be the average/median direction for the whole image, or just the average in each local neighborhood (producing a vector map of local directions).
What I've tried: I rotated the image in small steps (1 degree) and calculated statistics in the vertical vs horizontal direction of the rotated image (for example: standard deviation of summed rows or summed columns). I reasoned that when the strands are oriented exactly vertically or exactly horizontally the difference in statistics would be greatest, and so that angle of rotation is the correct direction in the original image. However, for at least several kinds of statistical properties I tried, this did not work.
I further thought that perhaps this wasn't working because there were too many different directions at the same time in the whole image, so I tried it in a small neighborhood. In this case, there is always a very clear preferred direction (different for each neighborhood), but it is not the direction that the fibers really go... I can post my sample code but it is basically useless.
I keep thinking there has to be some kind of simple linear algebra/statistical property of the whole image, or some value derived from the 2D FFT that would give the correct direction in one step... but how?
What probably won't work: detecting individual fibers. They are not necessarily the same color, and the image can shade from light to dark so edge detectors don't work well, and the image may not even be in focus sometimes. Because of that, it is not always even possible to see individual fibers for a human (see top-right in the example), they kinda have to be detected as preferred direction in a statistical sense.
You might try doing this in the frequency domain. The output of a Fourier Transform is orientation dependent so, if you have some kind of oriented pattern, you can apply a 2D FFT and you will see a clustering around a specific orientation.
For example, making a greyscale out of your image and performing FFT (with ImageJ) gives this:
You can see a distinct cluster that is oriented orthogonally with respect to the orientation of your yarn. With some pre-processing on your source image, to remove noise and maybe enhance the oriented features, you can probably achieve a much stronger signal in the FFT. Once you have a cluster, you can use something like PCA to determine the vector for the major axis.
For info, this is a technique that is often used to enhance oriented features, such as fingerprints, by applying a selective filter in the FFT and then taking the inverse to obtain a clearer image.
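Here is a rough NumPy-only sketch of the FFT approach, under some assumptions: it skips the pre-processing and thresholding steps, weights every frequency bin by its power, and uses second-order moments of the spectrum (equivalent to a weighted PCA about the origin) to find the long axis of the cluster. The function name `dominant_orientation` is made up for this example, and angles are measured in array coordinates (x to the right, y downwards).

```python
import numpy as np

def dominant_orientation(gray):
    """Estimate the stripe direction in degrees, in [0, 180), from the power spectrum."""
    g = gray - gray.mean()                      # drop the DC term
    spec = np.abs(np.fft.fftshift(np.fft.fft2(g))) ** 2
    h, w = spec.shape
    fy, fx = np.mgrid[0:h, 0:w]
    fx = fx - w // 2                            # frequency coordinates about the centre
    fy = fy - h // 2
    total = spec.sum()
    # Power-weighted second moments of the spectrum.
    mxx = (spec * fx * fx).sum() / total
    myy = (spec * fy * fy).sum() / total
    mxy = (spec * fx * fy).sum() / total
    # Orientation of the major axis of the spectral cluster.
    spec_angle = 0.5 * np.arctan2(2.0 * mxy, mxx - myy)
    # The fibres run orthogonally to the cluster in the spectrum.
    return (np.degrees(spec_angle) + 90.0) % 180.0
```

On real images, the noise removal and thresholding mentioned above will probably matter a lot, since broadband noise pulls the moments towards an isotropic blob.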
An alternative approach is to try a series of Gabor filters (see here), pre-built with a selection of orientations and frequencies, and use the resulting features as a metric for identifying the most likely orientation. There is a scikit article that gives some examples here.
UPDATE
Just playing with ImageJ to give an idea of some possible approaches to this - I started with the FFT shown above, then - in the following image, I performed these operations (clockwise from top left) - Threshold => Close => Holefill => Erode x 3:
Finally, rather than using PCA, I calculated the spatial moments of the lower left blob using this ImageJ Plugin which handily calculates the orientation of the longest axis based on the 2nd order moment. The result gives an orientation of approximately -38 degrees (with respect to the X axis):
Depending on your frame of reference you can calculate the approximate average orientation of your yarn from this rather than from PCA.
I tried to use Gabor filters to enhance the orientations of your yarns. The parameters I used are:
phi = x*pi/16; % x = 1, 3, 5, 7
theta = 3;
sigma = 0.65*theta;
filterSize = 3;
The imaginary parts of the convolved images are shown below:
As you mentioned, most orientations lie between 30 and 34 degrees, so the filter with phi = 5*pi/16 (bottom left) yields the best contrast among the four.
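For anyone who wants to try the filter-bank idea without MATLAB, here is a NumPy-only sketch. It is not a port of the code above: the kernel form, the `half_size` window, the use of the mean squared response magnitude as the score, and the angle convention (phi is the direction in which the carrier oscillates) are all assumptions made for this example. Convolution is done circularly through the FFT to keep it short.

```python
import numpy as np

def gabor_kernel(phi, wavelength=3.0, sigma=None, half_size=3):
    """Complex Gabor kernel whose carrier oscillates along direction `phi` (radians)."""
    if sigma is None:
        sigma = 0.65 * wavelength
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    u = x * np.cos(phi) + y * np.sin(phi)       # coordinate along the carrier
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    return envelope * np.exp(2j * np.pi * u / wavelength)

def gabor_energy(gray, kernel):
    """Mean squared magnitude of the (circular) convolution of image and kernel."""
    padded = np.zeros(gray.shape, dtype=complex)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    response = np.fft.ifft2(np.fft.fft2(gray) * np.fft.fft2(padded))
    return float(np.mean(np.abs(response) ** 2))

def best_orientation(gray, phis):
    """Return the phi from `phis` whose filter responds most strongly."""
    energies = [gabor_energy(gray, gabor_kernel(p)) for p in phis]
    return phis[int(np.argmax(energies))]
```

Typical use would be `best_orientation(gray, [k * np.pi / 16 for k in (1, 3, 5, 7)])`, either on the whole image or per tile to get a local direction map.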
I would consider using a Hough Transform for this type of problem. There is a nice write-up here.
Assume I have a model that is simply a cube. (It is more complicated than a cube, but for the purposes of this discussion, we will simplify.)
So when I am in Sketchup, the cube is Xmm by Xmm by Xmm, where X is an integer. I then export a Collada file and subsequently load that into threejs.
Now if I look at the geometry bounding box, the values are floats, not integers.
So now assume I am putting cubes next to each other with a small space in between say 1 pixel. Because screens can't draw half pixels, sometimes I see one pixel and sometimes I see two, which causes a lack of uniformity.
I think I can resolve this satisfactorily if I can somehow get the imported model to have integer dimensions. I have full access to all parts of the model starting with Sketchup, so any point in the process is fair game.
Is it possible?
Thanks.
Clarification: My app will have two views. The view that this is concerned with is using an OrthographicCamera that is looking straight down on the pieces, so this is really a 2D view. For purposes of this question, after importing the model, it should look like a grid of squares with uniform spacing in between.
UPDATE: I would ask that you please not respond unless you can provide an actual answer. If I need help finding a way to accomplish something, I will post a new question. For this question, I am only interested in knowing if it is possible to align an imported Collada model to full pixels and if so how. At this point, this is mostly to serve my curiosity and increase my knowledge of what is and isn't possible. Thank you community for your kind help.
Now you have to learn this thing about 3D programming: numbers don't mean anything :)
In the real world 1mm, 2.13cm and 100Kg specify something that can be measured and reproduced. But for a drawing library, those numbers don't mean anything.
In a drawing library, 3D points are always represented with 3 float values. You submit your points to the library, it transforms them into 2D points (they must be viewed on a 2D surface), and finally these 2D points are passed to a rasterizer, which translates floating-point values into integer values (the screen has a resolution of NxM pixels, both N and M being integers) and colors the actual pixels.
Your problem simply is not a problem. A cube of 1mm really means nothing, because if you are designing an astronomic application, that object will never be seen, but if it's a microscopic one, it will even be way larger than the screen. What matters are the coordinates of the point, and the scale of the overall application.
Now back to your cubes, don't try to insert 1px in between two adjacent ones. Your cubes are defined in terms of mm, so try to choose the distance in mm appropriate to your world, and let the rasterizer do its job and translate them to pixels.
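To see why the gaps come out as one pixel in some places and two in others, here is a toy model of the float-to-integer step. It simply rounds each tile edge to the nearest pixel, which is a simplification: a real rasterizer applies its own coverage rules. The function name `pixel_gaps` and its parameters are made up for this illustration.

```python
import numpy as np

def pixel_gaps(tile_size, gap, count, pixels_per_unit):
    """Return the on-screen gap widths (whole pixels) between adjacent tiles in a row."""
    pitch = tile_size + gap
    left = np.arange(count) * pitch             # left edge of each tile, world units
    right = left + tile_size                    # right edge of each tile, world units
    # Toy rasterizer: snap every edge to the nearest pixel.
    left_px = np.rint(left * pixels_per_unit).astype(int)
    right_px = np.rint(right * pixels_per_unit).astype(int)
    return left_px[1:] - right_px[:-1]

print(pixel_gaps(10.3, 1.5, 5, 1.0))   # mix of 1 and 2 pixel gaps
print(pixel_gaps(10, 1, 5, 1.0))       # uniform 1 pixel gaps
```

In this model the gaps are uniform only when the tile size and spacing, multiplied by the camera scale, land on whole pixels, so the scale of the view matters as much as the model's dimensions.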
I have been informed by two co-workers that I tracked down that this is indeed impossible using normal means.
Folks,
I have read a number of articles on the Discrete Wavelet Transform (DWT) and looked at some sample code as well. However, I am not clear on what exactly DWT achieves.
Here is what I understand. For a two dimensional image in YUV format, I can pass in the Y plane (brightness) to DWT function as a parameter. The function returns me a matrix of the original width and height containing coefficient values.
What are these coefficient values telling me? Is it how fast or slow the brightness of a pixel has changed compared to its neighbors?
Further, the returned matrix is rearranged in four quarters. As the coefficients have been rearranged, I no longer know which coefficient belongs to which pixel. This is confusing. If I cannot associate the coefficient to its corresponding pixel location, how can I really use the coefficients?
A little bit of background. I am looking at hiding some information in an image as an invisible watermark. From what I understand, DWT can help me identify the best region to hide the information. However, I have not been able to put the whole picture together.
Ok. I figured out how DWT works. I was under the assumption that the coefficients generated have a relationship with the original image. However, the transform converts the input luma into a completely different set. It is possible to run the reverse transform on the new values to once again obtain the original values.
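For anyone else puzzled by the four quarters, here is a minimal sketch of a single-level 2D Haar transform (the simplest wavelet, chosen here purely for illustration; the library I was using may use a different wavelet). It assumes even image dimensions. The coefficient at position [i, j] in each quarter is computed from the 2x2 pixel block starting at row 2*i, column 2*j, so the spatial association is not lost, just stored at half resolution.

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2D Haar DWT; returns the four sub-bands (quarters)."""
    a = img.astype(float)
    s = np.sqrt(2.0)
    # Along each row: sums (low-pass) and differences (high-pass) of column pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / s
    hi = (a[:, 0::2] - a[:, 1::2]) / s
    # Along each column: the same on row pairs.
    ll = (lo[0::2] + lo[1::2]) / s   # smoothed, half-size version of the image
    lh = (lo[0::2] - lo[1::2]) / s   # change between vertically adjacent rows
    hl = (hi[0::2] + hi[1::2]) / s   # change between horizontally adjacent columns
    hh = (hi[0::2] - hi[1::2]) / s   # diagonal detail
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2; recovers the original values."""
    s = np.sqrt(2.0)
    lo = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi = np.empty_like(lo)
    lo[0::2], lo[1::2] = (ll + lh) / s, (ll - lh) / s
    hi[0::2], hi[1::2] = (hl + hh) / s, (hl - hh) / s
    out = np.empty((lo.shape[0], lo.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = (lo + hi) / s, (lo - hi) / s
    return out
```

The three detail quarters are large where brightness changes quickly within a block and near zero in flat regions.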
Regards,
Peter
I'd like to implement a filter that allows resampling of an image by moving a number of control points that mark edges and tangent directions. The goal is to be able to freely transform an image, as in Photoshop when you use "Free Transform" and choose the warp mode "Custom". The image is fitted into some kind of spline patch (if that is a valid name) that can be manipulated.
I understand how simple splines (paths) work but how do you connect them to form a patch?
And how can you sample such a patch to render the morphed image? For each pixel in the target I'd need to know what pixel in the source image corresponds. I don't even know where to start searching...
Any helpful info (keywords, links, papers, reference implementations) is greatly appreciated!
This document will get you a good insight into warping: http://www.gson.org/thesis/warping-thesis.pdf
However, this will include filtering out high frequencies, which will make the implementation a lot more complicated but will give a better result.
An easy way to accomplish what you want to do would be to loop through every pixel in your final image, plug the coordinates into your splines and retrieve the pixel in your original image. This pixel might have coordinates 0.4/1.2 so you could bilinearly interpolate between 0/1, 1/1, 0/2 and 1/2.
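The bilinear lookup for a fractional source coordinate can be sketched as follows. The function name `bilinear_sample` is made up, coordinates are (x, y) with the image indexed as image[y, x], and clamping at the borders is one possible choice among several.

```python
import numpy as np

def bilinear_sample(image, x, y):
    """Sample `image` at fractional (x, y) by blending the four surrounding pixels."""
    h, w = image.shape[:2]
    # Clamp to the valid range so the neighbours always exist.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    tx, ty = x - x0, y - y0
    # Blend horizontally on the two rows, then vertically between them.
    top = (1 - tx) * image[y0, x0] + tx * image[y0, x1]
    bottom = (1 - tx) * image[y1, x0] + tx * image[y1, x1]
    return (1 - ty) * top + ty * bottom
```

For the coordinate 0.4/1.2 this blends pixels 0/1, 1/1, 0/2 and 1/2 with weights 0.48, 0.32, 0.12 and 0.08.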
As for splines: there are many resources and solutions online for the 1D case. As for 2D it gets a bit trickier to find helpful resources.
A simple example for the 1D case: http://www-users.cselabs.umn.edu/classes/Spring-2009/csci2031/quad_spline.pdf
Here's a great guide for the 2D case: http://en.wikipedia.org/wiki/Bicubic_interpolation
Based on this you could derive your own scheme for splines in the 2D case. Define a bivariate polynomial (in x and y) and set your constraints to solve for the coefficients of the polynomial.
Just keep in mind that the borders of the spline patches have to be consistent (both in value and derivative) to avoid ugly jumps.
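As a starting point, here is the lowest-order instance of "define a bivariate polynomial and solve for its coefficients": a bilinear patch f(u, v) = a + b*u + c*v + d*u*v pinned at four corners. This is only a sketch with made-up function names; it does not enforce derivative continuity between neighbouring patches, for which you would need a higher-order polynomial (bicubic, for example) and extra constraints.

```python
import numpy as np

def fit_bilinear_patch(corners_uv, values):
    """Solve for the coefficients of a + b*u + c*v + d*u*v through four corner constraints.

    `corners_uv` is four (u, v) pairs; `values` holds the target at each corner,
    for example the (x, y) source-image position of each control point.
    """
    A = np.array([[1.0, u, v, u * v] for u, v in corners_uv])
    return np.linalg.solve(A, np.asarray(values, dtype=float))

def eval_patch(coeffs, u, v):
    """Evaluate the fitted polynomial at (u, v)."""
    return np.array([1.0, u, v, u * v]) @ coeffs

# Unit square in the target image mapped to four control points in the source.
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
control_points = [(10, 10), (50, 12), (8, 40), (60, 55)]
coeffs = fit_bilinear_patch(corners, control_points)
print(eval_patch(coeffs, 0.5, 0.5))   # source position for the patch centre
```

To render, you would evaluate the patch for each target pixel and sample the source image at the returned position.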
Good luck!