JavaCV - Identifying the most accurate of detected faces - javacv

I am new to the JavaCV/OpenCV thing, so apologies in advance if I'm being a complete idiot...
I need to detect the "primary/main" face in an image (the image will, for the most part, be a profile picture); face recognition is not required.
Because the different haarcascade files each detect different faces, and because the detected faces are sometimes not actually faces but arbitrary artifacts in the image, I need to decide which of the faces to use.
Assuming the detected faces are real faces, it makes sense to use the largest one, since this is a profile picture.
The main problem I'm having is that the code detects (for some images) more than one face, and the biggest face is actually not the person's face at all.
Here is an example from one of my tests where the code detected two faces, one being the real face and the other being the woman's bust; it just so happens that her bust is bigger than her face.
Face: java.awt.Rectangle[x=62,y=42,width=78,height=78] Area of 6084
Bust: java.awt.Rectangle[x=86,y=144,width=80,height=80] Area of 6400
So, my question in short: if I have multiple detected faces, is there some sort of rating scale I can use to determine which of the detected faces best matches what OpenCV considers a face?

Unfortunately, face detection does not provide such an option.
I guess your code looks something like this:
CvSeq faces = cvHaarDetectObjects(grayImage, cascade, ......);
So you get a CvSeq, which is in fact nothing more than a pointer to the sequence of rectangles that delimit the faces. You get no further information than that.
Usually I would say that the bust is below the head (more generally, the head is above the rest of the body), except in some special cases ;).
You can simply use the Y position to discard the bust.
If other elements that are not body parts are detected as faces, then you're doomed.
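For illustration, here is a minimal sketch in OpenCV C++ (the JavaCV calls mirror these) of the "topmost detection wins" rule suggested above; the cascade file name, the image names, and the helper pickPrimaryFace are assumptions for the example, not the only possible choice:

#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <algorithm>
#include <vector>

// Hypothetical helper: pick the "primary" face from a set of detections.
// Strategy (an assumption, not the only option): prefer the topmost detection,
// since the head normally sits above the bust and the rest of the body.
static cv::Rect pickPrimaryFace(const std::vector<cv::Rect>& faces) {
    return *std::min_element(faces.begin(), faces.end(),
        [](const cv::Rect& a, const cv::Rect& b) { return a.y < b.y; });
}

int main() {
    cv::CascadeClassifier cascade("haarcascade_frontalface_alt.xml"); // path is an assumption
    cv::Mat img = cv::imread("profile.jpg"), gray;                    // image name is an assumption
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);

    std::vector<cv::Rect> faces;
    cascade.detectMultiScale(gray, faces, 1.1, 3);   // scaleFactor, minNeighbors
    if (!faces.empty()) {
        cv::Rect primary = pickPrimaryFace(faces);
        cv::rectangle(img, primary, cv::Scalar(0, 255, 0), 2);
    }
    cv::imwrite("primary_face.jpg", img);
    return 0;
}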

Related

Any way to algorithmically remove discolorations from aerial imagery

I don't know much about image processing so please bear with me if this is not possible to implement.
I have several sets of aerial images of the same area originating from different sources. The pictures have been taken during different seasons, under different lighting conditions, etc. Unfortunately some images look patchy and suffer from discolorations, or are partially obstructed by clouds or pixelated; see, for example, picture1 and picture2.
I would like to take as input several images of the same area and (by some kind of averaging) produce one picture of improved quality. I know some C/C++, so I could use an image processing library.
Can anybody propose an image processing algorithm to achieve this, or point to research done in this field?
I would try a "color twist" transform, i.e. a 3x3 matrix applied to the RGB components. To implement it, you need to pick color samples in areas that are split by a border, on both sides. You should find three significantly different reference colors (hence six samples). This gives you the nine linear equations needed to determine the matrix coefficients.
Then you correct the altered areas by means of this color twist. As the geometry of these areas is intertwined with the field patches, I don't see a better way than contouring the regions by hand.
In the case of the second picture, the limits of the regions are blurred, so you will need to blur the region mask as well and perform blending.
In any case, don't expect a perfect repair of these problems, as the transform might be nonlinear and completely erasing the edges will be difficult. I also think that the colors are so washed out in places that restoring them might create ugly artifacts.
For the sake of illustration, here is a quick attempt with Photoshop using manual HLS adjustment (less powerful than a color twist).
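A minimal sketch of the color twist, assuming the three color pairs have already been sampled by hand and the altered region has been contoured into a mask; the numeric values and file names are placeholders:

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    // Three reference color pairs, one per row, BGR order.
    // "inside" = samples from the discolored area, "outside" = samples from the
    // matching spots across the border. These values are placeholders.
    cv::Mat inside  = (cv::Mat_<float>(3, 3) <<  90, 120, 140,
                                                 60,  80, 150,
                                                 40, 150,  70);
    cv::Mat outside = (cv::Mat_<float>(3, 3) << 110, 140, 160,
                                                 80, 100, 170,
                                                 60, 170,  90);

    // Nine linear equations, nine unknowns: inside * M^T = outside.
    cv::Mat Mt;
    cv::solve(inside, outside, Mt, cv::DECOMP_SVD);
    cv::Mat M = Mt.t();                 // the 3x3 color-twist matrix

    cv::Mat img  = cv::imread("aerial.png");
    cv::Mat mask = cv::imread("region_mask.png", cv::IMREAD_GRAYSCALE); // hand-contoured region

    cv::Mat twisted;
    cv::transform(img, twisted, M);     // apply the twist to every pixel
    twisted.copyTo(img, mask);          // replace only the altered area
    cv::imwrite("aerial_corrected.png", img);
    return 0;
}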
The first thing I thought of was a kernel matrix of sorts.
Do a first pass over the photo and use an edge detection algorithm to determine the borders between the photos. This should be fairly trivial; however, you will need to eliminate any overlap/fading (it looks like there's a bit in picture 2), and you'll see why in a minute.
Do a second pass right along each border you've detected, and assume that the pixels on either side of the border should be the same color. Determine the difference between the red, green and blue values, average it along the entire length of the line, then divide it by two. The image with the lower red, green or blue value gets this new value added; the one with the higher value gets it subtracted.
On either side of this line, every pixel should now be exactly the same. You can remove one of these rows if you'd like, but if the lines don't run the full length of the image this could cause size issues, and the line will likely not be very noticeable anyway.
This could be made far more complicated by generating a filter by passing along this line - I'll leave that to you.
The issue with this could be areas where there was development, fall colors, etc.; this might mess with your algorithm, but there's only one way to find out!
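As a rough sketch of that border-balancing pass, under the simplifying assumption of a single, perfectly vertical seam at a known column (in reality the seam positions would come from the edge-detection pass, and the file name and column are placeholders):

#include <opencv2/core.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat img = cv::imread("mosaic.png");
    const int seamX = 512;                       // placeholder seam position

    // Average the left-right difference along the whole seam, per channel.
    cv::Scalar meanLeft  = cv::mean(img.col(seamX - 1));
    cv::Scalar meanRight = cv::mean(img.col(seamX));
    cv::Scalar half((meanLeft[0] - meanRight[0]) * 0.5,
                    (meanLeft[1] - meanRight[1]) * 0.5,
                    (meanLeft[2] - meanRight[2]) * 0.5);

    // Shift each side by half the difference so the two sides meet in the middle.
    cv::Mat left  = img(cv::Rect(0, 0, seamX, img.rows));
    cv::Mat right = img(cv::Rect(seamX, 0, img.cols - seamX, img.rows));
    left  -= half;                               // brighter side comes down
    right += half;                               // darker side comes up

    cv::imwrite("mosaic_balanced.png", img);
    return 0;
}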

how to improve keypoints detection and matching

I have been working on a personal project in image processing and robotics where, instead of the robot detecting colors and picking out objects as usual, it tries to detect the holes (resembling different polygons) on the board. For a better understanding of the setup, here is an image:
As you can see, I have to detect these holes, find out their shapes and then use the robot to fit the object into the holes. I am using a Kinect depth camera to get the depth image, which is shown below:
I was unsure how to detect the holes with the camera. Initially I used masking to remove the background portion and some of the foreground portion based on the depth measurement, but this did not work out because, at different orientations of the camera, the holes would merge with the board... something like inRange (it fully becomes white). Then I came across the adaptiveThreshold function:
adaptiveThreshold(depth1,depth3,255,ADAPTIVE_THRESH_GAUSSIAN_C,THRESH_BINARY,7,-1.0);
With noise removal using erode, dilate, and Gaussian blur, this detected the holes in a better manner, as shown in the picture below. Then I used the Canny edge detector to get the edges, but so far it has not been good, as shown in the picture below. After this I tried various feature detectors (SIFT, SURF, ORB, goodFeaturesToTrack) and found that ORB gave the best times and detected features. I then tried to get the relative camera pose of a query image by finding its keypoints and matching them, passing the good matches to the findHomography function. The results are shown in the diagram below:
In the end I want to get the relative camera pose between the two images and move the robot to that position using the rotation and translation vectors obtained from the solvePnP function.
So, is there any other method by which I could improve the quality of the detected holes for keypoint detection and matching?
I have also tried contour detection and approxPolyDP, but the approximated shapes are not really good:
I have tried tweaking the input parameters for the threshold and Canny functions, but this is the best I can get.
Also, is my approach to getting the camera pose correct?
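For reference, a minimal sketch of the preprocessing pipeline described above (adaptive threshold, erode/dilate, Gaussian blur, Canny); the file name, kernel size, and Canny thresholds are placeholders, not tuned values:

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>

int main() {
    cv::Mat depth = cv::imread("depth.png", cv::IMREAD_GRAYSCALE);

    cv::Mat bin;
    cv::adaptiveThreshold(depth, bin, 255,
                          cv::ADAPTIVE_THRESH_GAUSSIAN_C, cv::THRESH_BINARY,
                          7, -1.0);

    // Morphological clean-up and blur before edge detection.
    cv::Mat kernel = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::erode(bin, bin, kernel);
    cv::dilate(bin, bin, kernel);
    cv::GaussianBlur(bin, bin, cv::Size(5, 5), 0);

    cv::Mat edges;
    cv::Canny(bin, edges, 50, 150);          // thresholds are guesses
    cv::imwrite("holes_edges.png", edges);
    return 0;
}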
UPDATE: No matter what I tried, I could not get good repeatable features to map. Then I read online that a depth image is low in resolution and is only used for things like masking and getting distances. So it hit me that the features were not proper because of the low-resolution image with its messy edges. So I thought of detecting features on an RGB image and using the depth image only to get the distances of those features. The quality of the features I got was literally off the charts. It even detected the screws on the board! Here are the keypoints detected using goodFeaturesToTrack keypoint detection.
I met another hurdle while getting the distances: the distances of the points were not coming out properly. I searched for possible causes, and after quite a while it occurred to me that there is an offset between the RGB and depth images because of the offset between the cameras. You can see this from the first two images. I then searched the net for how to compensate for this offset but could not find a working solution.
If any one of you could help me compensate for the offset, it would be great!
UPDATE: I could not make good use of the goodFeaturesToTrack function. The function gives the corners as Point2f. To compute descriptors we need KeyPoints, and converting Point2f to KeyPoint with the code snippet below leads to the loss of scale and rotation invariance.
for (size_t i = 0; i < corners1.size(); i++)
{
    keypoints_1.push_back(KeyPoint(corners1[i], 1.f));
}
The hideous result from the feature matching is shown below.
I have to try different feature matching approaches now. I'll post further updates. It would be really helpful if anyone could help with removing the offset problem.
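As one possible alternative (not the questioner's code), here is a short sketch of detecting and matching on the RGB images with ORB directly, which keeps scale and rotation information instead of converting goodFeaturesToTrack corners into fixed-size KeyPoints; image names and parameters are placeholders:

#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat img1 = cv::imread("rgb_query.png", cv::IMREAD_GRAYSCALE);
    cv::Mat img2 = cv::imread("rgb_train.png", cv::IMREAD_GRAYSCALE);

    cv::Ptr<cv::ORB> orb = cv::ORB::create(1000);
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat desc1, desc2;
    orb->detectAndCompute(img1, cv::noArray(), kp1, desc1);
    orb->detectAndCompute(img2, cv::noArray(), kp2, desc2);

    // Hamming distance for binary ORB descriptors, cross-check to drop weak matches.
    cv::BFMatcher matcher(cv::NORM_HAMMING, true);
    std::vector<cv::DMatch> matches;
    matcher.match(desc1, desc2, matches);
    // matches can then be filtered by distance and fed to findHomography / solvePnP.
    return 0;
}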
Compensating for the difference between the image output and world coordinates:
You should use the good old camera calibration approach for calibrating the camera response and possibly generating a correction matrix for the camera output (in order to convert it into real scales).
It's not that complicated once you have printed out a checkerboard template and captured various shots. (For this application you don't need to worry about rotation invariance; just calibrate the world view against the image array.)
You can find more information here: http://www.vision.caltech.edu/bouguetj/calib_doc/htmls/own_calib.html
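As a rough sketch of that calibration step (the file names, the 9x6 board size, and the square size are assumptions for illustration):

#include <opencv2/calib3d.hpp>
#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>
#include <string>

int main() {
    const cv::Size boardSize(9, 6);          // inner corners of the printed board (assumed)
    const float squareSize = 0.025f;         // square side in metres (assumed)

    // One set of 3D corner positions on the board plane, reused for every view.
    std::vector<cv::Point3f> boardPoints;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            boardPoints.push_back(cv::Point3f(x * squareSize, y * squareSize, 0));

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    cv::Size imageSize;

    for (int i = 0; i < 15; ++i) {           // 15 shots of the board (assumed)
        cv::Mat img = cv::imread("calib_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        imageSize = img.size();

        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(img, boardSize, corners)) {
            cv::cornerSubPix(img, corners, cv::Size(11, 11), cv::Size(-1, -1),
                             cv::TermCriteria(cv::TermCriteria::EPS + cv::TermCriteria::COUNT, 30, 0.01));
            imagePoints.push_back(corners);
            objectPoints.push_back(boardPoints);
        }
    }

    cv::Mat cameraMatrix, distCoeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
    // cameraMatrix and distCoeffs can now be used to undistort images, relate
    // pixel coordinates to real-world scales, and feed solvePnP.
    return 0;
}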
--
Now, since I can't seem to comment on the question, I'd like to ask whether your specific application requires the machine to "find out" the shape of the hole on the fly. If there is a finite number of hole shapes, you can model them mathematically and look for the pixels that support the predefined models in the B/W edge image.
For example, x^2 + y^2 - r^2 = 0 for a circle with radius r, where x and y are the pixel coordinates relative to the circle's center.
That being said, I believe more clarification is needed regarding the requirements of the application (shape detection).
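To illustrate the predefined-model idea for the circular holes, here is a sketch using the Hough circle transform, which looks for pixels supporting x^2 + y^2 - r^2 = 0; all parameter values are guesses that would need tuning against the real images:

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat gray = cv::imread("board.png", cv::IMREAD_GRAYSCALE);
    cv::medianBlur(gray, gray, 5);

    std::vector<cv::Vec3f> circles;          // each entry: (center x, center y, radius)
    cv::HoughCircles(gray, circles, cv::HOUGH_GRADIENT,
                     1,              // accumulator resolution = image resolution
                     gray.rows / 8,  // minimum distance between circle centres
                     100, 30,        // Canny high threshold, accumulator threshold
                     10, 80);        // min and max radius in pixels (assumed)
    // circles now holds the candidate circular holes.
    return 0;
}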
If you're going to detect specific shapes such as the ones in your provided image, then you're better off using a classifier. Delve into Haar classifiers, or better still, look into Bag of Words.
Using BoW, you'll need to train on a dataset consisting of positive and negative samples. The positive samples will contain N unique samples of each shape you want to detect. It's better if N > 10, and best if N > 100 with samples that are highly variant and unique, for good, robust classifier training.
Negative samples would (obviously) contain stuff that does not represent your shapes in any way. They're just for checking the accuracy of the classifier.
Also, once your classifier is trained, you can distribute the classifier data (say, supposing you use an SVM).
Here are some links to get you started with Bag of Words:
https://gilscvblog.wordpress.com/2013/08/23/bag-of-words-models-for-visual-categorization/
Sample code:
http://answers.opencv.org/question/43237/pyopencv_from-and-pyopencv_to-for-keypoint-class/
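For orientation only, here is a very rough outline of a Bag-of-Words pipeline with an SVM in OpenCV; SIFT (cv::SIFT, OpenCV 4.4+) is used because BoW clustering needs float descriptors, and the file naming, label scheme, and vocabulary size are made up for the sketch:

#include <opencv2/features2d.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/ml.hpp>
#include <vector>
#include <string>

int main() {
    cv::Ptr<cv::SIFT> sift = cv::SIFT::create();

    // 1. Collect descriptors from the training images and build the vocabulary.
    cv::BOWKMeansTrainer bowTrainer(100);            // 100 visual words (assumed)
    for (int i = 0; i < 50; ++i) {
        cv::Mat img = cv::imread("train_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        std::vector<cv::KeyPoint> kp;
        cv::Mat desc;
        sift->detectAndCompute(img, cv::noArray(), kp, desc);
        if (!desc.empty()) bowTrainer.add(desc);
    }
    cv::Mat vocabulary = bowTrainer.cluster();

    // 2. Encode each training image as a BoW histogram over the vocabulary.
    cv::BOWImgDescriptorExtractor bowExtractor(sift, cv::BFMatcher::create(cv::NORM_L2));
    bowExtractor.setVocabulary(vocabulary);

    cv::Mat samples, labels;                         // one histogram row per image
    for (int i = 0; i < 50; ++i) {
        cv::Mat img = cv::imread("train_" + std::to_string(i) + ".png", cv::IMREAD_GRAYSCALE);
        if (img.empty()) continue;
        std::vector<cv::KeyPoint> kp;
        sift->detect(img, kp);
        if (kp.empty()) continue;
        cv::Mat hist;
        bowExtractor.compute(img, kp, hist);
        if (hist.empty()) continue;
        samples.push_back(hist);
        labels.push_back(i < 25 ? 1 : -1);           // first half positive, rest negative (assumed)
    }

    // 3. Train the classifier on the histograms.
    cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();
    svm->setType(cv::ml::SVM::C_SVC);
    svm->setKernel(cv::ml::SVM::RBF);
    svm->train(samples, cv::ml::ROW_SAMPLE, labels);
    svm->save("shape_bow_svm.yml");
    return 0;
}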

Rectangle detection in image

I'd like to program the detection of a rectangular sheet of paper, which doesn't absolutely need to be perfectly straight on each side, as I may take a picture of it "in the air", which means the individual sides of the paper might get distorted a bit.
The app CamScanner (iOS and Android) does this very, very well, and I'm wondering how it might be implemented. First of all I thought of doing:
smoothing / noise reduction
Edge detection (canny etc) OR thresholding (global / adaptive)
Hough Transformation
Detecting lines (only vertical / horizontal lines allowed)
Calculating the intersection points of the 4 found lines
But this gives me a lot of problems with different types of images.
And I'm wondering if there's a better approach to directly detecting a rectangle-like shape in an image, and if so, whether CamScanner implements it like this as well.
Here are some images taken in CamScanner.
These are detected quite nicely, even though in a) the side is distorted (the corner still gets shown in the overlay, but it doesn't really fit the corner of the white paper), and in b) the background is pretty close to the actual paper, yet it still gets recognized correctly:
It even gets rotated pictures right:
And when I introduce some test errors, it fails but at least detects some of the contour, and it always tries to detect it as a rectangle:
And here it fails completely:
I suppose that in the last three examples, if it applied a Hough transformation, it could have detected at least two of the four sides of the rectangle.
Any ideas and tips?
Thanks a lot in advance
The OpenCV framework may help with your problem. Also, you can look at this document for the Android platform.
The full source code is available on Github.
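For what it's worth, one common OpenCV recipe for paper-like rectangles (a sketch of the contour route, not necessarily what CamScanner does) is to blur, run Canny, find contours, and keep the largest convex 4-point polygon; file names and thresholds are placeholders:

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("page.jpg"), gray, edges;
    cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);
    cv::GaussianBlur(gray, gray, cv::Size(5, 5), 0);
    cv::Canny(gray, edges, 75, 200);

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(edges, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);

    // Keep the largest convex quadrilateral among the approximated contours.
    std::vector<cv::Point> best;
    double bestArea = 0;
    for (const auto& c : contours) {
        std::vector<cv::Point> approx;
        cv::approxPolyDP(c, approx, 0.02 * cv::arcLength(c, true), true);
        double area = cv::contourArea(approx);
        if (approx.size() == 4 && cv::isContourConvex(approx) && area > bestArea) {
            best = approx;
            bestArea = area;
        }
    }
    if (!best.empty())
        cv::drawContours(img, std::vector<std::vector<cv::Point>>{best}, -1,
                         cv::Scalar(0, 255, 0), 3);
    cv::imwrite("page_detected.jpg", img);
    return 0;
}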

simple case of optical flow

General: I'm hoping that the use case I'm about to describe is a simple case of an optical flow problem, and since I don't have much knowledge of the subject, I was wondering if anyone has suggestions on how I can approach solving my problem.
Research I've already done: I have begun reading the High Accuracy Optical Flow Estimation Based on a Theory for Warping paper and am planning to look over the Particle Video paper. I have found a MATLAB implementation of High Accuracy Optical Flow. However, the papers (and the code) describe concepts that are very involved and may require a lot of time for me to dig into and understand. I am hoping that the solution to my problem may be simpler.
Problem: I have a sequence of images. The images depict a material breakage process, where the material and background are black and the cracks are white. I am interested in traversing the sequence of images in reverse in an attempt to map all of the cracks that have formed in the breakage process to the first black image. You can think of the material as a large puzzle and I am trying to put the pieces back together in the reverse order that they broke.
In each image, there can be some cracks that are just emerging and/or some cracks that have been fully formed (and thus created a fragment). Throughout the breakage process, some fragments may separate and break further. The fragments can also move farther away from one another (the change is slight between subsequent frames).
Desired Output: All of the cracks/lines in the sequence mapped to the first image in the sequence.
Additional Notes: Images are available in grayscale format (i.e. original) as well as in binary format, where the cracks have been outlined in white and the background is completely black. See below for some image examples.
The top row shows the original images and the bottom row shows the binary images. As you can see, the crack that goes down the middle grows wider and wider as the image sequence progresses. Thus, the bottom crack moves together with the lower fragment. When traversing the sequence in reverse, I hope to algorithmically realize that the middle crack comes together as one (and map it correctly to the first image), and also map the bottom crack correctly, keeping its correct correspondence (size and position) with the bottom fragment.
A sequence typically contains about 30~40 images, so I've just shown the beginning subset. Also, although these images don't show it, it is possible that a particular image only contains the beginning of the crack (i.e. its initial appearance) and in subsequent images it gets longer and longer and may join with other cracks.
Language: Although not necessary, I would like to implement the solution using MATLAB (just because most of the other code that relates to the project has been done in MATLAB). However, if OpenCV may be easier, I am flexible in my language/library usage.
Any ideas are greatly appreciated.
Traverse forward rather than reverse, and don't use optical flow. Use the fracture lines to segment the black parts, track the centroid of each black segment over time. Whenever a new fracture line appears that cuts across a black segment, split the segment into two and continue tracking each segment separately.
From this you should be able to construct a tree structure representing the segmentation of the black parts over time. The fracture lines can be added as metadata to this tree, perhaps assigning fracture lines to the segment node in which they first appeared.
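A minimal sketch of the labelling step behind this idea (the tracking and tree building are left out; frame names and the threshold are assumptions): in each binary frame the white cracks separate the black material into segments, and connected-component labelling gives each segment a centroid and area to track.

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>

int main() {
    for (int t = 0; t < 30; ++t) {                       // ~30 frames per sequence
        cv::Mat bin = cv::imread("frame_" + std::to_string(t) + ".png", cv::IMREAD_GRAYSCALE);
        if (bin.empty()) continue;

        cv::Mat blackParts = bin < 128;                  // material = non-crack pixels
        cv::Mat labels, stats, centroids;
        int n = cv::connectedComponentsWithStats(blackParts, labels, stats, centroids);

        for (int i = 1; i < n; ++i) {                    // label 0 is the background
            double cx = centroids.at<double>(i, 0);
            double cy = centroids.at<double>(i, 1);
            int area  = stats.at<int>(i, cv::CC_STAT_AREA);
            // Match (cx, cy, area) against the segments of frame t-1 here and
            // split a node of the segment tree whenever a segment breaks in two.
            (void)cx; (void)cy; (void)area;
        }
    }
    return 0;
}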
I would advise you to follow your initial idea of backtracking the cracks. You kind of know what the cracks look like, so you can track all the points that belong to them. You just track all the white points with an optical flow tracker; start with the Lucas-Kanade tracker and see where you get. The high-accuracy optical flow method is a global one and more general: it will track all the pixels in the image while trying to keep some smoothness everywhere. LK is a local method that will just use a small window around each point to do the tracking. The problem is that, apart from the cracks, all the pixels are plain black, so there is nothing to track there; you'd just waste time trying to track something that you can't track and don't need to track.
If the lines are very straight, you might run into what's called the aperture problem and get inaccurate results. You can also try some shape fitting/deformation based on snakes.
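A small sketch of that sparse Lucas-Kanade idea, seeding track points only on the white crack pixels and tracking them into the earlier frame (reverse traversal); file names and parameters are guesses:

#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat gray2 = cv::imread("frame_10.png", cv::IMREAD_GRAYSCALE);      // later frame
    cv::Mat gray1 = cv::imread("frame_09.png", cv::IMREAD_GRAYSCALE);      // earlier frame
    cv::Mat bin2  = cv::imread("frame_10_bin.png", cv::IMREAD_GRAYSCALE);  // binary crack mask

    // Seed track points only where the binary mask is white (the cracks).
    std::vector<cv::Point2f> pts2;
    cv::goodFeaturesToTrack(gray2, pts2, 500, 0.01, 5, bin2);

    std::vector<cv::Point2f> pts1;
    std::vector<unsigned char> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(gray2, gray1, pts2, pts1, status, err,
                             cv::Size(21, 21), 3);

    // pts2[i] -> pts1[i] (where status[i] != 0) maps crack points back in time.
    return 0;
}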
I agree with damian. Most optical flow methods, like HAOF, rely on the first-order Taylor approximation of the intensity constancy constraint equation I(x,t) = I(x+v,t+dt). That means the solution depends on the image derivatives, where the gradient determines the motion vector's magnitude and angle, i.e. you need a certain amount of texture. However, the very low texture of your non-binarised images could be enough. You could try histogram equalization to increase the contrast of your input data, but it is important to apply the same transformation to both input images, e.g. as follows:
// place both grayscale frames side by side in one image
cv::Mat equalizeMat(grayInp1.rows, grayInp1.cols * 2, CV_8UC1);
grayInp1.copyTo(equalizeMat(cv::Rect(0, 0, grayInp1.cols, grayInp1.rows)));
grayInp2.copyTo(equalizeMat(cv::Rect(grayInp1.cols, 0, grayInp2.cols, grayInp2.rows)));

// equalize once so both frames get the identical intensity mapping
cv::equalizeHist(equalizeMat, equalizeMat);

// copy the equalized halves back into the original frames
equalizeMat(cv::Rect(0, 0, grayInp1.cols, grayInp1.rows)).copyTo(grayInp1);
equalizeMat(cv::Rect(grayInp1.cols, 0, grayInp2.cols, grayInp2.rows)).copyTo(grayInp2);

// estimate optical flow

matching jigsaw puzzle pieces

I have nothing useful to do and was playing with a jigsaw puzzle like this one:
alt text http://manual.gimp.org/nl/images/filters/examples/render-taj-jigsaw.jpg
and I was wondering if it'd be possible to make a program that assists me in putting it together.
Imagine that I have a small puzzle, say 4x3 pieces, but the little tabs and blanks are non-uniform: different pieces have tabs at different heights, of different shapes and sizes. What I'd do is take pictures of all of these pieces, let a program analyze them and store their attributes somewhere. Then, when I pick up a piece, I could ask the program to tell me which pieces should be its 'neighbours', or, if I have to fill in a blank, it would tell me what the wanted puzzle piece(s) look like.
Unfortunately I've never done anything with image processing and pattern recognition, so I'd like to ask you for some pointers: how do I recognize a jigsaw piece (basically a square with tabs and holes) in a picture?
Then I'd probably need to rotate it so it's in the right position, scale it to some proportion, and then measure the tab/blank on each side, as well as each side's slope, if present.
I know that it would be too time consuming to scan/photograph 1000 pieces of puzzle and use it, this would be just a pet project where I'd learn something new.
Data acquisition
(This is known as Chroma Key, Blue Screen or Background Color method)
Find a well-lit room, with the least lighting variation across the room.
Find a color (hue) that is rarely used in the entire puzzle / picture.
Get colored paper of exactly that same color.
Place as many puzzle pieces on the colored paper as will fit.
You can categorize the pieces into batches and use this as a computer hint later on.
Make sure the pieces do not overlap or touch each other.
Do not worry about orientation yet.
Take picture and download to computer.
Color calibration may be needed because the Chroma Key background may have upset the built-in color balance of the digital camera.
Acquisition data processing
Get some computer vision software
OpenCV, MATLAB, C++, Java, Python Imaging Library, etc.
Perform connected-component analysis on the chroma-key color in the image (a sketch of this step follows the list).
Ask for the contours of the holes in the connected component; these are the puzzle pieces.
Fix errors in the detected list.
Choose the indexing vocabulary (cf. Ira Baxter's post) and measure the pieces.
If the pieces are rectangular, find the corners first.
If the pieces are slightly-off quadrilaterals, the side lengths (measured corner to corner) are also a valuable signature.
Search for "Shape Context" on SO or Google or here.
Finally, get the color histogram of the piece, so that you can query pieces by color later.
To make them searchable, put them in a database, so that you can query pieces with any combinations of indexing vocabulary.
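A small sketch of the chroma-key segmentation step flagged in the list above: everything that is not the backdrop color is treated as a puzzle piece and contoured. The green hue range and file name are placeholders for whatever backdrop you pick.

#include <opencv2/imgproc.hpp>
#include <opencv2/imgcodecs.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("pieces_on_backdrop.jpg"), hsv, backdrop, pieces;
    cv::cvtColor(img, hsv, cv::COLOR_BGR2HSV);

    // Example: a green backdrop around hue 60 (OpenCV hue range is 0-179).
    cv::inRange(hsv, cv::Scalar(45, 80, 80), cv::Scalar(75, 255, 255), backdrop);

    cv::bitwise_not(backdrop, pieces);                     // pieces = not backdrop
    cv::morphologyEx(pieces, pieces, cv::MORPH_OPEN,
                     cv::getStructuringElement(cv::MORPH_RECT, cv::Size(5, 5)));

    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(pieces, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_SIMPLE);
    // each external contour is one puzzle piece, ready to be measured and indexed
    return 0;
}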
A step back to the problem itself. The problem of building a puzzle can be easy (P) or hard (NP), depending on whether each piece fits only one neighbour or many. If there is only one fit for each edge, then you just find, for each piece/side, its neighbour and you're done (O(#pieces * #sides)). If some pieces allow multiple fits with different neighbours, then, in order to complete the whole puzzle, you may need backtracking (because you made a wrong choice and got stuck).
However, the first problem to solve is how to represent pieces. If you want to represent arbitrary shapes, then you can probably use transparency or masks to represent which areas of a tile are actually part of the piece. If you use square shapes then the problem may be easier. In the latter case, you can consider the last row of pixels on each side of the square and match it with the most similar row of pixels that you find across all other pieces.
You can use the second approach to actually help you solve a real puzzle, despite the fact that you use square tiles. Real puzzles are normally built on an NxM grid of pieces. When scanning the image from the box, you split it into the same NxM grid of square tiles and get the system to solve that. The problem is then to visually map the actual squiggly piece that you hold in your hand to a tile inside the system (which is hard when the pieces are small and uniformly coloured). But you get the same problem if you represent arbitrary shapes internally.
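As a toy sketch of the square-tile matching idea (the helper names are made up, and the tiles are assumed to be same-sized cv::Mat crops of the scanned box image): score how well the right edge of one tile continues into the left edge of another by comparing the boundary pixel columns; the lowest score wins.

#include <opencv2/core.hpp>
#include <vector>
#include <limits>

// Score the fit between the right edge of tile a and the left edge of tile b.
// Lower score = better continuation (tiles assumed to have equal height).
double rightLeftFit(const cv::Mat& a, const cv::Mat& b) {
    return cv::norm(a.col(a.cols - 1), b.col(0), cv::NORM_L2);
}

// For a given tile, find the best candidate to place on its right.
int bestRightNeighbour(const std::vector<cv::Mat>& tiles, int idx) {
    int best = -1;
    double bestScore = std::numeric_limits<double>::max();
    for (int j = 0; j < static_cast<int>(tiles.size()); ++j) {
        if (j == idx) continue;
        double s = rightLeftFit(tiles[idx], tiles[j]);
        if (s < bestScore) { bestScore = s; best = j; }
    }
    return best;
}

int main() {
    std::vector<cv::Mat> tiles;      // fill with the NxM square crops in real code
    if (!tiles.empty())
        bestRightNeighbour(tiles, 0);
    return 0;
}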
