For a hobby project I'm attempting to align photo's and create 3D pictures. I basically have 2 camera's on a rig, that I use to make pictures. Automatically I attempt to align the images in such a way that you get a 3D SBS image.
They are high resolution images, which means a lot of pixels to process. Because I'm not really patient with computers, I want things to go fast.
Originally I've worked with code based on image stitching and feature extraction. In practice I found these algorithms to be too inaccurate and too slow. The main reason is that you have different levels of depth here, so you cannot do a 1-on-1 match of features. Most of the code already works fine, including vertical alignment.
For this question, you can assume that different ISO exposion levels / color correction and vertical alignment of the images are both taken care of.
What is still missing is a good algorithm for correcting the angle of the pictures. I noticed that left-right pictures usually vary a small number of degrees (think +/- 1.2 degrees difference) in angle, which is enough to get a slight headache. As a human you can easily spot this by looking at sharp differences in color and lining them up.
The irony here is that you spot it immediately as a human if it's correct or not, but somehow I'm not able to learn this to a machine. :-)
I've experimented with edge detectors, Hough transform and a large variety of home-brew algorithms, but so far found all of them to be both too slow and too inaccurate for my purposes. I've also attempted to iteratively aligning vertically while changing the angles slightly, so far without any luck.
Please note: Accuracy is perhaps more important than speed here.
I've added an example image here. It's actually both a left and right eye, alpha-blended. If you look closely, you can see the lamb at the top having two ellipses, and you can see how the chairs don't exactly line up at the top. It might seem negliable, but on a full screen resolution while using a beamer, you will easily see the difference. This also shows the level of accuracy that is required; it's quite a lot.
The shift in 'x' direction will give the 3D effect. Basically, if the shift is 0, it's on the screen, if it's <0 it's behind the screen and if it's >0 it's in front of the screen. This also makes matching harder, since you're not looking for a 'stitch'.
Basically the two camera's 'look' in the same direction (perpendicular as in the second picture here: http://www.triplespark.net/render/stereo/create.html ).
The difference originates from the camera being on a slightly different angle. This means the rotation is uniform throughout the picture.
I have once used the following amateur approach.
Assume that the second image has a rotation + vertical shift mismatch. This means that we need to apply some transform for the second image which can be expressed in matrix form as
x' = a*x + b*y + c
y' = d*x + e*y + f
that is, every pixel that has coordinates (x,y) on the second image, should be moved to a position (x',y') to compensate for this rotation and vertical shift.
We have a strict requirement that a=e, b=-d and d*d+e*e=1 so that it is indeed rotation+shift, no zoom or slanting etc. Also this notation allows for horizontal shift too, but this is easy to fix after angle+vertical shift correction.
Now select several common features on both images (I did selection by hand, as just 5-10 seemed enough, you can try to apply some automatic feature detection mechanism). Assume i-th feature has coordinates (x1[i], y1[i]) on first image and (x2[i], y2[i]) on the second. We expect that after out transformation the features have as equal as possible y-coordinates, that is we want (ideally)
y1[i]=y2'[i]=d*x2[i]+e*y2[i]+f
Having enough (>=3) features, we can determine d, e and f from this requirement. In fact, if you have more than 3 features, you will most probably not be able to find common d, e and f for them, but you can apply least-square method to find d, e and f that make y2' as close to y1 as possible. You can also account for the requirement that d*d+e*e=1 while finding d, e and f, though as far as i remember, I got acceptable results even not accounting for this.
After you have determined d, e and f, you have the requirement a=e and b=-d. This leaves only c unknown, which is horizontal shift. If you know what the horizontal shift should be, you can find c from there. I used the background (clouds on a landscape, for example) to get c.
When you know all the parameters, you can do one pass on the image and correct it. You might also want to apply some anti-aliasing, but that's a different question.
Note also that you can in a similar way introduce quadratic correction to the formulas to account for additional distortions the camera usually has.
However, that's just a simple algorithm I came up with when I faced the same problem some time ago. I did not do much research, so I'll be glad to know if there is a better or well-established approach or even a ready software.
Related
I have four cameras each feeding me a different portion of a basketball court. Due to the slight offset of the cameras physical locations and lens distortion around the edges of the camera, I cannot simply stitch the videos together without some kind of correction.
I've looked into ffmpeg's perspective filter, as well as the lenscorrection filter. In the former case it was only able to create a trapezoid, not the curved image I want. In the latter case using negative values to k1 and k2 seemed to be heading in the right direction, but it either disorted the top and bottom of the image to the point of being nonsensical noise, or it zoomed in to the image so much that I lost important details.
For the sample picture below, ultimately I want the midcourt line (the blue vertical line on the right side) to be vertical, and I want the mess of wires on the white desk at the bottom to remain visible and identifiable.
Given a video which looks like the following:
I wish to produce something like the following:
This image was made using the "Curve Bend" filter in GIMP, but I just eye-balled it - so it's not perfect. Ideally once I get the exact parameters the midcourt line will be perfectly vertical
When using the lenscorrection filter, no values for k1 and k2 seemed to get the effect I want:
Negative k1, negative k2:
Negative k1, positive k2:
Positive k1, negative k2:
Positive k1, positive k2:
In general:
negative / negative distorted the image beyond recognition
negative / positive looked alright, but the midcourt line was off the screen and it wasn't clear if any distortion had been applied
positive / negative looked the best, but while the top and bottom curved in the middle of the left and right actually bulged out, leaving the midcourt line distorted
positive / positive was the opposite of the desired effect
I wrote a post about this subject. Strangely enough I was also trying to undistort video of a basketball court.
There's a few options:
Try to find a standard projection (e.g. fisheye, stereographic, etc) that roughly matches the projection that your camera produces, and look up or measure the field of view of your camera. At that point you can use the new v360 filter to correct the image to a rectilinear projection (which is the one where straight lines in real life remain straight in the image).
Either find a database entry in lensfun for your camera, or create one (there are instructions in their documentation), or send pictures and ask the maintainers to do it. Then you can use the lensfun filter to accurately correct the distortion.
Lensfun is probably the best option if you want it to be accurate, but depending on your camera you might find v360 produces good enough results, and its significantly faster.
Short answer: No. FFMPEG does not have a curve bend function. That said, curve bend is not the proper solution, anyway. A lens correction is necessary, the parameters supplied were just way off.
Ultimately I just wrote a script to dump thousands of images using lensfun with different lenses, then skimmed them for one that looked good
I have a problem finding defects at the edge of a circular object. It's hard to describe so I have a picture which may help a bit. I am trying to find the red marked areas, such as what is shown below:
I already tried matching with templates vision.TemplateMatcher(), but this only works well for the picture I made the template of.
I tried to match it with vision.CascadeObjectDetector() and I trained it with 150 images. I found only < 5% correct results with this.
I also tried matching with detectSURFFeatures() and then matchFeatures(), but this only works on quite similar defects (when the edges are not closed it fails).
Since the defects are close to the half of a circle, I tried to find it with imfindcircles(), but there I find so many possible results. When I take the one with the highest metric sometimes I get the right one but not even close to 30%.
Do any of you have an idea what I can try to find at least more than 50%?
If someone has an idea and wants to try something I added another picture.
Since I am new I can only add two pictures but if you need more I can provide more pictures.
Are you going to detect rough edges like that on smooth binary overlays as you provided before? For eg. are you making a program whose input consists of getting a black image with lots of circles with rough edges which its then supposed to detect? i.e. sudden rough discontinuities in a normally very smooth region.
If the above position is valid, then this may be solved via classical signal processing. My opinion, plot a graph of the intensity on a line between any two points outside and inside the circle. It should look like
.. continuous constant ... continuous constant .. continuous constant.. DISCONTINUOUS VARYING!! DISCONTINUOUS VARYING!! DISCONTINUOUS VARYING!! ... continuous constant .. continuous constant..
Write your own function to detect these discontinuities.
OR
Gradient: The rate of change of certain quantities w.r.t a distance measure.
Use the very famous Sobel (gradient) filter.
Use the X axis version of the filter, See result, if gives you something detectable use it, do same for Y axis version of filter.
In case you're wondering, if you're using Matlab then you just need to get a readily available and highly mentioned 3x3 matrix (seen almost everywhere on the internet ) and plug it into the imfilter function, or use the in-built implementation (edge(image,'sobel')) (if you have the required toolbox).
I am trying to count the number of hairs transplanted in the following image. So practically, I have to count the number of spots I can find in the center of image.
(I've uploaded the inverted image of a bald scalp on which new hairs have been transplanted because the original image is bloody and absolutely disgusting! To see the original non-inverted image click here. To see the larger version of the inverted image just click on it). Is there any known image processing algorithm to detect these spots? I've found out that the Circle Hough Transform algorithm can be used to find circles in an image, I'm not sure if it's the best algorithm that can be applied to find the small spots in the following image though.
P.S. According to one of the answers, I tried to extract the spots using ImageJ, but the outcome was not satisfactory enough:
I opened the original non-inverted image (Warning! it's bloody and disgusting to see!).
Splited the channels (Image > Color > Split Channels). And selected the blue channel to continue with.
Applied Closing filter (Plugins > Fast Morphology > Morphological Filters) with these values: Operation: Closing, Element: Square, Radius: 2px
Applied White Top Hat filter (Plugins > Fast Morphology > Morphological Filters) with these values: Operation: White Top Hat, Element: Square, Radius: 17px
However I don't know what to do exactly after this step to count the transplanted spots as accurately as possible. I tried to use (Process > Find Maxima), but the result does not seem accurate enough to me (with these settings: Noise tolerance: 10, Output: Single Points, Excluding Edge Maxima, Light Background):
As you can see, some white spots have been ignored and some white areas which are not actually hair transplant spots, have been marked.
What set of filters do you advise to accurately find the spots? Using ImageJ seems a good option since it provides most of the filters we need. Feel free however, to advise what to do using other tools, libraries (like OpenCV), etc. Any help would be highly appreciated!
I do think you are trying to solve the problem in a bit wrong way. It might sound groundless, so I'd better show my results first.
Below I have a crop of you image on the left and discovered transplants on the right. Green color is used to highlight areas with more than one transplant.
The overall approach is very basic (will describe it later), but still it provides close to be accurate results. Please note, it was a first try, so there is a lot of room for enhancements.
Anyway, let's get back to the initial statement saying you approach is wrong. There are several major issues:
the quality of your image is awful
you say you want to find spots, but actually you are looking for hair transplant objects
you completely ignores the fact average head is far from being flat
it does look like you think filters will add some important details to your initial image
you expect algorithms to do magic for you
Let's review all these items one by one.
1. Image quality
It might be very obvious statement, but before the actual processing you need to make sure you have best possible initial data. You might spend weeks trying to find a way to process photos you have without any significant achievements. Here are some problematic areas:
I bet it is hard for you to "read" those crops, despite the fact you have the most advanced object recognition algorithms in your brain.
Also, your time is expensive and you still need best possible accuracy and stability. So, for any reasonable price try to get: proper contrast, sharp edges, better colors and color separation.
2. Better understanding of the objects to be identified
Generally speaking, you have a 3D objects to be identified. So you can analyze shadows in order to improve accuracy. BTW, it is almost like a Mars surface analysis :)
3. The form of the head should not be ignored
Because of the form of the head you have distortions. Again, in order to get proper accuracy those distortions should be corrected before the actual analysis. Basically, you need to flatten analyzed area.
3D model source
4. Filters might not help
Filters do not add information, but they can easily remove some important details. You've mentioned Hough transform, so here is interesting question: Find lines in shape
I will use this question as an example. Basically, you need to extract a geometry from a given picture. Lines in shape looks a bit complex, so you might decide to use skeletonization
All of a sadden, you have more complex geometry to deal with and virtually no chances to understand what actually was on the original picture.
5. Sorry, no magic here
Please be aware of the following:
You must try to get better data in order to achieve better accuracy and stability. The model itself is also very important.
Results explained
As I said, my approach is very simple: image was posterized and then I used very basic algorithm to identify areas with a specific color.
Posterization can be done in a more clever way, areas detection can be improved, etc. For this PoC I just have a simple rule to highlight areas with more than one implant. Having areas identified a bit more advanced analysis can be performed.
Anyway, better image quality will let you use even simple method and get proper results.
Finally
How did the clinic manage to get Yondu as client? :)
Update (tools and techniques)
Posterization - GIMP (default settings,min colors)
Transplant identification and visualization - Java program, no libraries or other dependencies
Having areas identified it is easy to find average size, then compare to other areas and mark significantly bigger areas as multiple transplants.
Basically, everything is done "by hand". Horizontal and vertical scan, intersections give areas. Vertical lines are sorted and used to restore the actual shape. Solution is homegrown, code is a bit ugly, so do not want to share it, sorry.
The idea is pretty obvious and well explained (at least I think so). Here is an additional example with different scan step used:
Yet another update
A small piece of code, developed to verify a very basic idea, evolved a bit, so now it can handle 4K video segmentation in real-time. The idea is the same: horizontal and vertical scans, areas defined by intersected lines, etc. Still no external libraries, just a lot of fun and a bit more optimized code.
Additional examples can be found on YouTube: RobotsCanSee
or follow the progress in Telegram: RobotsCanSee
I've just tested this solution using ImageJ, and it gave good preliminary result:
On the original image, for each channel
Small (radius 1 or 2) closing in order to get rid of the hairs (black part in the middle of the white one)
White top-hat of radius 5 in order to detect the white part around each black hair.
Small closing/opening in order to clean a little bit the image (you can also use a median filter)
Ultimate erode in order to count the number of white blob remaining. You can also certainly use a LoG (Laplacian of Gaussian) or a distance map.
[EDIT]
You don't detect all the white spots using the maxima function, because after the closing, some zones are flat, so the maxima is not a point, but a zone. At this point, I think that an ultimate opening or an ultimate eroded would give you the center or each white spot. But I am not sure that there is a function/pluggin doing it in ImageJ. You can take a look to Mamba or SMIL.
A H-maxima (after white top-hat) may also clean a little bit more your results and improve the contrast between the white spots.
As Renat mentioned, you should not expect algorithms to do magic for you, however I'm hopeful to come up with a reasonable estimate of the number of spots. Here, I'm going to give you some hints and resources, check them out and call me back if you need more information.
First, I'm kind of hopeful to morphological operations, but I think a perfect pre-processing step may push the accuracy yielded by them dramatically. I want you put my finger on the pre-processing step. Thus I'm going ti work with this image:
That's the idea:
Collect and concentrate the mass around the spot locations. What do I mean my concentrating the masses? Let's open the book from the other side: As you see, the provided image contains some salient spots surrounded by some noisy gray-level dots.
By dots, I mean the pixels that are not part of a spot, but their gray-value are larger than zero (pure black) - which are available around the spots. It is clear that if you clear these noisy dots, you surely will come up with a good estimate of spots using other processing tools such as morphological operations.
Now, how to make the image more sharp? What if we could make the dots to move forward to their nearest spots? This is what I mean by concentrating the masses over the spots. Doing so, only the prominent spots will be present in the image and hence we have made a significant step toward counting the prominent spots.
How to do the concentrating thing? Well, the idea that I just explained is available in this paper, which its code is luckily available. See the section 2.2. The main idea is to use a random walker to walk on the image for ever. The formulations is stated such that the walker will visit the prominent spots far more times and that can lead to identifying the prominent spots. The algorithm is modeled Markov chain and The equilibrium hitting times of the ergodic Markov chain holds the key for identifying the most salient spots.
What I described above is just a hint and you should read that short paper to get the detailed version of the idea. Let me know if you need more info or resources.
That is a pleasure to think on such interesting problems. Hope it helps.
You could do the following:
Threshold the image using cv::threshold
Find connected components using cv::findcontour
Reject the connected components of size larger than a certain size as you seem to be concerned about small circular regions only.
Count all the valid connected components.
Hopefully, you have a descent approximation of the actual number of spots.
To be statistically more accurate, you could repeat 1-4 for a range of thresholds and take the average.
This is what you get after applying unsharpen radius 22, amount 5, threshold 2 to your image.
This increases the contrast between the dots and the surrounding areas. I used the ballpark assumption that the dots are somewhere between 18 and 25 pixels in diameter.
Now you can take the local maxima of white as a "dot" and fill it in with a black circle until the circular neighborhood of the dot (a circle of radius 10-12) erases the dot. This should let you "pick off" the dots joined to each other in clusters more than 2. Then look for local maxima again. Rinse and repeat.
The actual "dot" areas are in stark contrast to the surrounding areas, so this should let you pick them off as well as you would by eyeballing it.
I am thinking of implement a image processing based solution for industrial problem.
The image is consists of a Red rectangle. Inside that I will see a matrix of circles. The requirement is to count the number of circles under following constraints. (Real application : Count the number of bottles in a bottle casing. Any missing bottles???)
The time taken for the operation should be very low.
I need to detect the red rectangle as well. My objective is to count the
items in package and there are no
mechanism (sensors) to trigger the
camera. So camera will need to capture
the photos continuously but the
program should have a way to discard
the unnecessary images.
Processing should be realtime.
There may be a "noise" in image capturing. You may see ovals instead of circles.
My questions are as follows,
What is the best edge detection algorithm that matches with the given
scenario?
Are there any other mechanisms that I can use other than the edge
detection?
Is there a big impact between the language I use and the performance of
the system?
AHH - YOU HAVE NOW TOLD US THE BOTTLES ARE IN FIXED LOCATIONS!
IT IS AN INCREDIBLY EASIER PROBLEM.
All you have to do is look at each of the 12 spots and see if there is a black area there or not. Nothing could be easier.
You do not have to do any edge or shape detection AT ALL.
It's that easy.
You then pointed out that the box might be rotatated, things could be jiggled. That the box might be rotated a little (or even a lot, 0 to 360 each time) is very easily dealt with. The fact that the bottles are in "slots" (even if jiggled) massively changes the nature of the problem. You're main problem (which is easy) is waiting until each new red square (crate) is centered under the camera. I just realised you meant "matrix" literally and specifically in the sentence in your original questions. That changes everything totally, compared to finding a disordered jumble of circles. Finding whether or not a blob is "on" at one of 12 points, is a wildly different problem to "identifying circles in an image". Perhaps you could post an image to wrap up the question.
Finally I believe Kenny below has identified the best solution: blob analysis.
"Count the number of bottles in a bottle casing"...
Do the individual bottles sit in "slots"? ie, there are 4x3 = 12 holes, one for each bottle.
In other words, you "only" have to determine if there is, or is not, a bottle in each of the 12 holes.
Is that correct?
If so, your problem is incredibly easier than the more general problem of a pile of bottles "anywhere".
Quite simply, where do we see the bottles from? The top, sides, bottom, or? Do we always see the tops/bottoms, or are they mixed (ie, packed top-to-tail). These issues make huge, huge differences.
Surf/Sift = overkill in this case you certainly don't need it.
If you want real time speed (about 20fps+ on a 800x600 image) I recommend using Cuda to implement edge detection using a standard filter scheme like sobel, then implement binarization + image closure to make sure the edges of circles are not segmented apart.
The hardest part will be fitting circles. This is assuming you already got to the step where you have taken edges and made sure they are connected using image closure (morphology.) At this point I would proceed as follows:
run blob analysis/connected components to segment out circles that do not touch. If circles can touch the next step will be trickier
for each connected componet/blob fit a circle or rectangle using RANSAC which can run in realtime (as opposed to Hough Transform which I believe is very hard to run in real time.)
Step 2 will be much harder if you can not segment the connected components that form circles seperately, so some additional thought should be invested on how to guarantee that condition.
Good luck.
Edit
Having thought about it some more, I feel like RANSAC is ideal for the case where the circle connected components do touch. RANSAC should hypothetically fit the circle to only a part of the connected component (due to its ability to perform well in the case of mostly outlier points.) This means that you could add an extra check to see if the fitted circle encompasses the entire connected component and if it does not then rerun RANSAC on the portion of the connected component that was left out. Rinse and repeat as many times as necessary.
Also I realize that I say circle but you could just as easily fit an ellipse instead of circles using RANSAC.
Also, I'd like to comment that when I say CUDA is a good choice I mean CUDA is a good choice to implement the sobel filter + binirization + image closing on. Connected components and RANSAC are probably best left to the CPU, but you can try pushing them onto CUDA though I don't know how much of an advantage a GPU will give you for those 2 over a CPU.
For the circles, try the Hough transform.
other mechanisms: dunno
Compiled languages will possibly be faster.
SIFT should have a very good response to circular objects - it is patented, though. GLOHis a similar algorithm, but I do not know if there are any implementations readily available.
Actually, doing some more research, SURF is an improved version of SIFT with quite a few implementations available, check out the links on the wikipedia page.
Sum of colors + convex hull to detect boundary. You need, mostly, 4 corners of a rectangle, and not it's sides?
No motion, no second camera, a little choice - lot of math methods against a little input (color histograms, color distribution matrix). Dunno.
Java == high memory consumption, Lisp == high brain consumption, C++ == memory/cpu/speed/brain use optimum.
If the contrast is good, blob analysis is the algorithm for the job.
I'm trying to write a simple tracking routine to track some points on a movie.
Essentially I have a series of 100-frames-long movies, showing some bright spots on dark background.
I have ~100-150 spots per frame, and they move over the course of the movie. I would like to track them, so I'm looking for some efficient (but possibly not overkilling to implement) routine to do that.
A few more infos:
the spots are a few (es. 5x5) pixels in size
the movement are not big. A spot generally does not move more than 5-10 pixels from its original position. The movements are generally smooth.
the "shape" of these spots is generally fixed, they don't grow or shrink BUT they become less bright as the movie progresses.
the spots don't move in a particular direction. They can move right and then left and then right again
the user will select a region around each spot and then this region will be tracked, so I do not need to automatically find the points.
As the videos are b/w, I though I should rely on brigthness. For instance I thought I could move around the region and calculate the correlation of the region's area in the previous frame with that in the various positions in the next frame. I understand that this is a quite naïve solution, but do you think it may work? Does anyone know specific algorithms that do this? It doesn't need to be superfast, as long as it is accurate I'm happy.
Thank you
nico
Sounds like a job for Blob detection to me.
I would suggest the Pearson's product. Having a model (which could be any template image), you can measure the correlation of the template with any section of the frame.
The result is a probability factor which determine the correlation of the samples with the template one. It is especially applicable to 2D cases.
It has the advantage to be independent from the sample absolute value, since the result is dependent on the covariance related with the mean of the samples.
Once you detect an high probability, you can track the successive frames in the neightboor of the original position, and select the best correlation factor.
However, the size and the rotation of the template matter, but this is not the case as I can understand. You can customize the detection with any shape since the template image could represent any configuration.
Here is a single pass algorithm implementation , that I've used and works correctly.
This has got to be a well reasearched topic and I suspect there won't be any 100% accurate solution.
Some links which might be of use:
Learning patterns of activity using real-time tracking. A paper by two guys from MIT.
Kalman Filter. Especially the Computer Vision part.
Motion Tracker. A student project, which also has code and sample videos I believe.
Of course, this might be overkill for you, but hope it helps giving you other leads.
Simple is good. I'd start doing something like:
1) over a small rectangle, that surrounds a spot:
2) apply a weighted average of all the pixel coordinates in the area
3) call the averaged X and Y values the objects position
4) while scanning these pixels, do something to approximate the bounding box size
5) repeat next frame with a slightly enlarged bounding box so you don't clip spot that moves
The weight for the average should go to zero for pixels below some threshold. Number 4 can be as simple as tracking the min/max position of anything brighter than the same threshold.
This will of course have issues with spots that overlap or cross paths. But for some reason I keep thinking you're tracking stars with some unknown camera motion, in which case this should be fine.
I'm afraid that blob tracking is not simple, not if you want to do it well.
Start with blob detection as genpfault says.
Now you have spots on every frame and you need to link them up. If the blobs are moving independently, you can use some sort of correspondence algorithm to link them up. See for instance http://server.cs.ucf.edu/~vision/papers/01359751.pdf.
Now you may have collisions. You can use mixture of gaussians to try to separate them, give up and let the tracks cross, use any other before-and-after information to resolve the collisions (e.g. if A and B collide and A is brighter before and will be brighter after, you can keep track of A; if A and B move along predictable trajectories, you can use that also).
Or you can collaborate with a lab that does this sort of stuff all the time.