I want to implement a barcode for one of my mobile project requirements. The amount of data to be stored is very small (< 25 alphanumeric characters). I want to know whether it's wiser to implement a 1D barcode or a 2D barcode (a QR code in particular) for this project. I would be really glad if someone could educate me on the following aspects from a 1D vs. 2D perspective:
scanning speed
size (the minimum display size needed for the mobile camera to recognize the code -- this is more crucial)
accuracy
Consider this from a typical processing and SDK perspective (ZXing preferably).
I'd go with a QR code, particularly if you're planning on using a phone camera. QR codes have features (finder patterns) that make things like perspective correction easier and more reliable. They also have error correction (ECC) that eliminates false positives and corrects varying amounts of bit-detection errors. If you look at the ZXing test suite, you'll find a number of 1D false-positive cases, since many 1D codes don't have so much as a checksum.
Speed's probably not an issue in either case if you know what you're trying to scan. The biggest computational cost in ZXing is going through all possible codes when you don't know what you're looking for. If you know the code type, the speed is not likely to be significantly different.
The only thing about size is the number of pixels that have to be captured. In other words, a small code can be read if you hold the camera close to the code; a large code can be read from further away. All this is subject to lighting conditions, camera focus (or lack thereof), and camera brightness adjustment. I can't see how any of these would differ between 1D and 2D though.
Related
Without implementing openCV or calling QR code's recognition API, is there any quick and reliable algorithm to determine the existence of a QR code in an image?
The intention of this question is to improve the user experience of scanning QR code. When QR code's recognition fails, the program needs to know whether there really exists a QR code for it to scan and recognize QR code again or there is not any QR code so that the program can call other procedures.
To echo some responses, the detection program doesn't need to be 100% accurate, just return an accurate result with reasonable probability. If we could use OpenCV here, a Fourier transform would be easy to implement to detect whether there is an obvious high-frequency component in an image, which is a good sign that a QR code is present. But integrating OpenCV would largely increase the size of my program, which I want to avoid.
It's great that you want to provide feedback to a user. Providing graphics that indicate the user is "getting warmer" in finding the QR code can make the process of finding and reading a code quicker and smoother.
It looks like you already have your answer, but to provide a more robust solution and/or have options, you might try one or more of the following:
Use N iterations to morphologically close dark pixels; the resulting squarish checkerboard pattern should more closely resemble a filled square. This was part of a detection method I used to determine whether a DataMatrix (a similar 2D code) was present, whether it was readable or not. Whether this works will depend greatly on your background.
Before applying the FFT, consider finding the affine transform to reduce perspective distortion. Analyzing FFT data can be a pain if the frequencies have a bit of spread because of foreshortening.
You could get some decent results using texture measures such as Local Binary Patterns (LBPs) or older techniques such as Laws' texture measures. You might even get lucky and be able to detect slight differences in the histogram of texture measures between a 2D code and a checkerboard pattern.
In regions of checkerboard-like patterns, look for the three finder patterns at the corners of the QR code. You could try SIFT/SURF-like methods, or perhaps implement a simpler matching method using a limited number of correlation templates tested in scale space.
Speaking of scale space: generate an image pyramid to save yourself the trouble of searching for squares in full-resolution images. You could try edge-preserving or non-edge-preserving methods to generate the smaller images in the pyramid, or perhaps a combination of both.
If you have code for fast kernel processing, you might try a corner detection method to reduce the amount of data you process to detect checkerboard-like patterns.
Look for clear bimodal distributions of grayscale values in squarish regions. 2D codes printed on paper labels tend to have stark contrast, even though they remain quite readable at low contrast.
Rather than look for bimodal distribution of grayscale values, you could look for regions where gradient magnitudes are very consistent, nearly unimodal.
If you know the min/max area limits of a readable QR code, you could probabilistically sample the image for patches that match one or more of the above criteria: one mode of gradient magnitudes, nearly evenly spaced corner points, etc. If a patch does look promising, then jump to another random position, with the caveat that the new patch was not previously found unpromising.
If you have the memory for an image pyramid, then working with reduced resolution images could be advantageous since you could try a number of tests fairly quickly.
As far as user interaction is concerned, you might also update the "this might be a QR code" graphic multiple times during pre-processing, and indicate degrees of confidence with progressively stronger/greener graphics (or whatever color is appropriate for the local culture). For example, if a patch of texture has a roughly 60% chance of being a QR code, you might display a thin yellowish-green rectangle with a dashed border. For an 80% - 90% likelihood you might display a solid rectangle of a more saturated green color. If you can update the graphics about every 100 - 200 milliseconds then a user will have some idea that some action such as moving the smart phone is helping or hurting.
1) Convert the image to grayscale.
2) Divide the image into cells of n x m, say 3 x 3. This procedure intends to guarantee that at least one cell will be fully covered by a possible QR code, if there is one.
3) Apply a 2D Fourier transform to each cell. If any cell has a significantly large value in the high-frequency area along both the X and Y axes, there is a high likelihood that a QR code exists.
I am addressing a probability issue rather than 100% accurate detection. In this algorithm, a chessboard will be detected as a QR code as well.
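The three steps above can be sketched with plain numpy (the 3x3 grid, the low-frequency radius, and the 0.5 threshold are illustrative assumptions here, not tuned values):

```python
import numpy as np

def high_freq_ratio(cell):
    """Fraction of spectral energy outside the low-frequency disc."""
    f = np.fft.fftshift(np.fft.fft2(cell - cell.mean()))
    power = np.abs(f) ** 2
    total = power.sum()
    if total < 1e-6:          # flat cell: no texture at all
        return 0.0
    h, w = cell.shape
    yy, xx = np.indices((h, w))
    r = np.hypot(yy - h // 2, xx - w // 2)
    low = power[r < min(h, w) / 8].sum()
    return 1.0 - low / total

def likely_qr(gray, n=3, m=3, thresh=0.5):
    """Steps 2) and 3): test each of the n x m cells for strong high frequencies."""
    h, w = gray.shape
    for i in range(n):
        for j in range(m):
            cell = gray[i * h // n:(i + 1) * h // n, j * w // m:(j + 1) * w // m]
            if high_freq_ratio(cell) > thresh:
                return True
    return False
```

As the answer notes, a chessboard scores just as high as a QR code; this only gates whether a full decode attempt is worth making.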
I have a web cam that takes a picture every N seconds. This gives me a collection of images of the same scene over time. I want to process that collection of images as they are created to identify events like someone entering into the frame, or something else large happening. I will be comparing images that are adjacent in time and fixed in space - the same scene at different moments of time.
I want a reasonably sophisticated approach. For example, naive approaches fail for outdoor applications. If you count the number of pixels that change, for example, or the percentage of the picture that has a different color or grayscale value, that will give false positive reports every time the sun goes behind a cloud or the wind shakes a tree.
I want to be able to positively detect a truck parking in the scene, for example, while ignoring lighting changes from sun/cloud transitions, etc.
I've done a number of searches, and found a few survey papers (Radke et al, for example) but nothing that actually gives algorithms that I can put into a program I can write.
Use color spectrum analysis, without luminance: when the Sun goes down for a while, you will get a similar result, since the colors do not change (too much).
Don't look for big changes, but quick changes. If the luminance of the image changes by -10% over 10 minutes, that's the usual evening effect. But when the change is -5%, 0, +5% within seconds, it's a quick change.
Don't forget to adjust the reference values.
Split the image into smaller regions. Then, when all the regions change the same way, you know it's a global change, like an eclipse, but if only one region's parameters are changing, then something is happening there.
Use masks to create smart regions. If you're watching a street, filter out the sky, the trees (blown by wind), etc. You may set up different trigger values for different regions. The regions should overlap.
A special case of a region is the line. A line (a narrow region) contains fewer and more homogeneous pixels than a flat area. Mark, say, a green fence: it's easy to detect whether someone crosses it, since a crossing makes a bigger change in the line than in a flat area.
If you can, change the IRL world. Repaint the fence in a strange color to create a color spectrum which can be identified more easily. Paint tags on the floor and wall which can be OCRed by the program, so you can detect whether something hides them.
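The region-splitting tip above can be sketched in numpy (the 4x4 grid and the thresholds are arbitrary assumptions; a real setup would tune them per region, as suggested):

```python
import numpy as np

def region_changes(prev, curr, n=4, m=4):
    """Mean absolute intensity change per region of an n x m grid."""
    h, w = prev.shape
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            sl = (slice(i * h // n, (i + 1) * h // n),
                  slice(j * w // m, (j + 1) * w // m))
            out[i, j] = np.abs(curr[sl].astype(float) - prev[sl].astype(float)).mean()
    return out

def classify_change(prev, curr, local_thresh=20.0):
    """Global change if all regions move together; local event otherwise."""
    d = region_changes(prev, curr)
    if d.max() < local_thresh:
        return "no change"
    if d.min() > 0.5 * d.max():   # every region changed by a similar amount
        return "global change"
    return "local event"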
I believe you are looking for template matching. I'd also suggest you look into OpenCV.
We had to contend with many of these issues in our interactive installations. It's tough to not get false positives without being able to control some of your environment (sounds like you will have some degree of control). In the end we looked at combining some techniques and we created an open piece of software named OpenTSPS (Open Toolkit for Sensing People in Spaces - http://www.opentsps.com). You can look at the C++ source in github (https://github.com/labatrockwell/openTSPS/).
We use ‘progressive background relearning’ to adjust to the changing background over time. Progressive relearning is particularly useful in variable lighting conditions – e.g. if lighting in a space changes from day to night. This in combination with blob detection works pretty well, and the only way we have found to improve on it is to use 3D cameras like the Kinect, which cast out IR and measure it.
There are other algorithms that might be relevant, like SURF (http://achuwilson.wordpress.com/2011/08/05/object-detection-using-surf-in-opencv-part-1/ and http://en.wikipedia.org/wiki/SURF) but I don't think it will help in your situation unless you know exactly the type of thing you are looking for in the image.
Sounds like a fun project. Best of luck.
The problem you are trying to solve is very interesting indeed!
I think that you would need to attack it in parts:
As you already pointed out, a sudden change in illumination can be problematic. This is an indicator that you probably need to achieve some sort of illumination-invariant representation of the images you are trying to analyze.
There are plenty of techniques lying around, one I have found very useful for illumination invariance (applied to face recognition) is DoG filtering (Difference of Gaussians)
The idea is that you first convert the image to grayscale. Then you generate two blurred versions of this image by applying a Gaussian filter, one a little blurrier than the other (you could use sigmas of 1.0 and 2.0 in the Gaussian filter, respectively). Then you subtract the pixel intensities of the more-blurry image from the less-blurry image. This operation enhances edges and produces a similar image regardless of strong illumination intensity variations. These steps can be very easily performed using OpenCV (as others have stated). This technique has been applied and documented here.
This paper adds an extra step involving contrast equalization. In my experience this is only needed if you want to obtain "visible" images from the DoG operation (pixel values tend to be very low after the DoG filter and are viewed as black rectangles onscreen), and performing a histogram equalization is an acceptable substitute if you want to see the effect of the DoG filter.
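A minimal numpy sketch of the DoG step (the 1.0/2.0 sigmas follow the answer above; the separable blur helper is hand-rolled here rather than taken from OpenCV):

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at roughly 3 sigma."""
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur: two passes of a 1-D convolution."""
    k = gaussian_kernel(sigma)
    tmp = np.apply_along_axis(np.convolve, 1, img, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, tmp, k, mode="same")

def dog(gray, sigma1=1.0, sigma2=2.0):
    """Difference of Gaussians: subtract the blurrier image from the less blurry one."""
    g = gray.astype(float)
    return blur(g, sigma1) - blur(g, sigma2)
```

On a flat region the result is near zero; edges survive regardless of overall brightness, which is the illumination-invariance property being described.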
Once you have illumination-invariant images you could focus on the detection part. If your problem can afford having a static camera that can be trained for a certain amount of time, then you could use a strategy similar to alarm motion detectors. Most of them work with an average thermal image - basically they record the average temperature of the "pixels" of a room view, and trigger an alarm when the heat signature varies greatly from one "frame" to the next. Here you wouldn't be working with temperatures, but with average, light-normalized pixel values. This would allow you to build up over time which areas of the image tend to have movement (e.g. the leaves of a tree in a windy environment), and which areas are fairly stable in the image. Then you could trigger an alarm when a large number of pixels already flagged as stable show a strong variation from one frame to the next.
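The light-normalized running-average idea might look like this (the learning rate and threshold are made-up values; a real system would also track per-pixel variance to learn which areas are "stable"):

```python
import numpy as np

class RunningBackground:
    """Exponential running average of the scene; flags pixels that deviate."""
    def __init__(self, alpha=0.05, thresh=25.0):
        self.alpha = alpha      # how quickly the background model adapts
        self.thresh = thresh    # intensity deviation that counts as change
        self.mean = None

    def update(self, frame):
        f = frame.astype(float)
        if self.mean is None:   # first frame seeds the model
            self.mean = f.copy()
            return np.zeros(f.shape, dtype=bool)
        mask = np.abs(f - self.mean) > self.thresh
        self.mean = (1.0 - self.alpha) * self.mean + self.alpha * f
        return mask
```

Slow drifts (evening light) get absorbed into the mean, while a sudden bright object produces a contiguous region of flagged pixels.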
If you can't afford training your camera view, then I would suggest you take a look at the TLD tracker of Zdenek Kalal. His research is focused on object tracking with a single frame as training. You could probably use the semistatic view of the camera (with no foreign objects present) as a starting point for the tracker and flag a detection when the TLD tracker (a grid of points where local motion flow is estimated using the Lucas-Kanade algorithm) fails to track a large amount of gridpoints from one frame to the next. This scenario would probably allow even a panning camera to work as the algorithm is very resilient to motion disturbances.
Hope these pointers are of some help. Good luck and enjoy the journey! =D
Use one of the standard measures, like mean squared error (MSE), to find the difference between two consecutive images. If the MSE is beyond a certain threshold, you know that there is some motion.
Also read about Motion Estimation.
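The MSE check is a one-liner (the threshold is scene-dependent and would need tuning):

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two equally sized frames."""
    d = a.astype(float) - b.astype(float)
    return float(np.mean(d * d))

def motion_detected(prev, curr, thresh=100.0):
    return mse(prev, curr) > thresh
```

As the question points out, this naive global measure also fires on lighting changes, so on its own it is only a first filter.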
If you know that the image will remain relatively static, I would recommend:
1) Look into neural networks. You can use them to learn what defines someone within the image, or what is a non-something in the image.
2) Look into motion detection algorithms; they are used all over the place.
3) Is your camera capable of thermal imaging? If so, it may be worthwhile to look for hotspots in the images. There may be existing algorithms to turn your webcam into a thermal imager.
I'd like to know if there is any good (and freely available) text, on how to obtain motion vectors of macro blocks in raw video stream. This is often used in video compression, although my application of it is not video encoding.
Code that does this is available in OSS codecs, but understanding the method by reading the code is kinda hard.
My actual goal is to determine camera motion in 2D projection space, assuming the camera is only changing its orientation (NOT its position). What I'd like to do is divide the frames into macro blocks, obtain their motion vectors, and get the camera motion by averaging those vectors.
I guess OpenCV could help with this problem, but it's not available on my target platform.
The usual way is simple brute force: Compare a macro block to each macro block from the reference frame and use the one that gives the smallest residual error. The code gets complex primarily because this is usually the slowest part of mv-based compression, so they put a lot of work into optimizing it, often at the expense of anything even approaching readability.
Especially for real-time compression, some reduce the workload a bit by (for example) restricting the search to the original position +/- some maximum delta. This can often gain quite a bit of compression speed in exchange for a fairly small loss of compression.
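A naive version of that restricted brute-force search, using the sum of absolute differences (SAD) as the residual (no optimization at all, unlike real codecs):

```python
import numpy as np

def match_block(ref, block, top, left, delta):
    """Brute-force search: try every offset within +/-delta of (top, left)
    and keep the position with the smallest sum of absolute differences."""
    bh, bw = block.shape
    best, best_err = None, float("inf")
    for dy in range(-delta, delta + 1):
        for dx in range(-delta, delta + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bh > ref.shape[0] or x + bw > ref.shape[1]:
                continue  # candidate window falls outside the reference frame
            err = np.abs(ref[y:y + bh, x:x + bw].astype(float) - block).sum()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err
```

The returned (dy, dx) is the motion vector for that block; averaging these over all blocks gives the rough global camera motion the question asks about.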
If you assume only camera motion, I suspect there is something possible with analysis of the FFT of successive images. For frequencies whose amplitudes have not changed much, the phase information will indicate the camera motion. Not sure if this will help with camera rotation, but lateral and vertical motion can probably be computed. There will be difficulties due to new information appearing on one edge and disappearing on the other and I'm not sure how much that will hurt. This is speculative thinking in response to your question, so I have no proof or references :-)
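The FFT idea for pure translation is essentially phase correlation; a minimal numpy sketch (it only handles whole-pixel circular shifts, and says nothing about rotation):

```python
import numpy as np

def phase_correlation(a, b):
    """Return (dy, dx) such that b is approximately a, circularly shifted by (dy, dx)."""
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    R = np.conj(A) * B
    R /= np.abs(R) + 1e-12          # keep only the phase (cross-power spectrum)
    corr = np.fft.ifft2(R).real     # impulse at the translation offset
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # map wrap-around indices to signed shifts
    if dy > a.shape[0] // 2:
        dy -= a.shape[0]
    if dx > a.shape[1] // 2:
        dx -= a.shape[1]
    return int(dy), int(dx)
```

As the answer speculates, content entering and leaving the frame blurs the peak in practice, but for mostly overlapping frames the maximum still lands on the dominant translation.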
Sounds like you're doing a very limited SLAM project?
Lots of reading matter at Bristol University, Imperial College, Oxford University for example - you might find their approaches to finding and matching candidate features from frame to frame of interest - much more robust than simple sums of absolute differences.
For the most low-level algorithms of this type the term you are looking for is optical flow and one of the easiest algorithms of that class is the Lucas Kanade algorithm.
This is a pretty good overview presentation that should give you plenty of ideas for an algorithm that does what you need
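A single-window Lucas-Kanade sketch in numpy (this solves one least-squares system for a whole patch; real trackers use small windows per feature plus image pyramids to handle larger motion):

```python
import numpy as np

def lucas_kanade(prev, curr):
    """Estimate one (u, v) displacement for the whole patch by least squares
    on the brightness-constancy equation Ix*u + Iy*v + It = 0."""
    p = prev.astype(float)
    c = curr.astype(float)
    Iy, Ix = np.gradient(p)        # np.gradient returns d/drow, then d/dcol
    It = c - p                     # temporal derivative
    A = np.array([[(Ix * Ix).sum(), (Ix * Iy).sum()],
                  [(Ix * Iy).sum(), (Iy * Iy).sum()]])
    rhs = -np.array([(Ix * It).sum(), (Iy * It).sum()])
    u, v = np.linalg.solve(A, rhs)
    return u, v                    # displacement of the content in x and y
```

The 2x2 matrix is only invertible when the patch has gradients in both directions, which is why flow is usually computed at corner-like features.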
After asking here and trying both SURF and SIFT, neither of them seems to be efficient enough to generate interest points fast enough to track a stream from the camera.
SURF, for example, takes around 3 seconds to generate interest points for an image; that's way too slow to track video coming from a web cam, and it'll be even worse when used on a mobile phone.
I just need an algorithm that tracks a certain area, its scale, tilt, etc.. and I can build on top of that.
Thanks
I suspect your SURF usage may need some alteration?
Here is a link to an MIT paper on using SURF for augmented reality applications on mobile devices.
Excerpt:
In this section, we present our implementation of the SURF algorithm and its adaptation to the mobile phone. Next, we discuss the impact that accuracy has on the speed of the nearest-neighbor search and show that we can achieve an order of magnitude speed-up with minimal impact on matching accuracy. Finally, we discuss the details of the phone implementation of the image matching pipeline. We study the performance, memory use, and bandwidth consumption on the phone.
You might also want to look into OpenCV's algorithms because they are tried and tested.
Depending on the constraints of your application, you may be able to reduce the generality of those algorithms to look for known POIs and markers within the image.
Part of tracking a POI is estimating its vector from one point in the 2D image to another, and then optionally confirming that it still exists there (through pixel characteristics). The same approach can be used to track (not re-scan the entire image) for POI and POI group/object perspective and rotation changes.
There are tons of papers online about tracking objects in a 2D projection (up to a severe skew in many cases).
Good Luck!
You should try FAST detector
http://svr-www.eng.cam.ac.uk/~er258/work/fast.html
We are using SURF for a project, and we found OpenSURF to outmatch OpenCV's SURF implementation in raw speed and performance. We still haven't tested repeatability and accuracy, but it is way faster.
Update:
I just wanted to point out that you needn't perform a SURF match step on every frame; you could simply do it every other frame and interpolate the position of the object in the frames you don't run SURF on.
You can use a simpler algorithm if you would make stricter restrictions on the area you would like to be tracked. As you surely know, ARToolKit is pretty fast, but only tracks black and white markers with a very distinct frame.
If you want a (somewhat) general purpose tracker, you may want to check PTAM. The site (http://www.robots.ox.ac.uk/~gk/PTAM/) is currently down, but here's a snazzy video of it working on an iPhone (http://www.youtube.com/watch?v=pBI5HwitBX4)
As others have mentioned, three seconds seems unusually long. While testing the SURF implementation in the Mahotas library, I found that it took on average 0.36sec, even with some fairly large images (e.g. 1024x768). And that's with a mix of Python and C, so I'd imagine some other pure-C implementations would be even faster.
I found this nice comparison of each feature detection algorithms at http://computer-vision-talks.com/2011/01/comparison-of-the-opencvs-feature-detection-algorithms-2/
Have a look. It might be useful!
According to that comparison, and as mirror2image has also suggested, FAST is the best choice. But it depends on what you really want to achieve.
One option I've used in constrained embedded systems is to use a simpler interest point detector: FAST or Shi-Tomasi, for example. I used Shi-Tomasi, as I was targeting an FPGA and could easily run it at pixel rate with no significant buffering required.
Then use SURF to generate the descriptors for the image patch around the identified features and use those for matching and tracking purposes.
I'm sure many people have already seen demos of using genetic algorithms to generate an image that matches a sample image. You start off with noise, and gradually it comes to resemble the target image more and more closely, until you have a more-or-less exact duplicate.
All of the examples I've seen, however, use a fairly straightforward pixel-by-pixel comparison, resulting in a fairly predictable 'fade in' of the final image. What I'm looking for is something more novel: A fitness measure that comes closer to what we see as 'similar' than the naive approach.
I don't have a specific result in mind - I'm just looking for something more 'interesting' than the default. Suggestions?
I assume you're talking about something like Roger Alsing's program.
I implemented a version of this, so I'm also interested in alternative fitness functions, though I'm coming at it from the perspective of improving performance rather than aesthetics. I expect there will always be some element of "fade-in" due to the nature of the evolutionary process (though tweaking the evolutionary operators may affect how this looks).
A pixel-by-pixel comparison can be expensive for anything but small images. For example, the 200x200 pixel image I use has 40,000 pixels. With three values per pixel (R, G and B), that's 120,000 values that have to be incorporated into the fitness calculation for a single image. In my implementation I scale the image down before doing the comparison so that there are fewer pixels. The trade-off is slightly reduced accuracy of the evolved image.
In investigating alternative fitness functions I came across some suggestions to use the YUV colour space instead of RGB since this is more closely aligned with human perception.
Another idea that I had was to compare only a randomly selected sample of pixels. I'm not sure how well this would work without trying it. Since the pixels compared would be different for each evaluation it would have the effect of maintaining diversity within the population.
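The random-sampling idea is a small change to the usual MSE fitness (the 10% sampling fraction here is an arbitrary assumption):

```python
import numpy as np

def sampled_fitness(candidate, target, frac=0.1, rng=None):
    """Mean squared error over a random subset of pixel positions."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = target.shape[:2]
    n = max(1, int(frac * h * w))
    ys = rng.integers(0, h, n)      # random row indices (with replacement)
    xs = rng.integers(0, w, n)      # random column indices
    d = candidate[ys, xs].astype(float) - target[ys, xs].astype(float)
    return float(np.mean(d * d))
```

Because the sample changes per evaluation, the fitness is noisy; as the answer notes, that noise may itself help maintain diversity in the population.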
Beyond that, you are in the realms of computer vision. I expect that these techniques, which rely on feature extraction, would be more expensive per image, but they may be faster overall if they result in fewer generations being required to achieve an acceptable result. You might want to investigate the PerceptualDiff library. Also, this page shows some Java code that can be used to compare images for similarity based on features rather than pixels.
A fitness measure that comes closer to what we see as 'similar' than the naive approach.
Implementing such a measure in software is definitely nontrivial. Google 'Human vision model', 'perceptual error metric' for some starting points. You can sidestep the issue - just present the candidate images to a human for selecting the best ones, although it might be a bit boring for the human.
I haven't seen such a demo (perhaps you could link one), but a couple of proto-ideas from your description that may trigger an interesting one:
Three different algorithms running in parallel, perhaps RGB or HSV.
Move, rotate, or otherwise change the target image slightly during the run.
Fitness based on contrast/value differences between pixels, but without knowing the actual colour.
...then "prime" a single pixel with the correct colour?
I would agree with other contributors that this is non-trivial. I'd also add that it would be very valuable commercially - for example, companies who wish to protect their visual IP would be extremely happy to be able to trawl the internet looking for similar images to their logos.
My naïve approach to this would be to train a pattern recognizer on a number of images, each generated from the target image with one or more transforms applied: e.g. rotated a few degrees either way; translated a few pixels either way; different scales of the same image; various blurs and effects (convolution masks are good here). I would also add some random noise to each of the images. The more samples the better.
The training can all be done off-line, so shouldn't cause a problem with runtime performance.
Once you've got a pattern recognizer trained, you can point it at the GA population images and get a scalar score out of the recognizer.
Personally, I like radial basis networks. Quick to train. I'd start with far too many inputs and whittle them down with principal component analysis (IIRC). The outputs could just be a similarity measure and a dissimilarity measure.
One last thing: whatever approach you go for, please blog about it, publish the demo, whatever; let us know how you got on.