Image Recognition - Classifying an image in an image (i.e. classify an object based on surrounding objects)?

I'm kind of new to this image classification stuff, so this is a somewhat high-level question. I was wondering if it's possible to train an image classifier (e.g. using just TF/Keras or one of the many image recognition libraries and APIs) to identify whether an object is inside another object. For example:
Output: A square
Output: A circle
Output: A circle in a square
Output: A square in a circle in a square
Output: A square in a circle and a square in a square
...and so on
If it's possible, what's the best way to go about it? Do I have to train the model to recognize all the variations example by example (which is unfavorable as there are far too many potential examples), or is there some better way? Thanks :)

You can do it by using simpler computer vision techniques instead of going for machine learning.
For example, if you use OpenCV, it has a built-in function called findContours, which returns a contour hierarchy.
Example:
The hierarchy matrix in the example shows how each contour is related to the others, in the format:
[Next, Previous, First_Child, Parent]
For instance, contours 2 and 4 (the circle and the rectangle) are at the same level, so in the matrix the Next entry of contour 2's row is 4. You can construct a tree like this to get the output you want. You just need to make sure that the inner and outer contours of a single shape are not counted as two separate shapes. I didn't do that here, which is why 5 and 7 show up in the output.
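Below is a minimal plain-Python sketch that turns such a hierarchy into the sentences from the question. It assumes you already have the hierarchy as a list of [Next, Previous, First_Child, Parent] rows, with -1 meaning "none". With OpenCV 4 that is roughly `hierarchy[0]` from `cv2.findContours(img, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)`. It also assumes a `labels` list that names each contour's shape, which you would have to produce yourself. The `labels` input and both function names are hypothetical, and duplicate inner/outer contours are not merged here.

```python
def phrase(hierarchy, labels, i):
    """Describe contour i and everything nested inside it, innermost first."""
    name = "a " + labels[i]
    inner = []
    child = hierarchy[i][2]              # First_Child of contour i
    while child != -1:
        inner.append(phrase(hierarchy, labels, child))
        child = hierarchy[child][0]      # Next sibling at the same level
    if not inner:
        return name
    return " and ".join(inner) + " in " + name


def describe(hierarchy, labels):
    """One phrase per top-level contour (Parent == -1), joined by '; '."""
    roots = [i for i, row in enumerate(hierarchy) if row[3] == -1]
    return "; ".join(phrase(hierarchy, labels, r) for r in roots)


# square (0) containing a circle (1) containing a square (2)
h = [[-1, -1, 1, -1], [-1, -1, 2, 0], [-1, -1, -1, 1]]
print(describe(h, ["square", "circle", "square"]))  # a square in a circle in a square
```

Siblings are joined with "and". A square holding both a circle-with-a-square and a plain square therefore comes out as "a square in a circle and a square in a square", the last example in the question.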

Related

algorithm for finding closest images based on jitter / translation

I've got a series of images and in some of them the people are only slightly moved, or the camera was shifted slightly, but mostly all is still the same.
I'm wondering how, algorithmically, I could detect this and score images by their closeness.
A simple Euclidean distance might not work. Imagine the case where zebra stripes were shifted just enough to have the "old" white positions filled with black and vice versa. A pathological example, I know, but you get the idea.
As an optional extra: is there a nice OpenCV or SciPy function (Python preferred) for this, or for part of the pipeline?
Thanks!
You can calculate the difference between your images.
The higher the intensity values of the difference image, the more the images differ.
So, if you take two identical images and subtract them, the difference image will be completely black.
In OpenCV's C++ API you can simply use the overloaded operator- of the Mat class.
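A plain difference is exactly what the zebra-stripe example in the question defeats. One way to make it tolerant of small camera shifts is to take the minimum mean absolute difference over a few small translations. This is an addition, not part of the answer above. Here is a sketch in pure Python, assuming equal-sized grayscale images stored as lists of rows; `max_shift` is a hypothetical parameter you would tune.

```python
def mean_abs_diff(a, b, dx=0, dy=0):
    """Mean |a - b| over the overlap, comparing a[y][x] with b[y+dy][x+dx]."""
    h, w = len(a), len(a[0])
    total = count = 0
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                total += abs(a[y][x] - b[yy][xx])
                count += 1
    return total / count  # assumes the shift is smaller than the image


def jitter_distance(a, b, max_shift=2):
    """Smallest mean difference over all shifts within +/- max_shift pixels."""
    return min(mean_abs_diff(a, b, dx, dy)
               for dy in range(-max_shift, max_shift + 1)
               for dx in range(-max_shift, max_shift + 1))


# zebra case: b is a with the stripes shifted by one pixel
a = [[255 if x % 2 else 0 for x in range(6)] for _ in range(4)]
b = [[0 if x % 2 else 255 for x in range(6)] for _ in range(4)]
print(mean_abs_diff(a, b), jitter_distance(a, b))
```

The unshifted difference is maximal, while the shift-tolerant score recognises the two images as the same. The price is one full comparison per candidate shift.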

Object Detection in an Image

I want to detect some elements in an Image.
For this goal, I take the image and the specified element (like a nose) and start searching for my element from pixel (0,0).
But the software's performance is awful because I traverse the pixels one by one.
I think I need a smarter algorithm for this problem.
Maybe a machine learning algorithm would be useful for this.
What's your idea?
I would start with the Viola-Jones object detection framework.
This is a supervised learning technique that allows you to detect any kind of object with high probability
(even though the article mainly refers to faces, the framework is designed for general objects).
If you choose this approach, your main chore is going to be obtaining a labeled training set. You can later evaluate how good your algorithm is using cross-validation.
AFAIK, it is implemented in the OpenCV library (I am not familiar enough with the library to offer help).
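Part of what makes Viola-Jones fast is the integral image. It lets any rectangle sum, and therefore any Haar-like feature, be computed with four lookups regardless of the rectangle's size. Below is a minimal plain-Python sketch of just that building block. The boosted classifier cascade that the framework trains on top of these features is not shown, and the function names are made up for illustration.

```python
def integral_image(img):
    """ii[y][x] = sum of img over rows < y and columns < x (one-pixel zero border)."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), in O(1)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle Haar-like feature: left half minus right half (w even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


ii = integral_image([[1, 2], [3, 4]])
print(rect_sum(ii, 0, 0, 2, 2), two_rect_feature(ii, 0, 0, 2, 2))  # 10 -2
```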
You can do a very fast cross-correlation using the Fourier transforms of your image and search pattern.
A good implementation is, for example, OpenCV's matchTemplate function.
This will work best if your pattern always has the same rotation and scale across your image.
If it does not, you can repeat the search with several scaled/rotated versions of your pattern.
One advantage of this approach is that no training phase is required.
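The following brute-force sketch shows what is being computed. It slides the pattern over the image and scores each position by the sum of squared differences, similar in spirit to `matchTemplate` with `TM_SQDIFF`. It deliberately skips the FFT speed-up mentioned above, so it is only suitable for small images. Grayscale images stored as lists of rows are assumed, and the function name is hypothetical.

```python
def match_template_ssd(image, template):
    """Return (x, y, ssd) of the position where template fits image best."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum((image[y + j][x + i] - template[j][i]) ** 2
                      for j in range(th) for i in range(tw))
            if best is None or ssd < best[2]:
                best = (x, y, ssd)
    return best


image = [[0] * 6 for _ in range(6)]
image[2][3], image[2][4], image[3][3], image[3][4] = 9, 1, 1, 9
print(match_template_ssd(image, [[9, 1], [1, 9]]))  # (3, 2, 0)
```

For rotated or scaled occurrences you would call it once per transformed version of the pattern, as described above.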
Another, simpler approach that would work in particular with your pattern is this:
Use connected component labeling to identify blobs with the right number of white pixels to be the center rectangle of your element. This will eliminate all but a few false positives. Concentrate your search on the remaining few spots.
Again, OpenCV has a nice Blob library for that sort of stuff.
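Here is a plain-Python sketch of the connected-component idea. It assumes the image has already been thresholded to a boolean grid. It labels 4-connected blobs with a breadth-first flood fill and keeps only those whose pixel count falls in a size range. `min_size` and `max_size` stand in for "the right number of white pixels" and would need to be measured from your pattern; the function name is hypothetical.

```python
from collections import deque


def blobs_in_size_range(mask, min_size, max_size):
    """Return (x, y, pixel_count) for each 4-connected blob of True pixels whose
    size lies in [min_size, max_size]; (x, y) is the blob's top-left bbox corner."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:  # flood-fill one blob
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if min_size <= len(pixels) <= max_size:
                found.append((min(p[0] for p in pixels), min(p[1] for p in pixels), len(pixels)))
    return found
```

Each returned position is a candidate spot on which to concentrate the slower, more exact check.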
If you're looking for simple geometric shapes in computer-generated images like the example you provided, then you don't need to bother with machine learning.
For example, here's one of the components you're trying to find in the original image:
(Image removed by request)
Assuming this component is always drawn at the same dimensions, the top and bottom lines are always going to be 21 pixels apart. You can narrow down your search space considerably by combining this image with a copy of itself shifted vertically by 21 pixels, and taking the lighter of the two images as the pixel value at each position.
(Image removed by request)
Similarly, the vertical lines at the left and right of this component are 47 pixels apart, so we can repeat this process with a 47px horizontal shift. This results in a vertical bar about 24px tall at the position of the component.
(Image removed by request)
You can detect these bars quite easily by looking for runs of black pixels between 22 and 26 pixels long in the vertical columns of the processed image. This will provide you with a short list of candidate positions where you can check for the presence of this component more thoroughly, e.g. by calculating a local 2D cross correlation.
Here are the results after processing the whole image. Reaching this stage should only take a few milliseconds.
(Image removed by request)
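The two steps described above can be sketched as follows: combine the image with a shifted copy, then look for vertical runs of black pixels. The sketch assumes a grayscale image stored as lists of rows, with black = 0 and white = 255. Pixels pulled in from outside the image are treated as white, and "lighter" is taken as the per-pixel maximum. The 21 px and 47 px shifts and the 22 to 26 px run lengths from the answer would be passed in as parameters. The small values and function names in the usage are only for illustration.

```python
def lighter_of_shifted(img, dx, dy, white=255):
    """Per-pixel max of img[y][x] and img[y+dy][x+dx]: a pixel stays black
    only if it and the pixel (dx, dy) away are both black."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx, sy = x + dx, y + dy
            other = img[sy][sx] if 0 <= sx < w and 0 <= sy < h else white
            row.append(max(img[y][x], other))
        out.append(row)
    return out


def vertical_black_runs(img, min_len, max_len, black=0):
    """Return (x, y_start, length) for every vertical run of black pixels
    whose length lies in [min_len, max_len]."""
    h, w = len(img), len(img[0])
    runs = []
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x] != black:
                y += 1
                continue
            start = y
            while y < h and img[y][x] == black:
                y += 1
            if min_len <= y - start <= max_len:
                runs.append((x, start, y - start))
    return runs


# two horizontal lines 3 px apart: only the upper one survives a 3 px vertical shift
img = [[255] * 6 for _ in range(8)]
for x in range(4):
    img[1][x] = img[4][x] = 0
combined = lighter_of_shifted(img, 0, 3)
```

On the real image you would chain the vertical and horizontal passes, then feed the result to `vertical_black_runs`.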

Counting object on image algorithm

I got a school task again. This time, my teacher asked me to create an algorithm that counts how many ducks are in a picture.
The picture is similar to this one:
I think I should use pattern recognition to find how many ducks are in it, but I don't know which pattern to match for each duck.
I think that you can solve this problem by segmenting the ducks' beaks and counting the number of connected components in the binary image.
To segment the ducks' beaks, first convert the image to HSV color space and then binarize it using the hue component. Note that the hue of the ducks' beaks differs from that of the other parts of the image.
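Here is a sketch of that recipe using only Python's standard library. `colorsys.rgb_to_hsv` takes floats from 0 to 1 and returns hue from 0 to 1. The hue window of 0.04 to 0.12 (roughly orange) and the saturation floor are assumptions to tune on the real picture. The saturation check keeps white and grey pixels, whose hue is meaningless, out of the mask. The function names are hypothetical.

```python
import colorsys


def is_beak(rgb, hue_lo=0.04, hue_hi=0.12, min_sat=0.5):
    """True if an (r, g, b) pixel in 0..255 falls inside the assumed beak hue window."""
    r, g, b = (c / 255 for c in rgb)
    h, s, _ = colorsys.rgb_to_hsv(r, g, b)
    return hue_lo <= h <= hue_hi and s >= min_sat


def count_beaks(image):
    """Binarize by hue, then count 4-connected components with a stack flood fill."""
    h, w = len(image), len(image[0])
    mask = [[is_beak(px) for px in row] for row in image]
    seen, count = set(), 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and (x, y) not in seen:
                count += 1
                seen.add((x, y))
                stack = [(x, y)]
                while stack:
                    cx, cy = stack.pop()
                    for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                        if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and (nx, ny) not in seen:
                            seen.add((nx, ny))
                            stack.append((nx, ny))
    return count


YELLOW, ORANGE = (255, 255, 0), (255, 128, 0)
picture = [[YELLOW] * 7 for _ in range(4)]
picture[1][1] = picture[1][2] = picture[2][5] = ORANGE
print(count_beaks(picture))  # 2
```

On a real photo you would probably also drop tiny components (noise) before counting.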
Here's one way:
Hough transform for circles:
Initialize an accumulator array indexed by (x,y,radius)
For each pixel:
calculate an edge (e.g. Sobel operator will provide both magnitude and direction), if magnitude exceeds some threshold then:
increment every accumulator for which this edge could possibly lend evidence (only the (x,y) in the direction of the edge, only radii between min_duck_radius and max_duck_radius)
Now smooth and threshold the accumulator array, and the coordinates of highest accumulators show you where the heads are. The threshold may leap out at you if you histogram the values in the accumulators (there may be a clear difference between "lots of evidence" and "noise").
So that's very terse, but it can get you started.
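Here is a small, unoptimised sketch of those steps in pure Python, assuming a grayscale image stored as lists of rows. It computes Sobel gradients. Each strong edge pixel then votes for centres at distance r along its gradient, in both directions, since a head could be lighter or darker than its surroundings. The accumulator is returned as a `Counter`. The smoothing step is left out for brevity, and `edge_threshold` and the radius range are placeholders to tune.

```python
import math
from collections import Counter


def sobel(img, x, y):
    """Sobel gradient (gx, gy) at interior pixel (x, y)."""
    p = img
    gx = (p[y - 1][x + 1] + 2 * p[y][x + 1] + p[y + 1][x + 1]) - \
         (p[y - 1][x - 1] + 2 * p[y][x - 1] + p[y + 1][x - 1])
    gy = (p[y + 1][x - 1] + 2 * p[y + 1][x] + p[y + 1][x + 1]) - \
         (p[y - 1][x - 1] + 2 * p[y - 1][x] + p[y - 1][x + 1])
    return gx, gy


def hough_circles(img, min_r, max_r, edge_threshold=1.0):
    """Accumulator keyed by (cx, cy, r); only votes along each edge's gradient."""
    h, w = len(img), len(img[0])
    acc = Counter()
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx, gy = sobel(img, x, y)
            mag = math.hypot(gx, gy)
            if mag <= edge_threshold:
                continue
            ux, uy = gx / mag, gy / mag  # unit vector along the gradient
            for r in range(min_r, max_r + 1):
                for sign in (1, -1):     # centre may lie on either side of the edge
                    cx, cy = round(x + sign * r * ux), round(y + sign * r * uy)
                    if 0 <= cx < w and 0 <= cy < h:
                        acc[(cx, cy, r)] += 1
    return acc


# a bright disc of radius 5 centred at (12, 12)
disc = [[1 if (x - 12) ** 2 + (y - 12) ** 2 <= 25 else 0 for x in range(25)] for y in range(25)]
acc = hough_circles(disc, 4, 6)
print(acc.most_common(1))  # strongest (x, y, radius) candidate and its vote count
```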
It might be just because I'm working with SIFT right now, but to me it looks like it could be good for your problem.
It is an algorithm that matches the same object on two different pictures, where the objects can have different orientations, scales and be viewed from different perspectives on the two pictures. It can also work when an object is partially hidden (as your ducks are) by another object.
I'd suggest finding a good clear picture of a rubber ducky ( :D ) and then using some SIFT implementation (VLFeat, a C library with SIFT but no visualization; SIFT++, based on VLFeat, in C++; Rob Hess's implementation in C with OpenCV...).
You should bear in mind that matching with SIFT (and anything else) is not perfect - so you might not get the exact number of rubber duckies in the picture.
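Whichever implementation you use, the matching step usually works like this. For each descriptor from the reference duck, find its two nearest descriptors in the scene. Keep the match only if the nearest is clearly closer than the second (Lowe's ratio test). Below is a plain-Python sketch that assumes the descriptors are already computed and given as sequences of floats, for example with `cv2.SIFT_create().detectAndCompute` in OpenCV 4.4 or later. Ratios around 0.7 to 0.8 are typical. Turning matches into a duck count would still need a grouping step that is not shown here, and the function name is hypothetical.

```python
import math


def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Return (i, j) pairs where desc_a[i]'s nearest neighbour desc_b[j] is
    clearly closer than its second-nearest neighbour (brute force)."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = sorted((math.dist(d, e), j) for j, e in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches


# descriptor 1 has two equally close candidates, so it is rejected as ambiguous
print(ratio_test_matches([(0, 0), (5, 5)], [(0.1, 0), (10, 10), (5, 5.1), (5.1, 5)]))
```

The ambiguous rejections are one reason the final count may not be exact, as the answer warns.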

Best approach for specific Object/Image Recognition task?

I'm searching for a certain object in my photograph:
Object: Outline of a rectangle with an X in the middle. It looks like a rectangular checkbox. That's all. So, no fill, just lines. The rectangle will have the same ratios of length to width but it could be any size or any rotation in the photograph.
I've looked a whole bunch of image recognition approaches. But I'm trying to determine the best for this specific task. Most importantly, the object is made of lines and is not a filled shape. Also, there is no perspective distortion, so the rectangular object will always have right angles in the photograph.
Any ideas? I'm hoping for something that I can implement fairly easily.
Thanks all.
You could try using a corner detector (e.g. Harris) to find the corners of the box, the ends and the intersection of the X. That simplifies the problem to finding points in the right configuration.
Edit (response to comment):
I'm assuming you can find the corner points in your image, the 4 corners of the rectangle, the 4 line endings of the X and the center of the X, plus a few other corners in the image due to noise or objects in the background. That simplifies the problem to finding a set of 9 points in the right configuration, out of a given set of points.
My first try would be to look at each corner point A. Then I'd iterate over the points B close to A. Now if I assume that (e.g.) A is the upper left corner of the rectangle and B is the lower right corner, I can easily calculate, where I would expect the other corner points to be in the image. I'd use some nearest-neighbor search (or a library like FLANN) to see if there are corners where I'd expect them. If I can find a set of points that matches these expected positions, I know where the symbol would be, if it is present in the image.
You'll have to try it to see whether that is good enough for your application. If you get too many false positives (sets of corners of other objects that accidentally form a rectangle + X), you could check whether there are lines (i.e. high contrast in the right direction) where you would expect them. You could also check that there is low contrast where the pattern has no lines. This should be relatively straightforward once you know the points in the image that correspond to the corners/line endings in the object you're looking for.
I'd suggest the Generalized Hough Transform. It seems you have a fairly simple, fixed shape. The generalized Hough transform should be able to detect that shape at any rotation or scale in the image. You may need to threshold the original image, or pre-process it in some way, for this method to be useful though.
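Here is a plain-Python sketch of the corner-configuration search. It assumes the corner points are already detected (e.g. by Harris) and that the rectangle's height/width ratio is known, as the question states. For each pair of points treated as opposite corners, it predicts the other two corners and the X's centre. A brute-force distance check stands in for FLANN or a k-d tree. The four X line endings are not checked, because their position relative to the corners isn't specified. `tol` is a hypothetical pixel tolerance, and the function name is made up.

```python
import math


def find_boxes(points, ratio, tol=0.5):
    """Return rounded centres of rectangle+X candidates among corner points.

    ratio = rectangle height / width (assumed known). Checks the four
    rectangle corners and the X's centre only.
    """
    def present(p):
        # brute-force nearest-neighbour test; FLANN / a k-d tree would go here
        return any(math.dist(p, q) <= tol for q in points)

    theta = math.atan(ratio)  # angle between the diagonal and the width side
    centres = set()
    for a in points:
        for b in points:
            if a == b:
                continue
            dx, dy = b[0] - a[0], b[1] - a[1]     # candidate diagonal a -> b
            for t in (theta, -theta):             # box may lie on either side
                c, s = math.cos(t), math.sin(t)
                # width side u: diagonal rotated by -t and scaled by cos(t)
                ux, uy = c * (c * dx + s * dy), c * (-s * dx + c * dy)
                vx, vy = dx - ux, dy - uy         # height side, perpendicular to u
                corner1 = (a[0] + ux, a[1] + uy)
                corner2 = (a[0] + vx, a[1] + vy)
                centre = (a[0] + dx / 2, a[1] + dy / 2)
                if present(corner1) and present(corner2) and present(centre):
                    centres.add((round(centre[0], 1), round(centre[1], 1)))
    return sorted(centres)
```

Each symbol is found several times (both diagonals, both directions), which is why the centres are collected in a set.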
You can use local features to identify the object in the image (see the Wikipedia article on feature detection).
For example, you can calculate features on a reference image that contains only the object you're looking for and save the results, let's say, to a plain text file. After that you can search for the object just by comparing newly calculated features (on images with complex scenes containing the object) with the reference ones.
Here's a good resource on local features:
Local Invariant Feature Detectors: A Survey

Raytracing (LoS) on 3D hex-like tile maps

Greetings,
I'm working on a game project that uses a 3D variant of hexagonal tile maps. Tiles are actually cubes, not hexes, but are laid out just like hexes (because a square can be turned into a cube to extrapolate from 2D to 3D, but there is no 3D version of a hex). Rather than a verbose description, here is an example of a 4x4x4 map:
(I have highlighted an arbitrary tile (green) and its adjacent tiles (yellow) to help describe how the whole thing is supposed to work; but the adjacency functions are not the issue, that's already solved.)
I have a struct type to represent tiles, and maps are represented as a 3D array of tiles (wrapped in a Map class to add some utility methods, but that's not very relevant).
Each tile is supposed to represent a perfectly cubic space, and they are all exactly the same size. Also, the offset between adjacent "rows" is exactly half the size of a tile.
That's enough context; my question is:
Given the coordinates of two points A and B, how can I generate a list of the tiles (or, rather, their coordinates) that a straight line between A and B would cross?
That would later be used for a variety of purposes, such as determining Line-of-sight, charge path legality, and so on.
BTW, this may be useful: my maps use (0,0,0) as the reference position. The 'jagging' of the map can be defined as offsetting each tile ((y+z) mod 2) * tileSize/2.0 to the right from the position it would have in a "sane" cartesian system. For the non-jagged rows, that yields 0; for rows where (y+z) mod 2 is 1, it yields 0.5 tiles.
I'm working in C# 4 targeting the .NET Framework 4.0, but I don't really need specific code, just the algorithm to solve the weird geometric/mathematical problem. I have been trying for several days to solve this, to no avail, and trying to draw the whole thing on paper to "visualize" it didn't help either :( .
Thanks in advance for any answer
Until one of the clever SOers turns up, here's my dumb solution. I'll explain it in 2D 'cos that makes it easier to explain, but it will generalise to 3D easily enough. I think any attempt to try to work this entirely in cell index space is doomed to failure (though I'll admit it's just what I think and I look forward to being proved wrong).
So you need to define a function to map from cartesian coordinates to cell indices. This is straightforward, if a little tricky. First, decide whether point(0,0) is the bottom left corner of cell(0,0) or the centre, or some other point. Since it makes the explanations easier, I'll go with bottom-left corner. Observe that any point(x,floor(y)==0) maps to cell(floor(x),0). Indeed, any point(x,even(floor(y))) maps to cell(floor(x),floor(y)).
Here, I invent the boolean function even, which returns True if its argument is an even integer. I'll use odd next: any point(x,y) with odd(floor(y)) maps to cell(floor(x-0.5),floor(y)).
Now you have the basics of the recipe for determining lines-of-sight.
You will also need a function to map from cell(m,n) back to a point in cartesian space. That should be straightforward once you have decided where the origin lies.
Now, unless I've misplaced some brackets, I think you are on your way. You'll need to:
decide where in cell(0,0) you position point(0,0); and adjust the function accordingly;
decide where points along the cell boundaries fall; and
generalise this into 3 dimensions.
Depending on the size of the playing field you could store the cartesian coordinates of the cell boundaries in a lookup table (or other data structure), which would probably speed things up.
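Here is a sketch of that recipe generalised to 3D, in Python. It assumes tileSize = 1 and that point (0,0,0) sits at the corner of tile (0,0,0). It also follows the question's rule that tiles with odd (y+z) are shifted half a tile along x, so it uses (y+z) where the 2D explanation uses y alone. It samples the segment at fixed steps, which is simple but can miss a tile that the line only clips at a corner. `samples_per_unit` is a made-up knob for that trade-off, and the function names are hypothetical.

```python
import math


def point_to_tile(x, y, z):
    """Map a cartesian point to tile indices (ix, iy, iz); rows with odd
    (iy + iz) are assumed to be shifted half a tile to the right along x."""
    iy, iz = math.floor(y), math.floor(z)
    shift = 0.5 if (iy + iz) % 2 else 0.0
    return (math.floor(x - shift), iy, iz)


def tiles_on_line(a, b, samples_per_unit=20):
    """Tiles hit by sample points along segment a -> b, in order of first visit."""
    n = max(1, math.ceil(math.dist(a, b) * samples_per_unit))
    tiles = []
    for k in range(n + 1):
        t = k / n
        tile = point_to_tile(*(a[i] + t * (b[i] - a[i]) for i in range(3)))
        if tile not in tiles:
            tiles.append(tile)
    return tiles


print(tiles_on_line((0.2, 1.5, 0.5), (2.0, 1.5, 0.5)))  # a jagged (odd) row
```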
Perhaps you can avoid all the complex math if you look at your problem in another way:
I see that you only shift your blocks (alternating) along the first axis by half the block size. If you split your blocks in half along this axis, the above example becomes (with shifts) a simple 9x4x4 cartesian coordinate system with regularly stacked blocks. Now doing the raytracing becomes much simpler and less error-prone.
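Here is a sketch of that idea. It assumes tileSize = 1, point (0,0,0) at the corner of tile (0,0,0), and odd (y+z) rows shifted half a tile along x, as in the question. The blocks are split into half-width cells, and a standard grid traversal (Amanatides and Woo's voxel walk) runs over that regular grid. Each half-cell is then mapped back to the block it belongs to. Unlike sampling, the walk visits every half-cell the segment passes through, although points lying exactly on a cell boundary are not treated specially. The function names are hypothetical.

```python
import math


def half_cells_on_segment(a, b, size=(0.5, 1.0, 1.0)):
    """Amanatides-Woo style walk over a regular grid with the given cell size."""
    cell = [math.floor(a[i] / size[i]) for i in range(3)]
    end = [math.floor(b[i] / size[i]) for i in range(3)]
    step, t_max, t_delta = [], [], []
    for i in range(3):
        d = b[i] - a[i]
        if d > 0:
            step.append(1)
            t_max.append(((cell[i] + 1) * size[i] - a[i]) / d)  # t of next boundary
            t_delta.append(size[i] / d)                         # t per cell crossed
        elif d < 0:
            step.append(-1)
            t_max.append((cell[i] * size[i] - a[i]) / d)
            t_delta.append(-size[i] / d)
        else:
            step.append(0)
            t_max.append(math.inf)
            t_delta.append(math.inf)
    cells = [tuple(cell)]
    while cell != end:
        i = min(range(3), key=lambda k: t_max[k])  # axis whose boundary comes first
        if t_max[i] > 1.0 + 1e-9:                  # guard against float drift
            break
        cell[i] += step[i]
        t_max[i] += t_delta[i]
        cells.append(tuple(cell))
    return cells


def blocks_on_segment(a, b):
    """Map the half-cells back to jagged block indices (ix, iy, iz)."""
    blocks = []
    for hx, iy, iz in half_cells_on_segment(a, b):
        ix = (hx - (iy + iz) % 2) // 2  # odd rows start half a block later
        if (ix, iy, iz) not in blocks:
            blocks.append((ix, iy, iz))
    return blocks


print(blocks_on_segment((0.25, 0.5, 0.5), (0.25, 1.5, 0.5)))
```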
