identifying custom shapes in a 2D grid - algorithm

Dear stackoverflowers.
So, let's say there's a grid whose value at a given x and y represents whether there is a tile there (1) or it's missing (0).
For example,
100110100010
100100111110
111110000000
000010000000
And there are some already known shapes A, B and C, for example,
(A)      (B)      (C)
  1      1        1
111,     111,     11
So what I am trying to achieve is to identify which 1's on the grid belong to which shape.
All of the 1's should be used up, and the exact number of shapes is known. Rotation (but not mirroring) is allowed, so I guess it's better to add the rotated versions to the shape set and accept that some of them won't be found on the grid.
So, the expected result would be (it's known that it should be exactly 1xA, 2xB, 2xC):
A00CC0B000C0
A00C00BBBCC0
AABBB0000000
0000B0000000
If there are several possible matches, any would suit, as long as every tile gets allocated to its own shape.
Moreover, finding out whether a tile is present or not ("uncovering") is an expensive operation (results are cached, and tiles don't appear out of nowhere), so I am actually looking for a way to identify the shapes with as few "uncoverings" as possible.
(It's okay if it's not optimal; just identifying the shapes would be great.)
Obviously, the set of known shapes might change (but it will be known by implementation time and will stay constant, so it's possible to tune the code for a particular set of shapes or develop some search strategies). It won't be large (~5-6 shapes), and the grid is quite small too (~15x15).
Thanks!

Using the ideas here and/or here (I guess using this one, the object types would be 0 and 1), one way to do it might be to try to match your own patterns against the catalog of collected objects. To take your own example,
100110100010
100100111110
111110000000
000010000000
Shapes A, B and C:
(A)            (B)            (C)
  1      1       1    111     1      11
111  or  1     111  or  1     11  or 1
         11
The first collected object might be,
1  11
1  1
11111
    1
=> represented as a set of coordinates: [(0,0),(0,1),(0,2),(1,2), ...etc]
(the objects need not start at or include (0,0), but the object
bounds seem needed to calibrate the pattern matching)
Testing object A against the top left of the object would match [(0,0),(0,1),(0,2),(1,2)]. After A is matched, the program must find a way to recalibrate the remaining points: the bottom right corner will effectively be measured as (2,3) rather than (4,3). Testing the bottom right of the remaining points in the object would then match object B. Continue in a similar vein to match all the shapes, trying different combinations if a total match is not found.
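To make the idea concrete, here is a minimal backtracking sketch in Python, under the assumptions above: the rotation set is generated up front, every tile is assumed to be already uncovered (the cheap-probing aspect is ignored), and the shape cells are taken from the example. It is an illustration, not a tuned solver.

def rotations(shape):
    """All four rotations of a shape given as a set of (x, y) cells."""
    rots, cells = [], frozenset(shape)
    for _ in range(4):
        cells = frozenset((y, -x) for x, y in cells)  # rotate 90 degrees
        rots.append(cells)
    return rots

def solve(tiles, shapes, counts):
    """tiles: set of (x, y) holding a 1; shapes: name -> list of rotations;
    counts: name -> copies left to place. Returns {(x, y): name} or None."""
    if not tiles:
        return {}
    ax, ay = min(tiles)  # the piece covering the smallest uncovered tile
    for name, rots in shapes.items():  # must have its own smallest cell there
        if counts[name] == 0:
            continue
        for rot in rots:
            ox, oy = min(rot)
            cells = {(ax + x - ox, ay + y - oy) for x, y in rot}
            if cells <= tiles:  # placement lies entirely on uncovered 1s
                counts[name] -= 1
                rest = solve(tiles - cells, shapes, counts)
                counts[name] += 1
                if rest is not None:
                    return {**rest, **{c: name for c in cells}}
    return None

# The example grid, with exactly 1xA, 2xB, 2xC:
grid = ["100110100010", "100100111110", "111110000000", "000010000000"]
tiles = {(x, y) for y, row in enumerate(grid) for x, c in enumerate(row) if c == "1"}
shapes = {"A": rotations({(2, 0), (0, 1), (1, 1), (2, 1)}),  # ..1 / 111
          "B": rotations({(0, 0), (0, 1), (1, 1), (2, 1)}),  # 1.. / 111
          "C": rotations({(0, 0), (0, 1), (1, 1)})}          # 1. / 11
print(solve(tiles, shapes, {"A": 1, "B": 2, "C": 2}))

Anchoring each placement at the smallest uncovered tile keeps the search complete without trying every offset, and on a ~15x15 grid with 5-6 shapes this brute force should be fast enough in practice.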

Related

Speeding up a pre-computed seam carving algorithm

I'm writing a JS seam carving library. It works great: I can rescale a 1024x1024 image very cleanly, in real time, as fast as I can drag it around. It looks great! But in order to get that performance I need to pre-compute a lot of data, and it takes about 10 seconds. I'm trying to remove this bottleneck and am looking for ideas here.
Seam carving works by removing the lowest-energy "squiggly" line of pixels from an image. E.g. if you have a 10x4 image, a horizontal seam might look like this:
........x.
.x.....x.x
x.xx..x...
....xx....
So if you resize it to 10x3 you remove all the 'X' pixels. The general idea is that the seams go around the things that look visually important to you, so instead of just normal scaling where everything gets squished, you're mostly removing things that look like whitespace, and the important elements in a picture are unaffected.
The process of calculating energy levels, removing them, and re-calculating is rather expensive, so I pre-compute it in node.js and generate a .seam file.
Each seam in the .seam file is basically: starting position, direction, direction, direction, direction, .... So for the above example you'd have:
starting position: 2
seam direction: -1 1 0 1 0 -1 -1 -1 1
This is quite compact and allows me to generate .seam files in ~60-120kb for a 1024x1024 image depending on settings.
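For illustration, a minimal Python sketch of expanding one seam back into per-column rows (only the start + directions part of the format, which is all that's described above):

def decode_seam(start, directions):
    """Expand (start, directions) into one row index per column."""
    rows = [start]
    for d in directions:
        rows.append(rows[-1] + d)
    return rows

# the example seam above:
assert decode_seam(2, [-1, 1, 0, 1, 0, -1, -1, -1, 1]) == [2, 1, 2, 2, 3, 3, 2, 1, 0, 1]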
Now, in order to get fast rendering, I generate a 2D grid that represents the order in which pixels should be removed. So:
(figure A):
........1.
.1.....1.1
1.11..1...
....11....
contains 1 seam of info, then we can add a 2nd seam:
(figure B):
2...2....2
.222.2.22.
......2...
and when merged you get:
2...2...12
.122.2.1.1
1211..122.
....112...
For completeness we can add seams 3 & 4:
(figures C & D):
33.3..3...
..3.33.333
4444444444
and merge them all into:
(figure E):
2343243412
3122424141
1211331224
4434112333
You'll notice that the 2s aren't all connected in this merged version, because the merged version is based on the original pixel positions, whereas the seam is based on the pixel positions at the moment the seam is calculated, which, for this 2nd seam, is a 10x3px image.
This allows the front-end renderer to basically just loop over all the pixels in an image and filter them against this grid by number of desired pixels to remove. It runs at 100fps on my computer, meaning that it's perfectly suitable for single resizes on most devices. yay!
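That filter might look roughly like the following Python sketch (my reading of the renderer, since the JS code isn't shown; horizontal seams, so each column loses one pixel per removed seam):

def drop_k_seams(image, order, k):
    """image: h x w pixel rows; order: the merged grid (figure E).
    Keep, per column, only pixels whose seam number is greater than k."""
    h, w = len(image), len(image[0])
    cols = [[image[y][x] for y in range(h) if order[y][x] > k] for x in range(w)]
    return [[cols[x][y] for x in range(w)] for y in range(h - k)]  # back to rows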
Now the problem that I'm trying to solve:
The decoding step from seams that go -1 1 0 1 0 -1 -1 -1 1 to the pre-computed grid of which pixels to remove is slow. The basic reason for this is that whenever one seam is removed, all the seams from there forward get shifted.
The way I'm currently calculating the "shifting" is by splicing each pixel of a seam out of a 1,048,576 element array (for a 1024x1024 px image, where each index is x * height + y for horizontal seams) that stores the original pixel positions. It's veeerrrrrryyyyy slow running .splice a million times...
This seems like a weird leetcode problem, in that perhaps there's a data structure that would allow me to know "how many pixels above this one have already been excluded by a seam" so that I know the "normalized index". But... I can't figure it out; anything I can think of requires too many re-writes to make this any faster.
Or perhaps there might be a better way to encode the seam data, but using 1-2 bits per pixel of the seam is very efficient, and anything else I can come up with would make those files huge.
Thanks for taking the time to read this!
[edit and tl;dr] -- How do I efficiently merge figures A-D into figure E? Alternatively, any ideas that yield figure E efficiently, from any compressed format
If I understand correctly, your current algorithm is:
while there are pixels in Image:
    seam = get_seam(Image)
    save(seam)
    Image = remove_seam_from_image(Image, seam)
You then want to construct an array that holds, for each original pixel, the number of the seam that removes it.
To do so, you could make a 1024x1024 array where each value is the index of that element of the array (y*width+x). Call this Indices.
A modified algorithm then gives you what you want:
Let Indices have the dimensions of Image and be initialized to [0, len(Image))
Let SeamNum have the dimensions of Image and be initialized to -1

seam_num = 0
while there are pixels in Image:
    seam = get_seam(Image)
    Image = remove_seam_from_image(Image, seam)
    Indices = remove_seam_from_image_and_write_seam_num(Indices, seam, SeamNum, seam_num)
    seam_num++
remove_seam_from_image_and_write_seam_num is conceptually identical to remove_seam_from_image, except that as it walks the seam to remove each pixel from Indices, it writes seam_num to the location in SeamNum indicated by the pixel's value in Indices.
The output is the SeamNum array you're looking for.
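A compact NumPy sketch of that bookkeeping, taking already-decoded seams as input (each seam is one row index per column of the image as it exists at that step; an illustration of the idea, not the answer's exact code):

import numpy as np

def seam_order_grid(width, height, seams):
    """Return the merged grid (figure E): for every original pixel,
    the 1-based number of the seam that removes it."""
    indices = np.arange(width * height).reshape(height, width)  # original ids
    seam_num = np.zeros(width * height, dtype=int)
    for n, seam in enumerate(seams, start=1):
        keep = np.ones(indices.shape, dtype=bool)
        for x, y in enumerate(seam):
            seam_num[indices[y, x]] = n  # write n at the pixel's original id
            keep[y, x] = False
        # drop one pixel per column, as remove_seam_from_image would
        indices = indices.T[keep.T].reshape(width, -1).T
    return seam_num.reshape(height, width)

# Seams 1-4 from figures A-D; the output reproduces figure E.
print(seam_order_grid(10, 4, [[2, 1, 2, 2, 3, 3, 2, 1, 0, 1],
                              [0, 1, 1, 1, 0, 1, 2, 1, 1, 0],
                              [0, 0, 1, 0, 1, 1, 0, 1, 1, 1],
                              [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]))

The point is the one made above: the seam number is recorded at the moment the pixel is removed, so nothing ever has to be shifted afterwards.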

Convolutional Neural Networks - Theory

I am sorry for asking this stupid question, but after a bit of thinking, I still don't get it:
According to Jordi Torres (see here), if we look at an image with 28x28 = 784 pixels, then one way to implement this is to let one neuron of a hidden layer learn about 5x5 = 25 pixels of the input layer:
However, as he explains it:
Analyzing a little bit the concrete case we have proposed, we note that, if we have an input of 28×28 pixels and a window of 5×5, this defines a space of 24×24 neurons in the first hidden layer because we can only move the window 23 neurons to the right and 23 neurons to the bottom before hitting the right (or bottom) border of the input image. We would like to point out to the reader that the assumption we have made is that the window moves forward 1 pixel away, both horizontally and vertically when a new row starts. Therefore, in each step, the new window overlaps the previous one except in this line of pixels that we have advanced.
I really don't get why we need a space of 24x24 neurons in the first hidden layer. Since I take 5x5 windows (which have 25 pixels out of 784 in them), I thought we would need 784/25 ≈ 32 neurons in total. I mean, doesn't one neuron of the hidden layer learn the property of 25 pixels?
Apparently not, but I am really confused.
You're assuming non-overlapping 5x5 segments, but that's not the case. In this example, the first output is derived from rows 1-5, columns 1-5 of the input. The next one uses rows 1-5, columns 2-6, on to rows 1-5, columns 24-28, then rows 2-6, columns 1-5, etc. etc. until rows 24-28, columns 24-28. This is referred to as a "stride" of 1.
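A quick sanity check of the 24x24 figure (a Python sketch of the standard output-size formula for a convolution without padding):

def conv_output_size(n, window, stride=1):
    """Number of window positions along one dimension, without padding."""
    return (n - window) // stride + 1

print(conv_output_size(28, 5))  # 24, so the hidden layer has 24x24 = 576 neurons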

How to detect boundaries of a pattern [duplicate]

Possible Duplicate:
Detecting thin lines in blurry image
So as the title says, I am trying to detect boundaries of patterns. In the images attached, you can basically see three different patterns.
Close stripe lines
One thick L shaped line
The area between 1 & 2
I am trying to separate these three into, say, 3 separate images. Depending on where the answers go, I will upload more images if needed. Either ideas or code will be helpful.
You can solve (for some values of "solve") this problem using morphology. First, to make the image more uniform, remove irrelevant minima. One way to do this is using the h-minima transform (the dual of the h-dome transform), which suppresses regional minima of depth < h. Now, we want to join the thin lines. That is accomplished by a morphological opening with a horizontal line of length l. If the lines were merged, then the regional minima of the current image are the background. So we can fill holes to obtain the relevant components. The following code summarizes these tasks:
f = rgb2gray(imread('http://i.stack.imgur.com/02X9Z.jpg'));
hm = imhmin(f, h);                           % suppress regional minima of depth < h
o = imopen(hm, strel('line', l, 0));         % opening with a horizontal line of length l
result = imfill(~imregionalmin(o), 'holes'); % regional minima = background; fill holes
Now, you need to determine h and l. The parameter h is expected to be easier since it is not related to the scale of the input, and in your example values in the range [10, 30] work fine. To determine l, maybe a granulometry analysis could help. Another way is to check whether the result contains two significant connected components, corresponding to the bigger L shape and the region of the thin lines. There is no need to increase l one by one; you could perform something that resembles a binary search.
Here are the hm, o and result images with h = 30 and l = 15 (any l in [13, 19] works equally well here). This approach gives flexibility in parameter choice, making it easier to pick/find good values.
To calculate the area in the space between the two largest components, we could merge them and simply count the black pixels inside the new connected component.
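To sketch that binary search in Python (scikit-image/SciPy as stand-ins for the MATLAB toolbox; min_area, the search bounds, and the assumption that the component count shrinks as l grows are my own additions):

import numpy as np
from scipy import ndimage
from skimage import morphology

def significant_components(f, h, l, min_area=100):
    """Run the pipeline above for a given l and count the large components.
    h must be on the same intensity scale as f."""
    hm = morphology.reconstruction(f + h, f, method='erosion')  # ~ imhmin
    o = morphology.opening(hm, np.ones((1, l)))                 # ~ imopen with a line
    result = ndimage.binary_fill_holes(~morphology.local_minima(o))
    labels, n = ndimage.label(result)
    areas = ndimage.sum(result, labels, range(1, n + 1))
    return int(np.count_nonzero(areas >= min_area))

def find_l(f, h, lo=1, hi=64):
    """Bisect for the smallest l that leaves at most two significant components."""
    while lo < hi:
        mid = (lo + hi) // 2
        if significant_components(f, h, mid) <= 2:
            hi = mid
        else:
            lo = mid + 1
    return lo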
You can pass a window (10x10 pixels?) and collect features for that window. The features could be something as simple as the cumulative gradients (edges) within that window. This would distinguish the various areas as long as the window is big enough.
Then using each window as a data point, you can do some clustering, or if the patterns don't vary that much you can do some simple thresholds to determine which data points belong to which patterns (the larger gradient sums belong to the small lines: more edges, while the smallest gradient sums belong to the thickest lines: only one edge, and those in between belong to the other "in-between" pattern .
Once you have this classification, you can create separate images if need be.
Just throwing out ideas. You can binarize the image and do connected component labelling. Then perform some analysis on the connected components such as width to discriminate between the regions.
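A small Python sketch of that last idea (the threshold and the width test are placeholders to adapt):

import numpy as np
from scipy import ndimage

def regions_by_width(gray, thresh=128):
    """Binarize, label connected components, and measure bounding-box widths;
    thresholding the widths then separates the stripes from the thick L."""
    labels, n = ndimage.label(np.asarray(gray) < thresh)  # dark structures -> True
    widths = [s[1].stop - s[1].start                      # horizontal extent per blob
              for s in ndimage.find_objects(labels)]
    return labels, widths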

Exploded view algorithm for CAD

I'm making a program to view 3D CAD models and would like to build in automated exploded views. All the assemblies that will be viewed are axi-symmetric. Some may not be, but the majority are. I'd like to figure out an algorithm for automatically moving parts in an assembly into an exploded view position. Here is an example of what I want to achieve through an algorithm (minus the labels of course):
The only value I have to work with is the center of the bounding box of each part. If more information than that is needed, I can calculate more information, but it seems like it should be sufficient. The rough approach I have in mind is to calculate a vector from the origin of the assembly to the center of each part along the axi-symmetric axis, then calculate a radial vector to the center of the part with respect to the center axis. From there, I'd need to figure out some calculation that would be able to scale the position of each part along some combination of those two vectors. That's the part where I'm not quite sure what direction to go with this. The image I've included shows the exact functionality I'd like, but I want to be able to scale the position by any float value to expand or contract the exploded view, with 1.0 being the original assembled model. Any ideas?
Your question is quite broad and thus my explanation became somewhat lengthy. I'll propose two variants of an explosion algorithm, covering both axial and radial treatment.
To illustrate them with an example I'll use the following numbers (bounding boxes along the axis only, only five parts):
P1: [ 0,10] (battery)
P2: [10,14] (motor)
P3: [14,16] (cog)
P4: [16,24] (bit holder)
P5: [18,26] (gear casing)
While parts P1 to P4 exactly touch each other, P4 and P5 actually overlap.
The first one is an algorithm which basically scales the distances by a factor, such as you proposed. It will suffer when the sizes of the parts within an assembly differ a lot, and also for overlapping parts (e.g. in your example, the axial extent of the circle cog is much smaller than that of the bit holder).
Let the scaling factor be f, then the center of each bounding box is scaled by f, but extension is not. Parts then would be
P1: 5 + [-5,5] => P1': 5*f + [-5,5]
P2: 12 + [-2,2] => P2': 12*f + [-2,2]
P3: 15 + [-1,1] => P3': 15*f + [-1,1]
P4: 20 + [-4,4] => P4': 20*f + [-4,4]
P5: 22 + [-4,4] => P5': 22*f + [-4,4]
The distance between the parts P1' to P4' is then given by
P2' - P1' : (12*f-2) - (5*f+5) = 7*(f-1)
P3' - P2' : (15*f-1) - (12*f+2) = 3*(f-1)
P4' - P3' : (20*f-4) - (15*f+1) = 5*(f-1)
As expected, the difference is zero for f=1, but for any exploded view the distance strongly depends on the sizes of the separate parts. I don't think that this will look too good if the variation of sizes is bigger.
Additionally, for the overlapping parts
P5' - P4' : (22*f-4) - (20*f+4) = 2*f-8
they still overlap for reasonable f.
Another possibility would be to define not a scaling factor for the axis but a constant part-distance d. Then bounding boxes would be aligned like the following:
P1': [ 0,10]
P2': [10,14]+d
P3': [14,16]+2*d
P4': [16,24]+3*d
P5': [18,26]+4*d+6
Note that in the last line we added 24-18=6, i.e. the overlap, in order to separate the two parts.
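A minimal Python sketch of this constant-distance variant, using the intervals and d from above (grouping is ignored here):

def explode_axial(parts, d):
    """parts: list of (lo, hi) intervals along the axis, in assembly order.
    Shift parts so consecutive ones sit d apart; overlapping parts get an
    extra push equal to their overlap, as with P4/P5 above."""
    out, shift, prev_hi = [], 0.0, None
    for lo, hi in parts:
        if prev_hi is not None:
            shift += d + max(0.0, prev_hi - lo)
        out.append((lo + shift, hi + shift))
        prev_hi = hi
    return out

# P1..P5 from above; with d = 2, P5 ends up shifted by 4*2 + 6 = 14
print(explode_axial([(0, 10), (10, 14), (14, 16), (16, 24), (18, 26)], d=2))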
While this algorithm handles the above-mentioned cases in a (in my opinion) better way, we have to take special care with parts which cover multiple other parts and should not be included in the grouping (e.g. the handle top in your case).
One possibility would be to group the parts into groups in a first step and then apply the algorithm to the bounding box of these groups. Afterwards it can be applied to parts in each group again, omitting the parts which cover more than one subgroup. In your case it would be (note nested grouping is possible):
[
([battery,(switch,circuit switch),motor],handle top),
motor cog,
tri-cog,
red-cog,
circle-cog,
bit-holder,
(gear casing,spring,lock knob)
]
You might see that I have introduced two different kind of groups: parts/groups in square braces are handled by the algorithm, i.e. a spacing is added between each part/subgroup inside such a group, while the groups inside round braces are not exploded.
Up to now we have not handled the radial explosion, because it decouples nicely from the axial treatment. The same two approaches can be used for the radial explosion as well, but again, in my opinion, the second algorithm yields more pleasant results. E.g. the groups could be defined as follows for the radial treatment:
[
(battery,switch,<many parts>,gear casing),
(switch,spring),
(handle top, lock knob)
]
In this case we would add an additional component r to all radial centers in the second group and 2*r to all in the third group.
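As a tiny sketch of that rule (the part names and the radial step r are placeholders):

def radial_offsets(groups, r):
    """groups: list of part-name lists, innermost group first.
    Parts of the i-th group move i*r outward along their radial vector."""
    return {part: i * r for i, group in enumerate(groups) for part in group}

print(radial_offsets([["battery", "gear casing"],
                      ["switch", "spring"],
                      ["handle top", "lock knob"]], r=5.0))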
Note that the simple scaling algorithm runs without special user guidance (once the scaling factor is given), while the second one uses additional information (the grouping).
I hope this rather long explanation gives you some ideas how to proceed further. If my explanations are unclear at some point or if you have further questions please feel free to comment.

Making a good XY (scatter) chart in VB6

I need to write an application in VB6 which makes a scatter plot out of a series of data points.
The current workflow:
User inputs info.
A bunch of calculations go down.
The output data is displayed in a series of 10 list boxes.
Each time the "calculate" button is clicked, 2 to 9 entries are entered into the list boxes.
One list box contains x coordinates.
One list box contains the y coordinates.
I need to:
Scan through those list boxes, and select my x's and y's.
Another list box field will change from time to time, varying between 0 and 100, and that field determines which series on the eventual graph the x's and y's go into. So I will have Series 1 with six (x,y) data points, Series 26 with six data points, Series 99 with six data points, etc. Or eight data points. Or two data points. The user controls how many x's there are.
Ideally, I'll have a graph with multiple series displaying all this info.
I am not allowed to use a 3rd party solution (e.g. Excel). This all has to be contained in a VB6 application.
I'm currently trying to do this with MS Chart, as there seems to be the most documentation for that. However, this seems to focus on pie charts and other unrelated visualizations.
I'm totally open to using MS Graph but I don't know the tool and can't find good documentation.
A 2D array is, I think, a no-go, since it would need to be constantly and dynamically resized, and that can't be done (or so I've been told). I would ideally cull through the runs, sort the data by that third series parameter, and then plug in the x's and y's, but I'm finding the commands and structure for MS Chart so dense that I'm just running around in very small circles.
Edit: It would probably help if you can visualize what my data looks like. (S for series, made up numbers.)
S X Y
1 0 1000000
1 2 500000
1 4 250000
1 6 100000
2 0 1000000
2 2 6500
2 4 5444
2 6 1111
I don't know MSGraph, but I'm sure there is some sort of canvas element in VB6 which you can use to easily draw dots yourself. Scatter plots are an easy graph to make on your own, so long as you don't need to calculate a line of best fit.
I would suggest looking into the canvas element and doing it by hand if you can't find a tool that does it for you.
Conclusion: MSChart and MSGraph can both go suck a lemon. I toiled and toiled and got a whole pile of nothing out of either one. I know they can do scatter plots, but I sure as heck can't make them do 'em well.
@BlackBear: after finding out that my predecessor had the same problems and just used PSet and Line to make some really impressive graphs, I did the same thing, even if it's not as reproducible and generic as was desired. The solution that works, albeit less functionally >> the solution with great functionality that exists only in myth.
If anyone is reading this down the line and has an actual answer about scatter plots and MSChart/Graph, I'd still love to know.
