Sorry for choosing a title that isn't really expressive but I couldn't come up with anything better.
I'm looking for an algorithmic way to find combinations with repetition and order them spatially in the following way:
order 2 (2x2):
AA AB
AD AC/BD
order 4 (4x4):
AAAA AAAB AABB ABBB
AAAD AAAC/AABD AABC/ABBD ABBC
AADD AACD/ABDC ABCD/AACC ABCC
ADDD ACDD ACCD ACCC
The two-option fields in particular (e.g. AAAC/AABD) are giving me headaches. The combinations will be used to compute correlations between real pixels: for instance, the combination AA would be the autocorrelation of A, while AB would be the cross-correlation of A and B.
This scheme is required for a microscopy algorithm I'm currently trying to implement (see the section on cross cumulants):
http://en.wikipedia.org/wiki/Super-resolution_optical_fluctuation_imaging
On top of that, I also need to do this in as efficient a manner as possible, as I would typically have to perform the operation on thousands of images, each at least 512x512 (I'm doing this in CUDA). EDIT: well, I just realized that high efficiency is not really required, as the pattern won't change once an order has been chosen.
EDIT: I found a MATLAB toolbox for Balanced SOFI analysis (under GNU public license) on this page: http://documents.epfl.ch/users/l/le/leuteneg/www/BalancedSOFI/index.html -- even if you can't use MATLAB code, it may be possible to convert it to your programming language.
Of the eight references in the Wikipedia article, some are behind paywalls, but figures (or thumbnails of them) are freely viewable for all but the two oldest articles. Only one article gives free access to a figure resembling the one from Wikipedia (with AAAB, AABB, etc.):
http://www.opticsinfobase.org/boe/abstract.cfm?uri=boe-2-3-408
If I were in your position, I would attempt to contact the authors of that paper directly and ask them what the figure means, because I did not see how they decided to use these combinations. Within the 3x3 set of interpolated pixels that are bounded symmetrically by pixels A, B, C, D at their four corners, why is A the only letter that is required to occur in every single combination? If I understand what the letters signify (relative weightings of the real pixels in each interpolated pixel's value), then the pixel on the intersection of the diagonals, labeled ABCD/AACC, gives twice as much weight to A and C as to B and D; in fact, on average that entire 3x3 block of pixels gives A and C twice as much weight as B and D. That does not seem consistent with the underlying symmetries of the problem.
The mathematics in the wiki link you provided (about Super-resolution optical fluctuation imaging) may require quite a bit of study for me to understand, but I observe at least one possible way to think of diagram A:
I think of A as an anchor from which vectors extend to B, C, and D. The direct vectors are probably fairly obvious - as we move from A directly toward D, the Ds accumulate (i.e., AAAD, AADD, ADDD); as with A to B and A to C.
I see the middle combinations as vectors extending from the border either horizontally or vertically. For example, going down from AAAB, we add one D, then another; going to the right from AADD, we add one C, then reach the middle, ABCD.
I think any one combination in the diagram is sorted alphabetically, which would account for the apparent discrepancy in the positions of some accumulations: for example, the position of the accumulating C when moving from ADDD to ACCC, compared with the position of the accumulating C when moving from ABBB to ACCC.
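Here's a small Python sketch of that reading. It assumes (my guess only, not something the paper confirms) that each letter is a displacement (A at the origin, B one step right, D one step down, C on the diagonal), that every combination must contain the anchor A, and that the cell at row r, column c of the order-n grid collects every size-n multiset of letters whose displacements sum to (c, r):

from itertools import combinations_with_replacement

# Assumed displacement of each real pixel: (dx, dy).
OFFSETS = {'A': (0, 0), 'B': (1, 0), 'C': (1, 1), 'D': (0, 1)}

def combination_grid(n):
    """Cell (r, c) lists every size-n multiset of A/B/C/D that contains
    at least one A and whose displacements sum to (c, r)."""
    grid = [[[] for _ in range(n)] for _ in range(n)]
    for combo in combinations_with_replacement('ABCD', n):
        if 'A' not in combo:
            continue  # A acts as the anchor pixel in every combination
        dx = sum(OFFSETS[ch][0] for ch in combo)
        dy = sum(OFFSETS[ch][1] for ch in combo)
        grid[dy][dx].append(''.join(combo))
    return grid

for row in combination_grid(4):
    print('  '.join('/'.join(cell) for cell in row))

This reproduces the order-4 grid above exactly (and suggests that ABDC in the third row is a typo for ABDD, which would also match the alphabetical sorting). For order 2 it produces only AC where your grid shows AC/BD; dropping the anchor-A requirement restores BD, but then order 4 also gains anchor-free combinations such as BBDD that the figure omits -- precisely the asymmetry questioned in the answer above.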
I'm trying to optimize a sort for an isometric renderer. What is proving tricky is that the comparator can return "A > B", "A < B", or "the relationship between A and B is ambiguous." All of the standard sorting algorithms expect that you can compare the values of any two objects, but I cannot do that here. Are there known algorithms made for this situation?
For example, there can be a situation where the relationship between A and C is ambiguous, but we know A > B, and B > C. The final list should be A,B,C.
Note also that there may be multiple solutions. If A > B, but C is ambiguous to both A and B, then the answer may be C,A,B or A,B,C.
Edit: I did say it was for sorting in an isometric renderer, but several people asked for more info, so here goes. I've got an isometric game and I'm trying to sort the objects. Each object has a rectangular, axis-aligned footprint in the world, which means that it appears as a sort of diamond shape from the perspective of the camera. The height of the objects is not known, so an object in front of another object is assumed to be capable of occluding the one in the back, and therefore must be drawn after (sorted later in the list than) the one in the back.
I also left off an important consideration, which is that a small number of objects (the avatars) move around.
Edit #2: A and B aren't strictly ambiguous, because in each case they have a definite relationship to each other. However, that relationship cannot be known by looking at the variables of A and B themselves. Only when we look at C as well can we know what order they go in.
I definitely think topological sort is the way to go, so I'm considering the question answered, but I'm happy to make clarifications to the question for posterity.
You may want to look at sorts for partial orders.
https://math.stackexchange.com/questions/55891/algorithm-to-sort-based-on-a-partial-order links to the right papers on this topic (specifically topological sorting).
Edit:
Given the definition of the nodes:
You need to make sure objects can never cause mutual occlusion. Take a look at the following grid, where the camera is in the bottom left corner.
______
____X_
_YYYX_
_YXXX_
_Y____
As you can see, parts of Y are hidden by X and parts of X are hidden by Y. Any drawing order will cause weird rendering. This can be solved in many ways, the simplest being to only allow convex, hole-free shapes as renderable primitives. Anything concave needs to be broken into chunks.
If you do this, you can then turn your partial order into a total order. Here's an example:
def compare(a, b):
    if a.max_x < b.min_x:      # a's footprint ends before b's begins along x
        return -1
    elif a.max_y < b.min_y:    # a's footprint ends before b's begins along y
        return -1
    elif b.max_x < a.min_x:
        return 1
    elif b.max_y < a.min_y:
        return 1
    else:
        # The bounding boxes intersect.
        # If the objects are both rectangular, this should be impossible.
        # If you allow non-rectangular convex shapes, like circles,
        # you may need to do something fancier here.
        raise NotImplementedError("I only handle non-intersecting bounding boxes")
And use any old sorting algorithm to give you your drawing order.
You should first build a directed graph; using that graph, you will be able to find the relationships by DFS-ing from each node.
Once you have the relationships, some pairs might still be ambiguous. In that case, look for partial sorting.
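To make that concrete, here is a minimal sketch of the graph approach using Kahn's topological sort. occludes(a, b) is a placeholder for whatever geometric test decides that a must be drawn before b; it is not from the original answer:

from collections import defaultdict, deque

def draw_order(objects, occludes):
    """Kahn's algorithm over 'must draw before' edges; returns a
    back-to-front drawing order, raising on cycles (mutual occlusion)."""
    succ = defaultdict(list)
    indeg = {o: 0 for o in objects}
    for a in objects:
        for b in objects:
            if a is not b and occludes(a, b):  # a must be drawn before b
                succ[a].append(b)
                indeg[b] += 1
    queue = deque(o for o in objects if indeg[o] == 0)
    order = []
    while queue:
        cur = queue.popleft()
        order.append(cur)
        for nxt in succ[cur]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(objects):
        raise ValueError("cycle: mutual occlusion, split an object up")
    return order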
The problem
I have multiple groups which specify the relationships of symbols. For example:
[A B C]
[A D E]
[X Y Z]
What these groups mean is that (for the first group) the symbols A, B, and C are related to each other; (for the second group) the symbols A, D, and E are related to each other; and so forth.
Given all these data, I would need to put all the unique symbols into a 1-dimensional array wherein the symbols which are somehow related to each other are placed closer together. Given the example above, the result should be something like:
[B C A D E X Y Z]
or
[X Y Z D E A B C]
In this resulting array, since the symbol A has multiple relationships (namely with B and C in one group and with D and E in another) it's now located between those symbols, somewhat preserving the relationship.
Note that the order is not important. In the result, X Y Z can be placed first or last since those symbols are not related to any other symbols. However, the closeness of the related symbols is what's important.
What I need help in
I need help determining an algorithm that takes groups of symbol relationships and outputs the 1-dimensional array using the logic above. I'm pulling my hair out on how to do this, since with real data the number of symbols in a relationship group can vary, there is no limit to the number of relationship groups, and a symbol can have relationships with any other symbol.
Further example
To further illustrate the trickiness of my dilemma, suppose you add another relationship group to the example above. Let's say:
[C Z]
The result now should be something like:
[X Y Z C B A D E]
Notice that the symbols Z and C are now closer together since their relationship was reinforced by the additional data. All previous relationships are still retained in the result also.
The first thing you need to do is to precisely define the result you want.
You do this by defining how good a result is, so that you know which is the best one. Mathematically you do this by a cost function. In this case one would typically choose the sum of the distances between related elements, the sum of the squares of these distances, or the maximal distance. Then a list with a small value of the cost function is the desired result.
It is not clear whether in this case it is feasible to compute the best solution by some special method (maybe if you choose the maximal distance or the sum of the distances as the cost function).
In any case it should be easy to find a good approximation by standard methods.
A simple greedy approach would be to insert each element in the position where the resulting cost function for the whole list is minimal.
Once you have a good starting point you can try to improve it further by modifying the list towards better solutions, for example by swapping elements or rotating parts of the list (local search, hill climbing, simulated annealing, other).
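A minimal sketch of that greedy step, using the sum of distances between related elements as the cost function (the function names are mine, and this is only the starting point, not the local search):

from itertools import combinations

def cost(order, pairs):
    """Sum of distances between related symbols that are already placed."""
    pos = {s: i for i, s in enumerate(order)}
    return sum(abs(pos[a] - pos[b]) for a, b in pairs
               if a in pos and b in pos)

def greedy_order(groups):
    pairs = [p for g in groups for p in combinations(g, 2)]
    order = []
    for s in sorted({s for g in groups for s in g}):
        # Try every insertion point and keep the cheapest whole list.
        order = min((order[:i] + [s] + order[i:]
                     for i in range(len(order) + 1)),
                    key=lambda o: cost(o, pairs))
    return order

print(greedy_order([list("ABC"), list("ADE"), list("XYZ")]))

From there, swapping pairs of elements and keeping any swap that lowers cost(order, pairs) gives a simple hill-climbing refinement.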
I think that with large amounts of data and a lack of additional criteria, it's going to be very difficult to find the best option. Have you considered a greedy algorithm (constructing your solution incrementally in a way that gives you something close to the ideal solution)? Here's my idea:
Sort your sets of related symbols by size, and start with the largest one. Keep those all together, because without any other criteria, we might as well say their proximity is the most important since it's the biggest set. Consider every symbol in that first set an "endpoint", an endpoint being a symbol you can rearrange and put at either end of your array without damaging your proximity rule (everything in the first set is an endpoint initially because they can be rearranged in any way). Then go through your list and as soon as one set has one or more symbols in common with the first set, connect them appropriately. The symbols that you connected to each other are no longer considered endpoints, but everything else still is. Even if a bigger set only has one symbol in common, I'm going to guess that's better than smaller sets with more symbols in common, because this way, at least the bigger set stays together as opposed to possibly being split up if it was put in the array later than smaller sets.
I would go on like this, updating the list of endpoints that existed so that you could continue making matches as you went through your set. I would keep track of if I stopped making matches, and in that case, I'd just go to the top of the list and just tack on the next biggest, unmatched set (doesn't matter if there are no more matches to be made, so go with the most valuable/biggest association). Ditch the old endpoints, since they have no matches, and then all the symbols of the set you just tacked on are the new endpoints.
This may not have a good enough runtime, I'm not sure. But hopefully it gives you some ideas.
Edit: Obviously, as part of the algorithm, ditch duplicates (trivial).
The problem as described is essentially the problem of drawing a graph in one dimension.
Using the relationships, construct a graph. Treat the unique symbols as the vertices of the graph. Place an edge between any two vertices that co-occur in a relationship; more sophisticated would be to construct a weight based on the number of relationships in which the pair of symbols co-occur.
Algorithms for drawing graphs place well-connected vertices closer to one another, which is equivalent to placing related symbols near one another. Since only an ordering is needed, the symbols can just be ranked based on their positions in the drawing.
There are a lot of algorithms for drawing graphs. In this case, I'd go with Fiedler ordering, which orders the vertices using a particular eigenvector (the Fiedler vector) of the graph Laplacian. Fiedler ordering is straightforward, effective, and optimal in a well-defined mathematical sense.
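For illustration, a bare-bones NumPy sketch of Fiedler ordering on the co-occurrence graph described above (dense eigendecomposition, so only suitable for modest numbers of symbols):

import numpy as np

def fiedler_order(groups):
    symbols = sorted({s for g in groups for s in g})
    idx = {s: i for i, s in enumerate(symbols)}
    W = np.zeros((len(symbols), len(symbols)))
    for g in groups:
        for a in g:
            for b in g:
                if a != b:
                    W[idx[a], idx[b]] += 1.0  # weight by co-occurrence count
    L = np.diag(W.sum(axis=1)) - W            # graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]                   # the Fiedler vector
    return [s for _, s in sorted(zip(fiedler, symbols))]

print(fiedler_order([list("ABC"), list("ADE"), list("XYZ"), list("CZ")]))

One caveat: if the graph is disconnected, the second eigenvector mostly separates the components, so you may want to order each connected component separately and concatenate.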
It sounds like you want to do topological sorting: http://en.wikipedia.org/wiki/Topological_sorting
Regarding the initial ordering, it seems like you are trying to enforce some kind of stability condition, but it is not really clear to me what this should be from your question. Could you try to be a bit more precise in your description?
Background
Lego produces the X-Large Gray Baseplate, which is a large building plate that is 48 studs wide and 48 studs tall, resulting in a total area of 2304 studs. Being a Lego fanatic, I've modeled a few mosaic-style designs that can be put onto these baseplates and then perhaps hung on walls or in a display (see: Android, Dream Theater, The Galactic Empire, Pokemon).
The Challenge
My challenge is now to get the lowest cost to purchase these designs. Purchasing 2304 individual 1x1 plates can get expensive. Using BrickLink, essentially an eBay for Lego, I can find data to determine what the cheapest parts are for given colors. For example, a 1x4 plate at $0.10 (or $0.025 per stud) would be cheaper than a 6x6 plate at $2.16 (or $0.06 per stud). We can also determine a list of all possible plates that can be used to assemble an image:
1x1
1x2
1x3
1x4
1x6
1x8
1x10
1x12
2x2 corner!
2x2
2x3
2x4
2x6
2x8
2x10
2x12
2x16
4x4 corner!
4x4
4x6
4x8
4x10
4x12
6x6
6x8
6x10
6x12
6x14
6x16
6x24
8x8
8x11
8x16
16x16
The Problem
For this problem, let's assume that we have a list of all plates, their color(s), and a "weight" or cost for each plate. For the sake of simplicity, we can even remove the corner pieces, but that would be an interesting challenge to tackle. How would you find the cheapest components to create the 48x48 image? How would you find the solution that uses the fewest components (not necessarily the cheapest)? If we were to add corner pieces as allowable pieces, how would you account for them?
We can assume we have some master list that is obtained by querying BrickLink, getting the average price for a given brick in a given color, and adding that as an element in the list. So, there would be no black 16x16 plate simply because it is not made or for sale. The 16x16 Bright Green plate, however, would have a value of $3.74, going by the current available average price.
I hope that my write-up of the problem is succinct enough. It's something I've been thinking about for a few days now, and I'm curious as to what you guys think. I tagged it as "interview-questions" because it's challenging, not because I got it through an interview (though I think it'd be a fun question!).
EDIT
Here's a link to the 2x2 corner piece and to the 4x4 corner piece. The answer doesn't necessarily need to take into account color, but it should be expandable to cover that scenario. The scenario would be that not all plates are available in all colors, so imagine that we've got an array of elements that identify a plate, its color, and the average cost of that plate (an example is below). Thanks to Benjamin for providing a bounty!
1x1|white|.07
1x1|yellow|.04
[...]
1x2|white|.05
1x2|yellow|.04
[...]
This list would NOT have the entry:
8x8|yellow|imaginarydollaramount
This is because an 8x8 yellow plate does not exist. The list itself is trivial and should only be thought about as providing references for the solution; it does not impact the solution itself.
EDIT2
Changed some wording for clarity.
Karl's approach is basically sound, but could use some more details. It will find the optimal cost solution, but will be too slow for certain inputs. Large open areas especially will have too many possibilities to search through naively.
Anyways, I made a quick implementation in C++ here: http://pastebin.com/S6FpuBMc
It fills in the empty space (the periods) with four kinds of bricks (rotations are listed as separate entries):
0: 1x1 cost = 1000
1: 1x2 cost = 150
2: 2x1 cost = 150
3: 1x3 cost = 250
4: 3x1 cost = 250
5: 3x3 cost = 1
.......... 1112222221
...#####.. 111#####11
..#....#.. 11#2222#13
..####.#.. 11####1#13
..#....#.. 22#1221#13
.......... 1221122555
..##..#... --> 11##11#555
..#.#.#... 11#1#1#555
..#..##... 11#11##221
.......... 1122112211
......#..# 122221#11#
...####.#. 555####1#0
...#..##.. 555#22##22
...####... 555####444 total cost = 7352
So, the algorithm fills in a given area. It is recursive (DFS):
def find_best_cost_to_fill(board):
    # Sketch of the recursive DFS; next_empty, fits, place, remove and
    # PIECES are placeholders for your board representation.
    pos = next_empty(board)            # find the next empty square
    if pos is None:                    # no empty square: area is filled
        return 0
    best = float('inf')
    for piece in PIECES:               # for each piece type available
        if fits(board, piece, pos):    # legal with upper-left corner on pos?
            place(board, piece, pos)
            total = piece.cost + find_best_cost_to_fill(board)
            remove(board, piece, pos)
            best = min(best, total)
    return best                        # the cheapest "total cost" found
Once we figure out the cheapest way to fill a sub-area, we'll cache the result. To identify a sub-area very efficiently, we'll hash it into a 64-bit integer using Zobrist hashing. Warning: hash collisions may cause incorrect results. Once our routine returns, we can reconstruct the optimal solution from our cached values.
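A minimal sketch of the Zobrist part (board dimensions match the example above; everything else is illustrative):

import random

H, W = 14, 10                     # the example board is 14 rows x 10 columns
random.seed(42)
KEYS = [[random.getrandbits(64) for _ in range(W)] for _ in range(H)]

def area_hash(filled):
    """filled: set of (row, col) cells already covered by a piece."""
    h = 0
    for r, c in filled:
        h ^= KEYS[r][c]
    return h

In the search itself you would not recompute this from scratch: placing or removing a piece just XORs the keys of the cells it covers into a running hash, so maintaining the cache key is O(piece area).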
Optimizing:
In the example, 41936 nodes (recursive calls) are explored (searching for empty square top-to-bottom). However, if we search for empty squares left-to-right, ~900,000 nodes are explored.
For large open areas: I'd suggest finding the most cost-efficient piece and filling in a lot of the open area with that piece as a pre-process step. Another technique is to divide your image into a few regions, and optimize each region separately.
Good luck! I'll be unavailable until March 26th, so hopefully I didn't miss anything!
Steps
Step 1: Iterate through all solutions.
Step 2: Find the cheapest solution.
Create pieces inventory
For an array of possible pieces (including single pieces of each color), make at least n duplicates of each piece, where n = max over colors of (number of board studs in that color / number of studs in the piece). At most n copies of that piece are then enough to cover the board's entire area in any one color.
Now we have a huge collection of possible pieces, bounded because it is guaranteed that a subset of this collection will completely fill the board.
Then it becomes a subset-selection (exact cover) problem, which is NP-complete.
Solving the subset problem
For each unused piece in the set:
    For each possible rotation (e.g. 1 for a square, 2 for a rectangular piece, 4 for an elbow piece):
        For each possible position in the *remaining* open places on the board matching the color and rotation of the piece:
            - Put down the piece
            - Mark the piece as used in the set
            - Recursively descend on the board (with some pieces already placed)
Optimizations
Obviously, being an O(2^n) algorithm, early pruning of the search tree is of utmost importance; it must be done to avoid impractically long runs. n is a very large number: just consider a 48x48 board -- you have 48x48xc (where c = number of colors) possibilities for single pieces alone.
Therefore, 99% of the search tree must be pruned from the first few hundred plies in order for this algorithm to complete in any time. For example, keep a tally of the lowest cost solution found so far, and just stop searching all lower plies and backtrack whenever the current cost plus (the number of empty board positions x lowest average cost for each color) > current lowest cost solution.
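That bound test is one line; a sketch with placeholder names:

def should_prune(current_cost, empty_studs, cheapest_cost_per_stud, best_so_far):
    # Even filling all remaining studs at the cheapest possible per-stud
    # rate cannot beat the best complete solution found so far.
    return current_cost + empty_studs * cheapest_cost_per_stud >= best_so_far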
For example, further optimize by always favoring the largest pieces (or the lowest average-cost pieces) first, so as to reduce the baseline lowest cost solution as quickly as possible and to prune as many future cases as possible.
Finding the cheapest
Calculate cost of each solution, find the cheapest!
Comments
This algorithm is generic. It does not assume a piece is of the same color (you can have multi-colored pieces!). It does not assume that a large piece is cheaper than the sum of smaller pieces. It doesn't really assume anything.
If some assumptions can be made, then this information can be used to further prune the search tree as early as possible. For example, when using only single-colored pieces, you can prune large sections of the board (with the wrong colors) and prune large number of pieces in the set (of the wrong color).
Suggestion
Do not try to do 48x48 at once. Try it on something small, say, 8x8, with a reasonably small set of pieces. Then increase number of pieces and board size progressively. I really have no idea how long the program will take -- but would love for somebody to tell me!
First, use flood fill to break the problem up into filling contiguous regions of Lego bricks. Then, for each region, you can use a DFS with memoization if you wish. The flood fill is trivial, so I will not describe it further (though a sketch is below).
Make sure to follow a right-hand rule while expanding the search tree so as not to repeat states.
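For completeness, a standard BFS flood fill that splits the empty cells into 4-connected regions (maps represented as lists of strings, as in the earlier example):

from collections import deque

def regions(board, empty='.'):
    """Split the empty cells of `board` into 4-connected regions."""
    seen, found = set(), []
    for r0, row in enumerate(board):
        for c0, ch in enumerate(row):
            if ch != empty or (r0, c0) in seen:
                continue
            region, queue = [], deque([(r0, c0)])
            seen.add((r0, c0))
            while queue:
                r, c = queue.popleft()
                region.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < len(board) and 0 <= nc < len(board[nr])
                            and board[nr][nc] == empty and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            found.append(region)
    return found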
My solution will be:
Sort all the pieces by stud cost.
For each piece in the sorted list, try to place as many copies as you can on the plate:
Raster-scan a 2D image of your design, looking for regions of the image with uniform color, the shape of the current piece, and a free stud at each stud the piece would use.
If the color of the region found does not exist for that particular piece, ignore it and continue searching.
If the color exists: tag the studs used by that piece and increment a counter for that kind of piece and that color.
Step 2 will be done once for square pieces, twice for rectangular pieces (once vertical and once horizontal) and 4 times for corner pieces.
Iterate on step 2 until the plate is full or no more piece types are available.
Once you arrive at the end, you will have the number of pieces of each kind and each color that you need, at minimal cost.
If the cost per stud can vary by color, then the original sorted list must include not only the type of piece but also the color.
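A rough sketch of the scan for one rectangular piece shape (the design as a 2D list of colors; the two rotations would call this with h and w swapped, and corner pieces are left out; all names are mine):

def place_all(design, used, h, w, color, counts):
    """Greedily claim every free h x w region of `design` that is
    uniformly `color`, marking its studs in `used`."""
    rows, cols = len(design), len(design[0])
    for r in range(rows - h + 1):
        for c in range(cols - w + 1):
            cells = [(r + i, c + j) for i in range(h) for j in range(w)]
            if all(design[i][j] == color and not used[i][j] for i, j in cells):
                for i, j in cells:
                    used[i][j] = True
                counts[(h, w, color)] = counts.get((h, w, color), 0) + 1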
I'm displaying a small Google map on a web page using the Google Maps Static API.
I have a set of 15 co-ordinates, which I'd like to represent as points on the map.
Due to the map being fairly small (184 x 90 pixels) and the upper limit of 2000 characters on a Google Maps URL, I can't represent every point on the map.
So instead I'd like to generate a small list of co-ordinates that represents an average of the big list.
So instead of having 15 sets, I'd end up with 5 sets whose positions approximate the positions of the 15. Say there are 3 points in closer proximity to each other than to any other point on the map; those points would be collapsed into 1 point.
So I guess I'm looking for an algorithm that can do this.
Not asking anyone to spell out every step, but perhaps point me in the direction of a mathematical principle or general-purpose function for this kind of thing?
I'm sure a similar function is used in, say, graphics software, when pixellating an image.
(If I solve this I'll be sure to post my results.)
I recommend K-means clustering when you need to cluster N objects into a known number K < N of clusters, which seems to be your case. Note that one cluster may end up with a single outlier point and another with say 5 points very close to each other: that's OK, it will look closer to your original set than if you forced exactly 3 points into every cluster!-)
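If Python is an option, scikit-learn makes this a few lines (a sketch; `coords` stands in for your 15 co-ordinate pairs):

import numpy as np
from sklearn.cluster import KMeans

coords = np.random.uniform(size=(15, 2))    # stand-in for your 15 points
km = KMeans(n_clusters=5, random_state=0).fit(coords)
centers = km.cluster_centers_               # the 5 representative points

For a map this small, clustering raw lat/lng is fine; over larger extents you would want to project to a planar coordinate system first.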
If you are searching for such functions/classes, have a look at MarkerClusterer and MarkerManager utility classes. MarkerClusterer closely matches the described functionality, as seen in this demo.
In general, I think the area you need to search around in is "Vector Quantization". I've got an old book titled Vector Quantization and Signal Compression by Allen Gersho and Robert M. Gray which provides a bunch of examples.
From memory, the Lloyd Iteration was a good algorithm for this sort of thing. It can take the input set and reduce it to a fixed-size set of points. Basically: uniformly or randomly distribute your points around the space; map each of your inputs to the nearest quantized point; then compute the error (e.g. sum of distances or root-mean-squared error); then, for each output point, set it to the center of the set that maps to it. This will move the point and possibly even change the set that maps to it. Perform this iteratively until no changes are detected from one iteration to the next.
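From that description, a bare-bones Lloyd iteration in NumPy might look like this (random initialization from the inputs; only a sketch):

import numpy as np

def lloyd(points, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Map each input to its nearest quantized point.
        d = np.linalg.norm(points[:, None] - centers[None, :], axis=2)
        assign = d.argmin(axis=1)
        # Move each output point to the center of the set that maps to it.
        new = np.array([points[assign == j].mean(axis=0)
                        if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break                 # no change between iterations: done
        centers = new
    return centers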
Hope this helps.
Say I have two maps, each represented as a 2D array. Each map contains several distinct features (rocks, grass, plants, trees, etc.). I know the two maps are of the same general region but I would like to find out: 1.) if they overlap and 2.) if so, where does this overlap occur. Does anyone know of any algorithms which would help me do this?
[EDIT]
Each feature is contained entirely inside an array index. Although it is possible to discern (for example) a rock from a patch of grass, it is not possible to discern one rock from another (or one patch of grass from another).
When doing this in 1D, I would, for each index in the first collection (a string, really), try to find the largest match in the second collection. If the match goes to the end, I have an overlap (as in "action" and "ionbeam", which overlap on "ion").
match( A on B ):
    for each i in length(A):
        see if A[i..] matches B[0..]

if no match found: do the same for B on A.
For 2D, you do the same thing, basically: find an 'edge' of A that overlaps with the opposite edge of B. Only the edges aren't 1D, but 2D:
for each point xa, ya in A:
    find a row yb in B that has a match( A[ya] on B[yb] )
    see if A[ya..] matches B[yb..]
You need to do this for the 2 diagonals, in each direction.
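A sketch of one of the four corner cases, with maps as lists of strings; in practice you would also require a minimum overlap area to avoid trivial matches:

def overlap_offset(A, B):
    """Try every placement of B's top-left corner inside A and accept
    it if all overlapping cells agree. Returns (dr, dc) or None."""
    for dr in range(len(A)):
        for dc in range(len(A[0])):
            rows = range(dr, min(len(A), dr + len(B)))
            cols = range(dc, min(len(A[0]), dc + len(B[0])))
            if all(A[r][c] == B[r - dr][c - dc] for r in rows for c in cols):
                return dr, dc
    return None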
For one map, go through each feature and find the nearest other feature to it. Record these in a list, storing the type of each of the two features and the dx dy between them. Store in a hash table or sorted list. These are now location invariant, since they record only relative distances.
Now for your second map, start doing the same: pick any feature, find its closest neighbor, find the delta. Look for the same correspondence in the list from the first map. If the features are shared between the maps, you'll find it in the list, and you now know one correspondence between the maps. Repeat for more features if necessary. The results will give you a decent answer as to whether the maps overlap, and if so, at what offset.
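A sketch of the idea with features as (type, x, y) tuples; nearest neighbors are found by brute force here, which is fine for small maps (a k-d tree helps for big ones):

def signatures(features):
    """Map (type, neighbor type, dx, dy) -> feature, using each
    feature's nearest other feature. Assumes at least two features."""
    sigs = {}
    for t, x, y in features:
        nt, nx, ny = min((g for g in features if g != (t, x, y)),
                         key=lambda g: (g[1] - x) ** 2 + (g[2] - y) ** 2)
        sigs[(t, nt, nx - x, ny - y)] = (t, x, y)
    return sigs

def map_offset(map1, map2):
    """Return a (dx, dy) taking map1 co-ordinates to map2, or None."""
    s1 = signatures(map1)
    for sig, (t, x, y) in signatures(map2).items():
        if sig in s1:
            return x - s1[sig][1], y - s1[sig][2]
    return None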
Sounds like image registration (wikipedia), finding a transformation (translation only, in your case) which can align two images. There's a pile of software that does this sort of thing linked off the wikipedia page.