highest tower of cubes (with numbers on sides) - algorithm

Problem:
There are N cubes and there are M numbers. Each side of a cube has a number from 1 to M. You can stack one cube on another if their touching sides have the same number (the top side of the bottom cube and the bottom side of the top cube show the same number). Find the highest tower of cubes.
Input: number N of cubes and number M.
Example:
INPUT: N=5, M=6. We generate 5 random cubes, each with 6 sides whose values are in <1, M>:
[2, 4, 3, 1, 4, 1]
[5, 1, 6, 6, 2, 5]
[2, 5, 3, 1, 1, 6]
[3, 5, 6, 1, 3, 4]
[2, 4, 4, 5, 5, 5]
How you interpret a single array of 6 numbers is up to you. Opposite sides of a cube might be at indices index and 5-index (for the first cube, the side opposite the 4 at index 1 would be the 4 at index 4). Opposite sides might also be at indices index and index+1 (when index%2==0) or index and index-1 (when index%2==1). I used the second interpretation.
Now let's say the first cube is our current tower. Depending on its rotation, the top number might be one of 1, 2, 3, 4. If 1 is on top, we can stack the second, third or fourth cube on top of it: all of them have a side with number 1. The third cube even has two sides with number 1, so we can stack it in two different ways.
I won't analyse it till the end because this post would be too long. The final answer for these cubes (the maximum height of the tower) is 5.
My current solution (you can SKIP this part):
Right now I'm building the tower recursively. Each call solves this subproblem: find the highest tower given the top number of the current tower and the currently unused cubes (or, equivalently, the used cubes). This way I can memoize and store results keyed by the tuple (top number of tower, set of used cubes). Despite memoization, I think that in the worst case (for small M) this solution has to store (and solve) about M*(2^N) states.
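For concreteness, here is a minimal Python sketch of that memoized recursion (my assumption: opposite faces are paired as (0,1), (2,3), (4,5), i.e. the index/index+1 interpretation above):

from functools import lru_cache

def tallest_tower(cubes):
    # cubes: list of 6-element lists; opposite faces are (0,1), (2,3), (4,5)
    n = len(cubes)

    @lru_cache(maxsize=None)
    def best(top_value, used_mask):
        # tallest extension when the tower's top face shows top_value and
        # used_mask marks the cubes already placed
        tallest = 0
        for i in range(n):
            if used_mask & (1 << i):
                continue
            for a in range(0, 6, 2):                      # the three face pairs
                lo, hi = cubes[i][a], cubes[i][a + 1]
                for bottom, top in ((lo, hi), (hi, lo)):  # both orientations
                    if bottom == top_value:
                        tallest = max(tallest,
                                      1 + best(top, used_mask | (1 << i)))
        return tallest

    # the ground accepts any value, so try every face value as a starting point
    return max((best(v, 0) for cube in cubes for v in cube), default=0)

cubes = [[2, 4, 3, 1, 4, 1],
         [5, 1, 6, 6, 2, 5],
         [2, 5, 3, 1, 1, 6],
         [3, 5, 6, 1, 3, 4],
         [2, 4, 4, 5, 5, 5]]
print(tallest_tower(cubes))   # 5 for the example cubes above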
What I'm looking for:
I'm looking for something that would help me solve this efficiently for small M. I know that there is the tile stacking problem (which uses dynamic programming) and the tower of cubes problem (which uses DAG longest path), but I don't see how those solutions apply to my problem.

You won't find a polynomial time solution. If you did, you could solve the decision variant of the longest path problem (which is NP-complete) in polynomial time. The reduction is as follows: for every edge (u, v) in an undirected graph G, create a cube with u and v on opposing faces, where u and v are unique identifiers for the edge's endpoints. Assign globally unique identifiers to the remaining 4 faces. Solve for the tallest cube tower; its height is the length of the longest path in G, so you answer yes/no depending on whether that length equals the queried value.
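To make the reduction concrete, here is a small illustrative helper (not from the answer) that builds such a cube set, reusing the question's index/index+1 pairing so u and v land on opposite faces; every filler face gets a globally unique value:

from itertools import count

def cubes_from_graph(edges, num_vertices):
    # vertices are labelled 1..num_vertices; filler face values start above that
    fresh = count(num_vertices + 1)
    return [[u, v, next(fresh), next(fresh), next(fresh), next(fresh)]
            for u, v in edges]

The tallest tower buildable from these cubes then corresponds to the longest path of G, as described above, since a filler value never matches anything else.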
However, you could still solve it in something like O(M^3 * (N/2)! * log(N)) time (I think that bound is a bit loose, but it's close). Use divide and conquer with memoization. Find the longest towers that use cubes from [0, N), begin with a value B in [0, M) and end with a value E in [0, M), for all possible B and E. To compute this, recurse, partitioning the cubes evenly in every possible way, until you hit the bottom (a single cube). Then merge the results back up, combining cube stacks that end in X with those beginning with X, for all X in [0, M). Once that's all done, at the topmost level take the max over all tower heights.

Related

Given n rectangles' coordinates, find the area of the region where k rectangles intersect?

Given a list of rectangles [R1, R2, R3], each defined by its lower-left and upper-right coordinates [(x1, y1), (x2, y2)], and a value k.
Is there an optimal way to find the area where k rectangles overlap?
For example:
R1: [(1, 1), (5, 5)]
R2: [(4, 4), (7, 6)]
R3: [(3, 3), (8, 7)]
rectangles = [R1, R2, R3]
k = 2
The area which is overlapped by exactly two rectangles is 8.
A brute force way to solve this is to calculate the min and max of the x-axis and y-axis coordinates, use them to create a grid, and increment a counter for each cell covered by each rectangle. At the end, iterate over the grid and count the cells whose value is k.
This approach has a complexity of O(n^3), assuming each rectangle is of size n x n and there are n rectangles.
Is there a run time optimal way to approach this problem?
The usual way to analyze collections of rectangles is with a sweep line algorithm. Imagine a vertical line that starts at the left of the collection and scans to the right. Store a set of the rectangles that currently intersect the line, initially empty. This set needs to be updated when the line passes a vertical side of any rectangle: adding or removing a rectangle in each case. To make scanning efficient, use a sorted list of the x coordinates of the verticals.
In this case, you'll also need a way to efficiently determine the intervals of the scan line that are covered by k or more rectangles. That can be done efficiently by maintaining an interval tree.
Depending on details, efficiency ought to be roughly O(n log n) for n rectangles with maybe an additional term for the maximum overlap depth. I'll let you work out the details.
Insert the rectangles into a data structure where they are sorted by their bottom coordinate y1. Using e.g. a self-balancing binary search tree, this has complexity O(N log N) and allows traversing the tree in order in O(N), with N being the number of rectangles. In the example that would be:
[R1, R3, R2]
While inserting the rectangles into the tree, also keep a sorted list of all the unique bottom and top coordinates y1 and y2. In the example that would be:
[1, 3, 4, 5, 6, 7]
Now we will treat each horizontal slice between two consecutive y-coordinates as a 1-dimensional problem (similar to the first method in this answer).
Iterate from the start of the rectangle tree over all the rectangles that fall in this slice (remember the rectangles are sorted by y1, so they are grouped together at the beginning), and make a sorted list of their unique x-coordinates, with a value for each: add 1 to the value when the coordinate is a rectangle's left edge, and subtract 1 when it is a right edge. If you encounter rectangles whose top coordinate equals the slice's top coordinate, remove them from the rectangle tree, which can be done in O(1). For the first slice in the example, with y=1~3 and height 2, that would be:
[1: +1, 5: -1]
If we iterate over it, we find a zone of width 4 (and thus area 8) that is part of 1 rectangle.
For the second slice in the example, with y=3~4 and height 1, that would be:
[1: +1, 3: +1, 5: -1, 8: -1]
If we iterate over it, we find a zone of width 2 (and thus area 2) that is part of 1 rectangle, a zone of width 2 (and thus area 2) that is part of 2 rectangles, and a zone of width 3 (and thus area 3) that is part of 1 rectangle. So any area that is part of k rectangles is added to a total. And so on.
Creating the rectangle tree is O(N log N), creating the slice list is O(N log N), iterating over the slices is O(N), and within each slice creating the sorted x-coordinate list is O(N log N), for a total of O(N^2 log N), independent of how large the rectangles are, how large the total area is, and how much overlap there is between rectangles or clusters of rectangles.
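Here is a rough Python sketch of this slice-based approach (simplified: it rescans all rectangles for every slice instead of maintaining the tree, which still fits the O(N^2 log N) bound; change the depth test to >= k if "k or more" overlaps are wanted):

def area_covered_by_k(rects, k):
    # rects: list of ((x1, y1), (x2, y2)) with x1 < x2 and y1 < y2
    ys = sorted({y for (_, y1), (_, y2) in rects for y in (y1, y2)})
    total = 0
    for y_lo, y_hi in zip(ys, ys[1:]):                  # one horizontal slice
        height = y_hi - y_lo
        events = []
        for (x1, y1), (x2, y2) in rects:
            if y1 <= y_lo and y2 >= y_hi:               # rectangle spans the slice
                events.append((x1, +1))
                events.append((x2, -1))
        events.sort()
        depth, prev_x = 0, None
        for x, delta in events:
            if depth == k and prev_x is not None:       # zone covered by exactly k
                total += (x - prev_x) * height
            depth += delta
            prev_x = x
    return total

rects = [((1, 1), (5, 5)), ((4, 4), (7, 6)), ((3, 3), (8, 7))]
print(area_covered_by_k(rects, 2))   # 8, matching the example above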

DFS Greedy Chromatic Number

In school I learned that calculating the chromatic number of an arbitrary graph is NP-complete.
I understand why the greedy algorithm does not work, but what about a DFS/greedy algorithm?
The main idea is to do a DFS and, for every vertex not yet colored, take the minimum color index not used by any of its neighbours.
I can't figure out a counter example and this question is blowing my mind.
Thanks for all of your answers.
Pseudocode
Chromatic(Vertex x){
    for each neighbour y of vertex x
        if color(y) = -1
            color(y) <- minimum color not used by any neighbour of y
            if (color(y) >= numColors) numColors++;
            Chromatic(y);
}

Main(){
    Set the color of every vertex to -1
    Take an arbitrary vertex u and set color(u) = 0
    numColors = 1;
    Chromatic(u);
    print numColors;
}
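For reference, a rough Python transcription of this pseudocode (reading "minimum color" as the smallest color index not used by any already-colored neighbour, which I believe is the intent):

def greedy_dfs_coloring(adj, start):
    # adj: dict mapping each vertex to a list of its neighbours
    color = {v: -1 for v in adj}
    color[start] = 0
    num_colors = 1

    def chromatic(x):
        nonlocal num_colors
        for y in adj[x]:
            if color[y] == -1:
                used = {color[z] for z in adj[y] if color[z] != -1}
                c = 0
                while c in used:          # smallest color not used by a neighbour
                    c += 1
                color[y] = c
                num_colors = max(num_colors, c + 1)
                chromatic(y)

    chromatic(start)
    return num_colors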
Here's a concrete counterexample: the Petersen graph. Your algorithm computes 4, regardless of where you start (I think), but the graph's chromatic number is 3.
The Petersen graph is a classical counterexample for many greedy attempts at graph problems, and also for conjectures in graph theory.
The answer is that sometimes you will have a vertex which has 2 colors available, and making the wrong choice will cause a problem an undetermined time later.
Suppose you have vertices 1 through 9. Draw them around a circle. Then add edges to make the following true.
1, 2, 3 form a triangle.
3 connects to 4.
4, 5, 6 make a triangle.
5, 6, 7 make a triangle.
6, 7, 8 make a triangle.
7, 8, 9 make a triangle.
8, 9, 1 make a triangle.
9, 1, 2 make a triangle.
It is easy to color this with 3 colors. But a depth-first greedy algorithm has a choice of 2 colors it can give to vertex 4. Make the wrong choice, and you'll wind up needing 4 colors, not 3.
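For anyone who wants to experiment, here is that construction written out as an adjacency list (just the edges already listed above, nothing new), e.g. to feed into a greedy DFS coloring:

edges = [(1, 2), (2, 3), (1, 3),     # 1, 2, 3 form a triangle
         (3, 4),                     # 3 connects to 4
         (4, 5), (5, 6), (4, 6),     # triangle 4, 5, 6
         (5, 7), (6, 7),             # triangle 5, 6, 7
         (7, 8), (6, 8),             # triangle 6, 7, 8
         (8, 9), (7, 9),             # triangle 7, 8, 9
         (9, 1), (8, 1),             # triangle 8, 9, 1
         (9, 2)]                     # triangle 9, 1, 2
adj = {v: [] for v in range(1, 10)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)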

Sort rectangles according to area in O(N)

Let R1,...Rn be n axis-aligned rectangles in the plane for which the corners are points in the n×n-grid. Thus, for each rectangle Ri the four corners are points where both coordinates are integers in {1,...n}.
Now I want to sort these rectangles R1,...,Rn by increasing area in O(n) time.
I have an algorithm that sorts them in O(n log n). But how can it be done in O(n)?
Using O(n*log n) we can do this :
Calculate all the areas, and then sort using any standard sorting algorithm like Quick sort
I guess some pre-processing will be required so that we can sort in O(n), because we are given some preconditions which can help. I just want the algorithm, no code required.
Since the keys (areas) of the rectangles are integers, the task can be completed in O(n) time using a counting sort. You know the minimum key is 0 and maximum key for the problem is n^2, so in the algorithm k=n^2+1. The algorithm completes in three passes: computing histogram, computing starting and ending indexes for each key, then copying the data to the output array, preserving order of inputs with equal keys so that the sort is stable. Each pass is O(n) so altogether the algorithm is O(n).
Example: Suppose n is 3. k is one more than the largest key that appears in the data, so that all keys fit in the range [0..k-1] inclusive, i.e., k is 10. You make a histogram h by setting up an array of 0s with index from 0 to k-1, and you fill the histogram by walking through your set of rectangles and just count them up. Say there are 2 with area 1, 5 with area 2, and 2 with area 4. h = [0, 2, 5, 0, 2, 0, 0, 0, 0, 0]. Then the starting indexes are immediately computed from the histogram as 0, 0, 2, 7, 7, 9, 9, 9, 9, 9. Any rectangle with area 0 goes into output array starting at 0. Any rectangle with area 1 goes into output array starting at 0 (and increment that number when you put a rectangle of area 1 into output). Any rectangle with area 2 goes into output array starting at 2. Any rectangle with area 3 goes into output array starting at 7. Any rectangle with area 4 goes into output array starting at 7.
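A minimal Python sketch of that counting sort (three passes, stable; keys are the integer areas in [0, k-1] with k = n^2 + 1):

def sort_rectangles_by_area(rects, n):
    # rects: list of ((x1, y1), (x2, y2)), lower-left and upper-right corners
    # with integer coordinates in {1, ..., n}
    k = n * n + 1
    areas = [(x2 - x1) * (y2 - y1) for (x1, y1), (x2, y2) in rects]

    hist = [0] * k                        # pass 1: histogram of the areas
    for a in areas:
        hist[a] += 1

    start = [0] * k                       # pass 2: starting index for each key
    for key in range(1, k):
        start[key] = start[key - 1] + hist[key - 1]

    out = [None] * len(rects)             # pass 3: stable placement
    for rect, a in zip(rects, areas):
        out[start[a]] = rect
        start[a] += 1
    return out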

Linear time algorithm for slicing stacked boxes

I have a problem which has a rather limited time constraint, and would like to see if I could get a nudge in the right direction.
Here is the problem:
You are presented with a wall with columns of different heights. Each column's height is represented by a non-zero integer.
The input state is defined using an array H of length N, containing the heights of each of the N columns on the screen, as in the example below.
Slicing the snapshot at a given height leaves a number of solid pieces above that height. E.g. slicing at level 2 would cut 3 solid pieces, and slicing at level 1 would also cut 3 solid pieces. Similarly, slicing at level 0 would return a single (one) solid piece, while slicing at level 3 wouldn't cut any pieces.
Requirement: Given an array of slice heights S of length M, containing all
levels at which a "slice" should be performed, return an array of length M containing numbers of cut pieces for each respective cut.
For example, given the input H = {2, 1, 3, 2, 3, 1, 1, 2} and S = { 0, 1, 2, 3 }, the program should return quantities {1, 3, 3, 0}, according to the examples above.
Both N and M are in the range of around 20,000, but heights in each array can reach up to 1,000,000.
Both the worst-case time and space complexity of the solution cannot exceed
O(N + M + max(H) + max(S)).
The last constraint is what puzzles me: it basically means that I cannot have any nested for-loops, and I cannot seem to escape this.
Obviously, there is some clever preprocessing which needs to be done to yield final results in O(1) per slice, but I haven't been able to come up with it.
I went on to create an array of cut numbers for each slice level, and then update all of them as I iterate through H, but this turns out to be O(N*M), since I need to update all lower height levels.
Is there a data structure which would be appropriate for this task?
For each piece at a height there must be two edges intersecting that height, and by the definition of a "piece", the members of the set of such edges must all be distinct. So the number of pieces is half the number of edges in the set.
Further, the number of edges intersecting a specific height is the number of edges that have started below or at that height, minus the number of edges that have finished below or at it.
As such, we can calculate the number of pieces this way:
Create an array Accumulator of size max(S)+1 (indices 0 through max(S)) filled with zeroes.
Iterate over H, and for each vertical edge you encounter, add a +1 at the index i in Accumulator corresponding to the height of the edge's lower end, and a -1 at the index j corresponding to the edge's upper end. We're counting "ground level" as zero. Make sure to include the leading and trailing edges!
Iterate over Accumulator and insert in each cell the sum of all cells up to and including it (O(max(S)) time if you keep a running sum)
Divide every value in Accumulator by 2 to get the number of pieces at each height.
Read out the number of pieces corresponding to the heights in S.
Example:
The edges' endpoints, from left to right, are
0,2
1,2
1,3
2,3
2,3
1,3
1,3
0,3
So our Accumulator array looks like this:
{2, 4, 0, -6}
Which after the accumulation step, looks like this:
{2, 6, 6, 0}
Which means the part counts are
{1, 3, 3, 0}
As a warning, I've just come up with this on the spot, so while it feels correct I've no proof whether it actually is. Encouragingly, it worked on a few other token examples I tried as well.
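Here is a small Python sketch of those steps (my assumptions: columns of height 0 sit on both sides of the wall, which supplies the leading and trailing edges, and edge tops above max(S) are simply ignored since those levels are never queried):

def count_pieces(H, S):
    top = max(S)
    acc = [0] * (top + 1)                        # Accumulator, indices 0..max(S)

    def add_edge(lo, hi):                        # one vertical edge from lo to hi
        if lo <= top:
            acc[lo] += 1
        if hi <= top:
            acc[hi] -= 1

    heights = [0] + list(H) + [0]                # ground level on both sides
    for a, b in zip(heights, heights[1:]):
        if a != b:
            add_edge(min(a, b), max(a, b))

    running = 0
    edges_at = [0] * (top + 1)
    for h in range(top + 1):                     # running sum = edges crossing level h
        running += acc[h]
        edges_at[h] = running

    return [edges_at[s] // 2 for s in S]         # every piece has two bounding edges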
Step 1: Maintain a 3-column list (height, index_start, index_end)
Step 2: Sort the list with height as the primary key (decreasing) and index_start as the secondary key
Step 3: Let h be the highest height
Step 4: Merge all columns with height at least h into contiguous blocks
Step 5: The resulting number of blocks is the number of pieces at height h
Step 6: h = h - 1
Step 7: Go to step 4
This is the rough algorithm. Effectively the complexity is O(n log n).

Optimization problem - finding a maximum

I have a problem at hand which can be reduced to something like this :
Assume a bunch of random points in a two-dimensional X-Y plane where, for each Y, there could be multiple points on X, and for each X, there could be multiple points on Y.
Whenever a point (Xi, Yi) is chosen, no other point with X = Xi OR Y = Yi can be chosen. We have to choose the maximum number of points.
This can be reduced to a simple maximum flow problem. If you have a point (xi, yi), it is represented in the graph by a path from the source S to node xi, from xi to yi, and from yi to the last node (the sink) T.
Note that if we have points (2, 2) and (2, 5), there's still only one edge from S to x2. All edges have capacity 1.
The maximum flow in this network is the answer.
About the general problem:
http://en.wikipedia.org/wiki/Max_flow
Update:
I don't have a graphics editor at hand to visualise the problem, but you can easily draw the example by hand. Let's say the points are (3, 3), (3, 5), (2, 5).
Then edges (paths) would be
S -> x2, S -> x3
y3 -> T, y5 -> T
x3 -> y3, x3 -> y5, x2 -> y5
Flow: S -> x2 -> y5 -> T and S -> x3 -> y3 -> T
The amount of 'water' going from source to sink is 2 and so is the answer.
Also there's a tutorial about max flow algorithms
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=maxFlow
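A quick sketch of this construction using networkx's max-flow (picking networkx is my choice here; any max-flow implementation works, and all capacities are 1):

import networkx as nx

def max_points(points):
    G = nx.DiGraph()
    for x, y in points:
        G.add_edge('S', ('x', x), capacity=1)        # source -> column node
        G.add_edge(('x', x), ('y', y), capacity=1)   # column -> row node
        G.add_edge(('y', y), 'T', capacity=1)        # row node -> sink
    flow_value, _ = nx.maximum_flow(G, 'S', 'T')
    return flow_value

print(max_points([(3, 3), (3, 5), (2, 5)]))   # 2, as in the example above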
Isn't this just the Hungarian algorithm?
Create an n×n matrix, with 0 at marked vertices, and 1 at unmarked vertices. The algorithm will choose n vertices, one for each row and column, which minimizes their sum. Simply count all the chosen vertices which equal 0, and you have your answer.
from munkres import Munkres

matrix = [[0, 0, 1],
          [0, 1, 1],
          [1, 0, 0]]

m = Munkres()
total = 0
for row, column in m.compute(matrix):
    if matrix[row][column] == 0:
        print '(%i, %i)' % (row, column)
        total += 1
print 'Total: %i' % total
This runs in O(n^3) time, where n is the number of rows in the matrix. The maximum flow solution runs in O(V^3), where V is the number of vertices. As long as there are more than n chosen intersections, this runs faster; in fact, it runs orders of magnitude faster as the number of chosen vertices goes up.
Different solution. It turns out that there's a lot of symmetry, and the answer is a lot simpler than I originally thought. The maximum number of things you can ever do is the minimum of the number of unique X's and the number of unique Y's, which is O(N log N) if you just want the result.
Every other shape is equivalent to a rectangle that contains the points, because it doesn't matter how many points you pull from the center of a rectangle, the order will never matter (if handled as below). Any shape that you pluck a point from now has one less unique X and one less unique Y, just like a rectangle.
So the optimal solution has nothing to do with connectedness. Pick any point that is on the edge of the smallest dimension (i.e. if len(unique-Xs)>len(unique-Ys), pick anything that has either maximum or minimum X). It doesn't matter how many connections it has, just which dimension is biggest, which can easily be done while looking at the sorted-unique lists created above. If you keep a unique-x and unique-y counter and decrement them when you delete all the unique nodes in that element of the list, then each deletion is O(1) and recalculating the lengths is O(1). So repeating this N times is at worst O(N), and the final complexity is O(NlogN) (due solely to the sorting).
You can pick any point along the shortest edge because:
if there's only one on that edge, you better pick it now or something else will eliminate it
if there's more than one on that edge, who cares, you will eliminate all of them with your pick anyways
Basically, you're maximizing "max(uniqX,uniqY)" at each point.
Update: IVlad caught an edge case:
If the dimensions are equal, take the edge with the least points. Even if they aren't equal, take the top or bottom of the unique-stack you're eliminating from that has the least points.
Case in point:
Turn 1:
Points: (1, 2); (3, 5); (10, 5); (10, 2); (10, 3)
There are 3 unique X's: 1, 3, 10
There are 3 unique Y's: 2, 3, 5
The "bounding box" is (1,5),(10,5),(10,2),(1,2)
Reaction 1:
The "outer edge" (outermost uniqueX or uniqueY lists of points) that has the least points is the left. Basically, look at the sets of points in x=1,x=10 and y=2,y=5. The set for x=1 is the smallest: one point. Pick the only point for x=1 -> (1,2).
That also eliminates (10,2).
Turn 2:
Points: (3, 5); (10, 5); (10, 3)
There are 2 unique X's: 3, 10
There are 2 unique Y's: 3, 5
The "bounding box" is (3,5),(10,5),(10,3),(3,3)
Reaction 2:
The "edge" of the bounding box that has the least is either the bottom or the left. We reached the trivial case - 4 points means all edges are the outer edges. Eliminate one. Say (10,3).
That also eliminates (10,5).
Turn 3:
Points: (3, 5)
Reaction 3:
Remove (3,5).
For each point, identify the number N of other points that would be disqualified by the selection of that point (i.e. the ones with the same X or Y values). Then, iterate over the non-disqualified points in order of increasing N. When you are finished, you will have chosen the maximum number of points.
The XY plane is a red herring. Phrase it as a set of elements, each of which has a set of mutually exclusive elements.
The algorithm then becomes a depth-first search. At each level, for each candidate node, calculate the set of excluded elements, the union of currently excluded elements with the elements excluded by the candidate node. Try candidate nodes in order of fewest excluded elements to most. Keep track of the best solution so far (the fewest excluded nodes). Prune any subtrees that are worse than the current best.
As a slight improvement at the cost of possible missed solutions, you can use Bloom filters for keeping track of the excluded sets.
This looks like a problem that can be solved with dynamic programming. Look into the algorithms for longest common substring, or the knapsack problem.
Based on a recommendation from IVlad, I looked into the Hopcroft–Karp algorithm. It's generally better than both the maximum flow algorithm and the Hungarian algorithm for this problem, often significantly. Some comparisons:
In general:
Max Flow: O(V^3), where V is the number of vertices.
Hungarian: O(n^3), where n is the number of rows in the matrix.
Hopcroft-Karp: O(V √(2V)), where V is the number of vertices.
For a 50×50 matrix, with 50% chosen vertices:
Max Flow: 1,250^3 = 1,953,125,000
Hungarian: 50^3 = 125,000
Hopcroft-Karp: 1,250 × √2,500 = 62,500
For a 1000×1000 matrix, with 10 chosen vertices:
Max Flow: 10^3 = 1,000
Hungarian: 1,000^3 = 1,000,000,000
Hopcroft-Karp: 10 × √20 ≅ 44.7
The only time the Hungarian algorithm is better is when there is a significantly high proportion of points selected.
For a 100×100 matrix, with 90% chosen vertices:
Max Flow: 9,000^3 = 729,000,000,000
Hungarian: 100^3 = 1,000,000
Hopcroft-Karp: 9,000 × √18,000 ≅ 1,207,476.7
The Max Flow algorithm is never better.
It's also quite simple, in practice. This code uses an implementation by David Eppstein:
points = {
    0: [0, 1],
    1: [0],
    2: [1, 2],
}

selected = bipartiteMatch(points)[0]

for x, y in selected.iteritems():
    print '(%i, %i)' % (x, y)
print 'Total: %i' % len(selected)
