Algorithm - How to pair blocks efficiently

Let's say we have two 3x3 blocks with a different height in each cell; each cell value represents the height of that cell. For example, in block-1 below, cell (1,1) has a height of 1, cell (1,2) has a height of 2, and so on.
block-1,
1 2 3
1 3 2
3 1 2
block-2,
4 3 2
4 2 3
2 4 3
Given two such blocks, how do we check efficiently whether they can be connected in such a way that no cell is mismatched and the two blocks together produce a cuboid?
For example, block-1 and block-2 above can be connected, and the resulting block is a perfect cuboid of height 5:
5 5 5
5 5 5
5 5 5
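In other words, two blocks connect exactly when corresponding cells always sum to the same total height. A minimal check in Python (a sketch of mine, not from the question):

def can_connect(a, b):
    # the blocks form a cuboid iff a[i][j] + b[i][j] is the same for all cells
    total = a[0][0] + b[0][0]
    return all(a[i][j] + b[i][j] == total
               for i in range(len(a)) for j in range(len(a[0])))

block1 = [[1, 2, 3], [1, 3, 2], [3, 1, 2]]
block2 = [[4, 3, 2], [4, 2, 3], [2, 4, 3]]
print(can_connect(block1, block2))  # True: every cell pair sums to 5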
Extension of the problem: given a set (size >= 50K) of such 4x4 blocks, how do we pair up blocks to maximize the total height of the resulting cuboids? Only matched pairs contribute their full height to the total; unmatched blocks are ignored. Each cell height can be up to 20 units.
Further extension of the problem: blocks may also be rotated to pair with other blocks, again maximizing the total height of the resulting cuboids.
Any clue?

You could solve the problem in two steps: (1) find all pairs of blocks that connect (build a cuboid), and (2) find the best pairing that maximizes the total height.
Find connecting pairs
For this I would (a) build a surface representation for each block, (b) hash the blocks by their surface representation, and (c) for each block, search for all connecting blocks by looking up the connecting surface model.
(a) Building the surface model
The basic idea is to represent each block by its surface. For this you just subtract the minimum entry of the matrix from every entry:
The surface representation of block-1 will be

1 2 3        0 1 2
1 3 2  -->   0 2 1
3 1 2        2 0 1
and the surface representation of block-2 will be

4 3 2        2 1 0
4 2 3  -->   2 0 1
2 4 3        0 2 1
(b) hash the blocks
Now you hash the blocks by their surface representation
(c) Finding connecting pairs
For each block you then compute the connecting surface model by taking the maximum value of its surface representation and subtracting each entry from it. For block-1 this will yield

    0 1 2      2 1 0
2 - 0 2 1  =   2 0 1
    2 0 1      0 2 1
The blocks with this surface representation can be found using the hash table (note that the surface representation of block-2 matches).
Note: when you allow for rotation then you will have to perform 4 queries on the hash table with all possible rotations.
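A minimal sketch of steps (a)-(c) in Python (the function names are my own; surfaces are stored as tuples so they can serve as hash keys):

from collections import defaultdict

def surface(block):
    # (a) subtract the minimum entry, leaving only the surface relief
    lo = min(v for row in block for v in row)
    return tuple(tuple(v - lo for v in row) for row in block)

def connecting_surface(surf):
    # (c) the surface a matching block must have: max entry minus each entry
    hi = max(v for row in surf for v in row)
    return tuple(tuple(hi - v for v in row) for row in surf)

def rotate(surf):
    # rotate a surface by 90 degrees, for the rotation extension
    return tuple(zip(*surf[::-1]))

blocks = [
    ((1, 2, 3), (1, 3, 2), (3, 1, 2)),  # block-1
    ((4, 3, 2), (4, 2, 3), (2, 4, 3)),  # block-2
]

# (b) hash all blocks by their surface representation
by_surface = defaultdict(list)
for i, b in enumerate(blocks):
    by_surface[surface(b)].append(i)

# find all partners of block-1; with rotation allowed, query all 4 rotations
target = connecting_surface(surface(blocks[0]))
for _ in range(4):
    print(by_surface.get(target, []))  # the first query prints [1]: block-2 matches
    target = rotate(target)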
Finding the best pairing
To find the best pairing (maximizing the total height of the connected blocks) I would use the Hungarian algorithm. In order to do this you have to build a matrix where the entry (i, j) contains the total height of the resulting cuboid when blocks i and j connect, and 0 otherwise.
Edit
I think the second step (finding the best pairing) can be done more efficiently, by connecting pairs of matching blocks greedily (pairs resulting in the highest cuboids first).
The intuition for this is: when two blocks a and b have the same surface model, they will either both connect to another block c or both fail to connect to c. With this in mind, after the "find connecting pairs" step you end up with pairs of groups of blocks (Xi, Yi) where each block of Xi connects to each block of Yi. If the two groups Xi and Yi are of equal size, then we can connect them in any way we want and will always get the same total height of the resulting cuboids. If one of the groups (wlog Yi) contains fewer elements, then we want to avoid connecting to the smallest blocks in Xi. Thus we can greedily connect starting with the largest blocks, and in doing so avoid connecting to the smallest blocks.
So the algorithm could work as follows:
(1) Hash each block according to its surface representation. Sort blocks with the same surface representation in descending order of their offset (height of the block minus its surface representation).
(2) Process blocks in order of descending offset. For each block: search for the connecting block cBlock with the highest offset, connect the two blocks, and remove cBlock from the hash table and the processing pipeline.
Overall this should be doable in O(n log n).
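A rough sketch of this greedy pairing in Python, reusing blocks, surface, and connecting_surface from the sketch above (the linear candidate scan is a simplification; a real implementation would keep each group in a sorted structure to stay within O(n log n)):

from collections import defaultdict

def offset(block):
    # height of the block above its surface representation = its minimum entry
    return min(v for row in block for v in row)

def greedy_total_height(blocks):
    by_surface = defaultdict(list)
    for i, b in enumerate(blocks):
        by_surface[surface(b)].append(i)
    used, total = set(), 0
    # process blocks in order of descending offset
    for i in sorted(range(len(blocks)), key=lambda k: offset(blocks[k]), reverse=True):
        if i in used:
            continue
        partner = connecting_surface(surface(blocks[i]))
        candidates = [j for j in by_surface.get(partner, ())
                      if j not in used and j != i]
        if candidates:
            j = max(candidates, key=lambda k: offset(blocks[k]))
            used.update((i, j))
            # the resulting cuboid's height: max entry of block i plus min entry of block j
            total += max(v for row in blocks[i] for v in row) + offset(blocks[j])
    return total

print(greedy_total_height(blocks))  # 5: block-1 and block-2 form a height-5 cuboid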

Related

How to get max sum of k 2x1 or 1x2 tiles in Nx3 matrix

I have a problem where I have an N x 3 matrix with int values. I need to place K non-overlapping 2x1 or 1x2 tiles on it so that the sum of the covered cells is maximal, using dynamic programming.
What would the best way be to solve such a problem?
Example 5 x 3 matrix, K = 5:
2 6 2
6 5 6
2 6 2
1 1 1
1 1 1
Good tiles: (6,2), (6,2), (6,2), (6,5), (2,1)
Result = 38
And an example with an edge case:
2 x 3 Matrix, K = 2
0 4 1
3 4 1
Good tiles: (4,1), (4,3)
Result = 12
Let's define the state of a row as the set of its cells that are covered by some of the K tiles. There are 8 combinations (2^3), from 000 (nothing covered) to 111 (everything covered); you can encode the state in binary for efficiency.
The dynamic programming table will be a[row][tiles][state], where row is the row we are processing (going top to bottom), tiles is how many tiles have been placed already, state is the row state as defined above, and the value is the current maximum sum.
To fill it we go top to bottom. We simplify things by only allowing a vertical tile to be placed on the current row and the row above (never below). You can iterate through the tile placement combinations that touch the current row (some are mutually exclusive): there are 3 vertical options and 2 horizontal options on the current row (5 options, for a total of 12 combinations, if I've done the math right). Also iterate through the possible values of tiles. For each combination, look at all states of the previous row that allow its placement (so that the vertical tiles don't overlap), take the maximum, and update the dynamic programming table. Some combinations are very strict (3 vertical tiles require 000 in the row above), while some are very relaxed (1 horizontal tile allows every possibility). Do this on paper a few times to see how it works.
As an optimization, note that you only need the values from the previous row (rows further up don't factor in), so you can keep just the previous row and the current row.
The algorithm should look something like this, where combination.sum(i) is the total value of the cells covered by the combination's tiles (in rows i-1 and i):

for i from 0 to N-1
    for tiles from 0 to K
        for each combination
            if tiles - combination.tiles < 0: continue
            m = -infinity
            for each state compatible with combination.previous_row
                m = max(m, a[i-1][tiles - combination.tiles][state])
            if m > -infinity
                a[i][tiles][combination.state] =
                    max(a[i][tiles][combination.state], m + combination.sum(i))
The solution is the maximum over the states of the last row with tiles = K.
The complexity will be N * K * 12 combinations * 2^3 states, so O(N*K). Memory can be O(K) with the two-row trick mentioned above.
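A compact Python version of this DP, under the reading above (a sketch, not the answerer's code: vertical tiles occupy the current row and the row above, states are 3-bit masks of the covered cells of the current row, and exactly K tiles must be placed):

def max_tiling_sum(grid, K):
    N, NEG = len(grid), float("-inf")
    # dp[t][s]: best sum with t tiles placed, s = bitmask of covered cells
    # in the last processed row (bit c set = column c covered)
    dp = [[NEG] * 8 for _ in range(K + 1)]
    dp[0][0] = 0
    for i in range(N):
        ndp = [[NEG] * 8 for _ in range(K + 1)]
        for vmask in range(8):                   # vertical tiles spanning rows i-1 and i
            if i == 0 and vmask:
                continue                         # no row above the first row
            vcols = [c for c in range(3) if vmask >> c & 1]
            for hs in ([], [(0, 1)], [(1, 2)]):  # horizontal tiles inside row i
                hmask = 0
                for a, b in hs:
                    hmask |= (1 << a) | (1 << b)
                if hmask & vmask:
                    continue                     # tiles must not overlap
                placed = len(vcols) + len(hs)
                gain = (sum(grid[i - 1][c] + grid[i][c] for c in vcols)
                        + sum(grid[i][a] + grid[i][b] for a, b in hs))
                state = vmask | hmask
                for t in range(K + 1 - placed):
                    for prev in range(8):
                        if dp[t][prev] == NEG or prev & vmask:
                            continue             # vertical tiles need free cells above
                        val = dp[t][prev] + gain
                        if val > ndp[t + placed][state]:
                            ndp[t + placed][state] = val
        dp = ndp
    return max(dp[K])

print(max_tiling_sum([[2, 6, 2], [6, 5, 6], [2, 6, 2], [1, 1, 1], [1, 1, 1]], 5))  # 38
print(max_tiling_sum([[0, 4, 1], [3, 4, 1]], 2))                                   # 12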

Maximum path cost in matrix

Can anyone tell me an algorithm for finding the maximum path cost in an NxM matrix, starting from the top-left corner and ending at the bottom-right corner, where left, right, and down movements are allowed and the matrix contains negative costs? A cell can be visited any number of times, and after visiting a cell its cost is replaced with 0.
Constraints
1 <= nxm <= 4x10^6
INPUT
4 5
1 2 3 -1 -2
-5 -8 -1 2 -150
1 2 3 -250 100
1 1 1 1 20
OUTPUT
37
(The output is explained in an image in the original post.)
Since you also have negative costs, use Bellman-Ford. Change the sign of all the costs (negative to positive and positive to negative), then find the shortest path; that path will be the longest one because you changed the signs.
If no edge cost becomes negative after the sign change, use Dijkstra's shortest path instead: run it on the negated values and it will return the longest path with its cost.
Your matrix is a directed graph. In your image you are trying to find a path (max or min) from index (0,0) to (n-1,n-1).
You need these things to represent it as a graph:
A linked list in which each node holds first_node, second_node, and the cost to move from the first node to the second.
An array of linked lists, where each array index stores a linked list. If, for example, there is a path from 0 to 5 and from 0 to 1 (in an undirected graph), then your graph looks like the adjacency list shown as an image in the original answer.
If you want a directed graph, then simply add 5 to adj[0] and do not add 0 to adj[5]; this means there is a path from 0 to 5 but not from 5 to 0.
This linked list represents only which nodes are connected, not their cost. You have to add an extra variable that keeps the cost for each pair of nodes.
Now put this cost-carrying linked list in your array, and you have a graph on which to run a shortest or longest path algorithm.
If you want an intelligent algorithm you can use A* with a heuristic; I guess Manhattan distance will be best.
If the cost of your edges is never negative, use Dijkstra.
If costs can be negative, use the Bellman-Ford algorithm.
You can always find the longest path by converting minus signs to plus and plus to minus and then running a shortest path algorithm; the path found will be the longest.
I answered this question, and as you said in the comments, look at point two. If that's the task, then the main idea of this assignment is to ensure monotonicity.
h stands for the heuristic cost.
A stands for the accumulated cost.
Monotonicity says that for each pair of nodes, h(A) <= A(A,B) + h(B). This means that if you want to move from A to B, the estimated cost should not decrease (can you do something with your values so that this property holds?). Once you satisfy this condition, every node that A* chooses will be part of your path from source to goal, because that is the path with the shortest (after negation, longest) value.
pathmax: you can enforce monotonicity. If there is a path from A through B such that f(S...A,B) < f(S...A), then set f(S...A,B) = max(f(S...A,B), f(S...A)), where S is the source.
Since moving up is not allowed, a path always looks like a sequence of horizontal intervals, each sharing at least one position with the next (for the down move). Answers can be characterized as, say,
struct Answer {
    int layer[N][2]; // layer[i][0] and layer[i][1] represent interval start & end
                     // with 0 <= layer[i][0] <= layer[i][1] < M,
                     // layer[0][0] = 0, layer[N-1][1] = M-1,
                     // and a non-empty intersection of layers i and i+1
};
An alternative encoding is to note only layer widths and offsets to each other; but you would still have to make sure that the last layer includes the exit cell.
Assuming that you have a maxLayer routine that finds the highest-scoring interval in each layer (cost O(M) per layer), and that all such intervals overlap, this would yield an O(N+M) optimal answer. However, it may be necessary to expand intervals to ensure that overlap occurs, and there may be multiple highest-scoring intervals in a given layer. At this point I would model the problem as a directed graph:
each layer has one node per score-maximizing horizontal continuous interval.
nodes from one layer are connected to nodes in the next layer according to the cost of expanding both intervals to achieve at least 1 overlap. If they already overlap, the cost is 0. Edge costs will always be zero or negative (otherwise, either source or target intervals could have scored higher by growing bigger). Add the (expanded) source-node interval value to the connection cost to get an "edge weight".
You can then run Dijkstra on this graph (negate edge weights so that the "longest path" is returned) to find the optimal path. Even better, since all paths pass once and only once through each layer, you only need to keep track of the best route to each node, and only need to build nodes and edges for the layer you are working on.
Implementation details ahead
to calculate maxLayer in O(M), use Kadane's algorithm, modified to return all maximal intervals instead of only the first. Where the linked algorithm discards an interval and starts anew, you would instead keep a copy of that contender to use later (a sketch appears at the end of this answer).
given the sample input, the maximal intervals would look like this:

                     [0]
1 2 3 -1 -2          [1 2 3]
-5 -8 -1 2 -150  =>  [2]
1 2 3 -250 100       [1 2 3] [100]
1 1 1 1 20           [1 1 1 1 20]
                     [0]
given those intervals, they would yield the following graph:
(0)
| =>0
(+6)
\ -1=>5
\
(+2)
=>7/ \ -150=>-143
/ \
(+7) (+100)
=>12 \ / =>-43
\ /
(+24)
| =>37
(0)
When two edges arrive at a single node (the row 1 1 1 1 20), carry forward only the highest incoming value.
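The modified Kadane routine mentioned in the implementation details could look like this (my reading of the answer, not its code: the best interval of each run is kept as a contender, and negative contenders are dropped unless the whole row is negative, so that every layer keeps at least one node):

def max_layer(row):
    # Kadane's algorithm, modified: where the standard version would discard
    # an interval and start anew, keep the discarded contender for later
    contenders = []                          # (sum, start, end), one per run
    run_sum, start = row[0], 0
    best_sum, best_end = row[0], 0
    for i in range(1, len(row)):
        if run_sum < 0:                      # running sum went negative: close the run
            contenders.append((best_sum, start, best_end))
            run_sum, start = row[i], i
            best_sum, best_end = row[i], i
        else:
            run_sum += row[i]
            if run_sum > best_sum:
                best_sum, best_end = run_sum, i
    contenders.append((best_sum, start, best_end))
    keep = [c for c in contenders if c[0] >= 0]
    return keep or [max(contenders)]         # all-negative row: keep the single best interval

print(max_layer([1, 2, 3, -1, -2]))      # [(6, 0, 2)]             -> [1 2 3]
print(max_layer([-5, -8, -1, 2, -150]))  # [(2, 3, 3)]             -> [2]
print(max_layer([1, 2, 3, -250, 100]))   # [(6, 0, 2), (100, 4, 4)]
print(max_layer([1, 1, 1, 1, 20]))       # [(24, 0, 4)]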
For each element in a row, find the maximum cost that can be obtained if we move horizontally across the row, given that we go through that element.
Eg. For the row
1 2 3 -1 -2
The maximum cost obtained for each element, given that we pass through it while moving horizontally, will be
6 6 6 5 3
Explanation:
for element 3: we can move backwards horizontally, picking up 1 and 2. We will not move forward horizontally, since the values -1 and -2 would reduce the cost.
So the maximum cost for 3 = 1 + 2 + 3 = 6
The maximum cost matrix for the elements of each row, if we move horizontally, for the input given in the description will be
6 6 6 5 3
-5 -7 1 2 -148
6 6 6 -144 100
24 24 24 24 24
Since we can move vertically from one row to the below row, update the maximum cost for each element as follows:
cost[i][j] = cost[i][j] + cost[i-1][j]
So the final cost matrix will be :
6 6 6 5 3
1 -1 7 7 -145
7 5 13 -137 -45
31 29 37 -113 -21
The maximum value in the last row of the above matrix gives you the required output, i.e. 37.
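This procedure fits in a few lines of Python (a sketch of this answer's method; horizontal_best computes the best interval sum through each cell via prefix extensions in both directions, then the rows are accumulated downward):

def max_path_cost(grid):
    def horizontal_best(row):
        # best sum of a contiguous interval of the row that contains column j
        m = len(row)
        left = row[:]                        # best sweep ending at column j
        for j in range(1, m):
            left[j] += max(0, left[j - 1])
        right = row[:]                       # best sweep starting at column j
        for j in range(m - 2, -1, -1):
            right[j] += max(0, right[j + 1])
        return [left[j] + right[j] - row[j] for j in range(m)]

    cost = horizontal_best(grid[0])
    for i in range(1, len(grid)):
        # moving down: add the new row's horizontal maxima column by column
        cost = [h + c for h, c in zip(horizontal_best(grid[i]), cost)]
    return max(cost)

grid = [[1, 2, 3, -1, -2],
        [-5, -8, -1, 2, -150],
        [1, 2, 3, -250, 100],
        [1, 1, 1, 1, 20]]
print(max_path_cost(grid))  # 37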

Transform Matrix A to B through swapping elements

Well, I got this homework where I must find the minimum number of swaps to convert some matrix A to another given matrix B. The constraints are very limited ("the matrix may not exceed 10 elements and will be square, N = M"), which means it will always be a 1x1 matrix, a 2x2 matrix (both trivial), or a 3x3 matrix; the problem is the 3x3 case.
I already tried computing it from the Manhattan distance between two elements of the matrix that are out of place, multiplying by two and subtracting 1, e.g.:
The rules of swapping are: you may swap adjacent elements, where two elements are adjacent when they share the same row or the same column.
1 3 2
6 5 4
7 8 9
target:
1 2 3
4 5 6
7 8 9
The Manhattan distance between {1,3} is 1, so 2*1 - 1 = 1: 1 swap needed.
For {6,4} it is 2, so 2*2 - 1 = 3: 3 swaps needed. The final answer is then 4 swaps.
However, my program is getting rejected by the automatic grader. Any ideas on how to solve this problem?

Determining the Longest Contiguous Subsequence

There are N nodes (1 <= N <= 100,000) at various positions along a long one-dimensional line. The ith node is at position x_i (an integer in the range 0...1,000,000,000) and has a node type b_i (an integer in the range 1..8). No two nodes share the same position.
You want to find a range on this line in which all of the node types present are fairly represented: for whatever types of nodes are present in the range, there must be an equal number of each type (for example, a range with 27 each of types 1 and 3 is OK, a range with 27 each of types 1, 3, and 4 is OK, but 9 of type 1 and 10 of type 3 is not OK). You also want at least K (K >= 2) of the 8 types to be represented in the range. Find the maximum size of a range that satisfies these constraints, where the size of a range is the difference between the maximum and minimum positions of the nodes in it.
If there is no range satisfying the constraints, output -1 instead.
INPUT:
* Line 1: N and K separated by a space
* Lines 2..N+1: Each line contains a description of a node as two
integers separated by a space; x(i) and its node type.
SAMPLE INPUT:
9 2
1 1
5 1
6 1
9 1
100 1
2 2
7 2
3 3
8 3
INPUT DETAILS:
Node types: 1 2 3 - 1 1 2 3 1 - ... - 1
Locations: 1 2 3 4 5 6 7 8 9 10 ... 99 100
OUTPUT:
* Line 1: A single integer indicating the maximum size of a fair
range. If no such range exists, output -1.
SAMPLE OUTPUT:
6
OUTPUT DETAILS:
The range from x = 2 to x = 8 has 2 each of types 1, 2, and 3. The range
from x = 9 to x = 100 has 2 of type 1, but this is invalid because K = 2
and so you need at least 2 distinct types of nodes.
Could you please help by suggesting an algorithm to solve this? I have thought about using some sort of priority queue or stack data structure, but I am really unsure how to proceed.
Thanks, Todd
It's not too difficult to invent an almost linear-time algorithm, because a similar problem was recently discussed on CodeChef: "ABC-Strings".
Sort nodes by their positions.
Prepare all possible subsets of node types (for example, we could expect types 1,2,4,5,7 to be present in the resulting interval and all other types to be absent). For K=2 there are only 256-8-1=247 such subsets. For each subset perform the remaining steps:
Initialize 8 type counters to [0,0,0,0,0,0,0,0].
For each node perform remaining steps:
Increment counter for current node type.
Take the L counters for the types included in the current subset and subtract the first of them from the other L-1 counters, which produces L-1 values. Take the remaining 8-L counters and combine them together with those L-1 values into a tuple of 7 values.
Use this tuple as a key into a hash map. If the hash map contains no value for this key, add a new entry with this key whose value is the position of the current node. Otherwise, subtract the value stored in the hash map from the position of the current node and (possibly) update the best result.
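A sketch of this approach in Python (the names are mine; one detail: the signature is also recorded before a node is counted, so that the node itself can serve as the left endpoint of a later range, which matches the size definition x_max - x_min):

from itertools import combinations

def max_fair_range(nodes, K):
    # nodes: list of (position, type) pairs, types in 1..8
    nodes.sort()
    best = -1
    for size in range(K, 9):
        for subset in combinations(range(1, 9), size):
            excluded = [t for t in range(1, 9) if t not in subset]

            def signature(cnt):
                # L-1 pairwise differences for included types,
                # raw counters for excluded types
                base = cnt[subset[0]]
                return (tuple(cnt[t] - base for t in subset[1:]),
                        tuple(cnt[t] for t in excluded))

            first = {}                       # signature -> earliest left position
            cnt = [0] * 9
            for x, t in nodes:
                first.setdefault(signature(cnt), x)  # a range may start at this node
                cnt[t] += 1
                key = signature(cnt)
                if key in first:
                    best = max(best, x - first[key])
    return best

nodes = [(1, 1), (5, 1), (6, 1), (9, 1), (100, 1),
         (2, 2), (7, 2), (3, 3), (8, 3)]
print(max_fair_range(nodes, 2))  # 6: the range from x = 2 to x = 8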

1000 items, 1000 nodes, 3 items per node, best replication scheme to minimize data loss as nodes fail? [closed]

I was wondering what would be the right answer for Question 2-44 in Skiena's Algorithm Design Manual (2nd ed.)
The question is the following:
We have 1,000 data items to store on 1,000 nodes. Each node can store
copies of exactly three different items. Propose a replication scheme
to minimize data loss as nodes fail. What is the expected number of
data entries that get lost when three random nodes fail?
I was thinking about node n holding the data items n, n+1 & n+2.
So if 3 consecutive nodes are lost then we lose 1 item.
Is there a better solution?
The approach you propose is not bad, but also take a look at the ideas used in RAID; they may give you some inspiration. For instance, if you have 2 data items, then with storage for 3 items you can recover either of them if the other fails. The idea is quite simple: you store the two items on two nodes and the XOR of their bits on the third. I believe that if you utilize this idea you will be able to have more than 3 backups of a single data item (i.e., more than 3 nodes have to fail in order to lose the information).
I thought of methods like RAID levels, but Skiena says "each node can store copies of exactly three different items." Even though the XORed bit patterns of two separate items could be stored in the same amount of space, I did not think that was what the problem was looking for.
So I started with what the OP thought of: store the three copies of each item on the item's own node and its next two neighbors, in a striped fashion. For example, the following is for N == 6, where the data items are the integers 0 to 5 (items 4 and 5 wrap around and use nodes 0 and 1):
nodes: 0 1 2 3 4 5
===========
copy 0 -> 0 1 2 3 4 5
copy 1 -> 5 0 1 2 3 4
copy 2 -> 4 5 0 1 2 3
Of all the 20 combinations of three-node failures, there are six that lose exactly one piece of data. For example, when nodes 1, 2, and 3 fail, data item 1 is lost:
===========
0 X X X 4 5
5 X X X 3 4
4 X X X 2 3
Similarly for each other data item, so 6 of the 20 combinations lose data. Skiena does not describe what "data loss" means for the application: does the loss of a single data item mean that the entire collection is wasted, or is losing a single item acceptable and better than losing two?
If the loss of even a single data item means that the entire collection is wasted, then we can do better. Three times better! :)
Instead of distributing the copies of data to the right-hand nodes in a striped fashion, define groups of three nodes that share data. For example, let nodes 0, 1, and 2 share their data and nodes 3, 4, and 5 share theirs:
nodes: 0 1 2 3 4 5
===========
copy 0 -> 0 1 2 3 4 5
copy 1 -> 2 0 1 5 3 4
copy 2 -> 1 2 0 4 5 3
This time, only 2 of the 20 combinations ever produce data loss. Data items 0, 1, and 2 are lost together when nodes 0, 1, and 2 fail:
===========
x x x 3 4 5
x x x 5 3 4
x x x 4 5 3
And data 3, 4, and 5 are lost together when nodes 3, 4, and 5 fail:
===========
0 1 2 x x x
2 0 1 x x x
1 2 0 x x x
That amounts to just 2 of the 20 combinations of three-node failures. When the same nodes share the same data, the data losses are effectively merged into a smaller number of combinations.
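The two schemes are easy to compare exhaustively for the N == 6 toy example (a small script of mine, not from the answer):

from itertools import combinations

def tally(n, placement):
    # placement[d] = the set of nodes holding copies of data item d
    failing_triples, items_lost = 0, 0
    for failed in combinations(range(n), 3):
        lost = sum(1 for nodes in placement if nodes <= set(failed))
        failing_triples += lost > 0
        items_lost += lost
    return failing_triples, items_lost

n = 6
striped = [{d, (d + 1) % n, (d + 2) % n} for d in range(n)]
grouped = [{3 * (d // 3), 3 * (d // 3) + 1, 3 * (d // 3) + 2} for d in range(n)]
print(tally(n, striped))  # (6, 6): 6 of the 20 triples lose data
print(tally(n, grouped))  # (2, 6): only 2 of the 20 triples lose data

Note that the total of 6 items lost across all 20 triples is the same for both schemes: by linearity of expectation, any scheme that puts each item on exactly 3 distinct nodes loses the same expected number of items when 3 random nodes fail (n / C(n,3), about 6*10^-6 for n = 1000). The grouping only reduces the probability that any loss occurs at all.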
Ali
Let,
D = {1,...,d_i,...,d} denote the data items and d_i a given data element
N = {1,...,n_k,...,n} denote the storage cluster and n_k a given storage node.
We say d_i is stored by n_k, loosely denoted by d_i \in n_k.
My replication model has the following assumptions:
1- Every data item must be stored on at least one node during initialization. I.e.:
There exists at least one k, 1 <= k <= n, s.t. P(d_i \in n_k) = 1.
2- From (1), at initialization time, the probability of d_i being on a given node is at least 1/n. I.e.:
For any data item 1 <= i <= d and a random node n_k, P(d_i \in n_k) >= 1/n.
Given the problem statement, by design, we want to have this distribution uniform across the data set.
3- Lastly, by design, the probability of a data item d_i being on a given node n_k should be independent across data items. I.e.:
P(d_i \in n_k | d_j \in n_k) = P(d_i \in n_k)
This is because we don't assume that the probability of node failure is independent between adjacent nodes (e.g., in data centers adjacent nodes may share the same network switch, etc.).
From these assumptions, I propose the following replication model (for the problem instance where d = n and each node stores exactly 3 distinct data items):
(1) Perform a random permutation of data set.
(2) Using a sliding window of length 3 and stride 1, rotate over the shuffled data set and map the data items to each node.
E.g.:
D = {A,B,C,D}
N = {1,2,3,4}
(1) {C, B, A, D}
(2) 1 -> {C, B, A}, 2 -> {B, A, D}, 3-> {A, D, C}, 4-> {D, C, B}
The random shuffling ensures independence (3) and a uniform distribution (2), while the sliding window of stride 1 guarantees (1).
Let's denote the sliding window of a given node n_k as the ordered set w_k = {w_k1, w_k2, w_k3}. n_k is said to be the master node for w_k1 (the first element of w_k); any other node containing w_k1 is a replica node. N.B.: the proposed replication model guarantees exactly one master node for any d_i, while the number of replica nodes depends on the window length.
In the example above, n_1 is the master node for C, and n_3 and n_4 are replica nodes.
Back to the original problem: given this scheme, a data item is lost exactly when its master node and all of its replicas are lost.
P(d_i is lost) = P(master node for d_i fails and replica 1 fails and replica 2 fails).
Without formal proof, an unbiased random permutation in step (1) above yields
P(d_i is lost) = P(master node for d_i fails) * P(replica 1 fails) * P(replica 2 fails).
Again, the random permutation is a heuristic to abstract away the joint distribution of node failures.
From assumptions (2) and (3), P(d_i is lost) = c for any d_i at initialization time.
That said, for d = n = 1000 and a replication factor of 3 (i.e., window length 3),
P(d_i is lost) = 1/1000 * 1/999 * 1/998 ~ 10^-9
Your approach seems essentially correct but can benefit from a failover strategy. Notice that Prof. Skiena asks "to minimize data loss as nodes fail", which suggests that failing nodes will be a common occurrence.
You may want to have a look at consistent hashing.
Also, there is a great post by Reddit engineers about the perils of not using consistent hashing (and using fixed MOD hashing instead).
