How to find contiguous areas in matrix with less memory? - algorithm

Suppose I have a matrix of 0 and 1. Now I would like to count all contiguous areas in the matrix that contain only 1.
allocate a new matrix "visited" to track visited elements
COUNT = 0
for each element E in the matrix
if E == 1 and E is not visited
COUNT = COUNT + 1
run BFS/DFS to visit all not-visited matrix elements connected to E
return COUNT
If the matrix is N x N then we need additional N x N "visited" matrix and also a queue/stack for BFS/DFS. Now I wonder if there is an algorithm, which solves this problem and requires less memory.

If memory is so critical, then you can use the following approach:
Additionally keep only an array with the length of matrix's one row, then scan matrix rows from up to bottom, and each time when you process current row, there are intervals of continuous 1's in it, which can merge different pieces located above them, for example
1110001110011
1111001110011
0011111000011
if you consider only first 2 rows, there are three different pieces, but 3rd row merges first two of them. So you can use the additional array to keep what different pieces are there until last row, i.e. initially it will be filled with 0's, and when you are just starting scanning 3rd row of matrix, that array should look like 1111002220033, i.e. each number shows to which part that cell belongs. After processing 3rd row they will be merged and additional array will become 0011111000033. So after scanning each row some parts are merged, and some parts disappear. Each time a part disappears, you can increment the counter to have the answer in the end.
However, this is more tricky and complex approach than BFS/DFS, and I don't think that you will need to use this anyway in practice.

Related

Bin Packing with Overflow

I have N bins (homogenous or heterogeneous in size, depending on variant of task) in which I am trying to fit M items (always heterogeneous in size). Items can be larger than a single bin and are allowed to overflow to the next bin(s) (but no wrap around from bin N-1 to 0).
The more bins an item spans, the higher its allocation cost.
I want to minimize the overall allocation cost of fitting all M into N bins.
Everything is static. It is guaranteed that all M fit in N.
I think I am looking for a variant of the Bin Packing algorithm. But any hints towards an existing solution/approximation/good heuristic are appreciated.
My current approach looks like this:
sort items by size
for i in items:
for b in bins:
try allocation of i starting at b
if allocation valid:
record cost
do allocation of i in b with lowest recorded cost
update all b fill level
So basically a greedy by size approach with O(MxNxC) runtime, where C~"longest allocation across banks" (try allocation takes C time).
I would suggest dynamic programming for the exact solution. To make it easier to visualize, assume each bin's size is the length of an array of cells. Since they are contiguous, you can visualize them as a contiguous set of arrays, e.g.
|xxx|xx|xxx|xxxx|
the delimiters of the arrays are || and the positions in the arrays are given by x. so this has 4 arrays, arr_0 of size 3, arr_1 of size 2 and so on.
If you place an item at position i, it will ocuppy position i to i+(h-1), where h is the size of the items. E.g. if items are of size h=5, and you place an item at position 1, you would get
|xb|bb|bbx|xxxx|
One trick to use is that if we introduce the additional constraint that the items need to be inserted “in order”, i.e. the first inserted is the leftmost inserted item, the second is the second-leftmost inserted item, etc. then this problem will have the same optimal solution solution as the original one (since we can just take the optimal solution and insert it in order).
Consider pos(k-1,i) to be the optimal position to insert the k-th object
Given that we the k-1th object ended at (I-1). opt_c(k-i,i) the optimal extra-cost of inserting the k-1…N pieces, given that the k-1th object ended at (I-1).
Then pos(N-1,i) can be easily calculated by running through the cells and calculating the extra-cost (and even easier by noting that at least one of the borders should match up with pos(N-1,i), but to make analysis easier we will evaluate the extra-cost each of the I…NumCells-h). opt_c(N-1,i) equals this extra-cost.
Similarly,
pos(N-2,i) = argmin_x extra_cost(x,i, pos(N-1,x+h)) + opt(N-1,x+h)
Where extra_cost is the extra_cost(x,i,j) of inserting at x, given that the last inserted object ended at I-1 and the next inserted object will start at j.
And by substituting x = opt(N-2,i)
opt(N-2,i) = extra_cost(opt(N-2,i),I, pos(N-1,opt(N-2,i)+h)) + opt(N-1,opt(N-2,i)+h)
By induction, for all 1<=W<N-1
pos(W,i) = argmin_x extra_cost(x,i, pos(W+1,x+h)) + opt(W+1,x+h)
And
opt(W,i) = extra_cost( pos(W,i),I, pos(N+1, pos(W,i)+h)) + opt(N-1, pos(W,i)+h)
And your final result is given by the minimal over i of all opt(0,i).

Find every possible permutation of bits in a 2D array that has a single group of contiguous 1s

Below I have represented 2 permutations of bits in a 2D bit array (1s are red). The matrix on the left has a single group of contiguous 1s but the right matrix has 2.
I would like to loop through every possible permutation of binary values in such an array that has a single group of contiguous 1s. I am aware that for a 10×7 grid like above there are 2(10 × 7) permutations when you include non-contiguous permutations, but my hope is that by excluding non-contiguous permutations I will be able to go through them all in reasonable CPU time.
Speaking of reasonableness, I am also interested in an algorithm to determine how many permutations are contiguous.
My question is similar to, but different from, these:
2D Bit matrix with every possible combination
Finding Contiguous Areas of Bits in 2D Bit Array
Any help is appreciated. I'm a bit stuck.
So, I found that the OEIS (Online Encyclopedia of Integer Sequences) has a sequence from n = 0..7 for the "number of nonzero n X n binary arrays with all 1's connected" (A059525). They provide no formula though except for grids fixed at 1 cell wide (triangular numbers), or 2 or 3 cells wide. There's a sequence for 4 x n too but no formula.
Two approaches I can think of. One is to iterate through all possible sequences and devise a test for non-contiguous groups and some method for skipping over large regions guaranteed to be non-contiguous.
A second approach is to attempt to build all sets of contiguous groups so you don't need to test. This is the approach I would take:
Let n = width * height
Enumerate blocks left to right, top to bottom from 0 to n - 1
Fix a block at position 0.
Generate all contiguous solutions between 1 and n blocks extending from position 0
Omit position 0 and fix a block at position 1
Find all contiguous solutions between 1 and n - 1 blocks extending from position 1
Continue until you've reached position n
You can place your pieces according to the following rules, backtracking for the next placement at each depth:
To left of most recently placed piece if placed in row above prior piece provided that no other neighbors exist for that vacancy.
Above left-most available piece in row of most recently placed piece if no other neighbors exist for that vacancy.
To right of most recently placed piece (if adjacent piece exists)
On the next row, farthest left vacancy such that upper row has a piece above any contiguous right remaining neighbors
Next move for any backtracked position is first available move to the right of, or in the row below, the backtracked position (obeying prior 4 rules)

Algorithm challenge to merge sets

Given n sets of numbers. Each set contains some numbers from 1 to 100. How to select sets to merge into the longest set under a special rule, only two non-overlapping sets can merge. [1,2,3] can merge with [4,5] but not [3,4]. What will be an efficient algorithm to merge into the longest set.
My first attempt is to form an n by n matrix. Each row/column represents a set. Entry(i,j) equals to 1 if two sets overlap, entry(i,i) stores the length of set i. Then the questions becomes can we perform row and column operations at the same time to create a diagonal sub-matrix on top left corner whose trace is as large as possible.
However, I got stuck in how to efficiently perform row and column operations to form such a diagonal sub-matrix on top left corner.
As already pointed out in the comments (maximum coverage problem) you have a NP-hart problem. Luckily, matlab offers solvers for integer linear programming.
So we try to reduce the problem to the form:
min f*x subject to Ax<=b , 0<=x
There are n sets, we can encode a set as a vector of 0s and 1s. For example (1,1,1,0,0,...) would represent {1,2,3} and (0,0,1,1,0,0...) - {3,4}.
Every column of A represents a set. A(i,j)=1 means that the i-th element is in the j-th set, A(i,j)=0 means that the i-th element is not in the j-th set.
Now, x represents the sets we select: if x_j=1 than the set j is selected, if x_j=0 - than not selected!
As every element must be at most in one set, we choose b=(1, 1, 1, ..., 1): If we take two sets which contain the i-th element, than the i-th element of (Ax) would be at least 2.
The only question left is what is f? We try to maximize the number of elements in the union, so we choose f_j=-|set_j| (minus owing to min<->max conversion), with |set_j| - number of elements in the j-th set.
Putting it all in matlab we get:
f=-sum(A)
xopt=intlinprog(f.',1:n,A,ones(m,1),[],[],zeros(n,1),ones(n,1))
f.' - cost function as column
1:n - all n elements of x are integers
A - encodes n sets
ones(m,1) - b=(1,1,1...), there are m=100 elements
[],[] - there are no constrains of the form Aeq*x=beq
zeros(n,1), 0<=x must hold
ones(n,1), x<=1 follows already from others constrains, but maybe it will help the solver a little bit
You can represent sets as bit fields. A bitwise and operation yielding zero would indicate non-overlapping sets. Depending on the width of the underlying data type, you may need to perform multiple and operations. For example, with a 64 bit machine word size, I'd need two words to cover 1 to 100 as a bit field.

Touching segments

Can anyone please suggest me algorithm for this.
You are given starting and the ending points of N segments over the x-axis.
How many of these segments can be touched, even on their edges, by exactly two lines perpendicular to them?
Sample Input :
3
5
2 3
1 3
1 5
3 4
4 5
5
1 2
1 3
2 3
1 4
1 5
3
1 2
3 4
5 6
Sample Output :
Case 1: 5
Case 2: 5
Case 3: 2
Explanation :
Case 1: We will draw two lines (parallel to Y-axis) crossing X-axis at point 2 and 4. These two lines will touch all the five segments.
Case 2: We can touch all the points even with one line crossing X-axis at 2.
Case 3: It is not possible to touch more than two points in this case.
Constraints:
1 ≤ N ≤ 10^5
0 ≤ a < b ≤ 10^9
Let assume that we have a data structure that supports the following operations efficiently:
Add a segment.
Delete a segment.
Return the maximum number of segments that cover one point(that is, the "best" point).
If have such a structure, we can get use the initial problem efficiently in the following manner:
Let's create an array of events(one event for the start of each segment and one for the end) and sort by the x-coordinate.
Add all segments to the magical data structure.
Iterate over all events and do the following: when a segment start, add one to the number of currently covered segments and remove it from that data structure. When a segment ends, subtract one from the number of currently covered segment and add this segment to the magical data structure. After each event, update the answer with the value of the number of currently covered segments(it shows how many segments are covered by the point which corresponds to the current event) plus the maximum returned by the data structure described above(it shows how we can choose another point in the best possible way).
If this data structure can perform all given operations in O(log n), then we have an O(n log n) solution(we sort the events and make one pass over the sorted array making a constant number of queries to this data structure for each event).
So how can we implement this data structure? Well, a segment tree works fine here. Adding a segment is adding one to a specific range. Removing a segment is subtracting one from all elements in a specific range. Get ting the maximum is just a standard maximum operation on a segment tree. So we need a segment tree that supports two operations: add a constant to a range and get maximum for the entire tree. It can be done in O(log n) time per query.
One more note: a standard segment tree requires coordinates to be small. We may assume that they never exceed 2 * n(if it is not the case, we can compress them).
An O(N*max(logN, M)) solution, where M is the medium segment size, implemented in Common Lisp: touching-segments.lisp.
The idea is to first calculate from left to right at every interesting point the number of segments that would be touched by a line there (open-left-to-right on the lisp code). Cost: O(NlogN)
Then, from right to left it calculates, again at every interesting point P, the best location for a line considering segments fully to the right of P (open-right-to-left on the lisp code). Cost O(N*max(logN, M))
Then it is just a matter of looking for the point where the sum of both values tops. Cost O(N).
The code is barely tested and may contain bugs. Also, I have not bothered to handle edge cases as when the number of segments is zero.
The problem can be solved in O(Nlog(N)) time per test case.
Observe that there is an optimal placement of two vertical lines each of which go through some segment endpoints
Compress segments' coordinates. More info at What is coordinate compression?
Build a sorted set of segment endpoints X
Sort segments [a_i,b_i] by a_i
Let Q be a priority queue which stores right endpoints of segments processed so far
Let T be a max interval tree built over x-coordinates. Some useful reading atWhat are some sources (books, etc.) from where I can learn about Interval, Segment, Range trees?
For each segment make [a_i,b_i]-range increment-by-1 query to T. It allows to find maximum number of segments covering some x in [a,b]
Iterate over elements x of X. For each x process segments (not already processed) with x >= a_i. The processing includes pushing b_i to Q and making [a_i,b_i]-range increment-by-(-1) query to T. After removing from Q all elements < x, A= Q.size is equal to number of segments covering x. B = T.rmq(x + 1, M) returns maximum number of segments that do not cover x and cover some fixed y > x. A + B is a candidate for an answer.
Source:
http://www.quora.com/What-are-the-intended-solutions-for-the-Touching-segments-and-the-Smallest-String-and-Regex-problems-from-the-Cisco-Software-Challenge-held-on-Hackerrank

Solving ACM ICPC - SEERC 2009

I have been sitting on this for almost a week now. Here is the question in a PDF format.
I could only think of one idea so far but it failed. The idea was to recursively create all connected subgraphs which works in O(num_of_connected_subgraphs), but that is way too slow.
I would really appreciate someone giving my a direction. I'm inclined to think that the only way is dynamic programming but I can't seem to figure out how to do it.
OK, here is a conceptual description for the algorithm that I came up with:
Form an array of the (x,y) board map from -7 to 7 in both dimensions and place the opponents pieces on it.
Starting with the first row (lowest Y value, -N):
enumerate all possible combinations of the 2nd player's pieces on the row, eliminating only those that conflict with the opponents pieces.
for each combination on this row:
--group connected pieces into separate networks and number these
networks starting with 1, ascending
--encode the row as a vector using:
= 0 for any unoccupied or opponent position
= (1-8) for the network group that that piece/position is in.
--give each such grouping a COUNT of 1, and add it to a dictionary/hashset using the encoded vector as its key
Now, for each succeeding row, in ascending order {y=y+1}:
For every entry in the previous row's dictionary:
--If the entry has exactly 1 group, add it's COUNT to TOTAL
--enumerate all possible combinations of the 2nd player's pieces
on the current row, eliminating only those that conflict with the
opponents pieces. (change:) you should skip the initial combination
(where all entries are zero) for this step, as the step above actually
covers it. For each such combination on the current row:
+ produce a grouping vector as described above
+ compare the current row's group-vector to the previous row's
group-vector from the dictionary:
++ if there are any group-*numbers* from the previous row's
vector that are not adjacent to any gorups in the current
row's vector, *for at least one value of X*, then skip
to the next combination.
++ any groups for the current row that are adjacent to any
groups of the previous row, acquire the lowest such group
number
++ any groups for the current row that are not adjacent to
any groups of the previous row, are assigned an unused
group number
+ Re-Normalize the group-number assignments for the current-row's
combination (**) and encode the vector, giving it a COUNT equal
to the previous row-vector's COUNT
+ Add the current-row's vector to the dictionary for the current
Row, using its encoded vector as the key. If it already exists,
then add it's COUNT to the COUNT for the pre-exising entry
Finally, for every entry in the dictionary for the last row:
If the entry has exactly one group, then add it's COUNT to TOTAL
**: Re-Normalizing simply means to re-assign the group numbers so as to eliminate any permutations in the grouping pattern. Specifically, this means that new group numbers should be assigned in increasing order, from left-to-right, starting from one. So for example, if your grouping vector looked like this after grouping ot to the previous row:
2 0 5 5 0 3 0 5 0 7 ...
it should be re-mapped to this normal form:
1 0 2 2 0 3 0 2 0 4 ...
Note that as in this example, after the first row, the groupings can be discontiguous. This relationship must be preserved, so the two groups of "5"s are re-mapped to the same number ("2") in the re-normalization.
OK, a couple of notes:
A. I think that this approach is correct , but I I am really not certain, so it will definitely need some vetting, etc.
B. Although it is long, it's still pretty sketchy. Each individual step is non-trivial in itself.
C. Although there are plenty of individual optimization opportunities, the overall algorithm is still pretty complicated. It is a lot better than brute-force, but even so, my back-of-the-napkin estimate is still around (2.5 to 10)*10^11 operations for N=7.
So it's probably tractable, but still a long way off from doing 74 cases in 3 seconds. I haven't read all of the detail for Peter de Revaz's answer, but his idea of rotating the "diamond" might be workable for my algorithm. Although it would increase the complexity of the inner loop, it may drop the size of the dictionaries (and thus, the number of grouping-vectors to compare against) by as much as a 100x, though it's really hard to tell without actually trying it.
Note also that there isn't any dynamic programming here. I couldn't come up with an easy way to leverage it, so that might still be an avenue for improvement.
OK, I enumerated all possible valid grouping-vectors to get a better estimate of (C) above, which lowered it to O(3.5*10^9) for N=7. That's much better, but still about an order of magnitude over what you probably need to finish 74 tests in 3 seconds. That does depend on the tests though, if most of them are smaller than N=7, it might be able to make it.
Here is a rough sketch of an approach for this problem.
First note that the lattice points need |x|+|y| < N, which results in a diamond shape going from coordinates 0,6 to 6,0 i.e. with 7 points on each side.
If you imagine rotating this diamond by 45 degrees, you will end up with a 7*7 square lattice which may be easier to think about. (Although note that there are also intermediate 6 high columns.)
For example, for N=3 the original lattice points are:
..A..
.BCD.
EFGHI
.JKL.
..M..
Which rotate to
A D I
C H
B G L
F K
E J M
On the (possibly rotated) lattice I would attempt to solve by dynamic programming the problem of counting the number of ways of placing armies in the first x columns such that the last column is a certain string (plus a boolean flag to say whether some points have been placed yet).
The string contains a digit for each lattice point.
0 represents an empty location
1 represents an isolated point
2 represents the first of a new connected group
3 represents an intermediate in a connected group
4 represents the last in an connected group
During the algorithm the strings can represent shapes containing multiple connected groups, but we reject any transformations that leave an orphaned connected group.
When you have placed all columns you need to only count strings which have at most one connected group.
For example, the string for the first 5 columns of the shape below is:
....+ = 2
..+++ = 3
..+.. = 0
..+.+ = 1
..+.. = 0
..+++ = 3
..+++ = 4
The middle + is currently unconnected, but may become connected by a later column so still needs to be tracked. (In this diagram I am also assuming a up/down/left/right 4-connectivity. The rotated lattice should really use a diagonal connectivity but I find that a bit harder to visualise and I am not entirely sure it is still a valid approach with this connectivity.)
I appreciate that this answer is not complete (and could do with lots more pictures/explanation), but perhaps it will prompt someone else to provide a more complete solution.

Resources