Perimeter of union of N rectangles - algorithm

I want to know an efficient way to solve this problem:
Given N rectangles, each specified by its top-left and bottom-right corners, find the perimeter of the union of the N rectangles.
I only have an O(N^2) algorithm and it's too slow, so I'm looking for a more efficient one.
You can assume that every coordinate is a positive integer less than 100000.
EDIT:
For example, in this case, the perimeter is 30.
An O(n^2) algorithm:
for x=0 to maxx
    for i=0 to N
        if lowx[i] = x
            for j=lowy[i] to highy[i]
                d[j]++
                if d[j] = 1 then ret++
        if highx[i] = x
            for j=lowy[i] to highy[i]
                d[j]--
                if d[j] = 0 then ret++
    for y=0 to maxy
        if d[y] = 0 && d[y + 1] >= 1 then ret++
        if d[y] >= 1 && d[y + 1] = 0 then ret++
The final ret is the answer.

There's an O(n log n)-time sweepline algorithm. Apply the following steps to compute the vertical perimeter of the shape. Transpose the input and apply them again to compute the horizontal perimeter.
For each rectangle, prepare a start event keyed by the left x-coordinate whose value is the y-interval, and a stop event keyed by the right x-coordinate whose value is the y-interval. Sort these events by x-coordinate and process them in order. At all times, we maintain a data structure capable of reporting the number of points at which the boundary intersects the sweepline. On the 2n - 1 intervals between event points, we add this number times the width of the interval to the perimeter.
The data structure we need supports the following operations in time O(log n).
insert(ymin, ymax) -- inserts the interval [ymin, ymax] into the data structure
delete(ymin, ymax) -- deletes the interval [ymin, ymax] from the data structure
perimeter() -- returns the perimeter of the 1D union of the contained intervals
Since the input coordinates are bounded integers, one possible implementation is via a segment tree. (There's an extension to real inputs that involves sorting the y-coordinates of the input and remapping them to small integers.) Each segment has some associated data
struct {
    int covers_segment;
    bool covers_lower;
    int interior_perimeter;
    bool covers_upper;
};
whose scope is the union of segments descended from it that are present in the input intervals. (Note that a very long segment has no influence on the leafmost levels of the tree.)
The meaning of covers_segment is that it's the number of intervals that have this segment in their decomposition. The meaning of covers_lower is that it's true if one of the segments descended from this one with the same lower endpoint belongs to the decomposition of some interval. The meaning of interior_perimeter is the 1D perimeter of segments in scope (as described above). The meaning of covers_upper is akin to covers_lower, with the upper endpoint.
Here's an example.
0 1 2 3 4 5 6 7 8 9
[---A---]
    [---B---] [-D-]
      [-C-]
The intervals are A = [0, 4] (a single segment), B = [2, 6] (segments [2, 4] and [4, 6]), C = [3, 5] (segments [3, 4] and [4, 5]) and D = [7, 9] (segments [7, 8] and [8, 9]).
         c_s  c_l  i_p  c_u
[0, 1]    0    F    0    F
[0, 2]    0    F    0    F
[1, 2]    0    F    0    F
[0, 4]    1    T    0    T
[2, 3]    0    F    0    F
[2, 4]    1    T    0    T
[3, 4]    1    T    0    T
[0, 8]    0    T    2    T
[4, 5]    1    T    0    T
[4, 6]    1    T    0    T
[5, 6]    0    F    0    F
[4, 8]    0    T    2    T
[6, 7]    0    F    0    F
[6, 8]    0    F    1    T
[7, 8]    1    T    0    T
[0, 9]    0    T    2    T
[8, 9]    1    T    0    T
To insert (delete) an interval, insert (delete) its constituent segments by incrementing (decrementing) covers_segment. Then, for all ancestors of the affected segments, recalculate the other fields as follows.
if s.covers_segment == 0:
    s.covers_lower = s.lower_child.covers_lower
    s.interior_perimeter =
        s.lower_child.interior_perimeter +
        (1 if s.lower_child.covers_upper != s.upper_child.covers_lower else 0) +
        s.upper_child.interior_perimeter
    s.covers_upper = s.upper_child.covers_upper
else:
    s.covers_lower = true
    s.interior_perimeter = 0
    s.covers_upper = true
To implement perimeter, return
    (1 if root.covers_lower else 0) +
    root.interior_perimeter +
    (1 if root.covers_upper else 0)
where root is the root of the segment tree.
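The steps above can be sketched in Python as follows. This is only a minimal sketch under some assumptions of mine: rectangles are given as (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2, the y-coordinates are compressed first (the remapping mentioned above), and the names half_perimeter and union_perimeter are made up for illustration. The arrays cnt, lo, inner and hi play the roles of covers_segment, covers_lower, interior_perimeter and covers_upper.

```python
def half_perimeter(rects):
    """Total length of the boundary edges parallel to the x-axis.

    rects: list of (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    """
    ys = sorted({y for _, y1, _, y2 in rects for y in (y1, y2)})
    index = {y: i for i, y in enumerate(ys)}
    m = max(1, len(ys) - 1)      # number of elementary y-segments
    cnt = [0] * (4 * m)          # covers_segment
    lo = [False] * (4 * m)       # covers_lower
    hi = [False] * (4 * m)       # covers_upper
    inner = [0] * (4 * m)        # interior_perimeter

    def update(node, left, right, a, b, delta):
        # Add delta to the cover count of elementary segments a..b-1.
        if a <= left and right <= b:
            cnt[node] += delta
        else:
            mid = (left + right) // 2
            if a < mid:
                update(2 * node, left, mid, a, b, delta)
            if b > mid:
                update(2 * node + 1, mid, right, a, b, delta)
        # Recalculate the derived fields of this node.
        if cnt[node] > 0:
            lo[node], inner[node], hi[node] = True, 0, True
        elif right - left == 1:  # uncovered leaf
            lo[node], inner[node], hi[node] = False, 0, False
        else:
            l, r = 2 * node, 2 * node + 1
            lo[node] = lo[l]
            inner[node] = inner[l] + (hi[l] != lo[r]) + inner[r]
            hi[node] = hi[r]

    events = []
    for x1, y1, x2, y2 in rects:
        events.append((x1, +1, index[y1], index[y2]))  # start event
        events.append((x2, -1, index[y1], index[y2]))  # stop event
    events.sort()

    total, prev_x = 0, None
    for x, delta, a, b in events:
        if prev_x is not None:
            # Boundary points on the sweepline times the width of the gap.
            total += (lo[1] + inner[1] + hi[1]) * (x - prev_x)
        update(1, 0, m, a, b, delta)
        prev_x = x
    return total


def union_perimeter(rects):
    # Second pass on the transposed input handles the other edge direction.
    transposed = [(y1, x1, y2, x2) for x1, y1, x2, y2 in rects]
    return half_perimeter(rects) + half_perimeter(transposed)
```

For instance, union_perimeter([(0, 0, 2, 2), (1, 1, 3, 3)]) gives 12, and four rectangles forming a 3x3 frame around a 1x1 hole give 16, since the hole's boundary counts too.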

This might help in some cases of your problem:
Consider that this,
 _______
|       |_
|         |
|        _|
|___    |
    |   |
    |___|
has the same perimeter as this:
 _________
|         |
|         |
|         |
|         |
|         |
|_________|

On the one hand, the classic solution for this problem would be a sweep-line-based "boolean merge" algorithm, which in its original form builds the union of these rectangles, i.e. builds the polygonal boundary of the result. The algorithm can easily be modified to calculate the perimeter of the resultant boundary without physically building it.
On the other hand, sweep-line-based "boolean merge" can do this for arbitrary polygonal input. Given that in your case the input is much more restricted (and simplified) - just a bunch of isothetic rectangles - it is quite possible that a more lightweight and clever solution exists.
Note, BTW, that union of such rectangles might actually be a multi-connected polygon, i.e. an area with holes in it.

Related

Kth element in transformed array

I came across this question in a recent interview:
Given an array A of length N, we are supposed to answer Q queries. Each query has the following form:
Given x and k, build another array B of the same length such that B[i] = A[i] ^ x, where ^ is the XOR operator. Sort B in descending order and return B[k].
Input format:
The first line contains the integer N.
The second line contains N integers denoting the array A.
The third line contains Q, the number of queries.
Each of the next Q lines contains the space-separated integers x and k.
Output format:
For each of the Q queries, print the respective B[k] value on its own line.
e.g.
for input :
5
1 2 3 4 5
2
2 3
0 1
output will be :
3
5
Here A = [1, 2, 3, 4, 5].
For the first query, x = 2 and k = 3, so B = [1^2, 2^2, 3^2, 4^2, 5^2] = [3, 0, 1, 6, 7]. Sorted in descending order, B = [7, 6, 3, 1, 0], so B[3] = 3 (k is 1-based).
For the second query, x = 0, so B is the same as A and B[1] = 5.
I have no idea how to solve such problems. Thanks in advance.
This is solvable in O(N + Q), treating the number of bits per value as a constant. For simplicity I assume you are dealing with non-negative or unsigned values only, but you can probably adjust this algorithm for negative numbers as well.
First you build a binary tree over the values of A, most significant bit first. The left edge stands for a bit that is 0, the right edge for a bit that is 1. In each node you store how many numbers fall into its bucket. This can be done in O(N), because the number of bits is constant.
Because this is a little hard to explain, I'm going to show what the tree looks like for the 3-bit numbers [0, 1, 4, 5, 7], i.e. [000, 001, 100, 101, 111]:
            *
         /     \
      2           3         2 numbers have first bit 0 and 3 numbers have first bit 1
    /   \       /   \
   2     0     2     1      of the 2 numbers with first bit 0, 2 have second bit 0, ...
  / \         / \   / \
 1   1       1   1 0   1    of the 2 numbers with first and second bit 0, 1 has third bit 0, ...
To answer a single query you go down the tree by using the bits of x. At each node you have 4 possibilities, looking at bit b of x and building answer a, which is initially 0:
b = 0 and k < the value stored in the left child of the current node (the 0-bit branch): current node becomes left child, a = 2 * a (shifting left by 1)
b = 0 and k >= the value stored in the left child: current node becomes right child, k = k - value of left child, a = 2 * a + 1
b = 1 and k < the value stored in the right child (the 1-bit branch, because of the xor operation everything is flipped): current node becomes right child, a = 2 * a
b = 1 and k >= the value stored in the right child: current node becomes left child, k = k - value of right child, a = 2 * a + 1
This is O(1), again because the number of bits is constant. Therefore the overall complexity is O(N + Q).
Example: [0, 1, 4, 5, 7] i.e. [000, 001, 100, 101, 111], k = 3, x = 3 i.e. 011
First bit is 0 and k >= 2, therefore we go right, k = k - 2 = 3 - 2 = 1 and a = 2 * a + 1 = 2 * 0 + 1 = 1.
Second bit is 1 and k >= 1, therefore we go left (inverted because the bit is 1), k = k - 1 = 0, a = 2 * a + 1 = 3
Third bit is 1 and k < 1, so the solution is a = 2 * a + 0 = 6
Check: [000, 001, 100, 101, 111] xor 011 = [011, 010, 111, 110, 100], i.e. [3, 2, 7, 6, 4], which sorted in ascending order is [2, 3, 4, 6, 7], so the number at index 3 is indeed 6 (always using 0-based indexing here).
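The descent described above can be sketched in Python like this. It is a minimal sketch, not a tuned implementation: the node layout [count, 0-child, 1-child] and the names build_trie and kth_smallest_xor are my own, the bit width is passed in explicitly, and k is assumed to satisfy 0 <= k < N. Like the walkthrough, it returns the k-th smallest value with 0-based k.

```python
def build_trie(values, bits):
    """Build the counting tree; each node is [count, 0-child, 1-child]."""
    root = [0, None, None]
    for v in values:
        node = root
        node[0] += 1
        for shift in range(bits - 1, -1, -1):   # most significant bit first
            bit = (v >> shift) & 1
            if node[1 + bit] is None:
                node[1 + bit] = [0, None, None]
            node = node[1 + bit]
            node[0] += 1
    return root


def kth_smallest_xor(root, x, k, bits):
    """0-based k-th smallest value of v ^ x over the stored values v."""
    node, a = root, 0
    for shift in range(bits - 1, -1, -1):
        b = (x >> shift) & 1
        same = node[1 + b]                  # branch whose xor-ed bit is 0
        same_count = same[0] if same else 0
        if k < same_count:
            node, a = same, 2 * a
        else:
            k -= same_count
            node, a = node[2 - b], 2 * a + 1   # branch whose xor-ed bit is 1
    return a


root = build_trie([0, 1, 4, 5, 7], bits=3)
print(kth_smallest_xor(root, x=3, k=3, bits=3))   # the walkthrough above: 6
```

The original question sorts in descending order and counts k from 1, so with this sketch a query (x, k) would be answered by asking for the 0-based index N - k instead.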

Matching between two series after manipulation

Suppose we're given two series of integers, X[..] and Y[..], which have the same length. We can choose any position i of the series X[] and apply the operation X[i] = X[i] + 3, X[i + 2] = X[i + 2] + 2, X[i + 4] = X[i + 4] + 1.
After applying this operation any number of times, is it possible to turn X[..] into Y[..]?
I am thinking of implementing this by brute force, trying combinations of operations and comparing the result with Y. Is there any other approach that would be faster?
Given the two series
X [ 1, 2, 3, 4, 5, 6, 8 ]
Y [ 1, 5, 6, 6, 7, 7, 9 ]
if i = 2 (counting positions from 1), then
X [ 1, 5, 3, 6, 5, 7, 8 ]
Y [ 1, 5, 6, 6, 7, 7, 9 ]
and if i = 3 after that, then
X [ 1, 5, 6, 6, 7, 7, 9 ]
Y [ 1, 5, 6, 6, 7, 7, 9 ]
which matches the series Y.
You can see that for every index p the resulting cell can be written as
Y[p] = X[p] + F[p-4] + 2 * F[p-2] + 3 * F[p]
where F[p] is the number of operations applied at index p (and F[q] = 0 for any q outside the array).
So you have a system of n linear equations in the n unknowns F[p], where n is the length of the series.
This is a banded lower-triangular (sparse) system, so it can be solved by forward substitution or with ordinary Gaussian elimination.
The system might be inconsistent, or its solution might contain a negative or non-integer F[p]. In those cases there is no valid sequence of operations.
Since an operation at index i modifies only elements present at index i, i + 2 and i + 4, that is, all indices >= i, we can build a greedy algorithm which iterates over the array X from left-to-right and at every index i compares the value with array Y.
Case X[i] > Y[i]: Then it's not possible to update X[i] to Y[i], hence return not possible.
Case X[i] == Y[i]: Then continue iterating over the next element at i + 1
Case X[i] < Y[i]: If (Y[i] - X[i]) mod 3 != 0, then return not possible, else compute m = (Y[i] - X[i])/3 and increment X[i] by 3 * m, X[i + 2] by 2 * m and X[i + 4] by m and continue iterating.
If we reach the end of array X, then it means it's possible to construct array Y from X using these operations.
Overall time complexity of the solution is O(n).
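A minimal Python sketch of this greedy pass is shown below. The question does not say what happens when i + 2 or i + 4 falls outside the array, so the sketch assumes an operation is only allowed when all three positions exist. That rule, and the name can_transform, are my assumptions.

```python
def can_transform(x, y):
    """Greedy left-to-right check whether x can be turned into y.

    Assumption: an operation at index i is only legal if i + 4 is
    still inside the array (the question leaves this open).
    """
    x = list(x)                      # work on a copy
    n = len(x)
    for i in range(n):
        diff = y[i] - x[i]
        if diff == 0:
            continue
        # x[i] can only grow, in steps of 3, and only via an operation at i.
        if diff < 0 or diff % 3 != 0 or i + 4 >= n:
            return False
        m = diff // 3                # number of operations applied at i
        x[i] += 3 * m
        x[i + 2] += 2 * m
        x[i + 4] += m
    return True


print(can_transform([1, 2, 3, 4, 5, 6, 8], [1, 5, 6, 6, 7, 7, 9]))  # True
```

The values m computed along the way are exactly the unknowns F[p] of the linear system in the other answer, found by forward substitution.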

hash for particular array

I have a very particular problem that I want to solve efficiently.
A geometry is defined by V volumes, numbered from 0 to V-1.
Each volume is bounded by different surfaces, numbered from 0 to N-1.
                         Volume | Surfaces
                         --------------------
Geometry A (V=3, N=7):        0 | [0 3 5 6 2]
                              1 | [5 4 2 1]
                              2 | [4 0 1 3 6]
Note that a surface will only appear once in a volume.
Also, a surface is at most in 2 volumes of a geometry.
Here is the problem:
I have two different descriptions of the same underlying geometry and I want to find which volume in Geometry A correspond to which volume in Geometry B. In other words, I have the same N surfaces, but the V volumes are defined differently.
Here is a Geometry B that could correspond to Geometry A above:
                         Volume | Surfaces
                         --------------------
Geometry B (V=3, N=7):        0 | [1 5 4 2]
                              1 | [3 6 5 0 2]
                              2 | [0 1 3 6 4]
Given Geometry A and B, I want to be able to bind each volume of Geometry A to its corresponding volume in Geometry B, as efficiently as possible.
A 0 1 2
B 1 0 2
Draft of solution:
Sort each array of surfaces in ascending or descending order, then sort the volumes of each geometry in lexicographic order of their surface arrays. The problem is easily and robustly solved this way.
Better solution:
Compute a quick, unique hash for each array, then sort the volumes by this hash. The hash should not depend on the order of the surfaces in the array.
Why do I think a hash can be a good solution?
Take hash(Volume) = min([Surfaces])
With this hash, any value is already shared by at most 2 volumes, because a surface can only appear in 2 volumes!
Now, if I take hash(Volume) = min([Surfaces]) + max([Surfaces])*N, a value is still shared by at most 2 volumes, but the probability of a collision becomes very small when there are a lot of volumes and surfaces.
As mentioned, your solution is a good approximation for what you want. However, if you seek a perfect hash function, you can use the following method:
Suppose p_i is the i-th prime number, so that p_0 = 2, p_1 = 3, p_2 = 5, p_3 = 7, p_4 = 11, p_5 = 13, p_6 = 17, p_7 = 19, .... We can define a hash function on the elements x_0, x_1, ..., x_k of an array as h(x_0, ..., x_k) = p_{x_0} p_{x_1} ... p_{x_k}. By unique prime factorization, two arrays get the same value exactly when they contain the same elements, regardless of order. Repeated numbers are handled by the exponent: for example, if x_i is repeated 3 times, the factor contributed by p_{x_i} is p_{x_i}^3. In general, if x_i is repeated a_i times, we have h(x_0, ..., x_k) = p_{x_0}^{a_0} p_{x_1}^{a_1} ... p_{x_k}^{a_k}.
Hence, for geometry A we have:
            Volume | Surfaces        | Hash
            ----------------------------------------------------
Geometry A       0 | [0, 3, 5, 6, 2] | 2 * 7 * 13 * 17 * 5 = 15470
                 1 | [5, 4, 2, 1]    | 13 * 11 * 5 * 3 = 2145
                 2 | [4, 0, 1, 3, 6] | 11 * 2 * 3 * 7 * 17 = 7854
The same applies to geometry B. As this function returns a unique value for each array (regardless of the order of its elements), you can match the volumes of the two geometries by their hash values. If N is not big, you can use a precomputed list of primes.
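As an illustration, here is a small Python sketch of the prime-product hash applied to the two geometries from the question. The helper names first_primes, prime_hash and match_volumes are invented for this example, and the primes come from simple trial division, which is only reasonable for small N. Python integers do not overflow, so the product stays exact; in a language with fixed-width integers it would not for long surface lists.

```python
from math import prod


def first_primes(n):
    """Return the first n primes by trial division (fine for small n)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def prime_hash(surfaces, primes):
    """Order-independent hash: product of the primes indexed by surface id."""
    return prod(primes[s] for s in surfaces)


def match_volumes(geom_a, geom_b, n_surfaces):
    """Map each volume index of geom_a to the matching index in geom_b."""
    primes = first_primes(n_surfaces)
    by_hash = {prime_hash(v, primes): j for j, v in enumerate(geom_b)}
    return {i: by_hash[prime_hash(v, primes)] for i, v in enumerate(geom_a)}


geom_a = [[0, 3, 5, 6, 2], [5, 4, 2, 1], [4, 0, 1, 3, 6]]
geom_b = [[1, 5, 4, 2], [3, 6, 5, 0, 2], [0, 1, 3, 6, 4]]
print(match_volumes(geom_a, geom_b, n_surfaces=7))   # {0: 1, 1: 0, 2: 2}
```

Using a dictionary keyed by the hash avoids the sort entirely, at the cost of assuming every volume of A really has a counterpart in B.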
I found a pretty good hash function that should almost never have collisions:
V: [S_0 S_1 S_2 S_3...S_N-1]
u64 hash(V) = 0;
for i in {0..N-1} :
    hash(V) = hash(V) ^ (1<<(S_i & 63))
end
This gives a 64-bit number that is unique as long as there are at most 64 surfaces, and all numbers are possible (unlike Omg's solution, where most numbers are impossible to get, given that there is no repetition in the list of surfaces).
In the rare case where there is a collision (which I will see after sorting), I will fall back to comparing the arrays lexicographically.
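A rough Python version of this bitmask hash, including the fallback comparison, might look like the sketch below. The names bit_hash and match_by_hash are mine, and a dictionary of buckets stands in for the sort mentioned above. Note that surface ids that differ by a multiple of 64 land on the same bit, which is where collisions come from.

```python
def bit_hash(surfaces):
    """XOR one bit per surface id, using only the low 6 bits of the id."""
    h = 0
    for s in surfaces:
        h ^= 1 << (s & 63)
    return h


def match_by_hash(geom_a, geom_b):
    """Map volume indices of geom_a to geom_b, resolving hash collisions."""
    buckets = {}
    for j, v in enumerate(geom_b):
        buckets.setdefault(bit_hash(v), []).append(j)
    result = {}
    for i, v in enumerate(geom_a):
        candidates = buckets[bit_hash(v)]
        # On a collision, fall back to comparing the sorted surface lists.
        result[i] = next(j for j in candidates
                         if sorted(geom_b[j]) == sorted(v))
    return result


geom_a = [[0, 3, 5, 6, 2], [5, 4, 2, 1], [4, 0, 1, 3, 6]]
geom_b = [[1, 5, 4, 2], [3, 6, 5, 0, 2], [0, 1, 3, 6, 4]]
print(match_by_hash(geom_a, geom_b))        # {0: 1, 1: 0, 2: 2}
print(bit_hash([0]) == bit_hash([64]))      # True: ids 0 and 64 collide
```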

Finding all possible combinations of row in a matrix where sum of columns represents a specific row vector

I need to find all possible combinations of rows in a matrix whose column sums equal a specific row vector.
Example:
Consider the following matrix
| 0 0 2 |
| 1 1 0 |
| 0 1 2 |
| 1 1 2 |
| 0 1 0 |
| 2 1 2 |
I need to get the following row vector as the sum of the columns:
| 2 2 2 |
The possible combinations are:
1.
| 1 1 0 |
| 1 1 2 |
2.
| 0 1 0 |
| 2 1 2 |
What is the best way to find them?
ALGORITHM
One option is to turn this into the subset sum problem by choosing a base b and treating each row as a number in base b.
For example, with a base of 10 your initial problem turns into:
Consider the list of numbers
002
110
012
112
010
212
Find all subsets that sum to 222
This problem is well known and is solvable via dynamic programming (see the wikipedia page).
If all your entries are nonnegative, then you can use David Pisinger's linear time algorithm, which has complexity O(nC), where C is the target number and n is the length of your list.
CHOICE OF BASE
The complexity of the algorithm is determined by the choice of the base b.
For the algorithm to be correct you need to choose a base larger than the sum of the digits in every column. (This avoids spurious solutions caused by a carry from one digit into the next.)
However, note that if you choose a smaller base you will still get all the correct solutions, plus some incorrect solutions. It may be worth considering using a smaller base (which will make the subset sum algorithm work much faster), followed by a postprocessing stage that checks all the solutions found and discards any incorrect ones.
Too small a base will produce an exponential number of incorrect solutions to discard, so the best size of base will depend on the details of your problem.
EXAMPLE CODE
Python code to implement this algorithm.
from collections import defaultdict

A = [[0, 0, 2],
     [1, 1, 0],
     [0, 1, 2],
     [1, 1, 2],
     [0, 1, 0],
     [2, 1, 2]]
target = [2, 2, 2]
b = 10  # base; must exceed every column sum so that digits never carry

def convert2num(a):
    """Read the row a as a number written in base b."""
    t = 0
    for d in a:
        t = b * t + d
    return t

B = [convert2num(a) for a in A]
goal = convert2num(target)

# First build the DP table.
# DP[s] holds every row index i such that s - B[i] is a sum of rows before i.
DP = defaultdict(set)
DP[0] = set()
for i, v in enumerate(B):
    for old_value in list(DP.keys()):  # snapshot, since DP grows in the loop
        new_value = old_value + v
        if new_value <= goal:
            DP[new_value].add(i)

# Then search for solutions, always picking rows in decreasing index order
# so that no row is used twice and no subset is reported twice.
def go(s, limit, sol):
    if s == 0:
        # Double check
        assert [sum(col) for col in zip(*sol)] == target
        print(sol)
        return
    for i in sorted(DP.get(s, ())):
        if i < limit:
            sol.append(A[i])
            go(s - B[i], i, sol)
            sol.pop()

go(goal, len(A), [])
This code assumes that b has been chosen large enough to avoid overflow.

Turning an array of integers into an array of nonnegative integers

Start with an array of integers so that the sum of the values is some positive integer S. The following routine always terminates in the same number of steps with the same results. Why is this?
Start with an array x = [x_0, x_1, ..., x_N-1] such that all x_i's are integers. While there is a negative entry, do the following:
Choose any index i such that x_i < 0.
Add x_i (a negative number) to x_((i-1) % N).
Add x_i (a negative number) to x_((i+1) % N).
Replace x_i with -x_i (a positive number).
This process maintains the property that x_0 + x_1 + ... + x_N-1 = S. For any given starting array x, no matter which index is chosen at any step, the number of times one goes through these steps is the same as is the resulting vector. It is not even obvious (to me, at least) that this process terminates in finite time, let alone has this nice invariant property.
EXAMPLE:
Take x = [4 , -1, -2] and flipping x_1 to start, the result is
[4, -1, -2]
[3, 1, -3]
[0, -2, 3]
[-2, 2, 1]
[2, 0, -1]
[1, -1, 1]
[0, 1, 0]
On the other hand, flipping x_2 to start gives
[4, -1, -2]
[2, -3, 2]
[-1, 3, -1]
[1, 2, -2]
[-1, 0, 2]
[1, -1, 1]
[0, 1, 0]
The last possibility, choosing x_2 instead of x_0 to flip at the third array of this second run, gives the same arrays with each one reversed from the third on down. In all cases, 6 steps lead to [0,1,0].
I have an argument for why this is true, but it seems to me to be overly complicated (it has to do with Coxeter groups). Does anyone have a more direct way to think about why this happens? Even finding a reason why this should terminate would be great.
Bonus points to anyone who finds a way to determine the number of steps for a given array (without going through the process).
I think the easiest way to see why the output vector and the number of steps are the same no matter what index you choose at each step is to look at the problem as a bunch of matrix and vector multiplications.
For the case where x has 3 components, think of x as a 3x1 vector: x = [x_0 x_1 x_2]' (where ' is the transpose operation). Each iteration of the loop will choose to flip one of x_0,x_1,x_2, and the operation it performs on x is identical to multiplication by one of the following matrices:
      -1  0  0          1  1  0          1  0  1
s_0 =  1  1  0   s_1 =  0 -1  0   s_2 =  0  1  1
       1  0  1          0  1  1          0  0 -1
where multiplication by s_0 is the operation performed if the index i=0, s_1 corresponds to i=1, and s_2 corresponds to i=2. With this view, you can interpret the algorithm as multiplying the corresponding s_i matrix by x at each iteration. So in the first example where x_1 is flipped at the start, the algorithm computes: s_1*s_2*s_0*s_1*s_2*s_1[4 -1 -2]' = [0 1 0]'
The fact that the index you choose doesn't affect the final output vector arises from two interesting properties of the s matrices. First, s_i*s_(i-1)*s_i = s_(i-1)*s_i*s_(i-1), where i-1 is computed modulo n, the number of matrices. This property is the only one needed to see why you get the same result in the examples with 3 elements:
s_1*s_2*s_0*s_1*s_2*s_1 = s_1*s_2*s_0*(s_1*s_2*s_1) = s_1*s_2*s_0*(s_2*s_1*s_2), which corresponds to choosing x_2 at the start, and lastly:
s_1*s_2*s_0*s_2*s_1*s_2 = s_1*(s_2*s_0*s_2)*s_1*s_2 = s_1*(s_0*s_2*s_0)*s_1*s_2, which corresponds to choosing to flip x_2 at the start, but then choosing to flip x_0 in the third iteration.
The second property only applies when x has 4 or more elements. It is s_i*s_k = s_k*s_i whenever i and k differ by at least 2 modulo n, that is, whenever x_i and x_k are not neighbors. This property is apparent when you consider the form of the matrices when x has 4 elements:
      -1  0  0  0          1  1  0  0          1  0  0  0          1  0  0  1
s_0 =  1  1  0  0   s_1 =  0 -1  0  0   s_2 =  0  1  1  0   s_3 =  0  1  0  0
       0  0  1  0          0  1  1  0          0  0 -1  0          0  0  1  1
       1  0  0  1          0  0  0  1          0  0  1  1          0  0  0 -1
The second property essentially says that you can exchange the order in which non-conflicting flips occur. For example, in a 4 element vector, if you first flipped x_1 and then flipped x_3, this has the same effect as first flipping x_3 and then flipping x_1.
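To experiment with these claims, a small Python sketch along the following lines can help. It is not a proof, just a numerical check. It applies the flip directly to a list instead of multiplying by the s matrices, picks the negative entry at random, and assumes the array has at least 3 entries. The names flip and run are chosen for this example.

```python
import random


def flip(x, i):
    """Apply the step at index i and return the new list (len(x) >= 3)."""
    n = len(x)
    y = list(x)
    y[(i - 1) % n] += x[i]
    y[(i + 1) % n] += x[i]
    y[i] = -x[i]
    return y


def run(x, rng):
    """Flip randomly chosen negative entries until none are left."""
    steps = 0
    while any(v < 0 for v in x):
        i = rng.choice([j for j, v in enumerate(x) if v < 0])
        x = flip(x, i)
        steps += 1
    return steps, x


# Every random order of choices takes 6 steps and ends at [0, 1, 0].
for seed in range(5):
    print(run([4, -1, -2], random.Random(seed)))

# The two properties, checked on sample vectors (flip is linear in x):
v = [5, -7, 11]
print(flip(flip(flip(v, 1), 0), 1) == flip(flip(flip(v, 0), 1), 0))  # True
u = [5, -7, 11, 2]
print(flip(flip(u, 1), 3) == flip(flip(u, 3), 1))                    # True
```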
I picture pushing the negative value(s) out in two directions until they dampen. Since addition is commutative, it doesn't matter what order you process the elements.
Here is an observation for when N is divisible by 3... Probably not useful, but I feel like writing it down.
Let w (complex) be a primitive cube root of 1; that is, w^3 = 1 and 1 + w + w^2 = 0. For example, w = cos(2pi/3) + i*sin(2pi/3).
Consider the sum x_0 + x_1*w + x_2*w^2 + x_3 + x_4*w + x_5*w^2 + .... That is, multiply each element of the sequence by consecutive powers of w and add them all up.
Something moderately interesting happens to this sum on each step.
Consider three consecutive numbers [a, -b, c] from the sequence, with b positive. Suppose these elements line up with the powers of w such that these three numbers contribute a - b*w + c*w^2 to the sum.
Now perform the step on the middle element.
After the step, these numbers contribute (a-b) + b*w + (c-b)*w^2 to the sum.
But since 1 + w + w^2 = 0, b + b*w + b*w^2 = 0 too. So we can add this to the previous expression to get a + 2*b*w + c*w^2, which is very similar to what we had before the step.
In other words, the step merely added 3*b*w to the sum.
If the three consecutive numbers had lined up with powers of w to contribute (say) a*w - b*w^2 + c, it turns out that the step will add 3*b*w^2.
In other words, no matter how the powers of w line up with the three numbers, the step increases the sum by 3*b, 3*b*w, or 3*b*w^2.
Unfortunately, since w^2 = -(w+1), this does not actually yield a steadily increasing function. So, as I said, probably not useful. But it still seems like a reasonable strategy is to seek a "signature" for each position that changes monotonically with each step...
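The observation can be checked with exact integer arithmetic by writing every value as p + q*w and using w^2 = -1 - w. The sketch below does that. The names signature, step and expected_change are made up for this illustration, and the array length is assumed to be divisible by 3 so that the powers of w line up across the wrap-around.

```python
def signature(x):
    """Return (p, q) with sum_j x[j] * w**j == p + q*w, using w**2 = -1 - w."""
    p = q = 0
    for j, v in enumerate(x):
        r = j % 3
        if r == 0:          # v * 1
            p += v
        elif r == 1:        # v * w
            q += v
        else:               # v * w**2 = -v - v*w
            p -= v
            q -= v
    return p, q


def step(x, i):
    """Perform one step at index i and return the new list."""
    n = len(x)
    y = list(x)
    y[(i - 1) % n] += x[i]
    y[(i + 1) % n] += x[i]
    y[i] = -x[i]
    return y


def expected_change(b, i):
    """3*b*w**(i % 3) written as a (p, q) pair."""
    return [(3 * b, 0), (0, 3 * b), (-3 * b, -3 * b)][i % 3]


x = [4, -1, -2]
before, after = signature(x), signature(step(x, 1))
print(before, after)                                   # (6, 1) (6, 4)
print((after[0] - before[0], after[1] - before[1]) == expected_change(1, 1))
```

Here flipping x_1 = -1 (so b = 1) changes the sum from 6 + w to 6 + 4*w, an increase of exactly 3*b*w.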
