Complement of a set of intervals - algorithm

I have a set of intervals inside the range [0,k].
How can I produce the complement set of this set of intervals?
I can come up with an algorithm, but it requires sorting the intervals.
Therefore, the complexity is O(nlogn), where n is the number of intervals.
Is there any faster algorithm to do this? If not, is there any way to prove that this is the optimal complexity?
Thank you.

In practice, let us assume that you have found an algorithm to perform this task (find the complement set) in O(n).
Then we can show that you have invented a new sort algorithm working in O(n).
To simplify, let us assume that the array to be sorted consists of natural numbers, and that there is no repetition.
If [a1 a2 ... an] need to be sorted, then consider the intervals [a1, a1+1) [a2, a2 + 1) ... [an, an + 1).
Applying your algorithm to generate the complement set of intervals in O(n), we get about n intervals
[x1a + 1, x1b) [x2a + 1, x2b) ... [xna + 1, xnb)
where each pair {xia, xib} corresponds to successive aj elements after sorting.
Let us assimilate this relation as a directed edge in a graph, connecting the two vertices xia and xib.
To get the original array in sorted order, we need to find the start of the graph and then walk through it, which can be done in O(n).
The sort has been performed in O(n).
Ignoring repetitions is not a serious restriction from a theoretical point of view: with hashing, for example, we can remove the repetitions in O(n).
Restricting the argument to natural numbers rather than floating point values is a detail: finding a new O(n) sort algorithm for natural numbers would already be a great result.
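The reduction above can be sketched in Python. This is only an illustration under stated assumptions: `complement` is a sort-based O(n log n) stand-in for the hypothetical O(n) algorithm, the intervals are half-open [lo, hi), and the function names are made up for this example. One detail the argument glosses over is that two adjacent values (a and a + 1) leave no gap between them, so the walk below also consults a hash set of the input values.

```python
def complement(intervals, k):
    """Return the gaps of [0, k) not covered by half-open intervals [lo, hi).

    Sort-based stand-in for the hypothetical O(n) algorithm.
    """
    gaps = []
    cursor = 0  # everything below cursor is already accounted for
    for lo, hi in sorted(intervals):
        if lo > cursor:
            gaps.append((cursor, lo))
        cursor = max(cursor, hi)
    if cursor < k:
        gaps.append((cursor, k))
    return gaps


def sort_via_complement(values):
    """Sort distinct natural numbers using only one call to complement()."""
    if not values:
        return []
    k = max(values) + 1
    gaps = complement([(a, a + 1) for a in values], k)

    present = set(values)   # O(n) expected, via hashing
    gap_end = dict(gaps)    # gap start -> gap end: the "edges" of the graph

    # The smallest value is 0 itself, or the end of the gap starting at 0.
    current = 0 if 0 in present else gap_end[0]
    ordered = [current]
    while len(ordered) < len(values):
        nxt = current + 1
        # Either the successor is adjacent, or a gap starts right after us.
        current = nxt if nxt in present else gap_end[nxt]
        ordered.append(current)
    return ordered


print(complement([(2, 3), (5, 7)], 10))  # [(0, 2), (3, 5), (7, 10)]
print(sort_via_complement([5, 2, 9, 3]))  # [2, 3, 5, 9]
```

Everything in `sort_via_complement` apart from the call to `complement` takes expected linear time, so a linear-time complement would give a linear-time sort for this kind of input.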

Related

Finding a maximal sorted subsequence

Assume that we're given a set of pairs S={(x_1,y_1),...,(x_n,y_n)} of integers. What is the most efficient way of computing a maximal sequence of elements (a_1,b_1),...,(a_m,b_m) in S with the property that
a_i <= a_{i+1}
b_i <= b_{i+1}
for i=1,...,m-1, i.e. the sequence is ordered with respect to both components. I can come up with a quadratic algorithm that does the following:
1. We sort the elements of S with respect to the first coordinate, giving (c_1,d_1),...,(c_n,d_n), where c_i <= c_{i+1}.
2. Using dynamic programming, for each (c_i,d_i) we compute the longest sequence ordered with respect to both components that ends in (c_i,d_i). This can be done in linear time, once we know the longest such sequence for each of (c_1,d_1),...,(c_{i-1},d_{i-1}).
Since we have to perform an O(nlogn) sort in step 1 and a linear search for each index in step 2, which is quadratic, we end up with a quadratic runtime.
I've been trying to figure out whether there's a faster, i.e. O(nlogn) way of generating the maximal sequence from having two sorts of the set S: one with respect to the first component, and one with respect to the second. Is this possible?
Yes, it is possible to do it in O(n log n).
Let's sort the elements of the set in lexicographical order. The first components are ordered correctly now, so we can forget about them.
Now take any subsequence of this sorted sequence that is ordered in both components. Its second elements form a non-decreasing subsequence. That's why we can just find the longest non-decreasing subsequence of the second elements in the sorted sequence (completely ignoring the first elements, as they are already sorted properly). The longest increasing subsequence of an array of numbers can be found in O(n log n) time (it is a well-known problem), and the same method handles the non-decreasing variant.
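A minimal sketch of this approach, assuming the pairs are distinct tuples of integers. The function name `longest_chain` is made up for this example, and `bisect_right` is used so that equal second components are allowed, matching the `<=` in the question.

```python
from bisect import bisect_right


def longest_chain(pairs):
    """Longest sequence ordered (non-decreasing) in both components."""
    pts = sorted(pairs)          # lexicographic order fixes the first component
    tail_vals = []               # tail_vals[j]: smallest last b of a chain of length j + 1
    tail_idx = []                # index in pts of the element realising tail_vals[j]
    prev = [-1] * len(pts)       # predecessor links for reconstruction

    for i, (_, b) in enumerate(pts):
        j = bisect_right(tail_vals, b)   # longest chain this b can extend
        prev[i] = tail_idx[j - 1] if j > 0 else -1
        if j == len(tail_vals):
            tail_vals.append(b)
            tail_idx.append(i)
        else:
            tail_vals[j] = b
            tail_idx[j] = i

    # Walk the predecessor links back from the end of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(pts[i])
        i = prev[i]
    return chain[::-1]


print(longest_chain([(1, 3), (2, 2), (2, 4), (3, 1), (4, 5)]))
# [(2, 2), (2, 4), (4, 5)]
```

The sort and the n binary searches each cost O(n log n). When several chains of maximal length exist, this sketch returns just one of them.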

Rewrite O(N W) in terms of N

I have this question that asks to rewrite the subset sum problem in terms of only N.
In case you are unaware, the problem is: given weights, each with cost 1, how would you find the optimal solution for a given maximum weight?
So O(NW) is both the space and the time cost, where the space is for the 2D matrix used in the dynamic programming. This problem is a special case of the knapsack problem.
I'm not sure how to approach this. The only thing I could think of was to take the sum of all the weights as a general worst-case bound for W. Thanks
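For reference, the O(NW) dynamic program the question refers to might look like the sketch below. It assumes non-negative integer weights, and it keeps a single row of the table rather than the full 2D matrix, which does not change the O(NW) running time. The name `best_weight` is made up for this example.

```python
def best_weight(weights, capacity):
    """Largest total weight not exceeding capacity, using each weight at most once."""
    # reachable[t] is True when some subset of the weights seen so far sums to t.
    reachable = [False] * (capacity + 1)
    reachable[0] = True
    for w in weights:
        # Iterate downwards so each weight is used at most once.
        for total in range(capacity, w - 1, -1):
            if reachable[total - w]:
                reachable[total] = True
    return max(t for t in range(capacity + 1) if reachable[t])


print(best_weight([3, 34, 4, 12, 5, 2], 10))  # 10  (3 + 5 + 2)
```

The two nested loops make the dependence on W explicit: the work is proportional to the number of items times the capacity, which is why the bound cannot be stated in N alone without bounding W.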
If the weight is not bounded, and so the complexity must depend solely on N, there is at least an O(2^N) approach, which is trying all possible subsets of the N elements and computing their sums.
If you are willing to use exponential space rather than polynomial space, you can solve the problem in O(n 2^(n/2)) time and O(2^(n/2)) space. Split your set of n weights into two sets A and B of roughly equal size and compute the sum of weights for every subset of each of the two sets. Then hash all sums of subsets of A, and hash W - x for all sums x of subsets of B. If you get a collision between a subset of A and a subset of B in the hash table, then you have found a subset that sums to W.
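A small sketch of this meet-in-the-middle idea, assuming integer weights and using Python dictionaries as the hash table. The function names are made up for this example. For readability it stores one witness subset per sum, which costs more memory than storing the sums alone.

```python
def subset_sums(items):
    """Map each achievable sum to one subset (as a tuple) that achieves it."""
    sums = {0: ()}
    for w in items:
        # Snapshot the current sums so each item is used at most once.
        for s, chosen in list(sums.items()):
            sums.setdefault(s + w, chosen + (w,))
    return sums


def subset_with_sum(weights, target):
    """Return a list of weights summing to target, or None if there is none."""
    half = len(weights) // 2
    left = subset_sums(weights[:half])    # sums of subsets of A
    right = subset_sums(weights[half:])   # sums of subsets of B
    for x, chosen_b in right.items():
        # A "collision": some subset of A sums to target - x.
        if target - x in left:
            return list(left[target - x] + chosen_b)
    return None


print(subset_with_sum([3, 34, 4, 12, 5, 2], 30))  # None
```

Each half has at most 2^(n/2) subset sums, so both tables and the final scan stay within the bounds quoted above, up to the cost of copying the witness tuples.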

Dividing an integer array into two equal sized sub-arrays?

I came across this question and couldn't find a reasonable solution.
How would you divide an unsorted integer array into 2 equal sized sub-arrays such that, difference between sub-array sums is minimum.
For example: given an unsorted integer array a[N], we want to split it into a1 and a2, where a1.length == a2.length (i.e. N/2), such that (sum of all numbers in a1 - sum of all numbers in a2) is minimum.
For the sake of simplicity, let's assume all numbers are positive, but there might be repetitions.
While others have mentioned that this is a modified case of the partition problem, I'd like to point out, more specifically, that it is actually a special case of the minimum makespan problem with two machines. Namely, if you solve the two-machine makespan problem and obtain a value m, you obtain the minimum difference 2*m - sum(i : i in arr).
As the Wikipedia article states, the problem is NP-complete for more than 2 machines. However, in your case, the List scheduling algorithm, which in general provides an approximate answer, is optimal and polynomial-time for the two-machine and three-machine case given a sorted list in non-increasing order.
For details, and some more theoretical results on this algorithm, see here.
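The relation between the makespan m and the difference, 2*m - sum, can be checked on small inputs with an exhaustive search. The sketch below is not the List scheduling algorithm. It simply tries every way of choosing N/2 elements, so it is exponential and only meant as a reference for testing. It assumes an even-length array, and the function name is made up for this example.

```python
from itertools import combinations


def min_equal_split_difference(arr):
    """Minimum |sum(a1) - sum(a2)| over all splits into two halves of equal size."""
    n = len(arr)
    total = sum(arr)
    best_makespan = None
    for chosen in combinations(range(n), n // 2):
        s = sum(arr[i] for i in chosen)
        makespan = max(s, total - s)      # load of the busier "machine"
        if best_makespan is None or makespan < best_makespan:
            best_makespan = makespan
    # The two loads are m and total - m, so they differ by 2*m - total.
    return 2 * best_makespan - total


print(min_equal_split_difference([1, 2, 3, 4]))  # 0  ({1, 4} vs {2, 3})
print(min_equal_split_difference([1, 1, 1, 9]))  # 8  ({1, 9} vs {1, 1})
```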

How to enumerate all k-combinations of a set by sum?

Suppose I have a finite set of numeric values of size n.
Question: Is there an efficient algorithm for enumerating the k-combinations of that set so that combination I precedes combination J iff the sum of the elements in I is less than or equal to the sum of the elements in J?
Clearly it's possible to simply enumerate the combinations and sort them according to their sums. If the set is large, however, brute enumeration of all combinations, let alone sorting, will be infeasible. If I'm only interested in obtaining the first m << choose(n,k) combinations ranked by sum, is it possible to obtain them before the heat death of the universe?
There is no polynomial algorithm for enumerating the set this way (unless P=NP).
If there were such an algorithm (call it A), then we could solve the subset sum problem in polynomial time:
1. Run A.
2. Do a binary search to find the subset whose sum is closest to the desired number.
Note that step 1 runs polynomially (assumption) and step 2 runs in O(log(2^n)) = O(n).
Conclusion: Since the Subset Sum problem is NP-Complete, solving this problem efficiently will prove P=NP - thus there is no known polynomial solution to the problem.
Edit: Even though the problem is NP-Hard, getting the "smallest" m subsets can be done in O(n+2^m) by selecting the smallest m elements, generating all the subsets of these m elements, and choosing the minimal m of those. So for fairly small values of m, it might be feasible to calculate it.
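A rough sketch of the idea in the edit, assuming non-negative values and ranking non-empty subsets of any size (as the edit does) rather than combinations of a fixed size k. The function name is made up for this example.

```python
import heapq
from itertools import combinations


def smallest_subsets(values, m):
    """Return m non-empty subsets with the smallest sums (non-negative values)."""
    # Only the m smallest elements can appear in the m smallest-sum subsets:
    # any subset using a larger element sums to at least each of the m
    # singletons built from the smallest elements.
    smallest = heapq.nsmallest(m, values)
    candidates = (
        combo
        for size in range(1, len(smallest) + 1)
        for combo in combinations(smallest, size)
    )
    return heapq.nsmallest(m, candidates, key=sum)


print([sum(s) for s in smallest_subsets([5, 1, 3, 8, 2], 3)])  # [1, 2, 3]
```

The candidate pool has 2^m - 1 subsets, so this is only practical for small m, as the answer notes.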

Finding pair of big-small points from a set of points in a 2D plane

The following is an interview question which I've tried hard to solve. The required bound is to be less than O(n^2). Here is the problem:
You are given a set of points S = (x1,y1)....(xn,yn). The points
are co-ordinates on the XY plane. A point (xa,ya) is said to be
greater than point (xb,yb) if and only if xa > xb and ya > yb.
The objective is to find all pairs of points p1 = (xa,ya) and p2 = (xb,yb) from the set S such that p1 > p2.
Example:
Input S = (1,2),(2,1),(3,4)
Answer: {(3,4),(1,2)} , {(3,4),(2,1)}
I can only come up with an O(n^2) solution that involves checking each point against every other. If there is a better approach, please help me.
I am not sure you can do it.
Example Case: Let the points be (1,1), (2,2) ... (n,n).
There are O(n^2) such pairs, and outputting them itself takes O(n^2) time.
I am assuming you actually want to count such pairs.
Sort by x in descending order in O(n log n). Now we have reduced the problem to a single dimension: for each position k we need to count how many y values before it are larger than the y value at position k. This is equivalent to counting inversions, a problem that has been answered many times on this site, including by me, for example here.
The easiest way to get O(n log n) for that problem is by using the merge sort algorithm, if you want to think about it yourself before clicking that link. Other ways include using binary indexed trees (Fenwick trees) or binary search trees. The fastest in practice is probably by using binary indexed trees, because they only involve bitwise operations.
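A sketch of the counting variant using a binary indexed tree. This version sorts by x in ascending order and counts, for each point, the earlier points with a strictly smaller y, which is the mirror image of the description above. Points sharing the same x are queried as a group before any of them is inserted, so that ties in x are not counted. The function name is made up for this example.

```python
def count_dominating_pairs(points):
    """Count pairs (p1, p2) with p1.x > p2.x and p1.y > p2.y."""
    # Compress y values to ranks 1..m for use as Fenwick tree indices.
    ys = sorted({y for _, y in points})
    rank = {y: i + 1 for i, y in enumerate(ys)}
    tree = [0] * (len(ys) + 1)

    def add(i):
        while i < len(tree):
            tree[i] += 1
            i += i & -i

    def prefix(i):
        # Number of inserted points with rank <= i.
        total = 0
        while i > 0:
            total += tree[i]
            i -= i & -i
        return total

    pts = sorted(points)
    count = 0
    start = 0
    while start < len(pts):
        end = start
        while end < len(pts) and pts[end][0] == pts[start][0]:
            end += 1
        # Query the whole group of equal x first, then insert it.
        for _, y in pts[start:end]:
            count += prefix(rank[y] - 1)   # strictly smaller y, strictly smaller x
        for _, y in pts[start:end]:
            add(rank[y])
        start = end
    return count


print(count_dominating_pairs([(1, 2), (2, 1), (3, 4)]))  # 2
```

Sorting, rank compression, and the n tree updates and queries each cost O(n log n).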
If you want to print the pairs, you cannot do better than O(n^2) in the worst case. I would be interested in an output-sensitive O(num_pairs) algorithm too however.
Why don't you just sort the list of points by X, and Y as a secondary index? (O(nlogn))
Then you can just give a "lazy" indicator that shows for each point that all the points on its right are bigger than it.
If you want to find them ALL, it will take O(n^2) anyway, because there are O(n^2) pairs.
Think of a sorted list, the first one is smallest, so there's n-1 bigger points, the second one has n-2 bigger points... which adds up to about (n^2)/2 == O(n^2)

Resources