Segments with most points algorithm analysis

We define x_1, x_2, ..., x_n to be a sequence of points (numbers), and [s_i, t_i], for 1 ≤ i ≤ n, to be a set of n segments. Point x_j is inside segment i if s_i ≤ x_j ≤ t_i. I want to find the segment containing the most points.
Now to solve this, I am thinking we can sort x and the intervals based on s. Keep a separate array, T, such that T[i] = the number of points in segment i. Initialize all the values in this array to 0. Then, for each x, check all the intervals that contain it and increment T[i] accordingly.
In the worst case this can take O(n^2), but I feel like there is a lot of redundancy here. How do I make it more efficient?

Just to clarify: if your problem is one-dimensional, the points in X (x_1 to x_n) are numbers, and the segments are intervals.
You can easily solve this by sorting X and using the resulting indices. You can efficiently count the points within a segment [s, t] by finding the two corresponding indices i and j: find (using binary search or whatever is most efficient) i such that x_i < s ≤ x_(i+1), and j such that x_j ≤ t < x_(j+1). Note the strict and non-strict inequalities (in case s or t is itself in X). The number of points within [s, t] is then j - i.
If it is possible that s < x_1 or t > x_n, simply append a point to both ends of X (a minimum and a maximum).
This has complexity O(n log n), dominated by the sorting step. If you can use something like a counting sort that uses the values as indices into an array (or keys into a multiset), then you can improve on that by doing some more work.
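For illustration, here is a minimal Python sketch of that binary-search approach (the helper names are mine, not from the answer):

import bisect

def points_in_segment(xs_sorted, s, t):
    # number of x with s <= x <= t: bisect_left counts values < s,
    # bisect_right counts values <= t
    return bisect.bisect_right(xs_sorted, t) - bisect.bisect_left(xs_sorted, s)

xs = sorted([4, 1, 7, 3, 9])
segments = [(2, 7), (5, 6), (0, 10)]
best = max(range(len(segments)), key=lambda i: points_in_segment(xs, *segments[i]))
print(best, points_in_segment(xs, *segments[best]))  # 2 5 -- segment (0, 10) holds all five points

Sorting costs O(n log n) and each segment is then answered in O(log n), so the whole thing stays O(n log n).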
Let S be the set of points containing every s and every t for all the segments [s, t]. The idea is to build an indexing array for X (kind of like for a counting sort).
First, build the array A such that A[x in X] = 1 and A[x not in X] = 0. Then, go through it again to build the array A_less such that A_less[i] equals the sum of all A[j] with j < i.
For example, if A = [1, 0, 0, 1, 0, 1, 0], then A_less = [0, 1, 1, 1, 2, 2, 3]. You can build this array using a simple counter.
You can now refer directly to this array to get the number of points whose values are strictly less than a given value. In the previous example, there are clearly three points in X, with values 0, 3, and 5. By referring to A_less, you can see that there are A_less[4] = 2 points with values less than 4.
Similarly, build A_less_equal such that A_less_equal[i] equals the sum of all A[j] with j <= i. Using the same example, A_less_equal = [1, 1, 1, 2, 2, 3, 3].
Now, for any segment [s, t], you can get the number of points it contains by computing A_less_equal[t] - A_less[s]. All of that has complexity O(n).
If your points are not integers (or at least not easily usable as indices), then you can still use the same idea, replacing the arrays with sorted sets whose keys are every value in X or S (you need to add the values in S to be able to look them up at the end).
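For concreteness, here is a small Python sketch of the integer-indexed version (assuming non-negative integer coordinates and counting duplicate points once, as with the 0/1 array above; the names are mine):

def best_segment(points, segments):
    limit = max(max(points), max(t for _, t in segments)) + 1
    # A[v] = 1 if v is one of the points, else 0
    A = [0] * limit
    for x in points:
        A[x] = 1
    # A_less[v] = points strictly below v; A_less_equal[v] = points at or below v
    A_less, A_less_equal = [0] * limit, [0] * limit
    count = 0
    for v in range(limit):
        A_less[v] = count
        count += A[v]
        A_less_equal[v] = count
    # each segment [s, t] is then answered in O(1)
    return max(range(len(segments)),
               key=lambda i: A_less_equal[segments[i][1]] - A_less[segments[i][0]])

print(best_segment([0, 3, 5], [(2, 6), (4, 9)]))  # 0 -- segment [2, 6] contains two points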

Related

How To Find K-th Smallest Element in Multiset-sum?

Need some help designing an algorithm to solve this problem.
Let a and b be integers with a ≤ b, and let [a, b] denote the set {a, a + 1, a + 2, ..., b}. Suppose we are given n such sets, [a1, b1], ..., [an, bn]; their multiset-sum is
S = {a1, a1 + 1,..., b1, a2,a2 + 1,...,b2,...,an,an + 1, ..., bn}
For example, the multiset-sum of [5,25], [3,10], and [8,12] is
{3,4,5,5,6,6,7,7,8,8,8,9,9,9,10,10,10,...,25}
Given the sets [a1, b1], ..., [an, bn] such that 0 ≤ ai, bi ≤ N, and an integer k > 0, design an efficient algorithm that outputs the k-th smallest element of S, the multiset-sum of the sets. Determine the running time of the algorithm in terms of n and N.
I've already designed two helper algorithms called FindElementsBefore(x, [a1,b1]...[an,bn]) and FindElementsAfter(x, [a1,b1]...[an,bn]). These both accept an element x and each of the sets and return the number of elements in S less than x and greater than x respectively.
I've been told by my professor that using these two helper methods, I should be able to solve the above problem, but I am absolutely stumped. How do I solve this?
Use a binary search.
You already know the largest and smallest values in your multiset-sum, so you have an upper and a lower bound for the k-th smallest element. Now simply recurse on those bounds: if FindElementsBefore(mid, ...) ≥ k, the answer lies below mid; otherwise it lies at or above mid.
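A small Python sketch of that binary search (find_elements_before here is my own stand-in for the FindElementsBefore helper from the question, counting elements of S strictly less than x):

def find_elements_before(x, intervals):
    # elements of the multiset-sum that are strictly less than x
    return sum(max(0, min(b, x - 1) - a + 1) for a, b in intervals)

def kth_smallest(k, intervals, N):
    # assumes 1 <= k <= total number of elements; all values lie in [0, N]
    lo, hi = 0, N
    while lo < hi:
        mid = (lo + hi) // 2
        if find_elements_before(mid + 1, intervals) >= k:
            hi = mid        # at least k elements are <= mid, so the answer is <= mid
        else:
            lo = mid + 1
    return lo

intervals = [(5, 25), (3, 10), (8, 12)]
print(kth_smallest(1, intervals, 25))  # 3
print(kth_smallest(4, intervals, 25))  # 5 (the multiset starts 3, 4, 5, 5, ...)

Each probe costs O(n) and there are O(log N) probes, so the whole search runs in O(n log N).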

How to more efficiently find the minimal composition from n sets that satisfies a given condition?

We have N triples of pairs, like
1. { (4; 0.1), (5; 0.3), (7; 0.6) }
2. { (7; 0.2), (8; 0.4), (1; 0.4) }
...
N. { (6; 0.3), (1; 0.2), (9; 0.5) }
and we need to choose exactly one pair from each triple so that the sum of the first members of the chosen pairs is minimal, subject to the condition that the sum of the second members is not less than a given number P.
We can solve this by sorting all possible combinations (3^N of them) by the sum of their first members, and picking the first one in that sorted list which also satisfies the second condition.
Could you please suggest a better, non-trivial solution for this problem?
If there are no constraints on the values inside your triples, then we are facing a pretty general version of the integer programming problem, more specifically a 0-1 linear programming problem, since it can be represented as a system of equations in which every coefficient is 0 or 1. You can find the possible approaches on the wiki page, but there is no fast-and-easy solution for this problem in general.
Alternatively, if the second numbers of each pair (the ones that need to sum up to at least P) come from a small enough range, we can treat this as a dynamic programming problem similar to the knapsack problem. "Small enough" is a bit hard to define here because the original data has non-integer numbers. If they were integers, the complexity of the solution I will describe would be O(P * N). Non-integer numbers first need to be converted to integers by multiplying them, as well as P, by a large enough factor. In your example, each number has one digit after the decimal point, so multiplying by 10 is enough. Hence, the actual complexity is O(M * P * N), where M is the factor everything was multiplied by to make the numbers integers.
After this, we are essentially solving a modified Knapsack problem: instead of constraining the weight from above, we are constraining it from below, and on each step we are choosing a pair from a triplet, as opposed to deciding whether to put an item into the knapsack or not.
Let's define a function minimum_sum[i][s] that represents the minimum possible sum (of the first numbers of the pairs we took) achievable if the sum of the second numbers of the pairs taken so far equals s and we have already considered the first i triples. One exception to this definition: minimum_sum[i][P] holds the minimum over all second-number sums of at least P as well. If we can compute all values of this function, then minimum_sum[N][P] is the answer. The function values can be computed with something like this:
minimum_sum[0][0] = 0, all other values are set to infinity
for i = 0..N-1:
    for s = 0..P:
        for j = 0..2:
            minimum_sum[i+1][min(P, s+B[i][j])] = min(minimum_sum[i+1][min(P, s+B[i][j])], minimum_sum[i][s] + A[i][j])
A[i][j] here denotes the first number of the i-th triple's j-th pair, and B[i][j] denotes the second number of the same pair.
This solution is viable if N is large, but P is small and precision on Bs isn't too high. For instance, if N=50, there is little hope to compute 3^N possibilities, but with M*P=1000000 this approach would work extremely fast.
Python implementation of the idea above:
def compute(A, B, P):
    n = len(A)
    # note that I use 1,000,000 as "infinity" here, which might need to be increased depending on input data
    best = [[1000000 for i in range(P + 1)] for j in range(n + 1)]
    best[0][0] = 0
    for i in range(n):
        for s in range(P + 1):
            for j in range(3):
                best[i+1][min(P, s+B[i][j])] = min(best[i+1][min(P, s+B[i][j])], best[i][s] + A[i][j])
    return best[n][P]
Testing:
A=[[4, 5, 7], [7, 8, 1], [6, 1, 9]]
# second numbers in each pair after scaling them up to be integers
B=[[1, 3, 6], [2, 4, 4], [3, 2, 5]]
In [7]: compute(A, B, 0)
Out[7]: 6
In [14]: compute(A, B, 7)
Out[14]: 6
In [15]: compute(A, B, 8)
Out[15]: 7
In [20]: compute(A, B, 13)
Out[20]: 14

maximum ratio of a min subset and a max subset of size k in a collection of n value pairs

So, say you have a collection of value pairs of the form {x, y}, say {1, 2}, {1, 3}, and {2, 5}.
Then you have to find a subset of k pairs (in this case, say k = 2) such that the ratio of the sum of all x in the subset to the sum of all y in the subset is as high as possible.
Could you point me in the direction for relevant theory or algorithms?
It's kind of like maximum subset sum, but since the pairs are "bound" to each other it introduces a restriction that changes it from problems known to me.
Initially I thought that a simple greedy approach could work here, but commenters pointed out some counterexamples.
Instead I think a bisection approach should work.
Suppose we want to know whether it is possible to achieve a ratio of g.
We need to add a selection of k vectors to end up above a line of gradient g.
If we project each vector perpendicular to this line to get values p1, p2, p3, then the final vector will be on or above the line if and only if the sum of the p values is non-negative.
Now, with the projected values it does seem right that the optimal solution is to choose the largest k.
We can then use bisection to find the highest ratio that is achievable.
Mathematical justification
Suppose we want to have the ratio above g, i.e.
(x1 + x2 + x3) / (y1 + y2 + y3) >= g
=> (x1 + x2 + x3) >= g(y1 + y2 + y3)
=> (x1 - g·y1) + (x2 - g·y2) + (x3 - g·y3) >= 0
=> p1 + p2 + p3 >= 0
where pi is defined to be xi - g·yi.
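A small Python sketch of the bisection (names are mine; it assumes all x >= 0 and all y > 0 so the ratio is bounded):

def best_ratio(pairs, k, iters=60):
    def achievable(g):
        # project every pair onto p = x - g*y and keep the k largest projections
        projections = sorted((x - g * y for x, y in pairs), reverse=True)
        return sum(projections[:k]) >= 0

    lo, hi = 0.0, max(x / y for x, y in pairs)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if achievable(mid):
            lo = mid
        else:
            hi = mid
    return lo

pairs = [(1, 2), (1, 3), (2, 5)]
print(best_ratio(pairs, 2))  # ~0.4286, i.e. (1 + 2) / (2 + 5) from pairs (1, 2) and (2, 5)

Each feasibility check is O(n log n), so the total cost is that times the number of bisection steps.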

Select k numbers maximizing sum of pairwise xor

Given a range [l, r] (where l < r) and a number k (where k <= r - l), I want to select a set S of k distinct numbers in [l, r] which maximizes the sum of pairwise xors. For example, if [l, r] = [2, 10] and k = 3 and we choose S = {4, 5, 6}, the sum is d(4, 5) + d(4, 6) + d(5, 6) = 1 + 1 + 2 = 4, where d(a, b) is the Hamming distance, i.e. the number of set bits in a xor b.
Here's my thinking so far: in [l, r], for each bit index i less than or equal to the index of the highest set bit in r, the number of pairwise xors with the ith bit set equals j * (k - j), where j is the count of elements in S with the ith bit set. To maximize this we want to select S such that, for each bit i, about k/2 of its elements have the ith bit set. This is easy for k = 2, but I'm stuck on generalizing it for k > 2.
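To sanity-check that counting argument, here is a tiny Python brute force (exponential, only for small instances; the names are mine). It confirms that the sum of pairwise Hamming distances equals the per-bit total of j * (k - j):

from itertools import combinations

def pairwise_distance_sum(S):
    # sum over all pairs of d(a, b) = popcount(a xor b)
    return sum(bin(a ^ b).count("1") for a, b in combinations(S, 2))

def per_bit_formula(S):
    # for each bit i, exactly j * (k - j) pairs differ in that bit
    k, total = len(S), 0
    for i in range(max(S).bit_length()):
        j = sum((x >> i) & 1 for x in S)
        total += j * (k - j)
    return total

def best_subset_bruteforce(l, r, k):
    return max(combinations(range(l, r + 1), k), key=pairwise_distance_sum)

S = (4, 5, 6)
print(pairwise_distance_sum(S), per_bit_formula(S))  # 4 4
print(best_subset_bruteforce(2, 10, 3))              # one optimal triple from [2, 10]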
At first glance it seems there is no algebraic solution for this problem; it looks like an NP-hard optimization problem that is not solvable in polynomial time.
As is almost always possible, one can brute-force through the feasible space.
Intuitively, I can suggest looking into Locality Sensitive Hashing. In LSH one normally tries to find similarities between two sets, but in your case you can abuse the algorithm in the following sense.
The domain is subdivided into a few buckets.
You randomly sample points in the range [l, r].
Highly probable points (those with large Hamming distance) are placed in the buckets.
In the end you brute-force within the most probable bucket.
One can expect points with large Hamming distances to end up in the same neighborhood (hence the name Locality Sensitive Hashing). However, this is just an idea.

Sum-subset with a fixed subset size

The sum-subset problem states:
Given a set of integers, is there a non-empty subset whose sum is zero?
This problem is NP-complete in general. I'm curious if the complexity of this slight variant is known:
Given a set of integers, is there a subset of size k whose sum is zero?
For example, if k = 1, you can do a binary search to find the answer in O(log n). If k = 2, then you can get it down to O(n log n) (e.g. see Find a pair of elements from an array whose sum equals a given number). If k = 3, then you can do O(n^2) (e.g. see Finding three elements in an array whose sum is closest to a given number).
Is there a known bound that can be placed on this problem as a function of k?
As motivation, I was thinking about this question How do you partition an array into 2 parts such that the two parts have equal average? and trying to determine if it is actually NP-complete. The answer lies in whether or not there is a formula as described above.
Barring a general solution, I'd be very interested in knowing an optimal bound for k=4.
For k = 4: space complexity O(n), time complexity O(n^2 * log(n))
Sort the array. Starting from the 2 smallest and the 2 largest elements, enumerate all lesser sums of 2 elements (a[i] + a[j]) in non-decreasing order and all greater sums of 2 elements (a[k] + a[l]) in non-increasing order. Increase the lesser sum if the total is less than zero, decrease the greater one if the total is greater than zero; stop when the total is zero (success) or when a[i] + a[j] > a[k] + a[l] (failure).
The trick is to iterate through all the indices i and j in such a way that (a[i] + a[j]) never decreases, and through k and l so that (a[k] + a[l]) never increases. A priority queue helps to do this:
1. Put key = (a[i] + a[j]), value = (i = 0, j = 1) into the priority queue.
2. Pop (sum, i, j) from the priority queue.
3. Use sum in the above algorithm.
4. Put (a[i+1] + a[j]), i+1, j and (a[i] + a[j+1]), i, j+1 into the priority queue, but only if these elements were not already used. To keep track of used elements, maintain an array of the maximal used 'j' for each 'i'. It is enough to use only values of 'j' that are greater than 'i'.
5. Continue from step 2.
For k>4
If space complexity is limited to O(n), I cannot find anything better than using brute force for k-4 of the values and the above algorithm for the remaining 4 values, giving time complexity O(n^(k-2) * log(n)).
For very large k integer linear programming may give some improvement.
Update
If n is very large (on the same order as the maximum integer value), it is possible to implement an O(1) priority queue, improving the complexities to O(n^2) and O(n^(k-2)).
If n >= k * INT_MAX, a different algorithm with O(n) space complexity is possible: precalculate a bitset of all possible sums of k/2 values and use it to check the sums of the other k/2 values. Time complexity is O(n^(ceil(k/2))).
The problem of determining whether 0 in W + X + Y + Z = {w + x + y + z | w in W, x in X, y in Y, z in Z} is basically the same except for not having annoying degenerate cases (i.e., the problems are inter-reducible with minimal resources).
This problem (and thus the original for k = 4) has an O(n^2 log n)-time, O(n)-space algorithm. The O(n log n)-time algorithm for k = 2 (to determine whether 0 is in A + B) accesses A in sorted order and B in reverse sorted order. Thus all we need is an O(n)-space iterator for A = W + X, which can be reused symmetrically for B = Y + Z. Let W = {w_1, ..., w_n} in sorted order. For all x in X, insert a key-value item (w_1 + x, (1, x)) into a priority queue. Repeatedly remove the min element (w_i + x, (i, x)) and insert (w_{i+1} + x, (i+1, x)).
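A rough Python sketch of that O(n)-space iterator (the name sorted_sums is mine):

import heapq

def sorted_sums(W, X):
    # yields every w + x in non-decreasing order; W and X must be sorted,
    # and only O(|X|) sums are held in the heap at any time
    heap = [(W[0] + x, 0, x) for x in X]
    heapq.heapify(heap)
    while heap:
        s, i, x = heapq.heappop(heap)
        yield s
        if i + 1 < len(W):
            heapq.heappush(heap, (W[i + 1] + x, i + 1, x))

print(list(sorted_sums([1, 3], [0, 2, 5])))  # [1, 3, 3, 5, 6, 8]

Running one such iterator over W + X forwards and a mirrored one over Y + Z backwards reproduces the two-pointer scan of the k = 2 algorithm.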
Question that is very similar:
Is this variant of the subset sum problem easier to solve?
It's still NP-complete.
If it were not, subset-sum would also be in P, since it could be decided as F(1) | F(2) | ... | F(n), where F(k) is your fixed-size variant. The total running time O(F(1) + F(2) + ... + F(n)) would then still be polynomial, which is impossible because we know subset-sum is NP-complete.
Note that if you have certain bounds on the inputs you can achieve polynomial time.
Also note that the brute-force runtime can be calculated with binomial coefficients.
A solution for k = 4 in O(n^2 log(n)):
Step 1: Calculate all pairwise sums and sort the list. There are n(n-1)/2 sums, so the complexity is O(n^2 log(n)). Keep track of which two elements produce each sum.
Step 2: For each element of the above list, search for the complement and make sure the two pairs don't share an element. There are n^2 searches, each with complexity O(log(n)).
EDIT: The space complexity of the original algorithm is O(n^2). It can be reduced to O(1) by simulating a virtual 2D matrix (or O(n), if you count the space needed to store the sorted version of the array).
First, about the 2D matrix: sort the numbers and create a matrix X of pairwise sums. The matrix is such that all rows and columns are sorted. To search for a value in this matrix, search the numbers on the diagonal. If the value lies between X[i,i] and X[i+1,i+1], you can basically halve the search space to the two submatrices X[i:N, 0:i] and X[0:i, i:N]. The resulting search algorithm is O(log^2 n) (I am not very sure; can somebody check this?).
Now, instead of using a real matrix, use a virtual matrix where X[i,j] are calculated as needed instead of pre-computing them.
Resulting time complexity: O((n log n)^2).
PS: In the following link, it is said that searching a sorted 2D matrix has O(n) complexity. If that is true (i.e. O(log^2 n) is incorrect), then the final complexity is O(n^3).
To build on awesomo's answer... if we can assume that numbers are sorted, we can do better than O(n^k) for given k; simply take all O(n^(k-1)) subsets of size (k-1), then do a binary search in what remains for a number that, when added to the first (k-1), gives the target. This is O(n^(k-1) log n). This means the complexity is certainly less than that.
In fact, if we know that the complexity is O(n^2) for k=3, we can do even better for k > 3: choose all (k-3)-subsets, of which there are O(n^(k-3)), and then solve the problem in O(n^2) on the remaining elements. This is O(n^(k-1)) for k >= 3.
However, maybe you can do even better? I'll think about this one.
EDIT: I was initially going to write a lot more proposing a different take on this problem, but I've decided to post an abridged version. I encourage other posters to see whether they believe this idea has any merit. The analysis is tough, but it might just be crazy enough to work.
We can use the fact that we have a fixed k, and that sums of odd and even numbers behave in certain ways, to define a recursive algorithm to solve this problem.
First, modify the problem so that you have both even and odd numbers in the list (this can be accomplished by dividing by two if all are even, or by subtracting 1 from numbers and k from the target sum if all are odd, and repeating as necessary).
Next, use the fact that even target sums can be reached only by using an even number of odd numbers, and odd target sums can be reached using only an odd number of odd numbers. Generate appropriate subsets of the odd numbers, and call the algorithm recursively using the even numbers, the sum minus the sum of the subset of odd numbers being examined, and k minus the size of the subset of odd numbers. When k = 1, do binary search. If ever k > n (not sure this can happen), return false.
If you have very few odd numbers, this could allow you to very quickly pick up terms that must be part of a winning subset, or discard ones that cannot be. You can transform problems with lots of even numbers into equivalent problems with lots of odd numbers by using the subtraction trick. The worst case must therefore be when the numbers of even and odd numbers are very similar... and that's where I am right now. A uselessly loose upper bound on this is many orders of magnitude worse than brute force, but I feel like this is probably at least as good as brute force. Thoughts are welcome!
EDIT2: An example of the above, for illustration.
{1, 2, 2, 6, 7, 7, 20}, k = 3, sum = 20.
Subset {}:
    {2, 2, 6, 20}, k = 3, sum = 20
    = {1, 1, 3, 10}, k = 3, sum = 10
    Subset {}:
        {10}, k = 3, sum = 10
        Failure
    Subset {1, 1}:
        {10}, k = 1, sum = 8
        Failure
    Subset {1, 3}:
        {10}, k = 1, sum = 6
        Failure
Subset {1, 7}:
    {2, 2, 6, 20}, k = 1, sum = 12
    Failure
Subset {7, 7}:
    {2, 2, 6, 20}, k = 1, sum = 6
    Success
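Here is a rough, unoptimized Python sketch of that parity recursion (the function name is mine; it is meant to illustrate the idea, not to be an efficient implementation):

from bisect import bisect_left
from itertools import combinations

def k_subset_sum(nums, k, target):
    # does some subset of exactly k of the numbers sum to target?
    nums = sorted(nums)
    n = len(nums)
    if k == 0:
        return target == 0
    if k > n:
        return False
    if k == 1:
        i = bisect_left(nums, target)
        return i < n and nums[i] == target
    if all(x == 0 for x in nums):        # guard against endless halving
        return target == 0
    odds = [x for x in nums if x % 2]
    evens = [x for x in nums if x % 2 == 0]
    if not odds:
        # everything is even: divide the numbers and the target by two
        return target % 2 == 0 and k_subset_sum([x // 2 for x in evens], k, target // 2)
    if not evens:
        # everything is odd: subtract 1 from each number and k from the target
        return k_subset_sum([x - 1 for x in odds], k, target - k)
    # mixed: the target's parity fixes the parity of the number of odd numbers used
    for m in range(target % 2, min(k, len(odds)) + 1, 2):
        for chosen in combinations(odds, m):
            if k_subset_sum(evens, k - m, target - sum(chosen)):
                return True
    return False

print(k_subset_sum([1, 2, 2, 6, 7, 7, 20], 3, 20))  # True (7 + 7 + 6)
print(k_subset_sum([1, 2, 2, 6, 7, 7, 20], 3, 4))   # False (smallest 3-sum is 5)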
The time complexity is trivially O(n^k) (number of k-sized subsets from n elements).
Since k is a given constant, a (possibly quite high-order) polynomial upper bounds the complexity as a function of n.
