how to solve this (selecting intervals) - algorithm

I'm given some intervals I = {I(1), I(2), ..., I(m)} where I(i) = [a_i, b_i] (1 <= a_i <= b_i <= n). You may assume that any two intervals are either disjoint or nested (sorry, my English is poor), so there are no partially overlapping pairs such as {[1,5], [3,6]} or {[2,5], [5,7]}. Also, {[1,1], [2,2], ..., [n,n]} must be included in I.
Let C(i) = b_i - a_i + 1.
I want to find {I(c_1), I(c_2), ..., I(c_k)} that are pairwise non-overlapping and satisfy C(c_1) + C(c_2) + ... + C(c_k) = T (1 <= T <= n).
I found an O(n*T) DP solution based on the Subset Sum problem, and I think the problem is NP-hard, but I'm not sure. Can I do better than O(n*T)?

The Subset Sum problem (given a set of numbers and a target number, decide whether some subset sums to the target) reduces to this problem with a simple reduction:
Given an instance of subset-sum: S={c_1,c_2,..,c_n},T - create an instance of this problem by creating n non-overlapping intervals, where interval i contains c_i points (easy to do by laying them out in ascending order). The same T remains.
Now, the answer to the subset-sum problem is true if and only if there is a subset of intervals whose sizes sum to T. It is basically the same problem, since none of the constructed intervals overlap.
From this we can conclude that your problem is NP-hard.
Moreover, if we could solve this problem faster than O(T*n), we could use the same approach to solve the subset sum problem faster than O(T*n) (1)(2).
However, AFAIK, the best pseudo-polynomial solution to subset sum is O(T*n), so if you have such a solution, stick with it.
(1) Converting the problem is O(n)
(2) This claim is true for this specific reduction alone, and NOT for the general case of polynomial reductions.
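The construction in this reduction can be sketched in a few lines of Python. This is only an illustration of the step "create n non-overlapping intervals, interval i with c_i points"; the function name subset_sum_to_intervals is made up here, the sizes are assumed to be positive integers, and the sketch builds only the n intervals the answer describes.

```python
def subset_sum_to_intervals(sizes):
    """Lay out one closed interval per number, left to right, with no overlaps.

    sizes: positive integers c_1..c_n from the subset-sum instance (assumption).
    Returns a list of (a_i, b_i) pairs with b_i - a_i + 1 == c_i.
    """
    intervals = []
    start = 1  # intervals are 1-based, as in the question
    for c in sizes:
        intervals.append((start, start + c - 1))
        start += c  # the next interval begins right after this one ends
    return intervals


print(subset_sum_to_intervals([3, 1, 2]))
```

The conversion is a single pass over the input, which matches footnote (1).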

Related

Greedy Attempt for covering all the numbers with the given intervals

Let S be a set of n intervals over the natural numbers (the intervals may overlap) and let N be a list of n numbers.
I want to find the smallest subset P of S such that for each number in our list N, there exists at least one interval in P that contains it. The intervals in P are allowed to overlap.
Trivial example:
S = {[1..4], [2..7], [3..5], [8..15], [9..13]}
N = [1, 4, 5]
// so P = {[1..4], [2..7]}
I think a dynamic programming algorithm might not always work, so if anybody knows of a solution to this problem (or to a similar one that it can be converted into), that would be great. I am trying to find an O(n^2) solution.
Here is one greedy approach:
P = {}
for each q in N: // O(n)
    if q is covered by an interval in P // O(n)
        continue
    for each i in S // O(n)
        if q in i: // O(n)
            P.add(i)
            break
But that is O(n^4).. Any help with creating a greedy approach that is O(n^2) would be great!
Thanks!
* Update: * I've been slamming at this problem and I think I have an O(n^2) solution!!
Let me know if you think I'm right!!!
N = MergeSort (N)
lower, upper = infinity, -infinity   // nothing is covered yet
P = empty set
for each q in N do
    if (q>=lower and q<=upper)=False
        max_interval = [-infinity, -infinity]
        for each r in S do
            if q in r then
                if r.rightEndPoint > max_interval.rightEndPoint
                    max_interval = r
        P.append(max_interval)
        lower = max_interval.leftEndPoint
        upper = max_interval.rightEndPoint
        S.remove(max_interval)
I think this should work!! I'm trying to find a counterexample; but yeah!!
This problem is similar to the set cover problem, which is NP-hard (i.e., no polynomial-time algorithm is known for it). What makes it different is that each interval always covers adjacent elements (not an arbitrary subset of N), which opens the way for faster solutions.
http://en.wikipedia.org/wiki/Set_cover_problem
I think that the solution proposed by Mike is good enough. But I think I have a quite straightforward O(N^2) greedy algorithm. It starts like Mike's (moreover, I believe Mike's solution can be improved in a similar way):
Sort your N numbers and place them into an array ELEM. Complexity: O(N*lg N).
Using binary search, for each interval S[i], identify the starting and ending indices of the elements in ELEM that are covered by S[i]. Place this pair of indices into an array COVER; the difference between the two indices tells you how many elements the interval covers, and for simplicity we store that count in an array COVER_COUNT. Complexity: O(N*lg N).
Introduce an index pointer p that shows up to which element of ELEM the list is already covered. Set p = 0, meaning that all elements up to the 0-th (excluded) are initially covered (i.e., no elements). Complexity: O(1). Also introduce a boolean array IS_INCLUDED that records whether interval S[i] is already included in your cover. Complexity: O(N).
Then start from the 0-th element of ELEM and find the interval that contains ELEM[0] and has the greatest coverage COVER_COUNT[i]. Say it is the i-th interval. Mark it as included by setting IS_INCLUDED[i] to true. Then set p to end[i] + 1, where end[i] is the ending index in the COVER[i] pair (all elements up to end[i] are now covered). Then, knowing p, update all entries of COVER_COUNT so that they reflect how many not-yet-covered elements each interval covers (this is easily done in O(N) time). Then perform the same step for ELEM[p], and continue until p >= ELEM.length. The overall complexity is O(N^2).
You finish in O(N^2), and IS_INCLUDED holds true for exactly the intervals of S included in the optimal cover.
Let me know if this solution seems reasonable to you and if I calculated everything well.
P.S. Just wanted to add that the optimality of the solution found by the algorithm can be proved by induction and contradiction. By contradiction, it is easy to show that at least one optimal solution includes the longest interval among those covering element ELEM[0]. Given that, by induction we can show that for each subsequent element we can keep following the same strategy: select the interval that covers the leftmost not-yet-covered element and is the longest with respect to the number of remaining elements covered.
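A minimal Python sketch of the greedy idea discussed in this thread, under some assumptions: intervals are closed (lo, hi) tuples, the function name greedy_point_cover is invented for illustration, and "greatest coverage" is implemented as "farthest right endpoint", which gives the same choice among intervals that all contain the leftmost uncovered point. It skips the binary-search bookkeeping and simply scans all intervals for each uncovered point, which is still O(n^2).

```python
def greedy_point_cover(intervals, points):
    """Choose intervals so that every point lies inside at least one chosen interval.

    intervals: iterable of closed intervals (lo, hi).
    points: iterable of numbers to cover.
    """
    intervals = list(intervals)
    chosen = []
    covered_up_to = float("-inf")  # right end of the most recently chosen interval
    for q in sorted(points):
        if q <= covered_up_to:
            continue  # q already lies inside the last chosen interval
        # Among the intervals containing q, take the one reaching farthest right.
        candidates = [iv for iv in intervals if iv[0] <= q <= iv[1]]
        if not candidates:
            raise ValueError(f"no interval contains {q}")
        best = max(candidates, key=lambda iv: iv[1])
        chosen.append(best)
        covered_up_to = best[1]
    return chosen


S = [(1, 4), (2, 7), (3, 5), (8, 15), (9, 13)]
N = [1, 4, 5]
print(greedy_point_cover(S, N))  # the example from the question
```

On the question's example this picks [1..4] for the point 1 and then [2..7] for the point 5, matching the expected P.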
I am not sure, but maybe something like this:
1) For each interval S[i], create a list Q[i] of the elements of N that are contained in the interval. This takes O(n^2).
2) Then sort S by the length of Q[i], O(n*lg(n)).
3) Go through this array, excluding the elements of Q[i] from N, O(n), and from Q[i+1]...Q[n], O(n^2).
4) Repeat from step 2 while N is not empty.
It's not O(n^2), it's O(n^3), but if you can use a hashmap, I think you can improve this.

Partitioning a list of integers to minimize difference of their sums

Given a list of integers l, how can I partition it into 2 lists a and b such that d(a,b) = abs(sum(a) - sum(b)) is minimum. I know the problem is NP-complete, so I am looking for a pseudo-polynomial time algorithm i.e. O(c*n) where c = sum(l map abs). I looked at Wikipedia but the algorithm there is to partition it into exact halves which is a special case of what I am looking for...
EDIT:
To clarify, I am looking for the exact partitions a and b and not just the resulting minimum difference d(a, b)
To generalize, what is a pseudo-polynomial time algorithm to partition a list of n numbers into k groups g1, g2 ...gk such that (max(S) - min(S)).abs is as small as possible where S = [sum(g1), sum(g2), ... sum(gk)]
A naive, trivial, and still pseudo-polynomial solution would be to use the existing solution to subset-sum and repeat it for each target from sum(array)/2 down to 0 (returning the first subset found).
The complexity of this solution is O(W^2*n), where W is the sum of the array.
pseudo code:
for cand from sum(array)/2 to 0 descending:
    subset <- subsetSumSolver(array,cand)
    if subset != null:
        return subset
The above returns a subset whose sum is as large as possible without exceeding sum(array)/2; the other part is the complement of the returned subset.
However, a single run of the dynamic programming for subset-sum is enough.
Recall that the formula is:
f(0,i) = true
f(x,0) = false | x != 0
f(x,i) = f(x-arr[i],i-1) OR f(x,i-1)
Building the matrix for a target x also fills in the entries for every value below x, so if you start from sum(array)/2 you get all the values you need.
After you generate the DP matrix, find the maximal x <= sum(array)/2 such that f(x,n)=true; this gives the best partition you can get.
Complexity in this case is O(W*n).
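Since the question asks for the actual partitions and not just the difference, here is a hedged Python sketch of the single-pass DP with reconstruction. It assumes the input consists of non-negative integers (negative values would need an offset), uses a one-dimensional table instead of the full matrix, and the names min_diff_partition and via are invented for this illustration.

```python
def min_diff_partition(nums):
    """Split non-negative integers into lists a and b minimising abs(sum(a) - sum(b))."""
    total = sum(nums)
    half = total // 2
    # reachable[s] is True when some subset of the items seen so far sums to s.
    reachable = [True] + [False] * half
    # via[s] is the index of the item that first made the sum s reachable.
    via = [None] * (half + 1)
    for i, v in enumerate(nums):
        # Go downwards so that each item is used at most once.
        for s in range(half, v - 1, -1):
            if not reachable[s] and reachable[s - v]:
                reachable[s] = True
                via[s] = i
    # The best subset sum is the largest reachable value not exceeding total / 2.
    best = max(s for s in range(half + 1) if reachable[s])
    picked = set()
    s = best
    while s > 0:  # walk the stored choices back to the empty sum
        picked.add(via[s])
        s -= nums[via[s]]
    a = [v for i, v in enumerate(nums) if i in picked]
    b = [v for i, v in enumerate(nums) if i not in picked]
    return a, b


a, b = min_diff_partition([3, 1, 4, 2, 2])
print(a, b, abs(sum(a) - sum(b)))
```

The table has about W/2 entries and is updated once per item, so the running time stays within the O(W*n) bound mentioned above.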
You can phrase this as a 0/1 integer linear programming optimization problem. Let wi be the ith number, and let xi be a 0/1 variable which indicates whether wi is in the first set or not. Then you want to minimize sum(xi wi) - sum((1 - xi) wi) subject to
sum(xi wi) >= sum((1 - xi) wi)
and also subject to all xi being 0 or 1. There has been a lot of research into optimizing 0/1 linear programming solvers. For large total sum W this may be an improvement over the O(W n) pseudo-polynomial time algorithm presented because the W factor is scary.
My first thought is to:
Sort list of integers
Create two empty lists A and B
While iterating from biggest integer to smallest integer...add next integer to the list with the smallest current sum.
This is, of course, not guaranteed to give you the best result but you can bound the result it will give you by the size of the biggest integer in your list
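A short Python sketch of this greedy heuristic, assuming non-negative integers; the function name greedy_partition is made up for the example. As the answer says, it is not guaranteed to be optimal.

```python
def greedy_partition(nums):
    """Heuristic split: largest numbers first, each into the list with the smaller sum."""
    a, b = [], []
    sum_a = sum_b = 0
    for v in sorted(nums, reverse=True):
        if sum_a <= sum_b:
            a.append(v)
            sum_a += v
        else:
            b.append(v)
            sum_b += v
    return a, b


a, b = greedy_partition([8, 7, 6, 5, 4])
print(a, b, abs(sum(a) - sum(b)))
```

For this input the heuristic ends with a difference of 4, while the split [8, 7] and [6, 5, 4] reaches 0, so the result is within the stated bound (the largest element, 8) but not optimal.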

Efficient multiselection algorithm

I have to implement an algorithm that solves the multi-selection problem.
The multiselection problem is:
Given a set S of n elements drawn from a linearly ordered set, and a set K = {k1, k2,...,kr} of positive integers between 1 and n, the multiselection problem is to select the ki-th smallest element for all values of i, 1 <= i <= r
I need to solve it in Θ(n log r) time in the average case.
I've found a paper that implements the solution I need, but it assumes that there are no repeated numbers on the set S. The problem is that I can't assume that and I don't know how to adapt the algorithm of that paper to support repeated numbers.
The paper is here: http://www.ccse.kfupm.edu.sa/~suwaiyel/publications/multiselection_parCom.pdf
and the algorithm is on the second page. Any tips are welcome!
For posterity: the algorithm to which Ivan refers is to sort K, then solve the problem recursively as follows. Use QuickSelect to find the ki-th smallest element x where i is ceil(r/2), then recurse on the smaller halves of K and S, and the larger halves of K and S, splitting K about i and S about x.
Finding algorithms that work in the presence of degeneracy (here, equal elements) is often not a high priority for authors of theoretical works, because it makes the presentation of the common case more difficult and doesn't often play a role in determining the computational complexity of the problem. This is essentially a one-dimensional problem, and the black box solution is easy; replace the i-th element of the input yi by (yi, i) and break ties in the comparisons using the second component.
In practice, we can do better. Instead of recursing on {y : y in S, y < x} and {y : y in S, y > x}, use a three-way partitioning algorithm about x (see, e.g., every sufficiently complete treatment of QuickSort), then divide the array S by index instead of value.
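To make the three-way partitioning idea concrete, here is a hedged Python sketch. It is a simplification, not the paper's algorithm: instead of running QuickSelect for the middle rank of K, it partitions around a random pivot and then splits both the array and the ranks by index, which is enough to show how duplicates are handled. The names three_way_partition and multiselect are invented for this example, and no claim is made that it matches the paper's bound.

```python
import random
from bisect import bisect_right


def three_way_partition(arr, lo, hi, pivot):
    """Rearrange arr[lo:hi] into (< pivot | == pivot | > pivot).

    Returns (lt, gt) such that arr[lt:gt] holds every copy of the pivot.
    """
    lt, i, gt = lo, lo, hi
    while i < gt:
        if arr[i] < pivot:
            arr[lt], arr[i] = arr[i], arr[lt]
            lt += 1
            i += 1
        elif arr[i] > pivot:
            gt -= 1
            arr[gt], arr[i] = arr[i], arr[gt]
        else:
            i += 1
    return lt, gt


def multiselect(values, ranks):
    """Return {k: k-th smallest value} for each 1-based rank k; duplicates allowed."""
    arr = list(values)
    out = {}

    def solve(lo, hi, ks):
        # ks is a sorted list of ranks that all fall inside positions lo..hi-1.
        if not ks:
            return
        pivot = arr[random.randrange(lo, hi)]
        lt, gt = three_way_partition(arr, lo, hi, pivot)
        left = bisect_right(ks, lt)   # ranks 1..lt belong to the "< pivot" part
        right = bisect_right(ks, gt)  # ranks lt+1..gt are answered by the pivot
        for k in ks[left:right]:
            out[k] = pivot
        solve(lo, lt, ks[:left])
        solve(gt, hi, ks[right:])

    solve(0, len(arr), sorted(set(ranks)))
    return out


print(multiselect([5, 3, 3, 9, 1, 3, 7], [1, 3, 4, 7]))
```

Because the split is by position rather than by value, ranks that land inside a run of equal elements are resolved immediately and never cause an unbalanced recursion on duplicates.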

Computing Combinations

I am having difficulty coming up with a solution for the problem below:
We are given n boxes, each with a weight (meaning every ball in box B_i has weight C_i).
Each box contains some balls, specifically
{b1,b2,b3...,b_n} (b_i is the count of balls in box B_i).
We have to choose m balls such that the sum of the weights of the m chosen balls is less than a given number T.
In how many ways can this be done?
First, let's have a look at a similar problem:
The similar problem is: you want to maximize the sum (such that it is still smaller than T). This is a variation of the subset-sum problem, which is NP-hard. The variation with a fixed number of items is discussed in this thread: Sum-subset with a fixed subset size.
An alternative way to look at the problem is as a 2-dimensional knapsack problem, where weight = cost, with an extra dimension for the number of elements. This concept is discussed in this thread: What's the fastest way to solve knapsack prob with two properties
Now, look at your problem: finding the number of possible ways to achieve a sum that is at most T is still NP-hard.
Assume you had a polynomial algorithm to do it, let it be A.
Running A(T) and A(T-1) will give you two numbers. If A(T) > A(T-1), then some selection sums to exactly T, so the answer to the subset sum problem is true; otherwise it is false. So given a polynomial solution to this problem, we could prove P=NP.
You can solve it by using dynamic programming techniques.
Let f[i][j][k] denote the number of ways to choose j balls from B_1 to B_i with sum of weights exactly k. Then f[n][m][T] is the count for a sum of exactly T; to count the selections below a bound, add up f[n][m][k] over the allowed values of k.
Initially, let f[i][j][k] = 0 for all i,j,k, except f[0][0][0] = 1
for i = 1 to n
    for j = 0 to m
        for k = 0 to T
            for x = 0 to min(b_i,j) # choose x balls from B_i
                y = x * C_i
                if y <= k
                    f[i][j][k] = f[i][j][k] + f[i-1][j-x][k-y] * Comb(b_i,x)
Comb(n,k) is the number of ways to choose k elements from n elements.
The time complexity is O(n m T b) where b is the maximum number of balls in a box.
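Note that, because of the T factor, this running time is pseudo-polynomial: it is polynomial in the value of T, not in the number of bits needed to write T, so it does not contradict NP-hardness. In practice, when T is relatively small, this algorithm is still feasible.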
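A hedged Python sketch of this counting DP. It assumes non-negative integer weights, treats balls within a box as distinguishable (which is what the Comb(b_i, x) factor implies), starts from a table of zeros with a single 1 for the empty selection, and accumulates by addition. The function name count_by_weight is made up for this example; it keeps only the previous layer of the table.

```python
from math import comb


def count_by_weight(counts, weights, m, T):
    """ways[k] = number of ways to choose exactly m balls with total weight exactly k.

    counts[i]: number of balls in box i; weights[i]: weight of each ball in box i.
    Returns the list ways[0..T].
    """
    # f[j][k]: ways to pick j balls with total weight k from the boxes seen so far.
    f = [[0] * (T + 1) for _ in range(m + 1)]
    f[0][0] = 1  # one way to pick nothing
    for b, c in zip(counts, weights):
        g = [[0] * (T + 1) for _ in range(m + 1)]
        for j in range(m + 1):
            for k in range(T + 1):
                for x in range(min(b, j) + 1):  # take x balls from this box
                    y = x * c
                    if y > k:
                        break  # larger x only adds weight (weights are non-negative)
                    g[j][k] += f[j - x][k - y] * comb(b, x)
        f = g
    return f[m]


ways = count_by_weight(counts=[2, 1], weights=[1, 2], m=2, T=3)
print(ways)       # counts per exact total weight 0..3
print(sum(ways))  # selections with total weight at most 3
```

In the small example, box 1 holds two balls of weight 1 and box 2 holds one ball of weight 2; choosing 2 balls gives one selection of weight 2 and two selections of weight 3.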

Find sum in array equal to zero

Given an array of integers, find a set of at least one integer which sums to 0.
For example, given [-1, 8, 6, 7, 2, 1, -2, -5], the algorithm may output [-1, 6, 2, -2, -5] because this is a subset of the input array, which sums to 0.
The solution must run in polynomial time.
You'll have a hard time doing this in polynomial time, as the problem is known as the Subset sum problem, and is known to be NP-complete.
If you do find a polynomial solution, though, you'll have solved the "P = NP?" problem, which will make you quite rich.
The closest you get to a known polynomial solution is an approximation, such as the one listed on Wikipedia, which will try to get you an answer with a sum close to, but not necessarily equal to, 0.
This is the Subset sum problem. It's NP-complete, but there is a pseudo-polynomial time algorithm for it; see the wiki.
The problem can be solved in polynomial time if the sum of the items in the set is polynomially bounded in the number of items. From the wiki:
The problem can be solved as follows
using dynamic programming. Suppose the
sequence is
x1, ..., xn
and we wish to determine if there is a
nonempty subset which sums to 0. Let N
be the sum of the negative values and
P the sum of the positive values.
Define the boolean-valued function
Q(i,s) to be the value (true or false)
of
"there is a nonempty subset of x1, ..., xi which sums to s".
Thus, the solution to the problem is
the value of Q(n,0).
Clearly, Q(i,s) = false if s < N or
s > P, so these values do not need to
be stored or computed. Create an array to
hold the values Q(i,s) for 1 ≤ i ≤ n
and N ≤ s ≤ P.
The array can now be filled in using a
simple recursion. Initially, for N ≤ s
≤ P, set
Q(1,s) := (x1 = s).
Then, for i = 2, …, n, set
Q(i,s) := Q(i − 1,s) or (xi = s) or Q(i − 1,s − xi) for N ≤ s ≤ P.
For each assignment, the values of Q
on the right side are already known,
either because they were stored in the
table for the previous value of i or
because Q(i − 1,s − xi) = false if s −
xi < N or s − xi > P. Therefore, the
total number of arithmetic operations
is O(n(P − N)). For example, if all
the values are O(n^k) for some k, then
the time required is O(n^(k+2)).
This algorithm is easily modified to
return the subset with sum 0 if there
is one.
This solution does not count as
polynomial time in complexity theory
because P − N is not polynomial in the
size of the problem, which is the
number of bits used to represent it.
This algorithm is polynomial in the
values of N and P, which are
exponential in their numbers of bits.
A more general problem asks for a
subset summing to a specified value
(not necessarily 0). It can be solved
by a simple modification of the
algorithm above. For the case that
each xi is positive and bounded by the
same constant, Pisinger found a linear
time algorithm.[2]
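The quoted passage says the algorithm is easily modified to return the subset itself. A minimal Python sketch of that modification follows, with some liberties that should be read as assumptions: it stores only the reachable sums in a dictionary instead of a full Q(i,s) table, and the name zero_sum_subset is invented for this example. The number of stored sums is still bounded by P − N + 1.

```python
def zero_sum_subset(xs):
    """Return a non-empty list of elements of xs that sums to 0, or None."""
    # reach[s] = (index of the last item used to form s, the sum before adding it).
    reach = {}
    for i, x in enumerate(xs):
        new = {}
        if x not in reach:
            new[x] = (i, None)  # the subset consisting of x alone
        for s in reach:
            t = s + x
            if t not in reach and t not in new:
                new[t] = (i, s)  # extend a subset of earlier items by x
        reach.update(new)
        if 0 in reach:
            break
    if 0 not in reach:
        return None
    subset = []
    s = 0
    while s is not None:  # follow the stored links back to a single element
        i, prev = reach[s]
        subset.append(xs[i])
        s = prev
    return subset[::-1]


print(zero_sum_subset([-1, 8, 6, 7, 2, 1, -2, -5]))
```

Each stored sum remembers the item that created it and the earlier sum it extended, so the links always point to strictly earlier items and no element is used twice.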
This is the well-known Subset sum problem, which is NP-complete.
If you are interested in algorithms, you are most probably a math enthusiast, so I advise you to look at
Subset Sum problem in mathworld
and here you can find an algorithm for it:
Polynomial time approximation algorithm
initialize a list S to contain one element 0.
for each i from 1 to N do
    let T be a list consisting of xi+y, for all y in S
    let U be the union of T and S
    sort U
    make S empty
    let y be the smallest element of U
    add y to S
    for each element z of U in increasing order do
        // trim the list by eliminating numbers close to one another
        if y<(1-c/N)z, set y=z and add z to S
if S contains a number between (1-c)s and s, output yes, otherwise no
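A hedged Python transcription of the listing above, following it as written. It assumes the xi are positive numbers, s is the target sum, and c is a tolerance between 0 and 1; the function name approx_subset_sum is invented here. Note that the listing targets a positive sum s, so it does not directly answer the sum-to-zero question.

```python
def approx_subset_sum(xs, s, c):
    """Report whether the trimmed list ends up holding a sum between (1 - c) * s and s."""
    n = len(xs)
    S = [0]
    for x in xs:
        # U is the union of S and S shifted by x, in increasing order.
        U = sorted(set(S) | {x + y for y in S})
        y = U[0]
        S = [y]
        for z in U:
            # Keep z only if it is not too close to the last kept value y.
            if y < (1 - c / n) * z:
                y = z
                S.append(z)
    return any((1 - c) * s <= v <= s for v in S)


print(approx_subset_sum([1, 2, 3], 6, 0.1))
print(approx_subset_sum([1, 2, 3], 100, 0.1))
```

Every value kept in S is the sum of an actual subset, so a "yes" here always corresponds to a real subset whose sum lies in the stated range.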

Resources