Efficient multiselection algorithm - algorithm

I have to implement an algorithm that solves the multi-selection problem.
The multiselection problem is:
Given a set S of n elements drawn from a linearly ordered set, and a set K = {k1, k2,...,kr} of positive integers between 1 and n, the multiselection problem is to select the ki-th smallest element for all values of i, 1 <= i <= r
I need to solve the average case on Θ(n log r)
I've found a paper that implements the solution I need, but it assumes that there are no repeated numbers on the set S. The problem is that I can't assume that and I don't know how to adapt the algorithm of that paper to support repeated numbers.
The paper is here: http://www.ccse.kfupm.edu.sa/~suwaiyel/publications/multiselection_parCom.pdf
and the algorithm is on the second page. Any tips are welcome!

For posterity: the algorithm to which Ivan refers is to sort K, then solve the problem recursively as follows. Use QuickSelect to find the ki-th smallest element x where i is ceil(r/2), then recurse on the smaller halves of K and S, and the larger halves of K and S, splitting K about i and S about x.
Finding algorithms that work in the presence of degeneracy (here, equal elements) is often not a high priority for authors of theoretical works, because it makes the presentation of the common case more difficult and doesn't often play a role in determining the computational complexity of the problem. This is essentially a one-dimensional problem, and the black box solution is easy; replace the i-th element of the input yi by (yi, i) and break ties in the comparisons using the second component.
In practice, we can do better. Instead of recursing on {y : y in S, y < x} and {y : y in S, y > x}, use a three-way partitioning algorithm about x (see, e.g., every sufficiently complete treatment of QuickSort), then divide the array S by index instead of value.

Related

Integer partition weighted minimum of maximum

Given a non-negative integer n and a positive real weight vector w with dimension m, partition n into a length-m non-negative integer vector that sums to n (call it v) such that max w_iv_i is the smallest, that is, we want to find the vector v such that the maximum of element-wise product between w and v is the smallest. There maybe several partitions, and we only want the smallest value of max w_iv_i among all possible v.
Seems like this problem can use a greedy algorithm to solve. From a target vector v for n-1, we add 1 to each entry, and find the minimum among those m vectors. but I don't think it's correct. The intuition is that it might add "over" the minimum. That is, there exists another partition not yielded by the add 1 procedure that falls in between the "minimum" of n-1 produced by this greedy algorithm and that of n produced by this greedy algorithm. Can anyone prove if this is correct or incorrect?
If you already know the maximum element-wise product P, then you can just set vi = floor(P/wi) until you run out of n.
Use binary search to find the smallest possible value of P.
The largest guess you need to try is n * min(w), so that means testing log(n) + log(min(w)) candidates, spending O(m) time for each test, or O(m*(log n + log(min(w))) all together.

Greedy Attempt for covering all the numbers with the given intervals

Let S be a set of intervals (containing n number of intervals) of the natural numbers that might overlap and N be a list of numbers (containing n number of numbers).
I want to find the smallest subset (let's call P) of S such that for each number
in our list N, there exists at least one interval in P that contains it. The intervals in P are allowed to overlap.
Trivial example:
S = {[1..4], [2..7], [3..5], [8..15], [9..13]}
N = [1, 4, 5]
// so P = {[1..4], [2..7]}
I think a dynamic algorithm might not work always, so if anybody knows of a solution to this problem (or a similar one that can be converted into), that would be great. I am trying to make a O(n^2 solution)
Here is one greedy approach
P = {}
for each q in N: // O(n)
if q in P // O(n)
continue
for each i in S // O(n)
if q in I: // O(n)
P.add(i)
break
But that is O(n^4).. Any help with creating a greedy approach that is O(n^2) would be great!
Thanks!
* Update: * I've been slamming at this problem and I think I have an O(n^2) solution!!
Let me know if you think I'm right!!!
N = MergeSort (N)
upper, lower = infinity, -1
P = empty set
for each q in N do
if (q>=lower and q<=upper)=False
max_interval = [-infinity, infinity]
for each r in S do
if q in r then
if r.rightEndPoint > max_interval.rightEndPoint
max_interval = r
P.append(max_interval)
lower = max_interval.leftEndPoint
upper = max_interval.rightEndPoint
S.remove(max_interval)
I think this should work!! I'm trying to find a counter solution; but yeah!!
This problem is similar to set cover problem, which is NP-complete (i.e., arguably has no solution faster than exponential). What makes it different is that intervals always cover adjacent elements (not arbitrary subset of N), which opens ways for faster solutions.
http://en.wikipedia.org/wiki/Set_cover_problem
I think that the solution proposed by Mike is good enough. But I think I have quite straightforward O(N^2) greedy algo. It starts like the Mike's one (moreover, I believe Mike's solution can also be improved in similar way):
You sort your N numbers and place them sorted into array ELEM; COMPLEXITY O(N*lg N);
Using binary search, for each interval S[i] you identify starting and ending index of elements in ELEM that are covered by S[i]. Say, you place this pair of numbers into array COVER, the difference between the two indices tells you how many elements you cover, for simplicity, let us place it array COVER_COUNT; COMPLEXITY O(N*lg N);
You introduce index pointer p, that shows till which element in ELEM, your N is already covered. you set p = 0, meaning that all elements up to 0-th (excluded) are initially covered (i.e., no elements); Complexity O(1). Moreover you introduce boolean array IS_INCLUDED, that reflects if interval S[i] is already included in your coverage set. Complexity O(N)
Then you start from the 0-th element in ELEM and see what is the interval that contains ELEM[0] and has greater coverage COVER_COUNT[i]. Imagine that it is i-th interval. We then mark it as included by setting IS_INCLUDED[i] to true. Then you set p to end[i] + 1 where end[i] is the ending index in COVER[i] pair (indeed now all elements til end[i] are covered). Then, knowing p you update all elements in COVER_COUNT so that they reflect how many elements of not yet covered elements each interval covers (this can be easily done in O(N) time). Then you perform the same step for ELEM[p] and continues till p >= ELEM.length. It can be observed that the overall complexity is O(N^2).
You finish in O(n^2) and in IS_INCLUDED has true for intervals of S included in optimal cover set
Let me know if this solution seems reasonable to you and if I calculated everything well.
P.S. Just wanted to add that the optimality of ythe solution found by algo can be proved by induction and contradiction. By contradiction, it is easy to show that at least one optimal solution includes the longest interval of those covering element ELEM[0]. If so, by induction we can show that for each next element in algo, we can keep on following the strategy of selelcting the interval that is the longest with respect to the number of remaining elements covered and that covers the leftmost yet uncovered element.
I am not sure, but mb some think like this.
1) For each interval create a list with elements from N witch contain in interval, it will take O(n^2) lets call it Q[i] for S[i]
2) Then sort our S by length of Q[i], O(n*lg(n))
3) Go throw this array excluding Q[i] from N O(n) and from Q[i+1]...Q[n] = O(n^2)
4) Repeat 2 while N is not empty.
It's not O(n^2), it's O(n^3) but if you can use hashmap, i think you can improve this.

Partitioning a list of integers to minimize difference of their sums

Given a list of integers l, how can I partition it into 2 lists a and b such that d(a,b) = abs(sum(a) - sum(b)) is minimum. I know the problem is NP-complete, so I am looking for a pseudo-polynomial time algorithm i.e. O(c*n) where c = sum(l map abs). I looked at Wikipedia but the algorithm there is to partition it into exact halves which is a special case of what I am looking for...
EDIT:
To clarify, I am looking for the exact partitions a and b and not just the resulting minimum difference d(a, b)
To generalize, what is a pseudo-polynomial time algorithm to partition a list of n numbers into k groups g1, g2 ...gk such that (max(S) - min(S)).abs is as small as possible where S = [sum(g1), sum(g2), ... sum(gk)]
A naive, trivial and still pseudo-polynomial solution would be to use the existing solution to subset-sum, and repeat for sum(array)/2to 0 (and return the first one found).
Complexity of this solution will be O(W^2*n) where W is the sum of the array.
pseudo code:
for cand from sum(array)/2 to 0 descending:
subset <- subsetSumSolver(array,cand)
if subset != null:
return subset
The above will return the maximal subset that is lower/equals sum(array)/2, and the other part is the complement for the returned subset.
However, the dynamic programming for subset-sum should be enough.
Recall that the formula is:
f(0,i) = true
f(x,0) = false | x != 0
f(x,i) = f(x-arr[i],i-1) OR f(x,i-1)
When building the matrix, the above actually creates you each row with value lower than the initial x, if you input sum(array)/2 - it's basically all values.
After you generate the DP matrix, just find the maximal value of x such that f(x,n)=true, and this is the best partition you can get.
Complexity in this case is O(Wn)
You can phrase this as a 0/1 integer linear programming optimization problem. Let wi be the ith number, and let xi be a 0/1 variable which indicates whether wi is in the first set or not. Then you want to minimize sum(xi wi) - sum((1 - xi) wi) subject to
sum(xi wi) >= sum((1 - xi) wi)
and also subject to all xi being 0 or 1. There has been a lot of research into optimizing 0/1 linear programming solvers. For large total sum W this may be an improvement over the O(W n) pseudo-polynomial time algorithm presented because the W factor is scary.
My first thought is to:
Sort list of integers
Create two empty lists A and B
While iterating from biggest integer to smallest integer...add next integer to the list with the smallest current sum.
This is, of course, not guaranteed to give you the best result but you can bound the result it will give you by the size of the biggest integer in your list

Efficiently calculate next permutation of length k from n choices

I need to efficiently calculate the next permutation of length k from n
choices. Wikipedia lists a great
algorithm
for computing the next permutation of length n from n choices.
The best thing I can come up with is using that algorithm (or the Steinhaus–Johnson–Trotter algorithm), and then just only considering the first k items of the list, and iterating again whenever the changes are all above that position.
Constraints:
The algorithm must calculate the next permutation given nothing more than
the current permutation. If it needs to generate a list of all permutations,
it will take up too much memory.
It must be able to compute a permutation of only length k of n (this is
where the other algorithm fails
Non-constraints:
Don't care if it's in-place or not
I don't care if it's in lexographical order, or any order for that matter
I don't care too much how efficiently it computes the next permutation,
within reason of course, it can't give me the next permutation by making a
list of all possible ones each time.
You can break this problem down into two parts:
1) Find all subsets of size k from a set of size n.
2) For each such subset, find all permutations of a subset of size k.
The referenced Wikipedia article provides an algorithm for part 2, so I won't repeat it here. The algorithm for part 1 is pretty similar. For simplicity, I'll describe it for "find all subsets of size k of the integers [0...n-1].
1) Start with the subset [0...k-1]
2) To get the next subset, given a subset S:
2a) Find the smallest j such that j ∈ S ∧ j+1 ∉ S. If j == n-1, there is no next subset; we're done.
2b) The elements less than j form a sequence i...j-1 (since if any of those values were missing, j wouldn't be minimal). If i is not 0, replace these elements with i-i...j-i-1. Replace element j with element j+1.

Subset sum for exactly k integers?

Following from these question Subset sum problem and Sum-subset with a fixed subset size I was wondering what the general algorithm for solving a subset sum problem, where we are forced to use EXACTLY k integers, k <= n.
Evgeny Kluev mentioned that he would go for using optimal for k = 4 and after that use brute force approach for k- 4 and optimal for the rest. Anyone could enlight what he means by a brute force approach here combined with optimal k=4 algo?
Perhaps someone knows a better, general solution?
The original dynamic programming algorithm applies, with a slight extension - in addition to remembering partial sums, you also need to remember number of ints used to get the sums.
In the original algorithm, assuming the target sum is M and there are n integers, you fill a boolean n x M array A, where A[i,m] is true iff sum m can be achieved by picking (any number of) from first i+1 ints (assuming indexing from 0).
You can extend it to a three dimensional array nxMxk, which has a similar property - A[i,m,l] is true iff, sum m can be achieved by picking exactly l from first i+1 ints.
Assuming the ints are in array j[0..n-1]:
The recursive relation is pretty similar - the field A[0,j[0],1] is true (you pick j[0], getting sum j[0] with 1 int (duh)), other fields in A[0,*,*] are false and deriving fields in A[i+1,*,*] from A[i,*,*] is also similar to the original algorithm: A[i+1,m,l] is true if A[i,m,l]is true (if you can pick m from first i ints, then obviously you can pick m from first i+1 ints) or if A[i, m - j[i+1], l-1] is true (if you pick j[i+1] then you increase the sum by j[i+1] and the number of ints by 1).
If k is small then obviously it makes sense to skip all of the above part and just iterate over all combinations of k ints and checking their sums. k<=4 indeed seems like a sensible threshold.

Resources