Rewrite O(N W) in terms of N - algorithm

I have a question that asks to rewrite the complexity of the subset sum problem in terms of N only.
For those unfamiliar with it, the problem is: given a set of weights, each with cost 1, find the optimal selection subject to a maximum total weight.
O(NW) is the time and space cost of the standard dynamic-programming solution, where the space goes to the 2D table. This problem is a special case of the knapsack problem.
I'm not sure how to approach this. The only idea I had was to take the sum of all weights and treat that as a general worst case. Thanks.
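For reference, here is a minimal sketch of the O(NW) dynamic program the question refers to, using a 1-D table instead of the full 2-D matrix and interpreting the task as deciding whether some subset of integer weights sums exactly to W (both of these are assumptions about the intended formulation):

```python
def subset_sum_dp(weights, W):
    # reachable[w] is True iff some subset of the weights seen so far sums to w.
    reachable = [False] * (W + 1)
    reachable[0] = True                     # the empty subset sums to 0
    for wt in weights:
        for w in range(W, wt - 1, -1):      # go downward so each weight is used at most once
            if reachable[w - wt]:
                reachable[w] = True
    return reachable[W]

print(subset_sum_dp([3, 34, 4, 12, 5, 2], 9))   # True (4 + 5 = 9)
```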

If the weight W is not bounded, so that the complexity must depend on N alone, there is at least an O(2^N) approach: try all possible subsets of the N elements and compute their sums.
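A minimal sketch of that brute-force idea (returning one subset that sums exactly to W, or None if there is none):

```python
from itertools import combinations

def subset_sum_bruteforce(weights, W):
    # Try every subset of the N weights and check its sum: O(2^N), no dependence on W.
    for r in range(len(weights) + 1):
        for subset in combinations(weights, r):
            if sum(subset) == W:
                return subset
    return None

print(subset_sum_bruteforce([3, 34, 4, 12, 5, 2], 9))   # (4, 5)
```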

If you are willing to use exponential rather than polynomial space, you can solve the problem in O(n 2^(n/2)) time and O(2^(n/2)) space. Split the set of n weights into two sets A and B of roughly equal size and compute the sum of weights of every subset of each. Hash all subset sums of A, then look up W - x for every subset sum x of B; a hit in the hash table means you have found a subset of A and a subset of B that together sum to W.
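A sketch of that meet-in-the-middle approach (the per-half enumeration is kept naive for brevity):

```python
from itertools import combinations

def subset_sum_mitm(weights, W):
    # Split into halves A and B, hash every subset sum of A,
    # then look up W - x for every subset sum x of B.
    half = len(weights) // 2
    A, B = weights[:half], weights[half:]

    def all_subset_sums(part):
        return {sum(subset)
                for r in range(len(part) + 1)
                for subset in combinations(part, r)}

    sums_A = all_subset_sums(A)                       # hashed subset sums of A
    return any(W - x in sums_A for x in all_subset_sums(B))

print(subset_sum_mitm([3, 34, 4, 12, 5, 2], 9))       # True (4 + 5 = 9)
```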

Related

Complement of a set of intervals

I have a set of intervals inside the range [0,k].
How can I produce the complement set of this set of intervals?
I can come up with an algorithm, but it requires sorting the intervals.
Therefore, the complexity is O(nlogn), where n is the number of intervals.
Is there any faster algorithm to do this? If not, is there any way to prove that this is the optimal complexity?
Thank you.
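For reference, a minimal sketch of the sorting-based O(n log n) approach the question mentions (representing intervals as half-open [lo, hi) pairs inside [0, k) is an assumption):

```python
def complement_intervals(intervals, k):
    # Sort the intervals, sweep a cursor across [0, k), and emit every gap.
    result = []
    cursor = 0
    for lo, hi in sorted(intervals):           # O(n log n)
        if lo > cursor:
            result.append((cursor, lo))        # gap before this interval
        cursor = max(cursor, hi)
    if cursor < k:
        result.append((cursor, k))             # tail gap up to k
    return result

print(complement_intervals([(1, 3), (2, 4), (6, 7)], 10))   # [(0, 1), (4, 6), (7, 10)]
```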
Suppose you have found an algorithm that performs this task (finding the complement set) in O(n).
Then we can show that you have invented a new sorting algorithm that runs in O(n).
To simplify, let us assume that the array to be sorted consists of natural numbers and that there are no repetitions.
If [a1 a2 ... an] needs to be sorted, consider the intervals [a1, a1+1), [a2, a2+1), ..., [an, an+1).
Applying your algorithm to generate the complement set of intervals in O(n), we get n intervals
[x1a + 1, x1b), [x2a + 1, x2b), ..., [xna + 1, xnb)
where each pair (xia, xib) corresponds to successive elements aj after sorting.
Interpret each such relation as a directed edge in a graph connecting the two vertices xia and xib.
To get the original array in sorted order, we need to find the start of the graph and then walk through it, which can be done in O(n).
The sort has then been performed in O(n).
Not considering repetitions is not too annoying from a theoretical point of view: with hashing, for example, we can remove the repetitions in O(n).
Not considering floating-point values is also a detail: finding a new O(n) sorting algorithm for natural numbers would already be a great result.

Finding the weighted median in an unsorted array in linear time

This is from the practice problem in one of coursera's Algorithms courses; I've been stuck for a couple of weeks.
The problem is this:
Given an array of n distinct unsorted elements x1, x2, ..., xn ∈ X with positive weights w1, w2, ..., wn ∈ W, a weighted median is an element xk for which the total weight of all elements with values less than xk is at most (total weight)/2, and the total weight of all elements with values larger than xk is also at most (total weight)/2. Observe that there are at most two weighted medians. Show how to compute all weighted medians in O(n) worst-case time.
The course mostly covered divide and conquer algorithms, so I think the key to get started on this would be to identify which of the algorithms covered can be used for this problem.
One of the algorithms covered was the RSelect algorithm in the form RSelect(array X, length n, order statistic i) which for a weighted median could be written as RSelect(array X, weights W, length n, order statistic i). My issue with this approach is that it assumes I know the median value ahead of time, which seems unlikely. There's also the issue that the pivot is chosen uniformly at random, which I don't imagine is likely to work with weights without computing every weight for every entry.
Next is the DSelect algorithms, where using a median of medians approach a pivot may be computed without randomization so we can compute a proper median. This seems like the approach that could work, where I have trouble is that it also assumes that I know ahead of time the value I'm looking for.
DSelect(array A, length n, order statistic i) for an unweighted array
DSelect(array A, weights W, length n, order statistic i) for a weighted array
Am I overthinking this? Should I use DSelect assuming that I know the value of (total weight) / 2 ahead of time? I guess even if I compute it it would add only linear time to the running time. But then it would be no different from precomputing a weighted array (combine A, W into Q where qi = xi*wi) and transforming this back to an unweighted array problem where I can use RSelect (plus some accounting for cases where there are two medians)
I've found https://archive.org/details/lineartimealgori00blei/page/n3 and https://blog.nelsonliu.me/2016/07/05/gsoc-week-6-efficient-calculation-of-weighted-medians/ which describe this problem, but their approach doesn't seem to be something covered in the course (and I'm not familiar with heaps/heapsort)
This problem can be solved with a simple variant of quickselect:
1. Calculate the sum of all weights and divide by 2 to get the target sum.
2. Choose a pivot and partition the array into larger and smaller elements.
3. Sum the weights in the smaller partition, and subtract from the total to get the sum in the other partition.
4. Go back to step 2 to process the appropriate partition with the appropriate target sum.
Just like normal quickselect, this becomes linear in the worst case if you use the (normal, unweighted) median-of-medians approach to choose a pivot.
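A minimal sketch of that quickselect variant, using random pivots (so expected linear time; swapping in a median-of-medians pivot would make it worst-case linear, as noted above) and returning a single weighted median of distinct values:

```python
import random

def weighted_median(values, weights):
    target = sum(weights) / 2.0              # half of the total weight
    items = list(zip(values, weights))
    left_weight = 0.0                        # weight already discarded to the left of the window
    while True:
        pivot, pivot_w = random.choice(items)
        smaller = [(v, w) for v, w in items if v < pivot]
        larger = [(v, w) for v, w in items if v > pivot]
        w_small = sum(w for _, w in smaller)
        if left_weight + w_small > target:
            items = smaller                  # too much weight below the pivot: recurse left
        elif left_weight + w_small + pivot_w >= target:
            return pivot                     # pivot satisfies both weighted-median conditions
        else:
            left_weight += w_small + pivot_w
            items = larger                   # too little weight up to the pivot: recurse right

values = [10, 35, 5, 12, 15, 7, 20]
weights = [0.10, 0.35, 0.05, 0.10, 0.15, 0.05, 0.20]
print(weighted_median(values, weights))      # 20
```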
Expected linear time can also be achieved with quickselect.
The random pivot can be chosen, with weighting, using reservoir sampling. You are correct that finding the first pivot costs O(n), but the sizes of the lists you work with follow a geometric series, so the total cost of choosing pivots still works out to O(n).
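A sketch of drawing one weighted random pivot in a single pass; the keying scheme below is the Efraimidis–Spirakis reservoir-sampling trick, chosen here as one concrete way to do it (the answer does not pin down a specific variant):

```python
import random

def weighted_random_pivot(items):
    # items: iterable of (value, weight) pairs with positive weights.
    # Assign each item the key u**(1/w) with u uniform in (0, 1);
    # the item with the largest key is selected with probability
    # proportional to its weight.
    best_value, best_key = None, -1.0
    for value, weight in items:
        key = random.random() ** (1.0 / weight)
        if key > best_key:
            best_value, best_key = value, key
    return best_value

print(weighted_random_pivot([(10, 0.1), (35, 0.35), (20, 0.2)]))
```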

Algorithm to output a subset of set A such that it maximizes the overall pairwise sum

Suppose I have a set A = {a_1, a_2, ..., a_n}. I also have a function f: A x A -> R that assigns a real value to each pair of elements of A. I want to extract a subset S_k of size k from A that maximizes the overall pairwise sum over all elements of S_k.
Is there any known algorithm that would do this in reasonable time? polynomial/quasi-polynomial time perhaps?
Edit: Worked Example
Suppose A={a_1,a_2,a_3,a_4} with k=3 and f is defined as:
f(a_1,a_2)=0,f(a_1,a_3)=0,f(a_1,a_4)=0,f(a_2,a_3)=1,f(a_2,a_4)=5,f(a_3,a_4)=10.
Then S_k={a_2,a_3,a_4} since it maximizes the sum f(a_2,a_3)+f(a_2,a_4)+f(a_3,a_4). (i.e. the pairwise sum of all elements in S_k)
Unlikely -- this problem generalizes the problem of finding a k-clique (set the weights to the adjacency matrix of the graph), for which the best known algorithms are exponential (see also the strong exponential time hypothesis).
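For small instances the straightforward exhaustive search is the natural baseline, consistent with the hardness observation above; this sketch replays the worked example from the question:

```python
from itertools import combinations

def best_k_subset(elements, k, f):
    # Exhaustive search over all C(n, k) size-k subsets,
    # maximizing the sum of f over the pairs inside each subset.
    def pairwise_sum(subset):
        return sum(f(a, b) for a, b in combinations(subset, 2))
    return max(combinations(elements, k), key=pairwise_sum)

# The worked example from the question, writing a_i simply as i:
table = {(1, 2): 0, (1, 3): 0, (1, 4): 0, (2, 3): 1, (2, 4): 5, (3, 4): 10}
f = lambda a, b: table[(min(a, b), max(a, b))]
print(best_k_subset([1, 2, 3, 4], 3, f))   # (2, 3, 4)
```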

Finding a k element subset in a set of real numbers (Programming Pearls book)

I am solving problems from Column2 of Programming Pearls. I came across this problem:
"Given a set of n real numbers, a real number t, and an integer k, how quickly can you determine whether there exists a k-element subset of the set that sums to at most t?"
My solution is to sort the set of real numbers and then look at the sum of the first k elements. If this sum is less than or equal to t, then we know there exists at least one set that satisfies the condition.
Is the solution correct?
Is there a better or different solution?
Note: Just to make it clear, do not assume the input to be already sorted.
Because you only need the k smallest elements, I suggest the following:
1. Select the kth element of the array using randomized select, in O(N).
2. Take the sum of the first k elements of the (partitioned) array and check whether it is at most t.
Time complexity: O(N + k) = O(N), since k is O(N).
Randomized Selection
Note: when k is very small compared to N, a max-heap can be very efficient, since the storage does not cost much and it solves the problem in worst case O(N log k).
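A minimal sketch of the check itself; heapq.nsmallest is the O(N log k) heap variant mentioned in the note, and a randomized-select partition would bring this down to expected O(N):

```python
import heapq

def exists_k_subset_at_most(values, k, t):
    # The minimum possible k-element sum is the sum of the k smallest values,
    # so it suffices to compare that single sum against t.
    return sum(heapq.nsmallest(k, values)) <= t

print(exists_k_subset_at_most([5.5, -2.0, 3.1, 8.4, 0.7], 3, 2.5))   # True: -2.0 + 0.7 + 3.1 = 1.8
```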

How to enumerate all k-combinations of a set by sum?

Suppose I have a finite set of numeric values of size n.
Question: Is there an efficient algorithm for enumerating the k-combinations of that set so that combination I precedes combination J iff the sum of the elements in I is less than or equal to the sum of the elements in J?
Clearly it's possible to simply enumerate the combinations and sort them according to their sums. If the set is large, however, brute enumeration of all combinations, let alone sorting, will be infeasible. If I'm only interested in obtaining the first m << choose(n,k) combinations ranked by sum, is it possible to obtain them before the heat death of the universe?
There is no polynomial algorithm for enumerating the set this way (unless P=NP).
If there were such an algorithm (call it A), then we could solve the subset sum problem in polynomial time:
1. Run A.
2. Do a binary search to find the subset whose sum is closest to the desired number.
Note that step 1 runs in polynomial time (by assumption) and step 2 runs in O(log(2^n)) = O(n).
Conclusion: since the subset sum problem is NP-complete, solving this problem efficiently would prove P=NP, so there is no known polynomial solution to the problem.
Edit: Even though the problem is NP-hard, getting the "smallest" m subsets can be done in O(n + 2^m) by selecting the smallest m elements, generating all the subsets of these m elements, and choosing the minimal m of those. So for fairly small values of m it might be feasible to calculate.
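A literal sketch of the Edit's procedure; it leans on the Edit's claim that the m smallest subsets never involve elements outside the m smallest, and assumes non-negative values (an extra assumption made here):

```python
from itertools import chain, combinations

def smallest_m_subsets(values, m):
    # Keep the m smallest elements (sorted here for brevity; selection would give O(n)),
    # enumerate their 2^m subsets, and return the m subsets with the smallest sums.
    candidates = sorted(values)[:m]
    all_subsets = chain.from_iterable(
        combinations(candidates, r) for r in range(len(candidates) + 1))
    return sorted(all_subsets, key=sum)[:m]

print(smallest_m_subsets([7, 1, 9, 3, 5], 3))   # [(), (1,), (3,)]
```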
