Using subset-sum oracle to determine which numbers are members of the subset - algorithm

I am having trouble starting off this particular homework problem. Here is the problem:
Suppose that you are given an algorithm as a black box – you cannot see how it is designed – with the following property: if you input any sequence of real numbers and an integer k, the algorithm will answer YES or NO indicating whether there is a subset of the numbers whose sum is exactly k. Show how to use this black box to find the subset of a given sequence X1, …, Xn whose sum is k. You can use the black box O(n) times.
I figure that the sequence should be sorted first, and that only elements < k should be considered. Any help getting started would be greatly appreciated. Thanks.

Sorting is the wrong approach. Think about it this way: how can you use the oracle to determine whether a particular item in the set is part of the sum? Once you know whether that item is part of the sum, how can you use the oracle to figure out whether some other item is part of the sum?

The blackbox is something like this, in C# (ignore that I used int instead of real for the sequence, it's inconsequential to the problem).
bool blackbox(List<int> subSequence, int k)
{
// unknown
}
You are tasked with passing subsets of the sequence to the black box and finding which part of the sequence sums to k.
Start with the whole sequence, just to see whether any subset of it sums to k at all.
Then, if it does, leave one element out and query again to see whether the remaining subsequence still has a subset summing to k; if it does, that element can be dropped for good, otherwise it must be kept.
Repeat until the elements you have kept sum to exactly k (see the sketch below).
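Here is a minimal sketch of that element-removal loop in Python; the brute-force oracle is only a stand-in for the black box so the example actually runs, and all names are made up for illustration.

from itertools import combinations

def brute_force_oracle(seq, k):
    # Stand-in for the black box: does some subset of seq sum exactly to k?
    return any(sum(c) == k for r in range(len(seq) + 1)
                           for c in combinations(seq, r))

def find_subset(xs, k, oracle=brute_force_oracle):
    # Recover a subset of xs summing to k using n + 1 oracle calls.
    if not oracle(xs, k):
        return None                      # no subset sums to k at all
    kept, remaining = [], list(xs)
    while remaining:
        x = remaining.pop(0)
        if oracle(kept + remaining, k):  # still solvable without x?
            continue                     # yes: x is not needed, drop it for good
        kept.append(x)                   # no: every remaining solution uses x
    return kept                          # kept now sums exactly to k

# find_subset([3, 34, 4, 12, 5, 2], 9) -> [4, 5]

The key observation is that once the oracle says NO without x, it will keep saying NO without x no matter what else is removed later, so x must be part of the answer; that gives one oracle call per element plus the initial check, i.e. O(n) calls in total.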

Related

What is the following known as?

I have a simple algorithmic problem.
I have a set of positive integers S and a positive maximum integer i.
Let's say the sum of S (or a subset of S) is the sum of its elements.
I need to find a subset s of S whose sum does not exceed i and is "maximally summing" - meaning no other subset of S has a greater sum than s without exceeding i.
The trivial solution I came up with is to go over each set of the power set of S and sum the integers, keeping track of the set with the properties I seek, but this algorithm is obviously exponential.
There must be a well-known name for this problem, as I don't think I am the first to come across this need. Could someone help me out?
Solve the subset-sum problem for your set using dynamic programming.
Then scan the filled table from the i-th entry down towards smaller values until you find a non-zero entry (i.e. an achievable sum). That is the largest subset sum that does not exceed the given value.
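A minimal Python sketch of that table-plus-scan, assuming S holds positive integers and i is the cap (function and variable names are mine, not from the answer):

def max_sum_not_exceeding(S, i):
    # reachable[t] is True iff some subset of S sums exactly to t
    reachable = [False] * (i + 1)
    reachable[0] = True                  # the empty subset
    for x in S:
        for t in range(i, x - 1, -1):    # descending so each x is used at most once
            if reachable[t - x]:
                reachable[t] = True
    for t in range(i, -1, -1):           # scan from i downwards
        if reachable[t]:
            return t

# max_sum_not_exceeding([4, 7, 9], 12) -> 11   (4 + 7)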

Knapsack with unique elements

I'm trying to solve the following:
The knapsack problem is as follows: given a set of integers S={s1,s2,…,sn}, and a given target number T, find a subset of S that adds up exactly to T. For example, within S={1,2,5,9,10} there is a subset that adds up to T=22 but not T=23. Give a correct programming algorithm for knapsack that runs in O(nT) time.
but the only algorithm I could come up with is generating all combinations of 1 to n elements and trying the sums out (exponential time).
I can't devise a dynamic programming solution, since the fact that I can't reuse an element makes this problem different from a coin change problem and from the general knapsack problem.
Can somebody help me out with this or at least give me a hint?
The O(nT) running time gives you the hint: do dynamic programming on two axes. That is, let f(a,b) denote the maximum sum <= b which can be achieved with the first a integers.
f satisfies the recurrence
f(a,b) = max( f(a-1,b), f(a-1,b-s_a)+s_a )
since the first value is the maximum without using s_a and the second is the maximum including s_a. From here the DP algorithm should be straightforward, as should outputting the correct subset of S.
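A hedged Python sketch of that recurrence; it fills f row by row and then walks the table backwards to output the subset (the function name and the reconstruction loop are mine, not part of the answer):

def knapsack_exact(S, T):
    # f[a][b] = largest sum <= b achievable using only the first a elements
    n = len(S)
    f = [[0] * (T + 1) for _ in range(n + 1)]
    for a in range(1, n + 1):
        s_a = S[a - 1]
        for b in range(T + 1):
            f[a][b] = f[a - 1][b]                                # skip s_a
            if s_a <= b:
                f[a][b] = max(f[a][b], f[a - 1][b - s_a] + s_a)  # use s_a once
    if f[n][T] != T:
        return None                       # no subset sums exactly to T
    subset, b = [], T
    for a in range(n, 0, -1):             # walk back to recover the elements
        if f[a][b] != f[a - 1][b]:        # s_a was needed to reach this value
            subset.append(S[a - 1])
            b -= S[a - 1]
    return subset

# knapsack_exact([1, 2, 5, 9, 10], 22) -> [10, 9, 2, 1]; T = 23 returns None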
I did find a solution, but with O(T·n²) time complexity. Build the table from the bottom up: sort the array, start with the greatest number available, and make a table whose columns are the target values and whose rows are the given numbers. For each entry we need to consider all possible ways of making i - cost[j] + j, which takes O(n²) time, and this is multiplied by the target T.

Sum of a number with any k elements in Array

Design an algorithm that, given a set S of n integers and another integer x, determines whether or not there exist k (n > k > 2) elements in S whose sum is exactly x. Please give the running time of your algorithm.
I have been preparing for an interview, and I have come across this problem. I have solved versions where k is specified in the problem, like 2 or 3, but I cannot find an answer that works for any k that might exist. I have tried solving it using dynamic programming but didn't get results. Can anyone help me with this?
You can make an int array cnt of size x + 1, then go through the set and mark the reachable sums. All entries of cnt are initially set to -1 except entry zero, which is set to 0.
Repeat the same process for each element si of S: for every position i where cnt[i] is non-negative and i + si is within the bounds of the array, set cnt[i + si] = max(cnt[i] + 1, cnt[i + si]). Process the positions from high to low so that si is not used twice within its own pass.
Once you have gone through all the elements of S, check cnt[x]. If it is two or more (for this question's constraint k > 2, check for three or more), then there is a combination of that many elements of S adding up to x.
This algorithm is pseudo-polynomial, with running time O(x·|S|).
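A Python sketch of that cnt array, assuming the elements of S are positive integers; positions are filled from high to low so an element cannot be reused within its own pass (names are mine):

def max_elements_summing_to(S, x):
    # cnt[t] = largest number of distinct elements of S summing to t, or -1
    cnt = [-1] * (x + 1)
    cnt[0] = 0                            # the empty sum uses zero elements
    for s in S:
        for t in range(x - s, -1, -1):    # high to low: s used at most once
            if cnt[t] >= 0:
                cnt[t + s] = max(cnt[t + s], cnt[t] + 1)
    return cnt

# The question asks for k > 2 elements, so test cnt[x] >= 3, e.g.
# max_elements_summing_to([2, 3, 4, 5], 9)[9] -> 3   (2 + 3 + 4)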

k-size possible number combinations ordered by each sum

Given a set of n numbers, what code generates all possible k-size subsets in descending order of their sums?
Example:
Set={9,8,6,2,1} => n=5 and k=3. So the output is:
[9,8,6]
[9,8,2]
[9,8,1]
[9,6,2]
[9,6,1]
[8,6,2]
[8,6,1]
[9,2,1]
[8,2,1]
[6,2,1]
The most efficient algorithm is preferred, but an exponential-time algorithm (enumerating all n-choose-k combinations) is still an acceptable answer.
One-by-one generation in Matlab code is preferred for the implementation, or a solution where the maximum size of the ordered list can be capped (so that, for larger n and k, one may use an approximation and return only a prefix of the list without computing all possibilities).
Note: 1) Pay attention to the position of [9,2,1] in this ordered list; plain index ordering is not the correct answer.
2) This may be a kind of lexicographical order.
Thanks to Divakar, Yvon, and Luis, here is one of the possible answers to this question:
SSC holds the combinations sorted by descending sum:
combs = nchoosek(Set,k);
[~,ind] = sort(sum(combs,2),'descend');
SSC = combs(ind,:);
If you want the indices in Set (which has unique numbers) of the numbers in row num_arr of SSC, use this code:
for j=1:k
    Index(j) = find(SSC(num_arr,j)==Set(1,:));
end
For [9,6,1] this code returns Index = [1,3,5].
For larger n
In this case the computation is very time-consuming or even impractical. An approximation may solve this issue; in such situations you can obtain just the first answers by modifying nchoosek.m in Matlab.
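If even materialising nchoosek(Set,k) is too expensive, one alternative route (not part of the answer above) is to generate the combinations lazily in descending-sum order with a best-first search over index tuples, so you can stop after any number of rows. A Python sketch of that idea:

import heapq

def k_subsets_by_descending_sum(values, k):
    # Yield the k-element combinations of values in non-increasing order of sum.
    v = sorted(values, reverse=True)
    n = len(v)
    start = tuple(range(k))                    # positions of the k largest values
    heap = [(-sum(v[i] for i in start), start)]
    seen = {start}
    while heap:
        neg_sum, idx = heapq.heappop(heap)
        yield [v[i] for i in idx]
        for p in range(k):                     # successor: shift one position right
            nxt = idx[p] + 1
            limit = idx[p + 1] if p + 1 < k else n
            if nxt < limit:
                child = idx[:p] + (nxt,) + idx[p + 1:]
                if child not in seen:
                    seen.add(child)
                    heapq.heappush(heap, (neg_sum + v[idx[p]] - v[nxt], child))

# The first three results for [9, 8, 6, 2, 1] with k = 3 are
# [9, 8, 6], [9, 8, 2], [9, 8, 1], matching the ordering in the question.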

Efficiently selecting a set of random elements from a linked list

Say I have a linked list of numbers of length N. N is very large and I don’t know in advance the exact value of N.
How can I most efficiently write a function that will return k completely random numbers from the list?
There's a very nice and efficient algorithm for this using a method called reservoir sampling.
Let me start by giving you its history:
Knuth calls this Algorithm R on p. 144 of his 1997 edition of Seminumerical Algorithms (volume 2 of The Art of Computer Programming), and provides some code for it there. Knuth attributes the algorithm to Alan G. Waterman. Despite a lengthy search, I haven't been able to find Waterman's original document, if it exists, which may be why you'll most often see Knuth quoted as the source of this algorithm.
McLeod and Bellhouse, 1983 (1) provide a more thorough discussion than Knuth as well as the first published proof (that I'm aware of) that the algorithm works.
Vitter 1985 (2) reviews Algorithm R and then presents an additional three algorithms which provide the same output, but with a twist. Rather than making a choice to include or skip each incoming element, his algorithm predetermines the number of incoming elements to be skipped. In his tests (which, admittedly, are out of date now) this decreased execution time dramatically by avoiding random number generation and comparisons on each in-coming number.
In pseudocode the algorithm is:
Let R be the result array of size s
Let I be an input queue

> Fill the reservoir array
for j in the range [1,s]:
    R[j] = I.pop()

elements_seen = s
while I is not empty:
    elements_seen += 1
    j = random(1, elements_seen)    > This is inclusive
    if j <= s:
        R[j] = I.pop()
    else:
        I.pop()
Note that I've specifically written the code to avoid specifying the size of the input. That's one of the cool properties of this algorithm: you can run it without needing to know the size of the input beforehand and it still assures you that each element you encounter has an equal probability of ending up in R (that is, there is no bias). Furthermore, R contains a fair and representative sample of the elements the algorithm has considered at all times. This means you can use this as an online algorithm.
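For reference, a small runnable Python version of the pseudocode above, written against an arbitrary iterable so the input length never needs to be known (a sketch, not Knuth's code):

import random

def reservoir_sample(iterable, s):
    # Return s items chosen uniformly at random from an input of unknown length.
    it = iter(iterable)
    reservoir = []
    for _ in range(s):                   # fill the reservoir with the first s items
        try:
            reservoir.append(next(it))
        except StopIteration:
            return reservoir             # the input had fewer than s items
    for seen, item in enumerate(it, start=s + 1):
        j = random.randint(1, seen)      # inclusive on both ends, like random(1, elements_seen)
        if j <= s:
            reservoir[j - 1] = item      # overwrite a uniformly chosen slot
    return reservoir

# reservoir_sample(range(1_000_000), 5) -> five uniformly chosen numbers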
Why does this work?
McLeod and Bellhouse (1983) provide a proof using the mathematics of combinations. It's pretty, but it would be a bit difficult to reconstruct it here. Therefore, I've generated an alternative proof which is easier to explain.
We proceed via proof by induction.
Say we want to generate a set of s elements and that we have already seen n>s elements.
Let's assume that our current s elements have already each been chosen with probability s/n.
By the definition of the algorithm, we choose element n+1 with probability s/(n+1).
Each element already part of our result set has a probability 1/s of being replaced.
The probability that an element from the n-seen result set is replaced in the n+1-seen result set is therefore (1/s)*s/(n+1)=1/(n+1). Conversely, the probability that an element is not replaced is 1-1/(n+1)=n/(n+1).
Thus, the n+1-seen result set contains an element either if it was part of the n-seen result set and was not replaced---this probability is (s/n)*n/(n+1)=s/(n+1)---or if the element was chosen---with probability s/(n+1).
The definition of the algorithm tells us that the first s elements are automatically included as the first n=s members of the result set. Therefore, the n-seen result set includes each element with s/n (=1) probability giving us the necessary base case for the induction.
References
McLeod, A. Ian, and David R. Bellhouse. "A convenient algorithm for drawing a simple random sample." Journal of the Royal Statistical Society. Series C (Applied Statistics) 32.2 (1983): 182-184.
Vitter, Jeffrey S. "Random sampling with a reservoir." ACM Transactions on Mathematical Software (TOMS) 11.1 (1985): 37-57.
This is called a Reservoir Sampling problem. The simple solution is to assign a random number to each element of the list as you see it, then keep the top (or bottom) k elements as ordered by the random number.
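A short Python sketch of that random-key idea (assuming it is acceptable to keep k items plus their keys in memory):

import heapq
import random

def sample_k_by_random_keys(iterable, k):
    # Tag every element with a random key and keep the k largest keys.
    tagged = ((random.random(), i, item) for i, item in enumerate(iterable))
    return [item for _, _, item in heapq.nlargest(k, tagged)]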
I would suggest: First find your k random numbers. Sort them. Then traverse both the linked list and your random numbers once.
If you somehow don't know the length of your linked list (how?), then you could grab the first k into an array, then for node r generate a random number j in [0, r), and if j is less than k, replace the jth item of the array with node r's value. (Not entirely convinced that doesn't bias...)
Other than that: "If I were you, I wouldn't be starting from here." Are you sure a linked list is right for your problem? Is there not a better data structure, such as a good old flat array list?
If you don't know the length of the list, then you will have to traverse it completely to ensure random picks. The method I've used in this case is the one described by Tom Hawtin (54070). While traversing the list you keep k elements that form your random selection up to that point. (Initially you just add the first k elements you encounter.) Then, with probability k/i, you replace a random element of your selection with the ith element of the list (i.e. the element you are at, at that moment).
It's easy to show that this gives a random selection. After seeing m elements (m > k), each of the first m elements of the list is part of your random selection with probability k/m. That this initially holds is trivial. Then for each element m+1, you put it in your selection (replacing a random element) with probability k/(m+1). You now need to show that all the other elements also have probability k/(m+1) of being selected. The probability is k/m * (k/(m+1)*(1-1/k) + (1-k/(m+1))) (i.e. the probability that the element was in the selection times the probability that it is still there). A bit of algebra shows that this equals k/(m+1).
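Spelling out that simplification for an element that was already in the selection:

\frac{k}{m}\left(\frac{k}{m+1}\left(1-\frac{1}{k}\right)+\left(1-\frac{k}{m+1}\right)\right)
  = \frac{k}{m}\left(\frac{k-1}{m+1}+\frac{m+1-k}{m+1}\right)
  = \frac{k}{m}\cdot\frac{m}{m+1}
  = \frac{k}{m+1}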
Well, you do need to know what N is at runtime at least, even if this involves doing an extra pass over the list to count the elements. The simplest algorithm is then to pick a random index in [0, N), remove that item, and repeat k times. Or, if it is permissible to return repeated numbers, don't remove the item.
Unless you have a VERY large N and very stringent performance requirements, this algorithm runs with O(N*k) complexity, which should be acceptable.
Edit: Never mind, Tom Hawtin's method is way better. Select the random numbers first, then traverse the list once. Same theoretical complexity, I think, but much better expected runtime.
Why can't you just do something like
List<int> GetKRandomFromList(List<int> input, int k)
{
    var rng = new Random();
    var ret = new List<int>();
    for (int i = 0; i < k; i++)
        ret.Add(input[rng.Next(0, input.Count)]);
    return ret;
}
I'm sure that you don't mean something that simple so can you specify further?
