find subset with elements smaller than S - algorithm

I have to find an algorithm for the following problem:
input are two numbers S and k of natural numbers and an unsorted set of n pair-wise different numbers.
decide in O(n) if there is a subset of k numbers that sum up to <=S. Note: k should not be part of the time complexity.
algorithm({x_1, ..., x_n}, k, S):
if exists |{x_i, ..., x_j}| = k and x_i + ... x_j <= S return true
I don't find a solution with time complexity O(n).
What I was able to get is in O(kn), as we search k times the minimum and sum is up:
algorithm(a={x_1, ..., x_n}, k, S):
sum = 0
for i=1,...,k:
min = a.popFirst()
for i=2,...,len(a):
if(a[i] < min):
t = a[i]
a[i] = min
min = t
sum += min
if sum <= S:
return true
else:
return false
this is in O(n) and return the right result. How can i loose the k?
Thanks for helping me, im really struggeling on this one!

Quickselect can be used to find the k smallest elements: https://en.wikipedia.org/wiki/Quickselect
It's basically quicksort, except that you only recurse on the interesting side of the pivot.
A simple implementation runs in O(N) expected time, but using median-of-medians to select a pivot, you can make that a real worst-case bound: https://en.wikipedia.org/wiki/Median_of_medians

You could build a min-heap of size k from the set. Time complexity of building this is O(n) expected time and O(n log k) worst case.
The heap should contain first k minimum elements from the set.
Then it is straightforward to see the sum of the elements in the heap is <= S. You don't need to remove the elements from the heap to calculate the sum. Just traverse the heap to calculate sum. Removing all elements entails k log k complexity.
You don't even need to consider the next higher elements, because adding them would result in sum greater than S

Related

How can I create an array of 'n' positive integers where none of the subsequences of the array have equal sum?

I was watching a lecture on the question "subsequence sum equals to k" in which you're given an array of n positive integers and a target sum = k. Your task is to check if any of the subsequence of the array have a sum equal to the target sum. The recursive solution works in O(2^N). The lecturer said that if we memoize the recursive solution, the time complexity will drop to O(N*K). But as much as I understand, memoization simply removes overlapping subproblems. So if all of the subsequences have different sum, won't the time complexity of the solution still be O(2^N)? Just to test this hypothesis, I was trying to create an array of n positive integers where none of the subsequences have equal sum.
Also, I tried the tabulation method and was unable to understand why the time complexity drops in the case of tabulation. Please point to any resource where I can learn exactly what subproblems does tabulation avoid.
Note that O(NK) is not always smaller than O(2N). If K = 2N for example, then O(KN) = O(N * 2N), which is larger.
Furthermore, this is the sort of range you're dealing with when every subsequence sum is different.
If your N integers are powers of 2, for example: [20, 21, 22, ...], then every subsequence has a different sum, and K=2N is the smallest positive integer that isn't a subsequence sum.
The tabulation method is only an improvement when K is known to be relatively small.
If each value in the array is a different, positive power of the same base, no two sums will be equal.
Python code:
def f(A):
sums = set([0])
for a in A:
new_sums = set()
for s in sums:
new_sum = s + a
if new_sum in sums:
print(new_sum)
return None
new_sums.add(new_sum)
sums = sums.union(new_sums)
return sums
for b in range(2, 10):
A = []
for p in range(5):
A.append(b**p)
sums = f(A)
print(A)
print(len(sums))
print(sums)
In the recursive case without memoization, you'll compute the sums for all subsequences, which has O(2^N) complexity.
Now consider the memoization case.
Let dp[i][j] = 1 if there exists a subsequence in the array arr[i:] that has sum j, else dp[i][j] = 0.
The algorithm is
for each index i in range(n,0,-1):
j = array[i]
for each x in range(0,k):
dp[i][x] += dp[i+1][x]
if dp[i+1][x] == 1:
dp[i][x+j] = 1
return dp[0][k]
For each index, we traverse the subsequence sums seen till yet (in range k), and mark them True for the current index. For each such sum, we also add the value of the current element, and mark that True.
Which sub-problems were reduced?
We just track if sum x is possible in a subarray. In the recursive case, there could be 100 subsequences that have sum x. Here, since we're using a bool to track if x is possible in the subarray, we effectively avoid going over all subsequences just to check if the sum is possible.
For each index, since we do a O(k) traversal of going through all sums, the complexity becomes O(N*k).

How to choose the least number of weights to get a total weight in O(n) time

If there are n unsorted weights and I need to find the least number of weights to get at least weight W.
How do I find them in O(n)?
This problem has many solution methods:
Method 1 - Sorting - O(nlogn)
I guess that the most trivial one would be to sort in descending order and then to take the first K elements that give a sum of at least W. The time complexity will be though O(nlogn).
Method 2 - Max Heap - O(n + klogn)
Another method would be to use a max heap.
Creating the heap will take O(n) and then extracting elements until we got to a total sum of at least W. Each extraction will take O(logn) so the total time complexity will be O(klogn) where k is the number of elements we had to extract from the heap.
Method 3 - Using Min Heap - O(nlogk)
Adding this method that JimMischel suggested in the comments below.
Creating a min heap with the first k elements in the list that sums to at least W. Then, iterate over the remaining elements and if it's greater than the minimum (heap top) replace between them.
At this point, it might be that we have more elements of what we actually need to get to W, so we will just extract the minimums until we reach our limit. In practice, depending on the relation between
find_min_set(A,W)
currentW = 0
heap H //Create empty heap
for each Elem in A
if (currentW < W)
H.add(Elem)
currentW += Elem
else if (Elem > H.top())
currentW += (Elem-H.top())
H.pop()
H.add(Elem)
while (currentW-H.top() > W)
currentW -= H.top()
H.pop()
This method might be even faster in practice, depending on the relation between k and n. See when theory meets practice.
Method 4 - O(n)
The best method I could think of will be using some kind of quickselect while keeping track of the total weight and always partitioning with the median as a pivot.
First, let's define few things:
sum(A) - The total sum of all elements in array A.
num(A) - The number of elements in array A.
med(A) - The median of the array A.
find_min_set(A,W,T)
//partition A
//L contains all the elements of A that are less than med(A)
//R contains all the elements of A that are greater or equal to med(A)
L, R = partition(A,med(A))
if (sum(R)==W)
return T+num(R)
if (sum(R) > W)
return find_min_set(R,W,T)
if (sum(R) < W)
return find_min_set(L,W-sum(R),num(R)+T)
Calling this method by find_min_set(A,W,0).
Runtime Complexity:
Finding median is O(n).
Partitioning is O(n).
Each recursive call is taking half of the size of the array.
Summing it all up we get a follow relation: T(n) = T(n/2) + O(n) which is same as the average case of quickselect = O(n).
Note: When all values are unique both worst-case and average complexity is indeed O(n). With possible duplicates values, the average complexity is still O(n) but the worst case is O(nlogn) with using Median of medians method for selecting the pivot.

Find kth number in sum array

Given an array A with N elements I need to find pair (i,j) such that i is not equal to j and if we write the sum A[i]+A[j] for all pairs of (i,j) then it comes at the kth position.
Example : Let N=4 and arrays A=[1 2 3 4] and if K=3 then answer is 5 as we can see it clearly that sum array becomes like this : [3,4,5,5,6,7]
I can't go for all pair of i and j as N can go up to 100000. Please help how to solve this problem
I mean something like this :
int len=N*(N+1)/2;
int sum[len];
int count=0;
for(int i=0;i<N;i++){
for(int j=i+1;j<N;j++){
sum[count]=A[i]+A[j];
count++;
}
}
//Then just find kth element.
We can't go with this approach
A solution that is based on a fact that K <= 50: Let's take the first K + 1 elements of the array in a sorted order. Now we can just try all their combinations. Proof of correctness: let's assume that a pair (i, j) is the answer, where j > K + 1. But there are K pairs with the same or smaller sum: (1, 2), (1, 3), ..., (1, K + 1). Thus, it cannot be the K-th pair.
It is possible to achieve an O(N + K ^ 2) time complexity by choosing the K + 1 smallest numbers using a quickselect algorithm(it is possible to do even better, but it is not required). You can also just the array and get an O(N * log N + K ^ 2 * log K) complexity.
I assume that you got this question from http://www.careercup.com/question?id=7457663.
If k is close to 0 then the accepted answer to How to find kth largest number in pairwise sums like setA + setB? can be adapted quite easily to this problem and be quite efficient. You need O(n log(n)) to sort the array, O(n) to set up a priority queue, and then O(k log(k)) to iterate through the elements. The reversed solution is also efficient if k is near n*n - n.
If k is close to n*n/2 then that won't be very good. But you can adapt the pivot approach of http://en.wikipedia.org/wiki/Quickselect to this problem. First in time O(n log(n)) you can sort the array. In time O(n) you can set up a data structure representing the various contiguous ranges of columns. Then you'll need to select pivots O(log(n)) times. (Remember, log(n*n) = O(log(n)).) For each pivot, you can do a binary search of each column to figure out where it split it in time O(log(n)) per column, and total cost of O(n log(n)) for all columns.
The resulting algorithm will be O(n log(n) log(n)).
Update: I do not have time to do the finger exercise of supplying code. But I can outline some of the classes you might have in an implementation.
The implementation will be a bit verbose, but that is sometimes the cost of a good general-purpose algorithm.
ArrayRangeWithAddend. This represents a range of an array, summed with one value.with has an array (reference or pointer so the underlying data can be shared between objects), a start and an end to the range, and a shiftValue for the value to add to every element in the range.
It should have a constructor. A method to give the size. A method to partition(n) it into a range less than n, the count equal to n, and a range greater than n. And value(i) to give the i'th value.
ArrayRangeCollection. This is a collection of ArrayRangeWithAddend objects. It should have methods to give its size, pick a random element, and a method to partition(n) it into an ArrayRangeCollection that is below n, count of those equal to n, and an ArrayRangeCollection that is larger than n. In the partition method it will be good to not include ArrayRangeWithAddend objects that have size 0.
Now your main program can sort the array, and create an ArrayRangeCollection covering all pairs of sums that you are interested in. Then the random and partition method can be used to implement the standard quickselect algorithm that you will find in the link I provided.
Here is how to do it (in pseudo-code). I have now confirmed that it works correctly.
//A is the original array, such as A=[1,2,3,4]
//k (an integer) is the element in the 'sum' array to find
N = A.length
//first we find i
i = -1
nl = N
k2 = k
while (k2 >= 0) {
i++
nl--
k2 -= nl
}
//then we find j
j = k2 + nl + i + 1
//now compute the sum at index position k
kSum = A[i] + A[j]
EDIT:
I have now tested this works. I had to fix some parts... basically the k input argument should use 0-based indexing. (The OP seems to use 1-based indexing.)
EDIT 2:
I'll try to explain my theory then. I began with the concept that the sum array should be visualised as a 2D jagged array (diminishing in width as the height increases), with the coordinates (as mentioned in the OP) being i and j. So for an array such as [1,2,3,4,5] the sum array would be conceived as this:
3,4,5,6,
5,6,7,
7,8,
9.
The top row are all values where i would equal 0. The second row is where i equals 1. To find the value of 'j' we do the same but in the column direction.
... Sorry I cannot explain this any better!

Find pairs with given difference

Given n, k and n number of integers. How would you find the pairs of integers for which their difference is k?
There is a n*log n solution, but I cannot figure it out.
You can do it like this:
Sort the array
For each item data[i], determine its two target pairs, i.e. data[i]+k and data[i]-k
Run a binary search on the sorted array for these two targets; if found, add both data[i] and data[targetPos] to the output.
Sorting is done in O(n*log n). Each of the n search steps take 2 * log n time to look for the targets, for the overall time of O(n*log n)
For this problem exists the linear solution! Just ask yourself one question. If you have a what number should be in the array? Of course a+k or a-k (A special case: k = 0, required an alternative solution). So, what now?
You are creating a hash-set (for example unordered_set in C++11) with all values from the array. O(1) - Average complexity for each element, so it's O(n).
You are iterating through the array, and check for each element Is present in the array (x+k) or (x-k)?. You check it for each element, in set in O(1), You check each element once, so it's linear (O(n)).
If you found x with pair (x+k / x-k), it is what you are looking for.
So it's linear (O(n)). If you really want O(n lg n) you should use a set on tree, with checking is_exist in (lg n), then you have O(n lg n) algorithm.
Apposition: No need to check x+k and x-k, just x+k is sufficient. Cause if a and b are good pair then:
if a < b then
a + k == b
else
b + k == a
Improvement: If you know a range, you can guarantee linear complexity, by using bool table (set_tab[i] == true, when i is in table.).
Solution similar to one above:
Sort the array
set variables i = 0; j = 1;
check the difference between array[i] and array[j]
if the difference is too small, increase j
if the difference is too big, increase i
if the difference is the one you're looking for, add it to results and increase j
repeat 3 and 4 until the end of array
Sorting is O(n*lg n), the next step is, if I'm correct, O(n) (at most 2*n comparisons), so the whole algorithm is O(n*lg n)

Finding the median of an unsorted array

To find the median of an unsorted array, we can make a min-heap in O(nlogn) time for n elements, and then we can extract one by one n/2 elements to get the median. But this approach would take O(nlogn) time.
Can we do the same by some method in O(n) time? If we can, then please tell or suggest some method.
You can use the Median of Medians algorithm to find median of an unsorted array in linear time.
I have already upvoted the #dasblinkenlight answer since the Median of Medians algorithm in fact solves this problem in O(n) time. I only want to add that this problem could be solved in O(n) time by using heaps also. Building a heap could be done in O(n) time by using the bottom-up. Take a look to the following article for a detailed explanation Heap sort
Supposing that your array has N elements, you have to build two heaps: A MaxHeap that contains the first N/2 elements (or (N/2)+1 if N is odd) and a MinHeap that contains the remaining elements. If N is odd then your median is the maximum element of MaxHeap (O(1) by getting the max). If N is even, then your median is (MaxHeap.max()+MinHeap.min())/2 this takes O(1) also. Thus, the real cost of the whole operation is the heaps building operation which is O(n).
BTW this MaxHeap/MinHeap algorithm works also when you don't know the number of the array elements beforehand (if you have to resolve the same problem for a stream of integers for e.g). You can see more details about how to resolve this problem in the following article Median Of integer streams
Quickselect works in O(n), this is also used in the partition step of Quicksort.
The quick select algorithm can find the k-th smallest element of an array in linear (O(n)) running time. Here is an implementation in python:
import random
def partition(L, v):
smaller = []
bigger = []
for val in L:
if val < v: smaller += [val]
if val > v: bigger += [val]
return (smaller, [v], bigger)
def top_k(L, k):
v = L[random.randrange(len(L))]
(left, middle, right) = partition(L, v)
# middle used below (in place of [v]) for clarity
if len(left) == k: return left
if len(left)+1 == k: return left + middle
if len(left) > k: return top_k(left, k)
return left + middle + top_k(right, k - len(left) - len(middle))
def median(L):
n = len(L)
l = top_k(L, n / 2 + 1)
return max(l)
No, there is no O(n) algorithm for finding the median of an arbitrary, unsorted dataset.
At least none that I am aware of in 2022. All answers offered here are variations/combinations using heaps, Median of Medians, Quickselect, all of which are strictly O(nlogn).
See https://en.wikipedia.org/wiki/Median_of_medians and http://cs.indstate.edu/~spitla/abstract2.pdf.
The problem appears to be confusion about how algorithms are classified, which is according their limiting (worst case) behaviour. "On average" or "typically" O(n) with "worst case" O(f(n)) means (in textbook terms) "strictly O(f(n))". Quicksort for example, is often discussed as being O(nlogn) (which is how it typically performs), although it is in fact an O(n^2) algorithm because there is always some pathological ordering of inputs for which it can do no better than n^2 comparisons.
It can be done using Quickselect Algorithm in O(n), do refer to Kth order statistics (randomized algorithms).
As wikipedia says, Median-of-Medians is theoretically o(N), but it is not used in practice because the overhead of finding "good" pivots makes it too slow.
http://en.wikipedia.org/wiki/Selection_algorithm
Here is Java source for a Quickselect algorithm to find the k'th element in an array:
/**
* Returns position of k'th largest element of sub-list.
*
* #param list list to search, whose sub-list may be shuffled before
* returning
* #param lo first element of sub-list in list
* #param hi just after last element of sub-list in list
* #param k
* #return position of k'th largest element of (possibly shuffled) sub-list.
*/
static int select(double[] list, int lo, int hi, int k) {
int n = hi - lo;
if (n < 2)
return lo;
double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot
// Triage list to [<pivot][=pivot][>pivot]
int nLess = 0, nSame = 0, nMore = 0;
int lo3 = lo;
int hi3 = hi;
while (lo3 < hi3) {
double e = list[lo3];
int cmp = compare(e, pivot);
if (cmp < 0) {
nLess++;
lo3++;
} else if (cmp > 0) {
swap(list, lo3, --hi3);
if (nSame > 0)
swap(list, hi3, hi3 + nSame);
nMore++;
} else {
nSame++;
swap(list, lo3, --hi3);
}
}
assert (nSame > 0);
assert (nLess + nSame + nMore == n);
assert (list[lo + nLess] == pivot);
assert (list[hi - nMore - 1] == pivot);
if (k >= n - nMore)
return select(list, hi - nMore, hi, k - nLess - nSame);
else if (k < nLess)
return select(list, lo, lo + nLess, k);
return lo + k;
}
I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].
In practice, you can expect the above code to be o(N).
Let the problem be: finding the Kth largest element in an unsorted array.
Divide the array into n/5 groups where each group consisting of 5 elements.
Now a1,a2,a3....a(n/5) represent the medians of each group.
x = Median of the elements a1,a2,.....a(n/5).
Now if k<n/2 then we can remove the largets, 2nd largest and 3rd largest element of the groups whose median is greater than the x. We can now call the function again with 7n/10 elements and finding the kth largest value.
else if k>n/2 then we can remove the smallest ,2nd smallest and 3rd smallest element of the group whose median is smaller than the x. We can now call the function of again with 7n/10 elements and finding the (k-3n/10)th largest value.
Time Complexity Analysis:
T(n) time complexity to find the kth largest in an array of size n.
T(n) = T(n/5) + T(7n/10) + O(n)
if you solve this you will find out that T(n) is actually O(n)
n/5 + 7n/10 = 9n/10 < n
Notice that building a heap takes O(n) actually not O(nlogn), you can check this using amortized analysis or simply check in Youtube.
Extract-Min takes O(logn), therefore, extracting n/2 will take (nlogn/2) = O(nlogn) amortized time.
About your question, you can simply check at Median of Medians.

Resources