find kth smallest number in O(logn) time - algorithm

Here is the problem: given an unsorted array a[n], I need to find the kth smallest number in the range [i, j], where 1<=i<=j<=n and k<=j-i+1.
Typically I would use quickselect for this, but it is not fast enough when there are many queries with different ranges [i, j]. I can't figure out an algorithm that answers each query in O(logn) time (preprocessing is allowed).
Any idea is appreciated.
PS
Let me make the problem easier to understand. Any kind of preprocessing is allowed, but each query needs to be done in O(logn) time. And there will be many (more than 1) queries, like find the 1st in range [3,7], or the 3rd in range [10,17], or the 11th in range [33, 52].
By range [i, j] I mean positions in the original array, not in a sorted copy.
For example (using 0-based indexes here), with a[5] = {3,1,7,5,9}, the 1st in range [3,4] is 5, the 2nd in range [1,3] is 5, and the 3rd in range [0,2] is 7.

If pre-processing is allowed and not counted towards the time complexity, just use that to construct sub-lists so that you can efficiently find the element you're looking for. As with most optimisations, this trades space for time.
Your pre-processing step is to take your original list of n numbers and create a number of new sublists.
Each of these sublists is a sorted portion of the original: list[n][m] starts at index n of the original and covers m+1 elements. So your original list of:
{3, 1, 7, 5, 9}
gives you:
list[0][0] = {3}
list[0][1] = {1, 3}
list[0][2] = {1, 3, 7}
list[0][3] = {1, 3, 5, 7}
list[0][4] = {1, 3, 5, 7, 9}
list[1][0] = {1}
list[1][1] = {1, 7}
list[1][2] = {1, 5, 7}
list[1][3] = {1, 5, 7, 9}
list[2][0] = {7}
list[2][1] = {5, 7}
list[2][2] = {5, 7, 9}
list[3][0] = {5}
list[3][1] = {5,9}
list[4][0] = {9}
This isn't a cheap operation (in time or space) so you may want to maintain a "dirty" flag on the list so you only perform it the first time after you do a modifying operation (insert, delete, change).
In fact, you can use lazy evaluation for even more efficiency. Basically set all sublists to an empty list when you start and whenever you perform a modifying operation. Then, whenever you attempt to access a sublist and it's empty, calculate that sublist (and that one only) before trying to get the kth value out of it.
That ensures sublists are evaluated only when needed and cached to prevent unnecessary recalculation. For example, if you never ask for a value from the 3-through-6 sublist, it's never calculated.
The pseudo-code for creating all the sublists is basically (for loops inclusive at both ends):
for n = 0 to a.lastindex:
    create array list[n]
    for m = 0 to a.lastindex - n
        create array list[n][m]
        for i = 0 to m:
            list[n][m][i] = a[n+i]
        sort list[n][m]
The code for lazy evaluation is a little more complex (but only a little), so I won't provide pseudo-code for that.
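For readers who want to see it anyway, here is a minimal Python sketch of the lazy variant. The class name `LazyRangeKth` and its methods are made up for illustration, and the cache is a dictionary keyed by (i, j) rather than the nested `list[n][m]` layout; treat it as one possible reading of the idea, not the answer's own code.

```python
class LazyRangeKth:
    """Sorts a[i..j] the first time that range is queried, then reuses it."""

    def __init__(self, values):
        self.values = list(values)
        self.cache = {}  # (i, j) -> sorted copy of values[i..j]

    def update(self, index, value):
        # Any modifying operation invalidates every cached sublist.
        self.values[index] = value
        self.cache.clear()

    def kth_smallest(self, k, i, j):
        # k is 1-based; i and j are 0-based, inclusive indexes into the original list.
        key = (i, j)
        if key not in self.cache:
            self.cache[key] = sorted(self.values[i:j + 1])
        return self.cache[key][k - 1]


q = LazyRangeKth([3, 1, 7, 5, 9])
print(q.kth_smallest(1, 3, 4))  # 5
print(q.kth_smallest(2, 1, 3))  # 5
print(q.kth_smallest(3, 0, 2))  # 7
```

The first query on a range pays for the sort; repeated queries on the same range are then plain lookups.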
Then, in order to find the kth smallest number in the range i through j (where i and j are the original indexes), you simply look up list[i][j-i][k-1], a very fast O(1) operation:
1st in range [3,4] (values 5,9),   list[3][4-3=1][1-1=0] = 5
2nd in range [1,3] (values 1,7,5), list[1][3-1=2][2-1=1] = 5
3rd in range [0,2] (values 3,1,7), list[0][2-0=2][3-1=2] = 7
In each line, the first index is the start of the range, the second is the end of the range minus its start, and the third is k minus one.
Here's some Python code which shows this in action:
orig = [3,1,7,5,9]
print(orig)
print("=====")
# Note: the name "list" shadows Python's built-in; it is kept to match the text above.
list = []
for n in range(len(orig)):
    list.append([])
    for m in range(len(orig) - n):
        list[-1].append([])
        for i in range(m+1):
            list[-1][-1].append(orig[n+i])
        list[-1][-1] = sorted(list[-1][-1])
        print("(%d,%d)=%s"%(n,m,list[-1][-1]))
print("=====")
# Gives xth smallest in index range y through z inclusive.
x = 1; y = 3; z = 4; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
x = 2; y = 1; z = 3; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
x = 3; y = 0; z = 2; print("(%d,%d,%d)=%d"%(x,y,z,list[y][z-y][x-1]))
print("=====")
As expected, the output is:
[3, 1, 7, 5, 9]
=====
(0,0)=[3]
(0,1)=[1, 3]
(0,2)=[1, 3, 7]
(0,3)=[1, 3, 5, 7]
(0,4)=[1, 3, 5, 7, 9]
(1,0)=[1]
(1,1)=[1, 7]
(1,2)=[1, 5, 7]
(1,3)=[1, 5, 7, 9]
(2,0)=[7]
(2,1)=[5, 7]
(2,2)=[5, 7, 9]
(3,0)=[5]
(3,1)=[5, 9]
(4,0)=[9]
=====
(1,3,4)=5
(2,1,3)=5
(3,0,2)=7
=====

The current solution is O( (logn)^2 ) per query. I am pretty sure it can be modified to run in O(logn). The main advantage of this algorithm over paxdiablo's algorithm is space efficiency. This algorithm needs O(nlogn) space, not the O(n^2) sorted sublists (O(n^3) elements in total) that the full table needs.
First, the complexity of finding kth smallest element from two sorted arrays of length m and n is O(logm + logn). Complexity of finding kth smallest element from arrays of lengths a,b,c,d.. is O(loga+logb+.....).
Now, sort the whole array and store it. Sort the first half and the second half of the array and store them, and so on. You will have 1 sorted array of length n, 2 sorted arrays of length n/2, 4 sorted arrays of length n/4 and so on. Total memory required = 1*n+2*n/2+4*n/4+8*n/8...= nlogn.
Once you have i and j, figure out the list of subarrays which, when concatenated, give you the range [i,j]. There are going to be O(logn) such arrays. Finding the kth smallest number among them would take O( (logn)^2) time.
Example for the last paragraph:
Assume the array is of size 8 (indexed from 0 to 7). You have the following sorted lists:
A:0-7, B:0-3, C:4-7, D:0-1, E:2-3, F:4-5, G:6-7.
Now construct a tree with pointers to these arrays such that every node contains its immediate constituents. A will be root, B and C are its children and so on.
Now implement a recursive function that returns a list of arrays.
def getArrays(node, i, j):
    # Returns a flat list of nodes whose sorted arrays together cover [i, j].
    if i == node.min and j == node.max:
        return [node]
    if i <= node.left.max:
        if j <= node.left.max:
            return getArrays(node.left, i, j)  # (i,j) is located within left node
        else:
            # (i,j) is spread over left and right node
            return getArrays(node.left, i, node.left.max) + getArrays(node.right, node.right.min, j)
    else:
        return getArrays(node.right, i, j)  # (i,j) is located within right node
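To make the tree concrete, here is a hedged Python sketch of the whole idea. The names `Node`, `get_arrays` and `kth_smallest` are illustrative. Note that the last step is a swapped-in technique: instead of the multi-array selection described above, it binary searches over the sorted root array and counts with `bisect_right` in each covering node, which costs O((logn)^3) per query rather than O((logn)^2). Building each node with `sorted` is also simpler than merging children, at the price of a slower build.

```python
from bisect import bisect_right


class Node:
    def __init__(self, values, lo, hi):
        self.min, self.max = lo, hi
        self.sorted = sorted(values[lo:hi + 1])  # sorted copy of this segment
        self.left = self.right = None
        if lo < hi:
            mid = (lo + hi) // 2
            self.left = Node(values, lo, mid)
            self.right = Node(values, mid + 1, hi)


def get_arrays(node, i, j):
    """Flat list of nodes whose segments concatenate to [i, j]."""
    if i == node.min and j == node.max:
        return [node]
    if j <= node.left.max:
        return get_arrays(node.left, i, j)
    if i >= node.right.min:
        return get_arrays(node.right, i, j)
    return (get_arrays(node.left, i, node.left.max)
            + get_arrays(node.right, node.right.min, j))


def kth_smallest(root, i, j, k):
    """k is 1-based; i and j are 0-based, inclusive."""
    nodes = get_arrays(root, i, j)
    lo, hi = 0, len(root.sorted) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        # How many elements of a[i..j] are <= the candidate value?
        count = sum(bisect_right(n.sorted, root.sorted[mid]) for n in nodes)
        if count >= k:
            hi = mid
        else:
            lo = mid + 1
    return root.sorted[lo]


root = Node([3, 1, 7, 5, 9], 0, 4)
print(kth_smallest(root, 3, 4, 1))  # 5
print(kth_smallest(root, 1, 3, 2))  # 5
print(kth_smallest(root, 0, 2, 3))  # 7
```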

Preprocess: Make an nxn array where the [k][r] element is the kth smallest element of the first r elements (1-indexed for convenience).
Then, given some particular range [i,j] and value for k, do the following:
Find the element at the [k][j] slot of the matrix; call this x.
Go down column i-1 of your matrix and find how many values in it are smaller than or equal to x (treat column 0 as having 0 smaller entries). By construction, this column will be sorted (all columns will be sorted), so this can be found in log time. Call this value s.
Find the element in the [k+s][j] slot of the matrix. This is your answer.
E.g., given 3 1 7 5 9
3 1 1 1 1
X 3 3 3 3
X X 7 5 5
X X X 7 7
X X X X 9
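Now, if we're asked for the 2nd smallest in the [2,4] range (again, 1-indexing), I first find the 2nd smallest in the [1,4] range, which is 3. I then look at column 1 and see that there is 1 element less than or equal to 3. Finally, I find the 3rd smallest in the [1,4] range at the [3][4] slot, which is 5, as desired.
This takes n^2 space, and log(n) lookup time. Note that a single correction step is not enough for every input: for 2 3 1 10 and the 2nd smallest in [3,4], x is 2 and s is 1, so the lookup returns the 3rd smallest of the first four elements, which is 3, while the correct answer is 10.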

This one does not require pre-processing but is somewhat slower than O(logN). It's significantly faster than a naive iterate-and-count, and it supports dynamic modification of the sequence.
It goes like this. Suppose the length n has n=2^x for some x. Construct a segment tree whose root node represents [0,n-1]. For each node that represents a segment [a,b] with b>a, give it two child nodes representing [a,(a+b)/2] and [(a+b)/2+1,b]. (That is, do a recursive divide-by-two.)
Then, on each node, maintain a separate binary search tree for the numbers within that segment. Each modification of the sequence therefore takes O(logN) [segment tree nodes] * O(logN) [BST update]. Queries can be done like this: let Q(a,b,x) be the rank of x within segment [a,b]. If Q(a,b,x) can be computed efficiently, a binary search on x finds the desired answer with an extra O(logE) factor, where E is presumably the size of the range of values.
Q(a,b,x) can be computed as follows: find the smallest set of segments that make up [a,b], which can be done in O(logN) on the segment tree. For each segment, query the binary search tree of that segment for the number of elements less than x. Add all these numbers to get Q(a,b,x).
This should be O(logN*logE*logN). Well not exactly what you have asked for though.

In O(log n) time it's not possible to read all of the elements of the array. Since it's not sorted, and there's no other provided information, this is impossible.

There's no way you can do better than O(n) in both worst and average case. You have to look at every single element.

Related

How to find minimum pairing cost? (Any Language)

I came across an algorithm question recently and I still haven't been able to come up with a way to solve it. Can anyone help with pseudocode or logic?
Here is the question:
There are n elements in the array, and n is an odd number. We exclude exactly 1 element from the array, pair up the remaining elements, and want to find the minimum total pairing cost.
Rules:
1 <= n <= 1001
n % 2 = 1
Example:
Given array is [4, 2, 1, 7, 8]
When we pair items with the closest ones [[1,2], [7,8]] and "4" is excluded.
So the minimum cost is |1 - 2| + |7 - 8| = 2;
What i tried:
Sort array first: [1,2,4,7,8]
Remove the middle element: 4
Pair items with the next ones: [[1, 2], [7, 8]]
According to the example it works, but what if the given array is [1, 7, 8, 16, 17]?
Sort array first: [1, 7, 8, 16, 17]
Remove the middle element: 8
Pair items with the next ones: [[1, 7], [16, 17]] Wrong Answer
"1" must be excluded and the pairs must be [[7, 8], [16, 17]]
Once the array is sorted, you can pair all elements from left to right, keep track of the total sum, and replace the last pairing with one starting from the right, updating the total sum if it's smaller.
In pseudo-code (all zero-based indexing):
let T be the sum of all pairing costs of
    elements 2i and 2i+1 for i from 0 to (n-3)/2
    (that is all pairings when you exclude the very last element)
let S = T
let j = (n-1)/2
for i from (n-3)/2 to 0 (included):
    let L be the pairing cost of elements 2i and 2i+1
    let R be the pairing cost of elements 2i+1 and 2i+2
    replace T with T - L + R
    (T is now the total cost when element 2i is excluded;
     it must be updated at every step, even when it is not the best so far)
    if T < S
        replace S with T
        replace j with i
2j is the element to exclude
Sorting the array first is a good start. Once you've done that, you have a choice of removing any value from index 1..N. A brute-force approach would be to calculate the pairing cost of omitting index 1, then recalculate omitting only index 2, and so on until you reach index N.
You'd be calculating many of the pairs over and over. To avoid that, consider that all the pairs to the left of your omitted index are paired odd-even (from the perspective of starting at element 1) and to the right of the omitted index will be even-odd. If you precalculate the sums of the left pairings and the sums of the right pairings into two arrays, you could determine the minimum cost at each position as the minimum sum of both values at each position of those two arrays.
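The two-array idea can be sketched in Python as follows. The function name `min_pairing_cost` and the array names `left` and `right` are illustrative, and the sketch assumes an odd-length input. It only tries excluding elements at even positions of the sorted array (0-based): excluding an odd position forces one pair to straddle the gap, which is never cheaper than excluding its even neighbour.

```python
def min_pairing_cost(values):
    """Returns (minimum cost, excluded value) for an odd-length list."""
    a = sorted(values)
    half = len(a) // 2

    # left[p]: cost of pairing a[0..2p-1] as (0,1), (2,3), ...
    left = [0] * (half + 1)
    for p in range(half):
        left[p + 1] = left[p] + a[2 * p + 1] - a[2 * p]

    # right[p]: cost of pairing a[2p+1..] as (2p+1,2p+2), (2p+3,2p+4), ...
    right = [0] * (half + 1)
    for p in range(half - 1, -1, -1):
        right[p] = right[p + 1] + a[2 * p + 2] - a[2 * p + 1]

    # Excluding a[2p] costs left[p] + right[p]; take the cheapest p.
    best = min(range(half + 1), key=lambda p: left[p] + right[p])
    return left[best] + right[best], a[2 * best]


print(min_pairing_cost([4, 2, 1, 7, 8]))     # (2, 4)
print(min_pairing_cost([1, 7, 8, 16, 17]))   # (2, 1)
```

After the sort, the rest is linear, so the whole thing is O(n log n).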

Maximum Sum for Subarray with fixed cutoff

I have a list of integers, and I need to find a way to get the maximum sum of a subset of them, adding elements to the total until the sum is equal to (or greater than) a fixed cutoff. I know this seems similar to the knapsack, but I was unsure whether it was equivalent.
Sorting the array and greedily adding the largest remaining element until the sum reaches the cutoff does not work. Observe the following list:
list = [6, 5, 4, 4, 4, 3, 2, 2, 1]
cutoff = 15
For this list, doing it the naive way results in a sum of 15, which is very sub-optimal. As far as I can see, the maximum you could arrive at using this list is 20, by adding 4 + 4 + 4 + 2 + 6. If this is just a different version of knapsack, I can just implement a knapsack solution, as I probably have small enough lists to get away with this, but I'd prefer to do something more efficient.
First of all in any sum, you won't have produced a worse result by adding the largest element last. So there is no harm in assuming that the elements are sorted from smallest to largest as a first step.
And now you use a dynamic programming approach similar to the usual subset sum.
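def best_cutoff_sum(cutoff, elements):
    # Process elements in ascending order so the largest element is added last.
    elements = sorted(elements)
    # Maps each reachable sum to a nested [element, rest] path that produces it.
    sums = {0: None}
    for e in elements:
        next_sums = {}
        for v, path in sums.items():
            next_sums[v] = path
            # Only sums still below the cutoff may take another element.
            if v < cutoff:
                next_sums[v + e] = [e, path]
        sums = next_sums
    best = max(sums.keys())
    return (best, sums[best])
print(best_cutoff_sum(15, [6, 5, 4, 4, 4, 3, 2, 2, 1]))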
With a little work you can turn the path from the nested array it currently is to whatever format you want.
If your list of non-negative elements has n elements, your cutoff is c and your maximum value is v, then this algorithm will take time O(n * (c + v)).

Find the number of elements greater than x in a given range

Given an array with n elements, how to find the number of elements greater than or equal to a given value (x) in the given range index i to index j in O(log n) complexity?
The queries are of the form (i, j, x) which means find number of elements greater than x from the ith till jth element in the array
The array is not sorted. i, j & x are different for different queries. Elements of the array are static.
Edit: i, j, x all can be different for different queries!
If we know all queries beforehand, we can solve this problem by making use of a Fenwick tree.
First, we need to sort all elements in array and queries together, based on their values.
So, assuming that we have array [5, 4, 2, 1, 3] and queries (0, 1, 6) and (2, 5, 2), we will have following result after sorting : [1, 2, 2, 3 , 4 , 5, 6]
Now, we will need to process each element in descending order:
If we encounter an element which is from the array, we update its index in the Fenwick tree, which takes O(log n).
If we encounter a query, we check how many elements within the range of the query have already been added to the tree, which takes O(log n).
For above example, the process will be:
1st element is a query for value 6, as Fenwick tree is empty -> result is 0
2nd is element 5 -> add index 0 into Fenwick tree
3rd element is 4 -> add index 1 into tree.
4th element is 3 -> add index 4 into tree.
5th element is 2 -> add index 2 into tree.
6th element is query for range (2, 5), we query the tree and get answer 2.
7th element is 1 -> add index 3 into tree.
Finish.
So, in total, the time complexity of our solution is O((m + n) log(m + n)), where m and n are the number of queries and the number of elements in the input array respectively.
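The offline procedure above could look roughly like this in Python. The function name `count_at_least_offline` is illustrative, queries are assumed to be (i, j, x) tuples with 0-based inclusive bounds, and the sketch counts elements greater than or equal to x (swap `>=` for `>` to get the strict version). Because index 5 does not exist in a 5-element array, the usage line uses the range (2, 4) instead of the (2, 5) from the walkthrough.

```python
def count_at_least_offline(values, queries):
    """For each (i, j, x) return how many of values[i..j] are >= x."""
    n = len(values)
    tree = [0] * (n + 1)  # Fenwick tree over array positions

    def add(pos):
        pos += 1
        while pos <= n:
            tree[pos] += 1
            pos += pos & -pos

    def prefix(pos):
        # Number of marked positions in [0, pos).
        total = 0
        while pos > 0:
            total += tree[pos]
            pos -= pos & -pos
        return total

    # Array positions ordered by value, largest first.
    order = sorted(range(n), key=lambda idx: values[idx], reverse=True)
    answers = [0] * len(queries)
    inserted = 0
    # Process queries by threshold, largest first.
    for q in sorted(range(len(queries)), key=lambda qi: queries[qi][2], reverse=True):
        i, j, x = queries[q]
        # Mark every element that is large enough for this query.
        while inserted < n and values[order[inserted]] >= x:
            add(order[inserted])
            inserted += 1
        answers[q] = prefix(j + 1) - prefix(i)
    return answers


print(count_at_least_offline([5, 4, 2, 1, 3], [(0, 1, 6), (2, 4, 2)]))  # [0, 2]
```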
That is possible only if the array is sorted. In that case, binary search for the smallest value that satisfies your condition; its position splits the index range into two intervals, and the count is simply the length of the interval that satisfies the condition.
If the array is not sorted and you need to preserve its order, you can use an index sort. Put together:
definitions
Let <i0,i1> be your used index range and x be your value.
index sort array part <i0,i1>
So create an index array of size m=i1-i0+1 and index sort it. This task is O(m.log(m)) where m<=n.
binary search x position in index array
This task is O(log(m)), and you want the smallest index j in <0,m) for which array[index[j]]>=x.
compute count
Simply count how many indexes there are from j up to m:
count = m-j;
As you can see, if the array is sorted you get O(log(m)) complexity, but if it is not, you need to sort first, which is O(m.log(m)) and worse than the naive O(m) approach. The naive approach should be used only if the array changes often and can't be sorted directly.
[Edit1] What I mean by Index sort
By index sort I mean this: let's have an array a
a[] = { 4,6,2,9,6,3,5,1 }
The index sort means that you create a new array ix of indexes in sorted order, so for example an ascending index sort means:
a[ix[i]]<=a[ix[i+1]]
In our example an index bubble sort goes like this:
// init indexes
a[ix[i]]= { 4,6,2,9,6,3,5,1 }
ix[] = { 0,1,2,3,4,5,6,7 }
// bubble sort 1st iteration
a[ix[i]]= { 4,2,6,6,3,5,1,9 }
ix[] = { 0,2,1,4,5,6,7,3 }
// bubble sort 2nd iteration
a[ix[i]]= { 2,4,6,3,5,1,6,9 }
ix[] = { 2,0,1,5,6,7,4,3 }
// bubble sort 3rd iteration
a[ix[i]]= { 2,4,3,5,1,6,6,9 }
ix[] = { 2,0,5,6,7,1,4,3 }
// bubble sort 4th iteration
a[ix[i]]= { 2,3,4,1,5,6,6,9 }
ix[] = { 2,5,0,7,6,1,4,3 }
// bubble sort 5th iteration
a[ix[i]]= { 2,3,1,4,5,6,6,9 }
ix[] = { 2,5,7,0,6,1,4,3 }
// bubble sort 6th iteration
a[ix[i]]= { 2,1,3,4,5,6,6,9 }
ix[] = { 2,7,5,0,6,1,4,3 }
// bubble sort 7th iteration
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
ix[] = { 7,2,5,0,6,1,4,3 }
So the result of ascending index sort is this:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
Original array stays unchanged only the index array is changed. Items a[ix[i]] where i=0,1,2,3... are sorted ascending.
So now, if x=4 on this interval, you need to find (by binary search) the smallest i for which a[ix[i]]>=x still holds:
// ix: 0 1 2 3 4 5 6 7
a[] = { 4,6,2,9,6,3,5,1 }
ix[] = { 7,2,5,0,6,1,4,3 }
a[ix[i]]= { 1,2,3,4,5,6,6,9 }
// *
i = 3; m=8; count = m-i = 8-3 = 5;
So the answer is 5 items are >=4
[Edit2] Just to be sure you know what binary search means for this
i=0; // init value
j=4; // max power of 2 < m
// a[ix[]] = { 1,2,3,4,5,6,6,9 }
// i j i+j a[ix[i+j]]
// 0 4  4  5>=4 -> j>>=1;
// 0 2  2  3< 4 -> i+=j; j>>=1;
// 2 1  3  4>=4 -> j>>=1;
// 2 0          -> stop
a[ix[i]] < x -> a[ix[i+1]] >= x -> i = 2+1 = 3 in O(log(m))
So you need an index i and a binary bit mask j (powers of 2). At first set i to zero and j to the biggest power of 2 still smaller than n (or in this case m). For example something like this:
i=0; for (j=1;j<m;j<<=1); j>>=1;
Now in each iteration test whether a[ix[i+j]] satisfies the search condition (and whether i+j is still inside the array). If yes, update i+=j, else leave it as is. Then go to the next bit with j>>=1, and if j==0 stop, else iterate again. At the end the found value is a[ix[i]] and the index is i, after log2(m) iterations, which is also the number of bits needed to represent m-1.
In the example above I use the condition a[ix[i]]<4, so the found value is the biggest number still <4 in the array. As we also needed to include 4, I just increment the index once at the end (I could use <=4 instead but was too lazy to rewrite the whole thing again).
The count of such items is then just number of element in array (or interval) minus the i.
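As a compact illustration of the same steps, here is a Python sketch that uses the standard library instead of a hand-written bubble sort and bit-mask search. The helper names `index_sort` and `count_at_least` are made up for this example; `bisect_left` plays the role of the binary search for the first position whose value is >= x.

```python
from bisect import bisect_left


def index_sort(a, i0, i1):
    """Indexes of a[i0..i1] (inclusive) ordered by ascending value."""
    return sorted(range(i0, i1 + 1), key=lambda idx: a[idx])


def count_at_least(a, i0, i1, x):
    """Number of elements >= x in a[i0..i1], leaving a itself untouched."""
    ix = index_sort(a, i0, i1)
    keys = [a[idx] for idx in ix]      # values in sorted order
    first = bisect_left(keys, x)       # first position with value >= x
    return len(ix) - first


a = [4, 6, 2, 9, 6, 3, 5, 1]
print(index_sort(a, 0, 7))         # [7, 2, 5, 0, 6, 1, 4, 3]
print(count_at_least(a, 0, 7, 4))  # 5
```

As the answer points out, the sort dominates, so this only pays off when the same index array can be reused for several values of x on the same range.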
A previous answer describes an offline solution using a Fenwick tree, but this problem can also be solved online (and even while doing updates to the array) with slightly worse complexity. I'll describe such a solution using a segment tree and an AVL tree (any self-balancing BST could do the trick).
First let's see how to solve this problem using a segment tree. We'll do this by keeping, in every node, the actual elements of the array in the range that the node covers. So for the array A = [9, 4, 5, 6, 1, 3, 2, 8] we'll have:
[9 4 5 6 1 3 2 8] Node 1
[9 4 5 6] [1 3 2 8] Node 2-3
[9 4] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
Since height of our segment tree is log(n) and at every level we keep n elements, total amount of memory used is n log(n).
Next step is to sort these arrays which looks like this:
[1 2 3 4 5 6 8 9] Node 1
[4 5 6 9] [1 2 3 8] Node 2-3
[4 9] [5 6] [1 3] [2 8] Node 4-7
[9] [4] [5] [6] [1] [3] [2] [8] Node 8-15
NOTE: You first need to build the tree and then sort it to keep the order of elements in original array.
Now we can start our range queries and that works basically the same way as in regular segment tree, except when we find a completely overlapping interval, we then additionally check for number of elements greater than X. This can be done with binary search in log(n) time by finding the index of first element greater than X and subtracting it from number of elements in that interval.
Let's say our query was (0, 5, 4), so we do a segment search on interval [0, 5] and end up with arrays: [4, 5, 6, 9], [1, 3]. We then do a binary search on these arrays to see number of elements greater than 4 and get 3 (from first array) and 0 (from second) which brings to total of 3 - our query answer.
Interval search in segment trees can have up to log(n) paths, which means log(n) arrays and since we're doing binary search on each of them, brings complexity to log^2(n) per query.
Now if we wanted to update the array: since we are using segment trees it's impossible to add or remove elements efficiently, but we can replace them. By using AVL trees (or other binary trees that allow replacement and lookup in log(n) time) in the nodes instead of sorted arrays, we can manage this operation within the same time complexity (each replacement touches log(n) nodes and costs log(n) time per node).
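A static version of this structure (no updates, sorted arrays in the nodes) can be sketched in Python like this. The names `build` and `count_greater` are illustrative. The sketch stores the tree in a flat list, numbering nodes as in the diagram above (node 1 is the root, nodes 2 and 3 its children), and walks the tree bottom-up, which visits the same kind of O(log n) covering nodes as the recursive search described in the answer.

```python
from bisect import bisect_right


def build(values):
    """Returns (tree, size); tree[node] is the sorted content of that node."""
    size = 1
    while size < len(values):
        size *= 2
    tree = [[] for _ in range(2 * size)]
    for pos, v in enumerate(values):
        tree[size + pos] = [v]
    for node in range(size - 1, 0, -1):
        tree[node] = sorted(tree[2 * node] + tree[2 * node + 1])
    return tree, size


def count_greater(tree, size, i, j, x):
    """Number of elements > x in positions i..j (0-based, inclusive)."""
    total = 0
    lo, hi = i + size, j + size + 1  # half-open range over the leaves
    while lo < hi:
        if lo & 1:
            total += len(tree[lo]) - bisect_right(tree[lo], x)
            lo += 1
        if hi & 1:
            hi -= 1
            total += len(tree[hi]) - bisect_right(tree[hi], x)
        lo //= 2
        hi //= 2
    return total


tree, size = build([9, 4, 5, 6, 1, 3, 2, 8])
print(tree[2])                              # [4, 5, 6, 9]
print(count_greater(tree, size, 0, 5, 4))   # 3
```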
This is a special variant of orthogonal range counting queries in 2D.
Each element el[i] is transformed into a point on the plane (i, el[i]), and the query (i,j,x) can be transformed to counting all points in the rectangle [i,j] x [x, +infty].
You can use 2D range trees (for example: http://www.cs.uu.nl/docs/vakken/ga/slides5b.pdf) for this type of query.
The simple idea is to have a tree that stores points in the leaves (each leaf contains a single point) ordered by X-axis. Each internal node of the tree contains an additional tree that stores all points from the subtree (ordered by Y-axis).
The used space is O(n logn).
The simple version can do the counting in O(log^2 n) time, but using fractional cascading this can be reduced to O(log n).
There is a better solution by Chazelle from 1988 (https://www.cs.princeton.edu/~chazelle/pubs/FunctionalDataStructures.pdf) with O(n) space and O(log n) query time.
You can find some solutions with better query time, but they are way more complicated.
I will try to give you a simple approach.
You have probably studied merge sort.
In merge sort we keep dividing the array into subarrays and then build it back up, but we don't store the sorted subarrays. In this approach we store them as nodes of a binary tree.
This takes O(nlogn) space and O(nlogn) time to build.
Now for each query you just have to find the subarrays that cover the range, which are O(logn) nodes, and binary search each of them, for O(log^2 n) per query in the worst case.
Such a tree is commonly called a merge sort tree (it is not a Fenwick tree).
If you want simple code I can provide you with that.

How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?

The inputs are an array A of non-negative integers and another integer K.
We should partition A into K blocks of consecutive elements (by "partition" I mean that every element of A belongs to some block and 2 different blocks don't contain any element in common).
We define the sum of a block as sum of the elements of the block.
The goal is to find such a partition in K blocks such that the maximum of the sums of each block (let's call that "MaxSumBlock") is minimized.
We need to output the MaxSumBlock (we don't need to find an actual partition)
Here is an example:
Input:
A = {2, 1, 5, 1, 2, 2, 2}
K = 3
Expected output:
MaxSumBlock: 6
(with partition: {2, 1}, {5, 1}, {2, 2, 2})
In the expected output, the sums of each block are 3, 6 and 6. The max is 6.
Here is a non-optimal partition:
partition: {2, 1}, {5}, {1, 2, 2, 2}
The sums of each block in that case are 3, 6 and 7. The max is hence 7. It is not a correct answer.
What algorithm solves this problem?
EDIT: K and the size of A is no bigger than 100'000. Each element of A is no bigger than 10'000
Use binary search.
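Let the max sum range from max(array) (no block can have a smaller sum than the largest element) to sum(array), and let mid be the midpoint of the current range. See if mid can be achieved by partitioning into k sets in O(n) time. If yes, search the lower half, and if not, search the upper half.
This will give you the result in O(n log S), where S = sum(array).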
PS: if you have any problem with writing the code, I can help but I'd suggest you try it first yourself.
EDIT:
as requested, I'll explain how to find if mid can be achieved by partitioning into k sets in O(n) time.
Iterate through the elements, adding them to the current set while the sum stays less than or equal to mid. As soon as it would get greater than mid, start the next set with that element. If you end up with k or fewer sets, mid is achievable, else not.
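Put together, the binary search and the greedy check could look like this in Python. The names `min_max_block_sum` and `blocks_needed` are illustrative, and the sketch assumes a non-empty array of non-negative integers with k no larger than the array length (with larger k, empty blocks would have to be allowed).

```python
def min_max_block_sum(a, k):
    """Smallest possible maximum block sum when a is cut into at most k blocks."""

    def blocks_needed(limit):
        # Greedy check: fill each block as far as possible without exceeding limit.
        blocks, current = 1, 0
        for v in a:
            if current + v > limit:
                blocks += 1
                current = v
            else:
                current += v
        return blocks

    lo, hi = max(a), sum(a)  # the answer always lies in this range
    while lo < hi:
        mid = (lo + hi) // 2
        if blocks_needed(mid) <= k:
            hi = mid       # mid is achievable, try smaller
        else:
            lo = mid + 1   # mid is too small
    return lo


print(min_max_block_sum([2, 1, 5, 1, 2, 2, 2], 3))  # 6
```

Starting the search at max(a) guarantees that every single element fits into a block, so the greedy check never has to deal with an element larger than the limit.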

Algorithm to find combination of n numbers with largest sum

Problem is simple -
Suppose I have an array of following numbers -
4,1,4,5,7,4,3,1,5
I have to find number of sets of k elements each that can be created from above numbers having largest sum. Two sets are considered to be different if they have at least one different element.
e.g.
if k = 2, then there can be two sets - {7,5} and {7,5}. Note: 5 appears twice in above array.
I think I can start with something like-
1. Sort array
2. Create two arrays. One for the distinct numbers and another in parallel for each number's occurrences.
But I am stuck now. Any suggestions?
The algorithm is as follows:
1) Sort elements in descending order.
2) Look at this array. It may look something like this:
a ... a b ... b c ... c d ...
| <- k -> |
Now obviously all elements a and b will be in the sets with the largest sum. You can't replace any of them with a smaller element, because then the sum wouldn't be the largest possible. So you have no choice here, you have to choose all a and b for any of the sets.
On the other hand only some of the elements c will be in those sets. So the answer is just the number of possibilities, to choose c's to fill the positions left in the sets, after you have taken all larger elements. That is the binomial coefficient:
count of c's choose (k - (count of elements larger than c))
For example for an array (already sorted here)
[9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
and k = 6, you must choose 9, 8 and both 7's for every set with the largest sum (which is 41). And then you can choose any two out of the four 5's. So the result will be 4 choose 2 = 6.
With the same array and k = 4, the result would be x choose 0 = 1 (that unique set is {9, 8, 7, 7}), with k = 7 the result would be 4 choose 3 = 4, and with k = 9: 2 choose 1 = 2 (choosing any 4 for the set with the largest sum).
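The counting rule above fits in a few lines of Python. The function name `count_max_sum_sets` is illustrative, and `math.comb` requires Python 3.8 or newer. Here c is taken to be the value at position k of the descending order, that is, the smallest value that still makes it into a largest-sum set.

```python
from math import comb


def count_max_sum_sets(values, k):
    """Number of ways to pick k elements with the largest possible sum."""
    a = sorted(values, reverse=True)
    c = a[k - 1]                             # boundary value
    larger = sum(1 for v in a if v > c)      # elements that must be chosen
    return comb(a.count(c), k - larger)      # ways to fill the rest with c's


data = [9, 8, 7, 7, 5, 5, 5, 5, 4, 4, 2, 2, 1, 1, 1]
print(count_max_sum_sets(data, 6))  # 6
print(count_max_sum_sets(data, 7))  # 4
print(count_max_sum_sets(data, 9))  # 2
```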
EDIT: I edited the answer, because we figured it out that OP needs to count multisets.
First, find the largest k numbers in the array. This is of course easy, and if k is very small, you can do it in O(k * n) by performing k linear scans. If k is not so small, you can use a binary heap (priority queue), which is O(n * log(k)), or just sort the array, which is O(n * log(n)).
Let's assume that you have computed the k largest numbers. Of course, all sets of size k with the largest sum have to contain exactly these k largest values and nothing else. Conversely, any other set doesn't have the largest sum.
Let count[i] be the number of occurrences of number i in the input sequence.
Let occ[i] be the number of occurrences of number i in the largest k numbers.
We can compute these both tables in very different ways, for example using a hash table or if input numbers are small, you can use an array indexed by these numbers.
Let B be the array of distinct numbers from the largest k numbers.
Let m be the size of B.
Now let's compute the answer. We will do it in m steps. After the i-th step we will have computed the number of different multisets consisting of the first i numbers from B. At the beginning the result is 1, since there is only one empty multiset. In the i-th step, we multiply the current result by the number of possible choices of occ[B[i]] elements from count[B[i]] elements, which is equal to binomial(count[B[i]], occ[B[i]]).
For example, let's consider your instance with added one more 7 at the end and k set to 3:
k = 3
A = [4, 1, 4, 5, 7, 4, 3, 1, 5, 7]
The largest three numbers in A are 7, 7, 5
At the beginning we have:
count[7] = 2
count[5] = 2
occ[7] = 2
occ[5] = 1
result = 1
B = [7, 5]
We start with the first element in B which is 7. Its count is 2 and its occ is also 2, so we do:
// binomial(2, 2) is 1
result = result * binomial(2, 2)
Next element in B is 5, its count is 2 and its occ is 1, so we do:
// binomial(2, 1) is 2
result = result * binomial(2, 1)
And the final result is 2, since there are two different multisets [7, 7, 5]
I'd create a sorted dictionary of the frequencies of occurrence of the numbers in the input. Then take the two largest numbers and multiply the number of times they occur.
In C++, it could look something like this:
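#include <iostream>
#include <map>
#include <vector>

int main() {
    std::vector<int> inputs { 4, 1, 4, 5, 7, 3, 1, 5};
    std::map<int, int> counts;              // value -> number of occurrences
    for (auto i : inputs)
        ++counts[i];
    auto last = counts.rbegin();            // entry for the largest value
    int largest_count = last->second;
    int second_count = (++last)->second;    // entry for the second largest value
    int set_count = largest_count * second_count;
    std::cout << set_count << "\n";
}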
You can do the following:
1) Sort the elements in descending order;
2) define variable answer=1;
3) Start from the beginning of the array and for each new value you see, count the number of its occurrence (lets call this variable count). every time do: answer = answer * count. The pseudo-code should look like this.
find_count(Array A, K)
{
    sort(A, 'descending');
    int answer=1;
    int count=1;
    for (int i=1,j=1; i<K && j<A.length;j++)
    {
        if(A[i] != A[i-1])
        {
            answer = answer *count;
            i++;
            count=1;
        }
        else
            count++;
    }
    return answer;
}
