Efficient algorithm to find kth largest numbers from N lists by picking one number each time from N lists - algorithm

You are given N lists of numbers. Each time, one number is picked from every list and the picked numbers are sorted. The kth largest of the sorted numbers is added to a set.
Finally, the size of the set is reported.
For example:
3 3
3 2 5 3
3 8 1 6
3 7 4 9
The first integer is the number of lists N (the N lists follow on the next lines; here N is 3, so the next three lines hold the list values). The second integer is the value of k. The first entry on each of the next N lines is the list size.
List values are list1 -> (2,5,3) , list2 ->(8,1,6), list3 ->(7,4,9)
Any number can be picked from each list. For example, (2,8,7), (2,8,4), (2,8,9), (2,1,7), (2,1,4), (2,1,9), etc. are all valid combinations. From each of these combinations the kth largest is selected.
In this case the following numbers have a chance to be the 3rd largest (since k=3):
(4,5,6,7,8,9)
The total count must be reported. So the output is 6
One way:
I am trying to enumerate every combination of the list values, sort each combination, and take the kth largest every time. The complexity of this approach is high. For example, 4 lists of sizes (10, 12, 15, 20) give 10 * 12 * 15 * 20 combinations of list values. So it will not fit in memory.
Is there any other efficient algorithm for this problem?
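For reference, the brute-force approach described in the question can be sketched as below. This is only a minimal illustration, not an efficient solution. It assumes that "kth largest" means the kth element of each combination after sorting in ascending order, since that reading is the one that reproduces the sample output of 6; the function name count_kth_values is made up for this sketch.

```python
from itertools import product


def count_kth_values(lists, k):
    """Brute force: try every combination (one number per list).

    Assumption: "kth largest" is taken as the kth element of the
    combination sorted in ascending order (1-based).
    """
    seen = set()
    for combo in product(*lists):      # one value from each list
        seen.add(sorted(combo)[k - 1])
    return len(seen)


if __name__ == "__main__":
    sample = [[2, 5, 3], [8, 1, 6], [7, 4, 9]]
    print(count_kth_values(sample, 3))
```

itertools.product yields the combinations lazily, so only the result set is kept in memory, but the running time still grows with the product of the list sizes.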

This is an interesting question; it took a while to figure out.
Make 2 max-heaps, h1 and h2.
At each step, put the first remaining element of every list into h1 and move 1 element (the maximum) from h1 to h2. When the size of h2 >= K,
pop 1 element (the maximum) from h2 and add it to your set.
Run on your case :
1) h1 = empty h2 = empty set=empty
2) h1 = 2 8 7 h2 = empty set=empty
3) h1 = 2 7 5 1 4 h2 = 8 set=empty
4) h1 = 2 5 1 4 3 6 9 h2 = 8 7 set=empty
5) h1 = 2 5 1 4 3 6 h2 = 8 7 9 set=empty
6) h1 = 2 5 1 4 3 h2 = 8 7 6 set=9
7) h1 = 2 1 4 3 h2 = 5 7 6 set=9 8
8) h1 = 2 1 3 h2 = 5 4 6 set=9 8 7
9) h1 = 2 1 h2 = 5 4 3 set=9 8 7 6
10) h1 = 1 h2 = 2 4 3 set=9 8 7 6 5
11) h1 = empty h2 = 2 1 3 set=9 8 7 6 5 4
h1 = empty , STOP.
Time complexity : O(N log N)

Related

Deleting element and getting it's neighbours

I have a sequence 1 2 3 4 5 6 ... n. I am given a sequence of n deletions, where each deletion is a number I want to delete. I need to respond to each deletion with two numbers: the left and right neighbours of the deleted number (-1 if a neighbour doesn't exist).
E.g. I delete 2 and respond 1 3, then I delete 3 and respond 1 4, then I delete 6 and respond 5 -1, etc.
I want to do it fast: linear or linear-logarithmic time complexity.
What data structure should I use? I guess the key to the solution is the fact that the sequence is sorted.
A doubly-linked list will do fine.
We will store the links in two arrays, prev and next, to allow O(1) access for deletions.
First, for every element and two sentinels at the ends, link it to the previous and next integers:
init ():
    for cur := 0, 1, 2, ..., n, n+1:
        prev[cur] := cur-1
        next[cur] := cur+1
When you delete an element cur, update the links in O(1) like this:
remove (cur):
    print (num (prev[cur]), " ", num (next[cur]), newline)
    prev[next[cur]] := prev[cur]
    next[prev[cur]] := next[cur]
Here, the num wrapper is inserted to print -1 for the sentinels:
num (cur):
    if (cur == 0) or (cur == n+1):
        return -1
    else:
        return cur
Here's how it works:
                                      prev                       next
n = 6       prev/   print     0  1  2  3  4  5  6  7     0  1  2  3  4  5  6  7
            /next           ------------------------   ------------------------
init ()                      -1  0  1  2  3  4  5  6     1  2  3  4  5  6  7  8
remove (2)  1 3     1 3      -1  0     1  3  4  5  6     1  3     4  5  6  7  8
remove (3)  1 4     1 4      -1  0        1  4  5  6     1  4        5  6  7  8
remove (6)  5 7     5 -1     -1  0        1  4     5     1  4        5  7     8
remove (1)  0 4     -1 4     -1           0  4     5     4           5  7     8
remove (5)  4 7     4 -1     -1           0        4     4           7        8
remove (4)  0 7     -1 -1    -1                    0     7                    8
Above, the portions not used anymore are blanked out for clarity.
The respective elements of the arrays still store the values printed above them, but we no longer access them.
As Jim Mischel rightly noted (thanks!), storing the list in two arrays instead of dynamically allocating the storage is crucial to make this O(1) per deletion.
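The pseudocode above translates fairly directly into Python. The following is a minimal sketch under the same assumptions (values 1..n, sentinels at 0 and n+1, each value deleted at most once); the function name neighbours_on_delete and the list name nxt are chosen for this example only.

```python
def neighbours_on_delete(n, deletions):
    """Return the (left, right) neighbours reported for each deletion.

    prev/nxt are array-backed links of a doubly-linked list over
    0..n+1, where 0 and n+1 are sentinels reported as -1.
    """
    prev = [i - 1 for i in range(n + 2)]
    nxt = [i + 1 for i in range(n + 2)]

    def num(cur):
        # Sentinels stand for "no neighbour".
        return -1 if cur == 0 or cur == n + 1 else cur

    answers = []
    for cur in deletions:
        answers.append((num(prev[cur]), num(nxt[cur])))
        # Unlink cur in O(1).
        prev[nxt[cur]] = prev[cur]
        nxt[prev[cur]] = nxt[cur]
    return answers


if __name__ == "__main__":
    for left, right in neighbours_on_delete(6, [2, 3, 6, 1, 5, 4]):
        print(left, right)
```

Run on the deletion order from the table (2, 3, 6, 1, 5, 4), it reports the same pairs as the print column.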
You can use a balanced binary search tree. Deleting from it is logarithmic. If you want to remove n elements and the total number of elements is m, then the complexity of removing n elements from it will be O(n log m).

I know how Merge Sort works, but How Merge Sort Code Works?

You can read this on Wikipedia:
function merge_sort(list m)
    // Base case. A list of zero or one elements is sorted, by definition.
    if length(m) <= 1
        return m
    // Recursive case. First, *divide* the list into equal-sized sublists.
    var list left, right
    var integer middle = length(m) / 2
    for each x in m before middle
        add x to left
    for each x in m after or equal middle
        add x to right
    // Recursively sort both sublists
    left = merge_sort(left)
    right = merge_sort(right)
    // Then merge the now-sorted sublists.
    return merge(left, right)
On line 1 there's a list of numbers, let's say 9 6 3 7 5 1 8 2
They say that merge_sort divides the list in two again and again until each list has only 1 integer left, like this:
9 6 3 7 5 1 8 2 -->
9 6 3 7 - 5 1 8 2 -->
9 6 - 3 7 - 5 1 - 8 2 -->
9 - 6 - 3 - 7 - 5 - 1 - 8 - 2
And then the numbers are put together like this:
6 9 - 3 7 - 1 5 - 2 8 -->
3 6 7 9 - 1 2 5 8 -->
1 2 3 5 6 7 8 9 -->
But I don't see where in the code the list of integers is divided in two again and again until each part has only 1 integer left?
var list left, right
var integer middle = length(m) / 2
for each x in m before middle
    add x to left
for each x in m after or equal middle
    add x to right
As I understand it, in the code above the list of numbers is divided into two different lists:
9 6 3 7 and 5 1 8 2
What then happens in the code below?
left = merge_sort(left)
right = merge_sort(right)
Can someone explain to me how the merge_sort code above works, step by step?
But I don't see where in the code the list of integers is divided in two again and again until each part has only 1 integer left?
var list left, right
var integer middle = length(m) / 2          --------statement-1
for each x in m before middle               --------statement-2
    add x to left
for each x in m after or equal middle       --------statement-3
    add x to right
Statement-1 computes middle, the position at which the array is split. Statement-2 adds all the elements before middle to the left sub-array. Similarly, statement-3 adds the rest of the elements to the right sub-array. Every recursive call repeats this split, so you keep dividing the array in two until the size of each part is 1 or 0.
if length(m) <= 1
    return m
At the start you have the conditional check above, which returns from the call if the size of the array is less than or equal to one.
What then happens in the code below?
left = merge_sort(left)
right = merge_sort(right)
These are recursive calls that sort each of the sub-arrays created in the pseudo-code above (dividing each one until its size is one). You sort the left and right sub-arrays separately and then join them into a single array.
return merge(left, right)
Here both the left and right sub-arrays are passed to a merge function. Both of these arrays are already sorted. The task of the merge function is to merge them into a single sorted array.
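To make the recursion concrete, here is a minimal Python sketch of the same top-down scheme. The merge helper is not shown in the quoted pseudocode, so the version here is an assumption about what it does (a standard two-way merge of sorted lists).

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these still has elements left.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(m):
    # Base case: zero or one element is already sorted.
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    # Each recursive call splits its own half again,
    # which is where the repeated halving happens.
    left = merge_sort(m[:middle])
    right = merge_sort(m[middle:])
    return merge(left, right)


if __name__ == "__main__":
    print(merge_sort([9, 6, 3, 7, 5, 1, 8, 2]))
```

The slices m[:middle] and m[middle:] play the role of the two "for each x" loops in the pseudocode.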
The pseudo code is missing some details. There was debate on the talk page about removing it or fixing it. Note it's supposed to be working with a list, not an array, which is why elements can only be appended one at a time. The list is not really split into 2 parts; instead two new initially empty lists left and right are created, then (middle = length/2) elements are moved from list to left, then (length - middle) elements are moved from list to right. This cleaned up example with C++ comments may make more sense, but it's still an inefficient way to sort a list. A bottom up merge sort using an array of pointers is much more efficient. I can add example code here if anyone is interested.
var list left, right
var integer middle = length(m) / 2
var integer count
for (count = 0; count < middle; count += 1)
    get x from front of list          // x = *list.front()
    remove first element from list    // list.pop_front()
    add x to left                     // left.push_back(x)
for (count = middle; count < length; count += 1)
    get x from front of list          // x = *list.front()
    remove first element from list    // list.pop_front()
    add x to right                    // right.push_back(x)
In that same wiki article, there are two C / C++ like code examples, which should be easier to understand. The examples are simplified and copy data back to the original array after each merge step, which could be avoided with more optimized code.
http://en.wikipedia.org/wiki/Merge_sort#Top-down_implementation
http://en.wikipedia.org/wiki/Merge_sort#Bottom-up_implementation
The sequence is different for top down merge sort, it's depth first, left first:
9 6 3 7 5 1 8 2
9 6 3 7|5 1 8 2
9 6|3 7
9|6
6 9
3|7
3 7
3 6 7 9
5 1|8 2
5|1
1 5
8|2
2 8
1 2 5 8
1 2 3 5 6 7 8 9
Bottom up merge sort skips the recursion and just starts off assuming a run size of 1, and merges breadth first, left to right:
9 6 3 7 5 1 8 2
9|6|3|7|5|1|8|2 run size = 1
6 9|3 7|1 5|2 8 run size = 2
3 6 7 9|1 2 5 8 run size = 4
1 2 3 5 6 7 8 9 done
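The bottom-up passes shown above can be sketched in Python as follows. This is only an illustration of the run-size doubling, not the optimized array-of-pointers version mentioned earlier; it uses heapq.merge from the standard library for the two-way merge, and the function name bottom_up_merge_sort is made up for this example.

```python
from heapq import merge


def bottom_up_merge_sort(items):
    """Iterative merge sort: merge runs of size 1, then 2, then 4, ..."""
    a = list(items)
    n = len(a)
    width = 1
    while width < n:
        merged = []
        # Merge each pair of adjacent runs of length `width`.
        for lo in range(0, n, 2 * width):
            left = a[lo:lo + width]
            right = a[lo + width:lo + 2 * width]  # may be short or empty
            merged.extend(merge(left, right))
        a = merged
        width *= 2
    return a


if __name__ == "__main__":
    print(bottom_up_merge_sort([9, 6, 3, 7, 5, 1, 8, 2]))
```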
Another example of bottom up merge sort algorithm:
http://www.mathcs.emory.edu/~cheung/Courses/171/Syllabus/7-Sort/merge-sort5.html

Partitioning a circular buffer while keeping order

I've got a circular buffer with positive natural values, e.g.
   1   5
 4       2
11       7
   2   9
We're going to partition it into exactly two contiguous parts, while keeping this order. In this example, the two parts could be:
(4 1 5) and (2 7 9 2 11),
(7 9 2 11 4) and (1 5 2),
etc.
The idea is to keep the order and take two contiguous subsequences.
The problem now is to partition it so that the sums of these subsequences are closest to each other, i.e. the difference between the sums must be as close to zero as possible.
In this case, I believe the solution would be (2 7 9 2) and (11 4 1 5), with sums of 20 and 21 respectively.
How to do this optimally?
Algorithm:
1. Calculate the total sum.
2. Let the current sum = 0.
3. Start off with 2 pointers at any point (both starting off at the same point).
4. Increase the second pointer, adding the number it passed, until the current sum is more than half of the total sum.
5. Increase the first pointer, subtracting the number it passed, until the current sum is less than half of the total sum.
6. Stop if either:
   - The first pointer is back where it started, or
   - The best sum is 0.5 or 0 from half the total sum (in which case the difference will be 1 or 0).
     The difference can be 1 only if the total sum is odd, in which case the difference can never be 0. (Thanks Artur!)
7. Otherwise repeat from step 4 (increasing the second pointer).
8. Check all the current sums we got in this process and keep the one that's closest to half, along with the indices of the partition that got that sum.
Running time:
The running time will be O(n), since we only ever increase the pointers and the first one only goes around once, and the second one can't go around more than twice.
Example:
Input:
   1   5
 4       2
11       7
   2   9
Total sum = 41.
Half of sum = 20.5.
So, let's say we start off at 1. (I just put it on a straight line to make it easier to draw)
p1, p2
V
1   5   2   7   9   2   11  4
sum = 0
p1  p2
V   V
1   5   2   7   9   2   11  4
sum = 1
p1      p2
V       V
1   5   2   7   9   2   11  4
sum = 6
p1          p2
V           V
1   5   2   7   9   2   11  4
sum = 8
p1              p2
V               V
1   5   2   7   9   2   11  4
sum = 15
p1                  p2
V                   V
1   5   2   7   9   2   11  4
sum = 24
    p1              p2
    V               V
1   5   2   7   9   2   11  4
sum = 23
        p1          p2
        V           V
1   5   2   7   9   2   11  4
sum = 18
        p1              p2
        V               V
1   5   2   7   9   2   11  4
sum = 20
Here the sum (20) is 0.5 from half the total sum (20.5), so we can stop.
The above corresponds to (11 4 1 5) (2 7 9 2), with a difference in sums of 1.
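A possible Python sketch of this two-pointer idea is given below. It is a simplified variant rather than a line-by-line transcription of the steps: it assumes all values are positive, omits the early exit, and simply keeps the best window seen after every pointer move. The function name best_circular_split and its return format are made up for this example.

```python
def best_circular_split(values):
    """Split a circular buffer of positive numbers into two contiguous
    parts whose sums are as close as possible.

    Returns (difference, first_part, second_part).
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two elements to form two parts")
    total = sum(values)
    best = None   # (difference, start index, window length)
    cur = 0       # sum of the current window
    length = 0    # window = `length` elements from `start`, wrapping around
    for start in range(n):
        # Second pointer: grow the window until it reaches half the total.
        while length < n - 1 and 2 * cur < total:
            cur += values[(start + length) % n]
            length += 1
            diff = abs(total - 2 * cur)
            if best is None or diff < best[0]:
                best = (diff, start, length)
        # First pointer: drop the element at `start`.
        cur -= values[start]
        length -= 1
        if length >= 1:
            diff = abs(total - 2 * cur)
            if diff < best[0]:
                best = (diff, start + 1, length)
    diff, start, length = best
    part = [values[(start + i) % n] for i in range(length)]
    rest = [values[(start + length + i) % n] for i in range(n - length)]
    return diff, part, rest


if __name__ == "__main__":
    print(best_circular_split([1, 5, 2, 7, 9, 2, 11, 4]))
```

Each pointer only moves forward and the window never holds more than n - 1 elements, so the total work stays linear in n.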

Matlab: removing rows when there are repeated values in columns

I have a problem with removing rows that contain repeated values across their columns.
I have used a for loop with an if statement, but the run time is too long.
I was wondering whether there is a more efficient and faster method.
Say
A=[ 2 4 6 8;
3 9 7 9;
4 8 7 6;
8 5 4 6;
2 10 11 2]
I would want the result to be
A=[ 2 4 6 8;
4 8 7 6;
8 5 4 6]
eliminating the 2nd row because of the repeated '9' and the 5th row because of the repeated '2'.
You can use sort and diff to identify the rows with repeated values
A = A(all(diff(sort(A'))),:)
returns
A =
2 4 6 8
4 8 7 6
8 5 4 6
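For readers who want to see the sort-and-diff idea spelled out step by step, here is a plain-Python analogue (not MATLAB, and not meant as a faster alternative). The function name rows_without_repeats is made up for this sketch.

```python
def rows_without_repeats(matrix):
    """Keep only the rows whose values are all distinct.

    Mirrors the sort + diff idea: after sorting a row, a repeated
    value shows up as a zero difference between neighbours.
    """
    kept = []
    for row in matrix:
        s = sorted(row)
        if all(b - a != 0 for a, b in zip(s, s[1:])):
            kept.append(row)
    return kept


if __name__ == "__main__":
    A = [[2, 4, 6, 8],
         [3, 9, 7, 9],
         [4, 8, 7, 6],
         [8, 5, 4, 6],
         [2, 10, 11, 2]]
    for row in rows_without_repeats(A):
        print(row)
```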
The trick here is how to find the rows with repeated values in an efficient manner.
How about this:
% compare all-vs-all for each row using `bsxfun`
>> c = bsxfun( @eq, A, permute( A, [1 3 2] ) );
>> c = sum( c, 3 ); % count the number of matches each element has in the row
>> c = any( c > 1, 2 ); % indicates rows with repeated values - an element with more than one match
>> A = A( ~c, : )
A =
2 4 6 8
4 8 7 6
8 5 4 6

How can I sort a 2-D array in MATLAB with respect to 2nd row?

I have an array, say "a":
a =
1 4 5
6 7 2
If I use the function
b=sort(a)
it gives the answer
b =
1 4 2
6 7 5
but I want an answer like
b =
5 1 4
2 6 7
meaning the 2nd row should be sorted, while each element of the 1st row stays paired with its corresponding element in the 2nd row.
sortrows(a',2)'
Pulling this apart:
a = 1 4 5
    6 7 2
a' = 1 6
     4 7
     5 2
sortrows(a',2) = 5 2
                 1 6
                 4 7
sortrows(a',2)' = 5 1 4
                  2 6 7
The key here is that sortrows sorts the rows by a specified column (here column 2 of a'), and all the other columns follow its order.
You can use the SORT function on just the second row, then use the index output to sort the whole array:
[junk,sortIndex] = sort(a(2,:));
b = a(:,sortIndex);
How about
a = [1 4 5; 6 7 2]
a =
1 4 5
6 7 2
>> [s,idx] = sort(a(2,:))
s =
2 6 7
idx =
3 1 2
>> b = a(:,idx)
b =
5 1 4
2 6 7
in other words, you use the second argument of sort to get the sort order you want, and then you apply it to the whole thing.
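The same "sort one row, reuse its index order" idea can be illustrated outside MATLAB. Below is a small plain-Python analogue using 0-based indices; the function name sort_columns_by_row is made up for this sketch.

```python
def sort_columns_by_row(a, row):
    """Reorder the columns of `a` so that a[row] ends up ascending.

    `order` plays the role of the index output of MATLAB's sort.
    """
    order = sorted(range(len(a[row])), key=lambda j: a[row][j])
    return [[r[j] for j in order] for r in a]


if __name__ == "__main__":
    a = [[1, 4, 5],
         [6, 7, 2]]
    for r in sort_columns_by_row(a, 1):  # row index 1 is the 2nd row
        print(r)
```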
