Data structure for storing data and calculating averages

In this problem, we are interested in a data structure that can store an unbounded number of vectors parallel to the Y axis.
Each node contains a location (X axis value) and a height (Y axis value). We can assume no two vectors share the same location.
Please advise for an efficient data structure that supports:
init((x1,y1),(x2,y2),(x3,y3),...,(xn,yn)) - the DS will contain all n vectors, where vector #i's location is xi and vector #i's height is yi.
We also know that x1 < x2 < x3 < ... < xn (nothing is known about the y values) - complexity = O(n) on average
insert(x,y) - add a vector with location x and height y. - complexity = O(logn) amortized on average.
update(x,y) - update the height of the vector at location x to y. - complexity = O(logn) worst case
average_around(x) - return the average height of the logn neighbors of x - complexity = O(1) on average
Space Complexity: O(n)

I can't provide a full answer, but this might be a hint in the right direction.
Basic ideas:
Let's assume you've calculated the average of n numbers a_1,...,a_n, then this average is avg=(a_1+...+a_n)/n. If we now replace a_n by b, we can recalculate the new average as follows: avg'=(a_1+...+a_(n-1)+b)/n, or - simpler - avg'=((avg*n)-a_n+b)/n. That means, if we exchange one element, we can recompute the average using the original average value by simple, fast operations, and don't need to re-iterate over all elements participating in the average.
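A tiny sketch of this constant-time replacement (the function name is mine, for illustration):

```python
def replace_in_average(avg, n, old, new):
    # avg' = (avg*n - old + new) / n -- O(1), no re-iteration over all elements
    return (avg * n - old + new) / n

values = [2.0, 4.0, 6.0]
avg = sum(values) / len(values)               # 4.0
avg2 = replace_in_average(avg, 3, 6.0, 12.0)  # replace 6.0 by 12.0
# agrees with recomputing sum([2.0, 4.0, 12.0]) / 3
```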
Note: I assume that you want to have log n neighbours on each side, i.e. in total we have 2 log(n) neighbours. You can simply adapt it if you want to have log(n) neighbours in total. Moreover, since log n in most cases won't be a natural number, I assume that you are talking about floor(log n), but I'll just write log n for simplicity.
The main thing I'm considering is the fact that you have to report the average around element x in O(1). Thus, I suppose you have to somehow precompute this average and store it. So, I would store the following in a node:
x value
y value
average around
Note that update(x,y) runs strictly in O(log n) if you have this structure: If you update element x to height y, you have to consider the 2log(n) neighbours whose average is affected by this change. You can recalculate each of these averages in O(1):
Let's assume update(x,y) affects an element b whose average must be updated as well. Then you simply multiply average(b) by the number of neighbours (2log(n) as stated above), subtract the old y-value of element x, add the new (updated) y-value of x, and divide by 2log(n) again. This ensures that we now have the updated average for element b. This involves only a few calculations and can thus be done in O(1). Since we have 2log n neighbours, update runs in O(2log n)=O(log n).
When you insert a new element e, you have to update the average of all elements affected by this new element e. This is essentially done like in the update routine. However, you have to be careful when log n (or precisely floor(log n)) changes its value. If floor(log n) stays the same (which it will, in most cases), then you can just do the analogue things described in update, however you will have to "remove" the height of one element, and "add" the height of the newly added element. In these "good" cases, run time is again strictly O(log n).
Now, when floor(log n) is changing (incrementing by 1), you have to perform an update for all elements. That is, you have to do an O(1) operation for n elements, resulting in a running time of O(n). However, it is very seldom the case that floor(log n) increments by 1 (you need to double the value of n to increment floor(log n) by 1 - assuming we are talking about log to base 2, which is not uncommon in computer science). We denote this time by c*n or simply cn.
Thus, let's consider a sequence of inserts: the first insert needs an update: c*1; the second insert needs an update: 2*c. The next expensive insert is the fourth: 4*c, then the eighth: 8*c, then the sixteenth: 16*c. The distance between two expensive inserts doubles each time:
insert #   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18 ..
cost      1c  2c   1  4c   1   1   1  8c   1   1   1   1   1   1   1 16c   1   1 ..
Since no remove is required, we can continue with our analysis without any "special cases" and consider only a sequence of inserts. You see that most inserts cost 1, while few are expensive (1,2,4,8,16,32,...). So, if we have m inserts in total, we have roughly log m expensive inserts, and roughly m-log m cheap inserts. For simplicity, we assume simply m cheap inserts.
Then, we can compute the cost for m inserts:
m*1 + sum_{i=0}^{log m} 2^i
m*1 counts the cheap operations, the sum the expensive ones. It can be shown that the whole thing is at most 4m (in fact you can even show better estimates quite easily, but for us this suffices).
Thus, we see that m insert operations cost at most 4m in total. Thus, a single insert operation costs at most 4m/m=4, thus is O(1) amortized.
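A quick numeric sanity check of the 4m bound (taking c = 1; the function name is mine):

```python
from math import floor, log2

def total_insert_cost(m):
    # every insert costs 1 (the cheap part); inserts number 1, 2, 4, 8, ...
    # additionally cost n each (the expensive rebuilds when floor(log n) grows)
    cheap = m
    expensive = sum(2 ** i for i in range(floor(log2(m)) + 1))
    return cheap + expensive
```

For example, total_insert_cost(10) = 10 + (1 + 2 + 4 + 8) = 25, well under 4*10.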
So, there are 2 things left:
How to store all the entries?
How to initialize the data structure in O(n)?
I suggest storing all entries in a skip-list, or some tree that guarantees logarithmic search-operations (otherwise, insert and update require more than O(log n) for finding the correct position). Note that the data structure must be buildable in O(n) - which should be no big problem assuming the elements are sorted according to their x-coordinate.
To initialize the data structure in O(n), I suggest beginning at the element at index log n and computing its average the simple way (sum up the 2log n neighbours, divide by 2log n).
Then you move the index one further and compute average(index) from average(index-1): average(index) = (average(index-1)*2log(n) - y(index-1-log(n)) + y(index+log(n))) / (2log(n)). (If the element itself is excluded from its own average, two further O(1) correction terms for y(index-1) and y(index) are needed.)
That is, we follow a similar approach as in update. This means that computing the averages costs O(log n + n*1)=O(n). Thus, we can compute the averages in O(n).
Note that you have to take some details into account which I haven't described here (e.g. border cases: element at index 1 does not have log(n) neighbours on both sides - how do you proceed with this?).
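A sketch of this O(n) initialisation, with w neighbours per side (w = log n in the problem), the element itself excluded from its own window, and border elements left as None, mirroring the open border-case question above (function name is mine):

```python
def init_averages(ys, w):
    """Rolling-sum computation of the 'average around' for every index
    that has a full w neighbours on each side; borders stay None."""
    n = len(ys)
    avgs = [None] * n
    if n < 2 * w + 1:
        return avgs
    # neighbours of index w: ys[0:w] + ys[w+1:2w+1]
    window = sum(ys[0 : 2 * w + 1]) - ys[w]
    avgs[w] = window / (2 * w)
    for i in range(w + 1, n - w):
        # slide: re-include ys[i-1], drop ys[i-1-w], drop ys[i], add ys[i+w]
        window += ys[i - 1] - ys[i - 1 - w] - ys[i] + ys[i + w]
        avgs[i] = window / (2 * w)
    return avgs
```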

Related

algorithm of finding max of min in any range for two arrays

Let's say we have two arrays of ints of equal length, a_1, ..., a_n and b_1, ..., b_n. For any given index pair i and j with 1<=i<j<=n, we need to find the max of the min over all sequences of the form a_k, ..., a_{l-1}, b_l, ..., b_{j-i+k}, where 0<=k<=n-j+i; l may be j-i+k+1, in which case the sequence is purely from array a, and when k=0 the sequence is purely from array b.
We want to do this for all pairs of i and j very efficiently.
Example, given
`a=[3,2,4,1]` and `b=[4,6,1,3]`
when `i=1, j=3`, the sequence can be
`[3,2,4]`, min is 2
`[3,2,1]`, min is 1
`[3,6,1]`, min is 1
`[2,4,1]`, min is 1
`[2,4,3]`, min is 2
`[2,1,3]`, min is 1
`[4,6,1]`, min is 1
`[6,1,3]`, min is 1
So the max is 2 for this input.
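The example above can be checked with a brute-force sketch that enumerates every start position and every a-to-b split point (function name is mine):

```python
def max_of_min(a, b, i, j):
    # windows of length m = j - i + 1 (1-indexed i, j); each window is a
    # prefix taken from a followed by a suffix taken from b, over every
    # possible start s and split point t
    n, m = len(a), j - i + 1
    best = float("-inf")
    for s in range(n - m + 1):
        for t in range(m + 1):
            window = a[s:s + t] + b[s + t:s + m]
            best = max(best, min(window))
    return best
```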
Is there a good way to run this efficiently?
It seems possible to make the brute-force approach run fairly quickly.
If you preprocess each sequence into a balanced tree where each node is augmented with the min of that subtree, then you can find the min of any subrange of that sequence in O(log n) time by splitting the tree at the appropriate points. See, for example, this paper for more information. Note that this preprocessing takes O(n) time.
Let's call the range (i,j) a window. The complexity of this problem doesn't depend on the specific (i,j), but rather on the size of the window (that is, j-i+1). For a window size of m (=j-i+1), there are n-m+1 windows of that size. For each window, there are m+1 places where you can "cut" the window so that some prefix of the elements comes from sequence a and the suffix comes from sequence b. You pay O(log n) for each cut (to split the binary trees as I mentioned above). That's a total cost of O((n-m+1) * (m+1) * log(n)).
There is probably a faster way to do this, by reusing splits, or by noticing that nearby windows share a lot of elements. But regardless, I think the binary tree splitting trick I mentioned above might be helpful!
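As an alternative to the tree-splitting trick, a sparse table is a different structure that serves the same role for static data: range-min queries in O(1) after O(n log n) preprocessing (a sketch, not part of the answer above):

```python
from math import floor, log2

class SparseTable:
    """Static range-minimum queries: O(n log n) build, O(1) per query."""
    def __init__(self, arr):
        n = len(arr)
        self.table = [list(arr)]  # table[j][i] = min(arr[i : i + 2**j])
        j = 1
        while (1 << j) <= n:
            prev = self.table[j - 1]
            half = 1 << (j - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(n - (1 << j) + 1)])
            j += 1

    def range_min(self, lo, hi):
        # min over arr[lo:hi], hi exclusive; requires lo < hi
        j = floor(log2(hi - lo))
        return min(self.table[j][lo], self.table[j][hi - (1 << j)])
```

Unlike the balanced-tree approach, a sparse table cannot absorb updates, but for a fixed pair of arrays it makes each "cut" cost O(1) instead of O(log n).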

Interval tree of an an array with update on array

Given an array of size N, and an array of intervals also of size N, each a contiguous segment of the first array, I need to handle Q queries that update elements of the array and that ask for the sum of a segment in the second array (the sum of the elements from the i-th interval to the j-th interval).
Now, the first kind of query can be handled easily. I can build a segment tree from the array and use it to calculate the sum of an interval in the first array (an element of the second array). But how can I handle the second kind of query in O(log n)? In the worst case, the element I update will be in all the intervals of the second array.
I need a O(Qlog N) or O(Q(logN)^2) solution.
Here is an O((Q + N) * sqrt(Q)) solution (it is based on a pretty standard idea of sqrt-decomposition):
1. Let's assume that the array is never updated. Then the problem becomes pretty easy: using prefix sums, it is possible to solve this problem in O(N) time for precomputation and O(1) per query (we need 2 prefix-sum arrays here: one for the original array and the other for the array of intervals).
2. Now let's divide our queries into blocks of size sqrt(Q). At the beginning of each block, we can do the same thing as in 1., taking into account only those updates that happened before the beginning of this block. It can be done in linear time (using prefix sums twice). The total number of such computations is Q / sqrt(Q) = sqrt(Q) (because that is the number of blocks we have). So far, this gives us O((N + Q) * sqrt(Q)) time in total.
3. When we get a query of type 2, all the updates outside the current block are already accounted for. So there are at most sqrt(Q) updates that could affect the answer. Let's process them almost naively: iterate over all updates within the current block that happened before this query and adjust the answer. To do this, we need to know how many times a given position in the array is covered by the intervals from i to j. This part can be solved offline with a sweep-line algorithm using O(Q * sqrt(N + Q)) time and space (no additional log factor appears because radix sort can be used).
So we get O((N + Q) * sqrt(Q)) time and space in the worst case in total. It is worse than O(Q * log N), of course, but should work fine for about 10^5 queries and array elements.
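The static case in step 1 can be sketched with the two prefix-sum arrays (function name is mine; intervals are 0-indexed inclusive pairs):

```python
from itertools import accumulate

def build_static(arr, intervals):
    # prefix sums of arr -> each interval's sum in O(1)
    pa = [0] + list(accumulate(arr))
    interval_sums = [pa[r + 1] - pa[l] for l, r in intervals]
    # prefix sums of the interval sums -> sum of intervals i..j in O(1)
    pi = [0] + list(accumulate(interval_sums))
    return lambda i, j: pi[j + 1] - pi[i]

# interval sums here are 1+2=3, 2+3+4=9, 3
query = build_static([1, 2, 3, 4], [(0, 1), (1, 3), (2, 2)])
```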

set with average O(1) for add/remove and worst max/min O(log n)

Can I have a set where the average add/remove operation is O(1) (this is typical for hashtable-based sets) and the worst-case max/min is less than O(n), ideally O(log n) (typical for tree-based sets)?
upd: hmm, it seems that in the simplest case I can just rescan ALL N elements every time the max/min disappears, and on average it gives me O(1). But I apply my algorithm to stock trading, where changes near the min/max are much more likely, so I just don't want to rescan everything every time the max or min disappears; I need something smarter than a full rescan, which is O(n).
upd2 In my case set contains 100-300 elements. Changes of max/min elements are very likely, so max/min changes often. And I need to track max/min. I still want O(1) for add/remove.
Here's an impossibility result with bad constants for worst-case, non-amortized bounds in a deterministic model where keys can be compared and hashed but nothing else. (Yes, that's a lot of stipulations. I second the recommendation of a van Emde Boas tree.)
As is usually the case with comparison lower bounds, this is an adversary argument. The adversary's game plan is to insert many keys while selectively deleting the ones about which the algorithm has the most information. Eventually, the algorithm will be unable to handle a call to max().
The adversary decides key comparisons as follows. Associated with each key is a binary string. Each key initially has an empty string. When two keys are compared, their strings are extended minimally so that neither is a prefix of the other, and the comparison is decided according to the dictionary order. For example, with keys x, y, z, we could have:
x < y: string(x) is now 0, string(y) is now 1
x < z: string(z) is now 1
y < z: string(y) is now 10, string(z) is now 11.
Let k be a worst-case upper bound on the number of key comparisons made by one operation. Each key comparison increases the total string length by at most two, so for every sequence of at most 3 * n insert/delete operations, the total string length is at most 6 * k * n. If we insert 2 * n distinct keys with interspersed deletions whenever there is a key whose string has length at least 6 * k, then we delete at most n keys on the way to a set with at least n keys where each key has a string shorter than 6 * k bits.
Extend each key's string arbitrarily to 6 * k bits. The (6 * k)-bit prefix of a key's string is the bucket to which the key belongs. Right now, the algorithm has no information about the relative order of keys within a bucket. There are 2 ** (6 * k) buckets, which we imagine laid out left to right in the increasing order dictated by the (6 * k)-bit prefixes. For n sufficiently large, there exists a bucket with a constant (depending on k) fraction of the keys and at least 2 * k times as many keys as the combined buckets to its right. Delete the latter keys, and max() requires a linear number of additional comparisons to sort out the big bucket that now holds the maximum, because at most a little more than half of the required work has been done by the deletions.
Well, you know that max/min < CONST and the elements are all numbers. Based on this you can get O(1) insertion and O(k + n/k) find-min/max (1).
Have an array of size k, where each element of the array is a hash set. At insertion, insert element x into array[floor((x-MIN)/(MAX-MIN)*k)] (special case for x=MAX). Assuming a uniform distribution of elements, each hash set then has an expected n/k elements.
At deletion, remove from the relevant set similarly.
findMax() is now done as follows: find the largest index whose set is not empty - this takes O(k) worst case - then find the maximal element in that first non-empty set, which takes O(n/k) expected.
Finding optimal k:
We need to minimize k+n/k.
d(k+n/k)/dk = 1-n/k^2 = 0
n = k^2
k = sqrt(n)
This gives us O(sqrt(n) + n/sqrt(n)) = O(sqrt(n)) find min/max on average, with O(1) insertion/deletion.
From time to time you might need to 'reset' the table due to extreme changes of max and min, but given a 'safe boundary' - I believe in most cases this won't be an issue.
Just make sure your MAX is something like 2*max, and MIN is 1/2*min every time you 'reset' the DS.
(1) Assuming all elements are coming from a known distribution. In my answer I assume a uniform distribution - so P(x)=P(y) for each x,y, but it is fairly easy to modify it to any known distribution.
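A minimal sketch of the bucket idea above (class and method names are mine; values assumed in [lo, hi), roughly uniform):

```python
class BucketSet:
    """k buckets of hash sets: expected O(1) add/remove,
    O(k + n/k) max scan under a roughly uniform distribution."""
    def __init__(self, lo, hi, k):
        self.lo, self.hi, self.k = lo, hi, k
        self.buckets = [set() for _ in range(k)]

    def _index(self, x):
        # clamp so that x == hi lands in the last bucket (the MAX special case)
        return min(self.k - 1, int((x - self.lo) / (self.hi - self.lo) * self.k))

    def add(self, x):
        self.buckets[self._index(x)].add(x)

    def remove(self, x):
        self.buckets[self._index(x)].discard(x)

    def find_max(self):
        for b in reversed(self.buckets):  # rightmost non-empty bucket
            if b:
                return max(b)
        return None

s = BucketSet(0, 100, 10)
for v in (5, 42, 99, 63):
    s.add(v)
```

find_min is symmetric (scan the buckets left to right).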

Finding the minimum element in a given range greater than a given number

We are given N (N <= 10^6) points on a 2D plane and an integer D (D <= 10^6); we want to find two points p1, p2 (p2 to the right of p1) such that the difference between p1.y and p2.y is at least D and p2.x - p1.x is minimized.
The x and y coordinates are in the range 0..10^6.
This is a problem from a USACO past contest.
Here is my attempt to solve it:
MAXY = The maximum y axis among the N points.
Suppose we know p1; then we can find p2 quite easily: take all the points whose y-coordinate is in the range p1.y+D to MAXY or in the range 0 to p1.y-D, and among them take the point with the smallest x-coordinate greater than p1.x. This will be the best choice for p2.
But as we don't know p1, we will have to try all points for p1 and so finding the best choice for p2 should be done efficiently.
I used a segment tree for that. Every node in the tree will store all the points in the corresponding range in sorted order of x-axis. While querying, if a node falls in the query range then we binary search on the array for p1.x and return the smallest element greater than it.
For every choice of p1, we query the tree twice with the ranges 0,p1.y-D and p1.y+D,MAXY and take the best of the two points returned.
The building of the tree can be done in O(NlogN) time.
Every query will take O(logN*logN) time, and we make N queries, so the total time taken is O(N*logN*logN), which might not run within the time limit of 2 seconds (10^6 * 20 * 20).
Also, the memory taken will be O(NlogN), which is about 80 MB (10^6 * 20 * 4 bytes) - too much, as the limit is 64 MB.
How can we do the queries faster and using lesser space?
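For checking any candidate solution on small inputs, a brute-force reference is straightforward (O(N^2); function name is mine):

```python
def closest_pair_gap(points, d):
    # points: (x, y) tuples; minimise p2.x - p1.x over pairs with
    # p2 strictly to the right of p1 and |p1.y - p2.y| >= d
    best = None
    for x1, y1 in points:
        for x2, y2 in points:
            if x2 > x1 and abs(y1 - y2) >= d:
                gap = x2 - x1
                if best is None or gap < best:
                    best = gap
    return best
```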
It can be done much easier.
Suppose you have two copies of the array: one sorted by y-coordinate and another by x-coordinate. Now iterate through the Y-sorted array, and for each point (let's name it cur) binary search for an appropriate point (the one with the smallest p2.x - p1.x) in the X-sorted array. If the binary search finds the same point, or a point whose y-coordinate is less than cur.y + D, just delete that point from the X-sorted array (we'll never need it there again, because we only ever increase the y-coordinate) and run the binary search again. The answer is the smallest of the binary-search results.
As we need fast timing, we should erase points from the array quickly. This can be done with a balanced binary tree: it can erase any node in O(logN) and perform a binary search in O(logN). Since each node is deleted from the tree at most once, at a cost of O(logN + logN), the total time is O(N * logN). Preprocessing time is O(N * logN) too, and the memory taken is O(N).
By the way, your solution is also appropriate, because the actual N is 10^5, not 10^6. That keeps your solution's timing below 2 seconds and its memory usage under 20 MB.
How about just sort and scan?
Sort by x, since you want to minimise the difference in x. This takes O(N logN) time, in place.
Maintain two indices i and j from the head of the x-sorted array.
Advance the faster index until |P[i].y - P[j].y| >= D;
then X = |P[i].x - P[j].x| is your first candidate answer.
Then update X by moving the indices forward: try P[i+1], scan from P[i+2] as P[j], and advance until |P[i].x - P[j].x| >= X. If a smaller valid X is found, keep it as X.
This may do a lot of comparisons at first, but since X keeps shrinking, the comparison range shrinks as well.

Find the largest k numbers in k arrays stored across k machines

This is an interview question. I have K machines, each of which is connected to 1 central machine. Each of the K machines has an array of 4-byte numbers in a file. You can use any data structure to load those numbers into memory on those machines, and they fit. Numbers are not unique across the K machines. Find the K largest numbers in the union of the numbers across all K machines. What is the fastest way I can do this?
(This is an interesting problem because it involves parallelism. As I haven't encountered parallel algorithm optimization before, it's quite amusing: you can get away with some ridiculously high-complexity steps, because you can make up for it later. Anyway, onto the answer...)
> "What is the fastest I can do this?"
The best you can do is O(K). Below I illustrate both a simple O(K log(K)) algorithm, and the more complex O(K) algorithm.
First step:
Each computer needs enough time to read every element. This means that unless the elements are already in memory, one of the two bounds on the time is O(largest array size). If for example your largest array size varies as O(K log(K)) or O(K^2) or something, no amount of algorithmic trickery will let you go faster than that. Thus the actual best running time is O(max(K, largestArraySize)) technically.
Let us say the arrays have a maximum length of N. With the above caveat, we're allowed to bound N<K. Since each computer has to look at each of its elements at least once (O(N) preprocessing per computer), each computer can pick its largest K elements for free (this is known as finding the kth order statistic; there are linear-time selection algorithms), since that is also O(N).
Bounds and reasonable expectations:
Let's begin by thinking of some worst-case scenarios, and estimates for the minimum amount of work necessary.
One minimum-work-necessary estimate is O(K*N/K) = O(N), because we need to look at every element at the very least. But, if we're smart, we can distribute the work evenly across all K computers (hence the division by K).
Another minimum-work-necessary estimate is O(N): if one array is larger than all elements on all other computers, we return the set.
We must output all K elements; this is at least O(K) to print them out. We can avoid this if we are content merely knowing where the elements are, in which case the O(K) bound does not necessarily apply.
Can this bound of O(N) be achieved? Let's see...
Simple approach - O(NlogN + K) = O(KlogK):
For now let's come up with a simple approach, which achieves O(NlogN + K).
Consider the data arranged like so, where each column is a computer, and each row is a number in the array:
computer: A B C D E F G
10 (o) (o)
9 o (o) (o)
8 o (o)
7 x x (x)
6 x x (x)
5 x ..........
4 x x ..
3 x x x . .
2 x x . .
1 x x .
0 x x .
You can also imagine this as a sweep-line algorithm from computational geometry, or as an efficient variant of the 'merge' step from mergesort. The elements in parentheses represent the elements with which we'll initialize our potential "candidate solution" (on some central server). The algorithm will converge on the correct o responses by dumping the (x) answers for the two unselected o's.
Algorithm:
All computers start as 'active'.
Each computer sorts its elements. (parallel O(N logN))
Repeat until all computers are inactive:
Each active computer finds the next-highest element (O(1) since sorted) and gives it to the central server.
The server smartly combines the new elements with the old K elements, and removes an equal number of the lowest elements from the combined set. To perform this step efficiently, we have a global priority queue of fixed size K. We insert the new potentially-better elements, and bad elements fall out of the set. Whenever an element falls out of the set, we tell the computer which sent that element to never send another one. (Justification: This always raises the smallest element of the candidate set.)
(sidenote: Adding a callback hook to falling out of a priority queue is an O(1) operation.)
We can see graphically that this will perform at most 2K*(findNextHighest_time + queueInsert_time) operations, and as we do so, elements will naturally fall out of the priority queue. findNextHighest_time is O(1) since we sorted the arrays, so to minimize 2K*queueInsert_time, we choose a priority queue with an O(1) insertion time (e.g. a Fibonacci-heap based priority queue). This gives us an O(log(queue_size)) extraction time (we cannot have O(1) insertion and extraction); however, we never need to use the extract operation! Once we are done, we merely dump the priority queue as an unordered set, which takes O(queue_size)=O(K) time.
We'd thus have O(N log(N) + K) total running time (parallel sorting, followed by O(K)*O(1) priority queue insertions). In the worst case of N=K, this is O(K log(K)).
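A single-process sketch of this loop (machines as plain lists, a binary heap standing in for the fixed-size candidate set, and the "tell the computer to stop sending" rule as a deactivation flag; the function name is mine):

```python
import heapq

def top_k_from_machines(machines, k):
    """Each machine sorts its array, then feeds elements largest-first into
    a central size-k min-heap; a machine is deactivated as soon as one of
    its elements is rejected (all its later elements are smaller)."""
    sorted_machines = [sorted(m, reverse=True) for m in machines]
    heap = []                          # central candidate set, size <= k
    pointers = [0] * len(machines)
    active = set(range(len(machines)))
    while active:
        for i in list(active):
            if pointers[i] >= len(sorted_machines[i]):
                active.discard(i)      # machine exhausted
                continue
            x = sorted_machines[i][pointers[i]]
            pointers[i] += 1
            if len(heap) < k:
                heapq.heappush(heap, x)
            elif x > heap[0]:
                heapq.heapreplace(heap, x)
            else:
                active.discard(i)      # x rejected: stop this machine
    return sorted(heap, reverse=True)
```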
The better approach - O(N+K) = O(K):
However I have come up with a better approach, which achieves O(K). It is based on the median-of-median selection algorithm, but parallelized. It goes like this:
We can eliminate a set of numbers if we know for sure that there are at least K (not strictly) larger numbers somewhere among all the computers.
Algorithm:
Each computer finds the sqrt(N)th highest element of its set, and splits the set into elements < and > it. This takes O(N) time in parallel.
The computers collaborate to combine those statistics into a new set, and find the K/sqrt(N)th highest element of that set (let's call it the 'superstatistic'), and note which computers have statistics < and > the superstatistic. This takes O(K) time.
Now consider all elements less than their computer's statistics, on computers whose statistic is less than the superstatistic. Those elements can be eliminated. This is because the elements greater than their computer's statistic, on computers whose statistic is larger than the superstatistic, are a set of K elements which are larger. (See the visual here).
Now, the computers with the uneliminated elements evenly redistribute their data to the computers who lost data.
Recurse: you still have K computers, but the value of N has decreased. Once N is less than a predetermined constant, use the previous algorithm I mentioned in "simple approach - O(NlogN + K)"; except in this case, it is now O(K). =)
It turns out that the reductions are O(N) total (amazingly, not order K), except perhaps the final step, which might be O(K). Thus this algorithm is O(N+K) = O(K) total.
Analysis and simulation of O(K) running time below. The statistics allow us to divide the world into four unordered sets, represented here as a rectangle divided into four subboxes:
          |------N-----|
          |N^.5|
     __________________
    |      | s         | <- computer
    |  #=K | s REDIST. | <- computer
    |      | s         | <- computer
    |K/N^.5|-----S-----| <- computer
    |      | s         | <- computer
  K |      | s         | <- computer
    |      | s ELIMIN. | <- computer
    |      | s         | <- computer
    |      | s         | <- computer
    |______|_s_________| <- computer
LEGEND:
s=statistic, S=superstatistic
#=K -- set of K largest elements
(I'd draw the relation between the unordered sets of rows and the s-column here, but it would clutter things up; see the addendum.)
For this analysis, we will consider N as it decreases.
At a given step, we are able to eliminate the elements labelled ELIMIN; this removes area from the rectangle representation above, reducing the problem size from K*N to K*N - (K - K/√N)(N - √N), which hilariously simplifies to K(2√N - 1).
Now, the computers with the uneliminated elements redistribute their data (REDIST rectangle above) to the computers with eliminated elements (ELIMIN). This is done in parallel, where the bandwidth bottleneck corresponds to the length of the short side of REDIST (because they are outnumbered by the ELIMIN computers which are waiting for their data). Therefore the data will take as long to transfer as the long side of the REDIST rectangle (another way of thinking about it: K/√N * (N-√N) is the area, divided by K/√N data-per-time, resulting in O(N-√N) time).
Thus at each step of size N, we are able to reduce the problem size to K(2√N-1), at the cost of performing N + 3K + (N-√N) work. We now recurse. The recurrence relation which will tell us our performance is:
T(N) = 2N+3K-√N + T(2√N-1)
The decimation of the subproblem size is much faster than the normal geometric series (being √N rather than something like N/2 which you'd normally get from common divide-and-conquers). Unfortunately neither the Master Theorem nor the powerful Akra-Bazzi theorem work, but we can at least convince ourselves it is linear via a simulation:
>>> from math import sqrt
>>> def T(n, k=None):
...     return 1 if n < 10 else sqrt(n)*(2*sqrt(n)-1) + 3*k + T(2*sqrt(n)-1, k=k)
...
>>> f = lambda x: x
>>> (lambda n: T((10**5)*n, k=(10**5)*n)/f((10**5)*n) - T(n, k=n)/f(n))(10**30)
-3.552713678800501e-15
The function T(N) is, at large scales, a multiple of the linear function x, hence linear (doubling the input doubles the output). This method, therefore, almost certainly achieves the bound of O(N) we conjecture. Though see the addendum for an interesting possibility.
...
Addendum
One pitfall is accidentally sorting. If we do anything which accidentally sorts our elements, we will incur a log(N) penalty at the least. Thus it is better to think of the arrays as sets, to avoid the pitfall of thinking that they are sorted.
Also, we might initially think that, with the constant 3K of work at each step, we would have to do 3K*log(log(N)) work in total. But the -1 has a powerful role to play in the decimation of the problem size. It is very slightly possible that the running time is actually something above linear, but definitely much smaller than even Nlog(log(log(log(N)))). For example it might be something like O(N*InverseAckermann(N)), but I hit the recursion limit when testing.
The O(K) is probably only due to the fact that we have to print them out; if we are content merely knowing where the data is, we might even be able to pull off an O(N) (e.g. if the arrays are of length O(log(K)) we might be able to achieve O(log(K)))... but that's another story.
The relation between the unordered sets is as follows (it would have cluttered things up in the explanation):
.
_
/ \
(.....) > s > (.....)
s
(.....) > s > (.....)
s
(.....) > s > (.....)
\_/
v
S
v
/ \
(.....) > s > (.....)
s
(.....) > s > (.....)
s
(.....) > s > (.....)
\_/
Find the k largest numbers on each machine. O(n*log(k))
Combine the results (on a centralized server, if k is not huge; otherwise you can merge them in a tree hierarchy across the server cluster).
Update: to make it clear, the combine step is not a sort. You just pick the top k numbers from the results. There are many ways to do this efficiently. You can use a heap for example, pushing the head of each list. Then you can remove the head from the heap and push the head from the list the element belonged to. Doing this k times gives you the result. All this is O(k*log(k)).
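The heap-based combine step described above might look like this (a sketch; sorted_lists holds each machine's top k in descending order, and the function name is mine):

```python
import heapq

def combine_top_k(sorted_lists, k):
    # max-heap over the list heads, simulated by negating values;
    # each entry is (-value, list index, position within that list)
    heap = [(-lst[0], i, 0) for i, lst in enumerate(sorted_lists) if lst]
    heapq.heapify(heap)
    out = []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)                      # take the current overall head
        if j + 1 < len(sorted_lists[i]):
            # push the next head from the list the element belonged to
            heapq.heappush(heap, (-sorted_lists[i][j + 1], i, j + 1))
    return out
```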
Maintain a min heap of size 'k' in the centralized server.
Initially insert first k elements into the min heap.
For the remaining elements
Check(peek) for the min element in the heap (O(1))
If the min element is less than the current element, then remove the min element from the heap and insert the current element.
Finally min heap will have 'k' largest elements
This would require O(n log k) time.
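The size-k min-heap idea above, sketched over a flat stream of numbers (function name is mine):

```python
import heapq

def k_largest_stream(stream, k):
    heap = []                          # min-heap holding the k largest so far
    for x in stream:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:              # peek at the min in O(1)
            heapq.heapreplace(heap, x) # evict the min, insert x
    return sorted(heap, reverse=True)
```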
I would suggest something like this:
take the k largest numbers on each machine in sorted order, O(Nk), where N is the number of elements on each machine
sort each of these arrays of k elements by largest element (you will get k arrays of k elements sorted by largest element : a square matrix kxk)
take the "upper triangle" of the matrix made of these k arrays of k elements, (the k largest element will be in this upper triangle)
the central machine can now find the k largest element of these k(k+1)/2 elements
Let the machines find their k largest elements, copy them into a data structure (a stack), sort it, and pass it on to the central machine.
At the central machine, receive the stacks from all the machines. Find the greatest of the elements at the tops of the stacks.
Pop the greatest element off its stack and copy it to the 'TopK list'. Leave the other stacks intact.
Repeat the previous step k times to get the top K numbers.
1) sort the items on every machine
2) use a size-k binary heap on the central machine
a) populate the heap with the first (max) element from each machine
b) extract the first element, and put back into the heap the next element from the machine whose element you extracted (of course, heapify your heap after the element is added).
Sort will be O(Nlog(N)) where N is the max array on the machines.
O(k) - to build the heap
O(klog(k)) to extract and populate the heap k times.
Complexity is max(O(klog(k)),O(Nlog(N)))
I would think the MapReduce paradigm would be well suited to a task like this.
Every machine runs its own independent map task to find the maximum value in its array (implementation depends on the language used), and this will probably be O(N) complexity for N numbers on each machine.
The reduce task compares the result from the individual machines' outputs to give you the largest k numbers.