Merge k sorted lists: Space complexity - algorithm

from queue import PriorityQueue  # Python 2: from Queue import PriorityQueue

class Solution(object):
    def mergeKLists(self, lists):
        """
        :type lists: List[ListNode]
        :rtype: ListNode
        """
        # ListNode is the singly linked list node class provided by LeetCode.
        # Note: under Python 3, ties on val fall through to comparing the
        # ListNode objects; adding a unique tie-breaker (e.g. an index) to the
        # tuple avoids that.
        head = point = ListNode(0)
        q = PriorityQueue()
        for l in lists:
            if l:
                q.put((l.val, l))
        while not q.empty():
            val, node = q.get()
            point.next = ListNode(val)
            point = point.next
            node = node.next
            if node:
                q.put((node.val, node))
        return head.next
Hi, I have a Python solution for the problem "Merge k sorted lists",
https://leetcode.com/problems/merge-k-sorted-lists/solution/
In the official solution section it says:
O(n) Creating a new linked list costs O(n) space.
O(k) The approach above applies an in-place method which costs O(1) space, and the priority queue (often implemented with heaps) costs O(k) space (far less than N in most situations).
I don't quite understand why the space is O(n). To me it's obviously O(k), because we only need a heap of size k. I don't see where we have created a new linked list.
Thanks.
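For what it's worth, the O(n) in the official write-up appears to refer to the line point.next = ListNode(val): a brand-new node is allocated for every element, so the output is literally a newly created linked list of n nodes. Below is a minimal sketch of the in-place alternative (relinking the existing nodes, so only the O(k) heap is extra space), assuming LeetCode's usual ListNode definition and using heapq with an index as a tie-breaker:

import heapq

class ListNode(object):  # LeetCode's usual definition, repeated for completeness
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def merge_k_lists_inplace(lists):
    # The heap never holds more than k entries; the index i breaks ties
    # between equal values so ListNode objects are never compared.
    heap = [(node.val, i, node) for i, node in enumerate(lists) if node]
    heapq.heapify(heap)
    dummy = tail = ListNode(0)
    while heap:
        val, i, node = heapq.heappop(heap)
        tail.next = node          # reuse the existing node: no new allocation
        tail = node
        if node.next:
            heapq.heappush(heap, (node.next.val, i, node.next))
    return dummy.next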

Related

What's the most efficient way to convert a binomial tree into a sorted array of keys?

Say we are given a single-tree binomial heap, such that the binomial tree is of rank r and thus holds 2^r keys. What's the most efficient way to convert it into a sorted array of length k < 2^r containing the k smallest keys of the tree? Let's assume we can't use any other data structure but Lazy Binomial Heaps and Binomial Trees. Notice that at each level the children are not necessarily linked in order, so you might have to make some comparisons at some point.
My solution was (assuming 1<=k<=2^r):
Create a new empty lazy binomial heap H.
Insert the root's key into the heap.
Create a new counter x, and set x = 1.
For each level i = 0, 1, ... (where the root is at level 0):
    Let c be the number of nodes at level i.
    Set x = x + c.
    Iterate over the nodes in level i and:
        Insert each node N into H. (In O(1).)
        If x < k, recursively apply the same process to each node N, passing x through so the counting continues.
Repeat k times:
    Extract the minimal key out of the heap and place it in the output array.
    Delete the minimal key from the heap. (Amortized cost: O(1).)
Return the output array.
There might be some holes in the pseudo-code, but I think the idea itself is clear. I also managed to implement it. However, I'm not sure that's the most efficient algorithm for this task.
Thanks to Gene's comment I see that the earlier algorithm I suggested will not always work, as it assumes the maximal node at level x is smaller than the minimal node at level x-1, which is not a reasonable assumption.
Yet, I believe this one does the job efficiently:
public static int[] kMin(FibonacciHeap H, int k) {
    if (H == null || H.isEmpty() || k <= 0)
        return new int[0];
    HeapNode tree = H.findMin();
    int rank = tree.getRank();
    int size = H.size();
    size = (int) Math.min(size, Math.pow(2, rank));
    if (k > size)
        k = size;
    int[] result = new int[k];
    FibonacciHeap heap = new FibonacciHeap();
    HeapNode next = H.findMin();
    for (int i = 0; i < k; i++) { // k iterations
        if (next != null)
            for (Iterator<HeapNode> iter = next.iterator(); iter.hasNext(); ) { // rank nCr next.getParent().getRank() iterations.
                next = iter.next();
                HeapNode node = heap.insert(next.getKey()); // O(1)
                node.setFreePointer(next);
            }
        next = heap.findMin().getFreePointer();
        result[i] = next.getKey();
        heap.deleteMin(); // O(log n) amortized cost.
        next = next.child;
    }
    return result;
}
"freePointer" is a field in HeapNode, where I can store a pointer to another HeapNode. It is basically the "info field" most heaps have.
Let r be the rank of the tree. In every iteration we insert at most r items into the external heap. In addition, in every iteration we use Delete-Min to delete one item from the heap.
Therefore, the total cost of insertions is O(kr), and the total cost of Delete-Min is O(k*log(k) + k*log(r)). So the total cost of everything becomes O(k(log(k) + r)).
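For reference, the same "pop the minimum, then push its children" idea can be sketched in Python with heapq; the node attributes key and children below are placeholders for whatever your HeapNode actually exposes:

import heapq

def k_smallest(root, k):
    # Works on any heap-ordered tree: every child's key is >= its parent's key.
    if root is None or k <= 0:
        return []
    out = []
    aux = [(root.key, id(root), root)]      # id() breaks ties between equal keys
    while aux and len(out) < k:
        key, _, node = heapq.heappop(aux)   # O(log(size of aux))
        out.append(key)
        for child in node.children:         # at most `rank` pushes per pop
            heapq.heappush(aux, (child.key, id(child), child))
    return out

Since at most r children are pushed per extraction, the auxiliary heap never holds more than O(kr) entries, which matches the O(k(log(k) + r)) bound above.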

Find the top K elements in O(N log K) time using heaps

Let's say I have a list containing:
lst = [4,0,8,3,1,5,10]
and I'm planning to use a heap structure to help me retrieve the top k largest numbers, where k is a user input.
I understand that heap sort is O(N log N), where we first take O(N) time to place the elements in a min/max heap and then O(log N) time per retrieved element.
But the problem I'm facing now is that I'm required to retrieve the top k elements in O(N log K) time. If my k is 4, I would have:
[10,8,5,4]
as my output. The thing I'm confused about is: at the early stage of forming the heap, am I supposed to heapify the entire list in order to retrieve the top k elements?
The log K term would suggest that you would only want a heap of size K. Here is one possible solution.
Start with an unsorted array. Convert the first K elements to a min-heap of size K; at the top of the heap is the smallest of those K elements. Then walk over each of the remaining N - K elements (the ones not in the heap): whenever an element is larger than the heap's top, replace the top with it, an O(log K) operation. After O(N) such operations, the K elements of the heap (which, if you built the heap in place, occupy the first K slots of the array) are the K largest elements of your array.
There are other solutions but this is the most straightforward.
PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
for (int i : your_arraylist) {
    pq.add(i); // add to the queue
    if (pq.size() > k) {
        pq.poll(); // remove the top element (the smallest in this case) once the queue
                   // exceeds size K
    }
}
System.out.println(pq);
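For comparison, here is a minimal Python sketch of the size-K min-heap approach described above, using heapq; heapreplace does the pop-and-push in one O(log K) step:

import heapq

def top_k_largest(nums, k):
    heap = nums[:k]
    heapq.heapify(heap)              # O(K); heap[0] is the smallest candidate
    for x in nums[k:]:               # N - K iterations, O(log K) each at worst
        if x > heap[0]:
            heapq.heapreplace(heap, x)
    return sorted(heap, reverse=True)

print(top_k_largest([4, 0, 8, 3, 1, 5, 10], 4))   # [10, 8, 5, 4]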

Why binary search is not possible in sorted linked list?

Is it possible to search for an element with binary search in a sorted linked list?
If it is not possible, then the question is: why is it not possible?
Binary search on a sorted array gives us the result in O(log N) comparisons and O(1) memory utilization. Linear search on a sorted array gives us the result in O(N) comparisons and O(1) memory utilization.
Beyond the normal memory and comparison measurements, we also have the idea of traversal steps. This is important for data structures with no random access. For example, in a linked list, to get to element j from the head, we would need to take j steps forward. These steps can happen without any comparison. As pointed out in the comments, the cost for making a traversal step may be different from the cost for making a comparison. A traversal step here translates to a memory read.
The question is what happens when our data structure is a sorted singly linked list? Is it worth doing binary search?
To address this, we need to look at the performance of binary search on a sorted singly linked list. The code looks like this:
struct Node {
    Node* next;
    int value;
};

// Counts the nodes in the list (one O(N) traversal).
int count(Node* n) {
    int c = 0;
    for (; n != nullptr; n = n->next) ++c;
    return c;
}

Node* binarySearch(Node* n, int v) {
    if (v <= n->value) return n;
    Node *right = n, *left = n;
    int size = count(n);
    while (size > 1)
    {
        int newSize = (size / 2);
        right = left;
        for (int i = 0; (i < newSize) && (right->next != nullptr); i++)
            right = right->next;
        if (v == right->value) return right;
        else if (v > right->value) left = right;
        size -= newSize;
    }
    if (right && (v < right->value)) return right;
    else if (right->next) return right->next;
    else return nullptr;
}
The function binarySearch returns the node with element equal to or just greater than v. The parameter n is the head node in a sorted singly linked list.
It is clear that the outer loop iterates O(log N) times where N = size of the list. For each iteration, we make 2 comparisons, so the total # of comparisons is O(log N).
The number of traversal steps is the number of times right = right->next; gets executed, which is O(N). This is because the # of iterations in the inner loop decreases by half at each iteration of the outer loop, so N/2 + N/4 + ... + 1 = N (plus or minus some wiggle room).
Memory usage is still O(1).
In contrast, linear search through a sorted singly linked list is O(n) traversal steps, O(n) comparisons, and O(1) memory.
So is it worth doing binary search on a singly linked list? The answer is usually yes, but not always.
Disregarding the cost of counting, what happens if the element we are looking for is the 2nd element in the list? Linear search takes 1 step and 1 comparison, while binary search takes ~N steps and ~log N comparisons. So in practice the answer isn't so clear-cut.
So here's the summary:
Sorted Array
Binary: O(log N) comparisons, O(1) memory, O(log N) traversal steps
Linear: O(N) comparisons, O(1) memory, O(N) traversal steps
Although, technically, the # of required traversal steps is 0 for sorted arrays. We never have to step forward or backwards. The idea doesn't even make sense.
Sorted Singly Linked List
Binary: O(log N) comparisons, O(1) memory, O(N) traversal steps
Linear: O(N) comparisons, O(1) memory, O(N) traversal steps
And these are worst case run time. However, the glass may not always be half empty :p
A linked list only allows sequential access, so binary search is impossible even if the list is sorted.
Edit:
As others have pointed out, binary search is possible, but would be pointless.
We can emulate random access in a linked list, but that is slow: each access takes O(n) time on average, so a binary search (which is normally O(log n)) would take O(n log n).
Edit 2: as @ethang pointed out, if it's a doubly linked list, the binary search can take only O(n). In each step we can start from the previous position instead of from the head/tail, so the distance moved halves each time.
If you must use a linked list, you'd better use a linear search, whose complexity is only O(n) and which is simpler than a binary search.
If you want to both search and insert/remove efficiently, you may use another data structure such as a binary search tree.
ethang shows how to perform binary search in a singly linked list with just O(1) extra space, O(n) traversal time, and O(log n) comparisons. I did not previously believe that this was possible. For fun, I figured I'd implement a similar algorithm in Haskell:
bs :: Ord k => Int -> k -> [(k,v)] -> Maybe v
bs 0 _ _ = Nothing
bs 1 needle ((k,v) : _)
  | k == needle = Just v
  | otherwise = Nothing
bs size needle left = case drop size' left of
    right@((k,_):_)
      | needle >= k -> bs (size - size') needle right
    _ -> bs size' needle left
  where size' = size `quot` 2

search :: Ord k => k -> [(k,v)] -> Maybe v
search k kvs = bs (length kvs) k kvs
This can be adjusted to use O(log i) comparisons and O(i) traversal time, where i is the distance from the beginning of the list to the location where the sought key is or would be. This implementation could be improved, but the gist is quite simple—replace the search above with this version:
import Control.Applicative ((<|>))

search :: Ord k => k -> [(k,v)] -> Maybe v
-- The 10 can be replaced by any positive integer
search = go 10
  where
    -- The 2 can be replaced by any integer > 1
    go lim needle kvs@((k,_):_) | k <= needle =
      bs lim needle kvs <|> go (lim*2) needle (drop lim kvs)
    go _ _ _ = Nothing

Algorithm for Shuffling a Linked List in n log n time

I'm trying to shuffle a linked list using a divide-and-conquer algorithm that randomly shuffles a linked list in linearithmic (n log n) time and logarithmic (log n) extra space.
I'm aware that I could do a Knuth shuffle similar to the one used on a simple array of values, but I'm not sure how I would do this with divide-and-conquer. What I mean is, what am I actually dividing? Do I just divide down to each individual node in the list and then randomly assemble the list back together using some random value?
Or do I give each node a random number and then do a mergesort on the nodes based on the random numbers?
What about the following? Perform the same procedure as merge sort. When merging, instead of selecting an element (one-by-one) from the two lists in sorted order, flip a coin. Choose whether to pick an element from the first or from the second list based on the result of the coin flip.
Edit (2022-01-12): As GA1 points out in the answer below, this algorithm doesn't produce a permutation uniformly at random.
Algorithm.
shuffle(list):
    if list contains a single element
        return list
    list1, list2 = [], []
    while list not empty:
        move front element from list to list1
        if list not empty: move front element from list to list2
    shuffle(list1)
    shuffle(list2)
    if length(list2) < length(list1):
        i = pick a number uniformly at random in [0..length(list2)]
        insert a dummy node into list2 at location i
    # merge
    while list1 and list2 are not empty:
        if coin flip is Heads:
            move front element from list1 to list
        else:
            move front element from list2 to list
    if list1 not empty: append list1 to list
    if list2 not empty: append list2 to list
    remove the dummy node from list
The key point for space is that splitting the list into two does not require any extra space. The only extra space we need is to maintain log n elements on the stack during recursion.
The point with the dummy node is to realize that inserting and removing a dummy element keeps the distribution of the elements uniform.
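For concreteness, here is a direct (and deliberately naive) transcription of the pseudocode for Python lists, returning a new list instead of relinking nodes; None is used as the dummy, so the input is assumed not to contain None. Per the 2022-01-12 edits in this answer, the resulting distribution is not exactly uniform:

import random

def shuffle(lst):
    if len(lst) <= 1:
        return lst
    l1, l2 = list(lst[0::2]), list(lst[1::2])          # deal elements alternately
    l1, l2 = shuffle(l1), shuffle(l2)
    if len(l2) < len(l1):
        l2.insert(random.randint(0, len(l2)), None)    # dummy at a uniform position
    out = []
    while l1 and l2:
        src = l1 if random.random() < 0.5 else l2      # coin flip
        out.append(src.pop(0))
    out.extend(l1)
    out.extend(l2)
    return [x for x in out if x is not None]           # drop the dummy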
Edit (2022-01-12): As Riley points out in the comments, the analysis below is flawed.
Analysis.
Why is the distribution uniform? After the final merge, the probability P_i(n) of any given number ending up in the position i is as follows. Either it was:
in the i-th place in its own list, and the list won the coin toss the first i times, the probability of this is 1/2^i;
in the i-1-st place in its own list, and the list won the coin toss i-1 times including the last one and lost once, the probability of this is (i-1) choose 1 times 1/2^i;
in the i-2-nd place in its own list, and the list won the coin toss i-2 times including the last one and lost twice, the probability of this is (i-1) choose 2 times 1/2^i;
and so on.
So the probability
P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * P_j(n/2).
Inductively, you can show that P_i(n) = 1/n. I let you verify the base case and assume that P_j(n/2) = 2/n. The term \sum_{j=0}^{i-1} (i-1 choose j) is exactly the number of i-1-bit binary numbers, i.e. 2^{i-1}. So we get
P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * 2/n
= 2/n * 1/2^i * \sum_{j=0}^{i-1} (i-1 choose j)
= 1/n * 1/2^{i-1} * 2^{i-1}
= 1/n
I hope this makes sense. The only assumption we need is that n is even, and that the two lists are shuffled uniformly. This is achieved by adding (and then removing) the dummy node.
P.S. My original intuition was nowhere near rigorous, but I list it just in case. Imagine we assign numbers between 1 and n at random to the elements of the list. And now we run a merge sort with respect to these numbers. At any given step of the merge, it needs to decide which of the heads of the two lists is smaller. But the probability of one being greater than the other should be exactly 1/2, so we can simulate this by flipping a coin.
P.P.S. Is there a way to embed LaTeX here?
Code
Up shuffle approach
This Lua version is improved from foxcub's answer to remove the need for dummy nodes.
To slightly simplify the code in this answer, this version supposes that your lists know their sizes. If they don't, you can always compute the size in O(n) time, or better yet, make a few simple adaptations to the code so it isn't needed beforehand (such as dealing elements alternately instead of splitting into first and second halves).
function listUpShuffle (l)
    local lsz = #l
    if lsz <= 1 then return l end

    local lsz2 = math.floor(lsz/2)
    local l1, l2 = {}, {}
    for k = 1, lsz2 do l1[#l1+1] = l[k] end
    for k = lsz2+1, lsz do l2[#l2+1] = l[k] end

    l1 = listUpShuffle(l1)
    l2 = listUpShuffle(l2)

    local res = {}
    local i, j = 1, 1
    while i <= #l1 or j <= #l2 do
        local rem1, rem2 = #l1-i+1, #l2-j+1
        if math.random() < rem1/(rem1+rem2) then
            res[#res+1] = l1[i]
            i = i+1
        else
            res[#res+1] = l2[j]
            j = j+1
        end
    end
    return res
end
To avoid using dummy nodes, you have to compensate for the fact that the two intermediate lists can have different lengths by varying the probability of choosing from each list. This is done by testing a [0,1) uniform random number against the ratio of nodes remaining in the first list to the total number of nodes remaining (in the two lists).
Down shuffle approach
You can also shuffle while you subdivide recursively, which in my humble tests showed slightly (but consistently) better performance. It might come from fewer instructions, or on the other hand it might be due to cache warmup in LuaJIT, so you will have to profile for your use cases.
function listDownShuffle (l)
    local lsz = #l
    if lsz <= 1 then return l end

    local lsz2 = math.floor(lsz/2)
    local l1, l2 = {}, {}
    for i = 1, lsz do
        local rem1, rem2 = lsz2-#l1, lsz-lsz2-#l2
        if math.random() < rem1/(rem1+rem2) then
            l1[#l1+1] = l[i]
        else
            l2[#l2+1] = l[i]
        end
    end

    l1 = listDownShuffle(l1)
    l2 = listDownShuffle(l2)

    local res = {}
    for i = 1, #l1 do res[#res+1] = l1[i] end
    for i = 1, #l2 do res[#res+1] = l2[i] end
    return res
end
Tests
The full source is in my listShuffle.lua Gist.
It contains code that, when executed, prints a matrix representing, for each element of the input list, the number of times it appears at each position of the output list after a specified number of runs. A fairly uniform matrix shows the uniformity of the distribution of elements, hence the uniformity of the shuffle.
Here is an example run with 1,000,000 iterations using a (non-power-of-two) 3-element list:
>> luajit listShuffle.lua 1000000 3
Up shuffle bias matrix:
333331 332782 333887
333377 333655 332968
333292 333563 333145
Down shuffle bias matrix:
333120 333521 333359
333435 333088 333477
333445 333391 333164
You can actually do better than that: the best list shuffle algorithm is O(n log n) time and just O(1) space. (You can also shuffle in O(n) time and O(n) space by constructing a pointer array for the list, shuffling it in place using Knuth and re-threading the list accordingly.)
Complexity proof
To see why O(n log n) time is minimal for O(1) space, observe that:
With O(1) space, updating the successor of an arbitrary list element necessarily takes O(n) time.
Wlog, you can assume that whenever you update one element, you also update all the other elements (leaving them unchanged if you wish), as this also takes just O(n) time.
With O(1) space, there are at most O(1) elements to choose from for the successor of any element you're updating (which specific elements these are will obviously depend on the algorithm).
Therefore, a single O(n) time update of all the elements could result in at most c^n different list permutations.
Since there are n! = O(n^n) = O(c^(n log n)) possible list permutations, you require at least O(log n) updates.
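Spelling out the counting in the last two bullets: k such full-list updates can produce at most (c^n)^k = c^(nk) distinct permutations, so we need

c^(nk) >= n! = 2^{Theta(n log n)},

hence k = Omega(log n); and since each update costs O(n) time, the total running time is Omega(n log n).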
Linked-list data structure (because Python)
import collections.abc  # Sequence lives in collections.abc on Python 3

class Cons(collections.abc.Sequence):
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

    def __getitem__(self, index):
        current, n = self, index
        while n > 0:
            if isinstance(current, Cons):
                current, n = current.tail, n - 1
            else:
                raise ValueError("Out of bounds index [{0}]".format(index))
        return current

    def __len__(self):
        current, length = self, 0
        while isinstance(current, Cons):
            current, length = current.tail, length + 1
        return length

    def __repr__(self):
        current, rep = self, []
        while isinstance(current, Cons):
            rep.extend((str(current.head), "::"))
            current = current.tail
        rep.append(str(current))
        return "".join(rep)
Merge-style algorithm
Here is an O(n log n) time and O(1) space algorithm based on iterative merge sort. The basic idea is simple: shuffle the left half, then the right half, then merge them by randomly selecting from the two lists. Two things worth noting:
By making the algorithm iterative rather than recursive, and returning a pointer to the new last element at the end of every merge step, we reduce the space requirement to O(1) while keeping the time cost minimal.
To make sure that all possibilities are equally likely for all input sizes, we use probabilities from the Gilbert–Shannon–Reeds model riffle shuffle when merging (see http://en.wikipedia.org/wiki/Gilbert%E2%80%93Shannon%E2%80%93Reeds_model).
import random

def riffle_lists(head, list1, len1, list2, len2):
    """Riffle shuffle two sublists in place. Returns the new last element."""
    for _ in range(len1 + len2):
        if random.random() < (len1 / (len1 + len2)):
            next, list1, len1 = list1, list1.tail, len1 - 1
        else:
            next, list2, len2 = list2, list2.tail, len2 - 1
        head.tail, head = next, next
    head.tail = list2
    return head

def shuffle_list(list):
    """Shuffle a list in place using an iterative merge-style algorithm."""
    dummy = Cons(None, list)
    i, n = 1, len(list)
    while i < n:
        head, nleft = dummy, n
        while nleft > i:
            head = riffle_lists(head, head[1], i, head[i + 1], min(i, nleft - i))
            nleft -= 2 * i
        i *= 2
    return dummy[1]
Another algorithm
Another interesting O(n log n) algorithm that produces not-quite-uniform shuffles involves simply riffle shuffling the list 3/2 log_2(n) times. As described in http://en.wikipedia.org/wiki/Gilbert%E2%80%93Shannon%E2%80%93Reeds_model, this leaves only a constant number of bits of information.
I'd say that foxcub's answer is wrong. To prove that, I will introduce a helpful definition of a perfectly shuffled list (call it an array or a sequence or whatever you want).
Definition: Assume we have a list L containing the elements a1, a2, ..., an at the indexes 1, 2, 3, ..., n. If we expose L to a shuffle operation (to whose internals we have no access), L is perfectly shuffled if and only if, knowing the indexes of some k (k < n) elements, we can't deduce the indexes of the remaining n-k elements. That is, the remaining n-k elements are equally likely to be revealed at any of the remaining n-k indexes.
Example: if we have a four-element list [a, b, c, d] and, after shuffling it, we know that its first element is a ([a, .., .., ..]), then the probability for any of the elements b, c, d to occur in, let's say, the third cell equals 1/3.
Now, the smallest list for which the algorithm does not fulfil the definition has three elements. But the algorithm converts it to a 4-element list anyway, so we will try to show its incorrectness for a 4-element list.
Consider an input L = [a, b, c, d]. Following the first run of the algorithm, L will be divided into l1 = [a, c] and l2 = [b, d]. After shuffling these two sublists (but before merging them into the four-element result) we can get four equally probable pairs of 2-element lists:
l1shuffled = [a , c] l2shuffled = [b , d]
l1shuffled = [a , c] l2shuffled = [d , b]
l1shuffled = [c , a] l2shuffled = [b , d]
l1shuffled = [c , a] l2shuffled = [d , b]
Now try to answer two questions.
1. What is the probability that, after merging into the final result, a will be the first element of the list?
Simply enough, we can see that only two of the four pairs above (again, equally probable) can give such a result (p1 = 1/2). For each of these pairs, heads must be drawn during the first flip in the merge routine (p2 = 1/2). Thus the probability of having a as the first element of Lshuffled is p = p1*p2 = 1/4, which is correct.
2. Knowing that a is in the first position of Lshuffled, what is the probability of having c (we could as well choose b or d without loss of generality) in the second position of Lshuffled?
Now, according to the above definition of a perfectly shuffled list, the answer should be 1/3, since there are three elements to put in the three remaining cells of the list.
Let's see if the algorithm assures that.
After choosing a as the first element of Lshuffled we would now have either:
l1shuffled = [c] l2shuffled = [b, d]
or:
l1shuffled = [c] l2shuffled = [d, b]
In both cases the probability of choosing c equals the probability of flipping heads (p3 = 1/2), thus the probability of having c as the second element of Lshuffled, knowing that the first element of Lshuffled is a, equals 1/2. Since 1/2 != 1/3, this ends the proof of the incorrectness of the algorithm.
The interesting part is that the algorithm fulfils a necessary (but not sufficient) condition for a perfect shuffle, namely:
Given a list of n elements, for every index k (< n) and every element ak: after shuffling the list m times, if we count the number of times ak occurred at index k, this count tends to m/n in probability as m tends to infinity.
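The 4-element case is also easy to check empirically. The snippet below (a self-contained simulation sketch, not from either answer) shuffles l1 = [a, c] and l2 = [b, d] uniformly, coin-flip merges them, and estimates the conditional distribution of the second position given that a came out first:

import random
from collections import Counter

def coin_flip_merge(l1, l2):
    l1, l2 = list(l1), list(l2)
    out = []
    while l1 and l2:
        out.append(l1.pop(0) if random.random() < 0.5 else l2.pop(0))
    return out + l1 + l2

counts = Counter()
for _ in range(200000):
    merged = coin_flip_merge(random.sample(['a', 'c'], 2),
                             random.sample(['b', 'd'], 2))
    if merged[0] == 'a':
        counts[merged[1]] += 1
total = sum(counts.values())
print({x: round(counts[x] / total, 3) for x in 'bcd'})
# roughly {'b': 0.25, 'c': 0.5, 'd': 0.25}; a uniform shuffle would give ~0.333 each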
Here is one possible solution:
#include <stdlib.h>

typedef struct node_s {
    struct node_s * next;
    int data;
} node_s, *node_p;

void shuffle_helper( node_p first, node_p last ) {
    static const int half = RAND_MAX / 2;
    while( (first != last) && (first->next != last) ) {
        node_p firsts[2] = {0, 0};
        node_p *lasts[2] = {0, 0};
        int counts[2] = {0, 0}, lesser;
        while( first != last ) {
            int choice = (rand() <= half);
            node_p next = first->next;
            first->next = firsts[choice];
            if( !lasts[choice] ) lasts[choice] = &(first->next);
            firsts[choice] = first;   /* prepend the node onto the chosen sublist */
            ++counts[choice];
            first = next;
        }
        lesser = (counts[0] < counts[1]);
        if( !counts[lesser] ) {
            first = firsts[!lesser];
            *(lasts[!lesser]) = last;
            continue;
        }
        /* Chain the smaller sublist first, then the larger one, then `last`,
           so that both the recursive call below and the continued loop see
           properly terminated ranges. */
        *(lasts[lesser]) = firsts[!lesser];
        *(lasts[!lesser]) = last;
        shuffle_helper( firsts[lesser], firsts[!lesser] );
        first = firsts[!lesser];
        last = *(lasts[!lesser]);
    }
}

void shuffle_list( node_p thelist ) { shuffle_helper( thelist, NULL ); }
This is basically quicksort, but with no pivot, and with random partitioning.
The outer while loop replaces a recursive call.
The inner while loop randomly moves each element into one of two sublists.
After the inner while loop, we connect the sublists to one another.
Then, we recurse on the smaller sublist, and loop on the larger.
Since the smaller sublist can never be more than half the size of the initial list, the worst case depth of recursion is the log base two of the number of elements. The amount of memory needed is O(1) times the depth of recursion.
The average runtime, and number of calls to rand() is O(N log N).
More precise runtime analysis requires an understanding of the phrase "almost surely."
Bottom-up merge sort without compares: while merging, don't do any comparisons, just swap the elements.
You could traverse the list, randomly generating 0 or 1 at each node.
If it is 1, remove the node and place it as the first node of the list.
If it is 0, do nothing.
Loop this until you reach the end of the list.

Finding the median of an unsorted array

To find the median of an unsorted array, we can make a min-heap of its n elements in O(nlogn) time, and then we can extract n/2 elements one by one to get the median. But this approach would take O(nlogn) time.
Can we do the same with some method in O(n) time? If we can, then please tell or suggest some method.
You can use the Median of Medians algorithm to find the median of an unsorted array in linear time.
I have already upvoted the @dasblinkenlight answer, since the Median of Medians algorithm in fact solves this problem in O(n) time. I only want to add that this problem could also be solved in O(n) time by using heaps. Building a heap can be done in O(n) time by using the bottom-up approach. Take a look at the following article for a detailed explanation: Heap sort.
Supposing that your array has N elements, you have to build two heaps: a MaxHeap that contains the N/2 smallest elements (or (N/2)+1 if N is odd) and a MinHeap that contains the remaining elements. If N is odd, then your median is the maximum element of the MaxHeap (O(1) by getting the max). If N is even, then your median is (MaxHeap.max() + MinHeap.min())/2; this also takes O(1). Thus, the real cost of the whole operation is building the heaps, which is O(n).
BTW, this MaxHeap/MinHeap approach also works when you don't know the number of array elements beforehand (e.g., if you have to solve the same problem for a stream of integers). You can see more details about how to solve this problem in the following article: Median of integer streams.
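A common way to realize the streaming variant mentioned above is to keep the lower half of the numbers in a max-heap and the upper half in a min-heap, rebalancing after every insert; note that each insert then costs O(log n), so processing the whole stream costs O(n log n) rather than O(n). A minimal Python sketch with heapq (the max-heap is simulated by negating values; the class name is just for illustration):

import heapq

class RunningMedian:
    def __init__(self):
        self.lo = []   # max-heap via negation: top is the largest of the lower half
        self.hi = []   # min-heap: top is the smallest of the upper half

    def add(self, x):
        if not self.lo or x <= -self.lo[0]:
            heapq.heappush(self.lo, -x)
        else:
            heapq.heappush(self.hi, x)
        # Rebalance so that len(lo) == len(hi) or len(lo) == len(hi) + 1.
        if len(self.lo) > len(self.hi) + 1:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        elif len(self.hi) > len(self.lo):
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def median(self):
        if len(self.lo) > len(self.hi):
            return -self.lo[0]
        return (-self.lo[0] + self.hi[0]) / 2.0

rm = RunningMedian()
for x in [5, 15, 1, 3]:
    rm.add(x)
print(rm.median())   # 4.0 (median of 1, 3, 5, 15)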
Quickselect works in O(n) average time; it is based on the same partition step used in Quicksort.
The quickselect algorithm can find the k-th smallest element of an array in linear (O(n)) expected running time. Here is an implementation in Python:
import random

def partition(L, v):
    smaller = []
    bigger = []
    for val in L:
        if val < v: smaller += [val]
        if val > v: bigger += [val]
    return (smaller, [v], bigger)   # note: duplicates of the pivot are collapsed

def top_k(L, k):
    v = L[random.randrange(len(L))]
    (left, middle, right) = partition(L, v)
    # middle used below (in place of [v]) for clarity
    if len(left) == k: return left
    if len(left)+1 == k: return left + middle
    if len(left) > k: return top_k(left, k)
    return left + middle + top_k(right, k - len(left) - len(middle))

def median(L):
    n = len(L)
    l = top_k(L, n // 2 + 1)   # integer division (Python 3)
    return max(l)
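A quick sanity check (using, for instance, the list from the top-K question earlier):

print(median([4, 0, 8, 3, 1, 5, 10]))   # 4 (sorted: 0, 1, 3, 4, 5, 8, 10)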
No, there is no O(n) algorithm for finding the median of an arbitrary, unsorted dataset.
At least none that I am aware of in 2022. All answers offered here are variations/combinations using heaps, Median of Medians, Quickselect, all of which are strictly O(nlogn).
See https://en.wikipedia.org/wiki/Median_of_medians and http://cs.indstate.edu/~spitla/abstract2.pdf.
The problem appears to be confusion about how algorithms are classified, which is according to their limiting (worst-case) behaviour. "On average" or "typically" O(n) with "worst case" O(f(n)) means (in textbook terms) "strictly O(f(n))". Quicksort, for example, is often discussed as being O(nlogn) (which is how it typically performs), although it is in fact an O(n^2) algorithm because there is always some pathological ordering of inputs for which it can do no better than n^2 comparisons.
It can be done using Quickselect Algorithm in O(n), do refer to Kth order statistics (randomized algorithms).
As Wikipedia says, Median of Medians is theoretically O(N), but it is not used in practice because the overhead of finding "good" pivots makes it too slow.
http://en.wikipedia.org/wiki/Selection_algorithm
Here is Java source for a Quickselect algorithm to find the k'th element in an array:
/**
 * Returns position of k'th largest element of sub-list.
 *
 * @param list list to search, whose sub-list may be shuffled before
 *             returning
 * @param lo first element of sub-list in list
 * @param hi just after last element of sub-list in list
 * @param k
 * @return position of k'th largest element of (possibly shuffled) sub-list.
 */
static int select(double[] list, int lo, int hi, int k) {
    int n = hi - lo;
    if (n < 2)
        return lo;

    double pivot = list[lo + (k * 7919) % n]; // Pick a random pivot

    // Triage list to [<pivot][=pivot][>pivot]
    int nLess = 0, nSame = 0, nMore = 0;
    int lo3 = lo;
    int hi3 = hi;
    while (lo3 < hi3) {
        double e = list[lo3];
        int cmp = compare(e, pivot);
        if (cmp < 0) {
            nLess++;
            lo3++;
        } else if (cmp > 0) {
            swap(list, lo3, --hi3);
            if (nSame > 0)
                swap(list, hi3, hi3 + nSame);
            nMore++;
        } else {
            nSame++;
            swap(list, lo3, --hi3);
        }
    }
    assert (nSame > 0);
    assert (nLess + nSame + nMore == n);
    assert (list[lo + nLess] == pivot);
    assert (list[hi - nMore - 1] == pivot);

    if (k >= n - nMore)
        return select(list, hi - nMore, hi, k - nLess - nSame);
    else if (k < nLess)
        return select(list, lo, lo + nLess, k);
    return lo + k;
}
I have not included the source of the compare and swap methods, so it's easy to change the code to work with Object[] instead of double[].
In practice, you can expect the above code to run in O(N) time.
Let the problem be: finding the Kth largest element in an unsorted array.
Divide the array into n/5 groups, where each group consists of 5 elements.
Now let a1, a2, a3, ..., a(n/5) be the medians of these groups.
Let x = the median of the elements a1, a2, ..., a(n/5).
Now if k < n/2, then we can remove the largest, 2nd largest and 3rd largest elements of each group whose median is greater than x. We can now call the function again on the remaining (at most 7n/10) elements to find the kth largest value.
Otherwise, if k > n/2, then we can remove the smallest, 2nd smallest and 3rd smallest elements of each group whose median is smaller than x. We can now call the function again on the remaining (at most 7n/10) elements to find the (k - 3n/10)th largest value.
Time Complexity Analysis:
Let T(n) be the time complexity of finding the kth largest element in an array of size n.
T(n) = T(n/5) + T(7n/10) + O(n)
If you solve this recurrence you will find that T(n) is actually O(n), since
n/5 + 7n/10 = 9n/10 < n
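For completeness, one way to see it (a standard induction sketch, with c the constant hidden in the O(n) term): assume T(m) <= C*m for all m < n, where C >= 10c. Then

T(n) <= T(n/5) + T(7n/10) + c*n
     <= C*(n/5) + C*(7n/10) + c*n
     = (9C/10 + c) * n
     <= C*n,

so T(n) = O(n).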
Notice that building a heap actually takes O(n), not O(nlogn); you can check this using amortized analysis or simply look it up on YouTube.
Extract-Min takes O(logn); therefore, extracting n/2 elements will take (n/2)logn = O(nlogn) amortized time.
Regarding your question, you can simply look at Median of Medians.
