selection algorithm for median

I was trying to understand the selection algorithm for finding the median. I have pasted the pseudocode below.
SELECT(A[1 .. n], k):
    if n <= 25
        use brute force
    else
        m = ceiling(n/5)
        for i = 1 to m
            B[i] = SELECT(A[5i-4 .. 5i], 3)
        mom = SELECT(B[1 .. m], floor(m/2))
        r = PARTITION(A[1 .. n], mom)
        if k < r
            return SELECT(A[1 .. r-1], k)
        else if k > r
            return SELECT(A[r+1 .. n], k-r)
        else
            return mom
I have a very trivial doubt. I was wondering what the author means by "use brute force" above for n <= 25. Is it that he will compare elements one by one with every other element and see if it is the kth largest, or something else?

The code must come from here.
A brute force algorithm can be any simple and stupid algorithm. In your example, you can sort the 25 elements and find the middle one. This is simple and stupid compared to the selection algorithm since sorting takes O(n lg n) while selection takes only linear time.
A brute force algorithm is often good enough when n is small. Besides, it is easier to implement. Read more about brute force here.
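For concreteness, here is one plausible reading of that base case as a minimal Python sketch (my own, not the author's code): for a block of at most 25 elements, "brute force" can simply mean sorting the block and indexing into it.

def select_small(block, k):
    # Brute-force base case for tiny inputs (n <= 25 in the pseudocode above):
    # sort the block and return the k-th smallest element (k is 1-based,
    # matching the pseudocode's indexing).
    return sorted(block)[k - 1]

Even an O(n^2) scan for the k-th smallest would be fine here, since n is bounded by a constant.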

Common wisdom is that Quicksort is slower than insertion sort for small inputs. Therefore many implementations switch to insertion sort at some threshold.
There is a reference to this practice in the Wikipedia page on Quicksort.
Here's an example of commercial mergesort code that switches to insertion sort for small inputs. Here the threshold is 7.
The "brute force" almost certainly refers to the fact that the code here is using the same practice: insertion sort followed by picking the middle element(s) for the median.
However, I've found in practice that the common wisdom is not generally true. When I've run benchmarks, the switch has had either very little positive effect or a negative one. That was for Quicksort. In the partition-based selection algorithm, it's more likely to be negative because one side of the partition is thrown away at each step, so less time is spent on small inputs. This is verified in @Dennis's response to this SO question.
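To illustrate the practice being discussed (not code from any of the linked implementations), here is a rough Python sketch of a quicksort that switches to insertion sort below a small threshold; the cutoff value 7 mirrors the mergesort example mentioned above.

THRESHOLD = 7  # small-input cutoff, as in the mergesort example above

def insertion_sort(a, lo, hi):
    # Sort a[lo..hi] in place by repeated insertion.
    for i in range(lo + 1, hi + 1):
        x = a[i]
        j = i - 1
        while j >= lo and a[j] > x:
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x

def hybrid_quicksort(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= THRESHOLD:      # small input: fall back to insertion sort
        insertion_sort(a, lo, hi)
        return
    pivot = a[(lo + hi) // 2]
    i, j = lo, hi
    while i <= j:                     # classic two-index partition
        while a[i] < pivot:
            i += 1
        while a[j] > pivot:
            j -= 1
        if i <= j:
            a[i], a[j] = a[j], a[i]
            i += 1
            j -= 1
    hybrid_quicksort(a, lo, j)
    hybrid_quicksort(a, i, hi)

Whether the cutoff actually helps is exactly what the benchmarks above are about; the sketch only shows where the switch happens.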

Related

Partial selection sort vs Mergesort to find "k largest in array"

I was wondering if my line of thinking is correct.
I'm preparing for interviews (as a college student) and one of the questions I came across was to find the K largest numbers in an array.
My first thought was to just use a partial selection sort (e.g. scan the array from the first element, keeping two variables for the lowest element seen and its index, swap it with the element at the end of the array, and continue doing so until we've swapped K elements, then return a copy of the first K elements of that array).
However, this takes O(K*n) time. If I simply sorted the array using an efficient sorting method like Mergesort, it would only take O(n*log(n)) time to sort the entire array and return the K largest numbers.
Is it good enough to discuss these two methods during an interview (comparing log(n) and K of the input and going with the smaller of the two to compute the K largest) or would it be safe to assume that I'm expected to give a O(n) solution for this problem?
There exists an O(n) algorithm for finding the k'th smallest element, and once you have that element, you can simply scan through the list and collect the appropriate elements. It's based on Quicksort, but the reasoning behind why it works is rather hairy... There's also a simpler variation that probably will run in O(n). My answer to another question contains a brief discussion of this.
Here's a general discussion of this particular interview question found from googling:
http://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/
As for your question about interviews in general, it probably greatly depends on the interviewer. They usually like to see how you think about things. So, as long as you can come up with some sort of initial solution, your interviewer would likely ask questions from there depending on what they were looking for exactly.
IMHO, I think the interviewer wouldn't be satisfied with either of the methods if he says the dataset is huge (say a billion elements). In this case, if the K to be returned is huge (nearing a billion), your partial selection would be close to O(n^2). I think it entirely depends on the intricacies of the question posed.
EDIT: Aasmund Eldhuset's answer shows you how to achieve the O(n) time complexity.
If you want to find the K largest elements (so for K = 5 you'll get five results: the five highest numbers), then the best you can get is O(n + k log n): you can build a priority queue in O(n) and then invoke pq.Dequeue() k times. If you are looking for the K-th biggest number, then you can get it in O(n) with a quicksort modification called the k-th order statistic. Pseudocode looks like this (it's a randomized algorithm; the average time is approximately O(n), but the worst case is O(n^2)):
QuickSortSelection(numbers, currentLength, k) {
    if (currentLength == 1)
        return numbers[0];
    int pivot = random element of numbers;
    // Partition around the pivot, as in quicksort: smaller elements go to the
    // left of the pivot, bigger elements go to the right; the pivot ends up
    // at position newPivotIndex.
    int newPivotIndex = partitionAroundPivot(numbers, pivot);
    if ( k == newPivotIndex )
        return pivot;
    else if ( k < newPivotIndex )
        return QuickSortSelection(numbers[0 .. newPivotIndex-1], newPivotIndex, k);
    else
        return QuickSortSelection(numbers[newPivotIndex+1 .. end], currentLength-newPivotIndex-1, k-newPivotIndex-1);
}
As I said, this algorithm is O(n^2) in the worst case because the pivot is chosen at random (however, the probability of a running time of ~n^2 is something like 1/2^n). You can convert it into a deterministic algorithm with the same worst-case running time by using, for instance, the median of medians as the pivot, but that is slower in practice (due to the constant factor).
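As a concrete sketch of the O(n + k log n) priority-queue approach mentioned above (my own Python, using the standard heapq module rather than a pq.Dequeue() API):

import heapq

def k_largest(numbers, k):
    # heapify is O(n); each pop is O(log n), so the total is O(n + k log n).
    heap = [-x for x in numbers]      # simulate a max-heap by negating values
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(k, len(heap)))]

In practice you could also just call heapq.nlargest(k, numbers), which keeps a heap of size k and runs in roughly O(n log k).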

What sorting techniques can I use when comparing elements is expensive?

Problem
I have an application where I want to sort an array a of elements a[0], a[1], ..., a[n-1]. I have a comparison function cmp(i,j) that compares elements a[i] and a[j], and a swap function swap(i,j) that swaps elements a[i] and a[j] of the array. In the application, execution of the cmp(i,j) function might be extremely expensive, to the point where one execution of cmp(i,j) takes longer than all the other steps in the sort (except for other cmp(i,j) calls, of course) combined. You may think of cmp(i,j) as a rather lengthy IO operation.
Please assume for the sake of this question that there is no way to make cmp(i,j) faster. Assume all optimizations that could possibly make cmp(i,j) faster have already been done.
Questions
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
It is possible in my application to write a predicate expensive(i,j) that is true iff a call to cmp(i,j) would take a long time. expensive(i,j) is cheap, and expensive(i,j) ∧ expensive(j,k) → expensive(i,k) mostly holds in my current application. This is not guaranteed though.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparison operations? If yes, can you point me to such an algorithm?
I'd like pointers to further material on this topic.
Example
This is an example that is not entirely unlike the application I have.
Consider a set of possibly large files. In this application the goal is to find duplicate files among them. This essentially boils down to sorting the files by some arbitrary criterion and then traversing them in order, outputting sequences of equal files that were encountered.
Of course reading in large amounts of data is expensive, therefore one can, for instance, only read the first megabyte of each file and calculate a hash function on this data. If the files compare equal, so do the hashes, but the reverse may not hold: two large files could differ in only one byte near the end.
The implementation of expensive(i,j) in this case is simply a check whether the hashes are equal. If they are, an expensive deep comparison is necessary.
I'll try to answer each question as best as I can.
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Traditional sorting methods may have some variation, but in general there is a mathematical limit to the minimum number of comparisons necessary to sort a list, and most algorithms take advantage of that, since comparisons are often not inexpensive. You could try sorting by something else, or try using a shortcut that may be faster and may approximate the real solution.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I don't think you can get around the necessity of doing at least the minimum number of comparisons, but you may be able to change what you compare. If you can compare hashes or subsets of the data instead of the whole thing, that could certainly be helpful. Anything you can do to simplify the comparison operation will make a big difference, but without knowing specific details of the data, it's hard to suggest specific solutions.
I'd like pointers to further material on this topic.
Check these out:
Apparently Donald Knuth's The Art of Computer Programming, Volume 3 has a section on this topic, but I don't have a copy handy.
Wikipedia of course has some insight into the matter.
Sorting an array with minimal number of comparisons
How do I figure out the minimum number of swaps to sort a list in-place?
Limitations of comparison based sorting techniques
The theoretical minimum number of comparisons needed to sort an array of n elements on average is lg (n!), which is about n lg n - n. There's no way to do better than this on average if you're using comparisons to order the elements.
Of the standard O(n log n) comparison-based sorting algorithms, mergesort makes the lowest number of comparisons (just about n lg n, compared with about 1.44 n lg n for quicksort and about n lg n + 2n for heapsort), so it might be a good algorithm to use as a starting point. Typically mergesort is slower than heapsort and quicksort, but that's usually under the assumption that comparisons are fast.
If you do use mergesort, I'd recommend using an adaptive variant of mergesort like natural mergesort so that if the data is mostly sorted, the number of comparisons is closer to linear.
There are a few other options available. If you know for a fact that the data is already mostly sorted, you could use insertion sort or a standard variation of heapsort to try to speed up the sorting. Alternatively, you could use mergesort but use an optimal sorting network as a base case when n is small. This might shave off enough comparisons to give you a noticeable performance boost.
Hope this helps!
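To make those comparison counts easy to check empirically, here is a small Python sketch (mine, not from the answer) of a merge sort that also counts the comparisons it performs; the single <= test is where an expensive cmp(i,j) would be plugged in.

def merge_sort_counting(a):
    # Returns (sorted_list, number_of_comparisons).
    if len(a) <= 1:
        return list(a), 0
    mid = len(a) // 2
    left, cl = merge_sort_counting(a[:mid])
    right, cr = merge_sort_counting(a[mid:])
    merged, i, j, count = [], 0, 0, cl + cr
    while i < len(left) and j < len(right):
        count += 1                    # one element comparison per merge step
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count

Running it on random inputs of size n should give counts close to n lg n, which is the point of the figures quoted above.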
A technique called the Schwartzian transform can be used to reduce any sorting problem to that of sorting integers. It requires you to apply a function f to each of your input items, where f(x) < f(y) if and only if x < y.
(Python-oriented answer, when I thought the question was tagged [python])
If you can define a function f such that f(x) < f(y) if and only if x < y, then you can sort using
sorted(L, key=f)
Python guarantees that key is called at most once for each element of the iterable you are sorting. This provides support for the Schwartzian transform.
Python 3 does not support specifying a cmp function, only the key parameter. This page provides a way of easily converting any cmp function to a key function.
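A tiny illustration of both mechanisms (my own example; expensive_cmp is a hypothetical stand-in for a costly comparison):

from functools import cmp_to_key

def expensive_cmp(x, y):
    # pretend this does heavy IO; returns negative, zero, or positive
    return (x > y) - (x < y)

data = [5, -3, 8, 1]

# Schwartzian-transform style: the key function runs once per element.
print(sorted(data, key=abs))                        # [1, -3, 5, 8]

# Converting an old-style cmp function for Python 3's key-only API:
print(sorted(data, key=cmp_to_key(expensive_cmp)))  # [-3, 1, 5, 8]

Note that cmp_to_key still calls the cmp function once per comparison, so it does not by itself reduce the number of expensive calls; only a genuine key function does.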
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
Edit: Ah, sorry. There are algorithms that minimize the number of comparisons (below), but not that I know of for specific elements.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
Not that I know of, but perhaps you'll find it in these papers below.
I'd like pointers to further material on this topic.
On Optimal and Efficient in Place Merging
Stable Minimum Storage Merging by Symmetric Comparisons
Optimal Stable Merging (this one seems to be O(n log² n) though)
Practical In-Place Mergesort
If you implement any of them, posting them here might be useful for others too! :)
Is there a sorting algorithm that minimizes the number of calls to cmp(i,j)?
The merge insertion algorithm, described in D. Knuth's "The Art of Computer Programming", Vol. 3, chapter 5.3.1, uses fewer comparisons than other comparison-based algorithms. But it still needs O(N log N) comparisons.
Would the existence of expensive(i,j) allow for a better algorithm that tries to avoid expensive comparing operations? If yes, can you point me to such an algorithm?
I think some existing sorting algorithms may be modified to take the expensive(i,j) predicate into account. Let's take the simplest of them: insertion sort. One of its variants, named in Wikipedia as binary insertion sort, uses only O(N log N) comparisons.
It employs a binary search to determine the correct location to insert new elements. We could apply the expensive(i,j) predicate after each binary search step to determine if it is cheap to compare the inserted element with the "middle" element found in that step. If it is expensive, we could try the "middle" element's neighbors, then their neighbors, etc. If no cheap comparison can be found, we just return to the "middle" element and perform the expensive comparison.
There are several possible optimizations. If the predicate and/or cheap comparisons are not so cheap, we could roll back to the "middle" element before all other possibilities are tried. Also, if move operations cannot be considered very cheap, we could use some order-statistics data structure (like an indexable skiplist) to reduce the insertion cost to O(N log N).
This modified insertion sort needs O(N log N) time for data movement, O(N^2) predicate computations and cheap comparisons, and O(N log N) expensive comparisons in the worst case. But more likely there would be only O(N log N) predicates and cheap comparisons and O(1) expensive comparisons.
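For reference, here is a minimal Python sketch of the plain binary insertion sort this builds on (without the expensive()-aware neighbor search described above, which would replace the single cmp call inside the search loop):

def binary_insertion_sort(items, cmp):
    # cmp(x, y) returns negative, zero, or positive, as in the question.
    # Each insertion performs O(log n) cmp calls to locate its position,
    # so the whole sort makes O(N log N) comparisons.
    result = []
    for x in items:
        lo, hi = 0, len(result)
        while lo < hi:
            mid = (lo + hi) // 2
            if cmp(x, result[mid]) < 0:
                hi = mid
            else:
                lo = mid + 1
        result.insert(lo, x)   # O(n) data movement, but no comparisons
    return result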
Consider a set of possibly large files. In this application the goal is to find duplicate files among them.
If the only goal is to find duplicates, I think sorting (at least comparison sorting) is not necessary. You could just distribute the files between buckets depending on the hash value computed for the first megabyte of data from each file. If there is more than one file in some bucket, take another 10, 100, 1000, ... megabytes. If there is still more than one file in some bucket, compare them byte-by-byte. Actually this procedure is similar to radix sort.
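A rough Python sketch of that bucketing idea (my own; the helper name and the one-megabyte chunk size are illustrative):

import hashlib
from collections import defaultdict

CHUNK = 1024 * 1024  # hash only the first megabyte, as described above

def duplicate_candidates(paths):
    # Group files whose first-chunk hashes collide; only these groups
    # need the expensive deeper comparison (more data, or byte-by-byte).
    buckets = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read(CHUNK)).hexdigest()
        buckets[digest].append(path)
    return [group for group in buckets.values() if len(group) > 1]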
Most sorting algorithms out there try to minimize the number of comparisons during sorting.
My advice:
Pick quicksort as a base algorithm and memoize the results of comparisons in case you happen to compare the same elements again. This should help you in the O(N^2) worst case of quicksort. Bear in mind that this will make you use O(N^2) memory.
Now if you are really adventurous you could try the Dual-Pivot quick-sort.
Something to keep in mind is that if you are continuously sorting the list with new additions, and the comparison between two elements is guaranteed to never change, you can memoize the comparison operation which will lead to a performance increase. In most cases this won't be applicable, unfortunately.
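A small sketch of that memoization idea in Python (mine; it assumes the elements are hashable and that the ordering never changes, as stated above):

from functools import cmp_to_key

def memoized(cmp):
    # Wrap an expensive cmp so each ordered pair is evaluated at most once,
    # across repeated sorts of a growing list.
    cache = {}
    def wrapper(x, y):
        if (x, y) in cache:
            return cache[(x, y)]
        if (y, x) in cache:
            return -cache[(y, x)]
        result = cmp(x, y)
        cache[(x, y)] = result
        return result
    return wrapper

# usage: sorted(items, key=cmp_to_key(memoized(expensive_cmp)))

Within a single sort the same pair is rarely compared twice, so the cache mostly pays off when the list is re-sorted after new additions.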
We can look at your problem from another direction: it seems your problem is IO related, so you can take advantage of parallel sorting algorithms. In fact, you can run many threads to run comparisons on files, then sort them with one of the best-known parallel algorithms, like the sample sort algorithm.
Quicksort and mergesort are the fastest possible sorting algorithms, unless you have some additional information about the elements you want to sort. They will need O(n log(n)) comparisons, where n is the size of your array.
It is mathematically proved that any generic sorting algorithm cannot be more efficient than that.
If you want to make the procedure faster, you might consider adding some metadata to accelerate the computation (can't be more precise unless you are, too).
If you know something stronger, such as the existence of a maximum and a minimum, you can use faster sorting algorithms, such as radix sort or bucket sort.
You can look for all the mentioned algorithms on wikipedia.
As far as I know, you can't benefit from the expensive relationship. Even if you know that, you still need to perform such comparisons. As I said, you'd better try and cache some results.
EDIT: I took some time to think about it, and I came up with a slightly customized solution that I think will make the minimum possible number of expensive comparisons, but totally disregards the overall number of comparisons. It will make at most (n-m)*log(k) expensive comparisons, where
n is the size of the input vector
m is the number of distinct component which are easy to compare between each other
k is the maximum number of elements which are hard to compare and have consecutive ranks.
Here is the description of the algorithm. It's worth noting that it will perform much worse than a simple merge sort, unless m is large and k is small. The total running time is O[n^4 + E(n-m)log(k)], where E is the cost of an expensive comparison (I assumed E >> n, to prevent it from being wiped out of the asymptotic notation). That n^4 can probably be reduced further, at least in the average case.
EDIT: The file I posted contained some errors. While trying it out, I also fixed them (I had overlooked the pseudocode for the insert_sorted function, but the idea was correct). I made a Java program that sorts a vector of integers, with delays added as you described. Even though I was skeptical, it actually does better than mergesort if the delay is significant (I used a 1 s delay against integer comparisons, which usually take nanoseconds to execute).

have I invented a new sorting algorithm? or is this the same as quicksort

I made an algorithm for sorting, but then I thought perhaps I had just reinvented quicksort.
However, I heard quicksort is O(N^2) worst case; I think my algorithm should be only O(N log N) worst case.
Is this the same as quicksort?
The algorithm works by swapping values so that all values smaller than the median are moved to the left of the array. It then works recursively on each side.
The algorithm starts with i=0, j = n-1
i and j move towards each other with list[i] and list[j] being swapped if necessary.
Here is some code for the first iteration before the recursion:
_list = [1, -4, 2, -5, 3, -6]

def in_place(_list, i, j, median):
    while i < j:
        a, b = _list[i], _list[j]
        if a < median and b >= median:
            i += 1
            j -= 1
        elif a >= median and b < median:
            _list[i], _list[j] = b, a
            i += 1
            j -= 1
        elif a < median:
            i += 1
        else:
            j -= 1
    print("changed to ", _list)

def get_median(_list):
    # approximate median in O(N) with O(1) space
    return -4

median = get_median(_list)
in_place(_list, 0, len(_list) - 1, median)
"""
changed to  [-6, -5, 2, -4, 3, 1]
"""
http://en.wikipedia.org/wiki/Quicksort#Selection-based_pivoting
Conversely, once we know a worst-case O(n) selection algorithm is
available, we can use it to find the ideal pivot (the median) at every
step of quicksort, producing a variant with worst-case O(n log n)
running time. In practical implementations, however, this variant is
considerably slower on average.
Another variant is to choose the Median of Medians as the pivot
element instead of the median itself for partitioning the elements.
While maintaining the asymptotically optimal run time complexity of
O(n log n) (by preventing worst case partitions), it is also
considerably faster than the variant that chooses the median as pivot.
For starters, I assume there is other code not shown, as I'm pretty sure that the code you've shown on its own would not work.
I'm sorry to steal your fire, but I'm afraid what code you do show seems to be Quicksort, and not only that, but the code seems to possibly suffer from some bugs.
Consider the case of sorting a list of identical elements. Your in_place method, which seems to be what is traditionally called partition in Quicksort, would not move any elements, but at the end i and j seem to reflect the list having only one partition containing the whole list, in which case you would recurse again on the whole list forever. My guess is, as mentioned, that you don't return anything from it, and you don't seem to actually fully sort anywhere, so I am left guessing how this would be used.
I'm afraid using the real median for Quicksort is not only a possibly fairly slow strategy in the average case, it also doesn't avoid the O(n^2) worst case; again, a list of identical elements would provide such a worst case. However, I think a three-way partition Quicksort with such a median selection algorithm would guarantee O(n*log n) time. Nonetheless, this is a known option for pivot choice and not a new algorithm.
In short, this appears to be an incomplete and possibly buggy Quicksort, and without three way partitioning, using the median would not guarantee you O(n*log n). However, I do feel that it is a good thing and worth congratulations that you did think of the idea of using the median yourself - even if it has been thought of by others before.
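For completeness, a minimal Python sketch (names mine) of the three-way partition referred to above; grouping the elements equal to the pivot is what prevents a run of identical elements from triggering the quadratic behaviour:

def three_way_partition(a, pivot):
    # Dutch-national-flag partition: rearranges a in place and returns
    # (lo, hi) such that a[:lo] < pivot, a[lo:hi] == pivot, a[hi:] > pivot.
    lo, mid, hi = 0, 0, len(a)
    while mid < hi:
        if a[mid] < pivot:
            a[lo], a[mid] = a[mid], a[lo]
            lo += 1
            mid += 1
        elif a[mid] > pivot:
            hi -= 1
            a[mid], a[hi] = a[hi], a[mid]
        else:
            mid += 1
    return lo, hi

A quicksort that recurses only on a[:lo] and a[hi:] never revisits the equal block, so an all-equal input is handled in a single pass.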

Why is Binary Search a divide and conquer algorithm?

I was asked if a Binary Search is a divide and conquer algorithm at an exam. My answer was yes, because you divided the problem into smaller subproblems, until you reached your result.
But the examiners asked where the conquer part of it was, which I was unable to answer. They also disputed that it actually was a divide and conquer algorithm.
But everywhere I go on the web, it says that it is, so I would like to know why, and where the conquer part of it is?
The book:
Data Structures and Algorithm Analysis in Java (2nd Edition), by Mark Allen Weiss
Says that a D&C algorithm should have two disjoint recursive calls, just like QuickSort does.
Binary Search does not have this, even though it can be implemented recursively.
I think it is not divide and conquer; see the first paragraph of http://en.wikipedia.org/wiki/Divide_and_conquer_algorithm
recursively breaking down a problem into two or more sub-problems
which are then combined to give a solution
In binary search there is still only one subproblem, which just reduces the data by half at every step, so no conquer (merging) phase of the results is needed.
It isn't.
To complement @Kenci's post, DnC algorithms have a few general/common properties; they:
divide the original problem instance into a set of smaller sub-instances of itself;
independently solve each sub-instance;
combine smaller/independent sub-instance solutions to build a single solution for the larger/original instance
The problem with Binary Search is that it does not really even generate a set of independent sub-instances to be solved, as per step 1; it only simplifies the original problem by permanently discarding sections it's not interested in. In other words, it only reduces the problem's size and that's as far as it ever goes.
A DnC algorithm is supposed to not only identify/solve the smaller sub-instances of the original problem independently of each other, but also use that set of partial independent solutions to "build up" a single solution for the larger problem instance as a whole.
The book Fundamentals of Algorithmics, G. Brassard, P. Bratley says the following (bold my emphasis, italics in original):
It is probably the simplest application of divide-and-conquer, so simple in fact that strictly speaking this is an application of simplification rather than divide-and-conquer: the solution to any sufficiently large instance is reduced to that of a single smaller one, in this case of half size.
Section 7.3 Binary Search on p.226.
In a divide and conquer strategy :
1. The problem is divided into parts;
2. Each of these parts is attacked/solved independently, by applying the algorithm at hand (mostly recursion is used for this purpose);
3. And then the solutions of each partition/division are combined/merged together to arrive at the final solution to the problem as a whole (this comes under conquer).
Examples: quick sort, merge sort.
Basically, the binary search algorithm just divides its work space (an input (ordered) array of size n) in half in each iteration. Therefore it is definitely deploying the divide strategy, and as a result the time complexity reduces to O(lg n). So this covers the "divide" part of it.
As can be noticed, the final solution is obtained from the last comparison made, that is, when we are left with only one element for comparison.
Binary search does not merge or combine solutions.
In short, binary search halves the size of the problem it has to work on, but it doesn't find the solution in bits and pieces, and hence no merging of partial solutions is needed!
I know it's a bit too lengthy, but I hope it helps :)
Also you can get some idea from : https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/running-time-of-binary-search
Also, I realised just now that this question was posted a long time ago!
My bad!
Apparently some people consider binary search a divide-and-conquer algorithm, and some do not. I quickly googled three references (all seem related to academia) that call it a D&C algorithm:
http://www.cs.berkeley.edu/~vazirani/algorithms/chap2.pdf
http://homepages.ius.edu/rwisman/C455/html/notes/Chapter2/DivConq.htm
http://www.csc.liv.ac.uk/~ped/teachadmin/algor/d_and_c.html
I think it's commonly agreed that a D&C algorithm should have at least the first two phases of these three:
divide, i.e. decide how the whole problem is separated into sub-problems;
conquer, i.e. solve each of the sub-problems independently;
[optionally] combine, i.e. merge the results of independent computations together.
The second phase - conquer - should recursively apply the same technique to solve the subproblem by dividing it into even smaller sub-sub-problems, and so on. In practice, however, some threshold is often used to limit the recursive approach, since for small problems a different approach might be faster. For example, quicksort implementations often use, e.g., bubble sort when the size of the array portion to sort becomes small.
The third phase might be a no-op, and in my opinion it does not disqualify an algorithm as D&C. A common example is recursive decomposition of a for-loop with all iterations working purely on independent data items (i.e. no reduction of any form). It might look useless at a glance, but in fact it's a very powerful way to e.g. execute the loop in parallel, and it is utilized by such frameworks as Cilk and Intel's TBB.
Returning to the original question: let's consider some code that implements the algorithm (I use C++; sorry if this is not the language you are comfortable with):
int search( int value, int* a, int begin, int end ) {
    // end is one past the last element, i.e. [begin, end) is a half-open interval.
    if (begin < end)
    {
        int m = (begin+end)/2;
        if (value==a[m])
            return m;
        else if (value<a[m])
            return search(value, a, begin, m);
        else
            return search(value, a, m+1, end);
    }
    else // begin>=end, i.e. no valid array to search
        return -1;
}
Here the divide part is int m = (begin+end)/2; and all the rest is the conquer part. The algorithm is explicitly written in a recursive D&C form, even though only one of the branches is taken. However, it can also be written in a loop form:
int search( int value, int* a, int size ) {
    int begin=0, end=size;
    while( begin<end ) {
        int m = (begin+end)/2;
        if (value==a[m])
            return m;
        else if (value<a[m])
            end = m;
        else
            begin = m+1;
    }
    return -1;
}
I think it's quite a common way to implement binary search with a loop; I deliberately used the same variable names as in the recursive example, so that commonality is easier to see. Therefore we might say that, again, calculating the midpoint is the divide part, and the rest of the loop body is the conquer part.
But of course if your examiners think differently, it might be hard to convince them it's D&C.
Update: I just had a thought that if I were to develop a generic skeleton implementation of a D&C algorithm, I would certainly use binary search as one of the API suitability tests to check whether the API is sufficiently powerful while also concise. Of course it does not prove anything :)
The Merge Sort and Quick Sort algorithms use the divide and conquer technique (because there are 2 sub-problems) and Binary Search comes under decrease and conquer (because there is 1 sub-problem).
Therefore, Binary Search actually uses the decrease and conquer technique and not the divide and conquer technique.
Source: https://www.geeksforgeeks.org/decrease-and-conquer/
Binary search is tricky to describe with divide-and-conquer because the conquering step is not explicit. The result of the algorithm is the index of the needle in the haystack, and a pure D&C implementation would return the index of the needle in the smallest haystack (0 in the one-element list) and then recursively add the offsets in the larger haystacks that were divided in the division step.
Pseudocode to explain:
function binary_search has arguments needle and haystack and returns index
    if haystack has size 1
        return 0
    else
        divide haystack into upper and lower half
        if needle is smaller than smallest element of upper half
            return 0 + binary_search needle, lower half
        else
            return size of lower half + binary_search needle, upper half
The addition (0 + or size of lower half) is the conquer part. Most people skip it by providing indices into a larger list as arguments, and thus it is often not readily available.
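Here is a runnable Python version of that pseudocode (my own; it adds a -1 "not found" result, which the pseudocode above leaves out), with the offset additions marking the conquer step:

def binary_search(needle, haystack):
    # haystack must be sorted; returns the index of needle, or -1 if absent.
    if not haystack:
        return -1
    if len(haystack) == 1:
        return 0 if haystack[0] == needle else -1
    mid = len(haystack) // 2
    lower, upper = haystack[:mid], haystack[mid:]
    if needle < upper[0]:
        sub = binary_search(needle, lower)
        return sub if sub == -1 else 0 + sub            # conquer: lower-half offset
    else:
        sub = binary_search(needle, upper)
        return sub if sub == -1 else len(lower) + sub   # conquer: upper-half offset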
The divide part is of course dividing the set into halves.
The conquer part is determining whether and on what position in the processed part there is a searched element.
Dichotomic in computer science refers to choosing between two antithetical choices, between two distinct alternatives. A dichotomy is any splitting of a whole into exactly two non-overlapping parts, meaning it is a procedure in which a whole is divided into two parts. It is a partition of a whole (or a set) into two parts (subsets) that are:
1. Jointly Exhaustive: everything must belong to one part or the other, and
2. Mutually Exclusive: nothing can belong simultaneously to both parts.
Divide and conquer works by recursively breaking down a problem into two or more sub-problems of the same type, until these become simple enough to be solved directly.
So binary search halves the number of items to check with each iteration and determines if it has a chance of locating the "key" item in that half, or moves on to the other half if it is able to determine the key's absence. As the algorithm is dichotomic in nature, the binary search will believe that the "key" has to be in one part until it reaches the exit condition, where it returns that the key is missing.
Divide and Conquer algorithm is based on 3 step as follows:
Divide
Conquer
Combine
Binary Search problem can be defined as finding x in the sorted array A[n].
According to this information:
Divide: compare x with middle
Conquer: Recurse in one sub array. (Finding x in this array)
Combine: it is not necessary.
A proper divide and conquer algorithm will require both parts to be processed.
Therefore, many people will not call binary search a divide and conquer algorithm; it does divide the problem, but it discards one half.
But most likely, your examiners just wanted to see how you argue. (Good) exams aren't about the facts, but about how you react when the challenge goes beyond the original material.
So IMHO the proper answer would have been:
Well, technically, it consists only of a divide step, but needs to conquer only half of the original task then, the other half is trivially done already.
BTW: there is a nice variation of QuickSort, called QuickSelect, which actually exploits this difference to obtain an on average O(n) median search algorithm. It's like QuickSort - but descends only into the half it is interested in.
Binary Search is not a divide and conquer approach. It is a decrease and conquer approach.
In the divide and conquer approach, each subproblem must contribute to the solution, but in binary search not every subdivision contributes to the solution. We divide into two parts and discard one part, because we know that the solution does not exist in that part, and look for the solution only in the other part.
The informal definition is more or less: Divide the problem into small problems. Then solve them and put them together (conquer). Solving is in fact deciding where to go next (left, right, element found).
Here a quote from wikipedia:
The name "divide and conquer" is sometimes applied also to algorithms that reduce each problem to only one subproblem, such as the binary search algorithm for finding a record in a sorted list.
This states, it's NOT [update: misread this phrase:)] only one part of divide and conquer.
Update:
This article made it clear for me. I was confused since the definition says you have to solve every sub-problem. But you have solved the sub-problem if you know you don't have to keep on searching.
The Binary Search is a divide and conquer algorithm:
1) In Divide and Conquer algorithms, we try to solve a problem by solving a smaller sub problem (Divide part) and use the solution to build the solution for our bigger problem(Conquer).
2) Here our problem is to find an element in the sorted array. We can solve this by solving a similar sub-problem. (We are creating sub-problems here based on a decision that the element being searched for is smaller or bigger than the middle element.) Thus, once we know that the element cannot exist in one half, we solve a similar sub-problem in the other half.
3) This way we recurse.
4) The conquer part here is just returning the value returned by the sub-problem to the top of the recursive tree.
I think it is Decrease and Conquer.
Here is a quote from wikipedia.
"The name decrease and conquer has been proposed instead for the
single-subproblem class"
http://en.wikipedia.org/wiki/Divide_and_conquer_algorithms#Decrease_and_conquer
According to my understanding, the "conquer" part is at the end, when you find the target element of the binary search. The "decrease" part is reducing the search space.
Binary Search and Ternary Search algorithms are based on the decrease and conquer technique, because you do not divide the problem; you actually decrease the problem by dividing it by 2 (by 3 in ternary search).
Merge Sort and Quick Sort algorithms can be given as examples of the divide and conquer technique. You divide the problem into two subproblems and use the algorithm on those subproblems again to sort an array. But you discard half of the array in binary search. That means you DECREASE the size of the array, rather than dividing it.
No, binary search is not divide and conquer. Yes, binary search is decrease and conquer. I believe divide and conquer algorithms have an efficiency of O(n log(n)) while decrease and conquer algorithms have an efficiency of O(log(n)). The difference being whether or not you need to evaluate both parts of the split in data or not.

Is it possible to calculate median of a list of numbers better than O(n log n)?

I know that it is possible to calculate the mean of a list of numbers in O(n). But what about the median? Is there any better algorithm than sort (O(n log n)) and lookup middle element (or mean of two middle elements if an even number of items in list)?
Yes. You can do it (deterministically) in O(n).
What you're talking about is a selection algorithm, where k = n/2. There is a method based on the same partitioning function used in quicksort which works. It is called, not surprisingly, quickselect. While it can, like quicksort, have an O(n^2) worst case, this can be brought down to linear time using the proper pivot selection.
Partially irrelevant, but: a quick tip on how to quickly find answers to common basic questions like this on the web.
We're talking about medians? So go to the page about medians on Wikipedia.
Search the page for "algorithm":
Efficient computation of the sample median
Even though sorting n items takes in general O(n log n) operations, by using a "divide and conquer" algorithm the median of n items can be computed with only O(n) operations (in fact, you can always find the k-th element of a list of values with this method; this is called the selection problem).
Follow the link to the selection problem for the description of algorithm. Read intro:
... There are worst-case linear time selection algorithms. ...
And if you're interested read about the actual ingenious algorithm.
If the numbers are discrete (e.g. integers) and there is a manageable number of distinct values, you can use a "bucket sort" which is O(N), then iterate over the buckets to figure out which bucket holds the median. The complete calculation is O(N) in time and O(B) in space.
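A short Python sketch of that bucket-counting approach (mine; it returns the lower median for an even number of items):

def median_of_bounded_ints(values):
    # O(N + B) time and O(B) space, where B is the size of the value range.
    lo, hi = min(values), max(values)
    counts = [0] * (hi - lo + 1)
    for v in values:
        counts[v - lo] += 1
    midpoint = (len(values) - 1) // 2   # index of the (lower) median
    seen = 0
    for i, c in enumerate(counts):
        seen += c
        if seen > midpoint:
            return lo + i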
Just for fun (and who knows, it may be faster) there's another randomized median algorithm, explained technically in Mitzenmacher's and Upfal's book. Basically, you choose a polynomially-smaller subset of the list such that (with some fancy bookwork) it probably contains the real median, and then use it to find the real median. The book is on Google Books, and here's a link. Note: I was able to read the pages of the algorithm, so assuming that Google Books reveals the same pages to everyone, you can read them too.
It is a randomized algorithm s.t. if it finds the answer, it is 100% certain that it is the correct answer (this is called Las Vegas style). The randomness arises from the runtime --- occasionally (with probability 1/(sqrt(n)), I think) it FAILS to find the median, and must be re-run.
Asymptotically, it is exactly linear when you take into account the chance of failure --- that is to say, it is a wee bit less than linear, exactly such that when you take into account the number of times you may need to re-run it, it becomes linear.
Note: I'm not saying this is better or worse --- I certainly haven't done a real-life runtime comparison between these algorithms! I'm simply presenting an additional algorithm that has linear runtime, but works in a significantly different way.
This link has popped up recently on calculating median: http://matpalm.com/median/question.html .
In general I think you can't go beyond O(n log n) time, but I don't have any proof on that :). No matter how much you make it parallel, aggregating the results into a single value takes at least log n levels of execution.
Try the randomized algorithm: the sampling size (e.g. 2000) is independent from the data size n, and you can still get sufficiently high (99%) accuracy. If you need higher accuracy, just increase the sampling size. Using a Chernoff bound, you can prove the probability bound for a given sampling size. I've written some JavaScript code to implement the algorithm; feel free to take it. http://www.sfu.ca/~wpa10
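The linked code is JavaScript; as a language-neutral illustration of the sampling idea (my own sketch, not the linked implementation), the whole approach fits in a few lines of Python:

import random
import statistics

def approximate_median(values, sample_size=2000):
    # The sample size is fixed and independent of len(values);
    # accuracy improves as the sample grows.
    if len(values) <= sample_size:
        return statistics.median(values)
    sample = random.sample(values, sample_size)
    return statistics.median(sample)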

Resources