I want to sort a list of n items from pairwise comparisons. Each round, I receive k comparisons, one each from k different "arbiters".
The arbiters cannot coordinate, and must choose their comparisons independently from myself and each other. How should they choose their comparisons so that I can sort the list of items in as few rounds as possible?
A naive solution is that each arbiter independently runs quicksort, sending over the comparisons they make. Ultimately, I'd just be waiting for one arbiter to finish sorting, so this would take O(n*log(n)) rounds for me to sort the list, and I would receive no benefit from having k arbiters over just a single arbiter.
Another naive solution is for each arbiter to independently send over random comparisons. This would result in a coupon collector problem, and would take on average O(n^2*log(n)/k) rounds for me to get the right comparisons to sort the list. But unless k is in ω(n), this run-time isn't better than O(n*log(n)).
Is there a better solution? Maybe one that uses O(n*log(n)/k) rounds? (i.e. double the arbiters = half the rounds needed)
To be more concrete about the independence of arbiters: ideally, the arbiters would use symmetric randomized strategies. If that's not possible, though, then I'll allow the arbiters to have a strategy meeting one time only at the start.
Also, arbiters have to send exactly the comparisons they make. E.g. an arbiter cannot just sort the entire list themselves, and then send only the comparisons (arr[i] < arr[i+1]) for i=0 to n-2. They have to send each comparison they make as they make it.
I think I have a solution. It's totally symmetric with no coordination.
Each arbiter runs the following algorithm to choose their comparisons.
First, they consider evenly dividing n into k "sections" (e.g. section 1 is from 1 to n/k, section 2 is from n/k+1 to 2n/k, etc.). Then, they randomly choose one of k sections to focus on. Next, they use quickselect to get all the elements that will end up in their section (which takes O(n) comparisons). Finally, they fully sort their section (which takes O(n*log(n)/k) comparisons). After that, they randomly choose another section and repeat.
To get all k sections sorted, it's a coupon collector problem, and takes k*log(k) tries, which, when distributed over k arbiters, will only take log(k) tries.
So the final number of rounds of this approach will be O(n*log(k) + n*log(n)*log(k)/k). When k is O(log(n)), the second term dominates and this is O(n*log(n)*log(k)/k), which gives you an improvement factor of log(k)/k; for larger k, the O(n*log(k)) quickselect term dominates instead. The log(k) factor can be removed if the arbiters are allowed to choose at the start which sections they're in charge of.
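As a rough check on the coupon-collector step, the expected number of uniformly random picks needed to see all k sections is k*H_k, where H_k is the k-th harmonic number (roughly ln(k)). The sketch below computes that value exactly. The function name is made up for illustration, and dividing by k to get a per-arbiter figure is only a heuristic, since it ignores the fact that I have to wait for the unluckiest section.

```python
from fractions import Fraction

def expected_picks_to_cover(k):
    """Expected number of uniform random picks until all k sections are seen."""
    # Classic coupon collector result: k * (1 + 1/2 + ... + 1/k).
    return k * sum(Fraction(1, i) for i in range(1, k + 1))

for k in (2, 8, 64):
    total = expected_picks_to_cover(k)
    # Heuristic per-arbiter share when k arbiters pick in parallel.
    print(k, float(total), float(total / k))
```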
If they can coordinate their own numbers.
Here's an idea. First, partition the array in k parts as for quicksort. First have everybody partition the entire array in 2 synchronously. Then let the first half of the arbiters partition the first half of the array while the second half of the arbiters partition the second half of the array, and so on. This takes no more than 2n rounds. Next, let arbiter number i sort the partition number i. This takes (n/k) log (n/k) rounds.
If they cannot coordinate their own numbers.
The same idea but it takes longer. You still have them partition the array in k parts, only now they have to do it all the way down synchronously, so you don't benefit from having k arbiters at this stage. This takes n log k rounds. Then let each arbiter sort each partition in randomly chosen order. I'm too lazy to calculate the expected number of rounds here. Off the top of my head, you will need about e (n/k) log (n/k) rounds, but don't quote me.
Another way if they cannot coordinate their own numbers.
Almost the same as above, but have each arbiter partition in a random order (first partition in half, then randomly choose which half to partition further etc). Sort as soon as there is a partition of size n/k, then randomly select what to partition next. This is harder to analyse than the above but should be faster.
I've been comparing the run times of various pivot selection algorithms. Surprisingly the simplest one where the first element is always chosen is the fastest. This may be because I'm filling the array with random data.
If the array has been randomized (shuffled), does it matter? For example, picking the median of 3 as the pivot is always(?) better than picking the first element as the pivot. But this isn't what I've noticed. Is it because, if the array is already randomized, there is no reason to assume sortedness, and using the median assumes there is some degree of sortedness?
The worst case runtime of quicksort is O(n²). Quicksort is a fast sorting algorithm only in the average case.
To reach an average runtime of O(n log n) you have to choose a random pivot element.
But instead of choosing a random pivot element, you can shuffle the list and choose the first element.
To see that this holds, look at it this way: let's say all elements are in a specific order. Shuffling means you apply a random permutation to the list of elements, so a random element will be at the first position, and likewise at every other position. You can also see it by shuffling the list as follows: randomly choose one of all the elements for the first position, then randomly choose one of the remaining (not yet chosen) elements for the second position, and so on.
If your list is already a randomly generated list, you can directly choose the first element as the pivot, without shuffling again.
So, choosing the first element is the fastest because of the randomly generated input, but choosing the third or the last element will be just as fast as choosing the first.
All other ways to choose a pivot element have to compute something (a median or a random number or something like this), but on such input they have no advantage over a random choice.
A substantially late response, but I believe it will add some additional info.
"Surprisingly the simplest one where the first element is always chosen is the fastest."
This is actually not surprising at all, since you mentioned that you test the algorithm with random data. In reality, the percentage of almost-sorted and sorted data is much greater than would statistically be expected. Take chronological data, for example: when you collect it into a log file, some elements can be out of order, but most of them are already sorted. Unfortunately, a Quicksort implementation that takes the first (or last) element as a pivot is vulnerable to such input and degenerates into O(n^2) complexity, because in the partition step you divide your array into two parts of size 1 and n-1, and therefore you get n levels of partitioning instead of log n on average.
That's why people decided to add some sort of randomization that makes the probability of getting the problematic input as small as possible. There are three well-known approaches:
shuffle the input - to quote Robert Sedgewick, "the probability of getting O(n^2) performance with such approach is lower than the probability that you will be hit by a thunderstrike" :)
choose the pivot element randomly - Wikipedia says that on average, the expected number of comparisons in this case is 1.386 n log n
choose the pivot element as a median of three - Wikipedia says that on average, the expected number of comparisons in this case is 1.188 n log n
However, randomization costs. If you shuffle the input array, that is O(n), which is dominated by O(n log n), but you need to take into account the cost of invoking the random(..) method n times. With your simple approach, that is avoided and it is thus faster.
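To make the sorted-input problem concrete, here is a small sketch that counts comparisons for two pivot rules on an already sorted list. It is a simplified, not in-place quicksort written for illustration only; the function names are my own, and it charges one comparison per element per partitioning step.

```python
def quicksort_comparisons(items, choose_pivot):
    """Sort a list with quicksort; return (sorted_list, number_of_comparisons)."""
    if len(items) <= 1:
        return list(items), 0
    idx = choose_pivot(items)
    pivot = items[idx]
    rest = items[:idx] + items[idx + 1:]
    # One comparison against the pivot is charged per remaining element.
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    left, c_left = quicksort_comparisons(smaller, choose_pivot)
    right, c_right = quicksort_comparisons(larger, choose_pivot)
    return left + [pivot] + right, len(rest) + c_left + c_right

def first_element(items):
    return 0

def median_of_three(items):
    # Index of the median among the first, middle and last elements.
    candidates = [0, len(items) // 2, len(items) - 1]
    return sorted(candidates, key=lambda i: items[i])[1]

already_sorted = list(range(200))
_, worst = quicksort_comparisons(already_sorted, first_element)
_, better = quicksort_comparisons(already_sorted, median_of_three)
print(worst, better)
```

On sorted input the first-element rule does n(n-1)/2 comparisons, while median of three keeps the splits balanced.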
See also:
Worst case for Quicksort - when can it occur?
Given an array where number of occurrences of each number is odd except one number whose number of occurrences is even. Find the number with even occurrences.
e.g.
1, 1, 2, 3, 1, 2, 5, 3, 3
Output should be:
2
The below are the constraints:
Numbers are not in range.
Do it in-place.
Required time complexity is O(N).
Array may contain negative numbers.
Array is not sorted.
With the above constraints, all my thoughts failed: comparison based sorting, counting sort, BST's, hashing, brute-force.
I am curious to know: Will XORing work here? If yes, how?
This problem has been occupying my subway rides for several days. Here are my thoughts.
If A. Webb is right and this problem comes from an interview or is some sort of academic problem, we should think about the (wrong) assumptions we are making, and maybe try to explore some simple cases.
The two extreme subproblems that come to mind are the following:
The array contains two values: one of them is repeated an even number of times, and the other is repeated an odd number of times.
The array contains n-1 different values: all values are present once, except one value that is present twice.
Maybe we should split cases by complexity of number of different values.
If we suppose that the number of different values is O(1), each array would have m different values, with m independent from n. In this case, we could loop through the original array erasing and counting occurrences of each value. In the example it would give
1, 1, 2, 3, 1, 2, 5, 3, 3 -> First value is 1 so count and erase all 1
2, 3, 2, 5, 3, 3 -> Second value is 2, count and erase
-> Stop because 2 was found an even number of times.
This would solve the first extreme example with a complexity of O(mn), which evaluates to O(n).
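A minimal sketch of this count-and-erase loop, assuming the number of distinct values m is small. For readability it builds new lists instead of erasing in place, so it illustrates the O(mn) time bound but not the in-place constraint; the function name is made up.

```python
def find_even_by_erasing(values):
    """Repeatedly count and erase the first remaining value."""
    remaining = list(values)
    while remaining:
        head = remaining[0]
        count = sum(1 for v in remaining if v == head)
        if count % 2 == 0:
            return head
        # Erase every occurrence of head; one pass per distinct value.
        remaining = [v for v in remaining if v != head]
    return None  # no value occurs an even number of times

print(find_even_by_erasing([1, 1, 2, 3, 1, 2, 5, 3, 3]))
```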
There's better: if the number of different values is O(1), we could count value appearances inside a hash map, go through them after reading the whole array and return the one that appears an even number of times. This would still be considered O(1) memory.
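The hash map variant could look like the following sketch. It uses Python's `collections.Counter` as the map; the function name is made up, and the memory is only O(1) under the stated assumption that the number of distinct values does not grow with n.

```python
from collections import Counter

def find_even_with_counts(values):
    """Count every value, then return one with an even count (or None)."""
    counts = Counter(values)
    for value, count in counts.items():
        if count % 2 == 0:
            return value
    return None

print(find_even_with_counts([1, 1, 2, 3, 1, 2, 5, 3, 3]))
```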
The second extreme case would consist in finding the only repeated value inside an array.
This seems impossible in O(n), but there are special cases where we can: if the array has n elements and the values inside are 1 to n-1 plus the repeated value (or some variant like all numbers between x and y). In this case, we sum all the values, subtract n(n-1)/2 from the sum, and retrieve the repeated value.
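For that special case the arithmetic is short enough to write out. This sketch assumes the array holds exactly the values 1 to n-1 once each, plus one extra copy of one of them; the function name is made up.

```python
def find_repeated_in_range(values):
    """values contains 1..n-1 once each plus one repeated value (n items)."""
    n = len(values)
    # 1 + 2 + ... + (n-1) == n*(n-1)/2, so the surplus is the repeated value.
    return sum(values) - n * (n - 1) // 2

print(find_repeated_in_range([3, 1, 2, 4, 2]))
```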
Solving the second extreme case with random values inside the array, or the general case where m is not constant on n, in constant memory and O(n) time seems impossible to me.
Extra note: here, XORing doesn't work because the number we want appears an even number of times and others appear an odd number of times. If the problem was "give the number that appears an odd number of times, all other numbers appear an even number of times" we could XOR all the values and find the odd one at the end.
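A short sketch of why XOR fits the reversed problem but not this one: values that occur an even number of times cancel out, so the result is the XOR of all values with odd counts. The function name is made up.

```python
from functools import reduce
from operator import xor

def xor_all(values):
    """XOR every value together; pairs of equal values cancel to 0."""
    return reduce(xor, values, 0)

# Reversed problem: one value has an odd count, the rest are even.
print(xor_all([4, 4, 7, 9, 9]))
# Original problem: 1, 3 and 5 have odd counts, so we get 1 ^ 3 ^ 5, not 2.
print(xor_all([1, 1, 2, 3, 1, 2, 5, 3, 3]))
```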
We could try to look for a method using this logic: we would need something like a function, that applied an odd number of times on a number would yield 0, and an even number of times would be identity. Don't think this is possible.
Introduction
Here is a possible solution. It is rather contrived and not practical, but then, so is the problem. I would appreciate any comments if I have holes in my analysis. If this was a homework or challenge problem with an “official” solution, I’d also love to see that if the original poster is still about, given that more than a month has passed since it was asked.
First, we need to flesh out a few ill-specified details of the problem. Time complexity required is O(N), but what is N? Most commentators appear to be assuming N is the number of elements in the array. This would be okay if the numbers in the array were of fixed maximum size, in which case Michael G’s solution of radix sort would solve the problem. But, I interpret constraint #1, in absence of clarification by the original poster, as saying the maximum number of digits need not be fixed. Therefore, if n (lowercase) is the number of elements in the array, and m the average length of the elements, then the total input size to contend with is mn. A lower bound on the solution time is Ω(mn) because this is the read-through time of the input needed to verify a solution. So, we want a solution that is linear with respect to total input size N = nm.
For example, we might have n = m, that is sqrt(N) elements of sqrt(N) average length. A comparison sort would take O( log(N) sqrt(N) ) < O(N) operations, but this is not a victory, because the operations themselves on average take O(m) = O(sqrt(N)) time, so we are back to O( N log(N) ).
Also, a radix sort would take O(mn) = O(N) if m were the maximum length instead of average length. The maximum and average length would be on the same order if the numbers were assumed to fall in some bounded range, but if not we might have a small percentage with a large and variable number of digits and a large percentage with a small number of digits. For example, 10% of the numbers could be of length m^1.1 and 90% of length m*(1-10%*m^0.1)/90%. The average length would be m, but the maximum length m^1.1, so the radix sort would be O(m^1.1 n) > O(N).
Lest there be any concern that I have changed the problem definition too dramatically, my goal is still to describe an algorithm with time complexity linear to the number of elements, that is O(n). But, I will also need to perform operations of linear time complexity on the length of each element, so that on average over all the elements these operations will be O(m). Those operations will be multiplication and addition needed to compute hash functions on the elements and comparison. And if indeed this solution solves the problem in O(N) = O(nm), this should be optimal complexity as it takes the same time to verify an answer.
One other detail omitted from the problem definition is whether we are allowed to destroy the data as we process it. I am going to do so for the sake of simplicity, but I think with extra care it could be avoided.
Possible Solution
First, the constraint that there may be negative numbers is an empty one. With one pass through the data, we will record the minimum element, z, and the number of elements, n. On a second pass, we will add (3-z) to each element, so the smallest element is now 3. (Note that a constant number of numbers might overflow as a result, so we should do a constant number of additional passes through the data first to test these for solutions.) Once we have our solution, we simply subtract (3-z) to return it to its original form. Now we have available three special marker values 0, 1, and 2, which are not themselves elements.
Step 1
Use the median-of-medians selection algorithm to determine the 90th percentile element, p, of the array A and partition the array into two sets S and T, where S has the 10% of n elements greater than p and T has the elements less than p. This takes O(n) steps (with each step taking O(m) time on average, for O(N) total). Elements matching p could be placed either into S or T, but for the sake of simplicity, run through the array once, test for p, and eliminate it by replacing it with 0. Set S originally spans indexes 0..s, where s is about 10% of n, and set T spans the remaining 90% of indexes s+1..n.
Step 2
Now we are going to loop through i in 0..s and for each element e_i we are going to compute a hash function h(e_i) into s+1..n. We’ll use universal hashing to get uniform distribution. So, our hashing function will do multiplication and addition and take linear time on each element with respect to its length.
We’ll use a modified linear probing strategy for collisions:
1. h(e_i) is occupied by a member of T (meaning A[ h(e_i) ] < p but is not a marker 1 or 2) or is 0. This is a hash table miss. Insert e_i by swapping elements from slots i and h(e_i).
2. h(e_i) is occupied by a member of S (meaning A[ h(e_i) ] > p) or markers 1 or 2. This is a hash table collision. Do linear probing until either encountering a duplicate of e_i or a member of T or 0.
If a member of T, this is again a hash table miss, so insert e_i as in (1.) by swapping to slot i.
If a duplicate of e_i, this is a hash table hit. Examine the next element. If that element is 1 or 2, we’ve seen e_i more than once already, so change 1s into 2s and vice versa to track its change in parity. If the next element is not 1 or 2, then we’ve only seen e_i once before. We want to store a 2 into the next element to indicate we’ve now seen e_i an even number of times. We look for the next “empty” slot, that is one occupied by a member of T which we’ll move to slot i, or a 0, and shift the elements back up to index h(e_i)+1 down so we have room next to h(e_i) to store our parity information. Note we do not need to store e_i itself again, so we’ve used up no extra space.
So basically we have a functional hash table with 9-fold the number of slots as elements we wish to hash. Once we start getting hits, we begin storing parity information as well, so we may end up with only 4.5-fold the number of slots, still a very low load factor. There are several collision strategies that could work here, but since our load factor is low, the average number of collisions should also be low and linear probing should resolve them with suitable time complexity on average.
Step 3
Once we finished hashing elements of 0..s into s+1..n, we traverse s+1..n. If we find an element of S followed by a 2, that is our goal element and we are done. Any element e of S followed by another element of S indicates e was encountered only once and can be zeroed out. Likewise e followed by a 1 means we saw e an odd number of times, and we can zero out the e and the marker 1.
Rinse and Repeat as Desired
If we have not found our goal element, we repeat the process. Our 90th percentile partition will move the 10% of n remaining largest elements to the beginning of A and the remaining elements, including the empty 0-marker slots to the end. We continue as before with the hashing. We have to do this at most 10 times as we process 10% of n each time.
Concluding Analysis
Partitioning via the median-of-medians algorithm has time complexity of O(N), which we do 10 times, still O(N). Each hash operation takes O(1) on average since the hash table load is low, and there are O(n) hash operations in total performed (about 10% of n for each of the 10 repetitions). Each of the n elements has a hash function computed for it, with time complexity linear in its length, so on average over all the elements O(m). Thus, the hashing operations in aggregate are O(mn) = O(N). So, if I have analyzed this properly, then on the whole this algorithm is O(N)+O(N)=O(N). (It is also O(n) if operations of addition, multiplication, comparison, and swapping are assumed to be constant time with respect to input.)
Note that this algorithm does not utilize the special nature of the problem definition that only one element has an even number of occurrences. That we did not utilize this special nature of the problem definition leaves open the possibility that a better (more clever) algorithm exists, but it would ultimately also have to be O(N).
See the following article: Sorting algorithm that runs in time O(n) and also sorts in place.
Assuming that the maximum number of digits is constant, we can sort the array in place in O(n) time.
After that, it is a matter of counting each number's appearances, which will take on average n/2 time to find the one number whose number of occurrences is even.
Is anybody able to give a 'plain English' intuitive, yet formal, explanation of what makes QuickSort n log n? From my understanding it has to make a pass over n items, and it does this log n times... I'm not sure how to put into words why it does this log n times.
Complexity
A Quicksort starts by partitioning the input into two chunks: it chooses a "pivot" value, and partitions the input into those less than the pivot value and those larger than the pivot value (and, of course, any equal to the pivot value have to go into one or the other, but for a basic description, it doesn't matter a lot which those end up in).
Since the input (by definition) isn't sorted, to partition it like that, it has to look at every item in the input, so that's an O(N) operation. After it's partitioned the input the first time, it recursively sorts each of those "chunks". Each of those recursive calls looks at every one of its inputs, so between the two calls it ends up visiting every input value (again). So, at the first "level" of partitioning, we have one call that looks at every input item. At the second level, we have two partitioning steps, but between the two, they (again) look at every input item. Each successive level has more individual partitioning steps, but in total the calls at each level look at all the input items.
It continues partitioning the input into smaller and smaller pieces until it reaches some lower limit on the size of a partition. The smallest that could possibly be would be a single item in each partition.
Ideal Case
In the ideal case we hope each partitioning step breaks the input in half. The "halves" probably won't be precisely equal, but if we choose the pivot well, they should be pretty close. To keep the math simple, let's assume perfect partitioning, so we get exact halves every time.
In this case, the number of times we can break it in half will be the base-2 logarithm of the number of inputs. For example, given 128 inputs, we get partition sizes of 64, 32, 16, 8, 4, 2, and 1. That's 7 levels of partitioning (and yes log2(128) = 7).
So, we have log(N) partitioning "levels", and each level has to visit all N inputs. So, log(N) levels times N operations per level gives us O(N log N) overall complexity.
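The halving argument can be checked with a few lines. This is just the arithmetic from the example above, assuming perfect splits; the function name is made up.

```python
def partition_sizes(n):
    """Sizes produced by repeatedly halving n until a single item is left."""
    sizes = []
    while n > 1:
        n //= 2
        sizes.append(n)
    return sizes

sizes = partition_sizes(128)
print(sizes, len(sizes))
```

The length of the returned list is the number of partitioning levels, and each level touches all N items once.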
Worst Case
Now let's revisit that assumption that each partitioning level will "break" the input precisely in half. Depending on how good a choice of partitioning element we make, we might not get precisely equal halves. So what's the worst that could happen? The worst case is a pivot that's actually the smallest or largest element in the input. In this case, we do an O(N) partitioning level, but instead of getting two halves of equal size, we've ended up with one partition of one element, and one partition of N-1 elements. If that happens for every level of partitioning, we obviously end up doing O(N) partitioning levels before every partition is down to one element.
This gives the technically correct big-O complexity for Quicksort (big-O officially refers to the upper bound on complexity). Since we have O(N) levels of partitioning, and each level requires O(N) steps, we end up with O(N * N) (i.e., O(N^2)) complexity.
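Written as a recurrence, and assuming each partitioning step on N items costs at most cN for some constant c (a symbol introduced here only for illustration), the worst case unrolls as follows:

```latex
\begin{aligned}
T(N) &= T(N-1) + cN, \qquad T(1) = c \\
     &= c\,(1 + 2 + \dots + N) \\
     &= c\,\frac{N(N+1)}{2} = O(N^2)
\end{aligned}
```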
Practical implementations
As a practical matter, a real implementation will typically stop partitioning before it actually reaches partitions of a single element. In a typical case, when a partition contains, say, 10 elements or fewer, you'll stop partitioning and use something like an insertion sort (since it's typically faster for a small number of elements).
Modified Algorithms
More recently other modifications to Quicksort have been invented (e.g., Introsort, PDQ Sort) which prevent that O(N^2) worst case. Introsort does so by keeping track of the current partitioning "level", and when/if it goes too deep, it'll switch to a heap sort, which is slower than Quicksort for typical inputs, but guarantees O(N log N) complexity for any inputs.
PDQ sort adds another twist to that: since Heap sort is slower, it tries to avoid switching to heap sort if possible. To do that, if it looks like it's getting poor pivot values, it'll randomly shuffle some of the inputs before choosing a pivot. Then, if (and only if) that fails to produce sufficiently better pivot values, it'll switch to using a Heap sort instead.
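The depth-limit idea behind Introsort can be sketched in a few lines. This is a simplified illustration rather than any library's actual implementation: it is not in place, it uses a deliberately naive first-element pivot, and the limit of roughly 2*log2(n) is an assumption on my part.

```python
import heapq

def heap_sort(items):
    """O(n log n) fallback built on the standard-library binary heap."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

def intro_sort(items, depth_limit=None):
    if depth_limit is None:
        # Assumed limit: about 2 * log2(n) levels of partitioning.
        depth_limit = 2 * max(1, len(items)).bit_length()
    if len(items) <= 1:
        return list(items)
    if depth_limit == 0:
        # Too many bad pivots: stop partitioning and use heap sort.
        return heap_sort(items)
    pivot = items[0]  # naive pivot, so sorted input triggers the fallback
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (intro_sort(smaller, depth_limit - 1) + equal
            + intro_sort(larger, depth_limit - 1))

print(intro_sort([5, 3, 8, 1, 3, 9, 2]))
```

With already sorted input, plain first-element quicksort would recurse N levels deep; here the recursion stops at the limit and the remaining items go to heap sort.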
Each partitioning operation takes O(n) operations (one pass on the array).
On average, each partitioning divides the array into two parts of comparable size, so there are about log n levels of partitioning. In total we have O(n * log n) operations.
I.e. on average there are log n levels of partitioning, and each level takes O(n) operations.
There's a key intuition behind logarithms:
The number of times you can divide a number n by a constant before reaching 1 is O(log n).
In other words, if you see a runtime that has an O(log n) term, there's a good chance that you'll find something that repeatedly shrinks by a constant factor.
In quicksort, what's shrinking by a constant factor is the size of the largest recursive call at each level. Quicksort works by picking a pivot, splitting the array into two subarrays of elements smaller than the pivot and elements bigger than the pivot, then recursively sorting each subarray.
If you pick the pivot randomly, then there's a 50% chance that the chosen pivot will be in the middle 50% of the elements, which means that there's a 50% chance that the larger of the two subarrays will be at most 75% the size of the original. (Do you see why?)
Therefore, a good intuition for why quicksort runs in time O(n log n) is the following: each layer in the recursion tree does O(n) work, and since each recursive call has a good chance of reducing the size of the array by at least 25%, we'd expect there to be O(log n) layers before you run out of elements to throw away out of the array.
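The 50% figure can be verified by counting, for every possible rank of the pivot, whether the larger subarray ends up with at most 75% of the elements. The function name below is made up for this check.

```python
def fraction_of_good_pivots(n):
    """Fraction of pivot ranks whose larger side has at most 75% of n items."""
    good = sum(
        1
        for rank in range(n)
        # rank items are smaller than the pivot, n - 1 - rank are larger.
        if 4 * max(rank, n - 1 - rank) <= 3 * n
    )
    return good / n

for n in (100, 1000):
    print(n, fraction_of_good_pivots(n))
```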
This assumes, of course, that you're choosing pivots randomly. Many implementations of quicksort use heuristics to try to get a nice pivot without too much work, and those implementations can, unfortunately, lead to poor overall runtimes in the worst case. @Jerry Coffin's excellent answer to this question talks about some variations on quicksort that guarantee O(n log n) worst-case behavior by switching which sorting algorithms are used, and that's a great place to look for more information about this.
Well, it's not always n log n. That is the running time when the pivot chosen is approximately in the middle. In the worst case, if you choose the smallest or the largest element as the pivot, then the time will be O(n^2).
To visualize 'n log n', you can assume the pivot to be the element closest to the average of all the elements in the array to be sorted.
This would partition the array into 2 parts of roughly the same length.
On both of these you apply the quicksort procedure.
As in each step you go on halving the length of the array, you will do this log n (base 2) times until you reach length = 1, i.e. a sorted array of 1 element.
Break the sorting algorithm into two parts: the partitioning and the recursive calls. The complexity of partitioning is O(N), and in the ideal case the recursion is O(log N) levels deep. For example, if you have 4 inputs then there will be 2 (log2 4) levels of recursive calls. Multiplying both you get O(N log N). It is a very basic explanation.
In fact, you need to find the position of all N elements (as pivots), but each element takes part in at most log N rounds of comparisons (the first pivot partitions N elements, the second N/2, the third N/4, and so on, assuming the pivot is always the median element).
In the ideal scenario, the first-level call places 1 element in its proper position. There are 2 calls at the second level, taking O(n) time combined, and they put 2 elements in their proper positions. In the same way, there will be 4 calls at the 3rd level, which take O(n) combined time and place 4 elements into their proper positions. So the depth of the recursion tree will be log(n), and at each depth O(n) time is needed for all recursive calls. So the time complexity is O(n log n).
This is a homework question, and I'm not that good at finding the complexity, but I'm trying my best!
Three-way partitioning is a modification of quicksort that partitions elements into groups smaller than, equal to, and larger than the pivot. Only the groups of smaller and larger elements need to be recursively sorted. Show that if there are N items but only k unique values (in other words there are many duplicates), then the running time of this modification to quicksort is O(Nk).
my try:
on the average case:
the three subroutines will be at these indices:
I assume that the subroutine that have duplicated items will equal (n-k)
first: from 0 - to(i-1)
Second: i - (i+(n-k-1))
third: (i+n-k) - (n-1)
number of comparisons = (n-k)-1
So,
T(n) = (n-k)-1 + Sigma from 0 until (n-k-1) [ T(i) + T (i-k)]
then I'm not sure how I'm gonna continue :S
It might be a very bad start though :$
Hope to find a help
First of all, you shouldn't look at the average case since the upper bound of O(nk) can be proved for the worst case, which is a stronger statement.
You should look at the maximum possible depth of recursion. In normal quicksort, the maximum depth is n. For each level, the total number of operations done is O(n), which gives O(n^2) total in the worst case.
Here, it's not hard to prove that the maximum possible depth is k (since one unique value will be removed at each level), which leads to O(nk) total.
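The depth bound can be observed directly with a small sketch of three-way quicksort that also reports how deep the recursion went. It is a simplified, not in-place version written only to illustrate the argument; the function name is made up.

```python
def three_way_quicksort(items):
    """Return (sorted_list, recursion_depth) for a three-way quicksort."""
    if not items:
        return [], 0
    pivot = items[0]
    # O(len(items)) work per call; the pivot's value is removed for good.
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    left, depth_left = three_way_quicksort(smaller)
    right, depth_right = three_way_quicksort(larger)
    return left + equal + right, 1 + max(depth_left, depth_right)

data = [3, 1, 2, 3, 1, 2, 2, 3, 1] * 50   # N = 450 items, k = 3 unique values
result, depth = three_way_quicksort(data)
print(depth, len(set(data)))
```

Since every level removes at least one unique value from each subproblem, the depth never exceeds k, and each level does O(N) work in total.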
I don't have a formal education in complexity. But if you think about it as a mathematical problem, you can prove it as a mathematical proof.
For all sorting algorithms, the best case scenario will always be at least O(n) for n elements because to sort n elements you have to consider each one at least once. Now, for your particular optimisation of quicksort, what you have done is simplify the issue, because now you are only sorting unique values: all the values that are the same as the pivot are already considered sorted, and by virtue of its nature, quicksort will guarantee that every unique value will feature as the pivot at some point in the operation, so this eliminates duplicates.
This means for an N size list, quicksort must perform some operation N times (once for every position in the list), and because it is trying to sort the list, that operation is trying to find the position of that value in the list, but because you are effectively dealing with just unique values, and there are k of those, the quicksort algorithm must perform k comparisons for each element. So it performs Nk operations for an N sized list with k unique elements.
To summarise:
This algorithm eliminates checking against duplicate values.
But all sorting algorithms must look at every value in the list at least once. N operations
For every value in the list the operation is to find its position relative to other values in the list.
Because duplicates get removed, this leaves only k values to check against.
O(Nk)
How are algorithms analyzed? What makes quicksort have an O(n^2) worst-case performance while merge sort has an O(n log(n)) worst-case performance?
That's a topic for an entire semester. Ultimately we are talking about the upper bound on the number of operations that must be completed before the algorithm finishes, as a function of the size of the input. We do not include the coefficients (i.e. 10N vs 4N^2) because for N large enough, they don't matter anymore.
Proving what the big-oh of an algorithm is can be quite difficult. It requires a formal proof and there are many techniques. Often a good ad hoc way is to just count how many passes over the data the algorithm makes. For instance, if your algorithm has nested for loops, then for each of N items you must operate N times. That would generally be O(N^2).
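As a tiny illustration of that counting approach, the sketch below tallies the operations performed by two nested loops over N items. The function name is made up, and the increment stands in for whatever the loop body really does.

```python
def count_nested_operations(n):
    """Count how many times the body of two nested loops over n items runs."""
    operations = 0
    for i in range(n):
        for j in range(n):
            operations += 1  # stand-in for the real work in the loop body
    return operations

for n in (10, 100):
    print(n, count_nested_operations(n))
```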
As to merge sort, you split the data in half over and over. That takes log2(n) levels of splitting. And for each level you make a pass over the data, which gives N log(n).
Quick sort is a bit trickier because in the average case it is also n log(n). You have to imagine what happens if your partition splits the data such that every time you get only one element on one side of the partition. Then you will need to split the data n times instead of log(n) times, which makes it N^2. The advantage of quicksort is that it can be done in place, and that we usually get closer to N log(n) performance.
This is introductory analysis of algorithms course material.
An operation is defined (ie, multiplication) and the analysis is performed in terms of either space or time.
This operation is counted in terms of space or time. Typically analyses are performed as Time being the dependent variable upon Input Size.
Example pseudocode:
foreach $elem in #list
op();
endfor
There will be n operations performed, where n is the size of #list. Count it yourself if you don't believe me.
To analyze quicksort and mergesort requires a decent level of what is known as mathematical sophistication. Loosely, you solve a recurrence relation (the discrete analogue of a differential equation) derived from the recursive structure of the algorithm.
Both quicksort and merge sort split the array into two, sort each part recursively, then combine the result. Quicksort splits by choosing a "pivot" element and partitioning the array into elements smaller or greater than the pivot. Merge sort splits arbitrarily and then merges the results in linear time. In both cases a single step is O(n), and if the array size halves each time this would give a logarithmic number of steps. So we would expect O(n log(n)).
However, quicksort has a worst case where the split is always uneven, so you don't get a number of steps proportional to the logarithm of n, but a number of steps proportional to n. Merge sort splits exactly into two halves (or as close as possible), so it doesn't have this problem.
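For comparison, here is a minimal merge sort sketch. The split point is always the middle, regardless of the data, which is why the number of levels is always about log2(n). The function name and the choice to return a new list are mine.

```python
def merge_sort(items):
    """Sort by splitting in the middle, recursing, and merging in linear time."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2          # the split never depends on the values
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has items left.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 4, 1, 3]))
```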
Quick sort has many variants depending on pivot selection
Let's assume we always select 1st item in the array as a pivot
If the input array is sorted then Quick sort will be only a kind of selection sort!
Because you are not really dividing the array.. you are only picking first item in each cycle
On the other hand merge sort will always divide the input array in the same manner, regardless of its content!
Also note: divide and conquer performs best when the divisions are nearly equal in length!
Analysing algorithms is a painstaking effort, and it is error-prone. I would compare it with a question like, how much chance do I have to get dealt two aces in a bridge game. One has to carefully consider all possibilities and must not overlook that the aces can arrive in any order.
So what one does for analysing those algorithms is going through an actual pseudo code of the algorithm and add what result a worst case situation would have. In the following I will paint with a large brush.
For quicksort one has to choose a pivot to split the set. In a case of dramatic bad luck the set splits into a set of n-1 and a set of 1 each time, for n steps, where each step means inspecting n elements. This arrives at N^2.
For merge sort one starts by splitting the sequence into in-order sequences. Even in the worst case that means at most n sequences. Those can be combined two by two, then the larger sets are combined two by two, etc. However, those (at most) n/2 first combinations deal with extremely small subsets, and the last step deals with subsets that have about size n, but there is just one such step. This arrives at N log(N).