Linear time reduction of languages in class P, complexity implications - algorithm

I am having trouble understanding this topic of P and NP reductions. I understand that if language L1 can be reduced to language L2 in linear time and L2 is in P, this implies that L1 is in P. But if we know L2 has a time complexity of, let's say, Theta(n log n), can we say that L1 runs in O(n log n)? Since the reduction from L1 to L2 is in linear time and L2 runs in Theta(n log n), the total would be O(n) + Theta(n log n). And also, let's say L2 can be linearly reduced to L3; can we then say L3 runs in Omega(n log n)?

tl;dr: Yes. And yes, in case you mean big Omega.
The first part is correct: if you can decide L2 in Theta(n log n), which implies it can be done in O(n log n), and you can reduce L1 to L2 in O(n), then you can also decide L1 in O(n log n), by exactly the argument you made. (Note: this does not mean that you can't possibly decide L1 in less than this - there might be an algorithm that solves L1 in O(n). It's only an upper bound.)
However, the second part is not correct. If you can reduce L2 to L3, then you can say nothing about L3's running time, no matter what the running time of the reduction from L2 to L3 is. (Update: this only shows that L3 might be harder, nothing more.) L3 might be a very hard problem, like SAT for instance. It is then quite likely that you can reduce L2 to it, i.e. that you can solve L2 by 'rephrasing' the problem (a reduction) plus a SAT solver - yet SAT is NP-complete.
DISCLAIMER: as noted in the comments by DavidRicherby, the second part of my answer is wrong as it stands - @uchman21, you were right, L3 has to be in Omega(n log n) (note the upper case!):
If we know the complexity of L2 is Theta(n log n) (upper and lower bounds, O(n log n) and Omega(n log n)) and we can reduce L2 to L3 in time O(n), then L3 is at least as hard as L2 - because we know there is no algorithm that solves L2 faster than Omega(n log n). However, if L3 were faster, that is, decidable in o(n log n), then the algorithm 'reduce + solve L3' would run in O(n) + o(n log n), which is still o(n log n), and it would decide L2 - contradiction. Hence, L3 has to be in Omega(n log n).
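To make the composition argument concrete, here is a minimal sketch in Python (the toy languages, function names and the linear-time reduction are hypothetical placeholders, not from the question): deciding L1 by first mapping the input to an L2 instance in O(n) and then running an O(n log n) decider for L2 gives an O(n) + O(n log n) = O(n log n) decider for L1.

# Hypothetical sketch: composing a linear-time reduction with an O(n log n) decider.
# Toy languages: L2 = strings whose characters are in ascending order,
#                L1 = strings whose characters are in descending order.

def reduce_L1_to_L2(x: str) -> str:
    """Map an L1 instance to an equivalent L2 instance in O(n): reverse the string."""
    return x[::-1]

def decide_L2(y: str) -> bool:
    """Decide L2 in O(n log n); sorting is the Theta(n log n) step."""
    return list(y) == sorted(y)

def decide_L1(x: str) -> bool:
    """O(n) reduction + O(n log n) decision = O(n log n) total for L1."""
    return decide_L2(reduce_L1_to_L2(x))

print(decide_L1("cba"))   # True: "cba" is descending, so it is in the toy L1
print(decide_L1("cab"))   # False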

Related

What is the time complexity of clearing a heap?

I have googled lots of websites and they all say "the time complexity of clearing a heap is O(n log n)." The reason is:
Swapping the tail node with the root costs O(1).
Moving "the new root" down to a suitable place costs O(level) = O(log n).
So deleting a node (the root) costs O(log n).
So deleting all n nodes costs O(n log n).
In my opinion, the answer is right but not "tight", because:
The heap (and hence its height) becomes smaller during the deletions.
As a result, the cost of "moving the new root to a suitable place" becomes smaller.
The aforementioned reasoning for "O(n log n)" does not account for this change.
The time complexity of creating a heap is proved to be O(n) here.
I tend to believe that the time complexity of clearing a heap is O(n) as well, because creating and clearing are very similar - both involve "moving a node to a suitable position" and "changing the heap size".
However, assuming O(n) time for clearing a heap leads to a contradiction:
By creating and then clearing a heap, it would be possible to sort an array in O(n) time.
The lower bound on the time complexity of comparison sorting is Omega(n log n).
I have thought about the question for a whole day but am still confused.
What on earth does clearing a heap cost? Why?
As you correctly observe, the time taken is O(log n + log(n-1) + ... + log 2 + log 1). That's the same as O(log(n!)), which is the same as O(n log n) (proof in many places, but see for example: What is O(log(n!)) and O(n!) and Stirling Approximation).
So you're right that the argument given for the time complexity of removing every element of a heap being O(n log n) is wrong, but the result is still right.
Your claimed equivalence between creating and "clearing" the heap is wrong. When you create the heap, there's a lot of slack, because the heap invariant allows many choices at every level, and this happens to mean that it's possible to find a valid ordering of the elements in O(n) time. When "clearing" the heap, there's no such slack (and the standard proof that comparison sorts need Omega(n log n) comparisons shows that O(n) is not possible).
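A quick numerical sanity check (a Python sketch, not part of the original answer) shows the same thing: the exact cost log n + log(n-1) + ... + log 1 = log(n!) grows like n log n, with the ratio approaching 1 as n grows, so the shrinking heap does not change the asymptotic bound.

import math

# Compare the exact cost sum log2(n) + log2(n-1) + ... + log2(1) = log2(n!)
# against the simple upper bound n * log2(n) used in the O(n log n) argument.
for n in (10, 1_000, 100_000, 1_000_000):
    exact = sum(math.log2(i) for i in range(1, n + 1))   # = log2(n!)
    bound = n * math.log2(n)
    print(f"n={n:>9}  log2(n!)={exact:>13.1f}  n*log2(n)={bound:>13.1f}  ratio={exact/bound:.3f}")
# The ratio slowly tends to 1, illustrating log(n!) = Theta(n log n).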

How to account for cache misses in estimating performance?

Generally, performance is given as a big-O order of magnitude: O(magnitude) + K, where the K is generally ignored since it matters mainly for smaller N.
But more and more I have seen performance dominated by the size of the underlying data, and this is not part of the algorithmic complexity.
Suppose algorithm A is O(log N) time but uses O(N) space, and algorithm B is O(N) time but uses O(log N) space. It used to be the case that algorithm A was faster. Now, with cache misses in multi-tiered caches, it is likely that algorithm B will be faster for large N, and possibly for small N if it has a smaller K.
The problem is how do you represent this?
Well, the use of O(N) nomenclature abstracts away some important details that are only insignificant as N approaches infinity. Those details can be, and often are, the most significant factors at values of N less than infinity. To help explain, consider that a term listed as O(N^x) only specifies the most significant power of N. In reality, the performance could be characterized as:
aN^x + bN^(x-1) + cN^(x-2) + ... + K
So as N approaches infinity the dominant term becomes N^x, but at values of N less than infinity one of the lesser terms (or the constant K) can easily dominate. Looking at your examples, let's keep your naming: algorithm A runs in O(log N) time but touches O(N) data, and algorithm B runs in O(N) time but touches only O(log N) data. Ignoring memory effects for a moment, the two algorithms have performance characteristics along the lines of:
Performance A = a(log N) + b
Performance B = xN + y(log N) + z
If the constant values happen to be a = 99,999 and x = 0.001, you can see how B provides better performance than A over a very wide range of N, even though its asymptotic bound is worse. In addition, you mention that one algorithm increases the likelihood of a cache miss, and that likelihood depends on the size of the data the algorithm touches. You'll need to figure out the likelihood of a cache miss as a function of the working-set size and use that as a factor when estimating the real running time. For example:
If the cost of a cache miss is CM (we'll assume it's constant) and F(S) is the likelihood that a single access misses when the working set has size S, then for algorithm A (which works over O(N) data) the expected cache penalty per access is F(N)·CM. If those misses occur in the dominant loop of algorithm A (the O(log N) part), then the real performance characteristic of algorithm A is O(F(N)·log N). For algorithm B (which works over only O(log N) data) the per-access penalty is F(log N)·CM, and if the misses manifest during its dominant loop, the real performance of algorithm B is O(F(log N)·N). As long as you can determine F(), you can then compare algorithms A and B.
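As a rough illustration (a sketch with made-up constants, not from the original answer: CM, HIT, CACHE_SIZE and the miss-likelihood model are all assumptions), here is how one might fold F() into the two cost formulas and compare the algorithms at concrete values of N. With these particular constants A still comes out ahead at the sizes printed; the point is only the mechanics of the comparison once F() has been determined.

import math

CM = 200.0           # assumed cost of one cache miss, in cycles (made-up)
HIT = 1.0            # assumed cost of a cache hit, in cycles (made-up)
CACHE_SIZE = 32_768  # assumed number of elements that fit in cache (made-up)

def miss_likelihood(working_set: float) -> float:
    """Toy F(S): no misses while the data fits in cache, then the per-access
    miss probability grows toward 1 as the working set outgrows the cache."""
    if working_set <= CACHE_SIZE:
        return 0.0
    return 1.0 - CACHE_SIZE / working_set

def cost_A(n: int) -> float:
    """Algorithm A: O(log N) accesses over an O(N)-sized working set."""
    per_access = HIT + miss_likelihood(n) * CM
    return math.log2(n) * per_access

def cost_B(n: int) -> float:
    """Algorithm B: O(N) accesses over an O(log N)-sized working set."""
    per_access = HIT + miss_likelihood(math.log2(n)) * CM
    return n * per_access

for n in (1_000, 100_000, 10_000_000):
    print(f"N={n:>10}  est. cost A={cost_A(n):>12.1f}  est. cost B={cost_B(n):>12.1f}")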
Cache misses are not taken into account in big-O notation, since they only contribute constant factors.
Even if you pessimistically assume every array access is a cache miss, and a cache miss takes, say, 100 cycles (a constant, since we are assuming random-access memory), then iterating over an array of length n costs 100·n cycles for the cache misses (plus overhead for the loop and control), which in general terms remains O(n).
One reason big O is used so often is that it is platform independent (well, when speaking about RAM machines at least). If we took cache misses into account, the result would be different for each platform.
If you are looking for a theoretical notation that takes constants into account, you are looking for tilde notation.
Also, that's why big-O notation is seldom enough for large-scale or time-critical systems; these are constantly profiled to find bottlenecks, which are then improved locally by the developers. So if you care about real performance, measure it empirically and don't settle for theoretical notations.

Why is the constant factor of quicksort better than that of heapsort?

According to my calculation:
Quicksort cost = n + (n/2 + n/2) + (n/4 + n/4 + n/4 + n/4) + ... = n log(n) = log(n^n)
Heapsort cost = sum of log(i) for i = n, n-1, n-2, ..., 1 = log(n!)
Why is it said that quicksort has a better constant factor than heapsort, and that quicksort is therefore better than heapsort on average? Isn't log(n^n) > log(n!)?
I think the issue here is that your analysis of quicksort and heapsort is not precise enough to show why the constant factors would be different.
You can indeed show that, on average, quicksort will do more comparisons than heapsort (roughly 1.44 n log2 n for quicksort versus n log2 n for heapsort). However, comparisons are not the only factor determining the runtime of heapsort and quicksort.
The main reason quicksort is faster is locality of reference. Due to the way that memory caches work, array accesses in locations that are adjacent to one another tend to be much, much faster than array accesses scattered throughout an array. In quicksort, the partitioning step typically does all its reads and writes at the ends of the arrays, so the array accesses are closely packed toward one another. Heapsort, on the other hand, jumps around the array as it moves up and down the heap. Therefore, the array accesses in quicksort, on average, take much less time than the array accesses in heapsort. The difference is large enough that the constant factor in front of the n log n term in quicksort is lower than the constant factor in front of the n log n term in heapsort, which is one reason why quicksort is much faster than heapsort.
In short - if all we care about are comparisons, heapsort is a better choice than quicksort. But since memory systems use caches and cache misses are expensive, quicksort is usually a much better option.
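To see the locality effect without relying on wall-clock timing, here is a sketch (mine, not from the original answer) that runs both sorts through a toy fully-associative LRU cache model and counts simulated cache-line misses; the line size and cache capacity are made-up parameters. On random input, heapsort incurs many more simulated misses than quicksort, which is exactly the effect described above.

import random
from collections import OrderedDict

LINE = 16      # elements per cache line (assumed)
LINES = 256    # cache capacity in lines (assumed), i.e. 4096 elements total

class CacheSim:
    """Tiny fully-associative LRU cache model: counts line misses for element reads."""
    def __init__(self):
        self.lru = OrderedDict()
        self.misses = 0
    def access(self, index):
        line = index // LINE
        if line in self.lru:
            self.lru.move_to_end(line)
        else:
            self.misses += 1
            self.lru[line] = True
            if len(self.lru) > LINES:
                self.lru.popitem(last=False)   # evict the least recently used line

def quicksort(a, lo, hi, cache):
    while lo < hi:
        cache.access((lo + hi) // 2)
        pivot = a[(lo + hi) // 2]          # Hoare partition with middle pivot
        i, j = lo - 1, hi + 1
        while True:
            i += 1
            cache.access(i)
            while a[i] < pivot:
                i += 1
                cache.access(i)
            j -= 1
            cache.access(j)
            while a[j] > pivot:
                j -= 1
                cache.access(j)
            if i >= j:
                break
            a[i], a[j] = a[j], a[i]
        quicksort(a, lo, j, cache)         # recurse on one half, loop on the other
        lo = j + 1

def sift_down(a, start, end, cache):
    root = start
    while 2 * root + 1 <= end:             # indices roughly double on the way down
        child = 2 * root + 1
        cache.access(child)
        if child + 1 <= end:
            cache.access(child + 1)
            if a[child] < a[child + 1]:
                child += 1
        cache.access(root)
        if a[root] < a[child]:
            a[root], a[child] = a[child], a[root]
            root = child
        else:
            return

def heapsort(a, cache):
    n = len(a)
    for s in range(n // 2 - 1, -1, -1):    # build the max-heap
        sift_down(a, s, n - 1, cache)
    for end in range(n - 1, 0, -1):        # repeatedly extract the maximum
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1, cache)

n = 100_000
data = [random.random() for _ in range(n)]
qa, ha = list(data), list(data)
qc, hc = CacheSim(), CacheSim()
quicksort(qa, 0, n - 1, qc)
heapsort(ha, hc)
assert qa == sorted(data) and ha == sorted(data)
print("simulated cache-line misses, quicksort:", qc.misses)
print("simulated cache-line misses, heapsort :", hc.misses)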
Also, note that log(n^n) = n log n and log(n!) = n log n - n + O(log n) via Stirling's approximation. This means that log(n!) is not much smaller than n log n, even as n gets very large. There definitely is a difference, but it's not large enough to make a huge dent on its own.
Hope this helps!
Here are paragraphs from Steven S. Skiena's The Algorithm Design Manual that talk about the relative speed of the three O(n log n) sorting algorithms:
But how can we compare two Θ(n log n) algorithms to decide which is faster? How can we prove that quicksort is really quick? Unfortunately, the RAM model and Big Oh analysis provide too coarse a set of tools to make that type of distinction. When faced with algorithms of the same asymptotic complexity, implementation details and system quirks such as cache performance and memory size may well prove to be the decisive factor.
What we can say is that experiments show that where a properly implemented quicksort is implemented well, it is typically 2-3 times faster than mergesort or heapsort. The primary reason is that the operations in the innermost loop are simpler. But I can’t argue with you if you don’t believe me when I say quicksort is faster. It is a question whose solution lies outside the analytical tools we are using. The best way to tell is to implement both algorithms and experiment.
- 4.6.3 "Is Quicksort Really Quick?", The Algorithm Design Manual

Decision Problems Reduction in Polynomial Time

I just have a quick question. If we have two decision problems, say L1 and L2, and L1 can be reduced to L2 in polynomial time, then is it true that L2 CANNOT be reduced to L1 in polynomial time?
My understanding is that this would mean:
L1 can be reduced to L2 in polynomial time => NOT (L2 can be reduced to L1 in polynomial time)
= (L1 not in P) & (L2 in P) => (L1 in P) & (L2 not in P)
= [(L1 in P) OR (L2 not in P)] OR [(L1 in P) & (L2 in P)]
= (L1 in P) OR (L2 not in P)
So the statement "L1 can be reduced to L2 in polytime implies that L2 cannot be reduced to L1 in polytime" would only be true if L1 is in P or if L2 is not in P; if that is not the case, then the statement is false.
Does my logic make sense or am I way off? Any advice or help would be much appreciated. Thank you!
The general statement "if L1 poly-time reduces to L2, then L2 does not reduce to L1" is in general false. Any two problems in P (except for ∅ and Σ*) are poly-time reducible to one another: just decide the input in polynomial time and output a fixed yes-instance or a fixed no-instance of the other problem, as appropriate.
Your particular logic is incorrect because polynomial-time reducibility between two problems does not guarantee anything about whether the languages are in P or not. For example, the halting problem is polynomial-time reducible to the problem of whether a TM accepts a given string, but neither problem is in P because neither problem is decidable.
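For instance (a hypothetical sketch of mine, not part of the original answer), take two toy problems in P, EVEN = {binary strings ending in '0'} and HAS-ZERO = {strings containing a '0'}: the "trivial" reduction just decides the first problem directly and then outputs a fixed yes-instance or no-instance of the second.

# Hypothetical example of the trivial poly-time reduction between two problems in P.
# L1 = EVEN     : binary strings that encode an even number (last bit is '0')
# L2 = HAS_ZERO : strings that contain at least one '0'

def decide_even(w: str) -> bool:
    return w.endswith("0")                    # polynomial (constant) time

def decide_has_zero(w: str) -> bool:
    return "0" in w                           # polynomial (linear) time

YES_INSTANCE_OF_L2 = "10"                     # clearly contains a '0'
NO_INSTANCE_OF_L2 = "111"                     # clearly does not

def reduce_even_to_has_zero(w: str) -> str:
    """Many-one reduction: w is in EVEN iff the output is in HAS_ZERO.
    It works only because EVEN itself can be decided in polynomial time."""
    return YES_INSTANCE_OF_L2 if decide_even(w) else NO_INSTANCE_OF_L2

# Sanity check: membership is preserved in both directions.
for w in ("110", "101", "0", "1"):
    assert decide_even(w) == decide_has_zero(reduce_even_to_has_zero(w))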
Hope this helps!

PRAM (parallel) algorithm for merge sort

I was in the middle of reading about multithreaded merge sort in Introduction to Algorithms, 3rd edition. However, I am confused about the number of processors required for the following merge-sort algorithm:
MERGE-SORT(A, p, r)
    if p < r
        q = (p + r) / 2
        spawn MERGE-SORT(A, p, q)
        MERGE-SORT(A, q + 1, r)
        sync
        MERGE(A, p, q, r)
MERGE here is the standard (serial) merge algorithm. Now, what is the number of processors required for this algorithm? I am assuming it should be O(n), but the book claims it to be O(log n). Why? Note that I am not multithreading the MERGE procedure. An explanation with an example would be really helpful. Thanks in advance.
The O(log n) value is not the number of CPUs "required" to run the algorithm, but the actual parallelism achieved by the algorithm. Because MERGE itself is not parallelized, you don't get the full benefit of O(n) processors even if you have them all available.
That is, the single-threaded, serial time complexity of merge sort is O(n log n). You can think of the n as the cost of the merges and the log n as the factor contributed by the levels of recursive invocations that get the array to a stage where you can merge it. When you parallelize the recursion but the merge stays serial, the running time drops to O(n): you save the O(log n) factor, but the O(n) factor remains. In the book's terms, the work is Θ(n log n) and the span (critical path) is Θ(n), so the parallelism is work/span = Θ(log n) when you have enough processors available; you cannot get it up to O(n).
In yet other words, even if you have O(n) CPUs available, most of them fall idle very soon, and fewer and fewer CPUs do useful work once the large MERGEs start to take place.
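To make the work/span argument concrete, here is a small sketch (Python, mine rather than the book's) that evaluates the two recurrences numerically: work W(n) = 2W(n/2) + n and span S(n) = S(n/2) + n, since the spawned halves run concurrently but the Θ(n) MERGE is serial. The parallelism W(n)/S(n) it prints grows proportionally to log n.

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def work(n: int) -> int:
    """W(n) = 2 W(n/2) + n : total operations, i.e. the serial time Theta(n log n)."""
    if n <= 1:
        return 1
    return 2 * work(n // 2) + n

@lru_cache(maxsize=None)
def span(n: int) -> int:
    """S(n) = S(n/2) + n : the critical path; the two recursive sorts run in
    parallel, but the final MERGE of n elements is serial, giving Theta(n)."""
    if n <= 1:
        return 1
    return span(n // 2) + n

for n in (2**10, 2**15, 2**20):
    w, s = work(n), span(n)
    print(f"n={n:>8}  work={w:>10}  span={s:>8}  parallelism={w/s:6.1f}  log2(n)={math.log2(n):.1f}")
# The parallelism column grows like log2(n): beyond roughly that many processors,
# extra CPUs cannot speed up this version, because MERGE keeps the span at Theta(n).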

Resources