On the efficiency of tries and radix sort - big-O

Radix sort's time complexity is O(kn), where n is the number of keys to be sorted and k is the key length. Similarly, the time complexity of the insert, delete, and lookup operations in a trie is O(k). However, assuming all elements are distinct, isn't k >= log(n)? If so, that would mean radix sort's asymptotic time complexity is O(n log n), equal to that of quicksort, and trie operations have a time complexity of O(log n), equal to that of a balanced binary search tree. Of course, the constant factors may differ significantly, but the asymptotic time complexities won't. Is this true, and if so, do radix sort and tries have other advantages over other algorithms and data structures?
Edit:
Quicksort and its competitors perform O(n log n) comparisons; in the worst case each comparison takes O(k) time (keys may differ only at the last digit checked). Therefore, those algorithms take O(kn log n) time. By the same logic, balanced binary search tree operations take O(k log n) time.

Big-O notation is not used that way. Even if k >= log n for radix sorting, O(kn) means that your processing time will double if n doubles (for a fixed k), and so on; that is how you should read big-O notation.
One advantage of radix sort is that its worst case is O(kn) (versus quicksort's O(n^2)), so radix sort is in a sense more resistant to malicious input than quicksort. It can also be really fast in terms of real-world performance if you use bitwise operations, a power of 2 as the base, and an in-place MSD radix sort that falls back to insertion sort for small subarrays.
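As an illustration, here is a minimal sketch of that idea in Python, assuming non-negative integers that fit in 32 bits. It uses base 2 (a power of 2, so only bitwise operations are needed), partitions in place on the current bit, and falls back to insertion sort below a small cutoff; the function names and the cutoff of 16 are arbitrary choices, not from the answer above.

    def insertion_sort(a, lo, hi):
        # Sort a[lo:hi] in place; fast for small ranges.
        for i in range(lo + 1, hi):
            key = a[i]
            j = i - 1
            while j >= lo and a[j] > key:
                a[j + 1] = a[j]
                j -= 1
            a[j + 1] = key

    def msd_radix_sort(a, lo=0, hi=None, bit=31):
        # In-place binary MSD radix sort for non-negative 32-bit integers.
        if hi is None:
            hi = len(a)
        if hi - lo <= 16:              # small subarray: insertion sort
            insertion_sort(a, lo, hi)
            return
        if bit < 0:
            return
        i, j = lo, hi - 1
        mask = 1 << bit
        while i <= j:                  # partition on the current bit
            if a[i] & mask == 0:
                i += 1
            else:
                a[i], a[j] = a[j], a[i]
                j -= 1
        msd_radix_sort(a, lo, i, bit - 1)   # keys with this bit 0
        msd_radix_sort(a, i, hi, bit - 1)   # keys with this bit 1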
The same argument holds for tries: they are resistant to malicious input in the sense that insertion and search are O(k) in the worst case. Hash tables perform insertion and search in O(1) on average, but with O(k) hashing, and in the worst case O(N) insertion/search. Also, tries can store strings more efficiently.
See "Algorithmic Complexity Attacks" for more on this.
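For reference, a minimal trie sketch in Python (an assumed structure, not from the answer): both insert and search walk at most k = len(key) nodes, so they are O(k) regardless of how many keys are stored.

    class TrieNode:
        def __init__(self):
            self.children = {}   # char -> TrieNode
            self.is_end = False  # marks the end of a stored key

    class Trie:
        def __init__(self):
            self.root = TrieNode()

        def insert(self, key):
            # O(k): one node per character of the key.
            node = self.root
            for ch in key:
                node = node.children.setdefault(ch, TrieNode())
            node.is_end = True

        def search(self, key):
            # O(k): walks at most len(key) nodes, never more.
            node = self.root
            for ch in key:
                node = node.children.get(ch)
                if node is None:
                    return False
            return node.is_end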

Under your assumption that k >= log N, the asymptotic time complexity of radix sort is O(N log N), which is also the time complexity of quicksort. The advantage of radix sort is that its best-, average-, and worst-case performance are the same, whereas the worst-case performance of quicksort is O(N^2). But it takes twice the space required by quicksort. So, if space complexity is not a problem, radix sort is a better option.

Related

Best runtime for n-1 comparisons?

If an algorithm must make n-1 comparisons to find a certain element, can we assume that the best possible runtime of the algorithm is O(n)?
I know that the lower bound for sorting algorithms is n log n, but since we only return the one element found, I figured it would be possible to do better in terms of running time?
Thanks!
To find a certain element in an unsorted list you need O(n).
But if you sort the array first (which takes O(n log n) in general), you can then find a certain element in O(log n).
So if you often need to find elements in the same list, it is most likely worth sorting the list once so that you can then find elements much more efficiently.
If your array is unsorted and you search for some element in it, then in the worst case the linear search algorithm makes n-1 comparisons and the time complexity is O(n).
But if you want to reduce the time complexity, first sort your array and then use the binary search algorithm, which takes O(log n) in the worst case.
So the binary search algorithm is more efficient than linear search.
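A minimal binary search sketch in Python, for reference (the function name and return convention are illustrative assumptions, not from the answers above):

    def binary_search(a, target):
        # Return an index of target in the sorted list a, or -1 if absent.
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if a[mid] == target:
                return mid
            elif a[mid] < target:
                lo = mid + 1   # target can only be in the right half
            else:
                hi = mid - 1   # target can only be in the left half
        return -1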
For unsorted elements, worst case is when you have to go over all the elements, i.e., O(N). If you need many look-ups then you have several pre-processing alternatives that speed up all future accesses.
Option 1: put the elements in a standard hash table. Creating the hash table costs O(N), on average, and later pay O(1) on average for each lookup. This assumes that a reasonable hash-function can be created for this type of elements.
Most languages/libraries implement bucket-based hash-tables, which in pathological cases can put all elements in one bucket, costing O(N) per lookup.
Option 2: there are other hash-table implementations that don't suffer from pathological O(N) cases. Robin Hood hashing (Wikipedia; more at Programming.Guide) guarantees O(log N) lookup in the worst case, with an average of O(1). (A sketch follows this answer.)
Option 3: another option is to sort elements in O(N log N) once, and then use binary-search to lookup in O(log N). Usually this is slower than Robin Hood hashing (Option 2).
Option 4: if the values are simple integers with a limited range, with max-min around N, then it is possible to put the values in an array (list) such that array[value-min] contains a count of how many times the value appears in the input. It costs O(N) to construct and O(1) to look up. Better, the constants for both preprocessing and lookup are significantly lower than in any other method. (A sketch also follows this answer.)
Note: I didn't mention the O(N) counting-sort as an alternative to the general case of O(N log N) sorting (option 3), since if max(value)-min(value) is small enough for counting-sort, then option 4 is relevant and is simpler and faster.
If applicable, choose option 4; otherwise, if you wish to invest time and code, choose option 2. If 4 isn't applicable and 2 is not worth the effort in your case, then choose option 1 if you don't mind the pathological worst case (never choose option 1 when an adversary may want to harm you with a DoS attack).
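A minimal sketch of option 2, Robin Hood hashing, in Python (a simplified, assumed implementation: fixed capacity, no deletion or resizing, sketch only, not the library behavior described above):

    class RobinHoodSet:
        # Open addressing where an inserted key may "steal" a slot from an
        # entry that sits closer to its home bucket, which keeps probe
        # distances short and predictable.
        def __init__(self, capacity=1024):
            self.cap = capacity
            self.slots = [None] * capacity  # each slot: (key, probe_distance)

        def insert(self, key):
            idx, dist = hash(key) % self.cap, 0
            while True:
                slot = self.slots[idx]
                if slot is None:
                    self.slots[idx] = (key, dist)
                    return
                if slot[0] == key:
                    return  # already present
                if slot[1] < dist:  # resident is "richer": swap and move on
                    self.slots[idx] = (key, dist)
                    key, dist = slot
                idx = (idx + 1) % self.cap
                dist += 1

        def contains(self, key):
            idx, dist = hash(key) % self.cap, 0
            while True:
                slot = self.slots[idx]
                if slot is None or slot[1] < dist:
                    return False  # key would have been placed by now
                if slot[0] == key:
                    return True
                idx = (idx + 1) % self.cap
                dist += 1

And a sketch of option 4, the counting array (names are assumptions; only valid when max-min is comfortably small):

    def build_count_table(values):
        # Preprocess in O(N) time and O(max-min) space.
        lo = min(values)
        counts = [0] * (max(values) - lo + 1)
        for v in values:
            counts[v - lo] += 1
        return counts, lo

    def lookup(counts, lo, value):
        # O(1): how many times value appeared in the input.
        if lo <= value < lo + len(counts):
            return counts[value - lo]
        return 0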
Your question has nothing to do with sorting, let alone linear search.
If you claim that n-1 comparisons are mandated, then your problem has certainly complexity Ω(n). But with that information alone, you can't guarantee O(n) because it is not said that these n-1 comparisons are sufficient, nor that the algorithm does not perform extra operations, for instance to decide which comparisons to perform. It could turn out that your algorithm is O(n³) with no chance to do better, but we can't tell.
Best case complexity: Ω(n).
Worst case complexity: unknown.

Sorting algorithm with quadratic and linear runtime

Is it possible to make a sorting algorithm whose worst-case running time is quadratic, O(n^2), but whose running time is linear, O(n), in most cases (that is, on more than half of the inputs of size n)?
I was thinking about radix sort and just making the worst case worse, but I do not know if it is possible.
Yes: bucket sort.
https://www.geeksforgeeks.org/bucket-sort-2/?ref=lbp
You can read about the algorithm at the link.
In the worst case, we could have a bucket which contains all n values of the array. Since insertion sort has worst-case running time O(n^2), so does bucket sort. We can avoid this by using merge sort to sort each bucket instead, which has worst-case running time O(n lg n).
Yes, it is possible.
Bucket sort analysis does reveal such behaviour (with a reasonable number of buckets).
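A minimal bucket sort sketch in Python (assumed details: inputs are floats uniformly distributed in [0, 1), one bucket per element, insertion sort within buckets). Average time is O(n) when the inputs spread evenly; the worst case degrades to O(n^2) if everything lands in one bucket:

    def bucket_sort(a):
        # Assumes 0 <= x < 1 for every x in a.
        n = len(a)
        if n == 0:
            return a
        buckets = [[] for _ in range(n)]
        for x in a:
            buckets[int(x * n)].append(x)  # uniform inputs spread evenly
        result = []
        for b in buckets:
            # Insertion sort each bucket: cheap when buckets are tiny,
            # O(n^2) if one bucket receives everything.
            for i in range(1, len(b)):
                key, j = b[i], i - 1
                while j >= 0 and b[j] > key:
                    b[j + 1] = b[j]
                    j -= 1
                b[j + 1] = key
            result.extend(b)
        return result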

Time Complexity to Sort a K Sorted Array Using Quicksort Algorithm

Problem:
I have to analyze the time complexity to sort (using Quick sort) a list of integer values which are almost sorted.
What I have done?
I have read SO Q1, SO Q2, SO Q3 and this one.
However, I have not found anything that explicitly mentions the time complexity of sorting a k-sorted array using quicksort.
Since the time complexity of the quicksort algorithm depends on the pivot-selection strategy, and there is a real chance of hitting the worst case with almost-sorted data, I have used the median of three values (first, middle, last) as the pivot, as referred to here, to avoid the worst case.
What do I think?
Since in the average case the time complexity of the quicksort algorithm is O(n log(n)), and as mentioned here, "For any non trivial value of n, a divide and conquer algorithm will need many O(n) passes, even if the array be almost completely sorted",
I think the time complexity of sorting a k-sorted array using quicksort is O(n log(n)), if the worst case does not occur.
My Question:
Am I right that the time complexity of sorting a k-sorted array using quicksort is O(n log(n)), provided I avoid the worst case by selecting a proper pivot and the worst case does not in fact occur?
When you say "the time complexity of quicksort", it is O(n^2), because the worst case is assumed by default. Even if you use another strategy to choose the pivot, like randomized quicksort, the time complexity is still O(n^2) by default; but the expected time complexity is O(n log(n)), since the occurrence of the worst case is highly unlikely. So unless you can somehow prove that the worst case is guaranteed not to happen, you cannot claim a bound better than O(n^2); by default, the worst case is what is quoted, no matter how unlikely.
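For concreteness, a median-of-three quicksort sketch in Python (the Hoare-style partition and other details are assumptions, not from the answer): picking the median of the first, middle, and last elements avoids the classic O(n^2) behaviour on already-sorted input, though contrived inputs can still defeat it.

    def quicksort(a, lo=0, hi=None):
        # In-place quicksort with median-of-three pivot selection.
        if hi is None:
            hi = len(a) - 1
        if lo >= hi:
            return
        mid = (lo + hi) // 2
        pivot = sorted((a[lo], a[mid], a[hi]))[1]  # median of three
        i, j = lo, hi
        while i <= j:  # Hoare-style partition around the pivot value
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        quicksort(a, lo, j)
        quicksort(a, i, hi)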

Why is quick sort considered the fastest sorting algorithm?

Quick sort has a worst-case time complexity of O(n^2), while others like heap sort and merge sort have a worst-case time complexity of O(n log n). Still, quick sort is considered faster. Why?
On a side note, if sorting an array of integers, then counting / radix sort is fastest.
In general, merge sort does more moves but fewer compares than quick sort. The typical implementation of merge sort uses a temp array the same size as the original array, or half that size (sort the 2nd half into the second half, sort the first half into the temp array, then merge the temp array and the 2nd half into the original array), so it needs more space than quick sort, which optimally needs only log2(n) levels of nesting. To avoid worst-case nesting, a depth check may be used and quick sort switched to heap sort; this is called introsort.
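A minimal introsort sketch in Python (assumed details: middle-element pivot, Hoare partition, and a heap-based fallback via the standard heapq module once the depth limit is exceeded):

    import heapq
    from math import log2

    def introsort(a):
        # Quicksort that switches to heapsort past a depth limit.
        if a:
            _sort(a, 0, len(a) - 1, 2 * int(log2(len(a))) + 1)

    def _sort(a, lo, hi, depth):
        if lo >= hi:
            return
        if depth == 0:
            # Depth limit hit: heapsort the slice (via a temporary heap).
            h = a[lo:hi + 1]
            heapq.heapify(h)
            for k in range(lo, hi + 1):
                a[k] = heapq.heappop(h)
            return
        p = _partition(a, lo, hi)
        _sort(a, lo, p, depth - 1)
        _sort(a, p + 1, hi, depth - 1)

    def _partition(a, lo, hi):
        # Hoare partition around the middle element.
        pivot = a[(lo + hi) // 2]
        i, j = lo - 1, hi + 1
        while True:
            i += 1
            while a[i] < pivot:
                i += 1
            j -= 1
            while a[j] > pivot:
                j -= 1
            if i >= j:
                return j
            a[i], a[j] = a[j], a[i]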
If the compare overhead is greater than the move overhead, then merge sort is faster. A common example where compares take longer than moves is sorting an array of pointers to strings: only the (4- or 8-byte) pointers are moved, while the strings may be significantly larger (and similarly when there are a large number of strings).
If there is significant pre-ordering of the data to be sorted, then timsort (fixed sized runs) or a "natural" merge sort (variable sized runs) will be faster.
While it is true that quicksort has a worst-case time complexity of O(n^2), as long as the quicksort implementation properly randomizes the input, its average-case (expected) running time is O(n log n).
Additionally, the constant factors hidden by the asymptotic notation, which do matter in practice, are pretty small compared to other popular choices such as merge sort. Thus, in expectation, quicksort will outperform other O(n log n) comparison sorts despite the less savory worst-case bounds.
Not exactly. Quicksort is the best in most cases; however, its pessimistic (worst-case) time complexity can be O(n^2), which doesn't mean it always is. The issue lies in choosing the right pivot: if you choose it correctly, you have time complexity O(n log n).
In addition, quicksort is one of the cheapest/easiest to implement.

Is n or n log(n) better than constant or logarithmic time?

In the Princeton tutorial on Coursera the lecturer explains the common order-of-growth functions that are encountered. He says that linear and linearithmic running times are "what we strive" for and his reasoning was that as the input size increases so too does the running time. I think this is where he made a mistake because I have previously heard him refer to a linear order-of-growth as unsatisfactory for an efficient algorithm.
While he was speaking he also showed a chart that plotted the different running times - constant and logarithmic running times looked to be more efficient. So was this a mistake or is this true?
It is a mistake when taken in the context that O(n) and O(n log n) functions have better complexity than O(1) and O(log n) functions. Looking at typical cases of complexity in big-O notation:
O(1) < O(log n) < O(n) < O(n log n) < O(n^2)
Notice that this doesn't necessarily mean that they will always be better performance-wise - we could have an O(1) function that takes a long time to execute even though its complexity is unaffected by element count. Such a function would look better in big O notation than an O(log n) function, but could actually perform worse in practice.
Generally speaking: a function with lower complexity (in big O notation) will outperform a function with greater complexity (in big O notation) when n is sufficiently high.
You're missing the broader context in which those statements must have been made. Different kinds of problems have different demands, and often even have theoretical lower bounds on how much work is absolutely necessary to solve them, no matter the means.
For operations like sorting, or scanning every element of a simple collection, there is a hard lower bound proportional to the number of elements in the collection, because the output depends on every element of the input. [1] Thus, O(n) or O(n*log(n)) are the best one can do.
For other kinds of operations, like accessing a single element of a hash table or linked list, or searching in a sorted set, the algorithm needn't examine all of the input. In those settings, an O(n) operation would be dreadfully slow.
[1] Others will note that sorting by comparisons also has an n*log(n) lower bound, from information-theoretic arguments. There are non-comparison based sorting algorithms that can beat this, for some types of input.
Generally speaking, what we strive for is the best we can manage to do. But depending on what we're doing, that might be O(1), O(log log N), O(log N), O(N), O(N log N), O(N^2), O(N^3), or (for certain algorithms) perhaps O(N!) or even O(2^N).
Just for example, when you're dealing with searching in a sorted collection, binary search borders on trivial and gives O(log N) complexity. If the distribution of items in the collection is reasonably predictable, we can typically do even better, around O(log log N). Knowing that, an algorithm that was O(N) or O(N^2) (for a couple of obvious examples) would probably be pretty disappointing.
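The O(log log N) figure usually refers to interpolation search on roughly uniformly distributed keys; a minimal sketch for sorted integers (an assumed implementation, not from the answer):

    def interpolation_search(a, x):
        # Sorted list a of integers; expected O(log log N) probes on
        # uniformly distributed data, O(N) in the worst case.
        lo, hi = 0, len(a) - 1
        while lo <= hi and a[lo] <= x <= a[hi]:
            if a[hi] == a[lo]:
                mid = lo  # all remaining keys are equal
            else:
                # Probe where x "should" be, by linear interpolation.
                mid = lo + (x - a[lo]) * (hi - lo) // (a[hi] - a[lo])
            if a[mid] == x:
                return mid
            if a[mid] < x:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1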
On the other hand, sorting is generally quite a bit higher complexity: the "good" algorithms manage O(N log N), and the poorer ones are typically around O(N^2). Therefore, for sorting, an O(N) algorithm is actually very good (in fact, only possible for rather constrained types of inputs), and we can pretty much count on the fact that something like O(log log N) simply isn't possible.
Going even further, we'd be happy to manage a matrix multiplication in only O(N^2) instead of the usual O(N^3). We'd be ecstatic to get optimum, reproducible answers to the traveling salesman problem or the subset sum problem in only O(N^3), given that optimal solutions to these normally require O(N!).
Algorithms with sublinear behavior like O(1) or O(Log(N)) are special in that they do not require looking at all elements. In a way this is a fallacy: if there are really N elements, it will take O(N) just to read or compute them.
Sublinear algorithms are often possible after some preprocessing has been performed. Think of binary search in a sorted table, taking O(Log(N)). If the data is initially unsorted, it will cost O(N Log(N)) to sort it first. The cost of sorting can be amortized if you perform many searches, say K, on the same data set. Indeed, without the sort, the cost of the searches would be O(K N), and with pre-sorting O(N Log(N) + K Log(N)). You win if K >> Log(N).
This said, when no preprocessing is allowed, O(N) behavior is ideal, and O(N Log(N)) is quite comfortable as well (for a million elements, Lg(N) is only 20). You start screaming with O(N²) and worse.
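A small sketch of that trade-off in Python, using the standard bisect module (the variable names and sample data are illustrative):

    import bisect

    def contains(sorted_data, q):
        # One O(Log(N)) membership test on pre-sorted data.
        i = bisect.bisect_left(sorted_data, q)
        return i < len(sorted_data) and sorted_data[i] == q

    data = [42, 7, 19, 3, 88, 51]
    queries = [19, 20, 88]

    s = sorted(data)                           # O(N Log(N)), paid once
    hits = [contains(s, q) for q in queries]   # O(K Log(N)) for K queries
    # vs. [q in data for q in queries]         # O(K N) without preprocessing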
He said those algorithms are what we strive for, which is generally true. Many algorithms cannot be improved beyond logarithmic or linear time, and while constant time would be better in a perfect world, it's often unattainable.
Constant time is always better because the time (or space) complexity doesn't depend on the problem size... isn't that a great feature? :-)
Then we have O(N), and then O(N log(N)).
Did you know? Problems with constant time complexity exist!
For example: let A[N] be an array of N integer values, with N > 3. Find an algorithm to tell whether the sum of the first three elements is positive or negative.
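A sketch of such an O(1) algorithm in Python (the zero case is handled here too, which the original statement leaves open):

    def first_three_sign(a):
        # O(1): touches exactly three elements, regardless of len(a).
        s = a[0] + a[1] + a[2]
        if s > 0:
            return "positive"
        if s < 0:
            return "negative"
        return "zero"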
What we strive for is efficiency, in the sense of designing algorithms with a time (or space) complexity that does not exceed their theoretical lower bound.
For instance, using comparison-based algorithms, you can't find a value in a sorted array faster than Omega(Log(N)), and you cannot sort an array faster than Omega(N Log(N)), in the worst case.
Thus, binary search O(Log(N)) and Heapsort O(N Log(N)) are efficient algorithms, while linear search O(N) and Bubblesort O(N²) are not.
The lower bound depends on the problem to be solved, not on the algorithm.
Yes, constant time, i.e. O(1), is better than linear time O(n), because the former does not depend on the input size of the problem. The order, from best to worst, is O(1), O(log n), O(n), O(n log n).
Linear or linearithmic time is what we strive for, because going for O(1) might not be realistic: every comparison-based sorting algorithm needs at least some comparisons, which the professor shows with his decision-tree comparison analysis, where he sorts three elements a, b, c and proves a lower bound of n log n. Check his "Complexity of Sorting" in the Mergesort lecture.
