Ordered array of strings - worst case? - algorithm

Would O(log n) be the worst-case search time for an ordered array of strings of length n?
I took a test today and I'm wondering whether I was right or wrong in selecting from these options:
O(n)
O(log n)
O(n/2)
O(√n)
EDIT: I edited this question to make things clearer.

Sequential Search:
Best: O(1) Worst: O(n)
Binary Search:
Best: O(1) Worst: O(log n)

Searching for a string in a sorted array of strings using binary search is O(|S| * log n), where |S| is the average length of a string and n is the number of strings: binary search performs O(log n) compare operations, and each compare is O(|S|), since it has to read the strings.
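For a concrete sketch of that search, here is a minimal Python version using the standard bisect module (the helper name contains is just for illustration); the O(log n) probes are the bisection steps, and each probe is a whole-string comparison, which is where the O(|S|) factor comes from:

from bisect import bisect_left

def contains(sorted_strings, target):
    # O(log n) probes; each probe compares whole strings, costing up to O(|S|)
    i = bisect_left(sorted_strings, target)
    return i < len(sorted_strings) and sorted_strings[i] == target

words = ["apple", "banana", "cherry", "date"]
print(contains(words, "cherry"))   # True
print(contains(words, "grape"))    # False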
If you regard the length of the strings as a constant, it is O(log n). This assumption is generally not made when talking about strings.
Note that there are other data structures, such as the trie, that allow better complexity. A trie allows O(|S|) search for each string in the collection.
P.S.
Mathematically speaking, since big-O notation is an upper bound and not a tight bound, all of the answers are correct (1) for binary search: it is O(n), O(n/2), O(log n), and O(√n), since all of them provide an asymptotic upper bound for binary search.
(1) Assuming binary search and that all strings have bounded length, so each compare op is O(1).

Related

Binary Insertion sort complexity for swaps and comparison in best case

What could be the complexity for binary insertion sort? And how many swaps and comparisons are made?
It might be O(n log n) comparisons, but I am not sure. For the worst case, it is indeed O(n^2) swaps. What about the best case?
You can write binary insertion sort easily by leveraging built-in functions such as bisect.bisect_left, list.pop(..) and list.insert(..):
from bisect import bisect_left

def bininssort(L):
    n = len(L)
    for i in range(1, n):
        j = i - 1
        x = L.pop(i)                       # take out the element to be placed
        i1 = bisect_left(L, x, 0, j + 1)   # binary search in the sorted prefix L[0..j]
        L.insert(i1, x)                    # shift the tail right and insert x
    return L
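A quick sanity check (the sample data is arbitrary):

from random import sample

data = sample(range(100), 10)
print(bininssort(data) == sorted(data))   # True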
About the worst case: at the i-th iteration of the loop we perform a binary search inside the sub-array A[0..i], with 0 <= i < n, which takes about log(i) operations. We then know we have to insert the element at location i1, and the insertion means pushing all the elements that follow it one position to the right, which is at least n-i operations (it can be more than n-i operations, depending on the insertion location). Summing up just these two contributions gives \sum_{i=1}^{n} (log(i) + (n-i)) = log(n!) + (n*(n-1))/2 ~ n*log(n) + (n*(n-1))/2
(Stirling's approximation of log(n!) is used above.)
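A quick numeric check of that approximation (using math.lgamma to compute log(n!)):

import math

n = 1000
print(round(math.lgamma(n + 1) / math.log(2)))   # log2(n!)  ≈ 8530
print(round(n * math.log2(n)))                   # n*log2(n) ≈ 9966, same order of growth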
Now the wiki page says
As a rule-of-thumb, one can assume that the highest-order term in any given function dominates its rate of growth and thus defines its run-time order
So I think the conclusion would be that in the worst-case the binary insertion sort has O(n^2) complexity.
See also:
insertion sort using binary search
insertion sort with binary search
analysis of binary insertion sort
binary insertion sort and complexity
Then I tried to check how it performs on reversed (n, n-1, n-2, ..., 1) and alternating (0, n-1, 1, n-2, 2, n-3, ...) lists. I fitted the measurements (using the matchgrowth module) to different growth rates; this part is just an approximation. The reversed order was fitted to polynomial time, and the alternating order was fitted to quasilinear time.
The best-case is explained here. If the list is already sorted, then even if we don't do any swaps, all the binary searches are still being performed, which leads to O(n*log(n)).
The code used here is available in this repository.

Choosing comparing algorithms to find k max values

Let's say that I want to find the k max values in an array of n elements, and also return them in a sorted output.
k may be, for example, k = 30 or k = n/5.
I thought about some efficient algorithms, but all I could think of runs in O(nlogn). Can I do it in O(n), maybe with some modification of quicksort?
Thanks!
The problem can be solved using a min-heap-based priority queue in O(NlogK + KlogK) time.
If k is constant (the k = 30 case), then the complexity is O(N).
If k = O(N) (the k = n/5 case), then the complexity is O(NlogN).
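A minimal sketch of this heap approach (the function name k_largest_sorted is just illustrative): keep a min-heap of at most k elements, so each of the N pushes costs O(logK), and sorting the surviving k elements costs O(KlogK).

import heapq

def k_largest_sorted(arr, k):
    heap = []                              # min-heap holding the k largest seen so far
    for x in arr:
        if len(heap) < k:
            heapq.heappush(heap, x)        # O(log k)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)     # drop the smallest, keep x: O(log k)
    return sorted(heap, reverse=True)      # O(k log k)

print(k_largest_sorted([5, 1, 9, 3, 7, 8, 2], 3))   # [9, 8, 7]

(The standard library's heapq.nlargest(k, arr) does essentially the same thing.)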
Another option for constant k is a k-select algorithm based on the quicksort partition, with average time O(N) (although a worst case of O(N^2) can occur).
There is a way of sorting elements in nearly O(n), if you assume that you only want to sort integers. This can be done with algorithms like bucket sort or radix sort, which do not rely on comparisons between two elements (comparison-based sorts are limited to O(n*log(n))).
Note, however, that these algorithms also have worst-case runtimes that might be slower than O(n*log(n)).
More information can be found here.
No comparison-based sorting algorithm can achieve a better average-case complexity than O(n*lg n).
There are many papers with proofs out there but this site provides a nice visual example.
So unless you are given a sorted array, your best case is going to be an O(n lg n) algorithm.
There are sorts like radix sort and bucket sort, but they are not comparison-based sorts, which is what your title seems to imply.

What is the Best/Worst/Average Case Big-O Runtime of a Trie Data Structure?

What is the best/worst/average case complexity (in Big-O notation) of a trie data structure for insertion and search?
I think it is O(K) for all cases, where K is the length of an arbitrary string which is being inserted or searched. Will someone confirm this?
According to Wikipedia and this source, the worst case complexity for insertion and search for a trie is O(M) where M is the length of a key. I'm failing to find any sources describing the best or average case complexity of insertion and search. However, we can safely say that best and average case complexity is O(M) where M is the length of a key, since Big-O only describes an upper bound on complexity.
k: the length of the string that you search or insert.
For Search
Worst: O(26*k) = O(k)
Average: O(k) # In this case, k is the average length of the strings
Best: O(1) # Your string is just 'a'.
A trie's complexity does not change with the number of strings stored, only with the search string's length. That's why a trie is used when the number of strings to search among is large, such as searching the whole vocabulary of an English dictionary.
For Insertion
Worst: O(26*k) = O(k)
Average: O(k)
Best: O(1)
So, yes, you are right. You have probably seen O(MN), and that might have confused you, but it simply refers to performing the above O(k) operations N times.
There is some great info about this on Wikipedia: http://en.wikipedia.org/wiki/Trie
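A minimal dict-based trie sketch (just an illustration of the idea) showing why both operations touch at most k nodes, regardless of how many words are stored:

class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:                     # O(k): one step per character
            node = node.setdefault(ch, {})
        node["$"] = True                    # end-of-word marker

    def search(self, word):
        node = self.root
        for ch in word:                     # O(k), independent of the number of stored words
            if ch not in node:
                return False
            node = node[ch]
        return "$" in node

t = Trie()
t.insert("car")
t.insert("cat")
print(t.search("cat"), t.search("ca"))      # True False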
The best case for search will be when the very first character of the word being looked up is not present in the trie, so you'll know right away, yielding the best-case run-time of O(1). For insertion, the best case remains O(n) (with n the length of the word), as you're either doing successful look-ups or creating new nodes for all the letters in the word.

Can you sort n integers in O(n) amortized complexity?

Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
What about trying to create a worst case of O(n) complexity?
Most of the algorithms today are built on O(nlogn) average + O(n^2) worst case.
Some, while using more memory, are O(nlogn) in the worst case.
Can you, with no limitation on memory usage, create such an algorithm?
What if your memory is limited? How will this affect your algorithm?
Any page on the intertubes that deals with comparison-based sorts will tell you that you cannot sort faster than O(n lg n) with comparison sorts. That is, if your sorting algorithm decides the order by comparing 2 elements against each other, you cannot do better than that. Examples include quicksort, bubblesort, mergesort.
Some algorithms, like count sort or bucket sort or radix sort do not use comparisons. Instead, they rely on the properties of the data itself, like the range of values in the data or the size of the data value.
Those algorithms might have faster complexities. Here is an example scenario:
You are sorting 10^6 integers, and each integer is between 0 and 10. Then you can just count the number of zeros, ones, twos, etc. and spit them back out in sorted order. That is how countsort works, in O(n + m) where m is the number of values your datum can take (in this case, m=11).
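A counting-sort sketch for that scenario (assuming the values lie in the range 0..10, as in the example above):

def counting_sort(nums, m=11):
    counts = [0] * m                         # one bucket per possible value: O(m) space
    for x in nums:                           # O(n): tally each value
        counts[x] += 1
    out = []
    for value, c in enumerate(counts):       # O(n + m): emit the values in order
        out.extend([value] * c)
    return out

print(counting_sort([3, 0, 10, 3, 7, 0]))    # [0, 0, 3, 3, 7, 10]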
Another:
You are sorting 10^6 binary strings that are all at most 5 characters in length. You can use radix sort for that: make a stable pass that splits them into 2 buckets by the last character, then repeat for the fourth, third, second and first characters. As long as each step is a stable sort, you end up with a perfectly sorted list in O(nm), where m is the number of digits or bits in your datum (in this case, m=5).
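A sketch of those stable passes (assuming, for simplicity, that all strings are padded to the same length m):

def radix_sort_strings(strings, m):
    # LSD radix sort: stable bucket passes from the last character to the first
    for pos in range(m - 1, -1, -1):
        buckets = {}
        for s in strings:
            buckets.setdefault(s[pos], []).append(s)      # stable: preserves input order
        strings = [s for ch in sorted(buckets) for s in buckets[ch]]
    return strings

print(radix_sort_strings(["110", "001", "101", "010"], 3))
# ['001', '010', '101', '110']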
But in the general case, you cannot sort faster than O(n lg n) reliably (using a comparison sort).
I'm not quite happy with the accepted answer so far. So I'm retrying an answer:
Is it theoretically possible to sort an array of n integers in an amortized complexity of O(n)?
The answer to this question depends on the machine that would execute the sorting algorithm. If you have a random access machine, which can operate on exactly 1 bit, you can do radix sort for integers with at most k bits, which was already suggested. So you end up with complexity O(kn).
But if you are operating on a fixed-size word machine with a word size of at least k bits (which all consumer computers are), the best you can achieve is O(n log n). This is because either log n < k, or you could do a count sort first and then sort with an O(n log n) algorithm, which would yield the first case as well.
What about trying to create a worst case of O(n) complexity?
That is not possible. A link was already given. The idea of the proof is that in order to be able to sort, you have to decide, for every element to be sorted, whether it is larger or smaller than any other element to be sorted. Using transitivity, this can be represented as a decision tree, which has n! leaves (one per permutation of the input) and therefore depth at least log(n!) ~ n log n. So if you want to have performance better than Ω(n log n), this means removing edges from that decision tree. But if the decision tree is not complete, then how can you make sure that you have made a correct decision about some elements a and b?
Can you with no limitation on memory usage create such an algorithm?
So, as shown above, that is not possible, and the remaining questions are therefore of no relevance.
If the integers are in a limited range, then an O(n) "sort" of them would involve having a bit vector of "n" bits ... looping over the integers in question and, for each integer i, setting the i%8 bit at offset i//8 in that byte array to true. That is an "O(n)" operation. Another loop over that bit array to list/enumerate/return/print all the set bits is, likewise, an O(n) operation. (Naturally O(2n) reduces to O(n).)
This is a special case where n is small enough to fit within memory or in a file (with seek() operations). It is not a general solution; but it is described in Bentley's "Programming Pearls" --- and was allegedly a practical solution to a real-world problem (involving something like a "freelist" of telephone numbers ... something like: find the first available phone number that could be issued to a new subscriber).
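A small sketch of that bit-vector idea (assuming distinct non-negative integers below a known limit, as in the phone-number example):

def bitvector_sort(nums, limit):
    bits = bytearray((limit + 7) // 8)           # one bit per possible value in [0, limit)
    for x in nums:                               # O(n): mark each value as present
        bits[x // 8] |= 1 << (x % 8)
    out = []
    for v in range(limit):                       # O(limit): enumerate the set bits in order
        if bits[v // 8] & (1 << (v % 8)):
            out.append(v)
    return out

print(bitvector_sort([42, 7, 19, 3], 100))       # [3, 7, 19, 42]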
(Note: log2(10^10) is ~34 bits to represent every possible integer up to 10 digits in length ... and the bit vector itself, at 10^10 bits, still fits within the 2^31 bytes of a typical Unix/Linux maximum-sized memory mapping.)
I believe you are looking for radix sort.

On the efficiency of tries and radix sort

Radix sort's time complexity is O(kn) where n is the number of keys to be sorted and k is the key length. Similarly, the time complexity for the insert, delete, and lookup operations in a trie is O(k). However, assuming all elements are distinct, isn't k>=log(n)? If so, that would mean Radix sort's asymptotic time complexity is O(nlogn), equal to that of quicksort, and trie operations have a time complexity of O(logn), equal to that of a balanced binary search tree. Of course, the constant factors may differ significantly, but the asymptotic time complexities won't. Is this true, and if so, do radix sort and tries have other advantages over other algorithms and data structures?
Edit:
Quicksort and its competitors perform O(nlogn) comparisons; in the worst case each comparison will take O(k) time (keys differ only at last digit checked). Therefore, those algorithms take O(knlogn) time. By that same logic, balanced binary search tree operations take O(klogn) time.
Big-O notation is not used that way: even if k >= log n for radix sorting, O(kn) means that your processing time will double if n doubles, and so on; that is how you should read big-O notation.
One advantage of radix sort is that its worst case is O(kn) (quicksort's is O(n^2)), so radix sort is somewhat more resistant to malicious input than quicksort. It can also be really fast in terms of real performance if you use bitwise operations, a power of 2 as the base, and in-place MSD radix sort with insertion sort for smaller arrays.
The same argument is valid for tries: they are resistant to malicious input in the sense that insertion/search is O(k) in the worst case. Hash tables perform insertion/search in O(1), but with O(k) hashing and, in the worst case, O(N) insertion/search. Also, tries can store strings more efficiently.
Check Algorithmic Complexity Attacks
The asymptotic time complexity of radix sort is O(NlogN), which is also the time complexity of quicksort. The advantage of radix sort is that its best, average and worst case performance is the same, whereas the worst case performance of quicksort is O(N^2). But it takes twice the space required by quicksort. So, if space complexity is not a problem, radix sort is a better option.
