Sorting of words in lexicographic order - algorithm

Given n words, is it possible to sort them in lexicographic order with O(n) time complexity? Well, I found a method: create a trie data structure, and an in-order traversal of the trie would result in time complexity close to O(kn), where k is the arbitrary string length, but the problem here is space complexity. Constructing a BST and doing an inorder traversal is also a good option, but the time complexity is O(n log n). So could anyone suggest which would be better given the constraints of both, BST or trie? Any other algorithms or suggestions are also welcome.

It is easy to sort the words in O(nL) time with a bucket sort or radix sort, where L is the maximum word length. It's impossible to do better, since you must look at all the keys at least once.
Your triesort idea is an old one.
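As a rough illustration, here is a minimal Python sketch of the trie-sort idea from the question; the function name, the use of plain dicts as nodes, and the END sentinel are my own choices, not taken from the answers. Sorting the children at each node costs at most the alphabet size, so for a fixed alphabet the total work stays close to O(kn):

    # Hypothetical trie-sort sketch: build a trie from the words, then
    # walk it visiting children in alphabetical order. END marks word
    # boundaries so duplicates are preserved.
    END = object()

    def trie_sort(words):
        root = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(ch, {})
            node[END] = node.get(END, 0) + 1
        out = []
        def walk(node, prefix):
            for _ in range(node.get(END, 0)):
                out.append(prefix)
            for ch in sorted(k for k in node if k is not END):
                walk(node[ch], prefix + ch)
        walk(root, "")
        return out

    print(trie_sort(["banana", "apple", "app", "apple"]))
    # ['app', 'apple', 'apple', 'banana']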

Related

What is the time complexity of constructing a binary search tree?

"Every comparison-based algorithm to sort n elements must take Ω(nlogn) comparisons in the worst case. With this fact, what would be the complexity of constructing a n-node binary search tree and why?"
Based on this question, I am thinking that the construction complexity must be at least Ω(n log n). That said, I can't seem to figure out how to find the total complexity of construction.
The title of the question and the text you quote are asking different things. I am going to address what the quote is saying, because finding how expensive a particular BST construction algorithm is can be done just by looking at that algorithm.
Assume for a second that it was possible to construct a BST in better than Ω(n log n) time. With a binary search tree you can read out the sorted list in Θ(n) time. This means I could create a sorting algorithm as follows.
Algorithm sort(L)
    B <- buildBST(L)
    Sorted <- inOrderTraversal(B)
    return Sorted
With this algorithm I would be able to sort a list in better than Ω(n log n). But as you stated this is not possible, because Ω(n log n) is a lower bound. Therefore it is not possible to create a binary search tree in better than Ω(n log n) time.
Furthermore, since an algorithm exists to create a BST in O(n log n) time, you can actually say that the algorithm is optimal under the comparison-based model.
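To make the reduction concrete, here is a small Python sketch of the pseudocode above; the Node class, insert helper, and bst_sort name are mine, and a plain unbalanced insert is used, so the build here is O(n log n) only on average:

    # sort(L): build a BST from the list, then read it back in order.
    class Node:
        def __init__(self, key):
            self.key, self.left, self.right = key, None, None

    def insert(root, key):
        if root is None:
            return Node(key)
        if key < root.key:
            root.left = insert(root.left, key)
        else:
            root.right = insert(root.right, key)
        return root

    def bst_sort(lst):
        root = None
        for x in lst:                 # B <- buildBST(L)
            root = insert(root, x)
        out = []
        def inorder(n):               # Sorted <- inOrderTraversal(B)
            if n is not None:
                inorder(n.left)
                out.append(n.key)
                inorder(n.right)
        inorder(root)
        return out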
The construction of the BST will be O(n log n).
You will need to insert each and every node, which means n insert operations.
Each of those n insertions requires at least O(log n) comparisons in a balanced tree.
Hence the minimum will be O(n log n).
Only in the best case, where the array is already sorted, can the tree be built in O(n) time, by taking the middle element as the root and recursing on both halves, as sketched below.
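A sketch of that O(n) best case (my own illustration, reusing the Node class from the previous snippet): build the tree directly from the sorted list by recursive midpoint selection, visiting each element exactly once:

    # Build a balanced BST from an already-sorted list in O(n).
    def from_sorted(a, lo=0, hi=None):
        if hi is None:
            hi = len(a)
        if lo >= hi:
            return None
        mid = (lo + hi) // 2
        node = Node(a[mid])           # Node from the sketch above
        node.left = from_sorted(a, lo, mid)
        node.right = from_sorted(a, mid + 1, hi)
        return node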

Search operation complexity

What would be the complexity of a search operation in an unsorted array that allows duplicates? My guess was O(N): since it allows duplicates, the whole array needs to be searched. But I'm new to algorithm complexity and I cannot be sure about my answer. Can you please confirm whether I'm correct?
Since the array is unsorted, you have to look at half of the array on average to find the element you are searching for. Therefore, complexity is linear - O(N). Duplicates or no duplicates, the same complexity.
Searching for elements within an unordered array would indeed be O(N) as no heuristic can speed up the search.
It is O(n) because in the worst case you still need to look at every element.
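For illustration, a minimal linear-scan sketch (the function name is my own); collecting every duplicate still touches each element once, so it is O(N) either way:

    # Return the indices of all occurrences of target: O(N) regardless
    # of how many duplicates the array contains.
    def find_all(arr, target):
        return [i for i, x in enumerate(arr) if x == target]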

Is there a name for this sorting algorithm?

I thought of a sorting algorithm but I am not sure if this already exists.
Say we have a container with n items:
We choose the 3rd element and do a binary search on the first 2, putting it in the correct position. The first 3 items in the container are sorted.
We choose the 4th element and do a binary search on the first 3 and put it in the correct position. Now the first 4 items are sorted.
We choose the 5th element and do a binary search on the first 4 items and put it in the correct position. Now 5 items are sorted.
...
We choose the nth element and do a binary search on the other n-1 elements putting it in the correct position. All the items are sorted.
Binary search takes log k time on k elements, and let's say that the insertion takes constant time. Shouldn't this take:
log 2 to put the 3rd element in the correct spot.
log 3 to put the 4th element in the correct spot.
log 4 to put the 5th element in the correct spot.
...
log(n-1) to put the nth element in the correct spot.
log 2 + log 3 + log 4 + ... + log(n-1) = log((n-1)!) ?
I may be talking nonsense but this looked interesting.
EDIT:
I did not take the insertion time into consideration. What if the sorting was done in an array with gaps between the elements? This would allow for fast insertion without having to shift many elements. After a number of inserts, we could redistribute the elements. Considering that the input is not sorted (we could use a shuffle to ensure this), I think the results could be quite fast.
It sounds like insertion sort modified to use binary search. It's fairly well known, but not particularly well used (as far as I know), possibly because it doesn't improve the O(n²) worst case yet makes the O(n) best case take O(n log n) instead, and because insertion sort isn't commonly used on anything but really small arrays or arrays that are already sorted or nearly sorted.
The problem is that you can't really insert in O(1). Random-access insert into an array takes O(n), which is of course what the well-known O(n²) complexity of insertion sort assumes.
One could consider a data structure like a binary search tree, which has O(log n) insert - it's not O(1), but we still end up with an O(n log n) algorithm.
Oh, and O(log(n!)) = O(n log n), in case you were wondering about that; it follows from Stirling's approximation.
Tree sort (generic binary search tree) and splaysort (splay tree) both use binary search trees to sort. Adding elements to a balanced binary search tree is equivalent to doing a binary search to find where to add the elements then some tree operations to keep the tree balanced. Without a tree of some type, this becomes insertion sort as others have mentioned.
In the worst case the tree can become highly unbalanced, resulting in O(N^2) for tree sort. Using a self-balancing binary search tree yields O(N log N), at least on average. Splay sort is an adaptive sort, making it rather efficient when the input is already nearly sorted.
I think that by binary search, he meant that the insertion takes place at the index where we would expect to find the item we are inserting, in which case it would be called insertion sort... Either way it's still N*log(N) comparisons.
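A short sketch of the algorithm being described, using Python's bisect module for the binary search (my choice of implementation); as the answers point out, the search is O(log k) but the list insert still shifts elements, so the overall worst case remains O(n²):

    import bisect

    # Insertion sort with binary search to locate each element's slot.
    # bisect.insort does the binary search plus an O(k) list insert.
    def binary_insertion_sort(a):
        out = []
        for x in a:
            bisect.insort(out, x)
        return out

    print(binary_insertion_sort([5, 2, 4, 6, 1, 3]))
    # [1, 2, 3, 4, 5, 6]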

What is the Best/Worst/Average Case Big-O Runtime of a Trie Data Structure?

What is the best/worst/average case complexity (in Big-O notation) of a trie data structure for insertion and search?
I think it is O(K) for all cases, where K is the length of an arbitrary string which is being inserted or searched. Will someone confirm this?
According to Wikipedia and this source, the worst case complexity for insertion and search for a trie is O(M) where M is the length of a key. I'm failing to find any sources describing the best or average case complexity of insertion and search. However, we can safely say that best and average case complexity is O(M) where M is the length of a key, since Big-O only describes an upper bound on complexity.
k: the length of the string that you search or insert.
For Search
Worst: O(26*k) = O(k)
Average: O(k) # In this case, k is the average length of the strings
Best: O(1) # Your string is just 'a'.
A trie's complexity does not change with the number of strings stored, only with the search string's length. That's why a trie is used when the number of strings to search is large, like searching the whole vocabulary of an English dictionary.
For Insertion
Worst: O(26*k) = O(k)
Average: O(k)
Best: O(1)
So, yes, you are right. You may have seen O(MN), and that might have confused you, but it simply refers to doing the above O(k) operations N times.
There is some great info about this on Wikipedia: http://en.wikipedia.org/wiki/Trie
The best case for search is when the word being looked up is not present in the trie, so you'll know right away, yielding a best-case run time of O(1). For insertion, the best case remains O(M) in the word length, as you're either doing successful look-ups or creating new nodes for all the letters in the word.
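For reference, a compact trie sketch in Python (plain dicts as nodes and a '$' end marker are my own conventions, assuming words never contain '$'), showing why insert and search each walk exactly one node per character:

    # Insert and search are both O(M) in the key length M: one dict
    # hop per character, independent of how many words are stored.
    class Trie:
        def __init__(self):
            self.root = {}

        def insert(self, word):
            node = self.root
            for ch in word:
                node = node.setdefault(ch, {})
            node['$'] = True      # end-of-word marker

        def search(self, word):
            node = self.root
            for ch in word:
                if ch not in node:
                    return False
                node = node[ch]
            return '$' in node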

Number of occurrences of words in a file - Complexity?

Given I have a file which contains a set of words:
1) If I choose a hash table to store word -> count, what would be the time complexity to find the occurrences of a particular word?
2) How could I return those words alphabetically ordered?
If I chose a hash table, I know that the time complexity for 1) would be O(n) to parse all the words and O(1) to get the count of a particular word.
I fail to see how I could order the hash table, and what the time complexity of that would be. Any help?
A hash map that keeps its keys sorted becomes, essentially, a binary tree. In Java you can see TreeMap implementing the SortedMap interface, with O(log n) look-up and insert.
If you want the best theoretical performance, you'd use a HashMap with O(1) look-up and insert and then use a bucket/radix sort with O(n) for display/iteration.
In reality, a radix sort on strings will often perform worse than an O(n log n) quicksort.
Your analysis of (1) is correct.
Most hash table implementations (that I know of) have no implicit ordering.
To get an ordered list you'd have to sort the entries (O(n log n)); queries on the sorted list would then take O(log n).
You could theoretically define a hash operation and implementation that sorts, but making it well-distributed (for it to be efficient) would be difficult and just sorting would be a lot simpler.
If it's a file containing lots of duplicates, the best idea may be to use hashing first to eliminate duplicates, then iterate through the hash table to get a list of non-duplicates and sort that.
Working with hash tables has two drawbacks: (1) they do not store data in sorted order, and (2) calculating the hash value is usually time-consuming. They also have linear complexity for insert/delete/lookup in the worst case.
My suggestion is to use a trie for storing your words, which has a guaranteed insert/lookup cost of O(k) in the key length k, independent of the number of words. A pre-order traversal over the trie will give a sorted list of the words in it.
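A small sketch of the hash-then-sort approach suggested above (the function name and file handling are placeholders of mine):

    from collections import Counter

    # Count words with a hash map (O(1) expected per update), then sort
    # only the distinct words for alphabetical output: O(u log u) for
    # u distinct words.
    def word_counts(path):
        with open(path) as f:
            counts = Counter(f.read().split())
        for word in sorted(counts):
            print(word, counts[word])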
