Binary Search Tree Explanation

I am trying to brush up on my understanding of binary trees, and in particular binary search trees. The Wikipedia article (http://en.wikipedia.org/wiki/Binary_search_tree) gives the following description:
"Binary search trees keep their keys in sorted order, so that lookup and other operations can use the principle of binary search: when looking for a key in a tree (or a place to insert a new key), they traverse the tree from root to leaf, making comparisons to keys stored in the nodes of the tree and deciding, based on the comparison, to continue searching in the left or right subtrees. On average, this means that each comparison allows the operations to skip over half of the tree, so that each lookup/insertion/deletion takes time proportional to the logarithm of the number of items stored in the tree. This is much better than the linear time required to find items by key in an unsorted array, but slower than the corresponding operations on hash tables."
Can someone please elaborate / explain the following portions of that description:
1) "On average, this means that each comparison allows the operations to skip over half of the tree, so that each lookup/insertion/deletion takes time proportional to the logarithm of the number of items stored in the tree."
2) [from the last sentence] "...but slower than the corresponding operations on hash tables."

1) "On average" is only applicable if the BST is balanced, i.e. the left and right subtrees contain a roughly equal number of nodes. This makes searching an O(log n) operation, because each comparison lets you discard roughly half of the remaining items.
2) On hash tables, searching, insertion and deletion all take expected O(1) time.
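To make point 1) concrete, here is a minimal BST sketch (illustrative, not production code): every comparison at a node discards one entire subtree, so in a balanced tree of n nodes a search makes at most O(log n) comparisons.

```python
# Minimal BST: each comparison at a node discards one whole subtree,
# so a balanced tree of n nodes is searched in O(log n) comparisons.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root

def search(root, key):
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if key == node.key:
            return True, comparisons
        node = node.left if key < node.key else node.right
    return False, comparisons

# Insert 15 keys in an order that keeps the tree balanced; searching then
# costs at most 4 comparisons (the height of the tree), not 15.
root = None
for k in [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]:
    root = insert(root, k)
found, comps = search(root, 7)   # found == True, comps == 4: one per level
```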

Related

Data structure with O(1) insertion and O(log(n)) search complexity?

Is there any data structure available that would provide O(1) -- i.e. constant -- insertion complexity and O(log(n)) search complexity even in the worst case?
A sorted vector can do an O(log(n)) search, but insertion would take O(n) (given that I am not always inserting the elements at the front or the back). A list, on the other hand, would do O(1) insertion but would fall short of providing O(log(n)) lookup.
I wonder whether such a data structure can even be implemented.
Yes, but you would have to bend the rules a bit in two ways:
1) You could use a structure that has O(1) insertion and O(1) search (such as the CritBit tree, also called a bitwise trie) and add artificial cost to turn the search into O(log n).
A critbit tree is like a binary radix tree for bits. It stores keys by walking along the bits of a key (say 32 bits) and using each bit to decide whether to navigate left ('0') or right ('1') at every node. The maximum complexity for both search and insertion is O(32), which becomes O(1).
2) I'm not sure that this is O(1) in a strict theoretical sense, because O(1) holds only if we limit the value range (to, say, 32 bit or 64 bit), but for practical purposes this seems a reasonable limitation.
Note that the perceived performance will be O(log n) until a significant part of the possible keys has been inserted. For example, for 16-bit keys you would probably have to insert a significant part of the 2^16 = 65536 possible keys.
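A fixed-width bitwise trie, sketched below with nested dicts, illustrates the bounded-cost idea (this is a simplification: a real crit-bit tree compresses one-child paths, which this sketch does not). Keys are assumed to be 16-bit integers, so search and insert each walk at most 16 nodes, i.e. O(16) = O(1) for this bounded key range.

```python
# Fixed-width bitwise trie over 16-bit integer keys. Both operations
# walk at most BITS nodes, so each is O(16) = O(1) for bounded keys.
# (A real crit-bit tree also compresses single-child paths; omitted here.)

BITS = 16

def insert(root, key):
    node = root
    for i in range(BITS - 1, -1, -1):      # most significant bit first
        bit = (key >> i) & 1
        node = node.setdefault(bit, {})    # go left (0) or right (1)
    node['key'] = key                      # mark the leaf

def search(root, key):
    node = root
    for i in range(BITS - 1, -1, -1):
        bit = (key >> i) & 1
        if bit not in node:
            return False
        node = node[bit]
    return node.get('key') == key

root = {}
for k in (3, 300, 65535):
    insert(root, k)
# search(root, 300) -> True; search(root, 4) -> False
```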
No (at least in a model where the elements stored in the data structure can be compared for order only; hashing does not help for worst-case time bounds because there can be one big collision).
Let's suppose that every insertion requires at most c comparisons. (Heck, let's make the weaker assumption that n insertions require at most c*n comparisons.) Consider an adversary that inserts n elements and then looks up one. I'll describe an adversarial strategy that, during the insertion phase, forces the data structure to have Omega(n) elements that, given the comparisons made so far, could be ordered any which way. Then the data structure can be forced to search these elements, which amount to an unsorted list. The result is that the lookup has worst-case running time Omega(n).
The adversary's goal is to give away as little information as possible. Elements are sorted into three groups: winners, losers, and unknown. Initially, all elements are in the unknown group. When the algorithm compares two unknown elements, one chosen arbitrarily becomes a winner and the other becomes a loser. The winner is deemed greater than the loser. Similarly, unknown-loser, unknown-winner, and loser-winner comparisons are resolved by designating one of the elements a winner and the other a loser, without changing existing designations. The remaining cases are loser-loser and winner-winner comparisons, which are handled recursively (so the winners' group has a winner-unknown subgroup, a winner-winners subgroup, and a winner-losers subgroup). By an averaging argument, since at least n/2 elements are compared at most 2*c times, there exists a subsub...subgroup of size at least n/2 / 3^(2*c) = Omega(n). It can be verified that none of these elements are ordered by previous comparisons.
I wonder whether such a data structure can even be implemented.
I am afraid the answer is no.
Searching OK, Insertion NOT
When we look at data structures like the binary search tree, B-tree, red-black tree and AVL tree, they have an average search complexity of O(log N), but at the same time the average insertion complexity is also O(log N). The reason is obvious: the search navigates through the same path along which the insertion happens.
Insertion OK, Searching NOT
Data structures like the singly linked list and doubly linked list have an average insertion complexity of O(1), but searching in them is a painful O(N), simply because they don't support any index-based element access.
The answer to your question lies in the skip list, which is a linked list, yet it still needs O(log N) on average for insertion (whereas plain lists do insertion in O(1)).
On a closing note, a hash map comes very close to meeting the speedy-search and speedy-insertion requirements, at the cost of a large amount of space, but if horribly implemented it can degrade to a complexity of O(N) for both insertion and searching.

Design a data structure with insertion, deletion, get random in O(1) time complexity

It was a recent interview question. Please design a data structure with insertion, deletion and get-random in O(1) time complexity. The data structure can be a basic data structure such as an array, a modification of a basic data structure, or a combination of basic data structures.
Combine an array with a hash-map of element to array index.
Insertion can be done by appending to the array and adding to the hash-map.
Deletion can be done by first looking up and removing the array index in the hash-map, then swapping the last element with that element in the array, updating the previously last element's index appropriately, and decreasing the array size by one (removing the last element).
Get random can be done by returning a random index from the array.
All operations take O(1).
Well, in reality, it's amortised (from resizing the array) expected (from expected hash collisions) O(1), but close enough.
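The array + hash-map combination described above can be sketched as follows: the map stores each element's index in the array, so deletion can swap the victim with the last element and pop, keeping every operation O(1).

```python
import random

# Array + hash-map structure: the map stores each element's position in
# the array, so deletion swaps the victim with the last element and pops.
class RandomizedSet:
    def __init__(self):
        self.items = []          # the array
        self.index = {}          # element -> position in self.items

    def insert(self, x):
        if x in self.index:
            return False
        self.index[x] = len(self.items)
        self.items.append(x)
        return True

    def delete(self, x):
        i = self.index.pop(x, None)
        if i is None:
            return False
        last = self.items.pop()
        if last != x:            # move the old last element into the hole
            self.items[i] = last
            self.index[last] = i
        return True

    def get_random(self):
        return random.choice(self.items)

s = RandomizedSet()
for x in (1, 2, 3):
    s.insert(x)
s.delete(2)
# s.get_random() now returns 1 or 3, uniformly at random
```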
A radix tree would work. See http://en.wikipedia.org/wiki/Radix_tree. Insertion and deletion are O(k) where k is the maximum length of the keys. If all the keys are the same length (e.g., all pointers), then k is a constant so the running time is O(1).
In order to implement get random, maintain a record of the total number of leaves in each subtree (O(k)). The total number of leaves in tree is recorded at the root. To pick one at random, generate a random integer to represent the index of the element to pick. Recursively scan down the tree, always following the branch that contains the element you picked. You always know which branch to choose because you know how many leaves can be reached from each subtree. The height of the tree is no more than k, so this is O(k), or O(1) when k is constant.
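The leaf-counting descent can be sketched on a fixed-width binary trie (a simplification of the radix tree: no path compression, keys assumed to be 8-bit integers). Every node records how many leaves live below it, and get random descends toward a uniformly chosen leaf index; all operations walk at most k = BITS nodes.

```python
import random

# Binary trie over 8-bit keys; every node stores the number of leaves in
# its subtree, so a uniformly random leaf can be found by one descent.
BITS = 8

class TrieNode:
    def __init__(self):
        self.child = [None, None]
        self.count = 0           # leaves in this subtree

def insert(root, key):
    path = []
    node = root
    for i in range(BITS - 1, -1, -1):
        bit = (key >> i) & 1
        if node.child[bit] is None:
            node.child[bit] = TrieNode()
        path.append(node)
        node = node.child[bit]
    if node.count == 0:          # new leaf: bump counts along the path
        node.count = 1
        for p in path:
            p.count += 1

def get_random(root):
    r = random.randrange(root.count)   # index of the leaf to return
    node, key = root, 0
    for i in range(BITS - 1, -1, -1):
        left = node.child[0]
        n_left = left.count if left else 0
        if r < n_left:                 # target leaf is in the left subtree
            node = left
        else:                          # skip the left leaves, go right
            r -= n_left
            node, key = node.child[1], key | (1 << i)
    return key

root = TrieNode()
for k in (5, 9, 5):                    # duplicate 5 does not change counts
    insert(root, k)
# root.count == 2; get_random(root) returns 5 or 9
```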

Complexity of inserting n numbers into a binary search tree

I have got a question that says "calculate the tight time complexity for the process of inserting n numbers into a binary search tree". It does not say whether the tree is balanced or not. So, what answer can be given to such a question? If the tree is balanced, then its height is log n, and inserting n numbers takes O(n log n) time. But if it is unbalanced, it may take even O(n^2) time in the worst case. What does it mean to find the tight time complexity of inserting n numbers into a BST? Am I missing something? Thanks
It could be O(n^2) even if the tree starts out balanced.
Suppose you're adding a sorted list of numbers, all larger than the largest number in the tree. In that case, each number will be added as the right child of the rightmost node in the tree, hence O(n^2).
For example, suppose that you add the numbers [15..115] to a tree whose keys are all smaller than 15. The numbers will be added as a long chain, each node having a single right-hand child. For the i-th element of the list, you'll have to traverse ~i nodes, which yields O(n^2) in total.
In general, if you'd like to keep inserting and retrieving all n elements at O(n log n), you need to use self-balancing trees.
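The quadratic behavior can be checked directly by counting comparisons while inserting already-sorted keys into a plain (unbalanced) BST: each new key walks the whole right chain, so n insertions cost 0 + 1 + ... + (n-1) = n(n-1)/2 comparisons.

```python
# Counting comparisons while inserting sorted keys into a plain BST:
# each new key walks the entire right chain, so n insertions cost
# 0 + 1 + ... + (n-1) = n(n-1)/2 comparisons, i.e. O(n^2).

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert_counting(root, key):
    if root is None:
        return Node(key), 0
    comparisons = 0
    node = root
    while True:
        comparisons += 1
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root, comparisons

root, total = None, 0
for k in range(1, 101):          # insert 1..100 in sorted order
    root, c = insert_counting(root, k)
    total += c
# total == 100 * 99 // 2 == 4950, the quadratic sum
```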
What the wiki says is correct.
Since the given tree is a BST, one need not search through the entire tree: comparing the element to be inserted with the root of each subtree navigates you to the appropriate insertion point, visiting one node per level. This takes O(h), where h is the height of the tree, which is O(log n) when the tree is balanced.
However, nothing forces a plain BST to stay balanced: if the keys arrive in sorted order, the tree degenerates into a chain of height n, and an insertion has to walk that whole chain.
So the overall worst-case complexity of inserting an element into a BST is O(n).
Thanks!

Find k-th smallest element data structure

I have a problem here that requires designing a data structure that takes O(lg n) worst case for the following three operations:
a) Insertion: insert the key into the data structure only if it is not already there.
b) Deletion: delete the key if it is there!
c) Find k-th smallest: find the k-th smallest key in the data structure.
I am wondering if I should use a heap, but I still don't have a clear idea about it.
I can easily get the first two parts in O(lg n), even faster, but I am not sure how to deal with part c).
If anyone has any idea, please share.
Two solutions come in mind:
Use a balanced binary search tree (red-black, AVL, splay... any would do). You're already familiar with operations (1) and (2). For operation (3), just store an extra value at each node: the total number of nodes in that subtree. You can easily use this value to find the k-th smallest element in O(log(n)).
For example, say your tree looks as follows: the root A has 10 nodes in its subtree, its left child B has 3, and its right child C has 6 (3 + 6 + 1 = 10). If you want to find the 8th smallest element, you know it cannot be among the 3 + 1 = 4 smallest, so you should go to the right side and look for the (8 - 4)th smallest element there.
Use a skip list. It also supports all your (1), (2), (3) operations for O(logn) on average but may be a bit longer to implement.
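The subtree-size idea from solution 1 can be sketched as follows. Balancing is omitted here for brevity (so this sketch alone is O(log n) only on average); a red-black or AVL tree would maintain the same size field to get the worst-case O(log n) bound.

```python
# Order-statistics BST sketch: each node stores the size of its subtree,
# and select(k) compares k against the left subtree's size to decide
# which way to descend. Rebalancing is omitted for brevity.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.size = 1            # nodes in this subtree

def _size(node):
    return node.size if node else 0

def insert(root, key):
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    root.size = 1 + _size(root.left) + _size(root.right)
    return root

def select(root, k):             # k-th smallest, 1-indexed
    left = _size(root.left)
    if k <= left:
        return select(root.left, k)
    if k == left + 1:
        return root.key
    return select(root.right, k - left - 1)

root = None
for x in (50, 30, 70, 20, 40, 60, 80):
    root = insert(root, x)
# select(root, 4) == 50, the median of the seven keys
```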
Well, if your data structure keeps the elements sorted, then it's easy to find the kth lowest element.
The worst-case cost of a binary search tree for search and insertion is O(N), while the average-case cost is O(lg N).
Thus, I would recommend using a red-black binary search tree, which guarantees a worst-case complexity of O(lg N) for both search and insertion.
You can read more about red-black trees here and see an implementation of a Red-Black BST in Java here.
So in terms of finding the k-th smallest element using the above Red-Black BST implementation, you just need to call the select method, passing in the value of k. The select method also guarantees worst-case O(lgN).
One solution could be to use the strategy of quicksort (quickselect).
Step 1: Pick the first element as the pivot and move it to its correct position (at most n comparisons).
When you reach the correct location for this element, do a check:
Step 2.1: if location > k, your element resides in the first sublist, so you are not interested in the second sublist.
Step 2.2: if location < k, your element resides in the second sublist, so you are not interested in the first sublist.
Step 2.3: if location == k, you have got the element; break the loop/recursion.
Step 3: Repeat steps 1 to 2.3 on the appropriate sublist.
On average this solution runs in O(n) time (n + n/2 + n/4 + ... comparisons), though the worst case is O(n^2).
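The partition-and-recurse steps above can be sketched as the usual quickselect. This version uses a random pivot rather than the first element (which changes the worst case from adversarial inputs to unlucky random draws) and builds sublists instead of partitioning in place, for clarity.

```python
import random

# Quickselect: partition around a pivot, then recurse only into the
# sublist that contains position k. Expected O(n) with random pivots
# (n + n/2 + n/4 + ...), worst case O(n^2).

def kth_smallest(items, k):      # k is 1-indexed
    pivot = random.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    if k <= len(smaller):                    # step 2.1: first sublist
        return kth_smallest(smaller, k)
    if k <= len(smaller) + len(equal):       # step 2.3: pivot is the answer
        return pivot
    return kth_smallest(larger, k - len(smaller) - len(equal))  # step 2.2
```

Grouping the elements equal to the pivot keeps duplicates from causing infinite recursion.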
Heap is not the right structure for finding the Kth smallest element of an array, simply because you would have to remove K-1 elements from the heap in order to get to the Kth element.
There is a much better approach to finding Kth smallest element, which relies on median-of-medians algorithm. Basically any partition algorithm would be good enough on average, but median-of-medians comes with the proof of worst-case O(N) time for finding the median. In general, this algorithm can be used to find any specific element, not only the median.
Here is the analysis and implementation of this algorithm in C#: Finding Kth Smallest Element in an Unsorted Array
P.S. On a related note, there are many many things that you can do in-place with arrays. Array is a wonderful data structure and only if you know how to organize its elements in a particular situation, you might get results extremely fast and without additional memory use.
Heap structure is a very good example, QuickSort algorithm as well. And here is one really funny example of using arrays efficiently (this problem comes from programming Olympics): Finding a Majority Element in an Array

Sorting an n element array with O(logn) distinct elements in O(nloglogn) worst case time

The problem at hand is what's in the title itself: give an algorithm which sorts an n-element array with O(log n) distinct elements in O(n log log n) worst-case time. Any ideas?
Further, how do you generally handle arrays with multiple non-distinct elements?
O(log(log(n))) time is enough for you to do a primitive operation in a search tree with O(log(n)) elements.
Thus, maintain a balanced search tree of all the distinct elements you have seen so far. Each node in the tree additionally contains a list of all elements you have seen with that key.
Walk through the input elements one by one. For each element, try to insert it into the tree (which takes O(log log n) time). If you find you've already seen an equal element, just insert it into the auxiliary list in the already-existing node.
After traversing the entire list, walk through the tree in order, concatenating the auxiliary lists. (If you take care to insert in the auxiliary lists at the right ends, this is even a stable sort).
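The grouping idea above can be sketched as follows, with a dict of lists standing in for the balanced search tree (the O(n log log n) bound proper needs the tree, since each of the n comparison-based lookups must cost O(log log n); the dict just makes the structure of the algorithm concrete).

```python
# Group equal elements together, then emit the groups in key order.
# A dict of lists stands in for the balanced tree of distinct keys;
# with d = O(log n) distinct keys, the final sort of keys is cheap.

def sort_few_distinct(items):
    groups = {}
    for x in items:
        groups.setdefault(x, []).append(x)   # appending keeps the sort stable
    result = []
    for key in sorted(groups):               # O(d log d), d = distinct keys
        result.extend(groups[key])
    return result
```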
A simple log(N)-space solution would be:
find the distinct elements using a balanced tree (log(n) space, n + log(n) == n time)
Then you can use this tree to always pick the correct pivot for quicksort.
I wonder if there is a log(log(N))-space solution.
Some details about using a tree:
You should be able to use a red-black tree (or another type of balanced tree) with nodes that hold both a value and a counter: say a tuple (value, count).
When you insert a new value, you either create a new node or increment the count of the existing node with that value. Either way it takes O(H), where H is the height of the tree: O(H) to find the node whose counter to increment, or O(H) to create and position the new node (the constants are bigger, but it's still O(H)).
This ensures that the tree will have no more than O(log n) nodes (because you have log n distinct values), so H = O(log log n). This means each insertion takes O(log log n), and you have n insertions, so O(n log log n) overall.
