the time complexity of Skip List - algorithm

Why is the average-case time complexity of insertion into a skip list O(log n), and why is the height of a skip list with n elements O(log n) with high probability? Also, why is the average search time in each layer O(1)?

I can help with the O(log n) part.
Basically...
[Skip list searching] is quite reminiscent of binary search in an array and is perhaps the best way to intuitively understand why the maximum number of nodes visited in this list is in O(log n).
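To make the intuition concrete, here is a minimal skip list sketch. It assumes a promotion probability of 1/2 and a fixed cap on the number of layers; the names `SkipList`, `Node`, and `MAX_LEVEL` are illustrative, not taken from any particular library. Because each node reaches layer i with probability (1/2)^(i-1), about log2(n) layers are expected to be in use, and a search is expected to take only a constant number of rightward steps on each layer before dropping down.

```python
import random


class Node:
    def __init__(self, key, level):
        self.key = key
        # forward[i] is the next node on layer i (layer 0 is the bottom list).
        self.forward = [None] * level


class SkipList:
    MAX_LEVEL = 16  # assumed cap on the number of layers

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.head = Node(None, self.MAX_LEVEL)  # sentinel, holds no key
        self.level = 1  # number of layers currently in use

    def _random_level(self):
        # Flip a fair coin until it comes up tails.
        level = 1
        while level < self.MAX_LEVEL and self.rng.random() < 0.5:
            level += 1
        return level

    def _predecessors(self, key):
        # Start on the top layer; move right while the next key is smaller,
        # then drop down one layer. Record the last node visited per layer.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key):
        update = self._predecessors(key)
        level = self._random_level()
        self.level = max(self.level, level)
        new = Node(key, level)
        # Splice the new node into every layer it was promoted to.
        for i in range(level):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def contains(self, key):
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def __iter__(self):
        # The bottom layer is an ordinary sorted linked list.
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

Insertion is a search followed by a constant amount of splicing per layer, which is why its average cost matches the O(log n) search cost.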

Related

Extracting k largest elements

If I have n integers, is it possible to list the k largest elements out of the n values in O(k+logn) time? The closest I've gotten is constructing a max heap and extracting the maximum k times, which takes O(klogn) time. Also thinking about using inorder traversal.
Ways to solve this problem:
1. Sort the data, then take the top k. Sorting takes O(n lg n) and iterating over the top k takes O(k). Total time: O(n lg n + k)
2. Build a max-heap from the data and remove the top item k times. Building the heap is O(n), and each removal costs O(lg n) to reheapify. Total time: O(n) + O(k lg n)
3. Keep a running min-heap of maximum size k. Iterate over all the data, adding to the heap, and then take the entirety of the heap. Total time: O(n lg k) + O(k)
4. Use a selection algorithm to find the k'th largest value. Then iterate over all the data to find all items that are larger than that value.
a. You can find the k'th largest using QuickSelect, which has an average running time of O(n) but a worst case of O(n^2). Total average case time: O(n) + O(n) = O(n). Total worst case time: O(n^2) + O(n) = O(n^2).
b. You can also find the k'th largest using the median-of-medians algorithm, which has a worst case running time of O(n) but is not in-place. Total time: O(n) + O(n) = O(n).
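The running min-heap approach from the list above can be sketched with Python's standard `heapq` module. The function name `k_largest` is an illustrative choice; the final `sorted` call is only there to return the result in descending order and adds O(k lg k).

```python
import heapq


def k_largest(values, k):
    """Return the k largest values in descending order using a size-k min-heap."""
    if k <= 0:
        return []
    heap = []  # min-heap holding the k largest values seen so far
    for v in values:
        if len(heap) < k:
            heapq.heappush(heap, v)
        elif v > heap[0]:
            # v beats the smallest of the current top k, so swap it in.
            heapq.heapreplace(heap, v)
    return sorted(heap, reverse=True)
```

For everyday use, `heapq.nlargest(k, values)` from the standard library returns the same result.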
You can use a divide-and-conquer technique to extract the k'th element from an array. The technique is sometimes called QuickSelect because it uses the idea behind QuickSort.
In QuickSort, we pick a pivot element, then move the pivot to its correct position and partition the array around it. The idea here is not to do a complete QuickSort, but to stop at the point where the pivot itself is the k'th smallest element. Also, instead of recursing into both the left and right sides of the pivot, we recurse into only one of them, according to the position of the pivot. The worst-case time complexity of this method is O(n^2), but it runs in O(n) on average.
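A minimal sketch of that idea follows, written iteratively and adapted to return the k'th largest value. The random pivot choice and the Lomuto-style partition are assumptions made for this example; the function name is illustrative.

```python
import random


def quickselect_kth_largest(values, k):
    """Return the k-th largest value (k is 1-based). Works on a copy of the input."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    items = list(values)
    target = len(items) - k  # index of the answer in ascending order
    lo, hi = 0, len(items) - 1
    while True:
        if lo == hi:
            return items[lo]
        # Move a randomly chosen pivot to the end of the current range.
        p = random.randint(lo, hi)
        items[p], items[hi] = items[hi], items[p]
        pivot = items[hi]
        store = lo
        for i in range(lo, hi):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        # Put the pivot into its final sorted position.
        items[store], items[hi] = items[hi], items[store]
        if store == target:
            return items[store]
        if store < target:
            lo = store + 1  # continue only in the right part
        else:
            hi = store - 1  # continue only in the left part
```

Once the k'th largest value is known, one more pass over the data collects the top k elements.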
Constructing a heap by inserting the elements one at a time takes O(n log n) (a bottom-up heapify brings this down to O(n)), and extracting k elements takes O(k log n). If you reached the conclusion that extracting k elements is O(k log n), it means you're not worried about the time it takes to build the heap.
In that case, just sort the list (O(n log n)) and take the k largest elements (O(k)).

Binary Search Tree vs Array for ordered elements

Consider the scenario where data to be inserted in an array is always in order, i.e. (1, 5, 12, 20, ...)/A[i] >= A[i-1] or (1000, 900, 20, 1, -2, ...)/A[i] <= A[i-1].
To support such a dataset, is it more efficient to have a binary search tree or an array.
(Side note: I am just trying to run some naive analysis for a timed hash map of type (K, T, V) and the time is always in order. I am debating using Map<K, BST<T,V>> vs Map<K, Array<T,V>>.)
As I understand, the following costs (worst case) apply—
            Array       BST
Space       O(n)        O(n)
Search      O(log n)    O(n)
Max/Min     O(1)        O(1) *
Insert      O(1) **     O(n)
Delete      O(n)        O(n)
*: Max/Min pointers
**: Amortized time complexity
Q: I want to be more clear about the question. What kind of data structure should I be using for such a scenario between these two? Please feel free to discuss other data structures like self balancing BSTs, etc.
EDIT:
Please note I didn't consider the complexity for a balanced binary search tree (RBTree, etc). As mentioned, a naive analysis using a binary search tree.
Deletion has been updated to O(n) (didn't consider time to search the node).
Max/Min for skewed BST will cost O(n). But it's also possible to store pointers for Max & Min so overall time complexity will be O(1).
See below the table which will help you choose. Note that I am assuming 2 things:
1) Data will always come in sorted order, as you mentioned: if 1000 is the last value inserted, new data will always be greater than 1000. If data does not come in sorted order, insertion can take O(log n), but deletion will not change.
2) Your "array" is actually similar to java.util.ArrayList; in short, its length is mutable. (It is actually unfair to compare a mutable and an immutable data structure.) However, if it is a normal array, deletion will take amortized O(log n) {O(log n) to search and O(1) to delete, amortized if you need to create a new array} and insertion will take amortized O(1) {you need to create a new array}.
            ArrayList   BST
Space       O(n)        O(n)
Search      O(log n)    O(log n) {optimized from O(n)}
Max/Min     O(1)        O(log n) {instead of O(1) - you need to traverse till the leaf}
Insert      O(1)        O(log n) {optimized from O(n)}
Delete      O(log n)    O(log n)
So, based on this, ArrayList seems better
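As a rough illustration of the array-backed option for the timed map in the question, here is a sketch that appends entries in time order and looks them up with binary search from the standard `bisect` module. The class name `TimedMap`, its `put`/`get` methods, the "latest value at or before the given time" lookup semantics, and ascending timestamps are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict


class TimedMap:
    """Map key -> time-ordered values, backed by plain Python lists."""

    def __init__(self):
        self._times = defaultdict(list)   # key -> ascending timestamps
        self._values = defaultdict(list)  # key -> values, parallel to _times

    def put(self, key, time, value):
        times = self._times[key]
        if times and time < times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        times.append(time)                # amortized O(1) append
        self._values[key].append(value)

    def get(self, key, time):
        """Return the latest value stored at or before `time`, or None."""
        times = self._times.get(key)
        if not times:
            return None
        i = bisect_right(times, time)     # O(log n) binary search
        return self._values[key][i - 1] if i else None
```

Because new entries always land at the end, the list never needs to shift elements on insert, which is where the array gets its advantage over a tree here.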

Time Complexity for alphabetical order using skip list

What is the time complexity of displaying data in alphabetical order using a skip list?
And what is the time complexity of a skip list if we implement it using quad nodes?
Let's assume the input contains N elements. First you have to construct the skip list. A single insert operation is O(log N) on average, so inserting N elements is O(N log N). Once the skip list is constructed, its elements are sorted, so to enumerate them you only need to visit each element once along the bottom layer, which is O(N).
It is worth noting that the skip list is based on randomness, so the O(log N) complexity of a single insert operation is not guaranteed. The worst-case complexity of one insert is O(N), which means that in the worst case inserting N elements into the skip list takes O(N^2).

For faster searching, shouldn't one apply merge sort on the data before doing binary search or just jump straight to linear search?

I'm learning about algorithms and have doubts about their application in certain situations. There is the divide and conquer merge sort, and the binary search. Both faster than linear growth algos.
Let's say I want to search for some value in a large list of data. I don't know whether the data is sorted or not. How about instead of doing a linear search, why not first do merge sort and then do binary search. Would that be faster? Or the process of applying merge sort and then binary search combined would slow it down even more than linear search? Why? Would it depend on the size of the data?
There's a flaw in the premise of your question. Merge Sort has O(N logN) complexity, which is the best any comparison-based sorting algorithm can be, but that's still a lot slower than a single linear scan. Note that log2(1000) ~= 10. (Obviously, the constant-factors matter a lot, esp. for smallish problem sizes. Linear search of an array is one of the most efficient things a CPU can do. Copying stuff around for MergeSort is not bad, because the loads and stores are from sequential addresses (so caches and prefetching are effective), but it's still a ton more work than 10 reads through the array.)
If you need to support a mix of insert/delete and query operations, all with good time complexity, pick the right data structure for the task. A binary search tree is probably appropriate (or a Red-Black tree or some other variant that does some kind of rebalancing to prevent O(n) worst-case behaviour). That'll give you O(log n) query, and O(log n) insert/delete.
sorted array gives you O(n) insert/delete (because you have to shuffle the remaining elements over to make or close gaps), but O(log n) query (with lower time and space overhead than a tree).
unsorted array: O(n) query (linear search), O(1) insert (append to the end), O(n) delete (O(n) query, then shuffle elements to close the gap). Efficient deletion of elements near the end.
linked list, sorted or unsorted: few advantages other than simplicity.
hash table: insert/delete: O(1) average (amortized). query for present/not-present: O(1). Query for which two elements a non-present value is between: O(n) linear scan keeping track of the min element greater than x, and max element less than x.
If your inserts/deletes happen in large chunks, then sorting the new batch and merging it into the existing sorted array is much more efficient than adding elements one at a time to a sorted array (which is effectively InsertionSort). Adding a chunk at the end and doing QuickSort is also an option, and might modify less memory.
So the best choice depends on the access pattern you're optimizing for.
If the list is of size n, then
TimeOfMergeSort(list) + TimeOfBinarySearch(list) = O(n log n) + O(log n) = O(n log n)
TimeOfLinearSearch(list) = O(n)
O(n) < O(n log n)
Implies
TimeOfLinearSearch(list) < TimeOfMergeSort(list) + TimeOfBinarySearch(list)
Of course, as mentioned in the comments, the frequency of sorting and the frequency of searching play a huge role in the amortized cost.
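To see how the number of searches changes the picture, here is a small sketch comparing the two strategies. The cost model in `break_even_queries` is a deliberate simplification that sets every constant factor to 1 (sorting costs n log2 n, a binary search costs log2 n, a linear search costs n); that model and all function names are assumptions for illustration, not measurements.

```python
import math
from bisect import bisect_left


def linear_contains(data, target):
    """O(n) scan; works on unsorted data."""
    for v in data:
        if v == target:
            return True
    return False


def sorted_contains(sorted_data, target):
    """O(log n) binary search; requires data that is already sorted."""
    i = bisect_left(sorted_data, target)
    return i < len(sorted_data) and sorted_data[i] == target


def break_even_queries(n):
    """Smallest q at which q*n >= n*log2(n) + q*log2(n) in the unit-cost model."""
    log_n = math.log2(n)
    return math.ceil(n * log_n / (n - log_n))
```

Under this simplified model, sorting first only pays off once the number of searches on the same data exceeds roughly log2(n); for a single search the linear scan wins.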

Lower-bound on comparison-based sorting of n values in the range 1 to k

Can we do better than O(n lg n) running time for a comparison-based algorithm when all of the values are in the range 1 to k, where k < n.
Counting sort and radix sort are not comparison-based algorithms and are disallowed. By a decision tree analysis, it seems like there are k^n possible inputs. A decision tree of height h has at most 2^h leaves, so h needs to be at least lg(k^n) = n lg k, which suggests it should be possible to solve the problem in O(n lg k) time with a comparison-based sorting algorithm.
Please do not give a non-comparison based sorting algorithm for solving this problem, all sorting must be based on comparisons between two elements. Thanks!
It may easily be done in the bound you specified. Build a binary tree of k leaves and include a count value on each leaf. Processing each element (adding it or bumping the count) will be O(lg k) if one uses a suitable balancing algorithm, so doing all of them will be O(n lg k). Reconstituting the list will then be O(n).
Ok, if you insist you want comparisons.
You have at most k distinct values. So, keep a tree structure that will hold all of them.
Go over the list of items, adding each item to the tree. If the item is already in the tree, just increment the counter in that node. (Or, if you want the actual items, you can keep a list in each node.)
The tree will have no more than k items.
At the end, traverse the tree in order and add the items back in the right order (emitting each one as many times as the node's counter says).
Complexity: O(nlogk)
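Here is a sketch of the counting idea using only comparisons. Python's standard library has no balanced search tree, so this example substitutes a sorted list of distinct keys searched with `bisect`; that substitution and the function name are assumptions of the example. The number of comparisons stays at O(n lg k), but each of the at most k list insertions shifts up to k entries, a cost that a real balanced tree would avoid.

```python
from bisect import bisect_left


def sort_few_distinct(values):
    """Sort values drawn from a small set of distinct keys using only comparisons."""
    keys = []    # distinct values seen so far, in ascending order
    counts = []  # counts[i] is the number of occurrences of keys[i]
    for v in values:
        i = bisect_left(keys, v)  # O(lg k) comparisons
        if i < len(keys) and keys[i] == v:
            counts[i] += 1        # value already present: bump its counter
        else:
            keys.insert(i, v)     # new distinct value: happens at most k times
            counts.insert(i, 1)
    # In-order pass over the keys, emitting each one `count` times.
    result = []
    for key, count in zip(keys, counts):
        result.extend([key] * count)
    return result
```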
Yes, you could use an array of size k. (Without comparisons)
Each cell i will contain a list.
Go over the original array and put every item in the list of the right cell.
Go over the second array, pull the items out, and put them back in the right order.
O(n)

Resources