BST and RBT insertion worst case - data-structures

The insertion complexity of both RBT and BST is O(log n).
I've implemented both of them in Java, fed them a lot of numbers, and measured the time in seconds to analyze the performance. The numbers I have plotted seem to show that it is O(n). Can anyone comment on why this is the case?

It is possible for a plain BST to have O(n) insertion time, for example if you insert elements in increasing or decreasing order: the tree degenerates into a linked list.
For a plain BST, O(log n) is only the average-case insertion complexity, not the worst case.
An RBT, on the other hand, spends extra time rebalancing on each insertion, but that rebalancing keeps the height at O(log n), so RBT insertion is O(log n) even in the worst case.
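To make the degenerate case concrete, here is a minimal sketch (my own code, not the asker's) that inserts already-sorted keys into a hand-rolled unbalanced BST and into java.util.TreeMap, which is a red-black tree; the depth counter shows the plain BST collapsing into a list while the TreeMap stays logarithmic.

    // Illustrative sketch: sorted keys into a plain BST vs. a red-black tree.
    import java.util.TreeMap;

    public class SortedInsertDemo {
        static final class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        static Node root;

        // Plain (unbalanced) BST insert; returns the depth at which the key
        // was placed so the degeneration on sorted input is visible.
        static int insert(int key) {
            if (root == null) { root = new Node(key); return 0; }
            Node cur = root;
            int depth = 0;
            while (true) {
                depth++;
                if (key < cur.key) {
                    if (cur.left == null) { cur.left = new Node(key); return depth; }
                    cur = cur.left;
                } else {
                    if (cur.right == null) { cur.right = new Node(key); return depth; }
                    cur = cur.right;
                }
            }
        }

        public static void main(String[] args) {
            int n = 10_000;
            int maxDepth = 0;
            for (int i = 0; i < n; i++) maxDepth = Math.max(maxDepth, insert(i)); // sorted input
            System.out.println("plain BST: deepest insert at depth " + maxDepth);  // ~n-1

            // TreeMap is a red-black tree: its height stays O(log n), so every
            // insert is O(log n) even when the keys arrive in sorted order.
            TreeMap<Integer, Integer> rbt = new TreeMap<>();
            for (int i = 0; i < n; i++) rbt.put(i, i);
            System.out.println("red-black tree (TreeMap) size: " + rbt.size());
        }
    }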

Related

Is building a BST with N given elements O(n lg n)?

What would be the worst-case time complexity of building a binary search tree from N arbitrary given elements?
I think there is a difference between being given all N elements up front and the elements arriving one by one, thereby building a BST of N elements in total.
In the former case it is O(n log n) and in the second case it is O(n^2). Am I right?
If the binary search tree (BST) is not self-balancing, then the worst-case time complexity is O(n^2). Generally a BST is built by repeated insertion, so the worst case will be O(n^2). But if you can sort the input (in O(n log n)), the tree itself can be built in O(n), resulting in an overall complexity of O(n log n).
If the BST is self-balancing, then the worst-case time complexity is O(n log n) even with repeated insertion.
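As a rough illustration of the "sort first, then build in O(n)" point, here is a sketch (identifiers are my own, not from the question) that turns a sorted array into a height-balanced BST by always taking the middle element as the root.

    // Illustrative sketch: a height-balanced BST from sorted keys in O(n).
    public class BalancedBuild {
        static final class Node {
            int key;
            Node left, right;
            Node(int key) { this.key = key; }
        }

        // Each element is visited exactly once, so the whole build is O(n).
        static Node build(int[] sorted, int lo, int hi) {
            if (lo > hi) return null;
            int mid = (lo + hi) >>> 1;          // middle element becomes the subtree root
            Node node = new Node(sorted[mid]);
            node.left = build(sorted, lo, mid - 1);
            node.right = build(sorted, mid + 1, hi);
            return node;
        }

        public static void main(String[] args) {
            int[] sorted = {1, 2, 3, 5, 8, 13, 21};
            Node root = build(sorted, 0, sorted.length - 1);
            System.out.println("root key: " + root.key); // 5, the middle element
        }
    }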

Time Complexity for alphabetical order using skip list

What will the time complexity be for displaying data in alphabetical order using a skip list?
And what will the time complexity be if we implement the skip list using quad nodes?
Let's assume that your input contains N elements. First you have to construct the skip list. The complexity of a single insert operation is O(log N) on average, so the complexity of inserting N elements is O(N log N). Once the skip list is constructed, the elements within it are already sorted, so in order to enumerate them you only need to visit each element once, which is O(N).
It is worth saying that the skip list is based on randomness, which means that the O(log N) complexity of a single insert operation is not guaranteed. The worst-case complexity of one insert is O(N), so in the worst case the complexity of inserting N elements into the skip list would be O(N^2).
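Here is a small illustrative sketch, assuming a hypothetical quad-node layout (left/right links within a level, up/down links between levels, a sentinel with a null key at the front of each level; none of these names come from the question): once the skip list exists, alphabetical output is just an O(N) walk along the bottom level.

    // Illustrative sketch: reading a skip list in sorted (alphabetical) order.
    import java.util.ArrayList;
    import java.util.List;

    public class SkipListWalk {
        // A hypothetical "quad node": four links, left/right within a level
        // and up/down between levels. A null key marks a sentinel.
        static final class QuadNode {
            String key;
            QuadNode left, right, up, down;
            QuadNode(String key) { this.key = key; }
        }

        // Drop from the top-left sentinel to level 0, then walk right.
        // Each of the N elements is visited once, so this is O(N).
        static List<String> inAlphabeticalOrder(QuadNode topLeftSentinel) {
            QuadNode node = topLeftSentinel;
            while (node.down != null) node = node.down;
            List<String> out = new ArrayList<>();
            for (node = node.right; node != null && node.key != null; node = node.right) {
                out.add(node.key);              // the bottom level is already sorted
            }
            return out;
        }

        public static void main(String[] args) {
            // Hand-built one-level list, just to exercise the walk.
            QuadNode head = new QuadNode(null);
            QuadNode a = new QuadNode("apple");
            QuadNode b = new QuadNode("banana");
            QuadNode c = new QuadNode("cherry");
            head.right = a; a.right = b; b.right = c;
            System.out.println(inAlphabeticalOrder(head)); // [apple, banana, cherry]
        }
    }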

Why isn't the time complexity of building a binary heap by insertion O(n)?

The background
According to Wikipedia and other sources I've found, building a binary heap of n elements by starting with an empty binary heap and inserting the n elements into it is O(n log n), since binary heap insertion is O(log n) and you're doing it n times. Let's call this the insertion algorithm.
It also presents an alternate approach in which you sink/trickle down/percolate down/cascade down/heapify down/bubble down the first/top half of the elements, starting with the middle element and ending with the first element, and that this is O(n), a much better complexity. The proof of this complexity rests on the insight that the sink complexity for each element depends on its height in the binary heap: if it's near the bottom, it will be small, maybe zero; if it's near the top, it can be large, maybe log n. The point is that the complexity isn't log n for every element sunk in this process, so the overall complexity is much less than O(n log n), and is in fact O(n). Let's call this the sink algorithm.
The question
Why isn't the complexity for the insertion algorithm the same as that of the sink algorithm, for the same reasons?
Consider the actual work done for the first few elements in the insertion algorithm. The cost of the first insertion isn't log n, it's zero, because the binary heap is empty! The cost of the second insertion is at worst one swap, and the cost of the fourth is at worst two swaps, and so on. The actual complexity of inserting an element depends on the current depth of the binary heap, so the complexity for most insertions is less than O(log n). The insertion cost doesn't even technically reach O(log n) until after all n elements have been inserted [it's O(log (n - 1)) for the last element]!
These savings sound just like the savings gotten by the sink algorithm, so why aren't they counted the same for both algorithms?
Actually, when n = 2^x - 1 (the lowest level is full), the n/2 elements inserted into that lowest level may each require log(n) swaps in the insertion algorithm (in the worst case each of them bubbles all the way up to the root). So you may need (n/2)(log(n)) swaps for those insertions alone, which already makes the build O(n log n).
In the other algorithm, only one element needs log(n) swaps, 2 need log(n)-1 swaps, 4 need log(n)-2 swaps, etc. Wikipedia shows a proof that the resulting series sums to a constant times n instead of n log n.
The intuition is that the sink algorithm moves only a few things (those in the small layers at the top of the heap/tree) distance log(n), while the insertion algorithm moves many things (those in the big layers at the bottom of the heap) distance log(n).
The intuition for why the sink algorithm can get away with this is that the insertion algorithm is also meeting an additional (nice) requirement: if we stop the insertion at any point, the partially formed heap has to be (and is) a valid heap. For the sink algorithm, all we get is a weird malformed bottom portion of a heap, sort of like a pine tree with the top cut off.
The summations make this precise; it's best to think asymptotically about what happens when inserting, say, the last half of the elements of an arbitrarily large set of size n.
While it's true that log(n-1) is less than log(n), it's not smaller by enough to make a difference.
Mathematically: the worst-case cost of inserting the i-th element is ceil(log i). Therefore the worst-case cost of inserting elements 1 through n is sum(i = 1..n, ceil(log i)) >= sum(i = 1..n, log i) = log 1 + log 2 + ... + log n = log(1 × 2 × ... × n) = log n! = Θ(n log n).
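A quick way to convince yourself of the log(n!) bound is to count sift-up swaps for a worst-case input, namely increasing keys fed to a max-heap; the sketch below (my own illustration, not code from the answer) prints counts that track n * log2(n).

    // Illustrative sketch: swap count for building a max-heap by repeated insertion.
    public class InsertionBuildCount {
        public static void main(String[] args) {
            for (int n : new int[]{1 << 10, 1 << 14, 1 << 18}) {
                int[] heap = new int[n];
                long swaps = 0;
                for (int i = 0; i < n; i++) {
                    heap[i] = i;                          // increasing keys: worst case
                    int child = i;
                    while (child > 0) {                   // sift up
                        int parent = (child - 1) / 2;
                        if (heap[parent] >= heap[child]) break;
                        int tmp = heap[parent]; heap[parent] = heap[child]; heap[child] = tmp;
                        swaps++;
                        child = parent;
                    }
                }
                System.out.printf("n=%d  swaps=%d  n*log2(n)=%.0f%n",
                        n, swaps, n * (Math.log(n) / Math.log(2)));
            }
        }
    }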
Ran into the same problem yesterday. I tried coming up with some form of proof to satisfy myself. Does this make any sense?
If you start inserting from the bottom, the leaves have constant-time insertion: you just copy them into the array.
The worst-case running time for a level above the leaves is:
k * (n/2^h) * h
where h is the height (leaves being 0, the top being log(n)) and k is a constant (just for good measure). So n/2^h is the number of nodes at height h, and h is the maximum number of 'sinking' operations per insert at that height.
There are log(n) levels.
Hence, the total running time is
sum for h from 1 to log(n) of k * n * (h/2^h)
which is k * n * SUM[h = 1 to log(n)] (h/2^h).
The sum is a simple arithmetico-geometric progression that comes out to 2.
So you get a running time of k * n * 2, which is O(n).
The running time per level isn't strictly what I said it was, but it is strictly less than that. Any pitfalls?
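For comparison, here is a companion sketch of the bottom-up sink build on the same increasing input (again my own illustration, not code from the answer); the total swap count stays below 2n, in line with the h/2^h sum above.

    // Illustrative sketch: swap count for the bottom-up (sink) heap build.
    public class SinkBuildCount {
        public static void main(String[] args) {
            for (int n : new int[]{1 << 10, 1 << 14, 1 << 18}) {
                int[] a = new int[n];
                for (int i = 0; i < n; i++) a[i] = i;     // same increasing keys
                long swaps = 0;
                for (int start = n / 2 - 1; start >= 0; start--) {  // sink each internal node
                    int parent = start;
                    while (true) {
                        int left = 2 * parent + 1, right = left + 1, largest = parent;
                        if (left < n && a[left] > a[largest]) largest = left;
                        if (right < n && a[right] > a[largest]) largest = right;
                        if (largest == parent) break;
                        int tmp = a[parent]; a[parent] = a[largest]; a[largest] = tmp;
                        swaps++;
                        parent = largest;
                    }
                }
                System.out.printf("n=%d  swaps=%d  bound 2n=%d%n", n, swaps, 2L * n);
            }
        }
    }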

Heapsort: how is it possible to simulate the worst-case scenario?

I am fairly clear on how to program it, but I am not sure about the definition, i.e. how to write it down in mathematical terms.
A normal heapsort is done on N elements; in O notation that would be O(log(n)).
I just started with heapsort, so I might be a little bit off here.
But how can I, for example, look for a random element when there are N elements?
And then pick that random element and delete it?
I was thinking that in a worst-case situation it has to go through the whole tree (because the element could be either in the first place or in the last place, i.e. the highest or the lowest).
But how can I write that down in mathematical terms?
Heapsort's worst case performance is O(n log n), and to quote alestanis:
Max in max-heap: O(1). Min in min-heap: O(1). Opposite cases in O(n).
Here's an SO-answer explaining how to do the opposite cases in O(1) if you create the heap yourself.
Building the max-heap from the array is O(n) in the worst case, and each max-heapify (sift-down) call is O(log n) in the worst case, so heapsort's worst case is O(n log n).
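To illustrate the quoted point about max vs. min, here is a small sketch over an array-backed max-heap (the numbers are made up): the maximum is read in O(1) at index 0, while the minimum has to be searched for among the leaves, which takes O(n).

    // Illustrative sketch: O(1) max vs. O(n) min in an array-backed max-heap.
    import java.util.Arrays;

    public class HeapMinMax {
        public static void main(String[] args) {
            // A valid max-heap laid out in an array (parent of i is (i-1)/2).
            int[] heap = {97, 80, 93, 42, 61, 17, 50, 3, 29};

            int max = heap[0];                            // O(1): the root is the maximum
            int min = Integer.MAX_VALUE;                  // O(n): the minimum must be a leaf
            for (int i = heap.length / 2; i < heap.length; i++) {
                min = Math.min(min, heap[i]);             // scan the leaf positions
            }
            System.out.println("heap = " + Arrays.toString(heap));
            System.out.println("max = " + max + ", min = " + min);
        }
    }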

Analyzing time complexity

I have 2 arrays:
a of length n
b of length m
Now I want to find all elements that are common to both arrays.
Algorithm:
Build a hash map consisting of all elements of a.
Now for each element x in b, check whether x is present in the hash map.
Analyzing the overall time complexity:
building the hash map is O(n)
the second step is O(m)
So the overall complexity is O(m+n). Am I correct?
And what does O(m+n) reduce to when m is much larger than n, or vice versa?
O(m) + O(n) = O(m+n), and if you know that m > n, then O(m+n) = O(m+m) = O(m).
Note: hash tables theoretically don't guarantee O(1) lookup, but in practice you can count on it (it is the average complexity, i.e. the expected runtime for a random input).
Also note that your algorithm will report an element of b that is also present in a once for every time it occurs in b. If this is a problem, you have to record in the hash that you have already checked/printed that element.
The average-case time complexity is O(m + n). This is what you should consider if you are doing an implementation, since hash maps usually have few collisions. Note that O(m+n) = O(max(m, n)).
However, if this is a test question, then by "time complexity" people usually mean worst-case time complexity. The worst-case time complexity is O(mn), since each lookup in the second step can take O(n) time in the worst case.
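For reference, a short sketch of the algorithm described in the question (identifiers are mine), using a HashSet so the expected running time is O(m + n); collecting the output in a set also avoids reporting the same common element more than once, addressing the duplicate issue mentioned above.

    // Illustrative sketch: common elements of two arrays via a hash set.
    import java.util.HashSet;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class CommonElements {
        static Set<Integer> common(int[] a, int[] b) {
            Set<Integer> seen = new HashSet<>();
            for (int x : a) seen.add(x);                 // step 1: O(n) expected

            Set<Integer> result = new LinkedHashSet<>(); // a set, so each common value is kept once
            for (int x : b) {
                if (seen.contains(x)) result.add(x);     // step 2: O(m) expected lookups
            }
            return result;
        }

        public static void main(String[] args) {
            int[] a = {1, 4, 7, 9, 4};
            int[] b = {4, 4, 9, 2};
            System.out.println(common(a, b));            // [4, 9]
        }
    }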
