Efficiently rebalancing a tree of 2^n-1 nodes?

I stumbled upon this question:
Given a binary search tree with 2^n - 1 nodes, give an efficient algorithm to convert it to a self-balancing tree (like an AVL or red-black tree), and analyze its worst-case running time as a function of n.
Well, I think the most efficient algorithms run in time linear in the number of nodes, but the 2^n - 1 node count is the tricky part. Any idea what the running time will be then?
Any help will be greatly appreciated.

If you've already got a linear-time algorithm for solving this problem, great! Think of it this way. Let m = 2^n - 1. If you have an algorithm that balances the tree and runs in time linear in the number of nodes, then your algorithm runs in time O(m) in this case, which is great. Don't let the exponential scare you; if the runtime is O(2^n) on inputs of size 2^n - 1, then you're running in time linear in the input size, which is efficient.
As for particular algorithms, you seem to already know one, but if you haven't heard of it already, check out the Day-Stout-Warren algorithm, which optimally rebuilds a tree and does so in linear time and constant space.
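For comparison, here is a minimal sketch of the simpler linear-time approach the question alludes to: flatten the BST to a sorted list with an in-order walk, then rebuild from the middle out. The `Node` class and function names are illustrative, not from any particular library; unlike Day-Stout-Warren proper, this uses O(m) extra space rather than constant space.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rebalance(root):
    """Rebuild a BST into a balanced one in O(m) time, m = number of nodes."""
    keys = []
    def inorder(node):              # in-order walk of a BST yields sorted keys
        if node:
            inorder(node.left)
            keys.append(node.key)
            inorder(node.right)
    inorder(root)

    def build(lo, hi):              # middle key becomes the root at each level
        if lo > hi:
            return None
        mid = (lo + hi) // 2
        return Node(keys[mid], build(lo, mid - 1), build(mid + 1, hi))
    return build(0, len(keys) - 1)
```

On a degenerate chain of 2^3 - 1 = 7 nodes, this produces a perfectly balanced tree of height 3.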


Why naive implementation of Dijkstra's shortest path algorithm takes O(nm) time

I'm watching the Coursera lectures on algorithms, and the professor says that the naive implementation of Dijkstra's shortest-path algorithm, without using heaps, takes O(nm) time (n is the number of vertices, and m is the number of edges).
He claims that the main loop goes through the remaining vertices besides the source, which are n - 1 vertices; this I can understand. But inside the loop, the algorithm goes through the edges whose tail is in the processed set and whose head is in the unprocessed set, to choose the minimum next path. Why does that mean there are m edges to go through? Couldn't we go through only the edges that qualify (tail processed, head unprocessed), even in a naive implementation?
Could anyone please help me understand this? Thanks.
When you consider big-O time complexity, you should think of it as a form of upper bound. Meaning, if some input can make your program run in O(N) while some other input can make it run in O(NM), we generally say that this is an O(NM) program (especially when dealing with algorithms). This is called considering an algorithm in its worst case.
There are rare cases where we don't consider the worst-case (amortized time complexity analysis, time complexity with random elements to it such as quicksort). In the case of quicksort, for example, the worst-case time complexity is O(N^2), but the chance of that happening is so, so small that in practice, if we select a random pivot, we can expect O(N log N) time complexity. That being said, unless you can guarantee (or have large confidence that) your program runs faster than O(NM), it is an O(NM) program.
Let's consider the case of your naive implementation of Dijkstra's algorithm. You didn't specify whether you were considering directed or undirected graphs, so I'll assume that the graph is undirected (the case for a directed graph is extremely similar).
We're going through all the nodes, besides the first node. This means we already have an O(N) loop.
In each layer of the loop, we're considering all the edges that stem from a processed node to an unprocessed node.
However, in the worst-case, there are close to O(M) of these in each layer of the loop; not O(1) or anything less than O(M).
There's an easy way to see this. If each edge were only visited a single time, then we could say that the algorithm runs in O(M). However, in your naive implementation of Dijkstra's algorithm, the same edge can be considered multiple times; in fact, asymptotically speaking, O(N) times.
You can try creating a graph yourself, then dry-running the process of Dijkstra on paper. You'll notice that each edge can be considered up to N times: once in each layer of the O(N) loop.
In other words, the reason we can't say the program is faster than O(NM) is because there is no guarantee that each edge isn't processed N times, or less than O(log N) times, or less than O(sqrt N) times, etc. Therefore, the best upper bound we can give is N, although in practice it may be less than N by some sort of constant. However, we do not consider this constant in big-O time complexity.
However, your thought process may lead to a better implementation of Dijkstra. When considering the algorithm, you may realize that instead of considering every single edge that goes from processed to unprocessed vertices in every iteration of the main loop, we only have to consider the edges adjacent to the current node we are on.
By implementing it like so, you can achieve a complexity of O(N^2).
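A rough sketch of that O(N^2) version, assuming an adjacency-list representation where `adj[u]` is a list of `(v, weight)` pairs (the representation and function name are my own, not from the lectures): each iteration does an O(N) scan for the closest unprocessed vertex, then relaxes only that vertex's own edges, so each edge is touched a constant number of times overall.

```python
import math

def dijkstra(adj, source):
    """O(N^2) Dijkstra: adj[u] is a list of (v, weight) pairs."""
    n = len(adj)
    dist = [math.inf] * n
    dist[source] = 0
    done = [False] * n
    for _ in range(n):
        # O(N) scan for the closest unprocessed vertex
        u = min((v for v in range(n) if not done[v]),
                key=lambda v: dist[v], default=None)
        if u is None or dist[u] == math.inf:
            break
        done[u] = True
        # relax only edges leaving u: every edge is scanned O(1) times in total
        for v, w in adj[u]:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    return dist
```

Swapping the O(N) scan for a heap would give the familiar O((N + M) log N) version.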

Ternary tree time complexity

I've an assignment to explain the time complexity of a ternary tree, and I find that info on the subject on the internet is a bit contradictory, so I was hoping I could ask here to get a better understanding.
So, with each search in the tree, we move to the left or right child a logarithmic number of times, log3(n), with n being the number of strings in the tree, correct? And no matter what, we would also have to traverse down the middle child L times, where L is the length of the prefix we are searching for.
Does the running time then come out to O(log3(n) + L)? I see many people simply saying that it runs in logarithmic time, but does linear time not grow faster, and hence dominate?
Hope I'm making sense, thanks for any answers on the subject!
If the tree is balanced, then yes, any search that needs to visit only one child per iteration will run in logarithmic time.
Notice that O(log_3(n)) = O(ln(n) / ln(3)) = O(c * ln(n)) = O(ln(n)),
so the base of the logarithm does not matter. We say logarithmic time, O(log n).
Notice also that a balanced tree has a height of O(log(n)), where n is the number of nodes. So it looks like your L describes the height of the tree and is therefore also O(log n), so not linear w.r.t. n.
Does this answer your questions?
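To make the O(L + log n) cost concrete, here is a minimal ternary-search-tree sketch (class and function names are illustrative). Each comparison either moves left or right without consuming a character (the logarithmic part, assuming the tree is balanced), or matches a character and moves to the middle child (the L part, one move per character).

```python
class TSTNode:
    def __init__(self, ch):
        self.ch = ch
        self.left = self.mid = self.right = None
        self.is_word = False

def insert(node, word, i=0):
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.left = insert(node.left, word, i)
    elif ch > node.ch:
        node.right = insert(node.right, word, i)
    elif i + 1 < len(word):
        node.mid = insert(node.mid, word, i + 1)   # middle move: consumes a character
    else:
        node.is_word = True
    return node

def contains(node, word, i=0):
    if node is None:
        return False
    ch = word[i]
    if ch < node.ch:
        return contains(node.left, word, i)        # left/right: character not consumed
    if ch > node.ch:
        return contains(node.right, word, i)
    if i + 1 == len(word):
        return node.is_word
    return contains(node.mid, word, i + 1)
```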

Recursive and Iterative DFS Algorithm Time Complexity and Space Complexity

So I am having some problems understanding why the time complexity of a recursive DFS and an iterative DFS is the same, perhaps someone can guide me through an easy explanation?
Thanks in advance.
A recursive implementation requires, in the worst case, a number of stack frames (invocations of subroutines that have not finished running yet) proportional to the number of vertices in the graph. This worst-case bound is reached on, e.g., a path graph if we start at one end.
An iterative implementation requires, in the worst case, a number of stack entries proportional to the number of vertices in the graph, and the same inputs reach this worst case as for the recursive implementation. The time complexity is likewise the same, O(V + E), because both versions visit each vertex once and scan each adjacency list once; the only difference is whether the bookkeeping lives in call frames or in an explicit stack.
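A side-by-side sketch of the two versions (adjacency-list representation and names are my own). Both do the same asymptotic work; the iterative version just replaces call frames with an explicit stack.

```python
def dfs_recursive(adj, start, visited=None):
    """Visits each vertex once and scans each adjacency list once: O(V + E)."""
    if visited is None:
        visited = set()
    visited.add(start)
    order = [start]
    for v in adj[start]:
        if v not in visited:
            order += dfs_recursive(adj, v, visited)
    return order

def dfs_iterative(adj, start):
    """Same asymptotic work; an explicit stack replaces the call stack."""
    visited, order, stack = set(), [], [start]
    while stack:
        u = stack.pop()
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        # push neighbors in reverse so they pop in the same order as recursion
        for v in reversed(adj[u]):
            if v not in visited:
                stack.append(v)
    return order
```

On a path graph, the recursive version's call stack and the iterative version's explicit stack both grow to O(V).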

Polynomial Time Reduction Gadget that runs in poly time but creates n! size output.

Say someone has found a way to create a graph given a CNF boolean expression in O(n^3) time, and that any spanning tree of this special graph will be a solution to the CNF equation.
The scenario seems to be hinting that someone has found a solution to the SAT problem and proven P = NP by reducing SAT to the spanning tree problem with a gadget (reduction) that runs in only O(n^3) time.
But what if the graph that their algorithm creates has n! or 2^n nodes and edges?
In that scenario, while a spanning tree algorithm such as DFS or BFS may run in linear time on the number of nodes/edges, it would NOT be running in poly time on the number of inputs to the boolean expression. And so the person would not have found an efficient algorithm to the SAT problem since running the full solution would take n! time to evaluate.
Is this reasoning correct?
Your reasoning is correct. In fact, the premise is self-contradictory: an algorithm cannot both run in O(n^3) time and produce an output with n! (or 2^n) nodes and edges, since merely writing the output down takes time proportional to its size. A polynomial-time reduction necessarily produces polynomial-size output, which is exactly why this scheme would not yield an efficient algorithm for SAT.

Need an efficient selection algorithm?

I am looking for an algorithm for selecting the A[N/4]-th element in an unsorted array A, where N is the number of elements of the array. I want the algorithm to do the selection in sublinear time. I have knowledge of basic structures like a BST, etc. Which algorithm will be best for me, keeping in mind that I want it to be the fastest possible and not too tough to implement? Here N can vary up to 250000. Any help will be highly appreciated. Note: the array can have non-unique elements.
As @Jerry Coffin mentioned, you cannot hope to get a sublinear-time algorithm here unless you are willing to do some preprocessing up front. If you want a linear-time algorithm for this problem, you can use the quickselect algorithm, which runs in expected O(n) time with an O(n^2) worst case. The median-of-medians algorithm has worst-case O(n) behavior, but a high constant factor. One algorithm that you might find useful is the introselect algorithm, which combines the two previous algorithms to get a worst-case O(n) algorithm with a low constant factor. This algorithm is typically what's used to implement std::nth_element in the C++ standard library.
If you are willing to do some preprocessing ahead of time, you can put all of the elements into an order statistic tree. From that point forward, you can look up the kth element for any k in time O(log n) worst-case. The preprocessing time required is O(n log n), though, so unless you are making repeated queries this is unlikely to be the best option.
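For reference, a minimal quickselect sketch (random pivot, with a three-way partition to handle the non-unique elements mentioned in the question; the function name and structure are my own). It runs in expected O(n) time, though this list-comprehension version uses O(n) extra space per pass.

```python
import random

def quickselect(items, k):
    """Return the k-th smallest element (0-indexed) in expected O(n) time."""
    a = list(items)                 # work on a copy; duplicates are fine
    lo, hi = 0, len(a) - 1
    while True:
        if lo == hi:
            return a[lo]
        pivot = a[random.randint(lo, hi)]
        # three-way partition: smaller / equal / larger than the pivot
        lt = [x for x in a[lo:hi + 1] if x < pivot]
        eq = [x for x in a[lo:hi + 1] if x == pivot]
        gt = [x for x in a[lo:hi + 1] if x > pivot]
        a[lo:hi + 1] = lt + eq + gt
        if k < lo + len(lt):
            hi = lo + len(lt) - 1   # answer lies in the "smaller" block
        elif k < lo + len(lt) + len(eq):
            return pivot            # answer equals the pivot
        else:
            lo = lo + len(lt) + len(eq)
```

Selecting the N/4-th element is then `quickselect(A, len(A) // 4)`.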
Hope this helps!
