Different interpretations of height of a binary tree - algorithm

I am studying data structures and algorithms, and one thing is really confusing me: the height of a binary tree, as it is also used in AVL search trees.
According to the book I am following ("Data Structures" by Lipschutz), "the depth (or height) of a tree T is the maximum number of nodes in a branch of T. This turns out to be 1 more than the largest level number of T. The tree T in figure 7.1 has depth 5."
figure 7.1:
         A
        / \
       /   \
      /     \
     /       \
    B         C
   / \       / \
  D   E     G   H
     /         / \
    F         J   K
             /
            L
But on several other resources, the height has been calculated differently, even though the same definition is given. For example, as I was reading on the internet: http://www.cs.utexas.edu/users/djimenez/utsa/cs3343/lecture5.html
"Here is a sample binary tree:
             1
            / \
           /   \
          /     \
         /       \
        2         3
       / \       / \
      /   \     /   \
     /     \   /     \
    6       7 4       5
   / \     /         /
  9  10  11         8
The height of a tree is the maximum of the depths of all the nodes. So the tree above is of height 3. "
Another source http://www.comp.dit.ie/rlawlor/Alg_DS/searching/3.%20%20Binary%20Search%20Tree%20-%20Height.pdf
says, "Height of a Binary Tree
For a tree with just one node, the root node, the height is defined to be 0, if there are 2
levels of nodes the height is 1 and so on. A null tree (no nodes except the null node)
is defined to have a height of –1. "
Now these last two explanations are consistent with each other, but not with the example given in the book.
Another source says "There are two conventions to define height of Binary Tree
1) Number of nodes on longest path from root to the deepest node.
2) Number of edges on longest path from root to the deepest node.
In this post, the first convention is followed. For example, height of the below tree is 3.
        1
       / \
      2   3
     / \
    4   5
"
Regarding this, I want to ask: how can the number of nodes and the number of edges on the longest path from root to deepest node both give the same height?
And what will the height of a leaf node be? According to the book it should be 1 (as the largest level number is 0, so the height should be 0+1=1), but it is usually said that the height of a leaf node is 0.
Also, why does the book treat depth and height as the same thing?
This is really, really confusing me; I have tried clarifying from a number of sources but can't seem to choose between the two interpretations.
Please help.
==> I would like to add to this, since I am now accepting the conventions of the book. In the topic of AVL search trees, where we need to calculate the BALANCE FACTOR (the difference of the heights of the left and right subtrees), it says:
          C (-1)
         /  \
  (0) A      G (1)
            /
           D (0)
The numbers in the brackets are the balance factors.
Now, if I am to follow the book, the height of D is 1 and the right subtree of G has height -1 since it is empty, so the balance factor of G should be 1-(-1) = 2!
Now why has the height of D been taken to be 0 here?
PLEASE HELP.

The exact definition of height doesn't matter if what you care about is balance factor. Recall that balance factor is
height(left) - height(right)
so if both are one larger or one smaller than in your favorite definition of height, the balance factor doesn't change, as long as you redefine the height of an empty tree accordingly.
Now the problem is that the "maximum number of nodes in a branch" definition is recursive but doesn't specify a base case. Since the height of a single-element tree is one according to this definition, the obvious choice for the height of a zero-element tree is zero, and if you work out the formulas you'll find this works.
You can also arrive at the zero value by observing that the base case of the other definition is -1, and otherwise it always gives a value one less than the "max. nodes in a branch" definition.
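The two conventions, and the fact that the balance factor comes out the same under both, can be checked with a small sketch. The nested-tuple tree representation here is my own assumption, not from any of the quoted sources:

```python
# Trees are nested tuples (left, right); None is the empty subtree.
def height_nodes(t):
    # Book convention: count nodes on the longest branch; empty tree = 0.
    return 0 if t is None else 1 + max(height_nodes(t[0]), height_nodes(t[1]))

def height_edges(t):
    # Other convention: count edges on the longest branch; empty tree = -1.
    return -1 if t is None else 1 + max(height_edges(t[0]), height_edges(t[1]))

def balance(t, height):
    # Balance factor under whichever height function you pass in.
    return height(t[0]) - height(t[1])

leaf = (None, None)
print(height_nodes(leaf))   # 1  (book convention)
print(height_edges(leaf))   # 0  (edge convention)

tree = ((None, None), None)  # root with a single left child
# Both conventions give the same balance factor, as the answer argues:
print(balance(tree, height_nodes))  # 1
print(balance(tree, height_edges))  # 1
```

The two height functions always differ by exactly 1 on non-empty trees, so the difference in the balance factor cancels out.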

Related

How do you find the number of leaves at the lowest level of a complete binary tree?

I'm trying to define an algorithm that returns the number of leaves at the lowest level of a complete binary tree. By a complete binary tree, I mean a binary tree whose every level, except possibly the last, is filled, and all nodes in the last level are as far left as possible.
For example, if I had the following complete binary tree,
          _7_
         /   \
        4     9
       / \   / \
      2   6 8  10
     / \  /
    1  3 5
the algorithm would return '3' since there are three leaves at the lowest level of the tree.
I've been able to find numerous solutions for finding the count of all the leaves in regular or balanced binary trees, but so far I haven't had any luck with the particular case of finding the count of the leaves at the lowest level of a complete binary tree. Any help would be appreciated.
Do a breadth-first search; that way you can also count the number of nodes on each level.
Some pseudo code:
q <- new queue of (node, level) pairs
add (root, 0) to q
nodesPerLevel <- new vector of integers
while q is not empty:
    (currentNode, currentLevel) <- take from front of q
    nodesPerLevel[currentLevel] += 1
    for each child of currentNode:
        add (child, currentLevel + 1) to q
return last value of nodesPerLevel
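The pseudo code above translates almost directly to Python; the child-list dictionary used to represent the tree is my own assumption for illustration:

```python
from collections import deque

def lowest_level_leaf_count(root, children):
    """Count the nodes on the deepest level via BFS.
    `children` maps a node to the list of its children (assumed representation)."""
    q = deque([(root, 0)])
    nodes_per_level = []
    while q:
        node, level = q.popleft()
        if level == len(nodes_per_level):   # first node seen on this level
            nodes_per_level.append(0)
        nodes_per_level[level] += 1
        for child in children.get(node, []):
            q.append((child, level + 1))
    return nodes_per_level[-1]              # count on the deepest level

# The complete tree from the question: leaves 1, 3, 5 on the last level.
children = {7: [4, 9], 4: [2, 6], 9: [8, 10], 2: [1, 3], 6: [5]}
print(lowest_level_leaf_count(7, children))  # 3
```

Since BFS visits nodes level by level, the last entry of `nodes_per_level` is exactly the count on the deepest level, and in a complete tree every node there is a leaf.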

Balancing arithmetic expression tree with +, - operators

Given a binary arithmetic expression tree consisting of only addition and subtraction operators, and numbers, how to balance the tree as much as possible? The task is to balance the tree without evaluating the expression, that is the number of nodes should stay the same.
Example:
      +                  +
     / \                / \
    +   15   >>>>>>    -   +
   / \                / \ / \
  5   -              6  4 5  15
     / \
    6   4
Addition is commutative and associative and that allows for balancing. Commutativity allows for swapping of children of consecutive '+' nodes. Associativity allows for rotations. In the above example, the transformation performed can be viewed as
Rotation right on '+' at the root.
Swapping of '5' and '-' nodes.
I was thinking of doing an in order traversal and first balancing any sub-trees. I would try to balance any sub-tree with two consecutive '+' nodes by trying all possible arrangements of nodes (there are only 12 of them) to hopefully decrease the total height of the tree. This method should reduce the height of the tree by at most 1 at any step. However, I cannot determine whether it will always give a tree of minimum height, especially when there are more than 2 consecutive '+' nodes.
Another approach could be to read the expression tree into an array and substitute any '-' subtree with a variable, and then use DP to determine the best places for brackets. This must be done bottom-up, so that any '-' subtree is already balanced when it is considered by the DP algorithm. However, I am worried because there could be (n+1)! ways to arrange nodes and brackets, while I am looking for an O(n) algorithm.
Is it a known problem and is there a specific approach to it?
At the risk of doing something vaguely like "evaluating" (although it isn't in my opinion), I'd do the following:
Change the entire tree to addition nodes by propagating negation markers down to the leaves. A simple way to do this is to add a "colour" to every leaf node; the colour of a node can be computed directly during a tree walk. During the walk, keep track of the number (or just the parity, since that's the only part we're interested in) of right-hand links out of '-' nodes taken; when a leaf is reached, it is coloured green if the parity is even and red if the parity is odd. (Red leaves are negated.) During the walk, '-' nodes are changed to '+'.
      +                  +
     / \                / \
    +   15   >>>>>>    +   15
   / \                / \
  5   -              5   +
     / \                / \
    6   4              6   -4
Now minimise the depth of the tree by constructing a minimum depth binary tree over top of the leaves, taking the leaves in order without regard to the previous tree structure:
      +                   +
     / \                 / \
    +   15   >>>>>>     +   +
   / \                 / \ / \
  5   +               5  6 -4 15
     / \
    6  -4
Turn the colours back into - nodes. The easy transforms are nodes with no red children (just remove the colour) and nodes with exactly one red child and one green child. These latter nodes are turned into - nodes; if the red child is on the left, then the children are also reversed.
The tricky case is nodes all of whose children are red. In that case, move up the tree until you find a parent which has some green descendant. The node you find must have two children (since otherwise its only child would have to have a green descendant), of which exactly one child has green descendants. Then, change that node to -, reverse its children if the right-hand child has a green descendant, and recolour green all the children of the (possibly new) right-hand child.
      +                     +
     / \                   / \
    +   +     >>>>>>      +   -
   / \ / \               / \ / \
  5  6 -4 15            5  6 15  4
Perhaps it's worth pointing out that the root node has a green descendant on the left-hand side because the very first leaf node is green. That's sufficient to demonstrate that the above algorithm covers all cases.
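The first two steps (pushing signs down to the leaves, then rebuilding a minimum-depth '+'-only tree over the leaves in order) can be sketched as follows. The tuple representation and the choice to keep negated leaves as plain negative numbers (rather than restoring '-' nodes, the answer's final step) are my own assumptions:

```python
# Trees are ('+', left, right) / ('-', left, right) tuples, or plain numbers.
def signed_leaves(tree, negate=False):
    """Flatten the tree into its leaves, in order, with signs pushed down."""
    if not isinstance(tree, tuple):
        return [-tree if negate else tree]
    op, left, right = tree
    # Taking the right-hand link of a '-' node flips the parity.
    return (signed_leaves(left, negate) +
            signed_leaves(right, negate ^ (op == '-')))

def build_balanced(leaves):
    """Build a minimum-depth '+'-only tree over the leaves, kept in order."""
    if len(leaves) == 1:
        return leaves[0]
    mid = len(leaves) // 2
    return ('+', build_balanced(leaves[:mid]), build_balanced(leaves[mid:]))

def depth(tree):
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[1]), depth(tree[2]))

# The example from the question: ((5 + (6 - 4)) + 15).
expr = ('+', ('+', 5, ('-', 6, 4)), 15)
leaves = signed_leaves(expr)        # [5, 6, -4, 15]
balanced = build_balanced(leaves)
print(depth(expr), depth(balanced))  # 3 2
```

Both trees evaluate to the same value, and the rebuilt tree has the minimum possible depth for four leaves.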

Do we have to create a tree all the nodes of which have 3 children?

Steps to build Huffman Tree
Input is array of unique characters along with their frequency of occurrences and output is Huffman Tree.
Create a leaf node for each unique character and build a min heap of all leaf nodes (Min Heap is used as a priority queue. The value of frequency field is used to compare two nodes in min heap. Initially, the least frequent character is at root)
Extract two nodes with the minimum frequency from the min heap.
Create a new internal node with frequency equal to the sum of the two nodes frequencies. Make the first extracted node as its left child and the other extracted node as its right child. Add this node to the min heap.
Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
In a heap, a node can have at most 2 children, right?
So if we would like to generalize the Huffman algorithm for coded words in ternary system (i.e. coded words using the symbols 0 , 1 and 2 ) what could we do? Do we have to create a tree all the nodes of which have 3 children?
EDIT:
I think that it would be as follows.
Steps to build Huffman Tree
Input is array of unique characters along with their frequency of occurrences and output is Huffman Tree.
Create a leaf node for each unique character and build a min heap of all leaf nodes
Extract three nodes with the minimum frequency from the min heap.
Create a new internal node with frequency equal to the sum of the three nodes frequencies. Make the first extracted node as its left child, the second extracted node as its middle child and the third extracted node as its right child. Add this node to the min heap.
Repeat steps#2 and #3 until the heap contains only one node. The remaining node is the root node and the tree is complete.
How can we prove that the algorithm yields optimal ternary codes?
EDIT 2: Suppose that we have the frequencies 5, 9, 12, 13, 16, 45.
Their count is even, so we add a dummy node with frequency 0. Do we put this at the end of the array, then heapify and merge three nodes at a time? Or have I understood it wrong?
Yes, you have to create nodes with 3 children each. Why 3? You can also have n-ary Huffman coding using nodes with n children. The tree will look something like this (for n = 3):
      *
    / | \
   *  *  *
  /|\
 * * *
Huffman Algorithm for Ternary Codewords
I am giving the algorithm for easy reference.
HUFFMAN_TERNARY(C)
{
    IF |C| IS EVEN
        THEN ADD DUMMY CHARACTER Z WITH FREQUENCY 0
    N = |C|
    Q = C    // WE ARE BASICALLY HEAPIFYING THE CHARACTERS
    FOR I = 1 TO floor(N/2)
    {
        ALLOCATE NEW_NODE
        LEFT[NEW_NODE]  = U = EXTRACT_MIN(Q)
        MID[NEW_NODE]   = V = EXTRACT_MIN(Q)
        RIGHT[NEW_NODE] = W = EXTRACT_MIN(Q)
        F[NEW_NODE] = F[U] + F[V] + F[W]
        INSERT(Q, NEW_NODE)
    }
    RETURN EXTRACT_MIN(Q)
}  // END-OF-ALGO
Why are we adding an extra node? To make the number of nodes odd. Why? Because we want to exit the for loop with exactly one node left in Q.
Why floor(N/2)?
At first we take 3 nodes and replace them with 1 node, leaving N-2 nodes. After that we always take 3 nodes (never 2, which the dummy node makes impossible) and replace them with 1, so each iteration reduces the count by 2. That is why the loop runs floor(N/2) times.
Check it yourself in paper using some sample character set. You will understand.
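The algorithm above can be sketched in Python with the standard-library heap; the dictionary-based symbol/frequency interface is my own assumption:

```python
import heapq
from itertools import count

def huffman_ternary(freqs):
    """freqs: dict mapping symbol -> frequency.
    Returns dict mapping symbol -> ternary codeword over digits 0/1/2."""
    tie = count()  # tie-breaker so the heap never compares tree nodes directly
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    if len(heap) % 2 == 0:               # make the node count odd
        heap.append((0, next(tie), None))  # dummy character, frequency 0
    heapq.heapify(heap)
    while len(heap) > 1:                 # each merge removes 3 nodes, adds 1
        kids = [heapq.heappop(heap) for _ in range(3)]
        total = sum(k[0] for k in kids)
        heapq.heappush(heap, (total, next(tie), [k[2] for k in kids]))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, list):       # internal node: three children
            for digit, child in enumerate(node):
                walk(child, prefix + str(digit))
        elif node is not None:           # skip the dummy leaf
            codes[node] = prefix
    walk(heap[0][2], "")
    return codes

# The frequencies from the example: 5, 9, 12, 13, 16, 45.
codes = huffman_ternary({'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45})
print({s: len(codes[s]) for s in sorted(codes)})
```

On the example this reproduces the tree shown above: the symbols with frequencies 16 and 45 get length-1 codewords, 12 and 13 get length 2, and 5 and 9 get length 3.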
CORRECTNESS
I am taking here reference from "Introduction to Algorithms" by Cormen, Rivest.
Proof: The step-by-step mathematical proof is too long to post here, but it is quite similar to the proof given in the book.
Idea
Any optimal tree has the three lowest frequencies at the lowest level (this must be proved, by contradiction): suppose it were not the case; then we could swap a leaf with a higher frequency at the lowest level with one of the three lowest-frequency leaves and obtain a lower average length. Without loss of generality we may assume that the three lowest frequencies are children of the same node (if they are at the same level, the average length does not change irrespective of where the leaves sit); they then differ only in the last digit of their codeword (0, 1 or 2).
Again, as with binary Huffman codes, we contract the three nodes into a new character whose frequency is the total of the three characters' frequencies. Like the binary case, we see that the cost of the optimal tree is the cost of the tree with the three symbols contracted plus the contribution of the eliminated subtree that held the nodes before contraction. Since the subtree must be present in the final optimal tree, we can optimize on the tree containing the newly contracted node.
Example
Suppose the character set contains the frequencies 5, 9, 12, 13, 16, 45.
Now N = 6, which is even, so add a dummy character with freq = 0.
N = 7 now, and the frequencies in C are 0, 5, 9, 12, 13, 16, 45.
Now, using a min-priority queue, extract 3 values: 0, then 5, then 9.
Insert a new character with freq = 0+5+9 = 14 into the priority queue, and continue this way.
The tree will be like this:
            100
           / | \
          /  |  \
         /   |   \
        39   16   45      step 3
      / | \
    14  12  13            step 2
  / | \
 0   5   9                step 1
Finally, prove it
I will now give a straightforward mimic of Cormen's proof.
Lemma 1. Let C be an alphabet in which each character c belonging to C has frequency c.freq. Let x, y and z be three characters in C having the lowest frequencies. Then there exists an optimal prefix code for C in which the codewords for x, y and z have the same length and differ only in the last digit.
Proof:
Idea
First consider any tree T generating an arbitrary optimal prefix code.
Then we will modify it into a tree representing another optimal prefix code such that the characters x, y, z appear as sibling nodes at maximum depth.
If we can construct such a tree, then the codewords for x, y and z will have the same length and differ only in the last digit.
Proof--
Let a, b, c be three characters that are sibling leaves of maximum depth in T.
Without loss of generality, we assume that a.freq <= b.freq <= c.freq and x.freq <= y.freq <= z.freq.
Since x.freq, y.freq and z.freq are the three lowest leaf frequencies, in order (there are no frequencies between them), and a.freq, b.freq and c.freq are three arbitrary frequencies, in order, we have x.freq <= a.freq, y.freq <= b.freq and z.freq <= c.freq.
In the remainder of the proof we may still have x.freq = a.freq, y.freq = b.freq or z.freq = c.freq. But if x.freq = b.freq, x.freq = c.freq or y.freq = c.freq, then the ordering forces all the frequencies in between to be equal as well, and in that case the lemma is trivially true. Thus we will assume
x.freq != b.freq and x.freq != c.freq.
T1
        *         |
      / | \       |
     *  *  x      +--- d(x)
   / | \          |
  y  *  z         +--- d(y) or d(z)
    /|\           |
   a b c          +--- d(a) or d(b) or d(c); in fact d(a) = d(b) = d(c)
T2
        *
      / | \
     *  *  a
   / | \
  y  *  z
    /|\
   x b c
T3
        *
      / | \
     *  *  a
   / | \
  b  *  z
    /|\
   x y c
T4
        *
      / | \
     *  *  a
   / | \
  b  *  c
    /|\
   x y z
In case of T1: costt1 = x.freq*d(x) + y.freq*d(y) + z.freq*d(z) + a.freq*d(a) + b.freq*d(b) + c.freq*d(c) + cost_of_other_nodes
In case of T2: costt2 = x.freq*d(a) + y.freq*d(y) + z.freq*d(z) + a.freq*d(x) + b.freq*d(b) + c.freq*d(c) + cost_of_other_nodes
costt1 - costt2 = x.freq*[d(x) - d(a)] + a.freq*[d(a) - d(x)]
                = (a.freq - x.freq) * (d(a) - d(x))
                >= 0
So costt1 >= costt2. --->(1)
Similarly we can show costt2 >= costt3--->(2)
And costt3 >= costt4--->(3)
From (1), (2) and (3) we get
costt1 >= costt4. -->(4)
But T1 is optimal.
So costt1 <= costt4. -->(5)
From (4) and (5) we get costt1 = costt4.
So T4 is an optimal tree in which x, y and z appear as sibling leaves at maximum depth, from which the lemma follows.
Lemma 2
Let C be a given alphabet with frequency c.freq defined for each character c belonging to C.
Let x, y, z be three characters in C with minimum frequency. Let C1 be the alphabet C with the characters x, y and z removed and a new character z1 added, so that C1 = C - {x,y,z} union {z1}. Define freq for C1 as for C, except that z1.freq = x.freq + y.freq + z.freq. Let T1 be any tree representing an optimal prefix code for the alphabet C1. Then the tree T, obtained from T1 by replacing the leaf node for z1 with an internal node having x, y and z as children, represents an optimal prefix code for the alphabet C.
Proof.:
We are making a transition from T1 -> T, so we must find a way to express the cost of T, i.e. costt, in terms of costt1.
       T                        T1
       *                        *
     / | \                    / | \
    *  *  *      ---->       *  *  *
  / | \                    / | \
 *  *  *                  *  z1 *
   /|\
  x y z
For c belonging to C - {x,y,z}, we have dT(c) = dT1(c) [the depths in trees T and T1].
Hence c.freq*dT(c) = c.freq*dT1(c).
Since dT(x) = dT(y) = dT(z) = dT1(z1) + 1,
we have x.freq*dT(x) + y.freq*dT(y) + z.freq*dT(z) = (x.freq + y.freq + z.freq)(dT1(z1) + 1)
                                                   = z1.freq*dT1(z1) + x.freq + y.freq + z.freq
Adding to both sides the cost of the other nodes, which is the same in both T and T1:
x.freq*dT(x) + y.freq*dT(y) + z.freq*dT(z) + cost_of_other_nodes = z1.freq*dT1(z1) + x.freq + y.freq + z.freq + cost_of_other_nodes
So costt = costt1 + x.freq + y.freq + z.freq
or equivalently
costt1 = costt - x.freq - y.freq - z.freq ---->(1)
We now prove the lemma by contradiction. Suppose that T does not represent an optimal prefix code for C. Then there exists an optimal tree T2 such that costt2 < costt. Without loss of generality (by Lemma 1), T2 has x, y and z as siblings.
Let T3 be the tree T2 with the common parent of x and y and z replaced by a
leaf z1 with frequency z1.freq=x.freq+y.freq+z.freq Then
costt3 = costt2-x.freq-y.freq-z.freq
< costt-x.freq-y.freq-z.freq
= costt1 (From 1)
yielding a contradiction to the assumption that T1 represents an optimal prefix code
for C1. Thus, T must represent an optimal prefix code for the alphabet C.
-Proved.
Procedure HUFFMAN produces an optimal prefix code.
Proof: Immediate from Lemmas 1 and 2.
NOTE.: Terminologies are from Introduction to Algorithms 3rd edition Cormen Rivest

Is this algorithm for finding a maximum independent set in a graph correct?

We have the following input for the algorithm:
A graph G with no cycles (aka a spanning-tree) where each node has an associated weight.
I want to find an independent set S such that:
No two elements in S form an edge in G.
No other subset satisfying the above condition has a greater total weight than S[0] + S[1] + ... + S[n-1] (where len(S) == n).
This is the high-level pseudocode I have so far:
MaxWeightNodes(SpanningTree S):
    output = {}
    While(length(S)):
        o = max(node in S)
        output = output ∪ {o}
        S = S \ ({o} ∪ adjacentNodes(o))
    End While
    Return output
Can someone tell me off the bat whether I've made any errors, or if this algorithm will give me the result I want?
The algorithm is not valid: you'll soon face cases where taking the initial maximum and excluding its adjacent nodes is the best local choice but not the best global decision.
For example, output = []:
         10
        /  \
     100    20
     / \    / \
   80   90 10  30
output = [100]:
          x
        /  \
       x    20
      / \   / \
     x   x 10  30
output = [100, 30]:
          x
        /  \
       x    x
      / \   / \
     x   x 10   x
output = [100, 30, 10]:
          x
        /  \
       x    x
      / \   / \
     x   x  x   x
While we know there are better solutions.
This means you're down to a greedy algorithm on a problem without the optimal substructure the greedy choice would need.
I think the weights of the vertices make greedy solutions difficult. If all weights are equal, you can try choosing a set of levels of the tree (which obviously is easiest with a full k-ary tree, but spanning trees generally don't fall into that class). Maybe it'll be useful for greedy approximation to think about the levels as having a combined weight, since you can always choose all vertices of the same level of the tree (independent of which vertex you root it at) to go into the same independent set; there can't be an edge between two vertices of the same level. I'm not offering a solution because this seems like a difficult problem to me. The main problems seem to be the weights and the fact that you can't assume that you're dealing with full trees.
EDIT: actually, always choosing all vertices of one level seems to be a bad idea as well, as Rubens's example helps visualize; imagine the vertex on the second level on the right of his tree had a weight of 200.
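On a tree, though, the problem can be solved exactly in linear time with the standard two-state dynamic programme, which sidesteps the greedy trap entirely. A sketch, where the dict-based tree representation is my own assumption:

```python
# Maximum-weight independent set on a tree: for every node compute two
# values, the best set weight in its subtree (a) including the node and
# (b) excluding it, then combine them bottom-up.
def max_weight_independent_set(children, weight, root):
    """children: dict node -> list of children; weight: dict node -> weight."""
    def solve(v):
        take = weight[v]   # v is in the set: its children must stay out
        skip = 0           # v is out: each child is free to be in or out
        for c in children.get(v, []):
            c_take, c_skip = solve(c)
            take += c_skip
            skip += max(c_take, c_skip)
        return take, skip
    return max(solve(root))

# Rubens's counterexample tree: the optimum is 80 + 90 + 10 + 30 + 10 = 220,
# far better than the 140 the greedy approach finds.
children = {'r': ['a', 'b'], 'a': ['c', 'd'], 'b': ['e', 'f']}
weight = {'r': 10, 'a': 100, 'b': 20, 'c': 80, 'd': 90, 'e': 10, 'f': 30}
print(max_weight_independent_set(children, weight, 'r'))  # 220
```

The recursion visits each node once, so the running time is O(n); for very deep trees an explicit stack would avoid Python's recursion limit.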

algorithm for shortest weighted path - frequently changing edges

I'm trying to solve a graph problem. Graph is weighted and undirected.
Graph size:
no. of vertices upto 200,000
no. of edges upto 200,000
I've to find a shortest path between given 2 nodes (S & D) in graph. I'm using Dijkstra's algorithm to find this.
Now the problem is that the graph is frequently changing. I have to find the shortest path between S & D when a particular edge is removed from the graph. I'm calculating the new path again using Dijkstra's algorithm, treating the graph after edge removal as a new graph. However, this approach is too slow, as there might be 200,000 edges and I'd be computing the shortest path 200,000 times, once for each edge removal.
I was thinking of using any memoization technique but unable to figure out as shortest path may change altogether if particular edge is removed from graph.
//more details
Source and destination are fixed throughout the problem.
There will be up to 200,000 queries; for each query, only one edge is removed from the initial graph.
Since no edge is added, the shortest path after removal will always be greater than (or equal to) the original.
Unless the edge removed is part of the original shortest path, the result will not change.
If it is a part of the original shortest path, then running the algorithm again is the worst case solution.
If you are not looking for an exact answer, you could try approximate local methods to fill in the missing edge.
You can augment the Dijkstra algorithm to store information which will allow you to backtrack to a particular state during the initial search.
This means that every time you end up changing the shortest path during relaxation, record the changes made to the data-structures, including the heap. That allows you to restore the state of the algorithm to any point during the first run.
When you remove an edge which was on the Shortest Path, you need to go back to the point right before the edge was relaxed, and then restart the algorithm as if the removed edge was never present.
My suggestion:
First use Dijkstra to find the shortest path from Source to Destination, but walk from both the source and the destination at the same time (use negative distance numbers to indicate how far from the destination you have travelled), always expanding the node with the shortest absolute distance (from either source or destination). Once you encounter a node that already has a value from the reverse walk, the path through that node is the shortest.
Then remove the edge; if the edge is not part of the shortest path, return the currently known shortest path.
If the removed edge is part of the shortest path, then perform the search again, discarding known absolute distances greater than the lesser distance of the two endpoints of the removed edge. Re-add the previously known shortest path to the known results, positive when walking from the start and negative when walking from the end, up to the broken segment. Now search from that starting point in both directions; if you hit a node that has a value set (positive or negative), or that was part of the previous shortest path, you have found your new shortest path.
The major benefits of doing this are:
you walk from both source and destination, so you generally expand fewer nodes overall, and
you don't abandon the entire search result; even if the edge removed was the very first edge in the previous shortest path, you just have to find the shortest way to reconnect the path.
Performance over brute recalculation each time will be considerable, even when the removed node is part of the previously known shortest path.
For how it works, consider this graph:
        I
       /
  B---E
 /   /    H
A   D    /|
 \ / \  / |
  C---F--G
We want to get from A to H; to make it easy let's assume each edge is worth 1 (but it could be anything).
We start at A:
        I
       /
  B---E
 /   /    H
0   D    /|
 \ / \  / |
  C---F--G
Now set the value for H to start at 0:
        I
       /
  B---E
 /   /   (0)
0   D    /|
 \ / \  / |
  C---F--G
And expand:
        I
       /
  1---E
 /   /   (0)
0   D    /|
 \ / \  / |
  1---F--G
Now we expand the next lowest value, which will be H:
        I
       /
  1---E
 /   /   (0)
0   D    /|
 \ / \  / |
  1---(-1)--(-1)
Now we arbitrarily pick B because it comes before C, F or G (they have the same absolute value):
        I
       /
  1---2
 /   /   (0)
0   D    /|
 \ / \  / |
  1---(-1)--(-1)
Then C:
        I
       /
  1---2
 /   /     (0)
0   2     / |
 \ / \   /  |
  1---2 & (-1)--(-1)
Now we have a node that knows both its positive value and its negative value, hence we know its distance to both A and H. Since we were expanding the shortest node first, this must be the shortest path, therefore we can say that the shortest path from A to H is A->C->F->H and costs ABS(2)+ABS(-1) = 3.
Now suppose we remove the edge C->F:
        I
       /
  1---2
 /   /     (0)
0   2     / |
 \ / \   /  |
  1   2 & (-1)--(-1)
We then remove all known values with an absolute value above the lesser value of C and F (in this case 1), leaving:
        I
       /
  1---E
 /   /   (0)
0   D    /|
 \ / \  / |
  1  (-1)--(-1)
Now again we expand as before, starting with B:
        I
       /
  1---2
 /   /   (0)
0   D    /|
 \ / \  / |
  1  (-1)--(-1)
Then C:
        I
       /
  1---2
 /   /   (0)
0   2    /|
 \ / \  / |
  1  (-1)--(-1)
Now F:
        I
       /
  1------2
 /      /     (0)
0   2&(-2)    /|
 \ /  \      / |
  1    (-1)--(-1)
Hence we know the shortest path from A to H is now: A->C->D->F->H and costs ABS(2)+ABS(-2) = 4
This will work with any number of nodes, edges and edge weights, in the event that you have no further nodes to expand then you return your "No Route" response.
You can further optimize it by not resetting the node values of nodes that were in the previous shortest path, in doing so you lose the simple nature but it's not overly complex.
In the above example it wouldn't make a difference initially, but it would make a difference if we then removed the linkage A->C because we would remember the costs of C and the other nodes in the chain (as negative)
The benefit over just using a single-sided Dijkstra and rolling back to before the removed edge can be shown below:
        I
       /
  1---E
 /   /   H
0   D   /|
 \ / \ / |
  1   F--G
Now we would expand B:
        I
       /
  1---2
 /   /   H
0   D   /|
 \ / \ / |
  1   F--G
C:
        I
       /
  1---2
 /   /   H
0   2   /|
 \ / \ / |
  1   F--G
D:
        I
       /
  1---2
 /   /   H
0   2   /|
 \ / \ / |
  1   3--G
E:
        3
       /
  1---2
 /   /   H
0   2   /|
 \ / \ / |
  1   3--G
F:
        3
       /
  1---2
 /   /   4
0   2   /|
 \ / \ / |
  1   3--4
And then we would determine that the path is now A->C->D->F->H and costs 4. Notice we needed 5 expand steps here, compared to the 3 that we needed for the bi-directional way.
As the removed edge sits more towards the middle of the path, we get a greatly improved saving by using a bi-directional graph-walking algorithm to recalculate the new path. The exception is a fringe case such as 50 nodes hanging off H but only a single path all the way from A to H; that is unlikely in a normal network, and even then this still works well on average, in contrast to the reverse situation where there is only one direct path from H to A but 50 edges attached to A.
Given you have some ~200,000 edges with potentially up to 200,000 nodes, you're likely to see considerable savings compared to my example graph, which has only 9 nodes and 11 edges. This is based on the idea that we want the algorithm with the fewest node expansions, since that is where the majority of the computational time will be spent.
I have an idea:
First, do a Dijkstra and memorize all the edges of the shortest path from Source to Destination.
When you make a removal, check whether you deleted an edge on that shortest path. If not, the result is the same. If yes, do another Dijkstra.
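That first idea can be sketched directly: cache the path edges from one Dijkstra run and rerun only when the removed edge is on it. The adjacency-dict representation and helper names are my own assumptions:

```python
import heapq

def dijkstra(adj, src, dst, banned=None):
    """adj: dict node -> {neighbour: weight}. Returns (distance, path edges)."""
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:                     # first pop of dst is its final distance
            break
        if d > dist.get(u, float('inf')):
            continue                     # stale heap entry
        for v, w in adj[u].items():
            if banned and frozenset((u, v)) == banned:
                continue                 # skip the removed (undirected) edge
            if d + w < dist.get(v, float('inf')):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    if dst not in dist:
        return float('inf'), set()
    path, node = set(), dst              # walk prev[] back to collect path edges
    while node != src:
        path.add(frozenset((node, prev[node])))
        node = prev[node]
    return dist[dst], path

def shortest_after_removal(adj, src, dst, u, v):
    # In a real solution this first run would be cached across all queries.
    base_dist, base_path = dijkstra(adj, src, dst)
    if frozenset((u, v)) not in base_path:
        return base_dist                 # removed edge not on the path: no change
    return dijkstra(adj, src, dst, banned=frozenset((u, v)))[0]

# The 9-node, 11-edge example graph from the answer above, all weights 1.
adj = {
    'A': {'B': 1, 'C': 1}, 'B': {'A': 1, 'E': 1},
    'C': {'A': 1, 'D': 1, 'F': 1}, 'D': {'C': 1, 'E': 1, 'F': 1},
    'E': {'B': 1, 'D': 1, 'I': 1}, 'F': {'C': 1, 'D': 1, 'G': 1, 'H': 1},
    'G': {'F': 1, 'H': 1}, 'H': {'F': 1, 'G': 1}, 'I': {'E': 1},
}
print(shortest_after_removal(adj, 'A', 'H', 'C', 'F'))  # 4
print(shortest_after_removal(adj, 'A', 'H', 'B', 'E'))  # 3
```

Only queries that hit one of the at most n-1 path edges trigger a recomputation, so most of the 200,000 queries are answered in O(1) from the cache.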
Another idea:
First perform a Dijkstra and, for each vertex, keep in mind all the elements that depend on that vertex.
When you perform a removal, do something like a topological sort, and for all the vertices that depend on your vertex do an update, then perform a partial Dijkstra with those vertices.
If the removed edge is not on the shortest path, then the path will remain the same. Otherwise there is probably no good exact solution, because the problem is compositional: the shortest path sp(A, B) from A to B through a node C satisfies sp(A, B) = sp(A, C) + sp(C, B) (for all such C).
By removing one (very good) edge you may destroy all of these paths. The best (though not exact) solution might be to use the Floyd-Warshall algorithm to calculate the shortest paths between all pairs of nodes and, after removal of an edge on the path, try to repair the path using the shortest detour.
