Optimise Floyd-Warshall for symmetric adjacency matrix

Is there an optimisation that lowers the constant factor of the runtime of Floyd-Warshall, if you are guaranteed to have a symmetric adjacency matrix?

After some thought I came up with:
for (int k = 0; k < N; ++k)
    for (int i = 0; i < N; ++i)
        for (int j = 0; j <= i; ++j)
            dist[j][i] = dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
Now of course we need to show both that it's correct and that it's faster.
Correctness is harder to prove, since it relies on the proof of Floyd-Warshall's algorithm, which is non-trivial. A pretty good proof is given here: Floyd-Warshall proof
The input matrix is symmetric. Now the rest of the proof uses a modified Floyd-Warshall's proof to show that the order of the calculations in the 2 inner loops doesn't matter and that the graph stays symmetrical after each step. If we show both of these conditions are true then both algorithms do the same thing.
Let's define dist[i][j][k] as the distance from i to j using only vertices from the set {0, ..., k} as intermediate vertices on the path from i to j.
dist[i][j][-1] is defined as the weight of the edge from i to j. If there is no edge between i and j this weight is taken to be infinity.
Now using the same logic as used in the proof linked above:
dist[i][j][k] = min(dist[i][j][k-1], dist[i][k][k-1] + dist[k][j][k-1])
Now in the calculation of dist[i][k][k] (and similarly for dist[k][i][k]):
dist[i][k][k] = min(dist[i][k][k-1], dist[i][k][k-1] + dist[k][k][k-1])
Now since dist[k][k][k-1] cannot be negative (or we'd have a negative cycle in the graph), this means that dist[i][k][k] = dist[i][k][k-1]: if dist[k][k][k-1] = 0 then both arguments of the min() are equal, otherwise the first argument is the smaller one and is chosen.
So now, because dist[i][k][k] = dist[i][k][k-1], when calculating dist[i][j][k] it doesn't matter if dist[i][k] or dist[k][j] already allow k in their paths. Since dist[i][j][k-1] is only used for the calculation of dist[i][j][k], dist[i][j] will stay dist[i][j][k-1] in the matrix until dist[i][j][k] is calculated. If i or j equals k then the above case applies.
Therefore, the order of the calculations doesn't matter.
Now we need to show that dist[i][j] = dist[j][i] after all steps of the algorithm.
We start out with a symmetric grid thus dist[a][b] = dist[b][a], for all a and b.
dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j])
= min(dist[j][i], dist[k][i] + dist[j][k])
= min(dist[j][i], dist[j][k] + dist[k][i])
= dist[j][i]
Therefore the double assignment is valid and it maintains the invariant that dist[a][b] = dist[b][a]. Hence dist[i][j] = dist[j][i] after all steps of the algorithm.
Therefore both algorithms yield the same, correct, result.
Speed is easier to argue. The inner statement is executed just over half the number of times it normally is, so the function is about twice as fast. It is slowed down slightly because you still perform the same number of assignments, but this doesn't matter much, as the min() is what takes up most of the time.
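To convince yourself on a concrete instance, here is a minimal self-contained cross-check, sketched in Java (my sketch, not part of the original post): it runs the standard algorithm and the symmetric variant above on a random symmetric matrix and asserts that they produce identical results. INF is an assumed "no edge" sentinel, kept small enough that adding two of them cannot overflow an int.

import java.util.Random;

public class SymmetricFloydWarshallCheck {
    // Assumed "no edge" sentinel; small enough that INF + INF does not overflow an int.
    static final int INF = 1_000_000;

    public static void main(String[] args) {
        int n = 50;
        Random rnd = new Random(42);

        // Build a random symmetric weight matrix with a zero diagonal and some missing edges.
        int[][] base = new int[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                int w = (i == j) ? 0 : (rnd.nextInt(4) == 0 ? INF : 1 + rnd.nextInt(100));
                base[i][j] = base[j][i] = w;
            }
        }

        int[][] a = copy(base);
        int[][] b = copy(base);

        // Standard Floyd-Warshall over the full matrix.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    a[i][j] = Math.min(a[i][j], a[i][k] + a[k][j]);

        // Symmetric variant from above: only visit j <= i and mirror the result.
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j <= i; j++)
                    b[j][i] = b[i][j] = Math.min(b[i][j], b[i][k] + b[k][j]);

        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (a[i][j] != b[i][j])
                    throw new AssertionError("mismatch at (" + i + ", " + j + ")");
        System.out.println("Both variants agree on all " + (n * n) + " entries.");
    }

    static int[][] copy(int[][] m) {
        int[][] c = new int[m.length][];
        for (int i = 0; i < m.length; i++) c[i] = m[i].clone();
        return c;
    }
}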
If you see anything wrong with my proof, however technical, feel free to point it out and I will attempt to fix it.
EDIT:
You can both speed up and save half the memory by changing the loop as such:
for (int k = 0; k < N; ++k) {
    for (int i = 0; i < k; ++i)
        for (int j = 0; j <= i; ++j)
            dist[i][j] = min(dist[i][j], dist[k][i] + dist[k][j]);
    for (int i = k; i < N; ++i) {
        for (int j = 0; j < k; ++j)
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[k][j]);
        for (int j = k; j <= i; ++j)
            dist[i][j] = min(dist[i][j], dist[i][k] + dist[j][k]);
    }
}
This just splits up the for loops of the optimised algorithm so that every entry that is read or written has its row index at least as large as its column index. It's still correct and it'll likely run at about the same speed, but you only need to store the lower triangle of the matrix, i.e. half the memory.
Thanks to Chris Elion for the idea.

(Using the notation in the pseudo-code in the Wikipedia article) I believe (but haven't tested) that if the edgeCost matrix is symmetric, then the path matrix will also be symmetric after each iteration. Thus you only need to update half of the entries at each iteration.
At a lower level, you only need to store half of the matrix (since d(i,j) = d(j,i)), so you can reduce the amount of memory used, and hopefully reduce the number of cache misses since you'll access the same data multiple times.
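To make the storage saving concrete, here is one possible packed lower-triangle layout, sketched in Java (the class and method names are mine, not from either answer): each pair (i, j) and (j, i) shares a single slot in a flat array of length N(N+1)/2.

public class PackedSymmetricMatrix {
    private final int n;
    private final int[] data;

    public PackedSymmetricMatrix(int n) {
        this.n = n;
        this.data = new int[n * (n + 1) / 2]; // one slot per pair (i, j) with j <= i
    }

    public int size() { return n; }

    // Map (i, j) and (j, i) to the same position in the packed lower triangle.
    private int index(int i, int j) {
        return (i >= j) ? i * (i + 1) / 2 + j : j * (j + 1) / 2 + i;
    }

    public int get(int i, int j) { return data[index(i, j)]; }

    public void set(int i, int j, int value) { data[index(i, j)] = value; }
}

With a layout like this, the split loops from the edit above can call get and set instead of indexing a full N x N array, which is where the halved memory (and, with luck, the better cache behaviour) comes from.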

Related

Analyze the time cost of the following algorithms using Θ notation

So many loops; I'm stuck at counting how many times the last loop runs.
I also don't know how to simplify summations to get big Theta. Please, somebody, help me out!
int fun(int n) {
    int sum = 0;
    for (int i = n; i > 0; i--) {
        for (int j = i; j < n; j *= 2) {
            for (int k = 0; k < j; k++) {
                sum += 1;
            }
        }
    }
    return sum;
}
Any problem has 2 stages:
You guess the answer
You prove it
In easy problems, step 1 is easy and then you skip step 2 or explain it away as "obvious". This problem is a bit more tricky, so both steps require some more formal thinking. If you guess incorrectly, you will get stuck at your proof.
The outer loop goes from n down to 1, so the number of iterations is O(n). The middle loop is uncomfortable to analyze because its bounds depend on the current value of i. Like we usually do when guessing O-rates, let's just replace its bounds to be from 1 to n.
for (int i = n; i > 0; i--) {
    for (int j = 1; j < n; j *= 2) {
        perform j steps
    }
}
The run-time of this new middle loop, including the inner loop, is 1+2+4+...+n, or approximately 2*n, which is O(n). Together with outer loop, you get O(n²). This is my guess.
I edited the code, so I may have changed the O-rate when I did. So I must now prove that my guess is right.
To prove this, use the "sandwich" technique - edit the program in 2 different ways, one which makes its run-time smaller and one which makes its run-time greater. If you manage to make both new programs have the same O-rate, you will prove that the original code has the same O-rate.
Here is a "smaller" or "faster" code:
do n/2 iterations; set i=n/2 for each of them {
    do just one iteration, where you set j = i {
        perform j steps
    }
}
This code is faster because each loop does less work. It does something like n²/4 iterations.
Here is a "greater" or "slower" code:
do n iterations; set i=n for each of them {
    for (int j = 1; j <= 2 * n; j *= 2) {
        perform j steps
    }
}
I made the upper bound for the middle loop 2n to make sure its last iteration is for j=n or greater.
This code is slower because each loop does more work. The number of iterations of the middle loop (and everything under it) is 1+2+4+...+n+2n, which is something like 4n. So the number of iterations for the whole program is something like 4n².
We got, in a somewhat formal manner:
n²/4 ≤ runtime ≤ 4n²
So runtime = O(n²).
Here I use O where it should be Θ. O is usually defined as "upper bound", while sometimes it means "upper or lower bound, depending on context". In my answer O means "both upper and lower bound".
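If you want to see the Θ(n²) behaviour empirically, here is a small sketch in Java (mine, not part of the answer) that evaluates fun(n) for growing n and prints sum / n²; the ratio settles near a constant, which is what Θ(n²) predicts.

public class GrowthCheck {
    // The function from the question, with the missing semicolons added and a long accumulator.
    static long fun(int n) {
        long sum = 0;
        for (int i = n; i > 0; i--)
            for (int j = i; j < n; j *= 2)
                for (int k = 0; k < j; k++)
                    sum += 1;
        return sum;
    }

    public static void main(String[] args) {
        for (int n = 1_000; n <= 8_000; n *= 2) {
            double ratio = fun(n) / ((double) n * n);
            System.out.printf("n = %5d   sum / n^2 = %.4f%n", n, ratio);
        }
    }
}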

Time Complexity of 3 nested for loops

Although I found pretty good replies to the same question, I want the time complexity equation of the following piece of code:
sum = 0; c1=11
for (i=0; i<N; i++) c2=6
for (j=0; j<N; j++) c2=6
for (j=0; j<N; j++) c2=7
sum += arr[i][j] c3=2
While each statement has a cost associated with it, I require the complete time complexity equation and its answer.
Regards
The comments section got quite long so I am going to write up an answer summarizing everything.
Measuring Time Complexity
In Computer Science, we measure time complexity by the number of steps/iterations your algorithm takes to evaluate.
So if you have a simple array of length n and you go through this array only once, say to print all the elements, we say that this algorithm is O(n), because the time it takes to run will grow proportionally to the size of the array you have, thus n.
You can think of Big-O O(..) as a higher-order function that compares other functions. If we say f(x) = O(n), it means that your function grows at most as fast as y = n, thus linearly. This means that if you were to plot these functions on a graph, there would be a point x = c after which the graph of n will always be on top of the graph of f(x) for any given x > c. Big-O signifies an upper bound of a function in terms of another function.
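For completeness, the usual formal definition (standard textbook wording, mine rather than the answerer's) is:

f(x) = O(g(x)) means there exist constants c > 0 and x0 such that f(x) <= c * g(x) for all x > x0.

So O(..) only promises an upper bound; it does not say how tight that bound is.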
So let's look at your original question and what it means to be constant time. Say we have this function
def printFirst5(arr: Array[Int]) = {
    for (i = 0; i < 5; i++) {
        println(arr[i])
    }
}
This is what we call a constant time algorithm. Can you see why? No matter what array you pass into this (as long as it has at least 5 elements), it will only go through the first 5 elements. You can pass it an array of length 100000000000, or an array of length 10; it doesn't matter. In each case it will only look at the first 5 elements. This means the function printFirst5 will never go above the line y = 5 in terms of time complexity. These kinds of functions are denoted O(1), for constant time.
Now, finally, let's look at your edited version. (I am not sure what you are trying to do in your example because it is syntactically wrong, so I will write my own example.)
def groupAllBy3(array: Array[Int]) = {
    for (i = 0; i < array.length; i++) {
        for (j = 0; j < array.length; j++) {
            for (k = 0; k < array.length; k++) {
                println(s"${array(i)}, ${array(j)}, ${array(k)}")
            }
        }
    }
}
This function's time complexity is O(N^3). Why? Let's take a look.
The innermost loop will go through N elements for every j
How many js are there? Well there will be N js for every i.
How many is are there? N many.
So in total we get numberof-i * numberof-j * numberof-k = N * N * N = O(N^3)
Just to make sure you understand this correctly, let's take a look at another example. What would happen if these loops weren't nested? If we had:
def printAllx3(array: Array[Int]) = {
    for (i = 0; i < array.length; i++) {
        println(array(i))
    }
    for (j = 0; j < array.length; j++) {
        println(array(j))
    }
    for (k = 0; k < array.length; k++) {
        println(array(k))
    }
}
What is the case here?
The first loop goes through N elements, the second loop goes through N elements, the third loop goes through N elements. But they don't depend on each other in terms of iterations so we get N + N + N = 3N = O(N)
Do you see the difference?
With all due respect, I believe you are missing some of the fundamentals of what time complexity is & how we measure it. There is only so much I can explain here; I highly recommend you do some reading on the subject and ask about anything you don't understand.
Hope this helps

Path reconstruction - Pseudo-Multiply (Matrix multiplication) Algorithm

Context
I am currently working on path reconstruction in some of my graph algorithms. For the single-source-shortest-paths problem I used an array of predecessors to reconstruct the shortest path from one source node to all the other nodes.
Quick example:
[0, 3, 0, 0]
The shortest path from source 0 to target 1 would be [0, 3, 1], because starting from the target node 1 the path can be constructed by going backwards using the 'parent' array: 1 has been reached via 3, and 3 has been reached via 0. 0 is the source. Done.
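Just to make that backwards walk concrete, here is a small sketch (mine, not from the question) that rebuilds the path from such a predecessor array; it assumes parent[source] == source and does no error handling for unreachable targets.

static java.util.List<Integer> reconstruct(int[] parent, int source, int target) {
    // Walk backwards from the target to the source, collecting nodes front-to-back.
    java.util.LinkedList<Integer> path = new java.util.LinkedList<>();
    for (int v = target; v != source; v = parent[v]) {
        path.addFirst(v);
    }
    path.addFirst(source);
    return path;
}

// Example from above: reconstruct(new int[]{0, 3, 0, 0}, 0, 1) yields [0, 3, 1].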
The next algorithms are all-pairs-shortest-paths algorithms. The easiest example has been the Floyd-Warshall algorithm which results in a matrix containing all 'successor'-nodes. A good example for the reconstruction pseudo code can be found on Wikipedia - Floyd Warshall.
To summarize it: A matrix is used to store each successor from one specific source node. It basically follows the same approach as before just for each node as a source and going forward instead of backwards.
Question - How to create the matrix of successors in case of the pseudo multiply algorithm?
Let's have a look at the algorithm first:
for(int m = 0; m < nodeCount - 1; m++) {
    Matrix nextResultMatrix = new Matrix(nodeCount, nodeCount, Integer.MAX_VALUE);
    for(int i = 0; i < nodeCount; i++) {
        for(int j = 0; j < nodeCount; j++) {
            int value = Integer.MAX_VALUE;
            for(int k = 0; k < nodeCount; k++) {
                value = Math.min(
                    value,
                    resultMatrix.at(i, k) + sourceMatrix.at(k, j)
                );
            }
            nextResultMatrix.setAt(i, j, value);
        }
    }
    resultMatrix = nextResultMatrix;
}
In each iteration the matrix for the shortest paths of length m will be calculated. The inner loops are pretty similar to matrix multiplication itself. In the innermost loop the algorithm checks whether the current path is shorter than the path from source i over k to target j. After the inner k-loop has finished, the value inside the new result matrix is set. This leads to the problem:
In the case of the Floyd-Warshall algorithm it was easier to identify whether the path had become shorter and which node is now the successor. In this case the value that has been calculated in the k-loop will be set anyway. Is it possible to determine the successor here?
Thoughts about a possible solution
The pseudo multiply algorithm provides a matrix for each iteration which represents the shortest paths of length m. Might this help to find a solution without increasing the (already quite bad) time complexity and without having to store each matrix at the same time?
I found an interesting idea in a comment here on stackoverflow which might lead to a solution reference. From what is stated there it seems to be quite heavy lifting for keeping track of the shortest paths. I haven't fully wrapped my head around the idea and how to implement this though.
My solution
So after stepping through the algorithm and making clear what each step exactly means I was finally able to figure out a solution. I will try to explain the changes in the code here but first let me present the solution:
for(int m = 0; m < nodeCount - 1; m++) {
    Matrix nextResultMatrix = new Matrix(nodeCount, nodeCount, Integer.MAX_VALUE);
    for(int i = 0; i < nodeCount; i++) {
        for(int j = 0; j < nodeCount; j++) {
            int value = resultMatrix.at(i, j);
            int shorterPathFoundOverNode = prevMatrix.at(i, j);
            // new shortest path from node i to j is
            // the minimum path that can be found from node i over k to j
            // k will be the node itself and every other node
            for(int k = 0; k < nodeCount; k++) {
                if(resultMatrix.at(i, k) != Graph.NO_EDGE && sourceMatrix.at(k, j) != Graph.NO_EDGE) {
                    if(value > resultMatrix.at(i, k) + sourceMatrix.at(k, j)) {
                        // update value if path is shorter
                        value = resultMatrix.at(i, k) + sourceMatrix.at(k, j);
                        shorterPathFoundOverNode = k;
                    }
                }
            }
            nextResultMatrix.setAt(i, j, value);
            prevMatrix.setAt(i, j, shorterPathFoundOverNode);
        }
    }
    resultMatrix = nextResultMatrix;
}
A very basic but important idea was to change the initialization of value inside the j loop: instead of starting from Integer.MAX_VALUE, it now starts from the value found so far (or, in the first iteration, the value the matrix was initialized with, Integer.MAX_VALUE). This matters because the comparison used to succeed at least once per iteration, which caused no problems before, but now that we perform more actions inside the condition it would.
It was also necessary to replace the Math.min call with an if condition, to be able to do more than just set the value.
To reconstruct the shortest paths the method of keeping track of the previous nodes is used. This is very similar to the single-source-shortest-paths problem as stated in the question. Of course it is required to use a matrix in this case because all nodes will be the source node.
To summarize the idea: set up an additional matrix that keeps track of the previous node for each target node. While iterating through the k loop, save the previous node whenever a shorter path has been found (important: only if it is actually shorter than the previous path).
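To close the loop, here is a sketch (mine, not part of the post) of how prevMatrix can be walked backwards to produce an actual path once the pseudo-multiply loop has finished. It assumes prevMatrix was initialised so that prevMatrix.at(i, j) == i wherever sourceMatrix has a direct edge from i to j, and holds a sentinel value of -1 where no path is known.

static java.util.List<Integer> pathFromPrevMatrix(Matrix prevMatrix, int i, int j) {
    java.util.LinkedList<Integer> path = new java.util.LinkedList<>();
    int current = j;
    while (current != i) {
        if (current == -1) {
            return java.util.Collections.emptyList(); // no path from i to j was recorded
        }
        path.addFirst(current);
        current = prevMatrix.at(i, current); // predecessor of 'current' on the path from i
    }
    path.addFirst(i);
    return path;
}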

Boolean matrix multiplication algorithm

This is my first question on Stack Overflow. I've been solving some exercises from "Algorithm Design" by Goodrich and Tamassia. However, I'm quite clueless about this problem. Unsure where to start and how to proceed. Any advice would be great. Here's the problem:
Boolean matrices are matrices such that each entry is 0 or 1, and matrix multiplication is performed by using AND for * and OR for +. Suppose we are given two NxN random Boolean matrices A and B, so that the probability that any entry in either is 1, is 1/k. Show that if k is a constant, then there is an algorithm for multiplying A and B whose expected running time is O(n^2). What if k is n?
Matrix multiplication using the standard iterative approach is O(n^3), because you have to iterate over n rows and n columns, and for each element do a vector multiply of one of the rows and one of the columns, which takes n multiplies and n-1 additions.
Pseudocode to multiply matrix a by matrix b and store the result in matrix c:
for(i = 0; i < n; i++)
{
    for(j = 0; j < n; j++)
    {
        int sum = 0;
        for(m = 0; m < n; m++)
        {
            sum += a[i][m] * b[m][j];
        }
        c[i][j] = sum;
    }
}
For a boolean matrix, as specified in the problem, AND is used in place of multiplication and OR in place of addition, so it becomes this:
for(i = 0; i < n; i++)
{
    for(j = 0; j < n; j++)
    {
        boolean value = false;
        for(m = 0; m < n; m++)
        {
            value = value || (a[i][m] && b[m][j]);
            if(value)
                break; // early out
        }
        c[i][j] = value;
    }
}
The thing to notice here is that once our boolean "sum" is true, we can stop calculating and early out of the innermost loop, because ORing any subsequent values with true is going to produce true, so we can immediately know that the final result will be true.
For any constant k, the number of operations before we can do this early out (assuming the values are random) is going to depend on k and will not increase with n. At each iteration there will be a (1/k)^2 chance that the loop will terminate, because we need two 1s for that to happen and the chance of each entry being a 1 is 1/k. The number of iterations before terminating will follow a Geometric distribution where p is (1/k)^2, and the expected number of "trials" (iterations) before "success" (breaking out of the loop) doesn't depend on n (except as an upper bound for the number of trials) so the innermost loop runs in constant time (on average) for a given k, making the overall algorithm O(n^2). The Geometric distribution formula should give you some insight about what happens if k = n. Note that in the formula given on Wikipedia k is the number of trials.
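To spell that hint out a little (this part is my own addition, not the original answer): for a Geometric distribution with success probability p, the expected number of trials is 1/p, so here

expected iterations of the inner loop = 1/p = 1/(1/k)^2 = k^2

which is a constant whenever k is a constant, giving O(n^2) expected time overall. If k = n, then p = 1/n^2 and the expected number of trials is n^2; since the inner loop is capped at n iterations, an early out within the first n trials is unlikely (probability roughly n * 1/n^2 = 1/n), so on average the loop runs close to its full n iterations and the expected running time is back to roughly O(n^3).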

Proposed analysis of algorithm

I have been practicing analyzing algorithms lately. I feel like I have a pretty good understanding of analyzing non-recursive algorithms, but I am unsure about recursive ones and have only just begun to work towards a full understanding of them. However, I have not had a formal check on my methods to confirm whether what I have been doing is indeed correct.
Would it be too much to ask if someone could check a few algorithms that I have implemented and analyzed, and see if my understanding is along the right lines or if I am completely off?
Here they are:
1)
sum = 0;
for (i = 0; i < n; i++){
    for (j = 0; j < i*i; j++){
        if (j % i == 0) {
            for (k = 0; k < j; k++){
                sum++;
            }
        }
    }
}
My analysis of this one was O(n^5) due to:
Sum(i = 0 to n)[Sum(j = 0 to i^2)[Sum(k = 0 to j) of 1]]
which evaluated to:
(1/2)(n^5/5 + n^4/2 + n^3/3 - n/30) + (1/2)(n^3/3 + n^2/2 + n/6) + (1/2)(n^3/3 + n^2/2 + n/6) + n + 1.
Hence it is O(n^5)
Is this correct as an evaluation of the summations of the loops as a triple summation? I have assumed that the if statement will always pass, for worst-case complexity. Is this a correct assumption for the worst case?
2)
int tonyblair (int n, int a) {
    if (a < 12) {
        for (int i = 0; i < n; i++){
            System.out.println("*");
        }
        tonyblair(n-1, a);
    } else {
        for (int k = 0; k < 3000; k++){
            for (int j = 0; j < nk; j++){
                System.out.println("#");
            }
        }
    }
}
My analysis of this algorithm is O(infinity), due to the infinite recursion in the if branch if its condition is assumed to be true, which would be the worst case. For the sake of analysis, though, I also considered the case where this is not true and the if branch does not run. I then got a complexity of O(nk) due to:
Sum(k = 0 to 3000)[Sum(j = 0 to nk) of 1]
which then evaluated to nk(3001) + 3001. Hence it is O(nk), where k is not discarded because it controls the number of iterations of the loop.
Number 1
I can't tell how you've derived your formula. Usually adding terms happens when there are multiple steps in an algorithm, such as precomputing data and then looking up values from the data. Instead, nested for loops imply multiplication. Also, the worst case is the same as the best case for this snippet of code, because given a value of n, sum will be the same at the end.
To find the complexity, we want to find the number of times that the inner loop is evaluated. Summations are often easy to solve if they go from 1 to n, so I'm going to drop the 0s from them later on. If i is 0, the middle loop won't run, and if j is 0, the inner loop won't run. We can rewrite the code equivalently as:
sum = 0;
for (i = 1; i < n; i++)
{
    for (j = 1; j < i*i; j++)
    {
        if (j % i == 0)
        {
            for (k = 0; k < j; k++)
            {
                sum++;
            }
        }
    }
}
I could make my life harder by forcing the outer loop to start at 2, but I'm not going to. The outer loop now runs from 1 to n-1. The middle loop runs based on the current value of i, so we need to do a summation:
Sum(i = 1 to n-1)[number of steps taken by the middle loop for this i]
The middle for loop always goes to (i^2 - 1), and j will only be divisible by i for a total of (i - 1) times (i, i*2, i*3, ..., i*(i-2), i*(i-1)). With this, we get:
Sum(i = 1 to n-1)[Sum(j = 1 to i-1)[number of steps taken by the inner loop]]
The inner loop then executes j times. The j in our summation is not the same as the j in the code though. The j in the summation represents each time the middle loop executes. Each time the middle loop executes, the j in the code will be (i * (number of executions so far)) = i * (the j in the summation). Therefore, we have:
Sum(i = 1 to n-1)[Sum(j = 1 to i-1)[i * j]]
We can move the i to in-between the two summations, as it is a constant for the inner summation. Then, the formula for the sum of 1 to n is well known: n*(n+1)/2. Because we are only going to i - 1, we must subtract i out. This gives:
Sum(i = 1 to n-1)[i * (i*(i+1)/2 - i)] = Sum(i = 1 to n-1)[(i^3 - i^2)/2]
The summations for the sum of squares and the sum of cubes are also well known. Keeping in mind that we are only summing to n-1 in both cases, we must remember to subtract n^3 and n^2, respectively, and we get:
(1/2) * [(n^2(n+1)^2/4 - n^3) - (n(n+1)(2n+1)/6 - n^2)]
This is obviously Θ(n^4). If we solve it all the way, we get:
n^4/8 - 5n^3/12 + 3n^2/8 - n/12
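As a quick sanity check on that closed form (this sketch is mine, not part of the original answer), you can compare the raw loop count of snippet 1 against n^4/8 - 5n^3/12 + 3n^2/8 - n/12, written below as (3n^4 - 10n^3 + 9n^2 - 2n)/24 to keep the arithmetic in integers:

public class ClosedFormCheck {
    public static void main(String[] args) {
        for (int n = 2; n <= 60; n++) {
            // Count the inner-loop iterations of snippet 1 directly.
            long sum = 0;
            for (int i = 0; i < n; i++)
                for (int j = 0; j < i * i; j++)
                    if (j % i == 0)
                        for (int k = 0; k < j; k++)
                            sum++;
            long closed = (3L * n * n * n * n - 10L * n * n * n + 9L * n * n - 2L * n) / 24;
            if (sum != closed)
                throw new AssertionError("mismatch at n = " + n);
        }
        System.out.println("Closed form matches the loop count for n = 2..60.");
    }
}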
Number 2
For the last one, it is in fact O(infinity) if a < 12 because of the if statement. Well, technically everything is O(infinity), because Big-O only provides an upper bound on runtime. If a < 12, it is also omega(infinity) and theta(infinity). If only the else runs, then we have the summation from 1 to 2999 of i*n:
Sum(i = 1 to 2999)[i * n]
It's very important to notice that the summation from 1 to 2999 is a constant (it's 4498500). No matter how large a constant is, it's still a constant, and not dependent on n. We will end up throwing it out of the runtime calculations. Sometimes, when a theoretically fast algorithm has a large constant, it is practically slower than other algorithms that are theoretically slow. One example I can think of is Chazelle's linear time triangulation algorithm. No one has ever implemented it. In any case, we have 4498500 * n. This is theta(n).
