Proving breadth-first traversal on graphs - algorithm

I am trying to prove the following algorithm, which checks whether there exists a path from u to v in a graph G = (V,E).
I know that to finish the proof, I need to prove termination, the invariants, and correctness, but I have no idea how. I think I need to use induction on the while loop, but I am not exactly sure how.
How do I prove those three characteristics about an algorithm?

Disclaimer: I don't know how formal you want your proof to be, and I'm not familiar with formal proofs.
Induction on the while loop: Is it true at the beginning? Does it remain true after a step (a fairly simple path property)?
Same idea, induction on k (why k+1?): Is it true at the beginning? Does it remain true after a step (again a fairly simple path property)?
Think of Reach as a strictly increasing set.
Termination: maybe you can use a fairly simple property linked to the diameter of the graph?
(This question could probably be better answered elsewhere, on https://cstheory.stackexchange.com/ maybe?)

There are a lot of possibilities. For example, for a breadth-first search, we note that:
(1) The algorithm never visits the same node twice (as any path back to a node must be at least as long as the one that put it in the discovered pile already).
(2) At every step, it adds exactly one node.
Thus, it clearly must terminate on any finite graph, as the set of nodes which are discoverable cannot be larger than the set of nodes which are in the graph.
Finally, since, given a start node, it only terminates when it has reached every node which is connected by any path to the start node, it will always find a path between the start and the target if one exists.
You can rewrite the logical steps above with more rigour if you like, for example by showing that the list of visited nodes is strictly increasing and non-convergent (i.e. adding one to something repeatedly tends to infinity), that the termination condition must be met at at most some finite value, and that a non-convergent increasing function always crosses a given bound exactly once.
BFS is an easy example because it has such simple logic, but proving these things for a given algorithm may be extremely tricky.
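As a concrete reference point, here is a minimal reachability BFS in Python (my own sketch, not the question's pseudocode; the names adj, start and target are assumptions) whose loop maintains exactly the properties used above: the discovered set only grows, each node enters it at most once, and every discovered node is reachable from the start.

from collections import deque

def path_exists(adj, start, target):
    # adj maps each node to an iterable of its neighbours.
    # Invariants: every node in `discovered` is reachable from `start`,
    # and `discovered` never shrinks; each node is enqueued at most once,
    # so on a finite graph the loop terminates.
    discovered = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for neighbour in adj.get(node, ()):
            if neighbour not in discovered:   # never visit the same node twice
                discovered.add(neighbour)
                queue.append(neighbour)
    return False

# e.g. path_exists({'u': ['a'], 'a': ['v'], 'v': []}, 'u', 'v') == True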

Related

Is Dijkstra's algorithm deterministic?

I think that Dijkstra's algorithm is determined, in the sense that if you choose the same starting vertex, you will get the same result (the same distances to every other vertex). But I don't think that it is deterministic (that it has a defined next operation for each operation), because that would mean it wouldn't have to search for the shortest distances in the first place.
Am I correct? If I'm wrong, could you please explain why it is deterministic, and maybe give an example?
I'm not sure there is a universal definition of determinism, but Wikipedia defines it as...
... an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
So this requires both determinism of the output and determinism of the execution. The output of Dijkstra's algorithm is deterministic no matter how you look at it, because it's the length of the shortest path, and there is only one such length.
The execution of Dijkstra's algorithm in the abstract sense is not deterministic, because the final step is:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
If there are multiple nodes with the same smallest tentative distance, the algorithm is free to select one arbitrarily. This doesn't affect the output, but it does affect the order of operations within the algorithm.
A particular implementation of Dijkstra's algorithm, however, probably is deterministic, because the nodes will be stored in a deterministic data structure like a min-heap. This will result in the same node being selected each time the program is run, although things like hash table salts may affect determinism even here.
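To make that concrete, here is a minimal Dijkstra sketch (my own, not from the question; adj and the (neighbour, weight) pair format are assumptions) using Python's heapq. The heap orders entries as (distance, node) pairs, so ties on distance are broken by the node ordering — a fixed, implementation-level choice rather than something the abstract algorithm mandates.

import heapq

def dijkstra(adj, source):
    # adj: dict mapping node -> list of (neighbour, weight) pairs.
    # Returns shortest distances from source. Ties on distance are resolved
    # by comparing the node part of the (distance, node) tuple, so repeated
    # runs on the same input pop nodes in the same order.
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry; a shorter distance was already found
        for neighbour, weight in adj.get(node, ()):
            new_dist = d + weight
            if new_dist < dist.get(neighbour, float("inf")):
                dist[neighbour] = new_dist
                heapq.heappush(heap, (new_dist, neighbour))
    return dist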
Allow me to expand on Thomas's answer.
If you look at an implementation of Dijkstra, such as this example: http://graphonline.ru/en/?graph=NnnNwZKjpjeyFnwx, you'll see a graph in which 0→1→5, 0→2→5, 0→3→5 and 0→4→5 are all the same length. "The shortest path" is therefore not necessarily unique, as this example shows.
Using the wording on Wikipedia, at some point the algorithm instructs us to:
select the unvisited node that is marked with the smallest tentative distance.
The problem here is the word the, suggesting that it is somehow unique. It may not be. For an implementation to actually pick one node from many equal candidates requires further specification of the algorithm regarding how to select it. But any such selected candidate having the required property will determine a path of the shortest length. So the algorithm doesn't commit. The modern approach to wording this algorithm would be to say:
select any unvisited node that is marked with the smallest tentative distance.
From a mathematical graph theory algorithm standpoint, that algorithm would technically proceed with all such candidates simultaneously in a sort of multiverse. All answers it may arrive at are equally valid. And when proving the algorithm works, we would prove it for all such candidates in all the multiverses and show that all choices arrive at a path of the same distance, and that the distance is the shortest distance possible.
Then, if you want to use the algorithm to just compute one such answer because you want to either A) find one such path, or B) determine the distance of such a path, then it is left up to you to select one specific branch of the multiverse to explore. All such selections made according to the algorithm as defined will yield a path whose length is the shortest length possible. You can define any additional non-conflicting criteria you wish to make such a selection.
The reason the implementation I linked is deterministic and always gives the same answer (for the same start and end nodes, obviously) is because the nodes themselves are ordered in the computer. This additional information about the ordering of the nodes is not considered for graph theory. The nodes are often labelled, but not necessarily ordered. In the implementation, the computer relies on the fact that the nodes appear in an ordered array of nodes in memory and the implementation uses this ordering to resolve ties. Possibly by selecting the node with the lowest index in the array, a.k.a. the "first" candidate.
If an implementation resolved ties by randomly (not pseudo-randomly!) selecting a winner from equal candidates, then it wouldn't be deterministic either.
Dijkstra's algorithm as described on Wikipedia just defines an algorithm to find the shortest paths (note the plural paths) between nodes. Any such path that it finds (if it exists) it is guaranteed to be of the shortest distance possible. You're still left with the task of deciding between equivalent candidates though at step 6 in the algorithm.
As the tag says, the usual term is "deterministic". And the algorithm is indeed deterministic. For any given input, the steps executed are always identical.
Compare it to a simpler algorithm like adding two multi-digit numbers. The result is always the same for two given inputs, the steps are also the same, but you still need to add the numbers to get the outcome.
By deterministic I take it you mean it will give the same answer to the same query for the same data every time, and give only one answer; if so, then it is deterministic. If it were not deterministic, think of the problems it would cause for those who use it. I write in Prolog all day, so I know non-deterministic answers when I see them.
Here I just introduced a simple mistake in Prolog and the answer was not deterministic, and with a simple fix it is deterministic.
Non-deterministic
spacing_rec(0,[]).
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).

?- spacing_rec(0,List).
List = [] ;
false.
Deterministic
spacing_rec(0,[]) :- !.
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).

?- spacing_rec(0,List).
List = [].
I will try to keep this short and simple; there are so many great explanations of this on here and online as well, if some good research is done, of course.
Dijkstra's algorithm is a greedy algorithm; its main goal is to find the shortest path between two nodes of a weighted graph.
Wikipedia does a great job of explaining what deterministic and non-deterministic algorithms are and how you can 'determine' which category an algorithm falls into:
From Wikipedia Source:
Deterministic Algorithm:
In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.
Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.
Nondeterministic Algorithm
In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition.
So, going back to the goal of Dijkstra's algorithm: it is like saying I want to get from location X to location Z, but to do that I have options, going through shorter nodes that will get me to my end a lot quicker and more efficiently than other, longer routes or nodes...
Thinking through cases where Dijkstra's algorithm could be deterministic would be a good idea as well.

Recursion to iteration - or optimization?

I've got an algorithm which searches for all possible paths between two nodes in a graph, choosing only paths that don't repeat nodes and that have a specified length.
It is in ActionScript3 and I need to change my algorithm to iterative or to optimize it (if it's possible).
I have no idea how to do that, and I'm not sure whether changing it to an iterative version will bring better execution times. Maybe it can't be optimized. I'm not sure.
Here is my function:
http://pastebin.com/nMN2kkpu
If someone could give some tips about how to solve that, that would be great.
For one thing, you could sort the edges by starting vertex. Then, iterating through a vertex's neighbours will be proportional to the number of neighbours of that vertex (while right now it's taking O(M), where M is the edge count for the whole graph).
If you relax the condition of not repeating a vertex, I'm certain the problem can be solved in better time.
If, however, you need that, I'm afraid there's no simple change that would make your code way faster. I can't guarantee this, though.
Also, if I am correct, the code snippet tests if the edge is used already, not if the vertex is used. Another thing I noticed is that you don't stop the recursion once you've found such a path. Since in most* graphs, such a path will exist for reasonable values of length, I'd say if you need only one such path, you might be wasting a lot of CPU time after one such path is found.
*Most - to be read 'maybe most (IMO)'.
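To illustrate the adjacency-grouping suggestion above (a sketch of mine, not taken from the pastebin code; the names are assumptions): group the edges by starting vertex once, so the inner loop only touches a vertex's own neighbours instead of scanning all M edges.

from collections import defaultdict

def build_adjacency(edges):
    # edges: iterable of (u, v) pairs for an undirected graph.
    # One O(M) pass builds a vertex -> neighbours map, so each later lookup
    # costs O(degree(v)) instead of O(M).
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)  # store both directions for an undirected graph
    return adj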

Exhaustive Search Big-O

I am working on some revision at the moment and specifically going over Big-O notation. I have asked a similar question (which dealt with a different algorithm) but am still unsure if I am going the right way about it or not.
The algorithm that I am looking at is Exhaustive Search (aka Brute Force, I believe) and looks like this:
Input: G - the graph
       n - the current node
       p - the path so far
1) For every edge nm (from n to m) in G do
2)     If m ∉ p then
3)         p = p ∪ {m}
4)         Exhaustive(G, m, p)
5)     End If
6) End For
So far I have come to the result that this algorithm is O(n) - is this correct? I doubt that it is, and would love to know exactly how to go about working it out; what to look for, what exactly it is that I 'count' each time, etc. I understand that the number of operations taking place needs to be counted, but is that all I need to take note of/count?
EDIT: I have learned that this algorithm is, in fact, O((n-1)!) - is this correct and if so, how did this solution come about as I cannot work it out?
Usually (but not always) with graphs, the input size n is the number of nodes in the graph. It's fairly easy to prove to ourselves that the function (let alone the runtime) is called at least n times - a single path through a graph (assuming it's connected, that is, every node is reachable from every other node via some path) will take n calls.
To compute running time of recursive functions, an upper bound on the running time will be the number of times the recursive function is called multiplied by the runtime of the function in a single call.
To see that the worst-case runtime is O((n-1)!), consider how many paths there are in a fully connected graph - you can visit any node directly from any node. Another way of phrasing this is that you can visit the nodes in any order, save the starting state. This is the same as the number of permutations of (n-1) elements. I believe it's actually going to be O(n!), since we are iterating over all edges, which takes O(n) for each state on the path (n·(n-1)! = n!). EDIT: More precisely, we can say it's Ω(n!).
Sometimes, it's easier to look at what the algorithm computes than at the actual code - that is, the cardinality of all the states (more specifically here, paths).
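One way to convince yourself of the factorial growth is to count the recursive calls on a complete graph (a sketch of mine, not from the question; exhaustive and counter are assumed names). Each call corresponds to a simple path starting at the start node, and on a complete graph the number of such paths is on the order of (n-1)!; each call additionally scans O(n) edges, which matches the O(n!) bound mentioned above.

import math

def exhaustive(adj, n, p, counter):
    # Direct transcription of the pseudocode above; `p | {m}` gives each
    # branch its own copy of the path, and `counter` tallies the calls.
    counter[0] += 1
    for m in adj[n]:        # for every edge nm in G
        if m not in p:      # if m is not on the path yet
            exhaustive(adj, m, p | {m}, counter)

for size in range(2, 8):
    adj = {i: [j for j in range(size) if j != i] for i in range(size)}  # complete graph
    counter = [0]
    exhaustive(adj, 0, {0}, counter)
    # the call count is roughly e * (size-1)!, i.e. Theta((size-1)!) simple paths
    print(size, counter[0], math.factorial(size - 1))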

What's the difference between backtracking and depth first search?

What's the difference between backtracking and depth first search?
Backtracking is a more general purpose algorithm.
Depth-First search is a specific form of backtracking related to searching tree structures. From Wikipedia:
One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
It uses backtracking as part of its means of working with a tree, but is limited to a tree structure.
Backtracking, though, can be used on any type of structure where portions of the domain can be eliminated - whether or not it is a logical tree. The Wiki example uses a chessboard and a specific problem - you can look at a specific move, and eliminate it, then backtrack to the next possible move, eliminate it, etc.
I think this answer to another related question offers more insights.
For me, the difference between backtracking and DFS is that backtracking handles an implicit tree and DFS deals with an explicit one. This seems trivial, but it means a lot. When the search space of a problem is visited by backtracking, the implicit tree gets traversed and pruned in the middle of it. Yet for DFS, the tree/graph it deals with is explicitly constructed and unacceptable cases have already been thrown, i.e. pruned, away before any search is done.
So, backtracking is DFS for implicit tree, while DFS is backtracking without pruning.
IMHO, most of the answers are largely imprecise and/or lack any reference to verify them. So let me share a very clear explanation with a reference.
First, DFS is a general graph traversal (and search) algorithm. So it can be applied to any graph (or even forest). A tree is a special kind of graph, so DFS works for trees as well. In essence, let's stop saying it works only for a tree, or the like.
Based on [1], backtracking is a special kind of DFS used mainly to save space (memory). The distinction I'm about to mention may seem confusing: in graph algorithms of this kind we're so used to having adjacency-list representations and an iterative pattern for visiting all immediate neighbors (for a tree, the immediate children) of a node that we often ignore that a bad implementation of get_all_immediate_neighbors may make a difference in the memory use of the underlying algorithm.
Further, if a graph node has branching factor b and diameter h (for a tree this is the tree height), then storing all immediate neighbors at each step of visiting a node gives a memory requirement of O(bh). However, if we take only a single (immediate) neighbor at a time and expand it, the memory complexity reduces to O(h). The former kind of implementation is termed DFS, while the latter kind is called backtracking.
Now you see, if you’re working with high level languages, most likely you’re actually using Backtracking in the guise of DFS. Moreover, keeping track of visited nodes for a very large problem set could be really memory intensive; calling for a careful design of get_all_immediate_neighbors (or algorithms that can handle revisiting a node without getting into infinite loops).
[1] Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd Ed
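A rough way to see the memory difference (my own simplification of the distinction in [1]; the Node class is a hypothetical stand-in): an explicit-stack DFS that stores all immediate neighbours of every expanded node can hold O(bh) nodes along a depth-h path, while a one-child-at-a-time recursion only keeps the current path, i.e. O(h).

class Node:
    def __init__(self, value, children=None):
        self.value = value
        self.children = children or []

def dfs_store_all(root):
    # Pushes every immediate child of each expanded node onto an explicit stack;
    # along a path of depth h with branching factor b, up to O(b*h) nodes
    # can sit on the stack at once.
    stack, order = [root], []
    while stack:
        node = stack.pop()
        order.append(node.value)
        stack.extend(reversed(node.children))
    return order

def backtrack(node, order):
    # Expands one child at a time; memory is the current path plus one loop
    # position per level, i.e. O(h).
    order.append(node.value)
    for child in node.children:
        backtrack(child, order)
    return order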
According to Donald Knuth, it's the same.
Here is the link to his paper about the Dancing Links algorithm, which is used to solve such "non-tree" problems as the N-queens puzzle and Sudoku solving.
Backtracking, also called depth-first search
Backtracking is usually implemented as DFS plus search pruning. You traverse the search-space tree depth-first, constructing partial solutions along the way. Brute-force DFS can construct all search outcomes, even the ones that do not make sense practically, and constructing all solutions (n! or 2^n of them) can be very inefficient. So in reality, as you do DFS, you also need to prune partial solutions which do not make sense in the context of the actual task, and focus on the partial solutions which can lead to valid optimal solutions. This is the actual backtracking technique: you discard partial solutions as early as possible, take a step back and try to find a local optimum again.
Nothing stops you from traversing the search-space tree using BFS and executing a backtracking strategy along the way, but it doesn't make sense in practice, because you would need to store the search state layer by layer in the queue, and tree width grows exponentially with the height, so we would waste a lot of space very quickly. That's why trees are usually traversed using DFS. In this case the search state is stored in the stack (the call stack or an explicit structure), and it can't exceed the tree height.
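As a small illustration of "DFS plus pruning" in this sense (my own sketch, not from any of these answers), here is the classic N-queens counter: the search goes depth-first over partial placements, abandons a branch as soon as a queen would be attacked, and undoes each move on the way back up.

def n_queens(n):
    # Collect all valid placements; placement[row] = column of the queen in that row.
    solutions = []

    def place(row, cols, diag1, diag2, placement):
        if row == n:                     # a complete, consistent placement
            solutions.append(placement[:])
            return
        for col in range(n):
            if col in cols or (row - col) in diag1 or (row + col) in diag2:
                continue                 # prune: this partial solution cannot be extended
            cols.add(col); diag1.add(row - col); diag2.add(row + col)
            placement.append(col)
            place(row + 1, cols, diag1, diag2, placement)
            placement.pop()              # undo the move: the backtracking step
            cols.discard(col); diag1.discard(row - col); diag2.discard(row + col)

    place(0, set(), set(), set(), [])
    return solutions

# len(n_queens(8)) == 92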
Usually, a depth-first-search is a way of iterating through an actual graph/tree structure looking for a value, whereas backtracking is iterating through a problem space looking for a solution. Backtracking is a more general algorithm that doesn't necessarily even relate to trees.
I would say, DFS is the special form of backtracking; backtracking is the general form of DFS.
If we extend DFS to general problems, we can call it backtracking.
If we use backtracking to tree/graph related problems, we can call it DFS.
They carry the same idea in algorithmic aspect.
DFS describes the way in which you want to explore or traverse a graph. It focuses on the concept of going as deep as possible given the choice.
Backtracking, while usually implemented via DFS, focuses more on the concept of pruning unpromising search subspaces as early as possible.
IMO, at any specific node in backtracking, you try depth-first branching into each of its children, but before you branch into any child node, you need to "wipe out" the previous child's state (this step is equivalent to walking back to the parent node). In other words, the siblings' states should not impact each other. By contrast, in a normal DFS algorithm you don't usually have this constraint: you don't need to wipe out (backtrack) the previous sibling's state in order to construct the next sibling node.
Depth-first search is an algorithm for traversing or searching a tree. See here. Backtracking is a much broader term that is used wherever a solution candidate is formed and later discarded by backtracking to a former state. See here. Depth-first search uses backtracking to search a branch first (a solution candidate) and, if not successful, to search the other branch(es).
Depth-first search (DFS) and backtracking are different searching and traversing algorithms. DFS is broader and is used on both graph and tree data structures, while backtracking is limited to trees. That being said, it does not mean backtracking can't be used on graphs: it is used on graphs as well, but it only produces a spanning tree, a tree without loops (multiple edges starting and ending at the same vertex). That is why it is limited to trees.
Now coming back to backtracking: DFS uses the backtracking algorithm on tree data structures, so on trees, DFS and backtracking are similar.
Thus, we can say they are the same on tree data structures, whereas on graph data structures they are not the same.
Idea: start from any point, check if it's the desired endpoint; if yes, we've found a solution, otherwise go to all next possible positions, and if we can't go any further, return to the previous position and look for other alternatives, marking that the current path won't lead us to the solution.
Now backtracking and DFS are 2 different names given to the same idea applied on 2 different abstract data types.
If the idea is applied on matrix data structure we call it backtracking.
If the same idea is applied on tree or graph then we call it DFS.
The catch here is that a matrix can be converted to a graph and a graph can be transformed into a matrix, so we are really applying the same idea: if on a graph, we call it DFS, and if on a matrix, we call it backtracking.
The idea in both algorithms is the same.
The way I look at DFS vs. Backtracking is that backtracking is much more powerful. DFS helps me answer whether a node exists in a tree, while backtracking can help me answer all paths between 2 nodes.
Note the difference: DFS visits a node and marks it as visited since we are primarily searching, so seeing things once is sufficient. Backtracking visits a node multiple times as it's a course correction, hence the name backtracking.
Most backtracking problems involve:
def dfs(node, visited):
    visited.add(node)
    for child in node.children:
        dfs(child, visited)
    visited.remove(node)  # this is the key difference that enables course correction and makes your dfs a backtracking recursion
In a depth-first search, you start at the root of the tree and then explore as far as possible along each branch; then you backtrack to each subsequent parent node and traverse its children.
Backtracking is a generalised term for starting at the end of a goal, and incrementally moving backwards, gradually building a solution.
Backtracking is just depth first search with specific termination conditions.
Consider walking through a maze where for each step you make a decision; that decision is a call on the call stack (which conducts your depth-first search)... if you reach the end, you can return your path. However, if you reach a dead end, you want to return out of a certain decision, in essence returning out of a function on your call stack.
So when I think of backtracking I care about
State
Decisions
Base Cases (Termination Conditions)
I explain it in my video on backtracking here.
An analysis of backtracking code is below. In this backtracking code I want all of the combinations that will result in a certain sum or target. Therefore, I have 3 decisions which make calls to my call stack; at each decision I can either pick a number as part of my path to reach the target, skip that number, or pick it and then pick it again. Then if I reach a termination condition, my backtracking step is just to return. Returning is the backtracking step because it gets out of that call on the call stack.
from typing import List

class Solution:
    """
    Approach: Backtracking
    State
        - candidates
        - index
        - target
    Decisions
        - pick one       --> call func changing state: index + 1, target - candidates[index], path + [candidates[index]]
        - pick one again --> call func changing state: index,     target - candidates[index], path + [candidates[index]]
        - skip one       --> call func changing state: index + 1, target, path
    Base Cases (Termination Conditions)
        - if target == 0 and path not in ret:
              append path to ret
        - if target < 0:
              return  # backtrack
    """
    def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
        """
        #desc find all unique combos summing to target
        #args
        #arg1 candidates, list of ints
        #arg2 target, an int
        #ret  ret, list of lists
        """
        if not candidates or min(candidates) > target:
            return []
        ret = []
        self.dfs(candidates, 0, target, [], ret)
        return ret

    def dfs(self, nums, index, target, path, ret):
        if target == 0 and path not in ret:
            ret.append(path)
            return  # backtracking
        elif target < 0 or index >= len(nums):
            return  # backtracking
        # for i in range(index, len(nums)):
        #     self.dfs(nums, i, target - nums[i], path + [nums[i]], ret)
        pick_one       = self.dfs(nums, index + 1, target - nums[index], path + [nums[index]], ret)
        pick_one_again = self.dfs(nums, index,     target - nums[index], path + [nums[index]], ret)
        skip_one       = self.dfs(nums, index + 1, target, path, ret)
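For example, with the typing import included above, Solution().combinationSum([2, 3, 6, 7], 7) should return [[2, 2, 3], [7]].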
In my opinion, the difference is pruning of the tree. Backtracking can stop (finish) searching a certain branch in the middle by checking the given conditions (if a condition is not met). However, in DFS you have to reach the leaf node of a branch to figure out whether the condition is met, so you cannot stop searching a certain branch until you reach its leaf nodes.
The difference is: backtracking is a concept of how an algorithm works, while DFS (depth-first search) is an actual algorithm based on backtracking. DFS essentially is backtracking (it is searching a tree using backtracking), but not every algorithm based on backtracking is DFS.
To add a comparison: backtracking is a concept like divide and conquer. QuickSort would be an algorithm based on the concept of divide and conquer.

Matching algorithm

Odd question here: not really code but logic. Hope it's OK to post it here. Here it is:
I have a data structure that can be thought of as a graph.
Each node can support many links, but is limited to a value for each node.
All links are bidirectional, and each link has a cost. The cost depends on the Euclidean distance between the nodes, the minimum value of two parameters in each node, and a global modifier.
I wish to find the maximum cost for the graph.
I'm wondering if there is a clever way to find such a matching, rather than going through it by brute force... which is ugly... and I'm not sure how I'd even do that without spending 7 million years running it.
To clarify:
Global variable = T
Many nodes N, each with E, X, Y, L.
L is the max number of links each node can have.
cost of link A,B = sqrt( min(A.E, B.E) ) * ( 1 + sqrt( sqrt( sqr(A.X - B.X) + sqr(A.Y - B.Y) ) ) / 75 + sqrt(T) / 10 )
Total cost = sum over all links, and we wish to maximize this.
Average values for nodes are 40-50, and can range over (20..600).
Average node linking factor is 3, range 0-10.
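For readability, here is a direct Python transcription of the cost formula above (a sketch of mine; representing each node as a dict with keys 'e', 'x', 'y' for the E, X, Y values is my own shorthand):

import math

def link_cost(a, b, t):
    # Cost of the link between nodes a and b, with global modifier t,
    # as stated in the question.
    distance = math.sqrt((a["x"] - b["x"]) ** 2 + (a["y"] - b["y"]) ** 2)
    return math.sqrt(min(a["e"], b["e"])) * (1 + math.sqrt(distance) / 75 + math.sqrt(t) / 10)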
For the sake of completeness, for anybody else who looks at this question, I would suggest revisiting your graph theory algorithms:
Dijkstra
Astar
Greedy
Depth / Breadth First
Even dynamic programming (in some situations)
etc., etc.
In there somewhere is the correct solution for your problem. I would suggest looking at Dijkstra first.
I hope this helps someone.
If I understand the problem correctly, there is likely no polynomial solution. Therefore I would implement the following algorithm:
Find some solution by being greedy. To do that, sort all edges by cost and then go through them starting with the highest, adding an edge to your graph while possible, and skipping it when a node can't accept more edges.
Look at your edges and try to change them to achieve a higher cost by using a heuristic. The first that comes to my mind: cycle through all 4-tuples of nodes (A, B, C, D), and if your current graph has edges AB and CD but AC and BD would be better, make the change.
Optionally the same thing with 6-tuples, or other genetic algorithms (they are called that way because they work by mutations).
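A sketch of the greedy first step under the stated constraints (my own, reusing the hypothetical link_cost helper from the earlier sketch and a per-node 'l' key for the link limit); the 4-tuple swap heuristic would then iterate over the chosen links looking for profitable exchanges.

from itertools import combinations

def greedy_links(nodes, t):
    # nodes: list of dicts with keys 'e', 'x', 'y' and 'l' (max number of links).
    # Sort all candidate links by cost, highest first, and accept a link
    # whenever both endpoints still have spare capacity.
    degree = [0] * len(nodes)
    chosen = []
    candidates = sorted(
        combinations(range(len(nodes)), 2),
        key=lambda ij: link_cost(nodes[ij[0]], nodes[ij[1]], t),  # helper defined above
        reverse=True,
    )
    for i, j in candidates:
        if degree[i] < nodes[i]["l"] and degree[j] < nodes[j]["l"]:
            chosen.append((i, j))
            degree[i] += 1
            degree[j] += 1
    return chosen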
This is equivalent to the traveling salesman problem (and is therefore NP-Complete) since if you could solve this problem efficiently, you could solve TSP simply by replacing each cost with its reciprocal.
This means you can't solve it exactly. On the other hand, it means that you can do exactly as I said (replace each cost with its reciprocal) and then use any of the known TSP approximation methods on this problem.
Seems like a max flow problem to me.
Would it be possible to greedily select the next most expensive option from any given start point (omitting jumps to visited nodes) and stop once all nodes are visited? If you get to a dead end, backtrack to the previous spot where you are not at a dead end and greedily select again. It would require some work, and probably something like a stack to keep your paths in. I think this would work quite effectively, provided the costs are well ordered and non-negative.
Use genetic algorithms. They are designed to solve problems like the one you state, rapidly reducing time complexity. Check for an AI library in your language of choice.
