Breadth-first implementation in Giraph, GraphChi or Pregel - algorithm

Does anyone know if there exists a breadth-first (from multiple sources) implementation in any of the graph processing systems - Giraph, Pregel or GraphChi?
Or please point me to an easier implementation on any of these systems.

In the Giraph user mailing list, one can find a few discussions - and, I guess, an implementation as well - of BFS.
I have implemented this type of search for Giraph in the past, and the code is available at:
https://github.com/MarcoLotz/GiraphBFSSO
https://github.com/MarcoLotz/GiraphBFSTO
The difference between them is that one is target-oriented and the other is structure-oriented.
Although they do not start from multiple vertices, one can easily modify the code to support that :)
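To give a feel for what such a modification amounts to, here is a toy single-machine sketch of the vertex-centric idea in plain Python (this is not Giraph's Java API; the graph and seed names are made up for illustration): every seed starts at distance 0 in superstep 0, and each superstep forwards the minimum received distance plus one.
# Toy Pregel-style superstep loop for multi-source BFS (illustration only,
# not Giraph code): seeds are activated in superstep 0 with distance 0.
from collections import defaultdict

def multi_source_bfs(adjacency, seeds):
    INF = float('inf')
    distance = {v: INF for v in adjacency}      # the per-vertex value
    messages = {v: 0 for v in seeds}            # superstep 0: every seed receives 0
    while messages:                             # halt when no vertex is active
        next_messages = defaultdict(lambda: INF)
        for vertex, received in messages.items():
            if received < distance[vertex]:     # vertex "compute": keep the minimum
                distance[vertex] = received
                for neighbor in adjacency[vertex]:
                    next_messages[neighbor] = min(next_messages[neighbor], received + 1)
        messages = dict(next_messages)
    return distance

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': [], 'e': ['a']}
print(multi_source_bfs(graph, seeds=['a', 'e']))   # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': 0}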

You are looking for the multi-seed breadth-first search (BFS) algorithm.
For Giraph, this is still an open issue, as can be read in this feature request.
For Pregel, you cannot expect to find any open implementation, as Pregel is a closed-source graph system at Google.
I guess the easiest thing would be to use the code from GitHub and execute it for each source separately. One idea to optimize the runtime is to execute BFS for the first seed vertex and reuse the results for subsequent seed vertices (the first BFS yields a spanning tree, which can easily be transformed into a BFS order for any given seed vertex).
Nevertheless, KISS suggests simply executing BFS k times for k seed vertices until you run into performance issues (which is unlikely, given the linear runtime complexity of BFS).
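For reference, the KISS route is just a textbook BFS run once per seed vertex; a minimal plain-Python sketch (adjacency lists assumed, nothing Giraph-specific) could look like this:
# Plain BFS executed once per seed vertex (the simple "k runs" approach).
from collections import deque

def bfs(adjacency, source):
    distance = {source: 0}
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        for neighbor in adjacency[vertex]:
            if neighbor not in distance:        # first visit gives the hop count
                distance[neighbor] = distance[vertex] + 1
                queue.append(neighbor)
    return distance

def bfs_per_seed(adjacency, seeds):
    return {seed: bfs(adjacency, seed) for seed in seeds}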

Related

How do the BFS and DFS search algorithms choose between nodes with the "same priority"?

I am currently taking an Artificial Intelligence course and learning about DFS and BFS.
If we take the following example (a tree with A at the root, B and C on the first level, and D, E, F, G on the second level):
From my understanding, the BFS algorithm will explore the first level containing B and C, then the second level containing D, E, F, G, and so on until it reaches the last level.
I am lost concerning which node between B and C (for example) BFS will expand first. Originally I thought it was different every time and that, by convention, we illustrate it from left to right (so exploring B then C), but my professor said that the choice between B and C depends on each case and that we choose the "shallowest node first". In made-up examples there isn't a distance factor between A,B and A,C, so how could one choose?
My question is the same concerning DFS, where I was told to choose the "deepest node first". I am aware that there are pre-order versions and others, but the book "Artificial Intelligence - A Modern Approach" by Stuart Russell didn't get into them.
I tried checking the CLRS algorithms book for more help, but there the expansion is done based on the order in the adjacency list, which didn't really help.
In the definition, BFS doesn't state any rule regarding which node you should visit first, as long as it is in the same level. However, depending on your actual implementation, there will be some order. For example, for a binary tree (like in the example you have provided), one implementation of BFS might prefer going left then right, while other implementations might do the opposite.
Conclusion: it doesn't really matter, as long as we are considering the general definition of BFS / DFS. If you find the order of visiting nodes in the same level specified in some book, that's not BFS, that's a modified BFS, meaning a specific implementation / variant.
Actually there are many variants for DFS / BFS (like right-first DFS on grids) but as I said, all of them are specific implementations, the general definition doesn't state the order of nodes in the same level.
Here is another view of the problem:
A BFS is usually implemented by using a queue, a first-in first-out data structure, while
A DFS is usually implemented by using a stack, a first-in last-out data structure.
Therefore, how to choose between nodes with the "same priority" depends on how you put those nodes in the stack/queue. There is no fixed rule about it.
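To make this concrete, here is a minimal sketch (plain Python, using the tree described in the question: A at the root, B and C one level down, D to G below) where the only difference between BFS and DFS is the container, and the order in which neighbours are appended is exactly what decides which "same-priority" node comes out first:
# BFS with a queue vs. DFS with a stack; ties are decided purely by insertion order.
from collections import deque

def traverse(adjacency, start, use_queue=True):
    frontier = deque([start])
    seen = {start}
    order = []
    while frontier:
        node = frontier.popleft() if use_queue else frontier.pop()  # FIFO = BFS, LIFO = DFS
        order.append(node)
        for neighbor in adjacency[node]:        # adjacency order is the tie-breaking rule
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append(neighbor)
    return order

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
        'D': [], 'E': [], 'F': [], 'G': []}
print(traverse(tree, 'A', use_queue=True))    # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(traverse(tree, 'A', use_queue=False))   # ['A', 'C', 'G', 'F', 'B', 'E', 'D']
Flip the adjacency lists (store ['C', 'B'] instead of ['B', 'C']) and BFS visits C before B; nothing in the definition forbids either order.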

Is Dijkstra's algorithm deterministic?

I think that Dijkstra's algorithm is determined, so that if you choose the same starting vertex, you will get the same result (the same distances to every other vertex). But I don't think that it is deterministic (that each operation has a defined following operation), because that would mean that it wouldn't have to search for the shortest distances in the first place.
Am I correct? If I'm wrong, could you please explain why it is deterministic, and maybe give an example?
I'm not sure there is a universal definition of determinism, but Wikipedia defines it as...
... an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states.
So this requires both determinism of the output and determinism of the execution. The output of Dijkstra's algorithm is deterministic no matter how you look at it, because it's the length of the shortest path, and there is only one such length.
The execution of Dijkstra's algorithm in the abstract sense is not deterministic, because the final step is:
Otherwise, select the unvisited node that is marked with the smallest tentative distance, set it as the new "current node", and go back to step 3.
If there are multiple nodes with the same smallest tentative distance, the algorithm is free to select one arbitrarily. This doesn't affect the output, but it does affect the order of operations within the algorithm.
A particular implementation of Dijkstra's algorithm, however, probably is deterministic, because the nodes will be stored in a deterministic data structure like a min heap. This will result in the same node being selected each time the program is run. Although things like hashtable salts may also affect determinism even here.
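For illustration, a typical heap-based implementation (a minimal Python sketch, not any particular library's code) is deterministic in exactly that sense: heapq compares the (distance, node) tuples, so ties on distance fall back to comparing node labels, and the same node wins on every run.
# Dijkstra with a binary min-heap; ties on distance fall back to the node label
# (assuming labels are comparable), so every run passes through the same states.
import heapq

def dijkstra(adjacency, source):
    distance = {source: 0}
    heap = [(0, source)]                            # (tentative distance, node)
    while heap:
        dist, node = heapq.heappop(heap)            # smallest distance, then smallest label
        if dist > distance.get(node, float('inf')):
            continue                                # stale entry: node already settled
        for neighbor, weight in adjacency[node]:
            new_dist = dist + weight
            if new_dist < distance.get(neighbor, float('inf')):
                distance[neighbor] = new_dist
                heapq.heappush(heap, (new_dist, neighbor))
    return distance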
Allow me to expand on Thomas's answer.
If you look at an implementation of Dijkstra, such as this example: http://graphonline.ru/en/?graph=NnnNwZKjpjeyFnwx you'll see a graph like this:
In the example graph, the paths 0→1→5, 0→2→5, 0→3→5 and 0→4→5 are all the same length, so "the shortest path" is not necessarily unique, as this diagram shows.
Using the wording on Wikipedia, at some point the algorithm instructs us to:
select the unvisited node that is marked with the smallest tentative distance.
The problem here is the word "the", which suggests that this node is somehow unique. It may not be. For an implementation to actually pick one node from many equal candidates requires further specification of the algorithm regarding how to select it. But any such selected candidate having the required property will determine a path of the shortest length, so the algorithm doesn't commit to one choice. The modern approach to wording this algorithm would be to say:
select any unvisited node that is marked with the smallest tentative distance.
From a mathematical graph theory algorithm standpoint, that algorithm would technically proceed with all such candidates simultaneously in a sort of multiverse. All answers it may arrive at are equally valid. And when proving the algorithm works, we would prove it for all such candidates in all the multiverses and show that all choices arrive at a path of the same distance, and that the distance is the shortest distance possible.
Then, if you want to use the algorithm to just compute one such answer because you want to either A) find one such path, or B) determine the distance of such a path, then it is left up to you to select one specific branch of the multiverse to explore. All such selections made according to the algorithm as defined will yield a path whose length is the shortest length possible. You can define any additional non-conflicting criteria you wish to make such a selection.
The reason the implementation I linked is deterministic and always gives the same answer (for the same start and end nodes, obviously) is that the nodes themselves are ordered in the computer. This additional information about the ordering of the nodes is not considered in graph theory: the nodes are often labelled, but not necessarily ordered. In the implementation, the computer relies on the fact that the nodes appear in an ordered array in memory and uses this ordering to resolve ties, possibly by selecting the node with the lowest index in the array, a.k.a. the "first" candidate.
If an implementation resolved ties by randomly (not pseudo-randomly!) selecting a winner from equal candidates, then it wouldn't be deterministic either.
Dijkstra's algorithm as described on Wikipedia just defines an algorithm to find the shortest paths (note the plural paths) between nodes. Any such path that it finds (if it exists) it is guaranteed to be of the shortest distance possible. You're still left with the task of deciding between equivalent candidates though at step 6 in the algorithm.
As the tag says, the usual term is "deterministic". And the algorithm is indeed deterministic. For any given input, the steps executed are always identical.
Compare it to a simpler algorithm like adding two multi-digit numbers. The result is always the same for two given inputs, the steps are also the same, but you still need to add the numbers to get the outcome.
If by deterministic you mean that it will give the same answer to the same query for the same data every time, and give only one answer, then it is deterministic. If it were not deterministic, think of the problems it would cause for those who use it. I write in Prolog all day, so I know non-deterministic answers when I see them.
Below, I introduce a simple mistake in Prolog that makes the answer non-deterministic; with a simple fix it becomes deterministic.
Non-deterministic
% Without a cut, the first clause leaves a choice point behind:
% after the first answer, Prolog still tries the second clause and fails.
spacing_rec(0,[]).
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).
?- spacing_rec(0,Chars).
Chars = [] ;
false.
Deterministic
% The cut commits to the first clause once the length reaches 0,
% so no choice point remains and exactly one answer is returned.
spacing_rec(0,[]) :- !.
spacing_rec(Length0,[' '|T]) :-
    succ(Length,Length0),
    spacing_rec(Length,T).
?- spacing_rec(0,Chars).
Chars = [].
I will try and keep this short and simple; there are many great explanations of this here and elsewhere online, if some good research is done, of course.
Dijkstra's algorithm is a greedy algorithm; its main goal is to find the shortest path between two nodes of a weighted graph.
Wikipedia does a great job of explaining what deterministic and non-deterministic algorithms are and how you can 'determine' which category an algorithm falls into:
From Wikipedia Source:
Deterministic Algorithm:
In computer science, a deterministic algorithm is an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently.
Formally, a deterministic algorithm computes a mathematical function; a function has a unique value for any input in its domain, and the algorithm is a process that produces this particular value as output.
Nondeterministic Algorithm:
In computer science, a nondeterministic algorithm is an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition.
So, going back to the goal of Dijkstra's algorithm: it is like saying I want to get from location X to location Z, but to do that I have options of going through shorter nodes that will get me to my destination a lot quicker and more efficiently than other, longer routes or nodes...
Thinking through cases where Dijkstra's algorithm could be deterministic would be a good idea as well.

sudoku parallelization with MPI

I want to parallelize my sudoku solver program with MPI. The current serial code relies on backtracking with depth-first search. I did some research, but I am still not sure how to do it.
Some say that the program must do a breadth-first search in the master process to generate some initial data, and then the slave processes continue with depth-first search from that data.
I also saw that some depth-first search parallelization examples use work-sharing or work-stealing methods. But in the case of sudoku, I am not sure how this technique would handle the process relations, the work queue and the number of processes, given sudoku's solving methodology.
Any ideas?
Thank you.
This is not an answer pertaining to sudoku specifically, but to the fact that you specify your serial algorithm uses depth-first search. Depth-first search is known to be difficult to parallelize, despite not appearing to be 'inherently serial'.
There are parallel DFS implementations, however. For example, this 1987 paper presents a parallel DFS algorithm. The general principle is that each processor searches a different set of paths until it reaches a leaf (or an arbitrary search depth), and when it has completed its subset of paths, it chooses a new unexplored branch.
If you are keen to implement a parallel DFS, I would recommend reading that article and having a go at implementing that algorithm. However, I think there are probably more intelligent parallel sudoku algorithms that do not use DFS; for example, the problem could be solved using constraint propagation.
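To make the decomposition from the question concrete: one common scheme is to have rank 0 do a shallow breadth-first expansion to build a pool of partial boards and then let every rank finish its share with your existing depth-first solver. Here is a rough mpi4py sketch of that split; expand_one_level and solve_dfs are placeholders for whatever board-expansion and backtracking routines your serial code already has, so this is an outline, not a working solver.
# Rough BFS-then-DFS split over MPI ranks (mpi4py). expand_one_level(board)
# should return the legal children of a partial board; solve_dfs(board) is
# your existing serial backtracking solver returning a list of solutions.
from mpi4py import MPI

def bfs_frontier(board, depth, expand_one_level):
    frontier = [board]
    for _ in range(depth):                       # shallow BFS on the master
        frontier = [child for b in frontier for child in expand_one_level(b)]
    return frontier

def parallel_solve(board, depth, expand_one_level, solve_dfs):
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    chunks = None
    if rank == 0:
        frontier = bfs_frontier(board, depth, expand_one_level)
        chunks = [frontier[i::size] for i in range(size)]   # round-robin split
    my_boards = comm.scatter(chunks, root=0)     # each rank gets a share of partial boards
    my_solutions = [s for b in my_boards for s in solve_dfs(b)]
    all_solutions = comm.gather(my_solutions, root=0)
    if rank == 0:
        return [s for part in all_solutions for s in part]
    return None
Note that this static split can leave some ranks idle if their partial boards die off quickly, which is exactly the problem the work-sharing and work-stealing schemes you mention try to address.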

Depth First Search Basics

I'm trying to improve my current algorithm for the 8 Queens problem, and this is the first time I'm really dealing with algorithm design/algorithms. I want to implement a depth-first search combined with a permutation of the different Y values described here:
http://en.wikipedia.org/wiki/Eight_queens_puzzle#The_eight_queens_puzzle_as_an_exercise_in_algorithm_design
I've implemented the permutation part to solve the problem, but I'm having a little trouble wrapping my mind around the depth-first search. It is described as a way of traversing a tree/graph, but does it generate the tree/graph? It seems this method would be more efficient only if the depth-first search generates the tree structure to be traversed, by implementing some logic to generate only certain parts of the tree.
So essentially, I would have to create an algorithm that generates a pruned tree of lexicographic permutations. I know how to implement the pruning logic, but I'm just not sure how to tie it in with the permutation generator, since I've been using next_permutation.
Are there any resources that could help me with the basics of depth-first searches or creating lexicographic permutations in tree form?
In general, yes, the idea of the depth-first search is that you won't have to generate (or "visit" or "expand") every node.
In the case of the Eight Queens problem, if you place a queen such that it can attack another queen, you can abort that branch; it cannot lead to a solution.
If you were solving a variant of Eight Queens such that your goal was to find one solution, not all 92, then you could quit as soon as you found one.
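A minimal sketch of that pruning in Python (not tied to the permutation approach from the question): queens are placed row by row, and a branch is abandoned as soon as the newest queen attacks an earlier one.
# Depth-first search for N-queens: extend a partial placement one row at a time
# and prune any branch where the newly placed queen is attacked.
def solve_n_queens(n, columns=()):
    row = len(columns)
    if row == n:                                    # every row filled: a full solution
        yield columns
        return
    for col in range(n):
        if all(col != c and abs(col - c) != row - r # no shared column or diagonal
               for r, c in enumerate(columns)):
            yield from solve_n_queens(n, columns + (col,))

print(next(solve_n_queens(8)))                      # stop after the first solution
print(sum(1 for _ in solve_n_queens(8)))            # all 92 solutions of the 8x8 board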
More generally, if you were solving a less discrete problem, like finding the "best" arrangement of queens according to some measure, then you could abort a branch as soon as you knew it could not lead to a final state better than a final state you'd already found on another branch. This is related to the A* search algorithm.
Even more generally, if you are attacking a really big problem (like chess), you may be satisfied with a solution that is not exact, so you can abort a branch that probably can't lead to a solution you've already found.
The DFS algorithm itself does not generate the tree/graph. If you want to build the tree or graph, it's as simple as building it as you perform the search. If you only want to find one solution, a flat LIFO data structure like a linked list will suffice: when you visit a new node, append it to the list; when you leave a node to backtrack in the search, pop the node off.
The book "Introduction to the Design and Analysis of Algorithms" by Anany Levitin has a proper explanation for your understanding. He also presents the solution to the 8 queens problem just the way you described it. It will help you for sure.
From my understanding, for finding one solution you don't need any permutations; all you need is DFS. That alone will suffice to find a solution.

I need an algorithm to find the best path

I need an algorithm to find the best solution of a path finding problem. The problem can be stated as:
At the starting point I can proceed along multiple different paths.
At each step there are another multiple possible choices where to proceed.
There are two operations possible at each step:
A boundary condition that determines whether a path is acceptable or not.
A condition that determines whether the path has reached the final destination and can be selected as the best one.
At each step a number of paths can be eliminated, letting only the "good" paths grow.
I hope this sufficiently describes my problem, and also a possible brute force solution.
My question is: is brute force the best/only solution to the problem? I would also like some hints about the best coding structure for the algorithm.
Take a look at A*, and use the length as boundary condition.
http://en.wikipedia.org/wiki/A%2a_search_algorithm
You are looking for some kind of state space search algorithm. Without knowing more about the particular problem, it is difficult to recommend one over another.
If your space is open-ended (infinite tree search), or nearly so (chess, for example), you want an algorithm that prunes unpromising paths, as well as selects promising ones. The alpha-beta algorithm (used by many OLD chess programs) comes immediately to mind.
The A* algorithm can give good results. The key to getting good results out of A* is choosing a good heuristic (weighting function) to evaluate the current node and the various successor nodes, to select the most promising path. Simple path length is probably not good enough.
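If A* turns out to be a fit, the skeleton is small; here is a minimal Python sketch (the heuristic function is the part you would tailor to your own notion of "best", and it should never overestimate the remaining cost if you want the returned path to be optimal):
# A*: like Dijkstra, but the priority is cost-so-far plus a heuristic estimate
# of the remaining cost to the goal (node labels assumed comparable for ties).
import heapq

def a_star(adjacency, start, goal, heuristic):
    best_cost = {start: 0}
    came_from = {}
    heap = [(heuristic(start), start)]
    while heap:
        _, node = heapq.heappop(heap)
        if node == goal:                            # reconstruct the winning path
            path = [node]
            while node in came_from:
                node = came_from[node]
                path.append(node)
            return best_cost[goal], path[::-1]
        for neighbor, weight in adjacency[node]:
            cost = best_cost[node] + weight
            if cost < best_cost.get(neighbor, float('inf')):
                best_cost[neighbor] = cost
                came_from[neighbor] = node
                heapq.heappush(heap, (cost + heuristic(neighbor), neighbor))
    return None                                     # no acceptable path exists
Your boundary condition ("is this partial path still acceptable?") maps naturally onto skipping neighbours before they are pushed.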
Elaine Rich's AI textbook (oldie but goodie) spent a fair amount of time on various search algorithms. Full Disclosure: I was one of the guinea pigs for the text, during my undergraduate days at UT Austin.
Did you try breadth-first search (BFS)? That is, if length is a criterion for the best path.
You will also have to modify the algorithm to disregard "unacceptable" paths.
If your problem is exactly as you describe it, you have two choices: depth-first search and breadth-first search.
Depth-first search considers a possible path, pursues it all the way to the end (or as far as it is acceptable), and only then is it compared with other paths.
Breadth-first search is probably more appropriate: at each junction you consider all possible next steps and use some score to rank the order in which each possible step is taken. This allows you to prioritise your search and find good solutions faster (but to prove you have found the best solution it takes just as long as depth-first searching, and it is less easy to parallelise).
However, your problem may also be suitable for Dijkstra's algorithm depending on the details of your problem. If it is, that is a much better approach!
This would also be a good starting point to develop your own algorithm that performs much better than iterative searching (if such an algorithm is actually possible, which it may not be!)
A* plus flood fill and dynamic programming. It is hard to implement, too hard to describe in a simple post, and too valuable to just give away, so sorry I can't provide more; but searching on flood fill and dynamic programming will put you on the path if you want to go that route.
