diameter of a huge graph - algorithm

I have a huge graph that I would like to process using many machines.
I would like to determine whether the graph diameter is greater than 50.
How should I split the data, and how would I write a parallel algorithm that can compute this?
(The return value is a boolean.)
The graph diameter is the greatest distance between any pair of vertices.

The standard way to figure this out would be an all-pairs shortest path algorithm -- the Floyd-Warshall algorithm is a good place to start. Another option using Hadoop is located here.
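Since only a yes/no answer for a fixed bound is needed, one possible shortcut is to run a breadth-first search from each vertex and stop as soon as any vertex turns out to be more than 50 hops away. The sketch below assumes an unweighted, undirected graph held as an adjacency dictionary, and it treats an unreachable vertex as infinitely far away; the function names and the `sources` parameter are illustrative. Each machine could be handed a different slice of `sources` (assuming it can read the whole adjacency structure) and the per-machine results combined with a logical OR.

```python
from collections import deque

def eccentricity_exceeds(adj, source, limit):
    """Return True if some vertex is more than `limit` hops from `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > limit:
                    return True  # early exit: the bound is already violated
                queue.append(v)
    # Assumption: a vertex that was never reached counts as infinitely far.
    return len(dist) < len(adj)

def diameter_exceeds(adj, limit, sources=None):
    """Check the bound for the given BFS sources (all vertices by default)."""
    if sources is None:
        sources = list(adj)
    return any(eccentricity_exceeds(adj, s, limit) for s in sources)

# A path 0-1-2-3-4 has diameter 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(diameter_exceeds(path, 3), diameter_exceeds(path, 4))
```

This does n BFS runs in the worst case, so it is a baseline to parallelize rather than an optimized algorithm.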

Take a look at Parallel implementation of graph diameter algorithms
Also: Parallel Graph Algorithms

Related

How to build a Minimum Spanning Tree given a list of 200 000 nodes?

Problem
I have a list of approximately 200000 nodes that represent lat/lon positions in a city, and I have to compute the Minimum Spanning Tree. I know that I need to use Prim's algorithm, but first of all I need a connected graph. (We can assume that those nodes are in a Euclidean plane.)
To build this connected graph, I first thought of computing the complete graph, but 205000*(205000-1)/2 is around 21 billion edges and I can't handle that.
Options
Then I came across Delaunay triangulation: if I build this "Delaunay graph", it contains a subgraph that is the Minimum Spanning Tree, and it has around 600000 edges in total, since according to Wikipedia [..] it has at most 3n-6 edges. So it may be a good starting point for a Minimum Spanning Tree algorithm.
Another option is to build an approximately connected graph, but with that I may miss important edges that influence my Minimum Spanning Tree.
My question
Is Delaunay a reliable solution in this case? If so, is there any other reliable solution to this problem besides Delaunay triangulation?
Further information: this problem has to be solved in C.
The Delaunay triangulation of a point set is always a superset of the EMST of these points. So it is absolutely "reliable"*. And recommended, as it has a size linear in the number of points and can be efficiently built.
*When there are cocircular point quadruples, neither the triangulation nor the EMST is uniquely defined, but this is usually harmless.
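Once the Delaunay edges are available, the spanning tree itself can be extracted with any standard MST algorithm run on that sparse edge list. The following is a minimal sketch using Kruskal's algorithm with a union-find structure; it is written in Python for brevity even though the question targets C, and it assumes the triangulation has already been computed elsewhere and is passed in as `candidate_edges` (pairs of point indices). The function name and parameters are illustrative.

```python
import math

def kruskal_mst(points, candidate_edges):
    """Return MST edges, considering only `candidate_edges` (index pairs)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def length(edge):
        (x1, y1), (x2, y2) = points[edge[0]], points[edge[1]]
        return math.hypot(x1 - x2, y1 - y2)

    tree = []
    for u, v in sorted(candidate_edges, key=length):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two different components
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Unit square: four sides plus one diagonal, as a triangulation would give.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
square_tree = kruskal_mst(square, square_edges)
print(square_tree)
```

With roughly 3n edges as input, the sort dominates the running time, which is why the sparse Delaunay edge set matters so much compared with the complete graph.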
There's a big question here of what libraries you have access to and how much you trust yourself as a coder. (I'm assuming the fact that you're new on SO should not be taken as a measure of your overall experience as a programmer - if it is, well, RIP.)
If we assume you don't have access to Delaunay and can't implement it yourself, minimum spanning tree algorithms that presuppose a graph aren't necessarily off limits to you. You can have the complete graph conceptually but not actually. Kruskal's algorithm, for instance, assumes you have a sorted list of all edges in your graph; most of your edges will not be near the minimum, and you do not have to compare all n^2 to find the minimum.
You can find minimum edges quickly by estimations that give you a reduced set, then refinement. For instance, if you divide your graph into a 100*100 grid, for any point p in the graph, points in the same grid square as p are guaranteed to be closer than points three or more squares away. This gives a much smaller set of points you have to compare to safely know you've found the closest.
It still won't be easy, but that might be easier than Delaunay.
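The grid idea can be sketched as a nearest-neighbor lookup that scans cells in growing square rings around the query point. The code below is one possible reading of that suggestion, not a complete MST algorithm: the cell size, the function names, and the ring-based stopping rule are assumptions made for illustration. The stopping rule relies on the fact that any point lying in ring r+1 or farther is at least r cell widths away from the query point.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Bucket point indices by the grid cell that contains them."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(int(x // cell), int(y // cell))].append(i)
    return grid

def nearest_neighbor(points, grid, cell, i):
    """Index of the point closest to points[i], or None if it is alone."""
    px, py = points[i]
    cx, cy = int(px // cell), int(py // cell)
    # No occupied cell lies beyond this ring, so the search can stop there.
    max_ring = max(max(abs(gx - cx), abs(gy - cy)) for gx, gy in grid)
    best, best_d = None, math.inf
    for ring in range(max_ring + 1):
        for gx in range(cx - ring, cx + ring + 1):
            for gy in range(cy - ring, cy + ring + 1):
                if max(abs(gx - cx), abs(gy - cy)) != ring:
                    continue  # only visit cells on the border of this ring
                for j in grid.get((gx, gy), ()):
                    d = math.hypot(points[j][0] - px, points[j][1] - py)
                    if j != i and d < best_d:
                        best, best_d = j, d
        # Unscanned points are at least ring * cell away: safe to stop.
        if best_d <= ring * cell:
            break
    return best

pts = [(0, 0), (0.5, 0.5), (10, 10), (10.2, 10), (3, 0)]
grid = build_grid(pts, 1.0)
print([nearest_neighbor(pts, grid, 1.0, i) for i in range(len(pts))])
```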

Find all pair shortest path in a large scale of graph

I want to find all-pairs shortest paths in a large-scale graph. What can I do? Also, is there any kind of streaming algorithm that solves the all-pairs shortest path problem on a streaming graph?
Following the hint from #sudomakeinstall2, I found something interesting for the dynamic graph problem. Camil Demetrescu's page A New Approach to Dynamic All Pairs Shortest Paths is a great one for this problem. They give a fully dynamic algorithm for general directed graphs with non-negative real-valued edge weights that supports any sequence of operations in O(n^2 log^3 n) amortized time per update and unit worst-case time per distance query, where n is the number of vertices.

Algorithm for finding shortest distance covering least number of cells

I am designing a system to find the shortest route covering the least number of cells. Suppose the plane is divided into rectangular cells. What would be the best-suited algorithm for this? I am just looking for a head start, not complete code or an implementation.
You are dealing with a shortest path problem in an unweighted graph (vertices are the cells in your grid, and edges are possible moves from one cell to another).
The simplest approach is a simple BFS - that finds the shortest
path from a source to all targets (in unweighted graphs).
This algorithm is fairly simple and is iteratively "discovering" the closest nodes, all nodes of distance 1, then of distance 2, ....
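A minimal BFS sketch on a grid could look like the following. The grid encoding (a list of strings where `#` marks a blocked cell), the 4-neighbor move set, and the function name are assumptions made for illustration.

```python
from collections import deque

def bfs_grid(grid, start, goal):
    """Fewest moves from start to goal as (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                # First visit is via a shortest route, so the distance is final.
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None  # goal is unreachable

print(bfs_grid(["..#", "..#", "..."], (0, 0), (2, 2)))
```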
An optimization is to use a bi-directional search. Here you can take advantage of having a single source and a single target by doing BFS from both sides, which drastically reduces the number of nodes you need to expand.
One more optimization could be to use A* Search Algorithm with
an admissible heuristic function, such as manhattan distances.
In A* Search, you take advantage of the fact that the graph is not some arbitrary graph but a grid, so you can estimate the distance from one node to another based on their locations on the grid. The algorithm uses these estimates to find the optimal path more quickly.
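As a rough sketch of A* with the Manhattan-distance heuristic, under the same illustrative assumptions as plain BFS on a grid (a list of strings with `#` for blocked cells, 4-neighbor moves, unit cost per move):

```python
import heapq

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar_grid(grid, start, goal):
    """Fewest moves from start to goal as (row, col) cells, or None."""
    rows, cols = len(grid), len(grid[0])
    cost_so_far = {start: 0}
    # Heap entries: (cost so far + heuristic estimate, cost so far, cell).
    heap = [(manhattan(start, goal), 0, start)]
    while heap:
        _, cost, cell = heapq.heappop(heap)
        if cell == goal:
            return cost
        if cost > cost_so_far[cell]:
            continue  # stale entry: a cheaper route to this cell was found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == "#":
                continue
            new_cost = cost + 1
            if new_cost < cost_so_far.get(nxt, float("inf")):
                cost_so_far[nxt] = new_cost
                priority = new_cost + manhattan(nxt, goal)
                heapq.heappush(heap, (priority, new_cost, nxt))
    return None  # goal is unreachable

print(astar_grid([".#.", ".#.", "..."], (0, 0), (0, 2)))
```

Manhattan distance never overestimates the remaining number of 4-neighbor moves, which is what makes it admissible here; with diagonal moves a different estimate would be needed.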
Note - all algorithms I suggested find the shortest path, difference is in the time it will take them to find it.

Is it possible to use A* search for non grid graphs

I know A* is the most optimal algorithm for finding the shortest path, but I cannot see how any heuristic can work on a non lattice graph? This made me wonder whether or not A* is in fact capable of being used on an undirected or directed graph. If A* is capable of doing this, what kind of heuristic could one use? If A* is not, what is the current fastest algorithm for calculating the shortest path on a directed or undirected non lattice graph? Please comment if any more information is required.
It might be possible, yes, but A* is mostly popular for finding shortest paths in grid-like graphs. For graphs in general there are a lot of other shortest path algorithms that may match your case better. Which one to use depends on your setting.
If you plan to do a Single Source Shortest Path (SSSP) computation, where you try to find the shortest path from one node to another, and your graph is unweighted, you could use breadth-first search. A bidirectional version of this algorithm performs well in practice. If you do SSSP and your graph is weighted, Dijkstra's algorithm is a common choice; a bidirectional version can be used as well. For all-pairs shortest paths, other algorithms like Floyd-Warshall or Johnson's algorithm are useful.
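For the weighted case, a minimal sketch of Dijkstra's algorithm with a binary heap is shown below. It assumes non-negative edge weights and an adjacency dictionary that maps each node to a list of `(neighbor, weight)` pairs; both the representation and the function name are illustrative.

```python
import heapq

def dijkstra(adj, source):
    """Shortest distances from `source` to every reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter path
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

roads = {"a": [("b", 1), ("c", 4)], "b": [("c", 2)], "c": []}
print(dijkstra(roads, "a"))
```

Nothing here depends on a grid layout: the algorithm works on any directed or undirected graph as long as the weights are non-negative.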
If you want to use heuristics and estimate the distance before the search is done, you can do precalculations, most of which are applicable to each of the algorithms mentioned above. Some examples:
shortcut calculation (ARC-Flags, SHARC, KFlags)
hub identification (also called transit nodes): precalculate distances between all hub nodes (this has to be done only once in non-dynamic graphs), then find the nearest hub of the source and of the sink, e.g. with Dijkstra. Add up the distance between the hubs, the distance from the source to its nearest hub, and the distance from the sink to its nearest hub
structured look-up tables, e.g. distances between hubs and distances between a hub and the nodes within a specific distance of it. After precalculation you never need to traverse the graph again; instead, your shortest path calculation is a number of look-ups. This results in high memory consumption but strong performance. You can reduce the memory by using upper-bound approximations for the distance calculations.
I would highly recommend that you identify your exact case and do some research for graph algorithms well suited to this one. A lot of research is done in shortest path for dozens of fields of application.

Edit distance between two graphs

I'm just wondering if, like for strings where we have the Levenshtein distance (or edit distance) between two strings, is there something similar for graphs?
I mean, a scalar measure that identifies the number of atomic operations (node and edges insertion/deletion) to transform a graph G1 to a graph G2.
I think graph edit distance is the measure that you were looking for.
Graph edit distance measures the minimum number of graph edit operations needed to transform one graph into another, and the allowed graph edit operations include:
Insert/delete an isolated vertex
Insert/delete an edge
Change the label of a vertex/edge (if labeled graphs)
However, computing the graph edit distance between two graphs is NP-hard. The most efficient algorithm for computing this is an A*-based algorithm, and there are other sub-optimal algorithms.
You should look at the paper A survey of graph edit distance
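To make the cost of an exact computation concrete, here is a brute-force sketch for very small graphs. It is not the A*-based algorithm mentioned above; it simply tries every vertex correspondence. It assumes simple undirected graphs with unlabeled vertices numbered from 0 and a unit cost per operation. Under those assumptions the smaller graph can be padded with isolated vertices, so the distance is the difference in vertex counts plus the fewest mismatched edges over all correspondences. The function name and signature are illustrative.

```python
from itertools import permutations

def graph_edit_distance(n1, edges1, n2, edges2):
    """Exact edit distance between two tiny unlabeled, undirected graphs."""
    n = max(n1, n2)  # pad the smaller graph with isolated vertices
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    best = None
    for perm in permutations(range(n)):  # n! candidate correspondences
        mapped = {frozenset(perm[u] for u in e) for e in e1}
        # Edges present on one side only must be inserted or deleted.
        cost = len(mapped ^ e2)
        if best is None or cost < best:
            best = cost
    return abs(n1 - n2) + best

triangle = [(0, 1), (1, 2), (0, 2)]
print(graph_edit_distance(3, triangle, 3, [(0, 1), (1, 2)]))
```

The factorial loop is only workable for a handful of vertices, which is the practical face of the NP-hardness result.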
For general graphs it is an NP-hard problem, as others mentioned in their answers. But for trees there are well-known polynomial-time algorithms. Perhaps the most famous of them is the "Zhang-Shasha" algorithm, which was published in 1989.
Note:
The Levenshtein distance (or edit distance) is defined between two strings.
For graphs, however, you may have to search through on the order of N! vertex correspondences to establish the identity of each vertex and edge.
Comparing two graphs is easy once every vertex and edge has a unique index.
The main question is how to define an identity for each vertex and edge. Finding such an identification between two graphs that can be transformed into each other is very hard; it is the graph isomorphism problem (no polynomial-time algorithm is known for it, although it is not known to be NP-complete).
You can search for "graph isomorphism".

Resources