Context
While trying to generate a directed random connected graph in networkx, I am experiencing some difficulties combining the randomness with the "connected" property.
I use the following parameters to specify the properties of the graph:
size [2,100]: the number of nodes of the graph.
density (0,1]: The probability that an arbitrary edge is created.
Issue/Doubts
I am unsure what "connected" means for a directed graph. What I mean is that any node in the graph can reach every other node by traveling in the direction of the edges (I believe this is called strongly connected). This does not require every edge between two nodes to be bidirectional, since a detour may still reach the node on the other side of a one-way edge that points the wrong way.
Another issue is that below certain edge densities, directed connected graphs are not possible anymore. However, I am not yet sure how I can account for that in the random graph generation. Additionally, for some densities, a directed graph may be possible, yet I would not directly know which edges to "randomly" select such that it actually is connected. Hence, I was wondering whether there exists an algorithm, or built in networkx function for this purpose.
Question
How can I generate a random graph of size=size that is directional and connected with an edge density=density in networkx?
MWE
The following MWE generates random graphs of a certain size and density, however I think they do not satisfy the connected property:
import random
from itertools import combinations, groupby

import networkx as nx


def gnp_random_connected_graph(
    density,
    size,
    test_scope,
):
    """Generates a random directed graph, similarly to an Erdős-Rényi graph,
    but enforcing that the resulting graph is connected.

    :param density: Probability that an arbitrary edge is created.
    :param size: Number of nodes of the graph.
    :param test_scope: Object whose seed attribute seeds the random generator.
    """
    random.seed(test_scope.seed)
    # All node pairs (u, v) with u < v, so edges only point from low to high.
    edges = combinations(range(size), 2)
    G = nx.DiGraph()
    G.add_nodes_from(range(size))
    if density <= 0:
        return G
    if density >= 1:
        return nx.complete_graph(size, create_using=G)
    for _, node_edges in groupby(edges, key=lambda x: x[0]):
        node_edges = list(node_edges)
        # Always keep one outgoing edge per node.
        random_edge = random.choice(node_edges)  # nosec - using a random seed.
        G.add_edge(*random_edge)
        for e in node_edges:
            if random.random() < density:  # nosec - no security application.
                G.add_edge(*e)
    return G
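One possible way to get the property described above is to start from a directed cycle through all nodes in random order, which is strongly connected by construction, and only then add the remaining edges at random. This is a minimal sketch, not an established networkx recipe: the function name random_strongly_connected_digraph and its seed parameter are made up for illustration, and the resulting edge density ends up slightly above density because of the cycle edges.

```python
import random

import networkx as nx


def random_strongly_connected_digraph(size, density, seed=None):
    """Random directed graph that is strongly connected by construction."""
    rng = random.Random(seed)
    G = nx.DiGraph()
    G.add_nodes_from(range(size))

    # A directed cycle through all nodes (in random order) lets every node
    # reach every other node by following edge directions.
    order = list(range(size))
    rng.shuffle(order)
    for i, u in enumerate(order):
        G.add_edge(u, order[(i + 1) % size])

    # Add each remaining ordered pair independently with probability `density`.
    for u in range(size):
        for v in range(size):
            if u != v and not G.has_edge(u, v) and rng.random() < density:
                G.add_edge(u, v)
    return G


G = random_strongly_connected_digraph(size=10, density=0.2, seed=42)
print(nx.is_strongly_connected(G))
```

The check with nx.is_strongly_connected is only there as a sanity test; the cycle already guarantees the property for size >= 2.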
Related
I have this road network with elevation data for every POINT and a calculated grade value for every LINESTRING:
Note: the points plotted on the graph are my own; they do not represent the nodes of the graph, which include all the missing endpoints.
I have converted it to a networkx MultiGraph which I have generated from a GeoPandasDataframe:
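import geopandas as gpd
import momepy
import pandas as pd

# One row per LINESTRING: its grade value plus its geometry.
seg_df = pd.DataFrame({'grade': grades})
seg_grade = gpd.GeoDataFrame(seg_df, geometry=new_seg_list)
network = momepy.gdf_to_nx(seg_grade, approach='primal')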
In the code above, grades is a list of integer grade values and new_seg_list is a list of LINESTRING objects whose indices match those of the grades list, e.g. the grade of new_seg_list[0] is grades[0].
Each grade value is the elevation change from LINESTRING.coords[0] to LINESTRING.coords[-1] divided by the length of the LINESTRING.
My network object has the correct node and edge values so that functions such as
nx.shortest_path(G=Gr, source=start_node, target=end_node, weight='length')
work correctly. How do I find the longest path (using each edge only once) that is entirely downhill (negative grade from LINESTRING.coords[0] to LINESTRING.coords[-1])?
The main difficulty I'm having is that the grade values run from the start vertex to the end vertex of each LINESTRING, which makes them hard to translate into the networkx graph. I still have the elevation data for each node, so if there is some way to calculate the grade as paths are tested, that might be the best approach.
In general, the longest path problem is NP-hard. However, it can be solved in linear time when the graph is a directed acyclic graph (DAG).
For your case, I don't fully understand the problem, but perhaps you can build a DAG like this to produce your desired output:
Only keep an edge between nodes that are at different elevations, and direct it from the higher node to the lower one.
That gives you a DAG with edges from higher POINTs to lower POINTs (elevation strictly decreases along every edge, so no cycle is possible), and you can find the longest path in it with networkx.algorithms.dag.dag_longest_path, which is readily available in networkx.
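To make the idea concrete, here is a small sketch with invented data. The elevation dictionary and the segments list are hypothetical stand-ins for your node elevations and LINESTRING edges; only nx.DiGraph, nx.dag_longest_path and nx.dag_longest_path_length are actual networkx calls.

```python
import networkx as nx

# Hypothetical node elevations and undirected road segments (u, v, length).
elevation = {"a": 100, "b": 80, "c": 80, "d": 60, "e": 40}
segments = [
    ("a", "b", 5.0), ("a", "c", 2.0), ("b", "d", 4.0),
    ("c", "d", 1.0), ("d", "e", 3.0), ("b", "c", 6.0),
]

dag = nx.DiGraph()
for u, v, length in segments:
    # Direct every segment from the higher endpoint to the lower one.
    if elevation[u] > elevation[v]:
        dag.add_edge(u, v, length=length)
    elif elevation[v] > elevation[u]:
        dag.add_edge(v, u, length=length)
    # Flat segments (equal elevation) are skipped: they are not downhill.

path = nx.dag_longest_path(dag, weight="length")
total = nx.dag_longest_path_length(dag, weight="length")
print(path, total)  # ['a', 'b', 'd', 'e'] 12.0
```

Because the direction comes from the node elevations, the stored start-to-end grade of each LINESTRING is not needed to orient the edges.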
I want to find an algorithm to generate an undirected graph with a given size and a given connectivity, where each vertex has exactly the specified number of edges coming from it. The closest thing I've found so far is this:
Random simple connected graph generation with given sparseness
The difference here being that I'm less interested in the total number of edges in the graph (though that can easily be computed too) -- what I'm looking for is a guarantee that each vertex has exactly the given number of connections.
For example:
Input: Size - 6 Connectivity - 4
Output: Undirected graph with 6 vertices and (6*4/2)=12 edges, where each vertex has 4 connections.
I am assuming that my inputs will have a valid output.
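For reference, networkx ships nx.random_regular_graph(d, n), which produces a graph where every vertex has exactly d neighbours. As far as I know it does not promise connectivity, so the sketch below simply retries until nx.is_connected holds. The wrapper name random_connected_regular_graph and the max_tries parameter are my own, not part of networkx.

```python
import networkx as nx


def random_connected_regular_graph(degree, size, seed=None, max_tries=100):
    """Random graph on `size` vertices where every vertex has `degree` edges."""
    if (degree * size) % 2 != 0 or not 0 <= degree < size:
        raise ValueError("need degree * size even and 0 <= degree < size")
    for attempt in range(max_tries):
        # Vary the seed per attempt so retries do not repeat the same graph.
        attempt_seed = None if seed is None else seed + attempt
        G = nx.random_regular_graph(degree, size, seed=attempt_seed)
        if nx.is_connected(G):
            return G
    raise RuntimeError("no connected regular graph found within max_tries")


G = random_connected_regular_graph(degree=4, size=6, seed=1)
print(G.number_of_edges(), dict(G.degree()))
```

For the example input (size 6, connectivity 4) this gives 12 edges with degree 4 at every vertex.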
I want to check if my undirected graph is a tree. A tree is an acyclic, connected graph. I have a function that checks whether a graph is connected. Is it enough for the graph to be connected and have |E|=|V|-1 for it to be a tree?
You are correct: for a connected graph, E = V - 1 is sufficient to conclude that it is a tree.
The logic is that every tree begins with just a root node (V=1, E=0, so E=V-1), and from there, any time we add one node (V=V+1), we must also add exactly one edge (E=E+1). This makes the equation E=V-1 remain true for all trees.
A cycle occurs when we connect two existing nodes with a new edge (E=E+1 but V stays the same), rendering the equation E=V-1 false.
If it interests you, you may want to read about the more general formula v - e + f = 2, which holds for connected planar graphs, where f is the number of regions inside a graph, including the exterior region. (A tree only has an exterior region, so f=1.) This rule is called Euler's Formula, which you can read about on Wikipedia.
Connected: It means that for every pair of vertices you choose, there will always be a path between them.
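|E|=|V|-1: a connected graph on |V| vertices needs at least |V|-1 edges. If one of your |V|-1 edges closed a cycle, that edge would be redundant for connectivity, and the remaining |V|-2 edges would be too few to connect all vertices (some vertices would remain unreachable). So a connected graph with |E|=|V|-1 cannot contain a cycle, and we can conclude that these conditions are enough.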
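As a sketch of the combined check (edge count plus connectivity), assuming vertices are numbered 0..num_vertices-1 and edges are given as pairs. The function name is_tree is my own choice here; networkx also provides nx.is_tree if you already use that library.

```python
from collections import deque


def is_tree(num_vertices, edges):
    """True if the undirected graph on vertices 0..num_vertices-1 is a tree."""
    if num_vertices == 0 or len(edges) != num_vertices - 1:
        return False
    adjacency = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    # Breadth-first search from vertex 0 to test connectivity.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adjacency[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == num_vertices


print(is_tree(4, [(0, 1), (1, 2), (1, 3)]))  # True
print(is_tree(4, [(0, 1), (1, 2), (2, 0)]))  # False: vertex 3 is isolated
```

The second call shows why the edge count alone is not enough: it has |V|-1 edges, but they form a cycle and leave one vertex disconnected.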
I'm trying to find an efficient algorithm to generate a simple connected graph with a given number of nodes. Something like:
Input:
N - size of generated graph
Output:
simple connected graph G(v,e) with N vertices and S edges. The number of edges should be uniformly distributed.
You might want to create a spanning tree first to ensure connectivity. Then repeatedly pick two random nodes that are not yet adjacent and connect them, until you have S edges.
For the spanning tree, the easiest thing to do is start with a random node as the tree. For every remaining node (in random order), connect it to some node already in the tree. The way you select that node within the tree defines the distribution of edges per node.
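The two steps above could look roughly like this in Python. It is only a sketch: the name random_connected_graph, the seed parameter, and the choice of attaching each new node to a uniformly random tree node are assumptions on my part.

```python
import random


def random_connected_graph(n, s, seed=None):
    """Return a set of s undirected edges (u, v), u < v, connecting n nodes."""
    max_edges = n * (n - 1) // 2
    if not n - 1 <= s <= max_edges:
        raise ValueError("need n - 1 <= s <= n * (n - 1) / 2")
    rng = random.Random(seed)

    # Step 1: random spanning tree. Visit nodes in random order and attach
    # each one to a uniformly chosen node that is already in the tree.
    nodes = list(range(n))
    rng.shuffle(nodes)
    edges = set()
    for i in range(1, n):
        u, v = nodes[i], nodes[rng.randrange(i)]
        edges.add((min(u, v), max(u, v)))

    # Step 2: add random extra edges until s edges exist. The set silently
    # ignores pairs that are already connected by an edge.
    while len(edges) < s:
        u, v = rng.sample(range(n), 2)
        edges.add((min(u, v), max(u, v)))
    return edges


edges = random_connected_graph(n=10, s=15, seed=0)
print(sorted(edges))
```

Note that this does not sample uniformly from all connected graphs with S edges; the tree construction biases the result.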
It may be a bit surprising, but if you choose edges at random, the graph is almost surely connected once the number of edges is well above (N/2)*ln(N), where N is the number of vertices; this corresponds to an edge probability of about ln(N)/N, the Erdős-Rényi connectivity threshold. If the number of edges is much lower than that, you can be almost sure that the graph is disconnected.
How to generate the graph:
GenerateGraph(N,S):
    if (S < (N/2)*log(N)) return null // unlikely to be connected; or you can take another action
    V <- {0,...,N-1}
    E <- {}
    possible_edges <- {{v,u}: v,u in V, v != u} // all possible edges
    while (size(E) < S):
        e <- get_random_edge(possible_edges)
        E.add(e)
        possible_edges.remove(e)
    G=(V,E)
    if (G is connected)
        return G
    else
        return GenerateGraph(N,S)
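A runnable Python version of the same rejection-sampling idea might look like this. It is a sketch with a few deviations that are my own: the retry is a bounded loop (max_attempts) instead of unbounded recursion, the early exit only rejects edge counts that can never work (S < N-1 or more than all possible edges), and is_connected is a small helper written for this example.

```python
import itertools
import random


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adjacency = {v: set() for v in range(n)}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def generate_graph(n, s, seed=None, max_attempts=1000):
    """Sample s distinct edges uniformly; retry until the graph is connected."""
    possible_edges = list(itertools.combinations(range(n), 2))
    if s < n - 1 or s > len(possible_edges):
        return None  # impossible edge count
    rng = random.Random(seed)
    for _ in range(max_attempts):
        edges = rng.sample(possible_edges, s)
        if is_connected(n, edges):
            return edges
    return None  # gave up; s is probably too far below the threshold


edges = generate_graph(n=8, s=14, seed=3)
print(edges)
```

When it succeeds, every connected graph with S edges is equally likely, but for sparse inputs most attempts are rejected.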
Let me know if you need any more guidance.
(Btw. right now I am dealing with exactly the same problem! Let me know if you need to generate some more sophisticated graphs:-))
A very common random graph generation algorithm (used in many academic works) is based on the R-MAT method, or its generalization in the form of Kronecker graphs. It's a simple, iterative process that uses very few parameters and scales easily.
There's a really good explanation here (including why it's better than other methods) -
http://www.graphanalysis.org/SIAM-PP08/Leskovic.pdf
Versions of both are implemented in many graph benchmark suites, e.g.
BFS benchmark with a subdir with rMat generator - http://www.cs.cmu.edu/~pbbs/benchmarks/breadthFirstSearch.tar
Kronecker graph generator (both c and matlab codes included) - http://www.graph500.org/sites/default/files/files/graph500-2.1.4.tar.bz2
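To give an idea of how simple the R-MAT recursion is, here is a rough Python sketch, not taken from either of the linked generators. Each edge is placed by repeatedly picking one of the four quadrants of the adjacency matrix with probabilities a, b, c and d = 1 - a - b - c. The function name rmat_edges is hypothetical, and the default probabilities are just commonly quoted values that I am assuming here.

```python
import random


def rmat_edges(scale, num_edges, a=0.57, b=0.19, c=0.19, seed=None):
    """Generate directed edges on 2**scale nodes with the R-MAT recursion."""
    rng = random.Random(seed)
    edges = []
    for _ in range(num_edges):
        src = dst = 0
        # Each level fixes one more bit of the source and destination ids.
        for _level in range(scale):
            r = rng.random()
            src <<= 1
            dst <<= 1
            if r < a:
                pass  # top-left quadrant: both bits stay 0
            elif r < a + b:
                dst |= 1  # top-right quadrant
            elif r < a + b + c:
                src |= 1  # bottom-left quadrant
            else:
                src |= 1  # bottom-right quadrant
                dst |= 1
        edges.append((src, dst))
    return edges


edges = rmat_edges(scale=4, num_edges=100, seed=0)
print(edges[:5])
```

The output can contain duplicate edges and self-loops, and nothing here guarantees connectivity, so you would still need to filter or post-process for a simple connected graph.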
Say I have a series of several thousand nodes. For each pair of nodes I have a distance metric. This distance metric could be a physical distance ( say x,y coordinates for every node ) or other things that make nodes similar.
Each node can connect to up to N other nodes, where N is small - say 6.
How can I construct a graph that is fully connected (i.e. I can travel between any two nodes by following graph edges) while minimizing the total distance between all graph nodes?
That is I don't want a graph where the total distance for any traversal is minimized, but where for any node the total distance of all the links from that node is minimized.
I don't need an absolute minimum - as I think that is likely NP complete - but a relatively efficient method of getting a graph that is close to the true absolute minimum.
I'd suggest a greedy heuristic where you select edges until all vertices have 6 neighbors. For example, start with a minimum spanning tree. Then, for some random pairs of vertices, find a shortest path between them that uses at most one of the unselected edges (using Dijkstra's algorithm on two copies of the graph with the selected edges, connected by the unselected edges). Then select the edge that yielded in total the largest decrease of distance.
You can use a kernel to create edges only for nodes under a certain cutoff distance.
If you want non-weighted edges, you could simply use a basic cutoff to start with: add an edge between 2 points if d(v1,v2) < R.
You can tweak your cutoff R to get the right average number of edges between nodes.
If you want a weighted graph, the preferred kernel is often the gaussian one, with
K(x,y) = e^(-d(x,y)^2/d_0)
with a cutoff to drop edges whose weights are too low. d_0 is the parameter to tweak to get the weights that suit you best.
While looking for references, I found this blog post that I didn't know about, but that seems very explanatory, with many more details: http://charlesmartin14.wordpress.com/2012/10/09/spectral-clustering/
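A small sketch of the weighted variant, assuming Euclidean distance between coordinate tuples. The names gaussian_kernel_graph and min_weight are made up for this example; min_weight plays the role of the cutoff.

```python
import math


def gaussian_kernel_graph(points, d0, min_weight):
    """Weighted edges {(i, j): K} with K = exp(-d(i, j)**2 / d0) >= min_weight."""
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            dist = math.dist(points[i], points[j])  # Euclidean distance
            weight = math.exp(-dist ** 2 / d0)
            if weight >= min_weight:  # cutoff: drop very weak links
                edges[(i, j)] = weight
    return edges


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
edges = gaussian_kernel_graph(points, d0=2.0, min_weight=0.1)
print(edges)
```

In this toy input the far-away point (5, 5) gets no edges at all, which illustrates that a cutoff alone does not keep the graph connected or cap the number of neighbours per node.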
This method is used in graph-based semi-supervised machine learning tasks, for instance in image recognition, where you tag a small part of an object, and have an efficient label propagation to identify the whole object.
You can search on google for : semi supervised learning with graph