How to store a Euler graph struct? - algorithm

I'm working around the Euler Path issue and found a problem:How to define or store a Euler graph struct?
An usual way is using an "Adjoint Matrix",C[i][j] is defined to store the edge between i-j.It's concise and effective! But this kind of matrix is limited by the situation that the edge between 2 nodes is unique (figure 1).
class EulerPath
{
int[][] c;//adjoint matrix,c[i][j] means the edge between i and j
}
What if there are several edges (figure 2)?My solution might be using customized class ,like "Graph","Node","Edge" to store a graph,but dividing the graph into some discrete structs ,which means we have to take more class details into consideration,may hurt the efficiency and concision. So I'm very eager to hear your advice!Thanks a lot!
class EulerPath
{
class Graph
{
Node[] Nodes;
Edge[] Edges;
}
class Node{...}
class Edge{...}
}

You can use an adjacency matrix to store graphs with multi-edges. You just let the value of c[i][j] be the number of times that vertex i is adjacent to vertex j. In your first case, it's 1, in your second case, it's 3. See also Wikipedia -- adjacency matrices aren't defined as being composed of only 1 and 0, that's just the special case of an adjacency matrix for a simple graph.
EDIT: You can represent your second graph in an adjacency matrix like this:
1 2 3 4
1 0 3 1 1
2 3 0 1 1
3 1 1 0 0
4 1 1 0 0

You can do this in at least three ways:
Adjacency list
Meaning that you have a 2D array called al[N][N]
al[N][N] This N is the node index
al[N][N] This N is the neighbor node index
Example, a graph with this input:
0 => 1
1 => 2
2 => 3
3 => 1
The adjacency list will look like this:
0 [1]
1 [2,3]
2 [1,3]
3 [1,2]
PS: Since this is a 2D array, and not all horizontal cells are going to be used, you need to keep track of the number of connected neighbours for each graph index because some programming languages initialise array values with a zero which is a node index in the graph. This can be done easily by creating another array that will count the number of neighbours for each graph index. Example of this case: numLinks: [1,2,2,2]
Matrix
With a matrix, you create an N x N 2D array, and you put a 1 value in the intersection of row col neighobor nodes:
Example with the same input above:
0 1 2 3
0 0 1 0 0
1 1 0 1 1
2 0 1 0 1
3 0 1 1 0
Class Node
The last method is creating a class called Node that contain a dynamic array of type Node. And you can store in this array the other nodes connected

Consider using a vector of linked list. Add a class that will have a field for a Vertex as well as the Weight (let's name it Entry). Your weights should be preferably another vector or linked list (preferably ll) which will contain all possible weights to the according Vertex. Your main class will have a vector of vectors, or a vector of linked lists (I'd prefer linked lists since you will most likely not need random access, being forced to iterate through every Entry when performing any operation). You main class will have one more vector containing all vertices. In C++ this would look like this:
class Graph{
std::vector<std::forward_list<Entry>> adj_list;
std::vector<Vertex> vertices;
};
Where the Vertex that corresponds to vertices[i] has the corresponding list in adj_list[i]. Since every Entry contains the info regarding the Vertex to which you are connected and the according weights, you will have your graph represented by this class.

Efficiency for what type of operation?
If you want to find a route between two IP addresses on the internet, then your adjacency matrix might be a million nodes squared, ie a gigabyte of entries. And as finding all the nodes connected to a given node goes up as n, you could be looking at a million lookups per node just to find the nodes connected to that node. Horribly inefficient.
If your problem only involves a few nodes and is run infrequently, then adjacency matrices are simple and intuitive.
For most problems which involve traversing graphs, a better solution could be to create a class called node, which has a property a collection (say a List) of all the nodes it is connected to. For most real world applications, the list of connected nodes is much less than the total number of all nodes, so this works out as more compact. Plus it is highly efficient in finding edges - you can get a list of all connected nodes in fixed time per node.
If you use this structure, where you have a node class which contains as a property a collection of all the nodes it is connected to, then when you create a new edge (say between node A and node B) then you add B to the collection of nodes to which A is connected, and A to the collection of nodes to which B is connected. Excuse my Java/C#, something like
class Node{
Arraylist<Node> connectedNodes;
public Node() // initializer
{
connectedNodes = new ArrayList<Node>;
}
}
// and somewhere else you have this definition:
public addEdgeBetween(Node firstNode, Node secondNode) {
firstNode.connectedNodes.Add(secondNode);
secondNode.connectedNodes.Add(firstNode);
}
And similarly to delete an edge, remove the reference in A to B's collection and vice versa. There is no need to define a separate edge class, edges are implicit in the structure which cross-links the two nodes.
And that's about all you have to do to implement this structure, which is (for most real world problems) uses far less memory than an adjacency matrix, is much faster for large numbers of nodes for most problems, and is ultimately far more flexible.
Defining a node class also opens up a logical place to add enhancements of many sorts. For example, you might decide to generate for each node a list of all the nodes which are two steps away, because this improves path finding. You can easily add this in as another collection within the node class; this would be a pretty messy thing to do with adjacency matrices. You can obviously squeeze a lot more functionality into a class than a into a matrix of ints.
Your question concerning multiple links is unclear to me. If you want multiple edges between the same two points, then this can be accommodated in both ways of doing it. In adjacency matrices, simply have a number at that row and column which indicates the number of links. If you use a node class, just add each edge separately. Similarly directional graphs; an edge pointing from A to B has a reference to B in A's list of connected nodes, but B doesn't have A in its list.

Related

Generate random graph with probability p

Write a function in main.cpp, which creates a random graph of a certain size as follows. The function takes two parameters. The first parameter is the number of vertices n. The second parameter p (1 >= p >= 0) is the probability that an edge exists between a pair of nodes. In particular, after instantiating a graph with n vertices and 0 edges, go over all possible vertex pairs one by one, and for each such pair, put an edge between the vertices with probability p.
How to know if an edge exists between two vertices .
Here is the full question
PS: I don't need the code implementation
The problem statement clearly says that the first input parameter is the number of nodes and the second parameter is the probability p that an edge exists between any 2 nodes.
What you need to do is as follows (Updated to amend a mistake that was pointed out by #user17732522):
1- Create a bool matrix (2d nested array) of size n*n initialized with false.
2- Run a loop over the rows:
- Run an inner loop over the columns:
- if row_index != col_index do:
- curr_p = random() // random() returns a number between 0 and 1 inclusive
- if curr_p <= p: set matrix[row_index][col_index] = true
else: set matrix[row_index][col_index] = false
- For an undirected graph, also set matrix[col_index][row_index] = true/false based on curr_p
Note: Since we are setting both cells (both directions) in the matrix in case of a probability hit, we could potentially set an edge 2 times. This doesn't corrupt the correctnes of the probability and isn't much additional work. It helps to keep the code clean.
If you want to optimize this solution, you could run the loop such that you only visit the lower-left triangle (excluding the diagonal) and just mirror the results you get for those cells to the upper-right triangle.
That's it.

Fastest way to find triangles from connected line points

I want to find out a way to detect all triangles from connected lines. I have array list with all available lines called lines, and when I add another line I want to find if it creates new triangle. As shown on the image below as soon as I add new line f, that connects the points with indices 1,3. I want fast way to detect that new triangle is created and add the indices 1,3,5 to another array list called triangles.
// array with lines indices, each line contains two point indices
val lines: ArrayList<Int> = arrayListOf(
0,1, // a
1,5, // b
5,4, // c
5,3, // d
5,2 // e
)
val triangles: ArrayList<Int> = arrayListOf()
One way I came out with, is to have an array list that contains separate array lists for each point with all the indices it is connected to. For example the point with index 5 is connected to the points with indices [1,2,3,4]. So when a new line that connect the indices 3,1 is added, we can check if the
array list circlesHash[3] and circlesHash[1] contains equal elements. In our case both arrays contain the index 5, that means triangle with indices 3,1,5 is created. If the array lists are sorted I can use the method Arrays.binarySearch. But if I have large amount of points for example 1000-2000 points, what is the quickest way to find if new triangle is created?
val circlesHash: ArrayList<ArrayList<Int>> = arrayListOf(
arrayListOf(1), // 0
arrayListOf(0,5), // 1
arrayListOf(5), // 2
arrayListOf(5), // 3
arrayListOf(5), // 4
arrayListOf(1,2,3,4) // 5
)
And another question is if there is existing algorithm to find all the existing triangles just from the indices of the existing lines. And what structure is preferable HashMaps, HashSet or some sort of Binary tree??
The simplest solution would be to pair edges by shared points. So two edges are paired if one end-point is the same. Then for each pair choose the two nodes that are not shared by the edges, insert an edge between and you've formed a triangle. Assuming your graph isn't a multigraph this algorithm won't produce any duplicates and finds all triangles.
Taking it from there we can create a list of edges that don't exist and will create a triangle. As pseudocode:
triangle_candidates(g):
triangle_edges = set()
for e1 in g.edges:
for e2 in g.edges:
if e1 and e2 share a node:
a, b = nodes not shared by e1 and e2
if a and b are not neighbors:
triangle_edges.add(tuple(a, b))
return triangle_edges
creates_new_triangle(g, a, b):
return tuple(a, b) in triangle_edges(g)
Note that tuples are assumed to be non-ordered! The above code is a demonstration of the basic principle; there's still plenty of room for optimizations in triangle_candidates.
The basic idea is to create a set of all node-pairs that are not neighbors, but would complete a cycle of length three if connected. Checking whether a new edge would create a triangle is a simple set-lookup, which should be fairly fast assuming a proper implementation.

How to find the set of trees every one of which spans over another given tree?

Imagine it's given a set of trees ST and each vertex of every tree is labeled. Also another tree T is given (also with labels vertices). The question is how can I find which trees of the ST can span over the tree T starting from the root of T in such a way that the labels of the vertices of the spanning tree T' coincide with those labels of T 's vertices. Note that the children of every vertex of T should be either completely covered or not covered at all - partial covering of children is not allowed. Stated in other words: Given a tree and the following procedure: pick a vertex and remove all vertices and edges below this vertex (except the vertex itself). Find those trees of ST such that each tree is generated with a series of procedures applied to T.
For example given the tree T
the trees
cover T and the tree
does not because this tree has children 3, 5 unlike T which has 2, 3 as children. The best thing I was able to think of was either to brute force it or to find the set of tree every one of which has the same root label as T and then to search for the answer among those trees but I guess neither of those two approaches is the optimal one. I was thinking of somehow hashing the trees but nothing came out. Any thoughts?
Notes:
The trees are not necessarily binary
A tree T can cover another tree T' if they share a root
The tree is ordered meaning that you cannot swap the position of any two children.
TL; DR Find a efficient algorithm which on query with given tree T the algorithm finds all trees from a given(fixed/static) set ST which are able to cover T.
I'll sketch an answer and then provide some working source code.
First off, you need an algorithm to hash a tree. We can assume, without loss of generality, that the children of each of your tree's nodes are ordered from least to greatest (or vice versa).
Run this algorithm on every member of ST and save the hashes.
Now, take your test tree T and generate all of its subtrees TP that retain the original root. You can do this (perhaps inefficiently) by:
Making a set S of its nodes
Generating the power set P of S
Generating the subtrees by removing the nodes present in each member of P from copies of T
Adding those subtrees which retain the original root to TP.
Now generate a set of all of the hashes of TP.
Now check each of your ST hashes for membership in TP.
ST hash storage requires O(n) space in ST, and possibly the space to hold the trees.
You can optimize the membership code so that it requires no storage space (I have not done this in my test code). The code will require approximately 2N checks, where N is the number of nodes in **T.
So the algorithm runs in O(H 2**N), where H is the size of ST and N is the number of nodes in T. The best way of speeding this up is to find an improved algorithm for generating the subtrees of T.
The following Python code accomplishes this:
#!/usr/bin/python
import itertools
import treelib
import Crypto.Hash.SHA
import copy
#Generate a hash of a tree by recursively hashing children
def HashTree(tree):
digester=Crypto.Hash.SHA.new()
digester.update(str(tree.get_node(tree.root).tag))
children=tree.get_node(tree.root).fpointer
children.sort(key=lambda x: tree.get_node(x).tag, cmp=lambda x,y:x-y)
hash=False
if children:
for child in children:
digester.update(HashTree(tree.subtree(child)))
hash = "1"+digester.hexdigest()
else:
hash = "0"+digester.hexdigest()
return hash
#Generate a power set of a set
def powerset(iterable):
"powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
s = list(iterable)
return itertools.chain.from_iterable(itertools.combinations(s, r) for r in range(len(s)+1))
#Generate all the subsets of a tree which still share the original root
#by using a power set of all the tree's nodes to remove nodes from the tree
def TreePowerSet(tree):
nodes=[x.identifier for x in tree.nodes.values()]
ret=[]
for s in powerset(nodes):
culled_tree=copy.deepcopy(tree)
for n in s:
try:
culled_tree.remove_node(n)
except:
pass
if len([x.identifier for x in culled_tree.nodes.values()])>0:
ret.append(culled_tree)
return ret
def main():
ST=[]
#Generate a member of ST
treeA = treelib.Tree()
treeA.create_node(1,1)
treeA.create_node(2,2,parent=1)
treeA.create_node(3,3,parent=1)
ST.append(treeA)
#Generate a member of ST
treeB = treelib.Tree()
treeB.create_node(1,1)
treeB.create_node(2,2,parent=1)
treeB.create_node(3,3,parent=1)
treeB.create_node(4,4,parent=2)
treeB.create_node(5,5,parent=2)
ST.append(treeB)
#Generate hashes for members of ST
hashes=[(HashTree(tree), tree) for tree in ST]
print hashes
#Generate a test tree
T=treelib.Tree()
T.create_node(1,1)
T.create_node(2,2,parent=1)
T.create_node(3,3,parent=1)
T.create_node(4,4,parent=2)
T.create_node(5,5,parent=2)
T.create_node(6,6,parent=3)
T.create_node(7,7,parent=3)
#Generate all the subtrees of this tree which still retain the original root
Tsets=TreePowerSet(T)
#Hash all of the subtrees
Thashes=set([HashTree(x) for x in Tsets])
#For each member of ST, check to see if that member is present in the test
#tree
for hash in hashes:
if hash[0] in Thashes:
print [x for x in hash[1].expand_tree()]
main()
To verify that one tree covers another, one must look at all vertices of the first tree at least once. It is trivial to verify that a tree covers another by looking at all vertices of the first tree exactly once. Thus the simplest possible algorithm is already optimal, if it's only needed to check one tree.
Everything below are untested fruits of my sick imagination.
If there are many possible T that must be checked against the same ST, then it's possible to store trees of ST as sets of facts like these
root = 1
children of node 1 = (2, 3)
children of node 2 = ()
children of node 3 = ()
These facts can be stored in a standard relational DB in two tables, "roots" (fields "tree" and rootnode") and "branches" (fields "tree", "node" and "children"). then an SQL query or a series of queries can be built to find matching trees quickly. My SQL-fu is rudimentary so I could not manage it in a single query, but I'm believe it should be possible.

A fast way to find connected component in a 1-NN graph?

First of all, I got a N*N distance matrix, for each point, I calculated its nearest neighbor, so we had a N*2 matrix, It seems like this:
0 -> 1
1 -> 2
2 -> 3
3 -> 2
4 -> 2
5 -> 6
6 -> 7
7 -> 6
8 -> 6
9 -> 8
the second column was the nearest neighbor's index. So this was a special kind of directed
graph, with each vertex had and only had one out-degree.
Of course, we could first transform the N*2 matrix to a standard graph representation, and perform BFS/DFS to get the connected components.
But, given the characteristic of this special graph, is there any other fast way to do the job ?
I will be really appreciated.
Update:
I've implemented a simple algorithm for this case here.
Look, I did not use a union-find algorithm, because the data structure may make things not that easy, and I doubt whether It's the fastest way in my case(I meant practically).
You could argue that the _merge process could be time consuming, but if we swap the edges into the continuous place while assigning new label, the merging may cost little, but it need another N spaces to trace the original indices.
The fastest algorithm for finding connected components given an edge list is the union-find algorithm: for each node, hold the pointer to a node in the same set, with all edges converging to the same node, if you find a path of length at least 2, reconnect the bottom node upwards.
This will definitely run in linear time:
- push all edges into a union-find structure: O(n)
- store each node in its set (the union-find root)
and update the set of non-empty sets: O(n)
- return the set of non-empty sets (graph components).
Since the list of edges already almost forms a union-find tree, it is possible to skip the first step:
for each node
- if the node is not marked as collected
-- walk along the edges until you find an order-1 or order-2 loop,
collecting nodes en-route
-- reconnect all nodes to the end of the path and consider it a root for the set.
-- store all nodes in the set for the root.
-- update the set of non-empty sets.
-- mark all nodes as collected.
return the set of non-empty sets
The second algorithm is linear as well, but only a benchmark will tell if it's actually faster. The strength of the union-find algorithm is its optimization. This delays the optimization to the second step but removes the first step completely.
You can probably squeeze out a little more performance if you join the union step with the nearest neighbor calculation, then collect the sets in the second pass.
If you want to do it sequencially you can do it using weighted quick union and path compression .Complexity O(N+Mlog(log(N))).check this link .
Here is the pseudocode .honoring #pycho 's words
`
public class QuickUnion
{
private int[] id;
public QuickUnion(int N)
{
id = new int[N];
for (int i = 0; i < N; i++) id[i] = i;
}
public int root(int i)
{
while (i != id[i])
{
id[i] = id[id[i]];
i = id[i];
}
return i;
}
public boolean find(int p, int q)
{
return root(p) == root(q);
}
public void unite(int p, int q)
{
int i = root(p);
int j = root(q);
id[i] = j;
}
}
`
#reference https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf
If you want to find connected components parallely, the asymptotic complexity can be reduced to O(log(log(N)) time using pointer jumping and weighted quick union with path compression. Check this link
https://vishwasshanbhog.wordpress.com/2016/05/04/efficient-parallel-algorithm-to-find-the-connected-components-of-the-graphs/
Since each node has only one outgoing edge, you can just traverse the graph one edge at a time until you get to a vertex you've already visited. An out-degree of 1 means any further traversal at this point will only take you where you've already been. The traversed vertices in that path are all in the same component.
In your example:
0->1->2->3->2, so [0,1,2,3] is a component
4->2, so update the component to [0,1,2,3,4]
5->6->7->6, so [5,6,7] is a component
8->6, so update the compoent to [5,6,7,8]
9->8, so update the compoent to [5,6,7,8,9]
You can visit each node exactly once, so time is O(n). Space is O(n) since all you need is a component id for each node, and a list of component ids.

In a tree structure, find highest key between a root and k number of nodes

Need to define a seek(u,v) function, where u is the new node within the tree (the node where I want to start searching), and v is the number of descendants below the new node, and this function would return index of highest key value. The tree doesn't have a be a BST, there can be nodes with many many children. Example:
input:
5 // tree with 5 nodes
1 3 5 2 7 // the nodes' keys
1 2 // 1 and 2 are linked
2 3 // 2 and 3 are linked
1 4 // 1 and 4 are linked
3 5 // 3 and 5 are linked
4 // # of seek() requests
2 3 // index 2 of tree, which would be key 5, 3 descendants from index 3
4 1 // index 4 of tree, but only return highest from index 4 to 4 (it would
// return itself)
3 2 // index 3, next 2 descendants
3 2 // same
output:
5 // Returned index 5 because the 7 is the highest key from array[3 'til 5]
4 // Returned index 4 because the range is one, plus 4's children are null
5 // Returned index 5 because the 7 is the highest key from array[4 'til 5]
5 // Same as prior line
I was thinking about putting the new root into a new Red Black Tree, but can't find a way to efficiently save successor or predecessor information for each node. Also thinking about putting into an array, but due to the nature of an unbalanced and unsorted tree, it doesn't guarantee that my tree would be sorted, plus because it's not a BST i can't perform an inorder tree walk. Any suggestions as to how I can get the highest key from a specific range?
I dont understand very well what you mean by : "the number of descendants below the new node". The way you say it, it implies there is a some sort of imposed tree walk, or at least an order in which you have to visit the nodes. In that case it would be best to explain more thoroughly what you mean.
In the rest of the answer I assume you mean distance from u.
From a pure algorithmic point of view, since you cannot assume anything about your tree, you have to visit all concerned vertices of the graph (i.e vertices at a distance <= v from u) to get your result. It means any partial tree traversal (such as depth-first or breadth-First) should be enough and necessary (since you have to visit all concerned nodes below u), since the order in which we visit the nodes doesn't matter.
If you can, it's simpler to use a recursive function seek'(u,v) which return a couple (index, key) defined as follows :
if v > 1, you define seek'(u,v) as the couple which maximizes its second component among the couples (u, key(u)) and seek(w,v-1) for w son of u.
else (v = 1) you define seek'(u,v) as (u, key(u))
You then have seek(u,v) = first(seek'(u,v)).
All of what I said presumes you have built a tree from the input, or that you can easily get the key of a node and its sons from its index.

Resources