Odd generalization of trees? - data-structures

When dealing with directed graphs, a tree is a graph in which every node except one (the root) has exactly one incoming edge, right? Are there any examples of treelike structures in which every node has at most some constant number of incoming edges, say at most two or at most three? I haven't come across any graphs specifically described this way; is there a particular application in which they are used?

In graph theory, a tree is a connected acyclic graph. There is no requirement that every node have one incoming edge. In computer science, we often deal with rooted trees that agree with your definition.
Here is one description of a tree where some of the nodes have a constant number of incoming edges: an assignment of projects to employees, where each employee can be assigned at most three projects.

The most common generalization of a tree is a "DAG" (Directed Acyclic Graph), which is tangentially related but neither sets a maximum on the size of in-neighborhoods (the arcs leading into a vertex) nor requires a single source (a vertex with an empty in-neighborhood).
From what I know, there's no neat term for what you're looking for. You'll need to find a true mathematician with a deep interest in graph theory to know with any certainty!

Lattices (a special kind of partially ordered set) have that property.

Related

Minimum spanning trees on two graphs with some common edges

Given two complete graphs with weighted edges, I would like to find two minimum spanning trees (MSTs), one on each graph, under the constraint that the two learned MSTs have common edges on a given subset of edges. Note that the two graphs have the same number of vertices, but the edge weights are all different.
For example, suppose the two graphs are complete edge-weighted graphs with vertices {1,...,d}. We require the two learned MSTs to have the same edges on the complete subgraph with vertices {1,...,d/2}.
What algorithm can I use to find such MSTs? I tried using a modification of Kruskal's algorithm, but wasn't able to make it work.
I'm not sure I understood the problem, because the description lacks some important details.
Anyway, here is a possible approach, along with the constraints under which it applies.
As long as two graphs have the same number of edges and you can represent those graphs as lists of edges, the MRT algorithm can be used to find all their common spanning trees.
It is commonly referred to as the Two Graphs Common Spanning Trees Algorithm and it's described in an academic article of Mint, Read and Tarjan.
Note that the Boost Graph Library already contains a proper implementation.
Once you have found those trees, you can iterate over them and drop the ones that are not minimum spanning trees for their respective graphs. Note that if you drop the i-th common spanning tree for the first graph, you should also drop the i-th tree of the second graph.
After that, if the set is not empty, you can drop all the trees that don't contain the given subset of edges from your problem (I didn't fully understand what you mean by an edge being common to the two graphs, but if it's a constraint, you can enforce it on the resulting set).
The remaining trees are the ones you are looking for.
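The filtering step can be sketched in Python. This is a minimal sketch under assumptions not in the answer: each graph is a list of (u, v, weight) edges of the same length, a common spanning tree is reported as a list of edge indices (edge i of the first list together with edge i of the second), and the constrained edges are a set of such indices. The names `mst_weight` and `keep_valid_trees` are hypothetical, and the MRT enumeration itself is not reproduced. A candidate is minimum for a graph exactly when its weight equals the MST weight, computed here with Kruskal's algorithm.

```python
import math
from typing import List, Set, Tuple

Edge = Tuple[int, int, float]  # (u, v, weight)

def _find(parent: List[int], x: int) -> int:
    # Union-find root lookup with path halving
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def mst_weight(n: int, edges: List[Edge]) -> float:
    """Kruskal's algorithm: total weight of an MST (graph assumed connected)."""
    parent = list(range(n))
    total = 0.0
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = _find(parent, u), _find(parent, v)
        if ru != rv:
            parent[ru] = rv
            total += w
    return total

def keep_valid_trees(n1: int, n2: int, edges1: List[Edge], edges2: List[Edge],
                     candidates: List[List[int]], required: Set[int]) -> List[List[int]]:
    """Keep candidates that are minimum in BOTH graphs and contain every required edge."""
    best1 = mst_weight(n1, edges1)
    best2 = mst_weight(n2, edges2)
    kept = []
    for tree in candidates:
        w1 = sum(edges1[i][2] for i in tree)
        w2 = sum(edges2[i][2] for i in tree)
        if (math.isclose(w1, best1) and math.isclose(w2, best2)
                and required <= set(tree)):
            kept.append(tree)
    return kept
```

On a triangle, for example, only the candidate that is simultaneously minimum in both weightings survives; if none is, the result is empty, which is exactly the case where the constrained pair of MSTs does not exist among the common spanning trees.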
If the two graphs don't have the same number of edges, you can add fake nodes and edges to the smaller one.
In other words, create a fake node nf-i and add an edge nf-i -> n-i, where n-i is a real node. Give the edge zero weight.
At the end of the process, you can easily remove those nodes and edges and get back the original spanning trees.

Terminology: Non-tree graph?

I'm new to data structures, and had a question on terminology. Is there a term for non-tree like graphs?
I realize that bidirectional/undirected graphs are inherently non-tree like. Is that the appropriate term? I'm asking because it seems that the tree is such a common subcategory of a graph that I figured there might be a term denoting all graphs that fall outside the subcategory.
P.s.: Please feel free to hack through any vernacular above. Would love tips on appropriate terminology in general concerning data structures.
I don't think there is a single universal term for a non-tree graph (except perhaps "non-tree graph" itself).
Trees are connected, acyclic, directed graphs, with some additional rules like each node (except the root) having exactly one parent. Some kinds of trees have other additional rules that are not common among other kinds of graphs (such as there being a significance to the order of a node's children). Depending on which of those limitations a non-tree graph violates, you might describe it differently.
A tree-like graph that is not fully connected can be described as a "forest". A forest has several root nodes, each anchoring a disjoint subtree.
If you have a graph with multiple root nodes, but their descendants overlap (so that a given child node may have more than one parent node), you have a "multitree". A human family tree may be a multitree if there are no marriages between cousins or other relatives.
The next more general term is probably a "directed acyclic graph" or "DAG". A DAG is more general than a multitree because an ancestor node may be connected to a descendant node by more than one path. Human genealogical trees are more properly thought of as DAGs, since sufficiently distant relatives are generally allowed to marry and have children (but nobody can be their own ancestor). There are many algorithms designed to work on DAGs, as forbidding cycles allows better performance for many useful applications (such as path finding).
More general still is a "directed graph" or "digraph", which relaxes the restriction on cycles. A common digraph data structure is an adjacency list (for each node, a list of the arcs leading out of it).
I don't think there's any more general term beyond that, other than just "graph". If you have a specific application for a graph, there might be a specialized term for the kind of graph you will use (and perhaps algorithms or even library code to go along with it), but you'd need to ask about that specifically.
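To make the hierarchy concrete, here is a small Python sketch that places a digraph into the most specific of these categories. It assumes arcs point from parent to child and that every node appears as a key of the adjacency dict (even nodes with no children); the function names are hypothetical. Cycles are detected with Kahn's topological sort, and the multitree test checks that no node can be reached from another by two different directed paths.

```python
from collections import deque
from typing import Dict, List, Optional

Digraph = Dict[str, List[str]]  # node -> successors; every node must appear as a key

def in_degrees(g: Digraph) -> Dict[str, int]:
    deg = {v: 0 for v in g}
    for v in g:
        for w in g[v]:
            deg[w] += 1
    return deg

def topological_order(g: Digraph) -> Optional[List[str]]:
    """Kahn's algorithm; returns None if the digraph contains a cycle."""
    deg = in_degrees(g)
    queue = deque(v for v in g if deg[v] == 0)
    order = []
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in g[v]:
            deg[w] -= 1
            if deg[w] == 0:
                queue.append(w)
    return order if len(order) == len(g) else None

def classify(g: Digraph) -> str:
    order = topological_order(g)
    if order is None:
        return "digraph"                      # contains a cycle
    deg = in_degrees(g)
    if all(d <= 1 for d in deg.values()):     # at most one parent everywhere
        roots = sum(1 for d in deg.values() if d == 0)
        return "tree" if roots == 1 else "forest"
    # Multitree: at most one directed path between any two nodes.
    for s in g:
        paths = {v: 0 for v in g}
        paths[s] = 1
        for v in order:                       # topological order: paths[v] is final here
            for w in g[v]:
                paths[w] += paths[v]
                if paths[w] > 1:
                    return "DAG"
    return "multitree"
```

For example, two parents sharing one child (`{"mom": ["kid"], "dad": ["kid"], "kid": []}`) is classified as a multitree, while a "diamond" where one ancestor reaches a descendant along two routes is only a DAG.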

Check if a changing undirected graph has at least one circle

I have an undirected graph which initially has no edges. Now in every step an edge is added or deleted, and one has to check whether the graph has at least one cycle. Probably the easiest sufficient condition for that is
number of connected components + number of edges > number of nodes.
As the "steps" I mentioned above are executed millions of times, this check has to be really fast. So I wonder what would be a quick way to check the condition depending on the fact that in each step only one edge changes.
Any suggestions?
If you are keen, you can try to implement a fully dynamic graph connectivity data structure like described in "Poly-logarithmic deterministic fully-dynamic graph algorithms I: connectivity and minimum spanning tree" by Jacob Holm, Kristian de Lichtenberg, Mikkel Thorup.
When adding an edge, you check whether the two endpoints are already connected. If not, the number of connected components decreases by one. After deleting an edge, check whether the two endpoints are still connected. If not, the number of connected components increases by one. The amortized runtime of edge insertion and deletion would be O(log^2 n), but I can imagine the constant factor is quite high.
There are newer results with better bounds. There is also an experimental evaluation of some of the dynamic connectivity algorithms that considers implementation details as well. There is also a JavaScript implementation; I have no idea how good it is.
I guess in practice you can make it much easier by maintaining a spanning forest. You get edge additions and non-tree edge deletions (almost) for free. For tree edge deletions, you could just use "brute force" in the form of BFS or DFS to check whether the endpoints are still connected. Especially if the number of nodes is bounded, that may work well enough in practice: BFS and DFS are both O(n^2) on dense graphs, and you can charge some of that work to the operations where you got lucky and didn't have much to do.
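A minimal Python sketch of the spanning-forest idea, assuming a simple graph (no self-loops or parallel edges) on nodes 0..n-1; the class name is hypothetical. A graph has a cycle exactly when some edge lies outside a spanning forest, so the answer to the query is just whether the set of non-tree edges is non-empty. Connectivity is checked by brute-force BFS over tree edges only, so additions and tree-edge deletions cost O(n + m) here rather than being truly free; after deleting a tree edge, the sketch also looks for a non-tree edge that reconnects the two halves, so the forest stays spanning.

```python
from collections import deque
from typing import FrozenSet, List, Set

class SpanningForestCycleChecker:
    """Maintains a spanning forest of a simple undirected graph on nodes 0..n-1."""

    def __init__(self, n: int) -> None:
        self.forest: List[Set[int]] = [set() for _ in range(n)]  # tree-edge adjacency
        self.non_tree: Set[FrozenSet[int]] = set()               # edges outside the forest

    def _component(self, s: int) -> Set[int]:
        # Brute-force BFS over tree edges only.
        seen = {s}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in self.forest[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return seen

    def add_edge(self, u: int, v: int) -> None:
        if v in self._component(u):
            self.non_tree.add(frozenset((u, v)))  # endpoints already connected: closes a cycle
        else:
            self.forest[u].add(v)                 # joins two trees of the forest
            self.forest[v].add(u)

    def remove_edge(self, u: int, v: int) -> None:
        e = frozenset((u, v))
        if e in self.non_tree:                    # cheap case: forest unchanged
            self.non_tree.remove(e)
            return
        self.forest[u].discard(v)                 # tree edge: its tree splits in two
        self.forest[v].discard(u)
        side = self._component(u)
        # A non-tree edge with exactly one endpoint on u's side reconnects the halves.
        replacement = next((f for f in self.non_tree if len(f & side) == 1), None)
        if replacement is not None:
            self.non_tree.remove(replacement)
            a, b = tuple(replacement)
            self.forest[a].add(b)
            self.forest[b].add(a)

    def has_cycle(self) -> bool:
        return bool(self.non_tree)
```

The scan for a replacement edge is linear in the number of non-tree edges; the poly-logarithmic structures mentioned above exist precisely to avoid this kind of linear work.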
I suggest you label all the nodes. Use integers, that's easiest.
At any point, your graph will be divided into a number of disjoint subgraphs. Initially, each node is in its own subgraph.
Maintain the condition that each subgraph has a unique label, and all the nodes in the subgraph carry that label. Initially, just give each node a unique label. If your problem includes adding nodes, you might want to maintain a variable to hold the next available label.
A new edge creates a cycle if and only if it connects two nodes with identical labels.
Whenever you add an edge between two differently labeled nodes, you connect two previously disjoint subgraphs. You must relabel one of the subgraphs to match the other, which requires visiting all the nodes of that subgraph. This is the highest computational burden in this scheme.
If you don't mind allocating more space, you should also maintain a list of labels in use, associated with a count of the nodes carrying that label. This will allow you to choose the smaller subgraph when relabeling.
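The labeling scheme can be sketched in Python as follows. This is a minimal sketch assuming integer node IDs 0..n-1; the class name is hypothetical. Instead of only a count per label, it keeps the list of member nodes per label, which gives the count and also lets the smaller subgraph be relabeled without scanning every node. Like the scheme described, it handles edge additions only, not deletions.

```python
from typing import Dict, List

class LabeledComponents:
    """Component labels for nodes 0..n-1; supports edge additions only."""

    def __init__(self, n: int) -> None:
        self.label: List[int] = list(range(n))                      # node -> label
        self.members: Dict[int, List[int]] = {i: [i] for i in range(n)}  # label -> nodes
        self.has_cycle = False

    def add_edge(self, u: int, v: int) -> bool:
        """Add edge (u, v); return True if it closes a cycle."""
        a, b = self.label[u], self.label[v]
        if a == b:                      # same subgraph already: cycle
            self.has_cycle = True
            return True
        if len(self.members[a]) < len(self.members[b]):
            a, b = b, a                 # always relabel the smaller subgraph
        for x in self.members[b]:
            self.label[x] = a
        self.members[a].extend(self.members.pop(b))
        return False
```

Relabeling the smaller side means each node changes label at most O(log n) times over any sequence of additions.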
If you know which two nodes are being connected by the new edge, you could use a path-finding algorithm to detect an alternative path between the two nodes. In other words, if a path connecting the two nodes of your new edge exists before you add the edge, adding it will create a cycle.
Your problem then reduces to finding the paths between two given nodes.
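A minimal Python sketch of that reduction, assuming the current graph is stored as a dict mapping each integer node to the set of its neighbors (`find_path` and `would_create_cycle` are hypothetical names). It uses breadth-first search, which also returns one connecting path when there is one; each check costs O(n + m).

```python
from collections import deque
from typing import Dict, List, Optional, Set

def find_path(adj: Dict[int, Set[int]], src: int, dst: int) -> Optional[List[int]]:
    """Breadth-first search; returns a path src -> dst, or None if there is none."""
    parent: Dict[int, Optional[int]] = {src: None}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if x == dst:
            path = []
            while x is not None:          # walk the parent links back to src
                path.append(x)
                x = parent[x]
            return path[::-1]
        for y in adj[x]:
            if y not in parent:
                parent[y] = x
                queue.append(y)
    return None

def would_create_cycle(adj: Dict[int, Set[int]], u: int, v: int) -> bool:
    # Call BEFORE inserting (u, v): an existing u-v path plus the new edge is a cycle.
    return find_path(adj, u, v) is not None
```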

Is it possible to develop an algorithm to solve a graph isomorphism?

Or will I need to develop an algorithm for every unique graph? The user is given a type of graph, and they are then supposed to use the interface to add nodes and edges to an initial graph. Then they submit the graph and the algorithm is supposed to confirm whether the user's graph matches the given graph.
The algorithm needs to confirm not only the neighbours of each node, but also that each node and each edge has the correct value. The initial graphs will always have a root node, which is where the algorithm can start from.
I am wondering if I can develop the logic for such an algorithm in the general sense, or will I need to actually code a unique algorithm for each unique graph. It isn't a big deal if it's the latter case, since I only have about 20 unique graphs.
Thanks. I hope I was clear.
The graph isomorphism problem might not be hard, but it is very hard to prove that it is not hard.
There are three possibilities for this problem:
1. The graph isomorphism problem is NP-hard.
2. The graph isomorphism problem has a polynomial-time solution.
3. The graph isomorphism problem is neither NP-hard nor in P.
If two graphs are isomorphic, then there exists a permutation of the vertices realizing this isomorphism. Taking this permutation as a certificate, we can verify in polynomial time that the two graphs are isomorphic. Thus, graph isomorphism lies in NP. However, for more than 30 years no one has been able to prove whether this problem is NP-hard or in P. Thus, this problem is intrinsically hard despite its simple description.
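The certificate check is easy to write down. A minimal Python sketch, assuming simple undirected graphs on vertices 0..n-1 given as edge lists (`is_isomorphism` is a hypothetical name): it confirms that the permutation is a bijection and that it maps G's edge set exactly onto H's, in O(n log n + m) time.

```python
from typing import List, Tuple

def is_isomorphism(n: int, g_edges: List[Tuple[int, int]],
                   h_edges: List[Tuple[int, int]], perm: List[int]) -> bool:
    """Verify that perm (vertex i of G -> perm[i] of H) is an isomorphism G -> H."""
    if sorted(perm) != list(range(n)):
        return False                                  # not a bijection on 0..n-1
    mapped = {frozenset((perm[u], perm[v])) for u, v in g_edges}
    target = {frozenset((u, v)) for u, v in h_edges}
    return mapped == target
```

Verifying a given permutation is cheap; the hard part is finding one among the n! candidates.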
If I understand the question properly, you can have ONE single algorithm, which works by accepting one of several reference graphs as its input (in addition to the unknown graph whose isomorphism with the reference graph is to be asserted).
It appears that you seek to assert whether a given graph is exactly identical to another graph, rather than whether the graphs are isomorphic relative to a particular set of operations or characteristics. This implies that the algorithm must be supplied some specific reference graph, rather than working off some set of "abstract" rules, such as whether neither graph has loops or both graphs are fully connected, even though the graphs may differ in some other fashion.
Edit, following confirmation that:
Yeah, the algorithm would be supplied a reference graph (which is the answer), and will then check the user's graph to see if it is isomorphic (including the values of edges and nodes) to the reference
In that case, yes, it is quite possible to develop a relatively simple algorithm which would assert isomorphism of these two graphs. Note that the remarks in other answers about the problem possibly being NP-hard merely indicate that a simple algorithm [or any algorithm, for that matter] may not solve the problem in a reasonable amount of time for graphs whose size and complexity are too large. However, assuming relatively small graphs and taking advantage (!) of the requirement that the weights of edges and nodes also need to match, the following algorithm should generally be applicable.
General idea:
For each sub-graph that is disconnected from the rest of the graph, identify one (or possibly several) node(s) in the user graph which must match a particular node of the reference graph. By following the paths from this node [in an orderly fashion, more on this below], assert the identity of other nodes and/or determine that there are some nodes which cannot be matched (and hence that the two structures are not isomorphic).
Rough pseudo code:
1. For both the reference and the user-supplied graph, make the list of their connected components, i.e. the list of sub-graphs therein which are disconnected from the rest of the graph. These connected components are found by following either a breadth-first or a depth-first traversal starting at a given node and "marking" all nodes reached with an arbitrary [typically incremental] element ID number. Once a traversal is complete, repeat the operation from any non-marked node, until there are no more non-marked nodes.
2. Build a "database" of the characteristics of each graph.
This will be useful to identify matching candidates and also to determine, early on, instances of non-isomorphism.
Each "database" would have two kinds of "records", node and edge, with the following fields, respectively:
- node: node_id, Connected_element_Id, node weight, number of outgoing edges, number of incoming edges, sum of outgoing edge weights, sum of incoming edge weights
- edge: edge_id, Connected_element_Id, edge weight, node_id_of_start, node_id_of_end, weight_of_start_node, weight_of_end_node
3. Build a database of the Connected elements of each graph
Each record should have the following fields: Connected_element_id, number of nodes, number of edges, sum of node weights, sum of edge weights.
4. [optionally] Dispatch the easy cases of non-isomorphism:
4.a mismatch of the number of connected elements
4.b mismatch of the number of connected elements, grouped by all fields but the id (number of nodes, number of edges, sum of node weights, sum of edge weights)
5. For each connected element in the reference graph
5.1 Identify candidates for the matching connected element in the user-supplied graph. The candidates must have the same connected element characteristics (number of nodes, number of edges, sum of node weights, sum of edge weights) and contain the same list of nodes and edges, again counted by grouping by all characteristics but the id.
5.2 For each candidate, finalize its confirmation as a graph isomorphic to the corresponding connected element in the reference graph. Start at a candidate node match, i.e. a node, hopefully unique, which has exactly the same characteristics in both graphs. If there is no such node, try each possible candidate in turn, disqualifying them until isomorphism is confirmed (or all candidates are exhausted). From the candidate node match, walk the graph (say, breadth-first), finding matches for the other nodes on the basis of the direction and weight of the edges and the weight of the nodes.
The main tricks with this algorithm are to keep proper accounting of the candidates (whether a candidate connected element at the higher level or a candidate node at the lower level), and to remember and mark other identified items as such (un-marking them if the hypothetical candidate eventually proves not to be feasible).
I realize the above falls short of a formal algorithm description, but it should give you an idea of what is required and possibly a starting point, should you decide to implement it.
Note that although the requirement of matching node and edge weights may appear to add difficulty to asserting isomorphism, it effectively simplifies the algorithm: the node and edge characteristics make nodes more distinguishable, and hence make it more likely that the algorithm will a) find unique node candidates and b) either quickly find the other candidates on the path or quickly assert non-isomorphism.
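A compact Python sketch in the spirit of the outline above, not a literal implementation of it. Assumptions, all hypothetical: graphs are held in memory as a dict of node weights plus a dict mapping (start, end) to edge weight (so at most one edge per ordered pair), and node IDs are sortable strings. The per-node "record" becomes a signature (node weight, sorted outgoing edge weights, sorted incoming edge weights). Comparing the multisets of signatures is the cheap early rejection. Reference nodes are then visited breadth-first, one connected component after another, and matched by backtracking: each candidate must have the same signature and agree on every edge to the nodes already matched, and a failed candidate is un-marked before the next one is tried.

```python
from collections import Counter, deque
from typing import Dict, List, Optional, Tuple

Nodes = Dict[str, int]                 # node id -> node weight
Edges = Dict[Tuple[str, str], int]     # (start, end) -> edge weight
Graph = Tuple[Nodes, Edges]

def signature(g: Graph, v: str) -> tuple:
    # Node weight plus sorted outgoing and incoming edge weights.
    nodes, edges = g
    out_w = sorted(w for (a, _), w in edges.items() if a == v)
    in_w = sorted(w for (_, b), w in edges.items() if b == v)
    return (nodes[v], tuple(out_w), tuple(in_w))

def bfs_order(g: Graph) -> List[str]:
    # Breadth-first order, one connected component after another (direction ignored).
    nodes, edges = g
    nbrs: Dict[str, set] = {v: set() for v in nodes}
    for a, b in edges:
        nbrs[a].add(b)
        nbrs[b].add(a)
    order, seen = [], set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in sorted(nbrs[v]):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
    return order

def find_isomorphism(ref: Graph, user: Graph) -> Optional[Dict[str, str]]:
    """Return a weight- and direction-preserving mapping ref -> user, or None."""
    if len(ref[0]) != len(user[0]) or len(ref[1]) != len(user[1]):
        return None
    sig_r = {v: signature(ref, v) for v in ref[0]}
    sig_u = {v: signature(user, v) for v in user[0]}
    if Counter(sig_r.values()) != Counter(sig_u.values()):
        return None                          # cheap early rejection
    order = bfs_order(ref)
    mapping: Dict[str, str] = {}
    used = set()

    def consistent(r: str, u: str) -> bool:
        # Edges (both directions, incl. self-loops) to already-matched nodes must agree.
        for r2, u2 in list(mapping.items()) + [(r, u)]:
            if ref[1].get((r, r2)) != user[1].get((u, u2)):
                return False
            if ref[1].get((r2, r)) != user[1].get((u2, u)):
                return False
        return True

    def extend(i: int) -> bool:
        if i == len(order):
            return True
        r = order[i]
        for u in user[0]:
            if u not in used and sig_u[u] == sig_r[r] and consistent(r, u):
                mapping[r] = u
                used.add(u)
                if extend(i + 1):
                    return True
                del mapping[r]               # un-mark and try the next candidate
                used.discard(u)
        return False

    return dict(mapping) if extend(0) else None
```

Because signatures include weights, distinct weights usually leave a single candidate per node, which is the simplification described in the last paragraph; with many identical weights the backtracking can still become expensive.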

How to modify preorder tree traversal algorithm to handle nodes with multiple parents?

I've been searching for a while now and can't seem to find an alternative solution. I need the tree traversal algorithm in such a way that a node can have more than 1 parent, if it's possible (found a great article here: Storing Hierarchical Data in a Database). Are there any algorithms so that, starting from a root node, we can determine the sequence and dependencies of nodes (currently reading topological sorting)?
The structure you described isn't a tree, it's a directed graph. As it would be suitable for hierarchical drawing you might be tempted to think of it as a tree (which itself is an acyclic connected graph).
Typical traversal algorithms for graphs are depth-first and breadth-first search. The graph versions differ from the tree versions only in that they record the nodes already visited, to avoid visiting any node more than once. However, if your data structure guarantees that it's acyclic, you can use tree algorithms on your graph by simply treating "parents" as "children".
I made a simple sketch to illustrate what I mean (the perfect chance to try Google Docs' new drawing feature):
As you see, it's possible to treat any graph that has an acyclic directed form as a tree and apply tree algorithms on it. As soon as you can't guarantee this property you'll have to go for dedicated graph algorithms.
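A minimal Python sketch of the "record what you have visited" versions of both traversals, assuming the structure is stored as a dict from each node to its list of children, where a node may appear under several parents (the function names are hypothetical). Each node is emitted once, the first time it is reached; the visited set also keeps the traversal from looping if a cycle does sneak in.

```python
from collections import deque
from typing import Dict, List

Children = Dict[str, List[str]]   # node -> children; a node may have several parents

def dfs_preorder(children: Children, root: str) -> List[str]:
    """Depth-first preorder; each node is emitted once, when first reached."""
    visited, order = set(), []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue                  # already reached through another parent
        visited.add(node)
        order.append(node)
        # Push children in reverse so the first child is explored first.
        for child in reversed(children.get(node, [])):
            if child not in visited:
                stack.append(child)
    return order

def bfs_order(children: Children, root: str) -> List[str]:
    """Breadth-first order with the same visited bookkeeping."""
    visited, order = {root}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append(child)
    return order
```

With `{"A": ["B", "C"], "B": ["D"], "C": ["D"]}`, the shared child D is visited once, under whichever parent reaches it first.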
A tree is basically a directed unweighted graph where each vertex has at most N edges and no cycles can occur.
If you're certain there are no cycles in your tree, you could just treat a parent as another child of the specified node and perform a preorder traversal normally.
However, if cycles might occur, you need graph algorithms.
Specifically: breadth-first search.
Just checking for maybe a simple case: can the two parents have different parents?
If no you could turn them into single node (conceptually) and have a tree again.
Otherwise you will have to split the child node and duplicate a branch for the other parent.
(This can of course lead to inconsistencies and/or inefficient algorithms later, depending on whether you need to maintain the data structure.)
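Duplicating the branch can be sketched in Python as below, assuming the structure is a dict from each node to its list of children and contains no cycles (the names `unfold` and `preorder` are hypothetical). Every child with several parents is copied under each of them, producing a genuine tree that ordinary preorder traversal handles; the price is that shared nodes appear more than once, and the copy can grow exponentially for deeply shared structures.

```python
from typing import Dict, List, Tuple

Tree = Tuple[str, list]  # (node name, list of subtrees)

def unfold(children: Dict[str, List[str]], node: str) -> Tree:
    """Copy the structure below `node` into a real tree, duplicating shared children.
    Assumes no cycles: on a cycle this recursion would never terminate."""
    return (node, [unfold(children, c) for c in children.get(node, [])])

def preorder(tree: Tree) -> List[str]:
    # Plain tree preorder: node first, then each subtree in order.
    name, subtrees = tree
    out = [name]
    for sub in subtrees:
        out.extend(preorder(sub))
    return out
```

For `{"A": ["B", "C"], "B": ["D"], "C": ["D"]}`, the preorder of the unfolded tree is A, B, D, C, D: D is listed once per parent, which is the duplication (and potential inconsistency) mentioned above.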
The above options hold if you insist on having the tree structure, which by definition can have only one parent.
So maybe you need to step back and explain what you are trying to accomplish and why it must be a tree structure if nodes can have two parents.
You aren't describing a tree here. You can NOT call your graph a tree.
A tree is an undirected graph without cycles. Parent/child relationship is NOT an interpretation of directions drawn on the edges. They are the result of naming one vertex the root.
We call a vertex the "parent" of the current one because it is the next vertex on the path to the root. All other vertices adjacent to the current one are its "children".
You can't just lay out an arbitrary graph in such a way that "parents" are "above" or "point to vertex", and children are "below" or "vertex points to them". A tree is a tree because a root is picked. What you depict in your question is not a tree. And tree traversal algorithms are NOT applicable to traversing arbitrary graphs.
There are several graph traversal algorithms, such as breadth-first search or depth-first search (check side notes in those pages for more). Use them instead of trying to tie your full-featured graph into your knowledge about trees.
