Terminology: Non-tree graph? - data-structures

I'm new to data structures, and had a question on terminology. Is there a term for non-tree like graphs?
I realize that bidirectional/undirected graphs are inherently non-tree like. Is that the appropriate term? I'm asking because it seems that the tree is such a common subcategory of a graph that I figured there might be a term denoting all graphs that fall outside the subcategory.
P.s.: Please feel free to hack through any vernacular above. Would love tips on appropriate terminology in general concerning data structures.

I don't think there is a single universal term for a non-tree graph (except perhaps "non-tree graph" itself).
Trees are connected, acyclic, directed graphs, with some additional rules like each node (except the root) having exactly one parent. Some kinds of trees have other additional rules that are not common among other kinds of graphs (such as there being a significance to the order of a node's children). Depending on which of those limitations a non-tree graph violates, you might describe it differently.
A tree-like graph that is not connected can be described as a "forest". A forest has several root nodes, each anchoring a disjoint subtree.
If you have a graph with multiple root nodes, but their descendants overlap (so that a given child node may have more than one parent node), you have a "multitree". A human family tree may be a multitree if there are no marriages between cousins or other relatives.
The next more general term is probably a "directed acyclic graph" or "DAG". A DAG is more general than a multitree because an ancestor node may be connected to a descendant node by more than one path. Human genealogical trees are more properly thought of as DAGs, since sufficiently distant relatives are generally allowed to get married and have children (but nobody can be their own ancestor). There are many algorithms designed to work on DAGs, as forbidding cycles allows better performance for many useful applications (such as path finding).
More general still is a "directed graph" or "digraph", which relaxes the restriction on cycles. A common digraph data structure is an adjacency list (a list of arcs from one node to another).
I don't think there's any more general term beyond that, other than just "graph". If you have a specific application for a graph, there might be a specialized term for the kind of graph you will use (and perhaps algorithms or even library code to go along with it), but you'd need to ask about that specifically.
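As a rough illustration of the hierarchy described above (tree, forest, multitree, DAG, digraph), here is a minimal Python sketch that returns the most specific of those labels for a directed graph. The function name `classify` and the input format (a list of nodes plus a list of parent-to-child arcs) are assumptions made for this example, not part of any library.

```python
from collections import defaultdict

def classify(nodes, arcs):
    """Return the most specific label for a directed graph.

    nodes: list of hashable node names (hypothetical input format)
    arcs:  list of (parent, child) pairs
    """
    children = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for parent, child in arcs:
        children[parent].append(child)
        indegree[child] += 1

    # Kahn's algorithm: if some node can never be removed, there is a cycle.
    remaining = dict(indegree)
    ready = [n for n in nodes if remaining[n] == 0]
    removed = 0
    while ready:
        n = ready.pop()
        removed += 1
        for c in children[n]:
            remaining[c] -= 1
            if remaining[c] == 0:
                ready.append(c)
    if removed != len(nodes):
        return "digraph"

    # Acyclic and at most one parent per node: a tree or a forest.
    if all(d <= 1 for d in indegree.values()):
        roots = sum(1 for d in indegree.values() if d == 0)
        return "tree" if roots == 1 else "forest"

    # Multitree: no node reaches any descendant by more than one path.
    for start in nodes:
        seen = set()
        stack = [start]
        while stack:
            n = stack.pop()
            for c in children[n]:
                if c in seen:
                    return "DAG"  # a second path from start to c exists
                seen.add(c)
                stack.append(c)
    return "multitree"
```

For example, two roots sharing one child (`[("a", "c"), ("b", "c")]`) come out as a multitree, while a diamond (`a` to `d` via both `b` and `c`) is only a DAG.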

Related

Efficient Algorithm for subgraph enumeration

I have searched related issues about subgraph enumeration. However, they didn't meet my requirement(*). (If I misunderstood something, please tell me.)
Is there an efficient algorithm or tool for the enumeration of all "connected and unlabelled" subgraphs of an undirected parent graph?
In my case, the parent graph is an Internet topology, so the number of nodes could be large. I would like to enumerate all of the connected unlabelled patterns (i.e. subgraphs) of the parent graph.
(*) I have searched Efficiently find all connected subgraphs and Subgraph enumeration, but they target vertex-labelled induced subgraphs and complete subgraphs respectively. All I want is the connected unlabelled subgraphs.
A topic name that might be helpful is "frequent subgraph mining", which seems to be one name for this problem. There are various tools and algorithms in this area, although they may not do exactly what you want, of course.
As others point out in the answers to the two questions in your links, the number of subgraphs of a large graph can be very large. Assuming you actually want to list them, not just count them, it might take a long time.
Edit: OP has pointed out that the input here is ONE large graph, not a set of smaller ones, which will not work with standard graph mining.
I still think the general approach can work here. The input set of graphs for mining is some subset of the subgraphs of your data graph. But that subgraph-set is what you want in the first place!
So let's say you pick a size of subgraph that you want (say, 6 vertices). Then you randomly pick starting vertices in your parent graph (the Internet topology) and 'grow' these seeds, weeding out at each growth step those that don't match. Then repeat for different sizes of subgraph.
Of course, this is a probabilistic algorithm, but it could give you some idea.
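A minimal sketch of the seed-and-grow idea, under several assumptions: the parent graph is given as an adjacency dict `adj` (vertex to list of neighbours), the function names are made up for this example, and the sorted degree sequence of the induced subgraph is used as a crude label-free summary. A degree sequence is not a full isomorphism test, so distinct patterns can share a signature.

```python
import random

def grow_connected_subgraph(adj, size, rng):
    """Grow one connected vertex set of the requested size from a random seed.

    Returns None if the seed's connected component is too small.
    """
    seed = rng.choice(sorted(adj))
    chosen = {seed}
    frontier = set(adj[seed]) - chosen
    while len(chosen) < size:
        if not frontier:
            return None
        nxt = rng.choice(sorted(frontier))
        chosen.add(nxt)
        # Neighbours of the new vertex become candidates for the next step.
        frontier |= set(adj[nxt])
        frontier -= chosen
    return frozenset(chosen)

def degree_signature(adj, vertices):
    """Sorted degree sequence of the subgraph induced by `vertices`."""
    return tuple(sorted(sum(1 for w in adj[v] if w in vertices) for v in vertices))

def sample_patterns(adj, size, attempts, seed=0):
    """Count how often each degree signature shows up among random samples."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(attempts):
        vertices = grow_connected_subgraph(adj, size, rng)
        if vertices is not None:
            sig = degree_signature(adj, vertices)
            counts[sig] = counts.get(sig, 0) + 1
    return counts
```

Because the sampling is random, rare patterns can be missed entirely; the counts only give a rough picture of which patterns are common.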

Is there a name for this data structure that is kind of "opposite" of a tree?

We all know what a tree is: on the first level of a tree we have a root, and from the root come branches that are trees as well. But how do I name the "opposite" structure: on the i-th level we have a set of "leaf" nodes, those nodes form groups of one or more nodes, and each group points to a "trunk" node on the (i+1)-th level. If you want a visual example, imagine raindrops flowing down a window and combining as they collide.
A lot of tree data structures are actually constructed from leaf to root, and can be stored to allow for going one or both directions.
I don't think it has a special name: going from root to leaf, rather than the other way or both ways, is more a convention for trees than a requirement. There are also a number of tree data structures that allow traversal in both directions.
Every tree is a DAG, a directed acyclic graph, and so is the data-structure that you describe. What you describe is also a multitree, a subset of DAGs. Possibly there is a more precise real subset of multitrees that describes your graph, but I am not aware of it. Hope this helps.
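One way to picture the leaf-to-root direction discussed in these answers is a parent-pointer representation, where each node stores only the node it merges into. The sketch below is an illustration with made-up names (`merges_into`, `final_drop`, `children_of`), using the raindrop example from the question.

```python
from collections import defaultdict

# Each drop points at the drop it merges into; the final drop has no entry.
merges_into = {"a": "d", "b": "d", "c": "e", "d": "f", "e": "f"}

def final_drop(merges_into, node):
    """Follow the pointers from a leaf down to the last merged drop."""
    while node in merges_into:
        node = merges_into[node]
    return node

def children_of(merges_into):
    """Invert the pointers to get the usual root-to-leaf view of the same tree."""
    children = defaultdict(list)
    for child, parent in merges_into.items():
        children[parent].append(child)
    return dict(children)
```

Both views describe the same tree; only the direction in which the edges are stored differs.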

Significance of various graph types

There are a lot of named graph types. I am wondering what the criteria behind this categorization are. Are different types applicable in different contexts? Moreover, can a business application (from a design and programming perspective) benefit from these categorizations? Is this analogous to design patterns?
We've given names to common families of graphs for several reasons:
Certain families of graphs have nice, simple properties. For example, trees have numerous useful properties (there's exactly one path between any pair of nodes, they're maximally acyclic, they're minimally connected, etc.) that don't hold of arbitrary graphs. Directed acyclic graphs can be topologically sorted, which normal graphs cannot. If you can model a problem in terms of one of these types of graphs, you can use specialized algorithms on them to extract properties that can't necessarily be obtained from an arbitrary graph.
Certain algorithms run faster on certain types of graphs. Many NP-hard problems on graphs, which as of now don't have any polynomial-time algorithms, can be solved very easily on certain types of graphs. For example, the maximum independent set problem (choose the largest collection of nodes where no two nodes are connected by an edge) is NP-hard, but can be solved in polynomial time for trees and bipartite graphs. The 4-coloring problem (determine whether the nodes of a graph can be colored one of four different colors without assigning the same color to adjacent nodes) is NP-hard in general, but is immediately true for planar graphs (this is the famous four-color theorem).
Certain algorithms are easier on certain types of graphs. A matching in a graph is a collection of edges in the graph where no two edges share an endpoint. Maximum matchings can be used to represent ways of pairing people up into groups. In a bipartite graph, a maximum matching can be used to represent a way of assigning people to tasks such that no person is assigned two tasks and no task is assigned to two people. There are many fast algorithms for finding maximum matchings in bipartite graphs that work quickly and are easy to understand. The corresponding algorithms for general graphs are significantly more complicated and slightly less efficient.
Certain graphs are historically significant. Many named graphs are named after someone who used the graph to disprove a conjecture about properties of arbitrary graphs. The Petersen graph, for example, is a counterexample to many conjectures that seem true about graphs but are actually false.
Certain graphs are useful in theoretical computer science. An expander graph is a graph where, intuitively, any collection of nodes must be connected to a proportionally larger collection of nodes in the graph. Not all graphs are expander graphs. Expander graphs are used in many results in theoretical computer science, such as one proof of the PCP theorem and in the proof that SL = L.
This is not an exhaustive list of why we care about different graph families, but hopefully it helps motivate their usage and study.
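To make the bipartite matching point concrete, here is a short sketch of one of the simpler augmenting-path approaches (often called Kuhn's algorithm). It is one possible algorithm rather than the fastest known, and the names `can_do` and `max_bipartite_matching` are assumptions for this example.

```python
def max_bipartite_matching(can_do):
    """Assign people to tasks so nobody has two tasks and no task two people.

    can_do: dict mapping each person to the list of tasks they can take.
    Returns a dict mapping each assigned task to its person.
    """
    assigned = {}  # task -> person

    def try_assign(person, visited):
        # Look for a free task, or a task whose current owner can move elsewhere.
        for task in can_do[person]:
            if task in visited:
                continue
            visited.add(task)
            if task not in assigned or try_assign(assigned[task], visited):
                assigned[task] = person
                return True
        return False

    for person in can_do:
        try_assign(person, set())
    return assigned
```

With `{"ann": ["t1", "t2"], "bob": ["t1"], "cy": ["t2", "t3"]}`, all three people end up with a task, because ann is moved off `t1` when bob needs it.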
Hope this helps!

Odd generalization of trees?

When dealing with directed graphs, a tree is a graph in which every node except one (the root) has a single incoming edge. Are there any examples of treelike structures in which every node has at most some constant number of incoming edges; say, at most two, or at most three? I haven't come across any graphs specifically described this way; is there a particular application in which they are used?
In graph theory, a tree is a connected acyclic graph. There is no requirement that every node have one incoming edge. In computer science, we often deal with rooted trees that agree with your definition.
Here is one description of a tree where some of the nodes have a constant number of incoming edges: an assignment of projects to employees, where each employee can be assigned at most three projects.
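The graph-theoretic definition above (a connected acyclic undirected graph) can be checked directly. The sketch below relies on the standard fact that a graph on n nodes is a tree exactly when it is connected and has n - 1 edges; the function name and input format are assumptions for this example, and treating the empty graph as "not a tree" is a convention chosen here.

```python
def is_undirected_tree(nodes, edges):
    """Return True if the undirected graph is connected and acyclic."""
    nodes = list(nodes)
    edges = list(edges)
    # A tree on n nodes has exactly n - 1 edges.
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # With n - 1 edges, being connected is enough to rule out cycles.
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        n = stack.pop()
        for m in adj[n]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return len(seen) == len(nodes)
```

Note that nothing here refers to a root or to incoming edges; those only appear once a root is chosen and the edges are oriented away from it.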
The most common generalization of a tree is a "DAG" (directed acyclic graph). It is only loosely related to what you describe: a DAG places no bound on the size of a vertex's in-neighborhood (the set of arcs leading into it) and does not require a single source (a vertex with an empty in-neighborhood).
From what I know, there's no neat term for what you're looking for. You'll need to find a true mathematician with a deep interest in graph theory to know with any certainty!
Lattices (partially ordered sets) have that property.

How to modify preorder tree traversal algorithm to handle nodes with multiple parents?

I've been searching for a while now and can't seem to find an alternative solution. I need the tree traversal algorithm in such a way that a node can have more than 1 parent, if it's possible (found a great article here: Storing Hierarchical Data in a Database). Are there any algorithms so that, starting from a root node, we can determine the sequence and dependencies of nodes (currently reading topological sorting)?
The structure you described isn't a tree, it's a directed graph. As it would be suitable for hierarchical drawing you might be tempted to think of it as a tree (which itself is an acyclic connected graph).
Typical traversal algorithms for graphs are depth-first and breadth-first. The graph implementation is only different as it records the nodes it has already visited in order to avoid visiting certain nodes multiple times. However, if your data structure guarantees that it's acyclic, you can use tree algorithms on your graph by simply treating "parents" as "children".
I made a simple sketch to illustrate what I mean (the perfect chance to try Google Docs' new drawing feature):
As you see, it's possible to treat any graph that has an acyclic directed form as a tree and apply tree algorithms on it. As soon as you can't guarantee this property you'll have to go for dedicated graph algorithms.
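A minimal sketch of the visited-set idea mentioned above: a preorder depth-first traversal that records which nodes it has already seen, so a node with several parents is emitted only once. The `children` dict format and the function name are assumptions for this example.

```python
def preorder(children, root):
    """Preorder depth-first traversal that visits each node at most once.

    children: dict mapping a node to the list of its child nodes.
    """
    visited = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue  # already reached through another parent
        visited.add(node)
        order.append(node)
        # Push children in reverse so the first child is visited first.
        stack.extend(reversed(children.get(node, [])))
    return order
```

For `{"root": ["a", "b"], "a": ["c"], "b": ["c"]}` this yields `["root", "a", "c", "b"]`; without the visited set, `c` would be emitted twice. The same check also keeps the traversal from looping forever if the graph turns out to contain a cycle.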
A tree is basically a directed unweighted graph, where each vertex has N or fewer edges, and no cycles can happen.
If you're certain there are no cycles in your tree, you could just treat a parent as another child of the specified node, and perform a preorder traversal normally.
However, if cycles might happen, you need graph algorithms.
Specifically: Breadth first search.
Just checking for maybe a simple case: can the two parents have different parents?
If no you could turn them into single node (conceptually) and have a tree again.
Otherwise you will have to split the child node and duplicate a branch for the other parent.
(This can of course lead to inconsistency and/or inefficient algorithms later, depending on whether you will need to maintain the data structure.)
The above options hold if you insist on having the tree structure, which by definition can have only one parent.
So maybe you need to step back and explain what you are trying to accomplish and why it must be a tree structure if nodes can have two parents.
You aren't describing a tree here. You can NOT call your graph a tree.
A tree is an undirected graph without cycles. Parent/child relationship is NOT an interpretation of directions drawn on the edges. They are the result of naming one vertex the root.
We name a vertex the "parent" of the current one because it's the next one on the path to the root. All other vertices adjacent to the current one are "children".
You can't just lay out an arbitrary graph in such a way that "parents" are "above" or "point to vertex", and children are "below" or "vertex points to them". A tree is a tree because a root is picked. What you depict in your question is not a tree. And tree traversal algorithms are NOT applicable to traversing arbitrary graphs.
There are several graph traversal algorithms, such as breadth-first search or depth-first search (check side notes in those pages for more). Use them instead of trying to tie your full-featured graph into your knowledge about trees.
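Alongside breadth-first and depth-first search, the topological sorting mentioned in the question gives an order in which every node comes after all of its parents. The sketch below uses `graphlib.TopologicalSorter` from the Python standard library (available from Python 3.9); the wrapper name `dependency_order` and the `parents` dict format are assumptions for this example.

```python
from graphlib import TopologicalSorter, CycleError

def dependency_order(parents):
    """Return the nodes so that each one appears after all of its parents.

    parents: dict mapping a node to the nodes it depends on.
    Returns None if the graph contains a cycle (no valid order exists).
    """
    try:
        return list(TopologicalSorter(parents).static_order())
    except CycleError:
        return None
```

For `{"c": ["a", "b"], "a": ["root"], "b": ["root"]}` the result starts with `root` and ends with `c`; the relative order of `a` and `b` is not determined by the dependencies.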
