Data Structures: Wikipedia-like Tree - performance

I am currently in the process of developing an ontology, a web hierarchy of categories of everything (think persons, places, things). The finished product should be something that allows me to navigate from Technology->Computers->Laptops->USB Ports, but also from Movies->Minority Report->Computers->etc.
I need an efficient data structure to group these. I need a tree-like graph, but a special tree that allows child nodes to have multiple parent nodes.
In thinking over this, I have realized that Wikipedia is a perfect model for this. In fact, they have a hierarchy starting here that is essentially exactly what I need. I see that they used a directed graph, but I am wondering what the differences and drawbacks are between this directed graph, a directed acyclic graph, and a polytree. I have tried researching it, but I don't quite understand the differences. Any help would be greatly appreciated. Thank you!

I think the articles at Wikipedia give a good overview:
A directed graph is a set of nodes connected by edges which have a direction associated with them.
A directed acyclic graph (DAG) is a directed graph with no directed cycles.
A polytree (also called directed tree) is a directed graph with exactly one undirected path between any two vertices. In other words, a polytree is a directed graph whose underlying undirected graph is a tree, or equivalently, a connected directed acyclic graph for which there are no undirected cycles either.
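The three definitions above can be told apart mechanically. The following is a minimal sketch, assuming a graph given as a list of node ids and a list of (parent, child) edges with no duplicates; the function name `classify` is hypothetical. It uses Kahn's algorithm to test for directed cycles, then checks whether the underlying undirected graph is a tree (connected, with exactly n - 1 edges).

```python
from collections import defaultdict, deque

def classify(nodes, edges):
    """Return 'directed graph', 'DAG' or 'polytree' (hypothetical helper).

    nodes: iterable of hashable ids; edges: list of (parent, child) pairs,
    every endpoint listed in nodes, no duplicate edges.
    """
    nodes = list(nodes)
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    # Kahn's algorithm: every node can be removed iff there is no directed cycle.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if removed < len(nodes):
        return "directed graph"  # has at least one directed cycle
    # A DAG is a polytree iff its underlying undirected graph is a tree,
    # i.e. it has exactly len(nodes) - 1 edges and is connected.
    if len(edges) != len(nodes) - 1:
        return "DAG"
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for w in neighbours[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return "polytree" if len(seen) == len(nodes) else "DAG"
```

For example, a "diamond" (a -> b, a -> c, b -> d, c -> d) is a DAG but not a polytree, because ignoring directions leaves the undirected cycle a-b-d-c-a.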
So I think you are looking for a connected directed acyclic graph. Although the Wikipedia category system allows cycles, they are unwanted.
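As a rough illustration of a category structure in which a child may have several parents but cycles are refused, here is a minimal sketch. The class name `CategoryDAG` and its methods are hypothetical; before adding a parent -> child link it runs a depth-first search to check that the parent is not already below the child, which costs O(V + E) per insertion.

```python
from collections import defaultdict

class CategoryDAG:
    """Hypothetical category store: multiple parents allowed, cycles refused."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def _reachable(self, start, target):
        # Depth-first search along child links: is `target` at or below `start`?
        stack, seen = [start], {start}
        while stack:
            node = stack.pop()
            if node == target:
                return True
            for child in self.children[node]:
                if child not in seen:
                    seen.add(child)
                    stack.append(child)
        return False

    def add_edge(self, parent, child):
        """Link parent -> child; raise ValueError if that would close a cycle."""
        if self._reachable(child, parent):
            raise ValueError(f"{parent} -> {child} would create a cycle")
        self.children[parent].add(child)
        self.parents[child].add(parent)
```

Using the examples from the question, "Computers" ends up with two parents, while linking "USB Ports" back up to "Technology" is rejected:

```python
g = CategoryDAG()
for parent, child in [("Technology", "Computers"), ("Computers", "Laptops"),
                      ("Laptops", "USB Ports"), ("Movies", "Minority Report"),
                      ("Minority Report", "Computers")]:
    g.add_edge(parent, child)
# g.parents["Computers"] == {"Technology", "Minority Report"}
# g.add_edge("USB Ports", "Technology") raises ValueError
```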

Related

Minimize the number of routes to cover all edges in a directed graph

Overview
I have a directed graph with about 35,000 nodes and about 400,000 edges (a node is usually connected to multiple nodes).
Some edges are unidirectional, while others are bidirectional.
Some nodes can be a source or a sink, but never both at the same time.
Objective
Develop an algorithm that minimizes the number of routes needed to cover all edges of the graph, possibly using different source/sink pairs, and that stores the resulting routes.
Constraints
Nodes can't be visited more than once in a single route, but they can appear in different routes.
When a route leaves a node that is connected to more than one node, the algorithm can add only one of that node's outgoing edges to the route; the other edges cannot be used in that route.
The algorithm has to find a set of paths that covers all edges, ideally the minimum number of paths.
I don't want to use a brute-force search of all possible paths because it would be too slow.
It would be nice to use a different source and sink for each route so that the routes can be traversed in parallel (if they have no nodes in common).
Maybe this problem can be solved with graph coloring, but I can't work out how to adapt it.
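No answer is given here, but as a starting point a greedy baseline can be sketched: take any uncovered edge, then extend the route forward and backward along uncovered edges that lead to nodes not yet on the route. Because each route is a simple path, it never revisits a node and uses only one outgoing edge per node, matching the constraints above. This is a swapped-in heuristic, not the minimum (minimizing the number of such paths is hard in general), and it assumes bidirectional edges are given as two directed edges, no self-loops, no `None` node ids, and it ignores the source/sink restrictions. The function name `greedy_route_cover` is hypothetical.

```python
from collections import defaultdict, deque

def greedy_route_cover(edges):
    """Cover every directed edge with simple paths (greedy heuristic, NOT minimal).

    edges: list of (u, v) pairs; a bidirectional edge appears as (u, v) and (v, u).
    Assumes no self-loops and no None node ids; source/sink rules are not modelled.
    """
    out_edges, in_edges = defaultdict(list), defaultdict(list)
    for u, v in edges:
        out_edges[u].append(v)
        in_edges[v].append(u)
    uncovered = set(edges)
    routes = []
    for u, v in edges:
        if (u, v) not in uncovered:
            continue  # already used by an earlier route
        uncovered.discard((u, v))
        route, on_route = deque([u, v]), {u, v}
        # Extend forward while an unused edge leads to a node not yet on the route.
        while True:
            last = route[-1]
            nxt = next((w for w in out_edges[last]
                        if (last, w) in uncovered and w not in on_route), None)
            if nxt is None:
                break
            uncovered.discard((last, nxt))
            route.append(nxt)
            on_route.add(nxt)
        # Extend backward from the first node in the same way.
        while True:
            first = route[0]
            prev = next((w for w in in_edges[first]
                         if (w, first) in uncovered and w not in on_route), None)
            if prev is None:
                break
            uncovered.discard((prev, first))
            route.appendleft(prev)
            on_route.add(prev)
        routes.append(list(route))
    return routes
```

The result can be used as an upper bound to compare smarter methods against; with about 400,000 edges, each edge is examined a bounded number of times per scan of its endpoint's adjacency list.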

Are Trees Directed or Undirected Graphs?

I have read that trees are special cases of graphs.
Graphs can be directed or undirected. But if we consider a tree as a data structure, is it a directed or an undirected graph?
Unless qualified otherwise, trees in mathematics or graph theory are usually assumed to be undirected, but in computer science, programming, or data structures, trees are usually assumed to be directed and rooted.
You need to be aware of the context of discussion.
See Tree on Wikipedia:
A tree is an undirected graph.
Both are acceptable.
You may have some cases where you want to be able to go up from a leaf and then go back down (usually in another branch), or you may want to be able to go only down.
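Both options can be sketched in a few lines: child links alone only let you go down, while an optional parent link lets you go up from a leaf and back down another branch. `TreeNode` and `path_between` are hypothetical names, and `path_between` assumes both nodes belong to the same tree.

```python
class TreeNode:
    """Rooted tree node with downward links and an optional upward link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # drop this field for a "down only" tree
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        """Nodes from self up to the root, inclusive."""
        node, chain = self, []
        while node is not None:
            chain.append(node)
            node = node.parent
        return chain

def path_between(a, b):
    """Go up from a to the lowest common ancestor, then down to b (same tree assumed)."""
    up = a.ancestors()
    up_set = set(up)
    down, node = [], b
    while node not in up_set:  # climb from b until we meet a's ancestor chain
        down.append(node)
        node = node.parent
    lca = node
    return [n.name for n in up[: up.index(lca) + 1]] + [n.name for n in reversed(down)]
```

For a root with children a and b, and leaves a1 under a and b1 under b, `path_between(a1, b1)` goes up through a to the root and then down through b: `["a1", "a", "root", "b", "b1"]`.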
Trees are connected acyclic graphs. That means you should be able to traverse from any node u to any node v. If we say trees are directed, then it may not be possible to traverse from every node u to every node v.
In the context of rooted trees, direction only tells which node of the tree is treated as the root (starting point) and shows the parent-child relationships between nodes; that is all it says. This direction does not limit the connectivity of the graph or the connection between any node u and node v of the tree.[1]
[1] If we treated the directions in a rooted tree as the only paths that can be traversed to go from node u to node v, then connectivity would be broken and the graph would no longer be a tree.
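To make the footnote concrete, here is a small sketch with a hypothetical three-edge rooted tree stored as parent -> child arcs. Following the arcs from a leaf reaches nothing else, while ignoring their direction reaches every node, which is the connectivity a tree requires.

```python
from collections import defaultdict

def reachable(adj, start):
    """All nodes reachable from start by depth-first search over adj."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Hypothetical rooted tree stored as parent -> child arcs.
arcs = [("root", "a"), ("root", "b"), ("a", "a1")]
directed, undirected = defaultdict(list), defaultdict(list)
for parent, child in arcs:
    directed[parent].append(child)
    undirected[parent].append(child)
    undirected[child].append(parent)

print(sorted(reachable(directed, "a1")))    # ['a1']
print(sorted(reachable(undirected, "a1")))  # ['a', 'a1', 'b', 'root']
```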

directed graphs with a given root node - match another directed graph for equality

There is a directed graph with a single designated node, called the root, from which all other nodes are reachable. Each terminal node (a node with no outgoing edges) holds a string value.
Intermediate nodes have one or more outgoing edges but no value associated with them. Each edge connecting a node to a neighbor has a string label, and the labels of the edges leaving a single node are unique. The graph may contain cycles!
What is the best algorithm for checking whether two such directed graphs (possibly with cycles) are equal?
The graph isomorphism problem is one of the more intriguing problems in theoretical computer science (TCS). There is an entire page dedicated to it on Wikipedia.
In your particular case, you have two rooted directed graphs, each with a source (the root) and sinks (the terminal nodes).
You could start two BFS in parallel, and check for isomorphism level by level; i.e. levelize the graph and check whether the subset of nodes at each level are isomorphic across the two graphs.
Note that since you have a directed, rooted graph, you should still be able to levelize it for the purpose of finding isomorphism. Do not enqueue nodes already visited during the BFS; i.e., levelize using the shortest path to the node from the root when determining the level to group it in.
Within a set the comparison should be relatively easy. You have many properties to distinguish nodes at the same level (degree, labels) and should be able to create suitable signatures to sort them. Since you are looking for perfect isomorphism, you should get an exact match.
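Because the question states that the labels on edges leaving a node are unique, the parallel BFS can be simplified further: following matching labels from both roots fixes the node correspondence directly, so no signature sorting is needed. The sketch below is that swapped-in simplification, not the general isomorphism algorithm. It assumes a hypothetical representation where each graph is a dict mapping a node to either a string (terminal value) or a dict `{edge_label: target_node}`, and it keeps the correspondence one-to-one in both directions so that cycles are compared correctly.

```python
from collections import deque

def rooted_graphs_equal(g1, root1, g2, root2):
    """Compare two rooted, edge-labelled graphs (cycles allowed).

    Each graph: node -> str (terminal value) or dict {edge_label: target}.
    Relies on outgoing labels being unique per node.
    """
    fwd, back = {root1: root2}, {root2: root1}  # node correspondence, both ways
    queue = deque([(root1, root2)])
    while queue:
        a, b = queue.popleft()
        na, nb = g1[a], g2[b]
        if isinstance(na, str) or isinstance(nb, str):
            if na != nb:  # terminal vs. intermediate, or different values
                return False
            continue
        if na.keys() != nb.keys():  # same set of outgoing labels required
            return False
        for label, ta in na.items():
            tb = nb[label]
            if ta in fwd or tb in back:
                # Already paired (e.g. via a cycle): the pairing must agree.
                if fwd.get(ta) != tb or back.get(tb) != ta:
                    return False
                continue
            fwd[ta], back[tb] = tb, ta
            queue.append((ta, tb))
    return True
```

Each node pair is enqueued at most once, so the comparison runs in time linear in the number of nodes and edges.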

How do I determine whether a graph is singly connected or not?

If a graph has back edges, is it singly connected or not? By back edges I mean connections from a child node to one of its ancestors, under the same root. If a node is connected to a node higher than it, but not its ancestor, then it's a cross edge.
http://en.wikipedia.org/wiki/Polytree
This link clarifies the concept of a singly connected graph.
If a graph has back edges, that doesn't prevent it from being singly-connected. But it might not be singly-connected for other reasons. For example, if the graph is undirected.
It seems that you are trying to make an analogy with linked lists (where singly-linked and doubly-linked are common terms with their usual meanings).
However, this distinction isn't commonly drawn for graphs, and the term connectivity is more usually associated with reachability (i.e., is there a path from one node to another?).
If I understand your question correctly, you want to know whether a polytree can contain back edges (edges from a node to one of its ancestors).
From the Wikipedia article you linked to, a polytree is a DAG that remains a tree even if the edges are made undirected. If a directed graph contained a back edge, there would be a cycle in the graph (you can reach the node from its ancestor and then go back to the ancestor using the back edge). Thus it would no longer be a DAG, let alone a tree. If it isn't a DAG, it cannot be a polytree. So no, a polytree cannot have a back edge.
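The two answers use "singly connected" in different senses. Under the textbook definition (an assumption here, not the polytree sense of the linked article), a directed graph is singly connected if for every ordered pair (u, v) there is at most one simple path from u to v. A common way to test this is a DFS from every vertex: reaching an already finished vertex (a forward or cross edge) reveals a second path, while back edges are harmless. The sketch below uses that technique; `is_singly_connected` is a hypothetical name and the graph is a dict of successor lists.

```python
def is_singly_connected(graph):
    """True iff every ordered pair (u, v) has at most one simple path u -> v.

    graph: dict node -> list of successors. O(V * E) overall.
    """
    nodes = set(graph) | {v for succs in graph.values() for v in succs}
    for s in nodes:
        state = {}  # "active" while on the DFS stack, "done" once finished

        def dfs(u):
            state[u] = "active"
            for v in graph.get(u, ()):
                if v not in state:
                    if not dfs(v):
                        return False
                elif state[v] == "done":
                    # Forward or cross edge: v is reachable from s a second way.
                    return False
                # state[v] == "active": back edge, which is allowed.
            state[u] = "done"
            return True

        if not dfs(s):
            return False
    return True
```

For example, the two-node cycle a -> b -> a is singly connected in this sense (each node reaches the other by exactly one path) even though, as explained above, it is not a polytree; the diamond a -> b -> d, a -> c -> d is neither.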

Odd generalization of trees?

When dealing with directed graphs, is a tree a graph in which every node except one (the root) has a single incoming edge? Are there any examples of treelike structures in which every node has at most some constant number of incoming edges; say, at most two, or at most three? I haven't come across any graphs specifically described this way; is there a particular application in which they are used?
In graph theory, a tree is a connected acyclic graph. There is no requirement that every node have one incoming edge. In computer science, we often deal with rooted trees that agree with your definition.
Here is one description of a tree where some of the nodes have a constant number of incoming edges: an assignment of projects to employees, where each employee can be assigned at most three projects.
The most common generalization of a tree is a "DAG" (directed acyclic graph), which is tangentially related but neither bounds the size of in-neighborhoods (the arcs leading into a vertex) nor requires a single source (a vertex with an empty in-neighborhood).
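The structure described in the question sits between a rooted tree and a DAG: a single source, no directed cycles, and at most k incoming edges per node. A minimal check can be sketched as follows; the function name is hypothetical and the graph is assumed to be a dict in which every node appears as a key. With k = 1 it accepts exactly the rooted trees from the question's definition.

```python
from collections import deque

def is_rooted_dag_with_max_parents(graph, k):
    """graph: dict node -> list of children (every node is a key).

    True iff there is exactly one source, no directed cycle,
    and no node has more than k incoming edges.
    """
    indegree = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indegree[c] += 1
    if any(d > k for d in indegree.values()):
        return False
    sources = [n for n, d in indegree.items() if d == 0]
    if len(sources) != 1:
        return False
    # Kahn's algorithm from the single source. Every node is removed iff the
    # graph is acyclic; with one source, acyclic also means all nodes are reachable.
    remaining = dict(indegree)
    queue = deque(sources)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for c in graph[u]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    return removed == len(graph)
```

A diamond (r -> a, r -> b, a -> c, b -> c) fails for k = 1 because c has two parents, but passes for k = 2.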
From what I know, there's no neat term for what you're looking for. You'll need to find a true mathematician with a deep interest in graph theory to know with any certainty!
Lattices (a special kind of partially ordered set) have that property.

Resources