How to visualize an AST (abstract syntax tree) with fewer edge crossings? - algorithm

I would like to know how to visualize an AST (abstract syntax tree) with fewer crossings between the edges that connect its nodes. In other words, how can I generate an AST graph that is less messy and more readable?
By the way, the crossings happen in a special situation: I have multiple ASTs in one graph and they may share nodes (a circuit module with input and output pins is a good example: it has many crossings, with wires as edges and gates as nodes).
I have searched for this, but most of the results I found are about the definition of an AST and say little about visualizing one.
Is there an algorithm to optimize node placement in the graph?
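One widely used family of techniques for this is layered (Sugiyama-style) drawing: nodes are assigned to layers, such as depth in the AST, and the nodes within each layer are then reordered to reduce crossings. Below is a minimal sketch of the classic barycenter heuristic for one pair of adjacent layers, assuming layer assignment has already been done. It is not from the thread, and the function names are hypothetical; a full layout would sweep this up and down over all layers several times.

```python
from itertools import combinations


def count_crossings(top, bottom, edges):
    """Count crossings between two layers drawn as parallel rows.

    top, bottom: left-to-right node order of each layer.
    edges: (u, v) pairs with u in top and v in bottom.
    """
    pos_top = {n: i for i, n in enumerate(top)}
    pos_bot = {n: i for i, n in enumerate(bottom)}
    segments = [(pos_top[u], pos_bot[v]) for u, v in edges]
    crossings = 0
    for (a1, b1), (a2, b2) in combinations(segments, 2):
        # Two straight edges cross iff their endpoints appear in opposite orders.
        if (a1 - a2) * (b1 - b2) < 0:
            crossings += 1
    return crossings


def barycenter_order(top, bottom, edges):
    """Reorder the bottom layer by the mean position of each node's top-layer neighbours."""
    pos_top = {n: i for i, n in enumerate(top)}
    neighbours = {v: [] for v in bottom}
    for u, v in edges:
        neighbours[v].append(pos_top[u])

    def key(v):
        ps = neighbours[v]
        # Nodes without neighbours keep their current position as the sort key.
        return sum(ps) / len(ps) if ps else bottom.index(v)

    return sorted(bottom, key=key)


top = ["a", "b", "c"]
bottom = ["z", "y", "x"]
edges = [("a", "x"), ("b", "y"), ("c", "z")]
print(count_crossings(top, bottom, edges))  # 3
new_bottom = barycenter_order(top, bottom, edges)
print(new_bottom, count_crossings(top, new_bottom, edges))  # ['x', 'y', 'z'] 0
```

In the example, sorting the bottom layer by the average position of each node's parents removes all three crossings. The heuristic does not guarantee a minimum in general; it is a fast local improvement.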

Related

Travelling around a graph and making predictions along the way

I'm a student doing research and am having trouble making progress because I don't have much experience in this field, so even if you can't answer the questions, pointing me to the important terms I should be looking for would be a big help.
If we have an agent travelling around a graph, going from node to node, are there algorithms (and if so, the names of algorithms) that do the following:
Predict the topology of the graph using Bayesian statistics, assuming the graph is finite and that what has been seen so far increasingly represents what will be seen in the future.
Predict labellings from previous labellings. So if we have a chain of 26 nodes, with the first node labelled A, the second B, and so on, then at some point the algorithm should be able to predict that the labellings are in alphabetical order long before we reach the end.
Use the labellings to predict the topology of the graph; so if the graph so far is a chain and the labellings are clearly in alphabetical order, then a good guess for the rest of the graph would be that it is a chain too.
A relevant API would be fantastic.

Graph Topology Profiling

Can anyone suggest some algorithms that can be used to classify a graph's topology?
Input: Adjacency list with raw graph information.
Output: What kind of graph is it? Currently I want to focus only on pure types: daisy chain, mesh, ring, star, tree.
Which area of algorithm study is responsible for such algorithms? Is it computational geometry?
Edit - The size of the graph will not exceed 32 nodes. However, there will be redundant links between nodes.
Edit - I understand that my question might be too broad, but at least give me a clue about what is wrong with it before down-voting. Or is it because of my reputation :-(
Start by checking that your graph is fully connected.
Then, check the distribution of the nodes' degree:
Ring: All nodes would have degree 2
Daisy chain: all nodes would have degree 2 except for 2 nodes with degree 1 (there are alternative definitions for what a daisy chain is).
Star: Each node would have degree 1, except for one node with degree n-1
Tree: The sum of the degrees is 2*(number of nodes-1). Also, if the highest degree is k, then there are at least k nodes with degree 1.
Mesh: Anything goes...
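A minimal sketch of the connectivity check plus degree-distribution test described above, assuming the adjacency list is a Python dict mapping each node to its neighbours. Duplicate links and self-loops are dropped first, since the question mentions redundant links. The function name, the category strings and the order of the checks are assumptions; note that a 3-node star is also a daisy chain, and this sketch reports it as a daisy chain.

```python
from collections import Counter, deque


def classify_topology(adj):
    """Classify an undirected graph given as {node: iterable of neighbours}."""
    # Build a simple graph: deduplicate links, drop self-loops, make links symmetric.
    nbrs = {u: set() for u in adj}
    for u, vs in adj.items():
        for v in vs:
            if u != v:
                nbrs[u].add(v)
                nbrs.setdefault(v, set()).add(u)
    n = len(nbrs)
    if n == 0:
        return "disconnected"
    # Step 1: breadth-first search to check that the graph is fully connected.
    start = next(iter(nbrs))
    seen, queue = {start}, deque([start])
    while queue:
        for v in nbrs[queue.popleft()]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if len(seen) != n:
        return "disconnected"
    # Step 2: inspect the degree distribution.
    degrees = [len(vs) for vs in nbrs.values()]
    hist = Counter(degrees)
    if n >= 3 and hist[2] == n:
        return "ring"
    if n >= 2 and hist[1] == 2 and hist[2] == n - 2:
        return "daisy chain"
    if n >= 3 and hist[1] == n - 1 and hist[n - 1] == 1:
        return "star"
    if sum(degrees) == 2 * (n - 1):  # connected with exactly n - 1 edges
        return "tree"
    return "mesh"


print(classify_topology({1: [2], 2: [1, 3, 3], 3: [2]}))           # daisy chain
print(classify_topology({0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}))   # star
print(classify_topology({1: [2], 2: [3], 3: [4], 4: [1]}))         # ring
```

Because the graphs in the question have at most 32 nodes, the cost of these checks is negligible.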
I don't think there is an 'area' of algorithms that deals with such problems, but the term 'graph classes' is quite common (see, for example, here), though it is not a formal term.
To classify a new instance, you need a classification system in the first place!
Putting it another way, your graph (the item to classify) fits somewhere in some kind of data structure of graph topologies (the classification system). The system could be as simple as a list; in which case, you carry out the simple algorithm outlined in this other post where the list of topologies is keyed by degree distribution.
A more complex system could be a hierarchical one, similar to biological classification systems. This would only really be necessary for very large numbers of graph topologies, where it would make it faster to classify based on a series of decisions. Essentially a decision tree.
It may be difficult to find much research in this area (for pure graphs) as it's a little hard to think of applications. There are applications for protein fold topologies, but that may not be of interest.

Terminology: Non-tree graph?

I'm new to data structures, and had a question on terminology. Is there a term for non-tree-like graphs?
I realize that bidirectional/undirected graphs are inherently non-tree-like. Is that the appropriate term? I'm asking because the tree is such a common subcategory of graph that I figured there might be a term for all the graphs that fall outside that subcategory.
P.s.: Please feel free to hack through any vernacular above. Would love tips on appropriate terminology in general concerning data structures.
I don't think there is a single universal term for a non-tree graph (except perhaps "non-tree graph" itself).
Rooted trees are connected, acyclic, directed graphs, with some additional rules, like each node (except the root) having exactly one parent. Some kinds of trees have further rules that are not common among other kinds of graphs (such as the order of a node's children being significant). Depending on which of those limitations a non-tree graph violates, you might describe it differently.
A tree-like graph that is not fully connected can be described as a "forest". A forest has several root nodes, each anchoring a disjoint subtree.
If you have a graph with multiple root nodes whose descendants overlap (so that a given child node may have more than one parent node), you have a "multitree". A human family tree may be a multitree if there are no marriages between cousins or other relatives.
The next more general term is probably a "directed acyclic graph" or "DAG". A DAG is more general than a multitree because an ancestor node may be connected to a descendant node by more than one path. Human genealogical trees are more properly thought of as DAGs, since sufficiently distant relatives are generally allowed to marry and have children (but nobody can be their own ancestor). There are many algorithms designed to work on DAGs, as forbidding cycles allows better performance for many useful applications (such as path finding).
More general still is a "directed graph" or "digraph", which drops the restriction on cycles. A common digraph data structure is an adjacency list (for each node, a list of the nodes its outgoing arcs point to).
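A sketch of how the hierarchy described above could be checked mechanically, assuming the graph is given as a list of nodes plus a list of (parent, child) arcs that mention only those nodes. It detects cycles with Kahn's topological sort, treats "at most one parent everywhere" as a forest (a single tree being a forest with one root), and tests the multitree condition (at most one directed path between any two nodes) by brute-force path counting, which is only reasonable for small graphs. The function name is hypothetical.

```python
from collections import deque


def classify_digraph(nodes, arcs):
    """Return the most specific of "forest", "multitree", "DAG" or "digraph"."""
    children = {n: [] for n in nodes}
    indegree = {n: 0 for n in nodes}
    for u, v in arcs:
        children[u].append(v)
        indegree[v] += 1
    # Kahn's algorithm: a full topological order exists iff there is no cycle.
    remaining = dict(indegree)
    queue = deque(n for n in nodes if remaining[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            remaining[v] -= 1
            if remaining[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        return "digraph"  # contains a directed cycle
    if all(d <= 1 for d in indegree.values()):
        return "forest"  # acyclic, and every node has at most one parent
    # Multitree: no ordered pair of nodes is joined by two distinct directed paths.
    for source in nodes:
        paths = {n: 0 for n in nodes}
        paths[source] = 1
        for u in order:  # propagate path counts in topological order
            for v in children[u]:
                paths[v] += paths[u]
        if any(count > 1 for n, count in paths.items() if n != source):
            return "DAG"
    return "multitree"


print(classify_digraph(["a", "b", "c"], [("a", "b"), ("a", "c")]))             # forest
print(classify_digraph(["p1", "p2", "c"], [("p1", "c"), ("p2", "c")]))         # multitree
print(classify_digraph(["a", "b", "c", "d"],
                       [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]))      # DAG
print(classify_digraph(["a", "b"], [("a", "b"), ("b", "a")]))                  # digraph
```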
I don't think there's any more general term beyond that, other than just "graph". If you have a specific application for a graph, there might be a specialized term for the kind of graph you will use (and perhaps algorithms or even library code to go along with it), but you'd need to ask about that specifically.

What data structure to use for digraph paths?

I'm trying to represent a transitive relation (in a database) and having a hard time working out the best data structure.
Basically, the data structure is a series of pairs A → B such that if A → B and B → C, then implicitly A → C. It's important to me to be able to identify which entries are original input and which entries exist implicitly. Asking if A → C is equivalent to me having a digraph and asking if there exists a path from A to C in that digraph.
I could just store the original entries, but then it takes a lot of time to determine whether two items are related, since I need to search all possible paths, and this is rather slow.
Alternatively, I can store the original edges as well as a listing of all paths. This makes adding a new edge easy: when I add A → B, I can take the Cartesian product of the paths ending in A and the paths starting at B and join them together. This has a significant space overhead of O(n²) in the worst case, but has the nice property that lookups, by far the most common operation, will be constant time. The issue is deletion, where I cannot think of anything other than recalculating all paths that may or may not run through the deleted edge, and this can be really nasty.
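A minimal in-memory sketch of the second approach, assuming Python sets stand in for the database tables; the class and method names are hypothetical. When A → B is added, the new implied pairs are every node that reaches A (or A itself) paired with every node reachable from B (or B itself). Deletion here simply rebuilds the closure from the remaining original pairs, which is exactly the expensive case the question is worried about.

```python
from collections import defaultdict


class TransitiveRelation:
    """Keeps the original pairs and the full transitive closure side by side."""

    def __init__(self):
        self.original = set()          # pairs entered explicitly
        self.succ = defaultdict(set)   # succ[x]: every y with x -> y, original or implied
        self.pred = defaultdict(set)   # pred[y]: every x with x -> y

    def related(self, a, b):
        # Reflexive pairs are implicit, as in the question; this is a set membership test.
        return a == b or b in self.succ.get(a, ())

    def is_original(self, a, b):
        return (a, b) in self.original

    def add(self, a, b):
        self.original.add((a, b))
        if self.related(a, b):
            return  # already implied, so the closure does not change
        # Everything that reaches a (and a itself) now reaches b and everything b reaches.
        sources = self.pred.get(a, set()) | {a}
        targets = self.succ.get(b, set()) | {b}
        for x in sources:
            for y in targets:
                if x != y:
                    self.succ[x].add(y)
                    self.pred[y].add(x)

    def remove(self, a, b):
        # Naive deletion: rebuild the closure from the remaining original pairs.
        remaining = self.original - {(a, b)}
        self.original, self.succ, self.pred = set(), defaultdict(set), defaultdict(set)
        for x, y in remaining:
            self.add(x, y)


r = TransitiveRelation()
r.add("A", "B")
r.add("B", "C")
print(r.related("A", "C"), r.is_original("A", "C"))  # True False
r.remove("B", "C")
print(r.related("A", "C"))  # False
```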
Does anyone have any better ideas?
Technical notes: the digraph may be cyclic, but the relation is reflexive so I don't need to represent the reflexivity or store anything about it.
This is called the Reachability problem.
It would seem that you want an efficient online algorithm, which is an open problem, and an area of much research.
See my similar question on cs.SE: An incrementally-condensed transitive-reduction of a DAG, with efficient reachability queries, where I reference several related questions across Stack Exchange:
Related:
What is the fastest deterministic algorithm for dynamic digraph reachability with no edge deletion?
What is the fastest deterministic algorithm for incremental DAG reachability?
Does an algorithm exist to efficiently maintain connectedness information for a DAG in presence of inserts/deletes?
Is there an online-algorithm to keep track of components in a changing undirected graph?
Dynamic shortest path data structure for DAG
Note that even if an algorithm only works on a DAG, it still applies to a cyclic graph as long as it supports condensation, that is, collapsing each strongly connected component into a single node (the members of a component are considered equal, since they relate to each other in both directions). After condensation, you can query the graph using the representative node in place of any of the condensed nodes, because those nodes are all reachable from each other and are therefore related to the rest of the graph in exactly the same way.
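A sketch of the condensation step, assuming the graph fits in memory as a node list and an arc list that mentions only those nodes. Kosaraju's two-pass algorithm finds the strongly connected components, each component is represented by one of its members, and a plain depth-first search answers reachability on the resulting DAG. The function names are hypothetical, and the query is an ordinary O(V + E) search, not one of the efficient dynamic algorithms discussed in the linked questions.

```python
from collections import defaultdict


def condense(nodes, arcs):
    """Return (rep, dag_arcs): rep maps each node to its component's representative."""
    fwd, rev = defaultdict(list), defaultdict(list)
    for u, v in arcs:
        fwd[u].append(v)
        rev[v].append(u)
    # Pass 1: iterative DFS, recording nodes in order of completion.
    visited, finish = set(), []
    for s in nodes:
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:  # all successors of u are done
                stack.pop()
                finish.append(u)
    # Pass 2: DFS on the reversed graph in reverse finishing order; each tree is one SCC.
    rep = {}
    for s in reversed(finish):
        if s in rep:
            continue
        rep[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for v in rev[u]:
                if v not in rep:
                    rep[v] = s
                    stack.append(v)
    dag_arcs = {(rep[u], rep[v]) for u, v in arcs if rep[u] != rep[v]}
    return rep, dag_arcs


def reachable(rep, dag_arcs, a, b):
    """Is b reachable from a? Answered on the condensed DAG via the representatives."""
    succ = defaultdict(list)
    for u, v in dag_arcs:
        succ[u].append(v)
    start, goal = rep[a], rep[b]
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        for v in succ[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return False


rep, dag = condense(["a", "b", "c", "d"], [("a", "b"), ("b", "a"), ("b", "c"), ("d", "c")])
print(rep["a"] == rep["b"], reachable(rep, dag, "a", "c"), reachable(rep, dag, "c", "a"))  # True True False
```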
My conclusion is that, as of yet, there does not seem to be an efficient way to do this (on the order of O(log n) queries for a dynamic graph, with output-sensitive update times on the condensed graph). For less efficient ways, see the related links above.
The closest practical algorithm I found was here (source), which is an interesting read. I am not sure how easy or practical it would be to adapt this data structure, or any data structure from the papers you will find, to a database.
PS. Consider asking CS-related questions on cs.stackexchange.com in the future.

Finding a minimum/maximum weight Steiner tree

I asked this question on Reddit, but haven't converged on a solution yet. Since many of my searches bring me to Stack Overflow, I decided I would give this a try. Here is a simple formulation of my problem:
Given a weighted undirected graph G(V,E,w) and a subset of vertices S in G, find the min/max weight tree that spans S. Adding vertices is not allowed. An extension of the basic model is adding edges with 0 weight, and vertices that must be excluded. This seems similar to the question asked here:
Algorithm to find minimum spanning tree of chosen vertices
I also know more about what values the edges can take: each edge is actually a correlation probability, which I can encode in several ways, so the main questions I want to ask the graph are:
Given k vertices that must be connected, what are the top X min/max spanning trees that connect them, and what vertices do they pass through? As I understand it, this is the same as asking the graph for the highest-probability way of connecting all k vertices.
Getting more vague, is there a logical way to cluster the nodes?
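Since the problem is NP-hard in general (as the update below notes), an approximation is the usual fallback. Below is a sketch of one standard heuristic, the metric-closure/MST approach of Kou, Markowsky and Berman, which stays within a factor of 2 of the optimum. It assumes each edge weight is a probability p in (0, 1] and that probabilities along a tree multiply; taking w = -log p then turns "highest-probability tree" into "minimum-weight tree", because -log of a product is the sum of the -logs. It returns a single tree, not the top X. The function names are hypothetical, the graph is assumed connected with no parallel edges, terminals are passed as a list, and node labels are assumed mutually comparable (for example, all strings).

```python
import heapq
import math
from collections import defaultdict


def dijkstra(adj, source):
    """Shortest-path distances and predecessors from source; adj[u] = [(v, w), ...]."""
    dist, prev = {source: 0.0}, {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def kruskal(nodes, edges):
    """Minimum spanning forest over (weight, u, v) tuples, using union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree


def steiner_approx(prob_edges, terminals):
    """Approximate max-probability Steiner tree as a set of frozenset({u, v}) edges."""
    adj, weight = defaultdict(list), {}
    for u, v, p in prob_edges:
        w = -math.log(p)  # high probability -> low weight
        adj[u].append((v, w))
        adj[v].append((u, w))
        weight[frozenset((u, v))] = w
    # 1. Shortest paths from every terminal (the metric closure on the terminals).
    sp = {t: dijkstra(adj, t) for t in terminals}
    closure = [(sp[a][0][b], a, b)
               for i, a in enumerate(terminals) for b in terminals[i + 1:]]
    # 2. MST of the closure, 3. expanded back into paths of the original graph.
    used = set()
    for _, a, b in kruskal(terminals, closure):
        prev, node = sp[a][1], b
        while node != a:
            used.add(frozenset((prev[node], node)))
            node = prev[node]
    # 4. MST of the expanded subgraph, which removes any cycles the paths created.
    sub_nodes = {n for e in used for n in e}
    sub_edges = [(weight[e], *sorted(e)) for e in used]
    tree = {frozenset((u, v)) for _, u, v in kruskal(sub_nodes, sub_edges)}
    # 5. Repeatedly prune leaves that are not terminals.
    keep = set(terminals)
    pruned = True
    while pruned:
        pruned = False
        degree = defaultdict(int)
        for e in tree:
            for n in e:
                degree[n] += 1
        for e in list(tree):
            if any(degree[n] == 1 and n not in keep for n in e):
                tree.discard(e)
                pruned = True
                break
    return tree


edges = [("A", "B", 0.9), ("B", "C", 0.9), ("B", "E", 0.9),
         ("A", "C", 0.5), ("C", "E", 0.5), ("A", "E", 0.5), ("A", "D", 0.99)]
tree = steiner_approx(edges, ["A", "C", "E"])
print(sorted(tuple(sorted(e)) for e in tree))  # [('A', 'B'), ('B', 'C'), ('B', 'E')]
```

In the example the tree routes through the non-terminal B, which answers the "what vertices do they pass through" part of the question, and D is left out because it is not needed to connect the terminals.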
As for implementation, I have the Boost libraries installed, and once I get the framework rolling on this problem, I can work out how to multi-thread it (if appropriate), what kind of graph to use, and how to store/cache the data, since the number of vertices and edges is going to be quite large.
Update
Looking at the problem I am trying to solve, it makes sense that it would be NP-complete. The real world problem that I am trying to solve involves medical diagnoses; specifically when the medical community is working on a problem with a specific idea in mind, and they need to take a step back and reconsider how they got there. What I want from the program I am trying to design is:
Given several conditions, tests, symptoms, age, gender, season, confirmed diagnosis, timeline, how can you relate them? What cells/tissues/organs/systems are touched? Are they even related?
Along with the defined groups that conditions/symptoms can belong to, is there a way to logically group the conditions/symptoms?
Example
Flu-like symptoms, red eyes, early pneumonia, and some of the signs of diabetes. Is there a way to relate all of the symptoms? Are there some tests that could be done to make it easier to determine? What systems are involved?
It just seemed natural to try and map this to a graph, or several graphs, and use probabilities as the correlation between different symptoms/conditions.
I have seen models for your problem that were mostly based on Bayesian inference and fuzzy logic. Bayesian inference networks express the relation between causes and effects, e.g. smoking and lung cancer. Look here for a quick tutorial. You can apply fuzzy logic to that modelling to try to take into account the variability of real life (not every smoker gets lung cancer).
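To make the Bayesian-network idea concrete, here is a minimal sketch of exact inference by enumeration on a three-node chain, Smoking → Cancer → Cough. Every probability in it is an invented placeholder, not medical data, and the variable and function names are hypothetical; a real diagnostic model would have many more nodes and would normally use a dedicated library rather than brute-force enumeration.

```python
from itertools import product

# Hypothetical conditional probability tables (made-up numbers for illustration only).
P_SMOKER = 0.3                                 # P(S = True)
P_CANCER_GIVEN_S = {True: 0.10, False: 0.01}   # P(C = True | S)
P_COUGH_GIVEN_C = {True: 0.80, False: 0.20}    # P(Cough = True | C)


def joint(s, c, cough):
    """P(S=s, C=c, Cough=cough), factorised along the chain Smoking -> Cancer -> Cough."""
    p = P_SMOKER if s else 1 - P_SMOKER
    pc = P_CANCER_GIVEN_S[s]
    p *= pc if c else 1 - pc
    pk = P_COUGH_GIVEN_C[c]
    p *= pk if cough else 1 - pk
    return p


def posterior_cancer(cough, smoker=None):
    """P(C = True | evidence), summing the joint over the unobserved variables."""
    num = den = 0.0
    for s, c in product([True, False], repeat=2):
        if smoker is not None and s != smoker:
            continue  # inconsistent with the evidence
        p = joint(s, c, cough)
        den += p
        if c:
            num += p
    return num / den


print(round(posterior_cancer(cough=True), 3))               # 0.133
print(round(posterior_cancer(cough=True, smoker=True), 3))  # 0.308
```

Observing the cough raises the (made-up) prior of 0.037 to about 0.133, and additionally knowing the patient smokes raises it to about 0.308; this is the kind of cause-and-effect reasoning the network encodes.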
