Data Structure for troubleshooting flow? - data-structures

What data structure would be best for a non-tree (in the graph theory sense), path-saved, troubleshooting flow for a web troubleshooting site?
In plainer words, if I want to model a troubleshooting flow that does not move strictly "downward" towards a resolution or contact point, is there a more specific structure than just a graph? I'm not worried about efficiency; I'm worried about ease of description. The goal is for this to be defined in YAML or some other markup that non-programmers can implement and maintain.

A graph: each node represents a specific state of knowledge (what is known, for example the result of a test that partially identifies the problem), and directed edges add knowledge to arrive at a different node. Optionally, use undirected edges to "unknow" or forget a previous assumption about the problem at hand.
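As a rough sketch of that idea (not a definitive design), the flow could be a mapping from node ids to a prompt plus labelled outgoing edges, which is a shape that translates directly into a YAML mapping. The node ids, prompts, and the `walk` and `back` helpers below are all made up for illustration.

```python
# Hypothetical flow: each node is a state of knowledge, and each entry in
# "next" is a directed edge labelled with the answer that adds knowledge.
FLOW = {
    "start": {
        "prompt": "Is the device powered on?",
        "next": {"yes": "check_network", "no": "plug_in"},
    },
    "plug_in": {
        "prompt": "Plug the device in and press the power button.",
        "next": {"done": "start"},  # edge back "up": this is why it is not a tree
    },
    "check_network": {
        "prompt": "Is the network light green?",
        "next": {"yes": "resolved", "no": "contact_support"},
    },
    "resolved": {"prompt": "Problem solved.", "next": {}},
    "contact_support": {"prompt": "Please contact support.", "next": {}},
}


def walk(flow, start, answers):
    """Follow the given answers from start and return the saved path of node ids."""
    path = [start]
    for answer in answers:
        path.append(flow[path[-1]]["next"][answer])
    return path


def back(path):
    """'Unknow' the most recent step by dropping it from the saved path."""
    return path[:-1] if len(path) > 1 else path
```

Keeping the path as a plain list of node ids means "forgetting" an assumption is just popping the last entry, so explicit undirected edges may not be needed in the markup itself.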

Related

Travelling around a graph and making predictions along the way

I'm a student doing research and am having trouble proceeding because I don't have much experience in this field, so even if you can't answer the questions, pointers to the important terms I should be searching for would be a big help.
If we have an agent travelling around a graph, going from node to node, are there algorithms (and if so, the names of algorithms) that do the following:
Predict the topology of the graph using Bayesian statistics, assuming the graph is finite and that what has been seen before increasingly represents what will be seen in the future.
Predict labellings from previous labellings. So if we have a chain of 26 nodes, with the first node labelled A, the second B, and so on, then at some point it should be possible to predict that the labellings are in alphabetical order long before we reach the end.
Use the labellings to predict the topology of the graph; so if I have a graph where so far it is a chain and the labellings are clearly alphabetical order, then a good guess for the rest of the graph would be that it is a chain.
Relevant API would be fantastic

Implementing shortcuts (reach) pruning while using A*

I am working on a project for shortest path finding. I have looked at a lot of resources online to come up with a good algorithm.
I am working with OpenStreetMap data and it's clear to me that I have to use the A* algorithm.
While looking for different solutions, I have found that because a way is made of different nodes, one can prune away the intermediate nodes that are not junctions.
How can I do this in a programming language? If anyone has an idea or a further article that can help me, I would be really grateful.
The exact information I found about this pruning that's relevant to OSM is this:
parse all ways a second time; a way will normally become one edge,
but if any nodes apart from the first and the last have a link counter
greater than one, then split the way into two edges at that point.
Nodes with a link counter of one and which are neither first nor last
can be thrown away unless you need to compute the length of the edge.
Have a look at the GraphHopper project (of which I'm the author) or other OSM routing projects that already do this. The idea is to count the number of ways each node is a member of and mark nodes as junctions if they have a count of two or more (or just one if it is a 'junction' at the end of a way).
The nodes in between should still be accessible, as you need them to plot the final route after calculating it. In GraphHopper we call them pillar nodes (nodes between junctions) and tower nodes (junctions). Here is more detailed information.
Another problem is that you have to calculate GPS-precise routes and not just routes from junction to junction. See this change for how we fixed this via virtual nodes and edges.
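The two-pass idea from the quoted description could be sketched roughly like this. It is a minimal illustration, not GraphHopper's actual code: the `split_ways` name and the plain list-of-node-ids input are assumptions, and the in-between (pillar) nodes are kept on each edge so the route geometry can still be plotted.

```python
from collections import Counter


def split_ways(ways):
    """Split each way (a list of node ids) into edges between junction nodes.

    Returns a list of (start, end, pillar_nodes) tuples.
    """
    # First pass: count how many times each node is referenced by any way.
    links = Counter(node for way in ways for node in way)

    edges = []
    # Second pass: cut each way at interior nodes whose counter exceeds one.
    for way in ways:
        if len(way) < 2:
            continue  # a single node cannot form an edge
        start, pillars = way[0], []
        for node in way[1:-1]:
            if links[node] > 1:
                edges.append((start, node, pillars))
                start, pillars = node, []
            else:
                pillars.append(node)  # kept only for geometry and length
        edges.append((start, way[-1], pillars))
    return edges
```

For example, `split_ways([[1, 2, 3, 4, 5], [6, 3, 7]])` cuts both ways at the shared node 3, so A* only has to consider nodes 1, 3, 5, 6 and 7.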

Finding a minimum/maximum weight Steiner tree

I asked this question on reddit, but haven't converged on a solution yet. Since many of my searches bring me to Stack Overflow, I decided I would give this a try. Here is a simple formulation of my problem:
Given a weighted undirected graph G(V,E,w) and a subset of vertices S in G, find the min/max weight tree that spans S. Adding vertices is not allowed. An extension of the basic model is adding edges with 0 weight, and vertices that must be excluded. This seems similar to the question asked here:
Algorithm to find minimum spanning tree of chosen vertices
There is also more insight into what values the edges can take. Each edge is actually a correlation probability, which I can encode in several ways, so the main questions I want to ask the graph are:
Given k vertices that must be connected, what are the top X min/max spanning trees that connect them, and what vertices do they pass through? As I understand it, this is the same question as asking the graph what is the highest probability of connecting all of the k vertices.
Getting more vague, is there a logical way to cluster the nodes?
As for implementation, I have the boost libraries installed, and once I get the framework rolling on this problem, I can deal with how to multi-thread it (if appropriate), what kind of graph to use, and how to store/cache the data, since the number of vertices and edges is going to be quite large.
Update
Looking at the problem I am trying to solve, it makes sense that it would be NP-complete. The real world problem that I am trying to solve involves medical diagnoses; specifically when the medical community is working on a problem with a specific idea in mind, and they need to take a step back and reconsider how they got there. What I want from the program I am trying to design is:
Given several conditions, tests, symptoms, age, gender, season, confirmed diagnosis, timeline, how can you relate them? What cells/tissues/organs/systems are touched? Are they even related?
Along with the defined groups that conditions/symptoms can belong to, is there a way to logically group the conditions/symptoms?
Example
Flu-like symptoms, red eyes, early pneumonia, and some of the signs of diabetes. Is there a way to relate all of the symptoms? Are there some tests that could be done to make it easier to determine? What systems are involved?
It just seemed natural to try and map this to a graph, or several graphs, and use probabilities as the correlation between different symptoms/conditions.
I have seen models for your problem that were mostly based on Bayesian inference and fuzzy logic. Bayesian inference networks express the relation between causes and effects, e.g. smoking and lung cancer. Look here for a quick tutorial. You can apply fuzzy logic to that modelling to try to take into account the variability in real life (as not everyone gets lung cancer).
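To give a feel for the smallest possible case, here is a sketch of Bayes' rule on a two-node cause/effect network. The `posterior` function name is made up, and the probabilities in the usage line are invented for illustration only; they are not real medical figures.

```python
def posterior(prior, p_effect_given_cause, p_effect_given_not_cause):
    """Return P(cause | effect) by Bayes' rule for a two-node network."""
    # Total probability of observing the effect, with or without the cause.
    evidence = (p_effect_given_cause * prior
                + p_effect_given_not_cause * (1.0 - prior))
    return p_effect_given_cause * prior / evidence


# Invented numbers: P(smoker) = 0.2, P(cancer | smoker) = 0.1,
# P(cancer | non-smoker) = 0.01.
p_smoker_given_cancer = posterior(0.2, 0.1, 0.01)
```

A full diagnostic network would chain many such conditional tables together, which is where a dedicated Bayesian network library would probably be the more practical route.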

Implementing Kruskal's algorithm in Ada, not sure where to start

With reference to Kruskal's algorithm in Ada, I'm not sure where to start.
I'm trying to think through everything before I actually write the program, but am pretty lost as to what data structures I should be using and how to represent everything.
My original thought was to represent the full tree in an adjacency list, but according to Wikipedia the algorithm starts by creating a forest F (a set of trees) in which each vertex in the graph is a separate tree, and I'm not sure how to implement this without it getting really messy quickly.
The next thing it says to do is create a set S containing all the edges in the graph, but once again I'm not sure what the best way to do this would be. I was thinking of an array of records, with a to, from and weight, but I'm lost on the forest.
Lastly, I'm trying to figure out how I would know if an edge connects two trees, but again am not sure what the best way to do all of this is.
I can see where their algorithm description would leave you confused as to how to start. It left me the same way.
I'd suggest reading over the later Example section instead. That makes it pretty clear how to proceed, and you can probably come up with the data structures you would need to do it just from that.
It looks like the basic idea is the following:
Take the graph, find the shortest edge that joins two different trees (that is, one that does not create a cycle), and put it in your "spanning tree".
Repeat the step above until every vertex is connected.
The "create a forest" part really means: implement the pseudocode from the page Disjoint-set data structure. If you can read C++, then I have a pretty straightforward implementation here. (That implementation works, I've used it to implement Kruskal's algo myself :)
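To show how the forest and the "does this edge connect two trees" check fit together, here is a minimal sketch in Python rather than Ada. It assumes vertices are numbered from 0 and edges are given as (weight, u, v) records; the `kruskal` and `find` names are just illustrative. The same array-based layout should carry over to an Ada array of records fairly directly.

```python
def kruskal(num_vertices, edges):
    """Return the edges of a minimum spanning forest.

    edges is a list of (weight, u, v) with vertices numbered 0..num_vertices-1.
    """
    # The forest: parent[x] == x means x is the root of its own tree.
    parent = list(range(num_vertices))

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for weight, u, v in sorted(edges):  # cheapest edges first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:            # the edge joins two different trees
            parent[root_u] = root_v     # merge them into one tree
            tree.append((weight, u, v))
    return tree
```

The whole forest is a single integer array, so there is no need to build and merge explicit tree structures; two vertices are in the same tree exactly when `find` returns the same root for both.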

Algorithm to compute the optimal layout of n-ary tree?

I am looking for an algorithm that will automatically arrange all the nodes in an n-ary tree so that no nodes overlap and not too much space is wasted. The user will be able to add nodes at runtime and the tree must auto-arrange itself. Also note that the tree could get fairly large (a few thousand nodes).
The algorithm has to work in real time, meaning the user cannot notice any pausing.
I have tried Google but I haven't found any substantial resources, any help is appreciated!
I took a look at this problem a while back and decided ultimately to change my goals from a Directed acyclic graph (DAG) to a general graph only due to complexities of what I encountered.
That being said, have you looked at the Sugiyama algorithm for graph layout?
If you're not looking to roll your own, I came across yFiles, which did the job quite nicely (a bit on the pricey side though, so I did end up doing exactly that - rolling my own).
