Finding a minimum/maximum weight Steiner tree - algorithm

I asked this question on reddit, but haven't converged on a solution yet. Since many of my searches bring me to Stack Overflow, I decided I would give this a try. Here is a simple formulation of my problem:
Given a weighted undirected graph G(V,E,w) and a subset of vertices S in G, find the min/max weight tree that spans S. Adding vertices is not allowed. An extension of the basic model is adding edges with 0 weight, and vertices that must be excluded. This seems similar to the question asked here:
Algorithm to find minimum spanning tree of chosen vertices
There is also more insight into what values the edges can take. Each edge is actually a correlation probability, which I can encode in several ways, so the main questions I want to ask the graph are:
Given k vertices that must be connected, what are the top X min/max spanning trees that connect them, and what vertices do they pass through? As I understand it, this is the same question as asking the graph what is the highest probability of connecting all of the k vertices.
Getting more vague, is there a logical way to cluster the nodes?
As for implementation, I have the boost libraries installed, and once I get the framework rolling on this problem, I can deal with how to multi-thread it (if appropriate), what kind of graph to use, and how to store/cache the data, since the number of vertices and edges is going to be quite large.
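One way to make the "highest probability of connecting all k vertices" question concrete is sketched below in Python (not Boost). It assumes each edge probability p is independent and lies in (0, 1], so that maximizing a product of probabilities is the same as minimizing the sum of -log(p). It then applies a common metric-closure heuristic for Steiner trees: grow a tree over the required vertices, Prim-style, using shortest-path distances, and expand each chosen connection back into graph edges. The function names and the edge dictionary layout are hypothetical, the graph is assumed to be connected, and the result is an approximation, not an optimal Steiner tree. The returned edge set can contain a cycle when expanded paths overlap, so a final spanning-tree pass may still be needed.

```python
import heapq
import math


def dijkstra(adj, source):
    """Return (distances, predecessors) for shortest paths from source."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def steiner_heuristic(prob_edges, terminals):
    """Connect all terminals; returns a set of frozenset({u, v}) edges.

    prob_edges maps (u, v) -> probability in (0, 1] (hypothetical layout).
    """
    adj = {}
    for (u, v), p in prob_edges.items():
        w = -math.log(p)  # product of probabilities -> sum of weights
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    terminals = list(terminals)
    paths = {t: dijkstra(adj, t) for t in terminals}

    in_tree = {terminals[0]}
    chosen = set()
    while len(in_tree) < len(terminals):
        # Cheapest connection from the current tree to a missing terminal.
        _, a, b = min(
            (paths[a][0][b], a, b)
            for a in in_tree
            for b in terminals
            if b not in in_tree
        )
        # Expand the shortest path a -> b into real graph edges.
        prev = paths[a][1]
        node = b
        while node != a:
            parent = prev[node]
            chosen.add(frozenset((node, parent)))
            node = parent
        in_tree.add(b)
    return chosen


edges = {("A", "B"): 0.9, ("B", "C"): 0.9, ("A", "C"): 0.5, ("C", "D"): 0.8}
tree = steiner_heuristic(edges, ["A", "C"])
```

In this made-up example the route through B (0.9 * 0.9 = 0.81) beats the direct A-C edge (0.5), so B appears as an intermediate vertex even though it was not required.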
Update
Looking at the problem I am trying to solve, it makes sense that it would be NP-complete. The real world problem that I am trying to solve involves medical diagnoses; specifically when the medical community is working on a problem with a specific idea in mind, and they need to take a step back and reconsider how they got there. What I want from the program I am trying to design is:
Given several conditions, tests, symptoms, age, gender, season, confirmed diagnosis, timeline, how can you relate them? What cells/tissues/organs/systems are touched? Are they even related?
Along with the defined groups that conditions/symptoms can belong to, is there a way to logically group the conditions/symptoms?
Example
Flu-like symptoms, red eyes, early pneumonia, and some of the signs of diabetes. Is there a way to relate all of the symptoms? Are there some tests that could be done to make it easier to determine? What systems are involved?
It just seemed natural to try and map this to a graph, or several graphs, and use probabilities as the correlation between different symptoms/conditions.

I have seen models for your problem that were mostly based on Bayesian inference and fuzzy logic. Bayesian inference networks express the relation between causes and effects, e.g. smoking and lung cancer. Look here for a quick tutorial. You can apply fuzzy logic to that modelling to try to take into account the variability in real life (as not everyone who smokes gets lung cancer).
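As a minimal illustration of the Bayesian update behind such networks, here is a single application of Bayes' rule in Python. The numbers are invented purely for the example and are not medical data; the function name and parameters are hypothetical.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """P(H | E) by Bayes' rule for a binary hypothesis H."""
    evidence = (p_evidence_given_h * prior
                + p_evidence_given_not_h * (1.0 - prior))
    return p_evidence_given_h * prior / evidence


# Invented numbers: 1% prior for a condition, a symptom seen in 90% of
# patients with the condition and in 10% of patients without it.
p = posterior(prior=0.01,
              p_evidence_given_h=0.9,
              p_evidence_given_not_h=0.1)
```

A full network chains many such updates over a graph of causes and effects; this only shows the single-edge case.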

Related

Travelling around a graph and making predictions along the way

I'm a student doing research and am having trouble proceeding because I don't have much experience in this field, so even if you can't answer the questions but do know important terms I should be looking for that would be a big help.
If we have an agent travelling around a graph, going from node to node, are there algorithms (and if so, the names of algorithms) that do the following:
Predict the topology of the graph using bayesian statistics, which assumes the graph is finite and what has been seen before increasingly represents what will be seen in the future.
Predict labellings from previous labellings. So if we have a chain of 26 nodes, with the first node being A, the second B, and so on, then at some point you should be able to predict that the labellings are in alphabetical order long before we reach the end.
Use the labellings to predict the topology of the graph; so if I have a graph where so far it is a chain and the labellings are clearly alphabetical order, then a good guess for the rest of the graph would be that it is a chain.
Relevant API would be fantastic

matching two graphs with the lowest error

I have two graphs that I would like to match (I am not sure this is the word I'm looking for).
In my first graph nodes represent teams (node value represents the number of people in the team) and links represent how close teams are on a scale of 1 to 5. Two teams that work together a lot will have a stronger link than two teams that only sometimes work together.
In my second graph nodes represent spaces (node value represents the available places in the space) and links represent how close the spaces are. If two spaces are on the same floor they will have a stronger link than two spaces that are not on the same floor.
I need to distribute the teams in the available spaces minimizing the distance between each linked team (two teams that have a strong link would be at the same floor for example).
My first question is: do you have a magic recipe that would solve this problem?
My second question: if not, do you know in what direction I should look (an algorithm that could be reworked, lectures, articles...)?
Thank you very much.
Thoma
To answer the question in part: apparently there is no known polynomial-time algorithm to solve the problem, as it includes the graph isomorphism problem as a special case. Graph isomorphism is neither known to be NP-complete, nor has a polynomial-time algorithm for it been found.
More precisely, suppose that the room graph is exactly the team graph, where edges are not weighted. As an optimal solution would match each team to the corresponding room, an algorithm for the problem in the question would be able to identify the input graphs to be isomorphic.
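For very small instances, the objective from the question can be checked by brute force. The sketch below is in Python and rests on several assumptions that are not in the original post: each space hosts exactly one team, the closeness between spaces has already been converted to a distance (smaller means closer), and the cost is the sum of team closeness times the distance between the assigned spaces. All names and numbers are hypothetical. The search tries every injective assignment, so it grows factorially and is only useful to validate a heuristic or solver on toy data.

```python
from itertools import permutations


def best_assignment(team_size, closeness, capacity, distance):
    """Exhaustively search team -> space assignments (one team per space).

    closeness maps (team_a, team_b) -> link strength (1..5).
    distance maps frozenset({space_a, space_b}) -> distance.
    Returns (cost, mapping), or (None, None) if nothing fits.
    """
    teams = list(team_size)
    spaces = list(capacity)
    best_cost, best_map = None, None
    for perm in permutations(spaces, len(teams)):
        mapping = dict(zip(teams, perm))
        # Skip assignments where a team does not fit in its space.
        if any(team_size[t] > capacity[s] for t, s in mapping.items()):
            continue
        cost = sum(
            w * distance[frozenset((mapping[a], mapping[b]))]
            for (a, b), w in closeness.items()
        )
        if best_cost is None or cost < best_cost:
            best_cost, best_map = cost, mapping
    return best_cost, best_map


team_size = {"dev": 4, "ops": 3, "sales": 5}
closeness = {("dev", "ops"): 5, ("dev", "sales"): 1}
capacity = {"r1": 5, "r2": 4, "r3": 5}
distance = {
    frozenset(("r1", "r2")): 1,  # same floor
    frozenset(("r1", "r3")): 3,
    frozenset(("r2", "r3")): 3,
}
cost, mapping = best_assignment(team_size, closeness, capacity, distance)
```

Here the strongly linked teams dev and ops end up in the two spaces on the same floor, and sales takes the remaining space that is large enough.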
After talking with some people, it seems that it might not be the best solution.
I will look in the direction of solvers, to be able to define constraints.
Thank you.

Find all cycles in a directed Graph Golang

I'm trying to generate all the cycles contained in a directed graph using Golang (or at least a few).
I currently have two structs :
Node : { ID (string), resolved (bool), edges ([]Edge) }
Edge : { ID (string), start (Node), end (Node), weight (Float64)}
The cycle weight is not an issue (for the moment).
I've found some answers regarding how to detect cycles, find the shortest path, etc., but I didn't find an algorithm that quite helps me.
How shall I proceed? (any suggestion is welcome)
There are two parts to the question.
Regarding algorithms to detect all cycles in a graph, take a look at this related question (it is not Go-specific). It has useful explanations and pseudo-code that you can use to implement your solution.
Finding all cycles in a directed graph
As for Go-specific code, there are several libraries out there that work with graphs. You can take a look at their documentation and source code (they might even provide functionality that you can use out of the box to solve your problem).
For example: https://godoc.org/github.com/twmb/algoimpl/go/graph
I would suggest starting with defining what a cycle is - for example let's suppose it is a traversal through the graph that starts and ends in the same node.
To enumerate all cycles with this definition, you'll need to consider all paths starting from all nodes, and check if any of those paths go back to their start point.
However, observe that this definition can count each cyclical subgraph many times: every node along a cyclical path is a valid start point, so is that one cycle or several? Things get even more complicated if the paths of several cycles intersect. The number of cyclical paths increases drastically, and it's not very clear which cycles are "the same".
I hope it's easy to see that a brute force approach is intractable for anything but very small and simple graphs, and that something concerned with, say, minimal cycles, or even just identifying cyclic subgraphs, may be enough for your purposes.
As already mentioned by @eugenioy, this has been asked before and you can probably narrow down your question by looking at the answers in that thread.
So, depending on what you mean by "all" and what you mean by "cycles", you can probably find an algorithm that defines cycles in the way you are interested in, and then ask a more focused question if you're having trouble translating it to Go, which I don't think your question is really about at the moment.
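To make the "paths that return to their start" definition concrete, here is a small sketch in Python, not Go. It assumes a plain adjacency-list dictionary with comparable node IDs and no duplicate edges, instead of the Node/Edge structs from the question. To avoid reporting the same cycle once per start node, it only accepts a cycle when it is discovered from its smallest node. It is a simple backtracking search with exponential worst-case cost, and more efficient enumeration methods exist (Johnson's algorithm, for example).

```python
def simple_cycles(graph):
    """Enumerate simple cycles of a directed graph.

    graph maps node -> list of successor nodes. Each cycle is returned
    once, as a list of nodes starting at the cycle's smallest node.
    """
    cycles = []

    def dfs(start, node, path, on_path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path[:])  # closed a cycle back to start
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than start, so every cycle is
                # found exactly once, from its smallest node.
                path.append(nxt)
                on_path.add(nxt)
                dfs(start, nxt, path, on_path)
                on_path.remove(nxt)
                path.pop()

    for start in sorted(graph):
        dfs(start, start, [start], {start})
    return cycles


g = {"a": ["b"], "b": ["c", "a"], "c": ["a"]}
found = simple_cycles(g)
```

For this graph the search reports two cycles, a-b-c and a-b, each only once even though every node on them could serve as a start point.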

How to divide a connected weighted graph to N semi-equal subgraphs

I have a graph of many hundreds of nodes that are mostly connected with each other. I can process the entire graph, but it takes a lot of time, so I would like to divide it into smaller sub-graphs of approximately similar size.
In other words, I have a collection of aerial images and I do pairwise image matching on all of them. As a result I get a set of matches for each pair (a pixel from the first image matched with a pixel in the second image). The number of matches is the weight of the (undirected) edge between the two images. These edges then form the graph mentioned above.
I'm not so familiar with graph theory (as it is a very broad topic). What is the best algorithm for this job?
Thank you.
Edit:
This problem has a perfect analogy which I think is easier to understand. Imagine you have a set of people and their connections/friendships, like a social network. Each friendship has a numeric value/weight representing how good friends they are. So in a large group of people I want to get the k most interconnected sub-groups.
Unfortunately, the problem you're describing is almost certainly NP-hard. From a graph perspective, you have a graph where each edge has a weight on it. You're trying to split the graph into relatively equal pieces while minimizing the total weight of the edges cut. Without the size requirement, this is the minimum k-cut problem, which is NP-hard when k is part of the input. If you also require the pieces to be roughly even in size, you have the balanced graph partitioning problem, which is also NP-hard.
The good news is that there are nice approximation algorithms for these problems, so if you're looking for solutions that are just "good enough," then you can probably find a library somewhere that implements them. There are also other techniques like spectral clustering which work well in practice and are really fast, but which don't have any guarantees on how well they'll do.
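Whatever algorithm or library ends up producing the partition, the two quantities being traded off are easy to measure. The Python sketch below only evaluates a candidate partition (total weight of cut edges and the size of each piece). It does not search for a good one, and the data layout and names are assumptions made for the example.

```python
def cut_weight(edges, part_of):
    """Total weight of edges whose endpoints lie in different parts.

    edges maps (u, v) -> weight; part_of maps node -> part label.
    """
    return sum(w for (u, v), w in edges.items() if part_of[u] != part_of[v])


def part_sizes(part_of):
    """Number of nodes in each part, to check how balanced the split is."""
    sizes = {}
    for label in part_of.values():
        sizes[label] = sizes.get(label, 0) + 1
    return sizes


# Hypothetical image-matching graph: weights are match counts.
edges = {("a", "b"): 10, ("b", "c"): 1, ("c", "d"): 8, ("a", "d"): 2}
good = {"a": 0, "b": 0, "c": 1, "d": 1}
bad = {"a": 0, "b": 1, "c": 1, "d": 0}
good_cut = cut_weight(edges, good)
bad_cut = cut_weight(edges, bad)
```

Both candidate splits are perfectly balanced, but the first one separates only weakly matched image pairs, which is what the partitioning algorithms above try to achieve.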

Graph Topology Profiling

Can anyone suggest some algorithms that can be used to classify the topology of a graph?
Input: Adjacency list with raw graph information.
Output : What kind of graph is it? Currently I want to focus only on Pure Types - Daisy chain, Mesh, Ring, Star, Tree.
Which area of algorithm study covers this kind of algorithm? Is it Computational Geometry?
Edit - The size of graph will not exceed 32 nodes. However, there will be redundant links between nodes.
Edit - I understand that my question might be too broad, but at least give me the clue of what is wrong with the question before down-voting it. Or is it because of my reputation :-(
Start by checking that your graph is connected (every node can be reached from every other node).
Then, check the distribution of the nodes' degree:
Ring: All nodes would have degree 2
Daisy chain: all nodes would have degree 2 except for 2 nodes with degree 1 (there are alternative definitions for what a daisy chain is).
Star: Each node would have degree 1, except for one node with degree n-1
Tree: The sum of the degrees is 2*(number of nodes-1). Also, if the highest degree is k, then there are at least k nodes with degree 1.
Mesh: Anything goes...
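The checks above can be sketched in Python as follows. This is only one reading of the list: the adjacency structure is assumed to be a non-empty dictionary of neighbour sets (which also collapses redundant parallel links), a daisy chain is taken to be a simple path, and a star is only reported for four or more nodes because a three-node star is indistinguishable from a three-node chain. The function name and return labels are hypothetical.

```python
from collections import deque


def classify(adj):
    """Classify an undirected graph given as node -> set of neighbours."""
    n = len(adj)

    # Connectivity check with a breadth-first search.
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    if len(seen) != n:
        return "disconnected"

    degrees = sorted(len(neighbours) for neighbours in adj.values())
    edge_count = sum(degrees) // 2

    if n >= 3 and all(d == 2 for d in degrees):
        return "ring"
    if edge_count == n - 1:  # connected with n-1 edges: a tree
        if n >= 2 and degrees[-1] <= 2:
            return "daisy chain"
        if n >= 4 and degrees[-1] == n - 1:
            return "star"
        return "tree"
    return "mesh"


ring = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {2, 0}}
chain = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
kinds = [classify(g) for g in (ring, chain, star)]
```

With at most 32 nodes, as stated in the question, the cost of these checks is negligible.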
I don't think there is an 'area' of algorithms that deals with such problems, but the term 'graph classes' is quite common (see for example here), though it is not a formal term.
To classify a new instance, you need a classification system in the first place!
Putting it another way, your graph (the item to classify) fits somewhere in some kind of data structure of graph topologies (the classification system). The system could be as simple as a list; in which case, you carry out the simple algorithm outlined in this other post where the list of topologies is keyed by degree distribution.
A more complex system could be a hierarchical one, similar to biological classification systems. This would only really be necessary for very large numbers of graph topologies, where it would make it faster to classify based on a series of decisions. Essentially a decision tree.
It may be difficult to find much research in this area (for pure graphs) as it's a little hard to think of applications. There are applications for protein fold topologies, but that may not be of interest.

Resources