I have a computational problem where, given a set of observations, I want to determine the minimum set of phenomena (explanations) that account for all observations. Phenomena can cause one another; that is, all phenomena can be represented as an unweighted directed graph with causal relationships as edges.
I am given the following:
An exhaustive list of all possible observations O_1 ... O_N
An exhaustive list of all possible phenomena (causes/explanations) C_1 ... C_M
For each observation O_i, a list of the phenomena that can cause it
For each phenomenon C_j, a list of the other phenomena that can cause it
The problem is represented below in graph form (sorry for the quality of the picture). Each node is a phenomenon, and each edge represents a causal relationship between phenomena. Edges are unweighted. Each region outlined by a bigger "bubble" represents a possible observation, with all phenomena lying within the bubble being the subset of phenomena that are known to cause that observation.
The problem, restated, is to find the shortest path that crosses all regions in the graph. (For simplicity, assume there is a unique path that explains all observations - no branching, no need for multiple paths).
My questions are as follows:
Is this a known computational problem, or a variant of a known computational problem?
Are there known algorithms for solving this specific problem (beyond just "use existing shortest path" algorithms)?
If not, how should I approach this problem? Specifically, how do I decompose the problem into simpler (i.e. simple shortest path) problems?
If it helps regarding computational feasibility, the number of observations is on the order of 10,000 and the number of possible phenomena on the order of 100,000.
A phenomenon that causes neither an observation nor another phenomenon won't appear in any minimal answer, so we can assume there aren't any. In other words, pass one of any algorithm is to get rid of these "useless" phenomena.
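A minimal sketch of that pruning pass, assuming the causal relationships are stored as a dictionary from each phenomenon to the set of things it can cause. The names `causes`, `observations` and `prune_useless` are hypothetical and not part of the question. The sketch applies the rule repeatedly by walking backwards from the observations, so a phenomenon whose only effects are themselves useless is dropped as well.

```python
from collections import deque

def prune_useless(causes, observations):
    """Return the phenomena from which at least one observation is reachable.

    causes: dict mapping a phenomenon to the set of phenomena/observations
            it can cause (an assumed representation).
    observations: iterable of observation labels.
    """
    # Build the reverse graph: effect -> set of direct causes.
    caused_by = {}
    for cause, effects in causes.items():
        for effect in effects:
            caused_by.setdefault(effect, set()).add(cause)

    # Walk backwards from the observations; everything reached is useful.
    useful = set()
    queue = deque(observations)
    while queue:
        node = queue.popleft()
        for cause in caused_by.get(node, ()):
            if cause not in useful:
                useful.add(cause)
                queue.append(cause)
    return useful

causes = {
    "C1": {"C2"},
    "C2": {"O1"},
    "C3": {"C4"},        # C4 causes nothing, so C3 and C4 are both useless
    "C4": set(),
    "C5": {"O2", "C2"},
}
kept = prune_useless(causes, ["O1", "O2"])
print(sorted(kept))  # ['C1', 'C2', 'C5']
```

Each edge is examined once, so this pass is linear in the size of the graph.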
With that assumption, we treat observations just like any other vertex. Since observations cause nothing, all observations are leaf vertices. Since every phenomenon causes something (see pass one above), no phenomenon is a leaf vertex. Thus we can simplify the problem statement and simply talk about the leaf vertices of a directed graph.
In general, there's going to be no single path that hits at least one branch vertex of each leaf. Instead, a better way of posing the problem is to seek some kind of minimal graph that spans all the leaf vertices, but need not span the phenomena.
This is a variation on the Steiner tree problem in graphs. Its decision version is NP-complete, and so are most variations. The best hope you've got is something good enough, that is, an approximation algorithm.
You don't state this assumption explicitly, but it seems like you're assuming that there's no cyclic causation of phenomena (such as A causes B causes C causes A again). In this case your problem is on a directed acyclic graph, but that doesn't help. The directed problem is at least as hard as the undirected one.
This is a set-cover problem combined with the Hamiltonian path problem. Let me explain: since each phenomenon is related to a group of observations, you can look at each phenomenon as a set in the set-cover problem. We need to check each group of phenomena which together cover all observations, to see if a Hamiltonian path exists for this group, that is, a simple path which includes all the phenomena in the group.
One approach is to find the smallest set cover (= group of phenomena) and check if a Hamiltonian path exists for this group. Then continue to the next (equal or larger) set cover, and do the same check, and so on until we find a set cover which has a Hamiltonian path. This will be the smallest group of phenomena which covers all observations and has a simple path going over all phenomena in the group.
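The search described above can be sketched by brute force. This is only an illustration for tiny inputs, since both the enumeration of covers and the Hamiltonian path check are exponential; it would not scale to the sizes mentioned in the question. The names `covers`, `edges` and `smallest_explaining_path` are hypothetical: `covers` maps each phenomenon to the observations it can cause, and `edges` is a set of directed (cause, effect) pairs between phenomena.

```python
from itertools import combinations, permutations

def has_hamiltonian_path(group, edges):
    """True if some ordering of `group` follows directed edges step by step."""
    return any(
        all((a, b) in edges for a, b in zip(order, order[1:]))
        for order in permutations(group)
    )

def smallest_explaining_path(covers, edges, observations):
    """Try covers in order of increasing size; return the first one that
    also admits a Hamiltonian path, or None if there is no such group."""
    needed = set(observations)
    phenomena = sorted(covers)
    for size in range(1, len(phenomena) + 1):
        for group in combinations(phenomena, size):
            explained = set().union(*(covers[p] for p in group))
            if needed <= explained and has_hamiltonian_path(group, edges):
                return group
    return None

# Assumed toy instance: B explains nothing but links A to C.
covers = {"A": {"O1"}, "B": set(), "C": {"O2"}, "D": {"O3"}}
edges = {("A", "B"), ("B", "C"), ("C", "D")}
result = smallest_explaining_path(covers, edges, ["O1", "O2", "O3"])
print(result)  # ('A', 'B', 'C', 'D')
```

In this toy instance the smallest cover {A, C, D} has no Hamiltonian path because there is no edge from A to C, so the search moves on to the larger group that includes B.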
Related
Can the DP algorithm for Matrix Chain Multiplication be modeled as a shortest path in a DAG? I read somewhere that every DP problem is a walk on an implicit DAG, but I am unable to visualize those problems in which a transition leads to more than one state (or sub-state).
One more example that I fail to visualize in the same way is UVA 10003. A DP solution to it is discussed here: Cutting a stick such that cost is minimized.
Imagine that there is a directed edge between two states if we can go from the first state to the second one (of course, a state can consist of several parameters). There are no cycles in this graph, so it is a DAG. So visualizing the DAG itself is not hard (you can just write down all states and the edges between them). But it cannot necessarily be modeled as a shortest path search. For example, in a problem about cutting a rope the value for a state is a sum of the values for two other states, so it is not even a path. Anyway, it might be impractical to visualize a solution if the number of parameters is very big. And there is no need to do any visualization to solve a problem and prove the correctness of your solution.
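To make the rope-cutting point concrete, here is a small memoized sketch of the stick-cutting recurrence, assuming the usual rule that a cut costs the length of the piece being cut. The function name `min_cut_cost` is hypothetical. Each state depends on two successor states whose values are added, which is why the optimal solution forms a tree over the state DAG and not a single path.

```python
from functools import lru_cache

def min_cut_cost(length, cuts):
    """Minimum total cost to make all cuts; each cut costs the length
    of the piece being cut (assumed cost rule)."""
    points = [0] + sorted(cuts) + [length]

    @lru_cache(maxsize=None)
    def cost(i, j):
        # State (i, j): the piece between points[i] and points[j].
        if j - i < 2:
            return 0  # no cut position lies inside this piece
        # Choosing cut k leads to TWO successor states, (i, k) and (k, j),
        # and their values are summed, so this is not a path in the DAG.
        return (points[j] - points[i]) + min(
            cost(i, k) + cost(k, j) for k in range(i + 1, j)
        )

    return cost(0, len(points) - 1)

print(min_cut_cost(100, [25, 50, 75]))  # 200
print(min_cut_cost(10, [4, 5, 7, 8]))   # 22
```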
I have n points and I need to connect all of them while minimizing the final distance. The image above represents an algorithm that at each node connects to the nearest one, but the final output might be really off.
I've been searching a lot. I know some pathfinding algorithms, but I am unaware of one that solves exactly this case. I found a question on Math Stackexchange but the answer does not provide any algorithm - https://math.stackexchange.com/a/581844/156584.
Is there any algorithm that solves exactly this problem? Otherwise I can bruteforce it.
Edit: Some clarification regarding the result I'm expecting: each node can be connected to 2 other nodes, creating a continuous path (like taking a pen and without ever lifting it, connect the nodes minimizing the final distance). I don't want to create a cycle (that being the travelling salesman problem).
PS: this question can also be translated to "complete graph with n vertices, and wanting to choose the set of edges such that the graph is connected, but the sum of the edge weights is minimized"
This problem is known as the shortest Hamiltonian path problem and it is NP-hard. So if the number of points is small, you can use backtracking or dynamic programming to find an optimal solution. If the number of points is large, you can use heuristics and/or approximations to obtain a relatively good answer (it is not always possible to find the best one in this case, though).
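For the small case, a dynamic programming solution over subsets (the Held-Karp idea, adapted here to an open path with free endpoints) could look like the sketch below. It assumes the points are 2D coordinates with Euclidean distances and at least one point is given; the function name is hypothetical. The running time is O(n^2 * 2^n), so it is only practical for roughly 20 points or fewer.

```python
from math import dist, inf

def shortest_hamiltonian_path(points):
    """Length of the shortest open path that visits every point exactly once."""
    n = len(points)
    d = [[dist(p, q) for q in points] for p in points]
    # best[mask][last]: shortest path that visits exactly the points in
    # `mask` and ends at point `last`.
    best = [[inf] * n for _ in range(1 << n)]
    for i in range(n):
        best[1 << i][i] = 0.0  # the path may start at any point
    for mask in range(1 << n):
        for last in range(n):
            current = best[mask][last]
            if current == inf:
                continue
            for nxt in range(n):
                if mask & (1 << nxt):
                    continue  # already visited
                new_mask = mask | (1 << nxt)
                candidate = current + d[last][nxt]
                if candidate < best[new_mask][nxt]:
                    best[new_mask][nxt] = candidate
    # The path may end anywhere, so take the best over all end points.
    return min(best[(1 << n) - 1])

print(shortest_hamiltonian_path([(0, 0), (3, 0), (1, 0), (2, 0)]))  # 3.0
```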
For example, we have a graph consisting of vertices (cities) and edges (roads), and each edge (road) has a particular cost. Find the minimal cost to visit all cities AT LEAST ONCE. The cost is the sum of the costs of the edges traversed.
The part "AT LEAST ONCE" caught me. In a TSP we can visit a node only once, according to Wikipedia. Consider the graph:
A-B 11
A-C 5
B-C 2
B-E 4
C-E 3
C-D 20
D-E 100
In a TSP, the cyclic path would be A-B-E-D-C-A with cost 140, or A-C-D-E-B-A with cost 140. Whereas from my problem description we can visit each vertex AT LEAST ONCE, so we can have a cyclic path A-C-D-C-E-B-A with cost 63, which is much less than the TSP cost. This is where I had a problem. Is there any specific algorithm here? I'm pretty sure TSP won't work well here.
Pointers or pseudo code will be very helpful.
For each pair of nodes, you can apply the shortest path algorithm and calculate the shortest distance. This will be the new cost matrix for each pair.
Now it is reduced to Travelling Salesman Problem.
Then you can apply TSP solving technique.
Given that you are allowing a vertex to be visited multiple times, this effectively turns your incomplete graph into a complete graph (all vertices connected), which is what TSP requires. Solving your problem in the general case is exactly the same as solving the metric TSP. The good news is that this is a heavily researched topic. The bad news is that you aren't able to sidestep the TSP - since your problem is identical to a form of the TSP.
As pointed out by others, you complete the graph by computing the shortest cost between each pair of vertices and adding those edges where missing. You also need to replace any existing direct edge for which you've found a lower indirect path cost so that you have a Metric TSP. You can store with the new synthetic edges their actual paths (through intermediate vertices) so you can recover those for your final answer, or you can recompute those paths as needed upon receiving the result of the TSP.
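As a sketch of that completion step, the snippet below runs Floyd-Warshall on the example graph from the question and then brute-forces the TSP on the resulting complete graph. Brute force is used only because the example has five vertices; it is a stand-in for whichever TSP method you choose. The variable names are hypothetical, and the edges are assumed to be undirected.

```python
from itertools import permutations
from math import inf

edges = {("A", "B"): 11, ("A", "C"): 5, ("B", "C"): 2, ("B", "E"): 4,
         ("C", "E"): 3, ("C", "D"): 20, ("D", "E"): 100}
nodes = sorted({v for edge in edges for v in edge})

# Floyd-Warshall: dist[u][v] becomes the cheapest cost of travelling from
# u to v, possibly through intermediate vertices.
dist = {u: {v: (0 if u == v else inf) for v in nodes} for u in nodes}
for (u, v), w in edges.items():
    dist[u][v] = dist[v][u] = min(dist[u][v], w)
for k in nodes:
    for u in nodes:
        for v in nodes:
            if dist[u][k] + dist[k][v] < dist[u][v]:
                dist[u][v] = dist[u][k] + dist[k][v]

def tour_cost(order):
    """Cost of the closed tour that visits `order` and returns to its start."""
    cycle = order + (order[0],)
    return sum(dist[a][b] for a, b in zip(cycle, cycle[1:]))

# Fix the first vertex and try every order of the remaining ones.
start, *rest = nodes
best_tour = min(((start,) + p for p in permutations(rest)), key=tour_cost)
best_cost = tour_cost(best_tour)
print(best_cost)  # 59
```

The result, 59, is lower than the 63 of the hand-built walk in the question. One walk achieving it is A-C-B-E-C-D-C-A, which revisits C wherever a synthetic edge stands in for a multi-hop shortest path.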
Now you can solve this as a TSP. However, solving TSP optimally is too expensive in the general case, so you'll likely want to use an approximate solution algorithm. A variety of these (e.g. Christofides algorithm, Lin–Kernighan heuristic) are available which make differing tradeoffs between guaranteed levels of optimality and performance of the algorithm.
If you actually don't care about completing the cycle, and just want a minimum path that visits all vertices, starting and ending at any vertex, this is a somewhat different problem. For this, read my answer here: https://stackoverflow.com/a/33601043/5237297
I'm trying to make multiple agents move at the same time to a specified point on a 2D map, with an upper limit on the maximum distance one agent can move.
If possible, all agents should move the maximum distance, else less.
The paths of different agents shouldn't cross if possible, but if not, they can still cross.
My idea was some sort of adjusted A* algorithm.
Would this be a good approach or is there a better algorithm for this kind of problem?
(To be honest, I currently have A* and Dijkstra on my radar as possibilities for solving this, so if there is anything better, a push in the right direction would be great.)
Thanks for your help already
PS: I don't have any kind of underlying graph yet, so I'm still open to any idea, but I can of course create a graph that works for Dijkstra/A*.
Your problem is close to the vertex/edge-disjoint paths problem, which is NP-complete in general. Your restricted version also seems to be NP-complete, because the related shortest disjoint paths problem in grid graphs is NP-hard. But there are lots of algorithms for disjoint paths in grids (even if you have different layers), so the best option I can suggest is to use one of the exact algorithms to find the vertex-disjoint paths, and after that increase the length of the paths (if needed) by traversing some adjacent vertices.
Also, on a grid you don't need Dijkstra to find a path between two nodes (whether the shortest path or a path of a specific length). You can do it simply by running a BFS, which is O(n): start the BFS from vertex v and label its neighbours 1, then label each unlabelled neighbour of a 1 with 2, and so on (see this answer and the numbering algorithm part).
Maybe this question also helps if you are looking for some heuristics in a dynamic situation.
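The BFS numbering can be sketched as follows, assuming a 4-connected grid given as a list of strings in which '#' marks a blocked cell. The encoding and the function name `bfs_distances` are hypothetical choices for illustration.

```python
from collections import deque

def bfs_distances(grid, start):
    """Number every reachable free cell with its step distance from `start`.

    grid: list of equal-length strings; '#' marks a blocked cell (assumed encoding).
    start: (row, col) of a free cell.
    """
    rows, cols = len(grid), len(grid[0])
    distance = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] != '#' and (nr, nc) not in distance):
                # First time this cell is reached, so this is its shortest distance.
                distance[(nr, nc)] = distance[(r, c)] + 1
                queue.append((nr, nc))
    return distance

grid = ["....",
        ".##.",
        "...."]
d = bfs_distances(grid, (0, 0))
print(d[(2, 3)])  # 5
```

A shortest path to any cell can then be recovered by walking from that cell back to the start, always stepping to a neighbour whose number is one smaller.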
I'm obviously missing the forest for the trees ...
I know about the traveling salesman problem, but is there any other algorithm/problem which better fits my needs/description? I need to describe my problem with the help of such a mathematical description.
I have up to five points, with known start and end points. So I just need to calculate the shortest way to visit all three points between those two. Dijkstra and similar algorithms try to find the shortest path between two points, so here they probably won't visit all the points in between. Or is there an algorithm which finds the shortest way and visits all points between two points?
You are overthinking it. There are only six (3*2*1) possible paths through the three intermediate nodes. Just check them all.
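Checking them all takes only a few lines. The sketch below assumes the points are 2D coordinates with straight-line distances; the function name `best_route` and the sample coordinates are made up for illustration.

```python
from itertools import permutations
from math import dist

def best_route(start, end, stops):
    """Try every order of the intermediate stops and keep the shortest route."""
    def length(route):
        return sum(dist(a, b) for a, b in zip(route, route[1:]))
    candidates = ([start, *order, end] for order in permutations(stops))
    return min(candidates, key=length)

route = best_route((0, 0), (4, 0), [(3, 0), (1, 0), (2, 0)])
print(route)  # [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
```

With three intermediate stops this evaluates six routes; the count grows as k! for k stops, which is why this only makes sense for small instances.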
For larger instances, you could reduce your problem to the TSP as follows:
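If s is the starting node and t is the final node, add a dummy node d with zero-weight edges d-s and d-t and infinitely heavy edges between d and every other node. Any finite-cost tour must then contain s-d-t, and deleting d from an optimal tour leaves a shortest path from s to t that visits every node.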
The problem is NP-hard, but is extremely well-researched. There is a plethora of exact and approximate algorithms that you could explore.