Minimum sum of multiple source-destination paths - algorithm

Given an undirected, unweighted graph, I want to find the minimum total length of k source-destination paths, under the constraint that each vertex is used at most once.
For example, the following graph with the two source-destination pairs (A, E) and (B, F) has a minimum sum of 7 steps: (A > G > H > I > E) = 4 steps and (B > C > D > F) = 3 steps.
http://i.imgur.com/W4uNiac.png
It is obvious that a greedy method that finds these k paths sequentially can produce a suboptimal solution. For example, greedily taking (A > C > D > E) = 3 steps first forces (B > J > K > L > M > F) = 5 steps, for a suboptimal total of 8 steps.
I have considered modeling this problem as a minimum-cost maximum-flow problem. However, the multiple source-destination pairs cannot always be kept apart. For example, if k = 2 and the two pairs are (A, B) and (C, D), applying MCMF may in some situations produce (A > ... > D) and (B > ... > C), which is apparently not the solution I want.
Is there any idea about this problem?
Thanks in advance! :)

This looks like a modified multi-agent path planning problem. Normally agents are just restricted not to collide or cross paths, but in your version they cannot share vertices anywhere. It wouldn't be hard to modify the CBS algorithm to solve this problem.
I'm going to call your k paths agents to help describe a solution.
The basic idea is that you search a binary constraint tree to find the solution. Start by finding the shortest path for each of the k agents independently. Then check whether any agents collide in the solved paths. When you find a node shared by two paths, you branch in the tree: on the right side of the tree the first agent is forbidden from using that node, and on the left side the second agent is forbidden from using it. The agent that has been given a new constraint is then required to re-search without using that node.
Nodes in the tree are expanded in a best-first order according to the total path cost of all agents. When you find a solution, it is guaranteed to be optimal.
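To make this concrete, here is a minimal Python sketch of the constraint-tree search. It assumes the graph is an adjacency dict mapping each vertex to its neighbors; the function names and representation are my own, not from any CBS library:

import heapq
from collections import deque
from itertools import count

def bfs(graph, start, goal, forbidden):
    # Shortest path from start to goal that avoids the forbidden vertex set.
    if start in forbidden or goal in forbidden:
        return None
    prev = {start: None}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in graph[u]:
            if v not in prev and v not in forbidden:
                prev[v] = u
                queue.append(v)
    return None

def disjoint_paths(graph, pairs):
    # Best-first search over constraint-tree nodes; each node is a set of
    # per-agent forbidden vertices plus the paths planned under them.
    k = len(pairs)
    tie = count()  # tiebreaker so the heap never compares path lists

    def plan(constraints):
        paths = [bfs(graph, s, t, constraints[i]) for i, (s, t) in enumerate(pairs)]
        return None if any(p is None for p in paths) else paths

    constraints = [frozenset() for _ in range(k)]
    paths = plan(constraints)
    if paths is None:
        return None
    heap = [(sum(len(p) - 1 for p in paths), next(tie), constraints, paths)]
    while heap:
        cost, _, constraints, paths = heapq.heappop(heap)
        conflict = next(((i, j, v)
                         for i in range(k) for j in range(i + 1, k)
                         for v in set(paths[i]) & set(paths[j])), None)
        if conflict is None:
            return cost, paths  # vertex-disjoint, and cheapest so far: optimal
        i, j, v = conflict
        for agent in (i, j):  # branch: forbid v for one of the two agents
            child = list(constraints)
            child[agent] = child[agent] | {v}
            new_paths = plan(child)
            if new_paths is not None:
                heapq.heappush(heap, (sum(len(p) - 1 for p in new_paths),
                                      next(tie), child, new_paths))
    return None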
There are other algorithms you could use, but this one is pretty simple as it keeps the planning decoupled. The efficiency depends on the exact nature of the problems you are solving.
If I had to guess, the problem is NP-Hard or NP-Complete - easy to verify solutions, but the given algorithm runs in exponential time.

Related

Dynamic Programming / Subproblems + Transition

I am kind of stuck. I decided to try this problem: https://icpcarchive.ecs.baylor.edu/external/71/7113.pdf
To prevent it from 404'ing, here is the basic assignment:
• a hopper only visits arrays with integer entries,
• a hopper always explores a sequence of array elements using the following rules:
  – a hopper cannot jump too far, that is, the next element is always at most D indices away (how far a hopper can jump depends on the length of its legs),
  – a hopper doesn't like big changes in values — the next element differs from the current element by at most M, more precisely the absolute value of the difference is at most M (how big a change in values a hopper can handle depends on the length of its arms), and
  – a hopper never visits the same element twice.
• a hopper will explore the array with the longest exploration sequence.
n is the length of the array (as described above, D is the maximum length of a jump the hopper can make, and M is the maximum difference in values a hopper can handle). The next line contains n integers — the entries of the array. We have 1 ≤ D ≤ 7, 1 ≤ M ≤ 10,000, 1 ≤ n ≤ 10,000, and the integers in the array are between -1,000,000 and 1,000,000.
EDIT: I am doing this out of pure curiosity; this is not an assignment I need to do, just a way of challenging myself.
Basically, it's building a sparse graph out of an array.
The graph is undirected, and due to the symmetry of the -D .. D jumps it is either a single connected graph or falls apart into mutually disjoint components.
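For reference, building that graph from the array is straightforward; a minimal Python sketch (a, d and m as in the problem statement):

def build_graph(a, d, m):
    # Vertices are array indices; i and j are adjacent when they are at
    # most d indices apart and their values differ by at most m.
    n = len(a)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, min(i + d, n - 1) + 1):
            if abs(a[i] - a[j]) <= m:
                adj[i].append(j)
                adj[j].append(i)
    return adj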
As a first step I tried to simply exhaustively DFS the graph, which works but has the infamous O(n!) runtime. The first iteration was written in F#, which was horribly slow; the second in C, which still plateaus pretty fast too.
So I know the longest path problem is NP-hard, but I thought I would give it a try with dynamic programming.
The next approach was to simply use the common DP solution (bitmasked path) to DFS on the graph, but at this point I have already traversed the array and built the entire graph, which may contain up to 1000 nodes, so it's not feasible.
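For reference, the bitmask DP I mean looks roughly like this in Python (a sketch over the adjacency list built above; it needs n * 2^n states, which is exactly why it is infeasible here):

from functools import lru_cache

def longest_path(adj, n):
    @lru_cache(maxsize=None)
    def best(v, mask):
        # mask is the set of visited vertices, v is the current endpoint
        length = bin(mask).count("1")
        for w in adj[v]:
            bit = 1 << w
            if not mask & bit:
                length = max(length, best(w, mask | bit))
        return length

    return max(best(v, 1 << v) for v in range(n))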
My next approach was to build a DFS tree (a tree of all the paths), which is a bit faster but already needs to store the entire path in memory for each iteration, which isn't what I really want; I am thinking I can reduce it to substates while traversing the array.
Next I tried to memoize all paths I have already walked, by simply using a bitmask and a simple memoization function, as seen here:
let xf = memoizedEdges (fun r i' p mask ->
    let mask' = addBit i' mask
    // candidate jumps: -d .. -1 and 1 .. d
    let nbs =
        [ -d .. -1 ] @ [ 1 .. d ]
        |> Seq.map (fun f ->
            match f with
            | x when (i' + x) < 0 -> None                 // off the left end
            | x when (i' + x) >= a.Length -> None         // off the right end
            | x when (diff a.[i' + x] a.[i']) > m -> None // value change too big
            | x when (i' + x) = i -> None
            | x when isSet (i' + x) mask' -> None         // already visited
            | x -> Some (i' + x))
    let ec =
        nbs
        |> Seq.choose id
        |> Seq.toList
        |> List.map (fun f -> r f i' mask')
    max (bitcount mask) (ec |> mxOrZero))
memoizedEdges works on three int parameters: the current index (i'), the previous index (p), and the path as a bitmask. On each recursive call, memoizedEdges checks whether it has seen i' and p with this mask, or p and i' with the mask's i' and p bits flipped, to recognize the same path walked from the other side (basically: have we already seen this path coming from the other direction).
This works as I would expect, but the assignment allows up to 1000 indices, so an int32 mask is far too short.
So I have been thinking for days now that there must be a way to encode each of the -d .. d steps into a start and end vertex and calculate the path for each step in that window based on the previous steps.
I've come up with basically this
0.) Create a container that holds the start and end vertex as key, with the current path length as value.
1.) Check the neighbors of i.
2.) If I have seen this combination, either as (from -> to) or as (to -> from), then I do not add or increase.
3.) Check whether any other predecessors of this node exist and increase their paths by 1.
But this would lead to storing all paths; I would basically end up with tuples and be back at my graph with DFS in another form.
I am very thankful for any pointers (I just need some new ideas, I am really stuck right now) on how I could encode each subproblem over -d .. d so that I can use just intermediate results for calculating the next step (if this is even possible).
Partial answer
This is a difficult problem. Indeed, on the competitive programming problem compendium Kattis it is (at the time of writing) among the top 5 most difficult problems.
Only you know whether this sort of problem is within your reach, but there is a fair chance no one on this site can help you completely, hence this partial answer.
Longest path
What we're asked to do here is solve the longest path problem for a particular graph. This problem is known to be NP-complete in general, even for undirected unweighted graphs such as ours. Because the graph can have 1000 vertices, a (sub-)exponential algorithm in n will not work, and we're likely not being asked to prove that P = NP, so the only option left is to somehow exploit the structure of the graph.
The most promising avenue is through D. Since D is at most 7, the maximum degree of the graph is at most 14, and all edges are, in a sense, local.
Now, according to Wikipedia, the longest path problem can be solved in polynomial time on various classes of graphs, such as acyclic ones. Our graph is of course not acyclic, but unfortunately this is largely where my knowledge ends. I am not sufficiently familiar with graph theory to see whether the graph implied by the problem falls into any of the classes Wikipedia mentions.
Of particular note is that the longest path problem can be solved in polynomial time given bounded-by-a-constant clique-width (or tree-width, which implies the former). I am unable to confirm or prove that our graph has bounded clique-width because of the bound on D, but perhaps you yourself know more about this, or you could try asking on the math or CS stackexchange, as at this point we're pretty far from any actual programming.
Regardless, if you're able to confirm that the graph is clique-width-bounded, this paper may help you further.
I hope this answer is of some use despite not being entirely fulfilling, and good luck!
Citation for the paper in case of link decay
Fomin, F. V., Golovach, P. A., Lokshtanov, D., & Saurabh, S. (2009, January). Clique-width: on the price of generality. In Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms (pp. 825-834). Society for Industrial and Applied Mathematics.

Is this the best complexity I can get?

The problem goes as follows:
You have n domino pieces and the two numbers on every one of those n pieces. There is also an extra set of m domino pieces, of which you may use at most one piece, to help you do the following:
calculate the minimum number of domino pieces that you can use to go from a given starting point S to an ending point D,
meaning that the starting piece must carry the number S and the ending piece the number D.
Input:
n and the domino pieces' numbers (n pairs).
m and the extra domino pieces' numbers (m pairs).
starting point S and a destination D.
Output:
the minimum number of domino pieces.
I am thinking of using BFS for this problem: start from S and find the shortest path to D, repeatedly removing extra domino m(i) from the graph and adding m(i+1) instead,
but doing this the time complexity will be O(n * m).
And not only that: there could be multiple starting points S, so the complexity would be O(|S| * n * m).
Can it be solved in a better way?
The teacher said it could be solved in linear time, but I am just very confused.
I initially missed that your question allows multiple starting points, and wrote a somewhat long answer for the single-source case. Let me post it here anyway, because it might still be helpful. Scroll further for the solution to the original question.
Finding shortest paths from single S to D in linear time
Let's build the idea incrementally. First, let's try to solve a simpler version of the problem, where we just need to find out whether you can get from a single S to a single D at all by using at most one domino from the set of extra M dominoes.
I suggest approaching it this way: do some preprocessing on the N dominoes that will let you, for each of the M additional dominoes, quickly (in constant time) answer whether there exists a path from S to D that goes through that domino. (And of course we need to remember the edge case where we don't need an extra domino at all, but that is easy to cover in linear time.)
What kind of information lets you answer this question? Let's say you are looking at a domino with numbers A and B on its ends. If you knew that you can get from S to A and from B to D, you could use this domino to get from S to D, right? Alternatively, if there were a path from S to B and from A to D, that would also work. If neither is true, then there is no way this domino can help you get from S to D.
That's great, but if we run a BFS from every possible B, we won't achieve linear time complexity. However, note that you can reverse the second subproblem (detecting whether a path from some B to D exists) and pose it as "can I get from D to every possible B?" That is easily answered with a single BFS.
Can you see how this approach can be adapted to find the length of the shortest path through each domino, as opposed to just detecting whether a path exists?
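In case it helps, here is that single-pair version spelled out as a Python sketch (the adjacency-dict representation of the core dominoes and all names are my own):

from collections import deque

def bfs_dist(adj, start):
    # adj maps a number to the numbers reachable via one core domino.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def shortest_with_one_extra(adj, extra, S, D):
    INF = float("inf")
    from_S = bfs_dist(adj, S)  # dominoes needed from S to each number
    from_D = bfs_dist(adj, D)  # dominoes needed from D to each number
    best = from_S.get(D, INF)  # edge case: no extra domino used at all
    for a, b in extra:         # constant work per extra domino
        best = min(best,
                   from_S.get(a, INF) + 1 + from_D.get(b, INF),
                   from_S.get(b, INF) + 1 + from_D.get(a, INF))
    return best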
Finding shortest paths from multiple S to D in linear time
Let's reverse the problem and say we want to find the shortest paths from D to multiple S. You could create a directed graph with twice as many nodes as there were unique numbers written on dominoes. That is, for each number there are nodes V and V', and if you are in V, it means you haven't used an extra domino yet, but if you are in V', it means you already used one. Each core (that is, one of the original N) domino (A, B) corresponds to 4 edges in this graph: (A -> B), (B -> A), (A' -> B'), (B' -> A'). Each extra domino corresponds to 2 edges: (A -> B'), (B -> A'). Note that once we get into a node with ', we can never get out of it, so we will only use at most one extra domino this way. A single BFS from D in this graph will answer the problem.
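In Python, the construction and the single BFS look roughly like this (a sketch; the layered-tuple representation is my own choice):

from collections import defaultdict, deque

def min_dominoes(core, extra, D, sources):
    # (x, 0) = haven't used an extra domino yet; (x, 1) = already used it.
    adj = defaultdict(list)
    for a, b in core:
        for layer in (0, 1):
            adj[(a, layer)].append((b, layer))
            adj[(b, layer)].append((a, layer))
    for a, b in extra:
        adj[(a, 0)].append((b, 1))  # using the extra domino moves you
        adj[(b, 0)].append((a, 1))  # into layer 1, with no way back
    dist = {(D, 0): 0}  # one BFS from D; each edge is one domino laid down
    queue = deque([(D, 0)])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    INF = float("inf")
    return {S: min(dist.get((S, 0), INF), dist.get((S, 1), INF))
            for S in sources}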

Queries on Tree Path with Modifications

Question:
You are given a tree with n nodes (n can be up to 10^5) and n-1 bidirectional edges. Let's say each node carries two values:
its index (just a unique number for the node), which runs from 1 to n;
its value Vi, which can vary from 1 to 10^8.
Now there will be multiple queries of the same type (the number of queries can be up to 10^5) on this same tree, as follows:
You are given node1, node2 and a value P (which can vary from 1 to 10^8).
For every query of this type, you have to find the number of nodes on the path from node1 to node2 whose value is less than P.
NOTE: There is a unique path between any two nodes, and no two edges connect the same pair of nodes.
Required time complexity: O(n log n), or some other bound, as long as it runs within 1 second under the given constraints.
What I have Tried:
(A). I could solve it easily if the value of P were fixed, using the LCA approach in O(n log n), by storing the following at each node:
the number of nodes whose value is less than P on the path from the root to the given node.
But here P varies far too much, so this does not help.
(B). The other approach I was thinking of is a plain DFS per query. But that takes O(nq), where q is the number of queries. Again, as n and q both range up to 10^5, this does not fit the time constraint either.
I could not think of anything else. Any help would be appreciated. :)
Source:
I read this problem somewhere, on SPOJ I guess, but cannot find it now. I tried searching the web but could not find a solution for it anywhere (Codeforces, CodeChef, SPOJ, Stack Overflow).
Let ans(v, P) be the answer for the vertical path from the root to v and the given value of P.
How can we compute it? There is a simple offline solution: store all the queries for a given node in a vector associated with that node, then run a depth-first search, keeping all the values on the current root-to-node path in a data structure that can do the following:
add a value
delete a value
count the number of elements smaller than X
Any balanced binary search tree would do. You can make it even simpler: if you know all the queries beforehand, you can compress the values so that they are in the [0..n - 1] range and use a binary indexed tree.
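A sketch of that offline vertical-path computation in Python (iterative DFS plus a binary indexed tree over compressed values; the names and the 0-indexed root are my own choices):

from bisect import bisect_left

def vertical_path_answers(values, adj, queries, root=0):
    # queries: list of (v, P); the answer is the number of nodes on the
    # root-to-v path whose value is less than P.
    sorted_vals = sorted(set(values))
    bit = [0] * (len(sorted_vals) + 1)

    def bit_add(ci, delta):  # point update at compressed index ci
        i = ci + 1
        while i < len(bit):
            bit[i] += delta
            i += i & (-i)

    def count_below(x):  # how many values currently on the path are < x
        i = bisect_left(sorted_vals, x)
        s = 0
        while i > 0:
            s += bit[i]
            i -= i & (-i)
        return s

    by_node = [[] for _ in values]
    for qi, (v, P) in enumerate(queries):
        by_node[v].append((qi, P))
    answers = [0] * len(queries)

    # Iterative DFS so deep trees don't overflow the recursion stack.
    stack = [(root, -1, False)]
    while stack:
        v, parent, leaving = stack.pop()
        if leaving:
            bit_add(bisect_left(sorted_vals, values[v]), -1)  # pop from path
            continue
        bit_add(bisect_left(sorted_vals, values[v]), +1)      # push onto path
        for qi, P in by_node[v]:
            answers[qi] = count_below(P)
        stack.append((v, parent, True))
        for w in adj[v]:
            if w != parent:
                stack.append((w, v, False))
    return answers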
Back to the original problem: the answer to a (u, v, P) query is ans(v, P) + ans(u, P) - 2 * ans(LCA(u, v), P), plus 1 if the value at LCA(u, v) itself is less than P (the two ans terms count it twice, and the subtraction removes it twice).
That's it. The time complexity is O((N + Q) log N).

Is dynamic programming backtracking with cache

I've always wondered about this. And no books state this explicitly.
Backtracking is exploring all possibilities until we figure out that one possibility cannot lead us to a valid solution, in which case we drop it.
Dynamic programming, as I understand it, is characterized by overlapping subproblems. So, can dynamic programming be described as backtracking with a cache (for previously explored paths)?
Thanks
This is one face of dynamic programming, but there's more to it.
For a trivial example, take Fibonacci numbers:
F (n) =
    n = 0: 0
    n = 1: 1
    else: F (n - 2) + F (n - 1)
We can call the above code "backtracking" or "recursion".
Let us transform it into "backtracking with cache" or "recursion with memoization":
F (n) =
    n in Fcache: Fcache[n]
    n = 0: 0, and cache it as Fcache[0]
    n = 1: 1, and cache it as Fcache[1]
    else: F (n - 2) + F (n - 1), and cache it as Fcache[n]
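In runnable Python, with the cache handled by the standard library:

from functools import lru_cache

@lru_cache(maxsize=None)
def F(n):
    # recursion with memoization: each F(n) is computed once, then cached
    if n < 2:
        return n
    return F(n - 2) + F(n - 1)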
Still, there is more to it.
If a problem can be solved by dynamic programming, there is a directed acyclic graph of states and dependencies between them.
There is a state that interests us.
There are also base states for which we know the answer right away.
We can traverse that graph from the vertex that interests us to all its dependencies, from them to all their dependencies in turn, and so on, no longer branching at the base states.
This can be done via recursion.
A directed acyclic graph can be viewed as a partial order on vertices. We can topologically sort that graph and visit the vertices in sorted order.
Additionally, you can find some simple total order which is consistent with your partial order.
Also note that we can often observe some structure on states.
For example, the states can often be expressed as integers or tuples of integers.
So, instead of using generic caching techniques (e.g., associative arrays to store state->value pairs), we may be able to preallocate a regular array, which is easier and faster to use.
Back to our Fibonacci example, the partial order relation is just that state n >= 2 depends on states n - 1 and n - 2.
The base states are n = 0 and n = 1.
A simple total order consistent with this order relation is the natural order: 0, 1, 2, ....
Here is what we start with:
Preallocate array F with indices 0 to n, inclusive
F[0] = 0
F[1] = 1
Fine, we have the order in which to visit the states.
Now, what's a "visit"?
There are again two possibilities:
(1) "Backward DP": When we visit a state u, we look at all its dependencies v and calculate the answer for that state u:
for u = 2, 3, ..., n:
    F[u] = F[u - 1] + F[u - 2]
(2) "Forward DP": When we visit a state u, we look at all states v that depend on it and account for u in each of these states v:
for u = 1, 2, 3, ..., n - 1:
    add F[u] to F[u + 1]
    add F[u] to F[u + 2]
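Here are both visit orders as runnable Python, a direct transcription of the pseudocode above:

def fib_backward(n):
    # Backward DP: each state pulls its value from its dependencies.
    F = [0] * (n + 1)
    if n >= 1:
        F[1] = 1
    for u in range(2, n + 1):
        F[u] = F[u - 1] + F[u - 2]
    return F[n]

def fib_forward(n):
    # Forward DP: each state pushes its value to the states depending on it.
    F = [0] * (n + 1)
    if n >= 1:
        F[1] = 1
    for u in range(1, n):
        F[u + 1] += F[u]
        if u + 2 <= n:
            F[u + 2] += F[u]
    return F[n]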
Note that in the former case, we still use the formula for Fibonacci numbers directly.
However, in the latter case, the imperative code cannot be readily expressed by a mathematical formula.
Still, in some problems, the "forward DP" approach is more intuitive (no good example for now; anyone willing to contribute it?).
One more use of dynamic programming which is hard to express as backtracking is the following: Dijkstra's algorithm can be considered DP, too.
In the algorithm, we construct the optimal paths tree by adding vertices to it.
When we add a vertex, we use the fact that the whole path to it - except the very last edge in the path - is already known to be optimal.
So, we actually use an optimal solution to a subproblem - which is exactly the thing we do in DP.
Still, the order in which we add vertices to the tree is not known in advance.
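For illustration, here is a standard Dijkstra in Python, with comments marking where the optimal-subproblem property is used (the adjacency representation is my choice):

import heapq

def dijkstra(adj, source):
    # adj: vertex -> list of (neighbor, weight), weights non-negative.
    dist = {source: 0}
    done = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        # u joins the optimal-paths tree: dist[u] is now known optimal,
        # and every relaxation below reuses that optimal subproblem.
        done.add(u)
        for v, w in adj[u]:
            if v not in done and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist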
No. Or rather sort of.
In backtracking, you go down and then back up each path. Dynamic programming, however, works bottom-up, so you only get the going-back-up part, not the original going-down part. Furthermore, the order in dynamic programming is more breadth-first, whereas backtracking is usually depth-first.
On the other hand, memoization (dynamic programming's very close cousin) does very often work as backtracking with a cache, as you described.
Yes and no.
Dynamic programming is basically an efficient way to implement a recursive formula, and top-down DP is often actually done with recursion + cache:

def f(x):
    if x in cache:
        return cache[x]
    res = ...  # do something with f(x - k)
    cache[x] = res
    return res
Note that bottom-up DP is implemented completely differently, however, but it still pretty much follows the basic principles of the recursive approach: at each step it 'calculates' the recursive formula on the smaller (already known) subproblems.
However, in order to be able to use DP, the problem needs a certain property: an optimal solution to the problem must consist of optimal solutions to its subproblems. An example where this holds is the shortest-path problem (an optimal path from s to t that goes through u must contain an optimal path from s to u).
This property does not hold for some other problems, such as Vertex Cover or the Boolean satisfiability problem, and thus you cannot replace the backtracking solutions for those with DP.
No. What you call backtracking with cache is basically memoization.
In dynamic programming, you go bottom-up. That is, you start from a place where you don't need any subproblems. In particular, when you need to calculate the nth step, all the n-1 steps are already calculated.
This is not the case for memoization. Here, you start off from the kth step (the step you want) and go on solving the previous steps wherever required. And obviously keep these values stored somewhere so that you may access these later.
All that said, there is no difference in asymptotic running time between memoization and dynamic programming.

Checking if A is a part of binary tree B

Let's say I have binary trees A and B, and I want to know whether A is a "part" of B. I am not talking only about subtrees. What I want to know is whether B has all the nodes and edges that A does.
My thought was that since a tree is essentially a graph, I could view this question as a subgraph isomorphism problem (i.e., checking whether A is a subgraph of B). But according to Wikipedia this is an NP-complete problem.
http://en.wikipedia.org/wiki/Subgraph_isomorphism_problem
I know that you can check if A is a subtree of B or not with O(n) algorithms (e.g. using preorder and inorder traversals to flatten the trees to strings and checking for substrings). I was trying to modify this a little to see if I can also test for just "parts" as well, but to no avail. This is where I'm stuck.
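As an aside, the robust way I know to do the flatten-and-substring trick is a single preorder serialization with explicit null markers (plain preorder plus inorder substrings can be fragile with duplicate values); a Python sketch assuming nodes with val/left/right:

def serialize(root):
    # Preorder with parentheses and explicit null markers, so substring
    # matches can only align with actual subtrees.
    out = []
    def walk(node):
        if node is None:
            out.append("#")
            return
        out.append("(" + str(node.val))
        walk(node.left)
        walk(node.right)
        out.append(")")
    walk(root)
    return "".join(out)

def is_subtree(a, b):
    return serialize(a) in serialize(b)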
Are there any other ways to view this problem other than using subgraph isomorphism? I'm thinking there must be faster methods since binary trees are much more restricted and simpler versions of graphs.
Thanks in advance!
EDIT: I realized that even a brute-force method for my question would take only O(m * n) in the worst case, which is polynomial. So I guess this isn't an NP-complete problem after all. My next question then is: is there an algorithm faster than O(m * n)?
I would approach this problem in two steps:
Find the root of A in B (either BFS or DFS).
Verify that A is contained in B (given that starting node), using a recursive algorithm as below. (I concocted some crazy pseudo-language because you didn't specify a language; I think it should be understandable no matter your background.) Note that a is a node from A (initially the root) and b is a node from B (initially the node found in step 1).
function checkTrees(node a, node b) returns boolean
    if a does not exist or b does not exist then
        // base of the recursion
        return false
    else if a is different from b then
        // compare the current nodes
        return false
    else
        // check the children of a
        boolean leftFound = true
        boolean rightFound = true
        if a.left exists then
            // try to match the left child of a with
            // every possible neighbor of b
            leftFound = checkTrees(a.left, b.left)
                or checkTrees(a.left, b.right)
                or checkTrees(a.left, b.parent)
        if a.right exists then
            // try to match the right child of a with
            // every possible neighbor of b
            rightFound = checkTrees(a.right, b.left)
                or checkTrees(a.right, b.right)
                or checkTrees(a.right, b.parent)
        return leftFound and rightFound
About the running time: let m be the number of nodes in A and n be the number of nodes in B. The search in the first step takes O(n) time. The running time of the second step depends on one crucial assumption I made, but that might be wrong: I assumed that every node of A is equal to at most one node of B. If that is the case, the running time of the second step is O(m) (because you can never search too far in the wrong direction). So the total running time would be O(m + n).
While writing down my assumption, I start to wonder whether that's not oversimplifying your case...
You could compare the trees bottom-up as follows:
For each leaf in tree A, identify the corresponding node in tree B.
Start a parallel traversal towards the root in both trees from the nodes just matched.
Specifically, move to the parent of a node in A, then move towards the root in B until you either encounter the corresponding node in B (proceed), or a marked node in A (see below; if a match in B is found, proceed, else fail), or the root of B (fail).
Mark all nodes visited in A.
You succeed if you haven't failed ;-).
The main part of the algorithm runs in O(e_B): in the worst case, all edges in B are visited a constant number of times. The leaf-node matching runs in O(n_A * log n_B) if the B vertices are sorted, and otherwise in O(n_A * log n_A + n_B * log n_B + n) = O(n_B * log n_B) (sort each node set, then scan the results linearly).
EDIT:
Re-reading your question, the abovementioned step 2 is even easier: for matching nodes in A and B, their parents must match too (otherwise there would be a mismatch between the edge sets). No effect on the worst-case running time, of course.

Resources