merging linear lists - reconstruct railway network - algorithm

I need to reconstruct the sequence of stations in a railway network from the sequences of single trips requested from an arbitrary station. The data gives no direction, but every request returns a terminal stop. The sequences of single trips can have gaps.
The end result is always a linear list; forking is not allowed.
For example:
Result trips from requested station "4" :
4 - 3 - 2 - 1
4 - 1
4 - 5 - 6
4 - 8 - 9
4 - 6 - 7 - 8 - 9
manually reordered:
1 - 2 - 3 - 4
1 - 4
- 4 - 5 - 6
- 4 - 8 - 9
- 4 - 6 - 7 - 8 - 9
After merging, the result should be:
1 - 2 - 3 - 4 - 5 - 6 - 7 - 8 - 9
start/stop: 1, 9
Is there an algorithm to calculate the resulting "rope of pearls" list? I tried to figure it out with Perl's Graph module, but had no luck. My books on algorithms don't help either.
I think there are pathological cases where multiple solutions are possible, depending on the input data.
Maybe someone has an idea how to solve it!
As you see in the answers, there is more than one solution. So here's a real-world dataset:
2204236 -> 2200007 -> 2200001
2204236 -> 2203095 -> 2203976 -> 2200225 -> 2200007 -> 2200001
2204236 -> 2204805 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2202508 -> 2202110 -> 2202026 -> 3011047 -> 3011048 -> 3011049
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2202110 -> 2202352 -> 2202026
2204236 -> 2204813 -> 2204401 -> 2219633 -> 2204476 -> 2202024 -> 2202508 -> 2209637 -> 2202110
Solution for the example data with Perl:
use Graph::Directed;
use Graph::Traversal::DFS;
my $g = Graph::Directed->new;
$g->add_path(1,2,3,4);
$g->add_path(1,4);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
print "The graph is $g\n";
my @topo = $g->toposort;
print "g toposorted = @topo\n";
Output
> The graph is 1-2,1-4,2-3,3-4,4-5,4-6,4-8,5-6,6-7,7-8,8-9
> g toposorted = 1 2 3 4 5 6 7 8 9
Using the other direction
$g->add_path(4,3,2,1);
$g->add_path(4,1);
$g->add_path(4,5,6);
$g->add_path(4,8,9);
$g->add_path(4,6,7,8,9);
reveals the second solution
The graph is 2-1,3-2,4-1,4-3,4-5,4-6,4-8,5-6,6-7,7-8,8-9
g toposorted = 4 3 2 1 5 6 7 8 9

Treat the lists as node links in a graph. 4-3-2-1 should mean that 4 must come before 3, 3 before 2, and 2 before 1. So add arcs from 4 to 3, 3 to 2, and 2 to 1.
Once you have all of those, run a topological sort (look it up on Wikipedia) on the resulting graph. This guarantees that the order you get always respects the partial orderings you are given.
The only case in which you will not find a solution is when the data contradicts itself (if you have 4-3-2 and 4-2-3, there is no possible ordering).
You are right, there are multiple cases. Another good solution is 4-5-6-7-8-9-3-2-1, for your example.
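As a rough illustration of the topological-sort idea, here is a minimal Python sketch using Kahn's algorithm. It assumes each trip has already been oriented (for example, away from the requested station); the names `merge_trips` and `is_unique` are made up for this example. The order is reported as unique only if the algorithm never has more than one candidate station to choose from.

```python
from collections import defaultdict, deque

def merge_trips(trips):
    """Topologically sort stations. Returns (order, is_unique)."""
    successors = defaultdict(set)
    indegree = {}
    for trip in trips:
        for station in trip:
            indegree.setdefault(station, 0)
        # Each consecutive pair means "before must precede after".
        for before, after in zip(trip, trip[1:]):
            if after not in successors[before]:
                successors[before].add(after)
                indegree[after] += 1

    ready = deque(s for s, d in indegree.items() if d == 0)
    order, is_unique = [], True
    while ready:
        if len(ready) > 1:
            is_unique = False  # more than one station could come next
        station = ready.popleft()
        order.append(station)
        for nxt in successors[station]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("contradictory trips: the arcs contain a cycle")
    return order, is_unique

print(merge_trips([[1, 2, 3, 4], [1, 4], [4, 5, 6], [4, 8, 9], [4, 6, 7, 8, 9]]))
```

With the manually reordered trips the order is fully determined; with all trips starting at station 4 the sketch flags the result as one of several valid orders.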

The requested station is an articulation node that splits the graph into multiple partitions: all nodes inside a partition are reachable from one another, while nodes in different partitions are reachable only via the requested station. The number of partitions is 2 in your example, but it may be much larger; consider, for example, the star-like structure 1 - 2, 1 - 3, 1 - 4, 1 - 5.
First of all, you need to enumerate the partitions. Treat your graph as undirected and run DFS from the requested station in each direction. The first run discovers partition #1, the second run partition #2, and so on.
Then treat your graph as directed, with the requested station as the root node for all partitions, and run a topological sort (TS) for each partition.
Possible outcomes:
TS fails for one of the partitions. This means there is no solution.
The number of partitions is one and TS succeeds for it. The solution is unique.
The number of partitions is more than one and TS succeeds for all of them. This means there are multiple solutions. To get any single valid result, choose one partition and declare that it contains the other terminal station. All other partitions are inserted into the first one between an arbitrary pair of nodes.
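The partition enumeration step could be sketched in Python as follows. This is only an illustration: the function name `partitions` is hypothetical, station ids are assumed to be sortable (integers or strings), and the topological sort per partition is left out.

```python
from collections import defaultdict

def partitions(trips, start):
    """Group stations into the partitions that hang off the start station."""
    neighbours = defaultdict(set)
    for trip in trips:
        for a, b in zip(trip, trip[1:]):
            neighbours[a].add(b)  # undirected: record both directions
            neighbours[b].add(a)

    seen = {start}  # never walk through the start station
    result = []
    for first in sorted(neighbours[start]):
        if first in seen:
            continue  # already part of an earlier partition
        component, stack = set(), [first]
        seen.add(first)
        while stack:  # iterative DFS
            node = stack.pop()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        result.append(component)
    return result

trips = [[4, 3, 2, 1], [4, 1], [4, 5, 6], [4, 8, 9], [4, 6, 7, 8, 9]]
print(partitions(trips, 4))
```

For the example trips this yields two partitions, one on each side of station 4; the star-like structure yields four.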

Related

How to implement least cost path through matrix in Haskell

Hello, I have a particular question that I can't find any resources on for Haskell. I'm looking to create a function that takes a matrix as a parameter and returns a list, something like:
returnPossiblePaths :: [[Int]] -> [Int]
The condition, though, is that I return the list with the 'least cost path', that is, the path that has the lowest sum. So if I have the matrix:
[6 9 3
2 5 7]
I want to iterate from the head to the tail, add the numbers up in that path and return the array with the smallest sum. e.g:
6 -> 9 -> 3 -> 7 = 25
6 -> 9 -> 5 -> 7 = 27
6 -> 2 -> 5 -> 7 = 20
6 -> 2 -> 5 -> 9 -> 3 -> 7 = 32
So here my result would be: [6, 2, 5, 7]. I need help on how to go about doing this. I have no idea how to iterate from head to tail along different 'paths' without going through all the elements. My general plan was to collect all the paths as lists, map sum over all of them, then compare the results and return the list with the smallest sum. So I would first get all the lists (paths) from the matrix, then apply this function to them:
addm :: [Int] -> Int
addm xs = sum xs
store those values in a variable, compare them, then return the lowest one. I know Haskell has amazing functions that make this much easier, and I was wondering if I could get help on how to go about doing this. Any advice is greatly appreciated, thanks!
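The plan described above (enumerate every path, sum each, keep the smallest) might look like the following Python sketch. It assumes a path starts at the top-left cell, ends at the bottom-right cell, moves one step up, down, left or right, and never revisits a cell; that reading is inferred from the four example paths, and the function names are invented for illustration.

```python
def simple_paths(matrix):
    """Yield every path from top-left to bottom-right without revisiting a cell."""
    rows, cols = len(matrix), len(matrix[0])
    goal = (rows - 1, cols - 1)

    def walk(cell, visited, path):
        if cell == goal:
            yield list(path)
            return
        r, c = cell
        for nr, nc in ((r, c + 1), (r + 1, c), (r - 1, c), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                visited.add((nr, nc))
                path.append(matrix[nr][nc])
                yield from walk((nr, nc), visited, path)
                path.pop()            # backtrack
                visited.remove((nr, nc))

    yield from walk((0, 0), {(0, 0)}, [matrix[0][0]])

def least_cost_path(matrix):
    """Return the path whose values have the smallest sum."""
    return min(simple_paths(matrix), key=sum)

print(least_cost_path([[6, 9, 3], [2, 5, 7]]))
```

This brute-force enumeration grows very quickly with the matrix size, so it is only meant to mirror the plan in the question, not to be an efficient solution.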

Find the number of substrings in a string containing equal numbers of a, b, c

I'm trying to solve this problem. Now, I was able to get a recursive solution:
If DP[n] gives the number of beautiful substrings (defined in problem) ending at the nth character of the string, then to find DP[n+1], we scan the input string backward from the (n+1)th character until we find an ith character such that the substring beginning at the ith character and ending at the (n+1)th character is beautiful. If no such i can be found, DP[n+1] = 0.
If such a string is found then, DP[n+1] = 1 + DP[i-1].
The trouble is that this solution times out on one test case. I suspect the backward scan is the problem: the overall time complexity of my solution seems to be O(N^2), while the size of the input data suggests that the problem expects an O(N log N) solution.
You don't really need dynamic programming for this; you can do it by iterating over the string once and, after each character, storing the state (the relative number of a's, b's and c's that were encountered so far) in a dictionary. This dictionary has maximum size N+1, so the overall time complexity is O(N).
If you find that at a certain point in the string there are e.g. 5 more a's than b's and 7 more c's than b's, and you find the same situation at another point in the string, then you know that the substring between those two points contains an equal number of a's, b's and c's.
Let's walk through an example with the input "dabdacbdcd":
a,b,c
-> 0,0,0
d -> 0,0,0
a -> 1,0,0
b -> 1,1,0
d -> 1,1,0
a -> 2,1,0
c -> 2,1,1 -> 1,0,0
b -> 1,1,0
d -> 1,1,0
c -> 1,1,1 -> 0,0,0
d -> 0,0,0
Because we're only interested in the difference between the number of a's, b's and c's, not the actual number, we reduce a state like 2,1,1 to 1,0,0 by subtracting the lowest number from all three numbers.
We end up with a dictionary of these states, and the number of times they occur:
0,0,0 -> 4
1,0,0 -> 2
1,1,0 -> 4
2,1,0 -> 1
States which occur only once don't indicate an abc-equal substring, so we can discard them; we're then left with these repetitions of states:
4, 2, 4
If a state occurs twice, there is 1 abc-equal substring between those two locations. If a state occurs 4 times, there are 6 abc-equal substrings between them; e.g. the state 1,1,0 occurs at these points:
dab|d|acb|d|cd
Every substring between 2 of those 4 points is abc-equal:
d, dacb, dacbd, acb, acbd, d
In general, if a state occurs n times, it represents 1 + 2 + 3 + ... + (n-1) abc-equal substrings (or, easier to calculate: (n-1) × n / 2). If we calculate this for every count in the dictionary, the total is our solution:
4 -> 3 x 2 = 6
2 -> 1 x 1 = 1
4 -> 3 x 2 = 6
--
13
Let's check the result by finding what those 13 substrings are:
1 d---------
2 dabdacbdc-
3 dabdacbdcd
4 -abdacbdc-
5 -abdacbdcd
6 --bdac----
7 ---d------
8 ---dacb---
9 ---dacbd--
10 ----acb---
11 ----acbd--
12 -------d--
13 ---------d
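The single-pass counting described above can be sketched in Python roughly like this. As an assumption for brevity, the state is stored as the pair (a's minus b's, c's minus b's) rather than the normalised triple; both identify the same situations. Like the walkthrough, it also counts substrings that contain no a, b or c at all. The function name is made up for this example.

```python
from collections import Counter

def count_abc_equal(text):
    """Count substrings with equal numbers of 'a', 'b' and 'c'."""
    a = b = c = 0
    seen = Counter({(0, 0): 1})  # state before the first character
    for ch in text:
        if ch == "a":
            a += 1
        elif ch == "b":
            b += 1
        elif ch == "c":
            c += 1
        seen[(a - b, c - b)] += 1
    # A state seen n times contributes n * (n - 1) / 2 substrings.
    return sum(n * (n - 1) // 2 for n in seen.values())

print(count_abc_equal("dabdacbdcd"))  # 13
```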

Minimum Hamiltonian path length using brute force approach

Assumption: at least one Hamiltonian path exists in the graph. I am trying to find the minimum path length among all Hamiltonian paths.
My approach
Let us say we have three nodes.
Possible paths are
1 -> 2 -> 3
1 -> 3 -> 2
2 -> 1 -> 3
2 -> 3 -> 1
3 -> 1 -> 2
3 -> 2 -> 1
Find the path length of every path and take the minimum among them. The time complexity of this approach is O(N*(N!)), where N is the number of nodes.
I am getting the wrong answer with this approach. Is the above approach correct? Please help.
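For comparison, a brute-force search over all node orderings could be sketched in Python as below. The representation is an assumption made for this example: `weight` is a hypothetical dictionary mapping a directed pair (u, v) to an edge length, and a missing key means there is no edge (an undirected graph needs both directions listed). The sketch skips any ordering that would use a non-existent edge, which is one detail worth checking in such an implementation.

```python
from itertools import permutations

def min_hamiltonian_path(nodes, weight):
    """Return the minimum Hamiltonian path length, or None if no path exists."""
    best = None
    for order in permutations(nodes):
        length = 0
        for u, v in zip(order, order[1:]):
            if (u, v) not in weight:
                break  # this ordering uses a missing edge: not a valid path
            length += weight[(u, v)]
        else:
            # Only reached when every consecutive pair was a real edge.
            if best is None or length < best:
                best = length
    return best

weight = {(1, 2): 1, (2, 1): 1, (2, 3): 2, (3, 2): 2, (1, 3): 10, (3, 1): 10}
print(min_hamiltonian_path([1, 2, 3], weight))  # 3
```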

Bellman-Ford algorithm proof of correctness

I'm trying to learn about the Bellman-Ford algorithm, but I'm stuck on the proof of correctness.
I have used Wikipedia, but I simply can't understand the proof. I did not find anything helpful on YouTube either.
I hope one of you can explain it briefly. The page "Bellman-ford correctness can we do better" does not answer my question.
Thank you.
Let's see the problem from the perspective of dynamic programming of a graph with no negative cycle.
We can visualize the memoization table of the dynamic programming as follows:
The columns represent nodes and the rows represent update steps (node 0 is the source node). The arrows pointing from a box in one step to a box in the next step are the min updates (step 0 is the initialization).
We choose one path from all the shortest paths and illustrate why it is correct. Let's choose 0 -> 3 -> 2 -> 4 -> 5. It is the shortest path from 0 to 5; any other shortest path would serve equally well. We can prove the correctness by induction. The base case is the source 0: obviously, the distance between 0 and itself is 0, which is the shortest. We then assume that 0 -> 3 -> 2 is the shortest path between 0 and 2, and we are going to prove that 0 -> 3 -> 2 -> 4 is the shortest path between 0 and 4 after the third iteration.
First, we prove that after the third iteration node 4 must be fixed (tightened). If node 4 were not fixed, there would be at least one path other than 0 -> 3 -> 2 -> 4 that reaches 4 and is shorter than 0 -> 3 -> 2 -> 4, which contradicts our assumption that 0 -> 3 -> 2 -> 4 -> 5 is the shortest path between 0 and 5. So after the third iteration, 2 and 4 are connected.
Second, we prove that this relaxation yields the shortest distance. It can be neither greater nor smaller, because it is the only shortest path.
Let's see a graph with a negative cycle.
And here is its memoization table:
Let's prove that at the |V|-th iteration, where |V| = 6 is the number of vertices, the updates have not stopped.
Assume that the updates have stopped (and that there is a negative cycle). Consider the cycle 3 -> 2 -> 4 -> 5 -> 3. Since no further update is possible, the following must hold:
dist(2) <= dist(3) + w(3, 2)
dist(4) <= dist(2) + w(2, 4)
dist(5) <= dist(4) + w(4, 5)
dist(3) <= dist(5) + w(5, 3)
Summing the left-hand sides and the right-hand sides of these four inequalities gives the following inequality:
dist(2) + dist(4) + dist(5) + dist(3) <= dist(3) + dist(2) + dist(4) + dist(5) + w(3, 2) + w(2, 4) + w(4, 5) + w(5, 3)
We subtract the distances from both sides and obtain that:
w(3, 2) + w(2, 4) + w(4, 5) + w(5, 3) >= 0, which contradicts our claim that 3 -> 2 -> 4 -> 5 -> 3 is a negative cycle.
So we can be certain that at the |V|-th step, and at every step after it, the updates never stop.
My code is here on Gist.
Reference:
dynamic programming - bellman-ford algorithm
Lecture 14: Bellman-Ford Algorithm
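To make the two parts of the argument concrete, here is a minimal Python sketch of the algorithm the proof reasons about. It is not the Gist code mentioned above; the function name, the edge-list format (u, v, w) and the example graph are assumptions chosen for illustration. The first loop performs at most |V| - 1 rounds of relaxation, and the final pass is the |V|-th round: if any edge can still be relaxed there, a negative cycle is reachable from the source.

```python
def bellman_ford(num_vertices, edges, source):
    """Return shortest distances from source; raise on a reachable negative cycle."""
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0

    # Rounds 1 .. |V|-1: relax every edge.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # already stable, no negative cycle can show up later

    # Round |V|: any further improvement implies a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from the source")
    return dist

print(bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)], 0))  # [0, 3, 1, 4]
```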

Finding the root value of a binary tree?

I have an array that stores relations between values, which form several trees, something like:
So, in this case, my array would be (root, linked to):
(8,3)
(8,10)
(3,1)
(3,6)
(6,4)
(6,7)
(10,14)
(14,13)
And I'd like to set all the root values in the array to the main root of the tree (in all trees):
(8,3)
(8,1)
(8,6)
(8,4)
(8,7)
(8,10)
(8,14)
(8,13)
What algorithm should I investigate?
1) Make a list of all the unique first elements of the tuples.
2) Remove any that also appear as the second element of a tuple.
3) You'll be left with the root (8 here). Replace the first elements of all tuples with this value.
EDIT:
A more complicated approach that will work with multiple trees would be as follows.
First, convert to a parent lookup table:
1 -> 3
3 -> 8
4 -> 6
6 -> 3
7 -> 6
10 -> 8
13 -> 14
14 -> 10
Next, run "find parent with path compression" on each element:
1)
1 -> 3 -> 8
gives
1 -> 8
3 -> 8
4 -> 6
...
3)
3 -> 8
4)
4 -> 6 -> 3 -> 8
gives
1 -> 8
3 -> 8
4 -> 8
6 -> 8
7 -> 6
...
6)
6 -> 8 (already done)
7)
7 -> 6 -> 8
etc.
Result:
1 -> 8
3 -> 8
4 -> 8
6 -> 8
7 -> 8
...
Then convert this back to the tuple list:
(8,1)(8,3)(8,4)...
The find parent with path compression algorithm is as find_set would be for disjoint set forests, e.g.
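int find_set(int x) const
{
    // Look up the node; get_element and m_parent belong to the surrounding class.
    Element& element = get_element(x);
    int& parent = element.m_parent;
    if (parent != x)
    {
        // Path compression: point this node straight at the root.
        parent = find_set(parent);
    }
    return parent;
}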
The key point is that path compression helps you avoid a lot of work. In the above, for example, when you do the lookup for 4, you store 6 -> 8, which makes later lookups referencing 6 faster.
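Put together, the multi-tree approach might be sketched in Python as follows. This is an illustrative sketch, not the answer's original code: the function name is invented, the find step is written iteratively instead of recursively, and the input is assumed to be a list of (parent, child) tuples in which every child has exactly one parent.

```python
def relink_to_roots(pairs):
    """Return (root, child) tuples, where root is the top of the child's tree."""
    # Build the parent lookup table; roots point to themselves.
    parent = {}
    for p, child in pairs:
        parent[child] = p
        parent.setdefault(p, p)

    def find(x):
        # First walk up to the root.
        root = x
        while parent[root] != root:
            root = parent[root]
        # Then compress the path: link every visited node directly to the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    return [(find(child), child) for _, child in pairs]

pairs = [(8, 3), (8, 10), (3, 1), (3, 6), (6, 4), (6, 7), (10, 14), (14, 13)]
print(relink_to_roots(pairs))
```

For the example input every tuple ends up with 8 as its first element, and a second, unrelated tree in the same list would keep its own root.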
So assume you have a list of tuples representing the points:
def find_root(ls):
    """Replace the first element of every (parent, child) tuple with the root."""
    parents = {parent for parent, _ in ls}
    children = {child for _, child in ls}
    # The root is the only node that appears as a parent but never as a child.
    roots = parents - children
    if len(roots) != 1:
        return -1  # failure, the tree is not formed well
    root = roots.pop()
    for index, (_, child) in enumerate(ls):
        ls[index] = (root, child)
    return ls

Resources