Algorithm: find connections between towns with a limit of train changes

What algorithm would you use to create an application that given appropriate data (list of cities, train routes, train stations) is capable of returning a list of connection between any two user-selected cities? The application has to choose only those connections that fall into the limit of accepted train-changes.
Example: I ask the application which train to take if I need to travel from Paris to Moscow with max. 1 stop/switch - the application returns a route: Train 1 (Paris-Berlin) -> Train 2 (Berlin->Moscow) (No direct connection exists).
Graphical example
http://i.imgur.com/KEJ3I.png
If I ask the system about possible connections from Town A to Town G I get a response:
Brown Line (0 switches = direct)
Brown Line to Town B / Orange Line to Town G (1 switch)
Brown Line to Town B / Orange Line to Town D / Red Line to G (2 switch)
... all other possibilities
And though the 2nd and 3rd options are shorter than the 1st, the 1st should have priority, since it involves no train switching.

Assuming the only thing important is "number of stops/switches", then the problem is actually finding a shortest path in an unweighted directed graph.
The graph model is G = (V,E) where V = {all possible stations} and E = { (u,v) | there is a train/route from station u to station v }
Note: let's say you have a train which starts at a_0 and passes through a_1, a_2, ..., a_n: then E will contain (a_0,a_1), (a_0,a_2), ..., (a_0,a_n) and also (a_1,a_2), (a_1,a_3), ... Formally: for each i < j, (a_i,a_j) ∈ E.
BFS solves this problem, and is both complete [always finds a solution if there is one] and optimal [finds the shortest path].
If the edges [routes] are weighted, something like Dijkstra's algorithm will be needed instead.
If you want a list of all possible routes, Iterative-Deepening DFS could be used, without maintaining a visited set, and print all the paths found to the target up to the relevant depth. [BFS fails to return all paths with the counter example of a clique]
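As a rough illustration of the model above, here is a minimal Python sketch. The line names, station names, and the function name fewest_switches are made up for the example. It adds an edge (a_i, a_j) for every i < j on each line and runs BFS, so the number of switches is the number of trains taken minus one.

```python
from collections import deque


def fewest_switches(lines, source, target):
    """Return the legs (line, from, to) of a route using the fewest trains.

    `lines` maps a hypothetical line name to its ordered list of stations.
    Returns None when the target cannot be reached.
    """
    # Edge (a_i, a_j) for every i < j on the same line: one edge == one train.
    edges = {}
    for name, stops in lines.items():
        for i, u in enumerate(stops):
            for v in stops[i + 1:]:
                edges.setdefault(u, []).append((v, name))

    # Plain BFS; parent[] remembers how each station was first reached.
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for v, name in edges.get(u, []):
            if v not in parent:
                parent[v] = (u, name)
                queue.append(v)

    if target not in parent:
        return None

    # Walk the parent links back from the target to rebuild the legs.
    legs = []
    node = target
    while parent[node] is not None:
        prev, name = parent[node]
        legs.append((name, prev, node))
        node = prev
    legs.reverse()
    return legs


# Illustrative data only, not taken from the picture in the question.
lines = {
    "Brown": ["A", "B", "C", "G"],
    "Orange": ["B", "D", "G"],
    "Red": ["D", "G"],
}
print(fewest_switches(lines, "A", "G"))  # [('Brown', 'A', 'G')]
print(fewest_switches(lines, "A", "D"))  # [('Brown', 'A', 'B'), ('Orange', 'B', 'D')]
```

The number of switches for a returned route is len(legs) - 1, so a caller can discard routes that exceed the accepted limit.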

I think you need to compute all pairs shortest paths. Check http://en.wikipedia.org/wiki/Floyd%E2%80%93Warshall_algorithm.

Related

Longest common path between k graphs

I was looking at interview problems and came across this one, but failed to find a viable solution.
Actual question was asked on Leetcode discussion.
Given multiple school children and the paths they took from their school to their homes, find the longest most common path (paths are given in order of steps a child takes).
Example:
child1 : a -> g -> c -> b -> e
child2 : f -> g -> c -> b -> u
child3 : h -> g -> c -> b -> x
result = g -> c -> b
Note: There could be multiple children. The input was in the form of steps and childID. For example, the input looked like this:
(child1, a)
(child2, f)
(child1, g)
(child3, h)
(child1, c)
...
Some suggested that longest common substring can work, but it will not. For example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Paths 1 and 2 give a-b-c, while paths 3 and 4 give o-p-f-g.
Combining those two pairwise results gives nothing in common, but the correct answer is f-g.
It's like a graph problem: how can we find the longest common path between k graphs?
You can construct a directed graph g with an edge a->b present if and only if it is present in all individual paths, then drop all nodes with degree zero.
The graph g will have no cycles. If it did, the same cycle would be present in all individual paths, and a path has no cycles by definition.
In addition, all in-degrees and out-degrees will be zero or one. For example, if a node a had in-degree greater than one, there would be two edges representing two students arriving at a from two different nodes. Such edges cannot appear in g by construction.
The graph will look like a disconnected collection of paths. There may be multiple paths with maximum length, or there may be none (an empty path if you like).
In the Python code below, I find all common paths and return one with maximum length. I believe the whole procedure is linear in the number of input edges.
import networkx as nx

path_data = """1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g"""
paths = [line.split(" ")[1].split("-") for line in path_data.split("\n")]
num_paths = len(paths)

# graph h will include all input edges
# edge weight corresponds to the number of students
# traversing that edge
h = nx.DiGraph()
for path in paths:
    for (i, j) in zip(path, path[1:]):
        if h.has_edge(i, j):
            h[i][j]["weight"] += 1
        else:
            h.add_edge(i, j, weight=1)

# graph g will only contain edges traversed by all students
g = nx.DiGraph()
g.add_edges_from((i, j) for i, j in h.edges if h[i][j]["weight"] == num_paths)


def longest_path(g):
    # assumes g is a disjoint collection of paths
    all_paths = list()
    for node in g.nodes:
        path = list()
        # only start walking from nodes that begin a path
        if g.in_degree[node] == 0:
            while True:
                path.append(node)
                try:
                    node = next(iter(g[node]))
                except StopIteration:
                    # no successor: we reached the end of this path
                    break
            all_paths.append(path)
    if not all_paths:
        # handle the "empty path" case
        return []
    return max(all_paths, key=len)


print(longest_path(g))
# ['f', 'g']
Approach 1: With Graph construction
Consider this example:
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Draw a directed weighted graph.
I am a lazy person. So, I have not drawn the direction arrows but believe they are invisibly there. Edge weight is 1 if not marked on the arrow.
Give the length of longest chain with each edge in the chain having Maximum Edge Weight MEW.
MEW is 4, our answer is FG.
Say AB & BC had edge weight 4, then ABC should be the answer.
The below example, which is the case of MEW < #children, should output ABC.
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-h
4 m-x-o-p-f-i
If some kid is like me, the kid will keep roaming multiple places before reaching home. In such cases, you might see MEW > #children and the solution would become complicated. I hope all the children in our input are obedient and they go straight from school to home.
Approach 2: Without Graph construction
If, luckily, the problem states that the longest common piece of path must be present in the paths of all the children, i.e. strictly MEW == #children, then you can solve it in an easier way. The picture below should give you a clue about what to do.
Take the below example
1 a-b-c-d-e-f-g
2 a-b-c-x-y-f-g
3 m-n-o-p-f-g
4 m-x-o-p-f-g
Method 1:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Get longest common graph for last two: o-p-f-g (Result 2)
Using Result 1 & 2 we get: f-g (Final Result)
Method 2:
Get longest common graph for first two: a-b-c, f-g (Result 1)
Take Result 1 and next graph i.e. m-n-o-p-f-g: f-g (Result 2)
Take Result 2 and next graph i.e. m-x-o-p-f-g: f-g (Final Result)
The beauty of the approach without graph construction is that even if kids roam same pieces of paths multiple times, we get the right solution.
If you go a step ahead, you could combine the approaches and use approach 1 as a sub-routine in approach 2.
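One possible way to sketch the "fold one path at a time" idea in plain Python is shown below. It is only an illustration under assumptions that are not in the answer: each child visits each place at most once, the piece must appear in every child's path (MEW == #children), and the function names common_segments and longest_common_path are made up here.

```python
from functools import reduce


def common_segments(paths):
    """Return the maximal chains of consecutive steps shared by every path."""
    # Each path becomes the set of its consecutive steps (u -> v).
    edge_sets = [set(zip(p, p[1:])) for p in paths]
    # Fold the sets together, keeping only the steps seen in all paths so far.
    shared = reduce(set.intersection, edge_sets) if edge_sets else set()

    # Every place has at most one successor, so a dict is enough to chain steps.
    succ = dict(shared)
    starts = set(succ) - set(succ.values())
    segments = []
    for node in starts:
        segment = [node]
        while node in succ:
            node = succ[node]
            segment.append(node)
        segments.append(segment)
    return segments


def longest_common_path(paths):
    # An empty list stands for "no common piece of path".
    return max(common_segments(paths), key=len, default=[])


paths = [
    "a-b-c-d-e-f-g".split("-"),
    "a-b-c-x-y-f-g".split("-"),
    "m-n-o-p-f-g".split("-"),
    "m-x-o-p-f-g".split("-"),
]
print(longest_common_path(paths))  # ['f', 'g']
```

If children may walk the same piece of path more than once, the single-successor assumption no longer holds and this sketch would need a different chaining step.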

Minimal car refills on a graph of cities

There's a graph of cities.
n - cities count
m - two-ways roads count
k - distance that car can go after refill
Road i connects cities pi and qi and has a length of ri. There can be at most one road between any two cities.
A man is going from city u to city v.
There are gas stations in l cities a1, a2, ..., al.
A car starts with a full tank. If a man gets in a city with a gas station, he can refill (full tank) a car or ignore it.
Return value is a minimal count of refills to get from a city u to city v or -1 if it's impossible.
I tried to do it using Dijkstra algorithm, so I have minimal distance and path. But I have no idea how to get minimal refills count
It is slightly subtle, but the following pseudo-code will do it.
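First run a shortest-path search from v to find the distance from every city to the target. Because the roads have lengths, this should be Dijkstra's algorithm rather than a plain breadth-first search, which only works when every road has the same length. This gives us a distance_remaining lookup with distance_remaining[city] being the length of the shortest path to v (without regard to fillup stations).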
To implement we first need a Visit data structure with information about visiting a city on a trip. What fields do we need?
city
fillups
range
last_visit
Next we need a priority queue (just like Dijkstra) for possible visits to consider. This queue should prioritize visits by the shortest possible overall trip that we might be able to take. Which is to say visit.fillups * max_range + (max_range - visit.range) + distance_remaining[visit.city].
And finally we need a visited[city] data structure saying whether a city is visited. In Dijkstra we only consider a node if it was not yet visited. We need to tweak that to only consider a node if it was not yet visited or was visited with a range shorter than our current one (a car that arrived on full may finish even though the empty one failed).
And now we implement the following logic:
make visit {city: u, fillups: 0, range: max_range, last_visit: None}
add to priority queue the visit we just created
while visit = queue.pop():
    if visit.city == v:
        return visit.fillups # We could actually find the path at this point!
    else if visit.city not in visited or visited[visit.city] < visit.range:
        visited[visit.city] = visit.range
        for each road r from visit.city:
            if r.length <= visit.range:
                add to queue {city: r.other_city, fillups: visit.fillups, range: visit.range - r.length, last_visit: visit}
        if visit.city has fillup station:
            add to queue {city: visit.city, fillups: visit.fillups + 1, range: max_range, last_visit: visit}
return -1
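A compact Python sketch of the same idea follows. It is a simplified variant rather than the exact procedure above: the queue is ordered by (refills, fuel used) instead of the trip-length estimate, so the first time the target is popped the refill count is minimal. The function name min_refills, the (p, q, r) road tuples, and the stations set are assumptions made for the example.

```python
import heapq


def min_refills(roads, stations, start, target, max_range):
    """Return the minimal number of refills from start to target, or -1."""
    # Undirected adjacency list built from (p, q, r) road tuples.
    graph = {}
    for p, q, length in roads:
        graph.setdefault(p, []).append((q, length))
        graph.setdefault(q, []).append((p, length))

    best_fuel = {}  # most fuel with which each city has been expanded so far
    queue = [(0, -max_range, start)]  # (refills, -remaining range, city)
    while queue:
        refills, neg_fuel, city = heapq.heappop(queue)
        fuel = -neg_fuel
        if city == target:
            return refills
        # Earlier pops never used more refills, so arriving with no more
        # fuel than before cannot lead to anything better.
        if best_fuel.get(city, -1) >= fuel:
            continue
        best_fuel[city] = fuel
        for other, length in graph.get(city, []):
            if length <= fuel:
                heapq.heappush(queue, (refills, -(fuel - length), other))
        # Refilling only makes sense when the tank is not already full.
        if city in stations and fuel < max_range:
            heapq.heappush(queue, (refills + 1, -max_range, city))
    return -1


# Illustrative data: a chain 1 - 2 - 3 - 4 with stations in cities 2 and 3.
roads = [(1, 2, 5), (2, 3, 5), (3, 4, 5)]
print(min_refills(roads, {2, 3}, 1, 4, 10))  # 1
print(min_refills(roads, {2, 3}, 1, 4, 15))  # 0
print(min_refills(roads, {2, 3}, 1, 4, 4))   # -1
```

Ordering by refills first keeps the keys non-decreasing along every transition (driving uses fuel, refilling adds one refill), which is what lets the Dijkstra-style early return work here.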

Removing unnecessary nodes in graph

I have a graph that has two distinct classes of nodes, class A nodes and class B nodes.
Class A nodes are not connected to any other A nodes and class B nodes aren’t connected to any other B nodes, but B nodes are connected to A nodes and vice versa. Some B nodes are connected to lots of A nodes and most A nodes are connected to lots of B nodes.
I want to eliminate as many of the A nodes as possible from the graph.
I must keep all of the B nodes, and they must still be connected to at least one A node (preferably only one A node).
I can eliminate an A node when it has no B nodes connected only to it. Are there any algorithms that could find an optimal, or at least close to optimal, solution for which A nodes I can remove?
Old, Incorrect Answer, But Start Here
First, you need to recognize that you have a bipartite graph. That is, you can colour the nodes red and blue such that no edges connect a red node to a red node or a blue node to a blue node.
Next, recognize that you're trying to solve a vertex cover problem. From Wikipedia:
In the mathematical discipline of graph theory, a vertex cover (sometimes node cover) of a graph is a set of vertices such that each edge of the graph is incident to at least one vertex of the set. The problem of finding a minimum vertex cover is a classical optimization problem in computer science and is a typical example of an NP-hard optimization problem that has an approximation algorithm.
Since you have a special graph, it's reasonable to think that maybe the NP-hard doesn't apply to you. This thought brings us to Kőnig's theorem which relates the maximum matching problem to the minimum vertex cover problem. Once you know this, you can apply the Hopcroft–Karp algorithm to solve the problem in O(|E|√|V|) time, though you'll probably need to jigger it a bit to ensure you keep all the B nodes.
New, Correct Answer
It turns out this jiggering is the creation of a "constrained bipartite graph vertex cover problem", which asks us if there is a vertex cover that uses fewer than a A-nodes and fewer than b B-nodes. The problem is NP-complete, so that's a no go. The jiggering was harder than I thought!
But using less than the minimum number of nodes isn't the constraint we want. We want to ensure that the minimum number of A-nodes is used and the maximum number of B-nodes.
Kőnig's theorem, above, is a special case of the maximum flow problem. Thinking about the problem in terms of flows brings us pretty quickly to minimum-cost flow problems.
In these problems we're given a graph whose edges have specified capacities and unit costs of transport. The goal is to find the minimum cost needed to move a supply of a given quantity from an arbitrary set of source nodes to an arbitrary set of sink nodes.
It turns out your problem can be converted into a minimum-cost flow problem. To do so, let us generate a source node that connects to all the A nodes and a sink node that connects to all the B nodes.
Now, let us make the cost of using a Source->A edge equal to 1 and give all other edges a cost of zero. Further, let us make the capacity of the Source->A edges equal to infinity and the capacity of all other edges equal to 1.
This looks like the following:
The red edges have Cost=1, Capacity=Inf. The blue edges have Cost=0, Capacity=1.
Now, solving the minimum-cost flow problem becomes equivalent to using as few red edges as possible. Any red edge that isn't used allocates 0 flow to its corresponding A node and that node can be removed from the graph. Conversely, each B node can only pass 1 unit of flow to the sink, so all B nodes must be preserved in order for the problem to be solved.
Since we've recast your problem into this standard form, we can leverage existing tools to get a solution; namely, Google's Operation Research Tools.
Doing so gives the following answer to the above graph:
The red edges are unused and the black edges are used. Note that if a red edge emerges from the source the A-node it connects to generates no black edges. Note also that each B-node has at least one in-coming black edge. This satisfies the constraints you posed.
We can now detect the A-nodes to be removed by looking for Source->A edges with zero usage.
Source Code
The source code necessary to generate the foregoing figures and associated solutions is as follows:
#!/usr/bin/env python3
#Documentation: https://developers.google.com/optimization/flow/mincostflow
#Install dependency: pip3 install ortools
#Note: written against the older pywrapgraph module; recent OR-Tools releases
#may expose the solver under a different module path.
from __future__ import print_function
from ortools.graph import pywrapgraph
import matplotlib.pyplot as plt
import networkx as nx
import random
import sys

def GenerateGraph(Acount,Bcount):
    assert Acount>5
    assert Bcount>5
    G = nx.DiGraph() #Directed graph
    source_node = Acount+Bcount
    sink_node = source_node+1
    for a in range(Acount):
        #Connect to 20-30% of the B nodes (randint needs integer bounds)
        for i in range(random.randint(int(0.2*Bcount),int(0.3*Bcount))):
            b = Acount+random.randint(0,Bcount-1) #In the half-open range [0,Bcount). Offset from A's indices
            G.add_edge(source_node, a, capacity=99999, unit_cost=1, usage=1)
            G.add_edge(a, b, capacity=1, unit_cost=0, usage=1)
            G.add_edge(b, sink_node, capacity=1, unit_cost=0, usage=1)
            G.nodes[a]['type'] = 'A'
            G.nodes[b]['type'] = 'B'
    G.nodes[source_node]['type'] = 'source'
    G.nodes[sink_node]['type'] = 'sink'
    G.nodes[source_node]['supply'] = Bcount
    G.nodes[sink_node]['supply'] = -Bcount
    return G

def VisualizeGraph(graph, color_type):
    gcopy = graph.copy()
    for p, d in graph.nodes(data=True):
        if d['type']=='source':
            source = p
        if d['type']=='sink':
            sink = p
    Acount = len([1 for p,d in graph.nodes(data=True) if d['type']=='A'])
    Bcount = len([1 for p,d in graph.nodes(data=True) if d['type']=='B'])
    if color_type=='usage':
        edge_color = ['black' if d['usage']>0 else 'red' for u,v,d in graph.edges(data=True)]
    elif color_type=='unit_cost':
        edge_color = ['red' if d['unit_cost']>0 else 'blue' for u,v,d in graph.edges(data=True)]
    Ai = 0
    Bi = 0
    pos = dict()
    #Lay the nodes out in four columns: source, A nodes, B nodes, sink
    for p,d in graph.nodes(data=True):
        if d['type']=='source':
            pos[p] = (0, Acount/2)
        elif d['type']=='sink':
            pos[p] = (3, Bcount/2)
        elif d['type']=='A':
            pos[p] = (1, Ai)
            Ai += 1
        elif d['type']=='B':
            pos[p] = (2, Bi)
            Bi += 1
    nx.draw(graph, pos=pos, edge_color=edge_color, arrows=False)
    plt.show()

def GenerateMinCostFlowProblemFromGraph(graph):
    start_nodes = []
    end_nodes = []
    capacities = []
    unit_costs = []
    min_cost_flow = pywrapgraph.SimpleMinCostFlow()
    for node,neighbor,data in graph.edges(data=True):
        min_cost_flow.AddArcWithCapacityAndUnitCost(node, neighbor, data['capacity'], data['unit_cost'])
    supply = len([1 for p,d in graph.nodes(data=True) if d['type']=='B'])
    for p, d in graph.nodes(data=True):
        if (d['type']=='source' or d['type']=='sink') and 'supply' in d:
            min_cost_flow.SetNodeSupply(p, d['supply'])
    return min_cost_flow

def ColorGraphEdgesByUsage(graph, min_cost_flow):
    for i in range(min_cost_flow.NumArcs()):
        graph[min_cost_flow.Tail(i)][min_cost_flow.Head(i)]['usage'] = min_cost_flow.Flow(i)

def main():
    """MinCostFlow simple interface example."""
    # Build a random bipartite graph with an added source and sink,
    # show its edge costs, then convert it into a min-cost flow problem.
    Acount = 20
    Bcount = 20
    graph = GenerateGraph(Acount, Bcount)
    VisualizeGraph(graph, 'unit_cost')
    min_cost_flow = GenerateMinCostFlowProblemFromGraph(graph)
    # Find the minimum cost flow from the source to the sink.
    if min_cost_flow.Solve() != min_cost_flow.OPTIMAL:
        print('Unable to find a solution! It is likely that one does not exist for this input.')
        sys.exit(-1)
    print('Minimum cost:', min_cost_flow.OptimalCost())
    ColorGraphEdgesByUsage(graph, min_cost_flow)
    VisualizeGraph(graph, 'usage')

if __name__ == '__main__':
    main()
Although this is an old question, I see it has not been correctly answered yet.
An analogous question to this one has also been answered earlier in this post.
The problem you are presenting here is indeed the Minimum Set Cover Problem, which is one of the well-known NP-hard problems. From the Wikipedia, the Minimum Set Cover Problem can be formulated as:
Given a set of elements {1,2,...,n} (called the universe) and a collection S of m sets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe. For example, consider the universe U={1,2,3,4,5} and the collection of sets S={{1,2,3},{2,4},{3,4},{4,5}}. Clearly the union of S is U. However, we can cover all of the elements with the following, smaller number of sets: {{1,2,3},{4,5}}.
In your formulation, B nodes represent the elements in the universe, A nodes represent the sets and edges between A nodes and B nodes determine which elements (B nodes) belong to each set (A node). Then, the minimum set cover is equivalent to the minimum number of A nodes so that they are connected to all B nodes. Consequently, the maximum number of A nodes which can be removed from the graph while being connected to every B node are those which do not belong to the minimum set cover.
Since it is NP-hard, no polynomial-time algorithm for computing the optimum is known, but a simple greedy algorithm can efficiently provide approximate solutions with tight bounds to the optimum. From Wikipedia:
There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements.
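A minimal Python sketch of that greedy rule, applied to the A/B formulation, might look like this. The function name greedy_set_cover and the node labels A1..A4 are made up for the example; the sets are the ones from the Wikipedia example quoted above.

```python
def greedy_set_cover(a_to_b):
    """Pick A nodes greedily until every B node is covered.

    `a_to_b` maps each A node to the set of B nodes it is connected to.
    Returns the list of A nodes to keep; all other A nodes can be removed.
    """
    uncovered = set().union(*a_to_b.values())
    chosen = []
    while uncovered:
        # Greedy rule: take the A node covering the most uncovered B nodes.
        best = max(a_to_b, key=lambda a: len(a_to_b[a] & uncovered))
        chosen.append(best)
        uncovered -= a_to_b[best]
    return chosen


# The universe {1,...,5} and the collection S from the quoted example.
a_to_b = {"A1": {1, 2, 3}, "A2": {2, 4}, "A3": {3, 4}, "A4": {4, 5}}
keep = greedy_set_cover(a_to_b)
print(keep)                            # ['A1', 'A4']
print(sorted(set(a_to_b) - set(keep)))  # ['A2', 'A3'] can be removed
```

The result is an approximation: on this small input it happens to match the optimum, but in general the greedy choice may keep more A nodes than strictly necessary.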

Maximum matching for assigning 2 items

There are N people at a party. Each one has some preferences of food and drinks. Given all the types of foods and drinks that a particular person prefers, find the maximum number of people that can be assigned a drink and a food of their choice.
A person may have several choices for both food and drinks, for example, a person may like Foods A,B,C and Drinks X,Y,Z. If we assign (A,Z) to the person, we consider the person to have been correctly assigned.
How do we solve this problem, considering that there are 2 constraints that we need to handle.
Let F be the set of all foods, D be the set of all drinks and P be the set of all people.
Build 2 bipartite graphs G and G' such that: for G, the first partite set is P and the second partite set is F; for G', the first partite set is P and the second partite set is D. Compute a maximum matching on both G and G' separately. Call M the maximum matching on G and M' the maximum matching on G'. M is a list of vertex pairs: (p1,f1), (p2,f2), ... where pi and fi are a person and a food respectively. M' is also a list of vertex pairs: (p1,d1), (p3,d3), ...
Now, merge M and M' by merging the pair with the same person: (p1,f1) + (p1,d1) = (p1,f1,d1) and that is the food-drink combo for p1. Say if p2 has a matching with f2 but p2 has no matching in G' (no drink), then ignore it.
A good algorithm for bipartite graph matching is the Hopcroft–Karp algorithm: http://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm.
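The procedure described above could be sketched in Python roughly as follows. To keep the example short it uses a simple augmenting-path matching instead of Hopcroft–Karp; the function names maximum_matching and assign_combos and the sample preferences are invented for the illustration.

```python
def maximum_matching(prefs):
    """Maximum bipartite matching via augmenting paths.

    `prefs` maps each person to the items (foods or drinks) they accept.
    Returns a dict person -> assigned item for the matched people.
    """
    owner = {}  # item -> person currently holding it

    def try_assign(person, seen):
        for item in prefs[person]:
            if item in seen:
                continue
            seen.add(item)
            # Take a free item, or push its current owner to another item.
            if item not in owner or try_assign(owner[item], seen):
                owner[item] = person
                return True
        return False

    for person in prefs:
        try_assign(person, set())
    return {person: item for item, person in owner.items()}


def assign_combos(food_prefs, drink_prefs):
    """Merge the two matchings, keeping people matched in both."""
    foods = maximum_matching(food_prefs)
    drinks = maximum_matching(drink_prefs)
    return {p: (foods[p], drinks[p]) for p in foods if p in drinks}


food_prefs = {"p1": ["A", "B"], "p2": ["A"], "p3": ["B"]}
drink_prefs = {"p1": ["X"], "p2": ["X", "Y"], "p3": ["Z"]}
print(assign_combos(food_prefs, drink_prefs))
# {'p2': ('A', 'Y'), 'p1': ('B', 'X')}
```

Here p3 is matched to a drink but not to a food, so p3 is dropped during the merge, as in the p2 case described above.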

Seating people in a movie theater

This is based on an article I read about puzzles and interview questions asked by large software companies, but it has a twist...
General question:
What is an algorithm to seat people in a movie theater so that they sit directly beside their friends but not beside their enemies.
Technical question:
Given an N by M grid, fill the grid with N * M - 1 items. Each item has an association Boolean value for each of the other N * M - 2 items. In each row of N, items directly beside other items should have a positive association value for the other. Columns however do not matter, i.e. an item can be "enemies" with the item in front of it. Note: If item A has a positive association value for B, then that means B also has a positive association value for A. It works the same for negative association values. An item is guaranteed to have a positive association with at least one other item. Also, you have access to all of the items and their association values before you start placing them in the grid.
Comments:
I have been researching this problem and thinking about it since yesterday. From what I have found, it reminds me of the bin packing problem with some added requirements. In some free time I attempted to implement it, but large groups of "enemies" were sitting next to each other. I am sure that most situations will have to have at least one pair of enemies sitting next to each other, but my solution was far from optimal. It actually looked as if I had just randomized it.
As far as my implementation went, I made N = 10, M = 10, the number of items = 99, and had an array of size 99 for EACH item that had a randomized Boolean value that referred to the friendship of the corresponding item number. This means that each item also had a friendship value that corresponded to itself; I just ignored that value.
I plan on trying to reimplement this again later and I will post the code. Can anyone figure out a "good" way to do this to minimize seating clashes between enemies?
This problem is NP-Hard.
Define L = {(G,n,m) | there is a legal seating for G in an n×m matrix}, where (u,v) is in E if u is a friend of v. L is a formal definition of this problem as a language.
Proof:
We will show Hamiltonian-Problem ≤ (p) 2-path ≤ (p) This-problem in 2 steps [Hamiltonian and 2-path defined below], and thus we conclude this problem is NP-Hard.
(1) We will show that finding two paths covering all vertices without using any vertex twice is NP-Hard [let's call such a path: 2-path and this problem as 2-path problem]
A reduction from Hamiltonian Path problem:
input: a graph G=(V,E)
Output: a graph G'=(V',E) where V' = V U {u₀}.
Correctness:
if G has Hamiltonian Path: v₁→v₂→...→vn, then G' has 2-path:
v₁→v₂→...→vn,u₀
if G' has 2-path, since u₀ is isolated from the rest vertices, there is a
path: v₁→...→vn, which is Hamiltonian in G.
Thus: G has Hamiltonian path ⇔ G' has 2-path, and thus: 2-path problem is NP-Hard.
(2)We will now show that our problem [L] is also NP-Hard:
We will show a reduction from the 2-path problem, defined above.
input: a graph G=(V,E)
output: (G,|V|+1,1) [a long row with |V|+1 seats].
Correctness:
If G has 2-path, then we can seat the people, and use the 1 seat gap
as a 'buffer' between the two paths. This will be a legal perfect seating,
since if v₁ is sitting next to v₂, then v₁→v₂ is in the path, and thus
(v₁,v₂) is in E, so v₁,v₂ are friends.
If (G,|V|+1,1) has a legal seating [v₁,...,vk,buffer,vk+1,...,vn], there is a 2-path in G:
v₁→...→vk, vk+1→...→vn
Conclusion: This problem is NP-Hard, so there is no known polynomial-time solution for it.
Exponential solution:
You might want to use a backtracking solution, which basically creates all subsets of E with size |V|-2 or less and checks which is best.
static best <- infinity
least_enemies(G,used):
    if |used| <= |V|-2:
        val <- evaluate(used)
        best <- min(best,val)
    if |used| == |V|-2:
        return
    for each edge e in E-used: //E without used
        least_enemies(G,used + e)
Here we assume evaluate(used) gives the 'score' for this solution. If the solution is completely illegal [i.e. a vertex appears twice], evaluate(used) = infinity. An optimization can of course be made by trimming these cases. To get the actual seating, we can store the current best solution.
(*)There are probably better solutions, this is just a simple possible solution to start with, the main aim in this answer is proving this problem is NP-Hard.
EDIT: simpler solution:
Create a graph G'=(V U { u₀ } ,E U {(u₀,v),(v,u₀) | for each v in V}) [u₀ is a junk vertex for the buffer] and a weight function for edges:
w((u,v)) = 1 if u is a friend of v
w((u,v)) = 2 if u is an enemy of v
w((u0,v)) = w((v,u0)) = 0
Now you got yourself a classic TSP, which can be solved in O(|V|^2 * 2^|V|) using dynamic programming.
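The dynamic programming over subsets can be sketched as below for a single row. This is a simplified illustration under assumptions that go beyond the answer: it searches for a minimum-weight Hamiltonian path (a row) rather than a closed tour, the empty seat is treated as an extra vertex with index n, and the name best_row and the friends representation are invented here. It uses the same weights (1 for friends, 2 for enemies, 0 next to the empty seat).

```python
def best_row(friends, n):
    """Seat people 0..n-1 plus one empty seat (index n) in a single row.

    `friends` is a set of frozenset({u, v}) pairs; every other pair is
    treated as enemies. Returns (total weight, seating order).
    """
    size = n + 1

    def weight(u, v):
        if u == n or v == n:
            return 0  # sitting next to the empty seat costs nothing
        return 1 if frozenset((u, v)) in friends else 2

    INF = float("inf")
    # dp[mask][last]: cheapest row using exactly `mask`, ending at `last`.
    dp = [[INF] * size for _ in range(1 << size)]
    parent = [[None] * size for _ in range(1 << size)]
    for v in range(size):
        dp[1 << v][v] = 0

    for mask in range(1 << size):
        for last in range(size):
            cur = dp[mask][last]
            if cur == INF:
                continue
            for nxt in range(size):
                if mask & (1 << nxt):
                    continue
                new_mask = mask | (1 << nxt)
                cost = cur + weight(last, nxt)
                if cost < dp[new_mask][nxt]:
                    dp[new_mask][nxt] = cost
                    parent[new_mask][nxt] = last

    full = (1 << size) - 1
    last = min(range(size), key=lambda v: dp[full][v])
    total = dp[full][last]

    # Follow the parent links backwards to recover the seating order.
    order, mask = [], full
    while last is not None:
        order.append(last)
        prev = parent[mask][last]
        mask ^= 1 << last
        last = prev
    order.reverse()
    return total, order


# Two pairs of friends; every other pair is a pair of enemies.
friends = {frozenset((0, 1)), frozenset((2, 3))}
total, order = best_row(friends, 4)
print(total, order)  # total is 2; the empty seat (4) separates the two pairs
```

The table has 2^(n+1) * (n+1) entries and each is extended in O(n) time, which is where the O(|V|^2 * 2^|V|) bound comes from.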
Note that this solution [using TSP] is for a single-row theatre, but it might be a good lead to find a solution for the general case.
One algorithm used for large "search spaces" such as this is simulated annealing
