Determine parent, children from pairwise data - algorithm

I need to determine parent/child relationships from some unusual data.
Flight numbers are marketing creations and they are odd. Flight # 22 by Airline X may refer to a singular trip between X and Y. Flight # 44 from the same airline may actually refer to multiple flights between city pairs. Example:
Flight 44: Dallas - Paris
Flight 44: Dallas - Chicago
Flight 44: Chicago - New York
Flight 44: New York - Paris
Flight 44: Chicago - Paris
Flight 44: Dallas - New York
Reality -- this is the way they work. When I pull the data from the "big list of flight numbers and city pairs" I get those 6 combinations for flight 44. I have passenger counts for each, so if there are 10 people flying Dallas - Paris, I need to take those 10 passengers and add them to the DAL - CHI, CHI - NY, and NY - PAR segments.
From a list of all the segments, I need to figure out "ahhh, this is a flight that goes from Dallas to Paris" --then when I see passenger loads I can increment the city-to-city actual loads accordingly like so:
- value associated with AD --> increment segments AB, BC, CD
- value associated with AC --> increment only segments AB, BC
- value associated with AB --> increment only segment AB
etc.
Assume I get a list of values in no order for flight 44 like this: (DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR). How do I figure out the parent child structure comparing these 4 values in these 6 combinations?

Formulation
Let a_i -> b_i be the ith entry in your list of pairs for flight 44, i = 1..M.
Let V be the set of all unique a_i and b_i values:
V = {a_i | i = 1..M} U {b_i | i = 1..M}
Let E be the set of all pairs (a_i, b_i):
E = {(a_i, b_i) | i = 1..M}
Then G = (V, E) is a directed acyclic graph where vertices V are cities and directed edges E correspond to entries a_i -> b_i in your list.
Algorithm
What you are looking for is a topological sort of the graph G. The linked Wikipedia page has pseudocode for this algorithm.
This will give you a linear ordering of the cities (in your example: [Dallas, Chicago, New York, Paris]) that is consistent with all of the ordering constraints present in your initial list. If your initial list contains fewer than |V| choose 2 pairs (meaning there is not a full set of constraints) then there will potentially be multiple consistent topological orderings of the cities in your set V.
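As a rough illustration of the idea above, here is a minimal sketch of a topological sort using Kahn's algorithm. The function name `route_order` and the `(origin, destination)` tuple input format are assumptions made for this example, not something given in the question. It also reports whether the ordering is unique, which is the case only if exactly one city is available at every step.

```python
from collections import defaultdict, deque

def route_order(pairs):
    """Topologically sort cities from (origin, destination) pairs.

    Returns (order, unique); `unique` is False when more than one
    ordering is consistent with the pairs.
    """
    successors = defaultdict(set)
    indegree = defaultdict(int)
    cities = set()
    for origin, dest in pairs:
        cities.update((origin, dest))
        if dest not in successors[origin]:   # ignore duplicate pairs
            successors[origin].add(dest)
            indegree[dest] += 1

    # Cities with no incoming edge can be placed next.
    ready = deque(sorted(c for c in cities if indegree[c] == 0))
    order, unique = [], True
    while ready:
        if len(ready) > 1:                   # a choice exists here
            unique = False
        city = ready.popleft()
        order.append(city)
        for nxt in sorted(successors[city]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(cities):
        raise ValueError("the pairs contain a cycle")
    return order, unique

flight_44 = [("DAL", "CHI"), ("CHI", "NYC"), ("NYC", "PAR"),
             ("DAL", "NYC"), ("DAL", "PAR"), ("CHI", "PAR")]
print(route_order(flight_44))
```

With the full set of six pairs the ordering is unique; with a partial list (for example only DAL-CHI and DAL-NYC) the sketch still returns one valid ordering but flags it as not unique.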

Note: This is a common-sense analysis, but see Timothy Shields' solution, which identifies the problem as topological sorting and therefore has known computational complexity and known conditions for uniqueness.
I will try to extract the core of the problem from your question in order to describe it formally.
In the above example, you actually have four nodes (cities), for brevity denoted as D, P, C, and NY. You have a set of ordered pairs (x, y), which are interpreted as "on that flight, node x precedes node y". Writing this as x<y, we actually have the following:
(for flight 044):
D < P
D < C
C < NY
NY < P
C < P
D < NY
From these constraints, we want to find an ordered tuple (x, y, z, w) such that x < y < z < w and the above constraints hold. We know that the solution is (x=D, y=C, z=NY, w=P).
Note: It might be that in your database, the first element in your set is always the "origin-destination pair" (in our case, D<P). But it does not change much on the analysis which follows.
How do we find this ordered tuple programmatically? I have a fair knowledge of algorithms, but am not aware of a standard method for solving this (other users may help here). I am concerned about the uniqueness of the result. Requiring that the solution for the ordered tuple be unique could be a good unit test of the integrity of your data; otherwise you might subsequently increment the wrong segments.
Since uniqueness is the concern, I would suggest generating all the permutations of nodes and displaying every solution that is feasible w.r.t. the given constraints.
A naive implementation could look like this:
import itertools

nodes = ['D', 'P', 'NY', 'C']
# Keep every permutation that satisfies all six "x precedes y" constraints.
result = [ot
          for ot in itertools.permutations(nodes)  # ot = ordered tuple
          if ot.index('D') < ot.index('P')
          if ot.index('D') < ot.index('C')
          if ot.index('C') < ot.index('NY')
          if ot.index('NY') < ot.index('P')
          if ot.index('C') < ot.index('P')
          if ot.index('D') < ot.index('NY')
          ]
print(result)
# displays: [('D', 'C', 'NY', 'P')]
If the number of nodes is low, this type of "naive" implementation may be sufficient. If the number is higher, I would suggest implementing it in such a way that the constraints are used effectively to prune the solution space (ask me if you need hints for this).

Construct a list of all cities that are either departures or destinations from your flight list. This gives four cities:
Dallas
Paris
Chicago
New York
Iterate over the flight list again and count the number of occurrences of each city as a destination:
0 Dallas
3 Paris
1 Chicago
2 New York
Sort the list by the destination count and you have the route:
Dallas -> Chicago -> New York -> Paris
Note: If the destination counts are not contiguous starting with zero (eg. 0, 1, 2, 3...) it points to either an inconsistent or incomplete departure/destination list for that flight.
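The counting approach above can be sketched in a few lines. The function name `route_by_destination_count` and the tuple input format are assumptions for illustration; the check at the end implements the consistency note (counts must be exactly 0, 1, 2, ...).

```python
from collections import Counter

def route_by_destination_count(pairs):
    """Order cities by how often each appears as a destination."""
    cities = {city for pair in pairs for city in pair}
    counts = Counter(dest for _, dest in pairs)
    route = sorted(cities, key=lambda city: counts[city])
    # For a complete, consistent list the counts are 0, 1, 2, ...
    if [counts[city] for city in route] != list(range(len(route))):
        raise ValueError("inconsistent or incomplete segment list")
    return route

flight_44 = [("DAL", "CHI"), ("CHI", "NYC"), ("NYC", "PAR"),
             ("DAL", "NYC"), ("DAL", "PAR"), ("CHI", "PAR")]
print(route_by_destination_count(flight_44))
```

This relies on the list containing every city pair of the flight exactly once; a missing or duplicated pair raises the error rather than returning a guessed route.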

Okay, take two: here's a function that will take a string such as the one you provided and topologically sort it as per that Wikipedia article.
import re

def create_nodes(segments):
    # Extract (origin, destination) pairs such as ('DAL', 'CHI').
    remaining_segments = re.findall(r'(\w*?)-(\w*?)[,)]', segments)
    nodes = []
    while len(remaining_segments) > 1:
        outgoing, incoming = zip(*remaining_segments)
        # The next stop is the only origin that is nobody's destination.
        point = next(node for node in outgoing if node not in incoming)
        nodes.append(point)
        # Drop every segment that departs from the stop just placed.
        remaining_segments = [segment for segment in remaining_segments
                              if segment[0] != point]
    last_segment = remaining_segments.pop()
    nodes.extend(last_segment)
    return nodes
Test:
>>> foo = '(DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR)'
>>> scratch.create_nodes(foo)
['DAL', 'CHI', 'NYC', 'PAR']
Note that this is not a perfect topological sort function for every use; however, for your specific use case of multiple-stop one-way journeys it should be effective.

Have you looked into using a dictionary with a list stored in it?
A dictionary is basically a hash table, and you can store a key (the beginning and end point, e.g. AD) and a value (the segments it needs to go through: [AB, BC, CD]).
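A small sketch of that dictionary idea, assuming the stops of the flight have already been put in order (for example `["DAL", "CHI", "NYC", "PAR"]`). The names `segment_map` and `segment_loads`, and the shape of the `passengers` input, are hypothetical and only serve the example.

```python
from collections import Counter

def segment_map(route):
    """Map every (origin, destination) pair to the legs it flies over."""
    legs = list(zip(route, route[1:]))
    return {(route[i], route[j]): legs[i:j]
            for i in range(len(route))
            for j in range(i + 1, len(route))}

def segment_loads(route, passengers):
    """Add each pair's passenger count to every leg that pair covers."""
    mapping = segment_map(route)
    loads = Counter()
    for pair, count in passengers.items():
        for leg in mapping[pair]:
            loads[leg] += count
    return loads

route = ["DAL", "CHI", "NYC", "PAR"]
passengers = {("DAL", "PAR"): 10, ("CHI", "NYC"): 5}
print(segment_loads(route, passengers))
```

Here the 10 Dallas - Paris passengers are added to all three legs, and the 5 Chicago - New York passengers only to the middle leg.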

Related

Optimization - distribute participants far from each other

This is my first question. I tried to find an answer for 2 days but I couldn't find what I was looking for.
Question: How can I minimize the amount of matches between students from the same school
I have a very practical case, I need to arrange a competition (tournament bracket)
but some of the participants might come from the same school.
Those from the same school should be put as far as possible from each other
for example: {A A A B B C} => {A B}, {A C}, {A B}
if there are more than half participants from one school, then there would be no other way but to pair up 2 guys from the same school.
for example: {A A A A B C} => {A B}, {A C}, {A A}
I don't expect to get code, just some keywords or some pseudo code on what you think would be a way of making this would be of great help!
I tried digging into constraint resolution algorithms and tournament bracket algorithms, but they don't consider minimising the amount of matches between students from same school.
Well, thank you so much in advance!
A simple algorithm (EDIT 2)
From the comments below: you have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
The idea
Sort the students by school, with the largest schools first, e.g. A B B B B C C -> B B B B C C A.
Distribute the students into two groups, left and right, as if dealing cards: 1st student to the left, 2nd student to the right, 3rd student to the left, 4th student to the right, ...
Repeat the same procedure inside each of the two groups.
This gives a recursion: the position of a player at level k-1 (k=n-1 to 0) is ((pos at level k) % 2) * 2^k + (pos at level k) // 2 (every even position goes to the left, every odd position goes to the right).
Python code
Sort the players by school size (largest school first):
import collections
import math

n = int(math.log2(len(players)))  # n is the number of rounds
assert 2**n == len(players)  # the bracket size must be a power of two
c = collections.Counter(p.school for p in players)
# Tie-break on the school itself so equally sized schools stay grouped.
players_sorted_by_school_count = sorted(players, key=lambda p: (-c[p.school], p.school))
Find the final position of every player:
players_sorted_for_tournament = [-1] * 2**n
for j, player in enumerate(players_sorted_by_school_count):
    pos = 0
    for e in range(n-1, -1, -1):
        if j % 2 == 1:
            pos += 2**e  # to the right
        j = j // 2
    players_sorted_for_tournament[pos] = player
This should give groups that are diverse enough, but I'm not sure whether it's optimal or not. Waiting for comments.
First version: how to make pairs from students of different schools
Just put the students from the same school into a stack. You have as many stacks as schools. Now, sort your stacks by number of students. In your first example {A A A B B C}, you get:
A
A B
A B C
Now, take the two top elements from the two first stacks. The stack sizes have changed: if needed, reorder the stacks and continue. When you have only one stack, make pairs from this stack.
The idea is to keep as many "schools-stacks" as possible as long as possible: you spare the students of small stacks until you have no choice but to take them.
Steps with your second example, {A A A A B C}:
A
A
A
A B C => output A, B
A
A
A C => output A, C
A
A => output A A
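The stack procedure above can be sketched with a heap standing in for "reorder the stacks after every step". The function name `pair_by_school` and the input format (one school label per student) are assumptions for this example, and ties between equally sized schools are broken alphabetically, which the description does not specify.

```python
import heapq
from collections import Counter

def pair_by_school(schools):
    """Repeatedly pair one student from each of the two largest schools."""
    # Negative counts turn Python's min-heap into a max-heap.
    heap = [(-count, school) for school, count in Counter(schools).items()]
    heapq.heapify(heap)
    pairs = []
    while len(heap) >= 2:
        c1, s1 = heapq.heappop(heap)
        c2, s2 = heapq.heappop(heap)
        pairs.append((s1, s2))
        # Put the two stacks back if they still hold students.
        for count, school in ((c1 + 1, s1), (c2 + 1, s2)):
            if count < 0:
                heapq.heappush(heap, (count, school))
    if heap:  # only one school left: pair its students with each other
        count, school = heap[0]
        pairs.extend([(school, school)] * (-count // 2))
    return pairs

print(pair_by_school("AAABBC"))
print(pair_by_school("AAAABC"))
```

If the total number of students is odd, one student is left unpaired; the sketch simply drops that student.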
It's a matching problem (EDIT 1)
I elaborate on the comments below. You have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
Your solution is to start with the set of all players and split it into two sets that are as diverse as possible. "Diverse" here means: the maximum number of different schools. To do so, you check all possible combinations of elements that split the set into two subsets of equal size. Then you recursively perform the same operation on those sets, until you arrive at the player level.
Another idea is to start with players and try to make pairs with players from other schools. Let's define a distance: 1 if two players are in the same school, 0 if they are in different schools. You want to make pairs with the minimum global distance.
This distance may be generalized for the pairs of players: take the number of common schools. That is: A B A B -> 2 (A & B), A B A C -> 1 (A), A B C D -> 0. You can imagine the distance between two sets (players, pairs, pairs of pairs, ...): the number of common schools. Now you can see this as a graph whose vertices are the sets (players, pairs, pairs of pairs, ...) and whose edges connect every pair of vertices with a weight that is the distance defined above. You are looking for a perfect matching (all vertices are matched) with a minimum weight.
The blossom algorithm or some of its variants seems to fit your needs, but it's probably overkill if the number of players is limited.
Create a two-dimensional array, where the first dimension is the school and the second dimension is each participant from that school in this competition.
Load them and you'll have everything you need linearly.
For example:
School 1 ------- School 2 ------- School 3
A -------------- B -------------- C
A -------------- B -------------- C
A -------------- B -------------- C
A -------------- B
A -------------- B
A
A
In the example above, we will have 3 schools (first dimension), with school 1 having 7 participants (second dimension), school 2 having 5 participants and school 3 having 3 participants.
You can also create a second array containing the resulting combinations and, for each chosen pair, delete this pair from the initial array in a loop until it is completely empty and the result array is completely full.
I think the algorithm in this answer could help.
Basically: group the students by school, and use the error tracking idea behind Bresenham's Algorithm to distribute the schools as far apart as possible. Then you pull out pairs from the list.

algorithm to find unique, non equivalent configurations given the height, the width, and the number of states each element can be

So recently, I have been attempting to solve a code challenge and cannot find the answer. The issue is not the implementation, but rather what to implement. The prompt can be found here http://pastebin.com/DxQssyKd
The main useful information from the prompt is as follows:
"Write a function answer(w, h, s) that takes 3 integers and returns the number of unique, non-equivalent configurations that can be found on a star grid w blocks wide and h blocks tall where each celestial body has s possible states. Equivalency is defined as above: any two star grids with each celestial body in the same state where the actual order of the rows and columns do not matter (and can thus be freely swapped around). Star grid standardization means that the width and height of the grid will always be between 1 and 12, inclusive. And while there are a variety of celestial bodies in each grid, the number of states of those bodies is between 2 and 20, inclusive. The answer can be over 20 digits long, so return it as a decimal string."
The equivalency is in a way that
00
01
is equivalent to
01
00
and so on.
The problem is, what algorithm(s) should I use? I know this is somewhat related to permutations, combinations, and group theory, but I cannot find anything specific.
The key weapon is Burnside's lemma, which equates the number of orbits of the symmetry group G = S_w × S_h acting on the set of configurations X = ([w] × [h] → [s]) (i.e., the answer) to the sum 1/|G| ∑_{g ∈ G} |X^g|, where X^g = {x | g.x = x} is the set of elements fixed by g.
Given g, it's straightforward to compute |X^g|: use g to construct a graph on vertices [w] × [h] where there is an edge between (i, j) and g(i, j) for all (i, j). Count c, the number of connected components, and return s^c. The reasoning is that every vertex in a connected component must have the same state, but vertices in different components are unrelated.
Now, for 12 × 12 grids, there are far too many values of g to do this calculation on. Fortunately, when g and g' are conjugate (i.e., there exists some h such that h.g.h^-1 = g') we find that |X^g'| = |{x | g'.x = x}| = |{x | h.g.h^-1.x = x}| = |{x | g.h^-1.x = h^-1.x}| = |{h.y | g.y = y}| = |{y | g.y = y}| = |X^g|. We can thus sum over conjugacy classes and multiply each term by the number of group elements in the class.
The last piece is the conjugacy class structure of G = S_w × S_h. The conjugacy class structure of this direct product is really just the direct product of the conjugacy classes of S_w and S_h. The conjugacy classes of S_n are in one-to-one correspondence with integer partitions of n, enumerable by standard recursive methods. To compute the size of the class, you'll divide n! by the product of the partition terms (because circular permutations of the cycles are equivalent) and also by the product of the number of symmetries between cycles of the same size (product of the factorials of the multiplicities). See https://groupprops.subwiki.org/wiki/Conjugacy_class_size_formula_in_symmetric_group.
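The pieces above can be put together in a short sketch. The helper names `partitions` and `class_size` are made up for this example. One shortcut is an assumption on my part rather than something stated above: instead of building the graph and counting components, the sketch uses the fact that a row cycle of length a and a column cycle of length b split their a × b cells into gcd(a, b) cycles.

```python
from collections import Counter
from math import factorial, gcd

def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing lists."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield []
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield [first] + rest

def class_size(partition):
    """Number of permutations in S_n with this cycle type."""
    denominator = 1
    for length, multiplicity in Counter(partition).items():
        denominator *= length**multiplicity * factorial(multiplicity)
    return factorial(sum(partition)) // denominator

def answer(w, h, s):
    total = 0
    for row_cycles in partitions(w):
        for col_cycles in partitions(h):
            # Number of cell cycles under this pair of permutations.
            c = sum(gcd(a, b) for a in row_cycles for b in col_cycles)
            total += class_size(row_cycles) * class_size(col_cycles) * s**c
    return str(total // (factorial(w) * factorial(h)))

print(answer(2, 2, 2))
```

For a 2 × 2 grid with 2 states the four conjugacy-class pairs contribute 4 + 4 + 4 + 16 = 28, and dividing by |G| = 4 gives 7.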

Maximum matching for assigning 2 items

There are N people at a party. Each one has some preferences of food and drinks. Given all the types of foods and drinks that a particular person prefers, find the maximum number of people that can be assigned a drink and a food of their choice.
A person may have several choices for both food and drinks, for example, a person may like Foods A,B,C and Drinks X,Y,Z. If we assign (A,Z) to the person, we consider the person to have been correctly assigned.
How do we solve this problem, considering that there are 2 constraints that we need to handle.
Let F be the set of all food there is, D be the set of all drink and P be the set of all people there is.
Build 2 bipartite graphs G and G' such that: for G, the first partite set is P and the second partite set is F; for G', the first partite set is P and the second partite set is D. Compute a maximum matching on G and on G' separately. Call M the maximum matching on G and M' the maximum matching on G'. M is a list of vertex pairs: (p1, f1), (p2, f2), ... where pi and fi are a person and a food respectively. M' is also a list of vertex pairs: (p1, d1), (p3, d3), ...
Now, merge M and M' by merging the pair with the same person: (p1,f1) + (p1,d1) = (p1,f1,d1) and that is the food-drink combo for p1. Say if p2 has a matching with f2 but p2 has no matching in G' (no drink), then ignore it.
A good algorithm for bipartite graph matching is the Hopcroft-Karp algorithm: http://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm
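A minimal sketch of the two-matchings-then-merge procedure described above. To keep it short it uses the simple augmenting-path method (often called Kuhn's algorithm) rather than Hopcroft-Karp; the function names and the dictionary input format are assumptions for illustration.

```python
def max_matching(prefs):
    """prefs: person -> list of acceptable items. Returns person -> item."""
    owner = {}  # item -> person currently holding it

    def try_assign(person, seen):
        for item in prefs[person]:
            if item in seen:
                continue
            seen.add(item)
            # Take a free item, or move its current owner elsewhere.
            if item not in owner or try_assign(owner[item], seen):
                owner[item] = person
                return True
        return False

    for person in prefs:
        try_assign(person, set())
    return {person: item for item, person in owner.items()}

def assign(food_prefs, drink_prefs):
    """Merge the two matchings; keep only people matched in both."""
    foods = max_matching(food_prefs)
    drinks = max_matching(drink_prefs)
    return {p: (foods[p], drinks[p]) for p in foods if p in drinks}

food_prefs = {"p1": ["A"], "p2": ["A", "B"]}
drink_prefs = {"p1": ["X"], "p2": ["X"]}
print(assign(food_prefs, drink_prefs))
```

Note that the merge step only keeps people who happen to be matched in both graphs, so the size of the result can depend on which maximum matchings are found.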

Neo4J - Traveling Salesman

I'm trying to solve an augmented TSP problem using a graph database, but I'm struggling. I'm great with SQL, but am a total noob on cypher. I've created a simple graph with cities (nodes) and flights (relationships).
THE SETUP: Travel to 8 different cities (1 city per week, no duplicates) with the lowest total flight cost. I'm trying to solve an optimal path to minimize the cost of the flights, which changes each week.
Here is a file on pastebin containing my nodes & relationships. Just run it against Neo4JShell to insert the data.
I started off using this article as a basis but it doesn't handle the changing distances (or in my case flight costs)
I know this is syntactically terrible/non-executable, but here's what I've done so far to get just two flights;
MATCH (a:CITY)-[F1:FLIGHT{week:1}]->(b:CITY) -[F2:FLIGHT{week:2}]->(c:CITY)
RETURN a,b,c;
But that doesn't run.
Next, I thought I'd just try to find all the cities & flights from week one, but it's not working right either as I get flights where week <> 1 as well as =1
MATCH (n) WHERE (n)-[:FLIGHT { week:1 }]->() RETURN n
Can anyone help out?
PS - I'm not married to using a graph DB to solve this, I've just read about them, and thought it would be well fitted to try it, plus gave me a reason to work with them, but so far, I'm not having much (or any) success.
Maybe this Cypher query will give you some ideas.
MATCH (from:Node {name: "Source node" })
MATCH path = (from)-[:CONNECTED_TO*6]->()
WHERE ALL(n in nodes(path) WHERE 1 = length(filter(m in nodes(path) WHERE m = n)))
  AND length(nodes(path)) = 7
RETURN path,
  reduce(distance = 0, edge in relationships(path) | distance + edge.distance)
    AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1
It enumerates every path that visits each node exactly once (7 nodes in this example), computes the total length of each of these paths, and returns the shortest one.
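For comparison, the same brute-force enumeration can be sketched outside the database with week-dependent prices. Everything here is an assumption for illustration: the function name `cheapest_tour`, the `cost[(week, source, destination)]` lookup, and a trip that starts in a known home city and does not return to it. With 8 cities this is at most 8! orderings, which is still small.

```python
from itertools import permutations

def cheapest_tour(home, cities, cost):
    """Try every visiting order; cost[(week, src, dst)] is the fare.

    A missing key means there is no such flight in that week.
    Returns (total_cost, route), or (inf, None) if nothing is feasible.
    """
    best = (float("inf"), None)
    for order in permutations(cities):
        route = (home,) + order
        total = 0
        # The flight taken in week 1 is the first hop, week 2 the second, ...
        for week, (src, dst) in enumerate(zip(route, route[1:]), start=1):
            price = cost.get((week, src, dst))
            if price is None:
                break            # this ordering needs a flight that does not exist
            total += price
        else:
            best = min(best, (total, route))
    return best

cost = {(1, "A", "B"): 10, (1, "A", "C"): 1,
        (2, "B", "C"): 1, (2, "C", "B"): 100}
print(cheapest_tour("A", ["B", "C"], cost))
```

In this toy instance the cheap first hop A -> C forces the expensive C -> B in week 2, so the route A, B, C wins with a total of 11.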
neo4j may be a fine piece of software, but I wouldn't expect it to be of much help in solving this NP-hard problem. Instead, I would point you to an integer program solver (this one, perhaps, but I can't vouch for it) and suggest that you formulate this problem as an integer program as follows.
For each flight f, we create a 0-1 variable x(f) that is 1 if flight f is taken and 0 if flight f is not taken. The objective is to minimize the total cost of the flights (I'm going to assume that each purchase is an independent decision; if not, then you have some more work to do).
minimize sum_{flights f} cost(f) x(f)
Now we need some constraints. Each week, we purchase exactly one flight.
for all weeks i, sum_{flights f in week i} x(f) = 1
We can be in only one place at a time, so if we fly into city v for week i, then we fly out of city v for week i+1. We express this constraint with a strange but idiomatic linear equation.
for all weeks i, for all cities v,
sum_{flights f in week i to city v} x(f) -
sum_{flights f in week i+1 from city v} x(f) = 0
We can fly into each city at most once. We can fly out of each city at most once. This is how we enforce the constraint of visiting only once.
for all cities v,
sum_{flights f to city v} x(f) <= 1
for all cities v,
sum_{flights f from city v} x(f) <= 1
We're almost done. I'm going to assume at this point that the journey begins and ends in a home city u known ahead of time. For the first week, delete all flights not departing from u. For the last week, delete all flights not arriving at u. The flexibility of integer programming, however, means that it's easy to make other arrangements.

Algorithm to establish ordering amongst a set of items

I have a set of students (referred to as items in the title for generality). Amongst these students, some have a reputation for being rambunctious. We are told about a set of hate relationships of the form 'i hates j'. 'i hates j' does not imply 'j hates i'. We are supposed to arrange the students in rows (front most row numbered 1) in a way such that if 'i hates j' then i should be put in a row that is strictly lesser numbered than that of j (in other words: in some row that is in front of j's row) so that i doesn't throw anything at j (Turning back is not allowed). What would be an efficient algorithm to find the minimum number of rows needed (each row need not have the same number of students)?
We will make the following assumptions:
1) If we model this as a directed graph, there are no cycles in the graph. The most basic cycle would be: if 'i hates j' is true, 'j hates i' is false. Because otherwise, I think the ordering would become impossible.
2) Every student in the group is at least hated by one other student OR at least hates one other student. Of course, there would be students who are both hated by some and who in turn hate other students. This means that there are no stray students who don't form part of the graph.
Update: I have already thought of constructing a directed graph with i --> j if 'i hates j' and doing a topological sort. However, a general topological sort would suit better if I had to put all the students in a single line. Since rows are allowed here, I am trying to figure out how to adapt topological sort so that it gives me what I want.
When you answer, please state the complexity of your solution. If anybody is giving code and you don't mind the language, then I'd prefer Java but of course any other language is just as fine.
JFYI This is not for any kind of homework (I am not a student btw :)).
It sounds to me that you need to investigate topological sorting.
This problem is basically another way to put the longest path in a directed graph problem. The number of rows is actually number of nodes in path (number of edges + 1).
Assuming the graph is acyclic, the solution is topological sort.
Acyclic is a bit stronger than your assumption 1. Not only is A -> B together with B -> A invalid, but so is A -> B, B -> C, C -> A, and any cycle of any length.
HINT: the question is how many rows are needed, not which student in which row. The answer to the question is the length of the longest path.
It's from a project management theory (or scheduling theory, I don't know the exact term). There the task is about sorting jobs (vertex is a job, arc is a job order relationship).
Obviously we have some connected oriented graph without loops. There is an arc from vertex a to vertex b if and only if a hates b. Let's assume there is a source vertex (without incoming arcs) and a destination vertex (without outgoing arcs). If that is not the case, just add imaginary ones. Now we want to find the length of a longest path from source to destination (it will be the number of rows - 1, but mind the imaginary vertices).
We will define vertex rank (r[v]) as the number of arcs in a longest path between the source and this vertex v. Obviously we want to know r[destination]. Algorithm for finding rank:
0) r_0[v] := 0 for all vertices v
repeat
t) r_t[end(j)] := max( r_{t-1}[end(j)], r_{t-1}[start(j)] + 1 ) for all arcs j
until for all arcs j r_{t+1}[end(j)] = r_t[end(j)] // i.e. no changes on this iteration
On each step at least one vertex increases its rank. The loop runs at most n times (as in Bellman-Ford, after t passes every vertex whose rank is at most t has its final value) and each pass scans up to O(n^2) arcs, so in this form the complexity is O(n^3).
By the way, this algorithm also gives you student distribution among rows. Just group students by their respective ranks.
Edit: Another piece of code with the same idea. It may be easier to follow.
# Python
# V is a list of vertex indices, let it be something like V = range(N)
# source has index 0, destination has index N-1
# E is a list of edges, i.e. tuples of the form (start vertex, end vertex)
R = [0] * len(V)
changes = True
while changes:  # repeat until a full pass changes nothing
    changes = False
    for start, end in E:
        if R[end] < R[start] + 1:
            changes = True
            R[end] = R[start] + 1
# The answer is derived from value of R[N-1]
Of course this is the simplest implementation. It can be optimized, and time estimate can be better.
Edit2: obvious optimization - update only vertices adjacent to those that were updated on the previous step, i.e. introduce a queue of vertices whose rank was updated. Also, edges should be stored as adjacency lists. With this optimization the complexity would be O(N^2). Indeed, each vertex may appear in the queue at most rank times, but a vertex's rank never exceeds N, the number of vertices. Therefore the total number of algorithm steps will not exceed O(N^2).
Essentially the important thing in assumption #1 is that there must not be any cycles in this graph. If there are any cycles you can't solve this problem.
I would start by seating all of the students that do not hate any other students in the back row. Then you can seat the students who hate these students in the next row, and so on.
The number of rows is the length of the longest path in the directed graph, plus one. As a limit case, if there is no hate relationship everyone can fit on the same row.
To allocate the rows, put everyone who is not hated by anyone else in row one. These are the "roots" of your graph. Everyone else is put in row N + 1, where N is the length of the longest path from any of the roots to that person (this path has length at least one).
A simple O(N^3) algorithm is the following:
S = set of students
for s in S: s.row = -1      # initialize row field
rownum = 0                  # start from first row below
flag = true                 # when to finish
while (flag):
    rownum = rownum + 1     # proceed to next row
    flag = false
    for s in S:
        if (s.row != -1) continue   # already allocated
        ok = true
        foreach q in S:
            # Check if there is student q who will sit
            # on this or later row who hates s
            if ((q.row == -1 or q.row == rownum)
                    and s hated by q): ok = false; break
        if (ok):            # can put s here
            s.row = rownum
            flag = true
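The same row rule (row = length of the longest path from a root, plus one) can also be computed in O(V + E) by processing students in topological order. This is only a sketch: the function name `assign_rows` and the input format (a list of students plus `(i, j)` tuples meaning "i hates j") are assumptions for the example.

```python
from collections import defaultdict, deque

def assign_rows(students, hates):
    """Return {student: row}, rows numbered from 1 (front)."""
    targets = defaultdict(list)          # i -> students that i hates
    indegree = {s: 0 for s in students}  # number of students who hate s
    for i, j in hates:
        targets[i].append(j)
        indegree[j] += 1

    row = {s: 1 for s in students}
    queue = deque(s for s in students if indegree[s] == 0)
    placed = 0
    while queue:
        i = queue.popleft()
        placed += 1
        for j in targets[i]:
            # j must sit strictly behind everyone who hates j.
            row[j] = max(row[j], row[i] + 1)
            indegree[j] -= 1
            if indegree[j] == 0:
                queue.append(j)

    if placed != len(indegree):
        raise ValueError("the hate relation contains a cycle")
    return row

rows = assign_rows("abcd", [("a", "b"), ("b", "c"), ("a", "c"), ("d", "c")])
print(rows, "rows needed:", max(rows.values()))
```

The minimum number of rows is the largest value in the returned dictionary; in the example the chain a -> b -> c forces three rows.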
Simple answer = 1 row.
Put all students in the same row.
Actually that might not solve the question as stated - lesser row, rather than equal row...
Put all students in row 1.
For each hate relation, put the hated student in a row behind the hating student.
Iterate until nothing changes, or iterate Num(relation) times.
But I'm sure there are better algorithms - look at acyclic graphs.
Construct a relationship graph where 'i hates j' gives a directed edge from i to j. The end result is a directed graph. It should be a DAG; otherwise there is no solution, as it's not possible to resolve circular hate relationships.
Now simply do a DFS. In the post-order callback for a node (that is, once the DFS of all its children is done and before returning from the DFS call for this node), check the row numbers of all the children and assign this node the maximum child row number + 1. If a student doesn't hate anyone (a node with an empty adjacency list), simply assign row 0.
Once all the nodes are processed, reverse the row numbers. This is easy: find the maximum row number and replace each assigned row number with the maximum minus that row number.
Here is the sample code.
postNodeCb( graph g, int node )
{
    if ( /* No adj list */ )
        row[ node ] = 0;
    else
        row[ node ] = max( row number of all children ) + 1;
}
main()
{
    .
    .
    for ( int i = 0; i < NUM_VER; i++ )
        if ( !visited[ i ] )
            graphTraverseDfs( g, i );
    .
    .
}
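A runnable sketch of the DFS idea above, written in Python rather than the C-like pseudocode. The name `rows_by_dfs` and the input format are assumptions; the sketch presumes the graph is acyclic (a cycle would recurse forever) and is limited by Python's recursion depth for very long chains.

```python
def rows_by_dfs(students, hates):
    """Post-order DFS: depth 0 for students who hate nobody, then reverse."""
    targets = {s: [] for s in students}   # i -> students that i hates
    for i, j in hates:
        targets[i].append(j)

    depth = {}

    def visit(s):
        if s not in depth:
            # One more than the deepest child; 0 when there are no children.
            depth[s] = 1 + max((visit(t) for t in targets[s]), default=-1)
        return depth[s]

    for s in students:
        visit(s)

    deepest = max(depth.values())
    # Reverse so that the front row is numbered 1.
    return {s: deepest - d + 1 for s, d in depth.items()}

hates = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "c")]
rows = rows_by_dfs("abcd", hates)
print(rows)
```

Because depths are measured from the back, students are placed as far back as the constraints allow, so individual rows can differ from a front-to-back assignment even though the number of rows is the same.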

Resources