Finding proper subset of elements using relational algebra - relational-algebra

I have one table called Exams, with the columns student and exam, and I need to find all the students that took a proper subset of the exams taken by student A.
A sample of data could be:
student exam
A 1
A 2
B 1
B 3
C 1
C 2
D 1
And the result should be
student
D
This is because only D took a proper subset of the exams taken by A. B is not in the result because B took an exam that A has not taken.
What I came up with so far is:
Take all the exams that student A has taken:
examsA ← π exam (σ student='A' (Exams))
Divide the Exams relation by the exams that student A has taken:
studentsNoGood ← Exams ÷ examsA
Now I have all the students who took exactly the same exams as A and those who took more exams.
By subtracting, I find only those who took fewer exams and those who did not take a subset of student A's exams:
lessExamsOrNotSubset ← π student (Exams) - studentsNoGood
And then I'm stuck on how to differentiate those with fewer exams from those who took unrelated exams.
By 'a proper subset' I mean that, given two sets D and E, D is a proper subset of E iff D is contained in E and D is not equal to E, so there is an element of E that is not in D.
I am using the relational algebra found in the book Fundamentals of Database Systems (Elmasri, Navathe), page 239.

To find the students that took exams unrelated to those of A, first find those exams:
R1 ← π exam (Exams) - examsA
Then find the students with at least one exam in this set:
studentsUnrelated ← π student (Exams ⨝(Exams.exam = R1.exam) R1)
Finally, remove these students as well, which leaves those who have taken only a proper subset of the exams of A:
result ← lessExamsOrNotSubset - studentsUnrelated
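The steps can be checked against the sample data with ordinary Python sets. This is only a sketch that mimics the relational operators with set comprehensions; the function name proper_subset_students and its variable names are illustrative and not part of the book's notation.

```python
def proper_subset_students(exams, reference):
    """exams is a set of (student, exam) tuples; reference is the student to compare against."""
    exams_ref = {e for s, e in exams if s == reference}          # examsA
    students = {s for s, _ in exams}                             # π student (Exams)
    # Division: students who took every exam in exams_ref (same exams or more).
    took_all = {s for s in students
                if exams_ref <= {e for s2, e in exams if s2 == s}}
    # Exams that the reference student never took (R1).
    other_exams = {e for _, e in exams} - exams_ref
    # Students with at least one such exam (studentsUnrelated).
    unrelated = {s for s, e in exams if e in other_exams}
    return students - took_all - unrelated


sample = {("A", 1), ("A", 2), ("B", 1), ("B", 3), ("C", 1), ("C", 2), ("D", 1)}
print(proper_subset_students(sample, "A"))
```

On the sample table this leaves only D: A and C are removed by the division step, and B is removed because of exam 3.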


Algorithm for sequential reward points

I want to write an algorithm to find the sequential reward points.
The inviter gets (1/2)^k points for each confirmed invitation, where k is the level of the invitation: level 0
(people directly invited) yields 1 point, level 1 (people invited by someone invited by the original customer)
gives 1/2 points, level 2 invitations (people invited by someone on level 1) awards 1/4 points and so on.
Only the first invitation counts: multiple invites sent to the same person don't produce any further points,
even if they come from different inviters.
For instance:
Input:
A recommends B
B accepts
B recommends C
C accepts
C recommends D
B recommends D
D accepts
would calculate as:
A receives 1 Point from the recommendation of B, 0.5 Point from the recommendation of C by B and
another 0.25 Point by the recommendation of D by C. A gets a total score of 1.75 Points.
B receives 1 Point from the recommendation of C and 0.5 Point from the recommendation of D by C. B receives no Points from the recommendation of D because D was invited by C before. B gets a total score of 1.5 Points.
C receives 1 Point from the recommendation of D. C gets a total score of 1 Point.
Output:
{ "A": 1.75, "B": 1.5, "C": 1 }
What should the algorithm for that be? I think dynamic programming has to be used here.
This is simply an ancestor search in a tree. By keeping track of the depth, you know how many points to award.
Pseudocode (Python-style, assuming each person object has inviter and points attributes):
def add_points(accepter):
    depth = 0
    while accepter.inviter is not None:
        accepter.inviter.points += 0.5 ** depth
        accepter = accepter.inviter
        depth += 1
This algorithm is O(number of ancestors) per accepted invitation, and since every ancestor has to be updated, you can't do any better complexity-wise.
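For illustration, here is a minimal sketch that applies the same ancestor walk to the event log from the question. The event format (tuples of actor, action, target) and the function name reward_points are assumptions made for this example; the question does not specify an input format.

```python
from collections import defaultdict


def reward_points(events):
    """events: list of (actor, action, target); target is None for 'accepts'."""
    inviter = {}                 # confirmed customer -> the person who invited them
    pending = {}                 # invited person -> first inviter (later invites are ignored)
    points = defaultdict(float)
    for actor, action, target in events:
        if action == "recommends":
            # Only the first invitation to a given person counts.
            if target not in pending and target not in inviter:
                pending[target] = actor
        elif action == "accepts" and actor in pending:
            inviter[actor] = pending.pop(actor)
            # Walk up the invitation tree, halving the reward at each level.
            node, depth = actor, 0
            while node in inviter:
                node = inviter[node]
                points[node] += 0.5 ** depth
                depth += 1
    return dict(points)


log = [
    ("A", "recommends", "B"), ("B", "accepts", None),
    ("B", "recommends", "C"), ("C", "accepts", None),
    ("C", "recommends", "D"), ("B", "recommends", "D"),
    ("D", "accepts", None),
]
print(reward_points(log))  # {'A': 1.75, 'B': 1.5, 'C': 1.0}
```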

Optimization - distribute participants far from each other

This is my first question. I tried to find an answer for 2 days but I couldn't find what I was looking for.
Question: How can I minimize the number of matches between students from the same school?
I have a very practical case: I need to arrange a competition (tournament bracket),
but some of the participants might come from the same school.
Those from the same school should be put as far as possible from each other,
for example: {A A A B B C} => {A B}, {A C}, {A B}
If more than half of the participants come from one school, then there is no other way but to pair up 2 students from the same school,
for example: {A A A A B C} => {A B}, {A C}, {A A}
I don't expect to get code; just some keywords or some pseudocode describing how you would approach this would be of great help!
I tried digging into constraint resolution algorithms and tournament bracket algorithms, but they don't consider minimising the number of matches between students from the same school.
Well, thank you so much in advance!
A simple algorithm (EDIT 2)
From the comments below: you have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
The idea
Sort the students by school, schools with more students before schools with fewer students, e.g. A B B B B C C -> B B B B C C A.
Distribute the students into two groups A and B, as when dealing cards in the card game War: 1st student in A, 2nd student in B, 3rd student in A, 4th student in B, ...
Continue recursively with each of the groups A and B.
This gives a recursion: the position of a player at level k-1 (k = n-1 down to 0) is ((pos at level k) % 2) * 2^k + (pos at level k) // 2 (every even position goes to the left, every odd position goes to the right).
Python code
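Sort the players by the size of their school, largest first:
import collections
import math
n = int(math.log2(len(players))) # n is the number of rounds
assert 2**n == len(players) # the bracket must be full
c = collections.Counter([p.school for p in players])
players_sorted_by_school_count = sorted(players, key=lambda p: (-c[p.school], p.school)) # ties broken by school so each school stays contiguous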
Find the final position of every player:
players_sorted_for_tournament = [-1] * 2**n
for j, player in enumerate(players_sorted_by_school_count):
    pos = 0
    for e in range(n-1,-1,-1):
        if j % 2 == 1:
            pos += 2**e # to the right
        j = j // 2
    players_sorted_for_tournament[pos] = player
This should give groups that are diverse enough, but I'm not sure whether it's optimal or not. Waiting for comments.
First version: how to make pairs from students of different schools
Just put the students from the same school into a stack. You have as many stacks as schools. Now, sort your stacks by number of students. In your first example {A A A B B C}, you get:
A
A B
A B C
Now, take the top element from each of the two largest stacks. The stack sizes have changed: if needed, reorder the stacks and continue. When you have only one stack left, make pairs from this stack.
The idea is to keep as many "schools-stacks" as possible as long as possible: you spare the students of small stacks until you have no choice but to take them.
Steps with your second example, {A A A A B C}:
A
A
A
A B C => output A, B
A
A
A C => output A, C
A
A => output A A
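The stack procedure can be sketched with a max-heap keyed on stack size, which avoids re-sorting the stacks after every pair. The function name pair_by_school and the use of plain school labels in place of student objects are assumptions made for brevity.

```python
import heapq
from collections import Counter


def pair_by_school(schools):
    """schools: one school label per student. Returns a list of (school, school) pairs."""
    # heapq is a min-heap, so sizes are stored negated to pop the largest stack first.
    heap = [(-count, school) for school, count in Counter(schools).items()]
    heapq.heapify(heap)
    pairs = []
    while len(heap) >= 2:
        c1, s1 = heapq.heappop(heap)   # largest stack
        c2, s2 = heapq.heappop(heap)   # second largest stack
        pairs.append((s1, s2))
        if c1 + 1 < 0:
            heapq.heappush(heap, (c1 + 1, s1))
        if c2 + 1 < 0:
            heapq.heappush(heap, (c2 + 1, s2))
    if heap:
        # Only one school is left: its students have to play each other.
        count, school = heap[0]
        pairs.extend([(school, school)] * (-count // 2))
    return pairs


print(pair_by_school(list("AAABBC")))
print(pair_by_school(list("AAAABC")))
```

For the two examples from the question this produces no same-school pair in the first case and exactly one (A, A) pair in the second.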
It's a matching problem (EDIT 1)
I elaborate on the comments below. You have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
Your solution is to start with the set of all players and split it into two sets that are as diverse as possible. "Diverse" means here: the maximum number of different schools. To do so, you check all possible combinations of elements that split the set into two subsets of equal size. Then you recursively perform the same operation on those sets, until you arrive at the player level.
Another idea is to start with players and try to make pairs with players from other schools. Let's define a distance: 1 if two players are in the same school, 0 if they are in different schools. You want to make pairs with the minimum global distance.
This distance may be generalized for the pairs of players: take the number of common schools. That is: A B A B -> 2 (A & B), A B A C -> 1 (A), A B C D -> 0. You can imagine the distance between two sets (players, pairs, pairs of pairs, ...): the number of common schools. Now you can see this as a graph whose vertices are the sets (players, pairs, pairs of pairs, ...) and whose edges connect every pair of vertices with a weight that is the distance defined above. You are looking for a perfect matching (all vertices are matched) with a minimum weight.
The blossom algorithm or some of its variants seems to fit your needs, but it's probably overkill if the number of players is limited.
Create a two-dimensional array, where the first dimension is for each school and the second dimension is for each participant from that school.
Load them and you'll have everything you need linearly.
For example:
School 1 ------- School 2 -------- School 3
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B
A ------------ B
A
A
In the example above, we will have 3 schools (first dimension), with school 1 having 7 participants (second dimension), school 2 having 5 participants and school 3 having 3 participants.
You can also create a second array containing the resulting combinations and, for each chosen pair, delete this pair from the initial array in a loop until it is completely empty and the result array is completely full.
I think the algorithm in this answer could help.
Basically: group the students by school, and use the error tracking idea behind Bresenham's Algorithm to distribute the schools as far apart as possible. Then you pull out pairs from the list.

Neo4J - Traveling Salesman

I'm trying to solve an augmented TSP problem using a graph database, but I'm struggling. I'm great with SQL, but am a total noob on cypher. I've created a simple graph with cities (nodes) and flights (relationships).
THE SETUP: Travel to 8 different cities (1 city per week, no duplicates) with the lowest total flight cost. I'm trying to solve an optimal path to minimize the cost of the flights, which changes each week.
Here is a file on pastebin containing my nodes & relationships. Just run it against Neo4JShell to insert the data.
I started off using this article as a basis but it doesn't handle the changing distances (or in my case flight costs)
I know this is syntactically terrible/non-executable, but here's what I've done so far to get just two flights;
MATCH (a:CITY)-[F1:FLIGHT{week:1}]->(b:CITY) -[F2:FLIGHT{week:2}]->(c:CITY)
RETURN a,b,c;
But that doesn't run.
Next, I thought I'd just try to find all the cities & flights from week one, but it's not working right either as I get flights where week <> 1 as well as =1
MATCH (n) WHERE (n)-[:FLIGHT { week:1 }]->() RETURN n
Can anyone help out?
PS - I'm not married to using a graph DB to solve this, I've just read about them, and thought it would be well fitted to try it, plus gave me a reason to work with them, but so far, I'm not having much (or any) success.
Maybe this Cypher query will give you some ideas.
MATCH (from:Node {name: "Source node" })
MATCH path = (from)-[:CONNECTED_TO*6]->()
WHERE ALL(n in nodes(path) WHERE 1 = length(filter(m in nodes(path) WHERE m = n)))
AND length(nodes(path)) = 7
RETURN path,
reduce(distance = 0, edge in relationships(path) | distance + edge.distance)
AS totalDistance
ORDER BY totalDistance ASC
LIMIT 1
It enumerates every path of the required length that visits no node twice (in this example, 7 nodes and therefore 6 relationships), computes the total distance of each of these paths, and returns the shortest one.
neo4j may be a fine piece of software, but I wouldn't expect it to be of much help in solving this NP-hard problem. Instead, I would point you to an integer program solver (this one, perhaps, but I can't vouch for it) and suggest that you formulate this problem as an integer program as follows.
For each flight f, we create a 0-1 variable x(f) that is 1 if flight f is taken and 0 if flight f is not taken. The objective is to minimize the total cost of the flights (I'm going to assume that each purchase is an independent decision; if not, then you have some more work to do).
minimize sum_{flights f} cost(f) x(f)
Now we need some constraints. Each week, we purchase exactly one flight.
for all weeks i, sum_{flights f in week i} x(f) = 1
We can be in only one place at a time, so if we fly into city v for week i, then we fly out of city v for week i+1. We express this constraint with a strange but idiomatic linear equation.
for all weeks i, for all cities v,
sum_{flights f in week i to city v} x(f) -
sum_{flights f in week i+1 from city v} x(f) = 0
We can fly into each city at most once. We can fly out of each city at most once. This is how we enforce the constraint of visiting only once.
for all cities v,
sum_{flights f to city v} x(f) <= 1
for all cities v,
sum_{flights f from city v} x(f) <= 1
We're almost done. I'm going to assume at this point that the journey begins and ends in a home city u known ahead of time. For the first week, delete all flights not departing from u. For the last week, delete all flights not arriving at u. The flexibility of integer programming, however, means that it's easy to make other arrangements.
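For a handful of cities, the objective and constraints above can be sanity-checked by brute force before reaching for a solver. The sketch below is not an integer program: it simply enumerates every visiting order. The cost lookup keyed by (week, from, to), the function name cheapest_tour, and the toy prices are all assumptions made for this example.

```python
from itertools import permutations


def cheapest_tour(cities, home, cost):
    """cost[(week, src, dst)] is the price of that flight; a missing key means no such flight."""
    others = [c for c in cities if c != home]
    best_total, best_route = float("inf"), None
    for order in permutations(others):
        route = (home, *order, home)          # start and end in the home city
        total = 0
        # Exactly one flight per week, each leaving from the city reached the week before.
        for week, (src, dst) in enumerate(zip(route, route[1:]), start=1):
            price = cost.get((week, src, dst))
            if price is None:                 # no flight on that leg this week
                total = None
                break
            total += price
        if total is not None and total < best_total:
            best_total, best_route = total, route
    return best_total, best_route


toy_cost = {
    (1, "H", "X"): 10, (2, "X", "Y"): 5, (3, "Y", "H"): 1,
    (1, "H", "Y"): 1, (2, "Y", "X"): 50, (3, "X", "H"): 1,
}
print(cheapest_tour(["H", "X", "Y"], "H", toy_cost))  # (16, ('H', 'X', 'Y', 'H'))
```

The running time grows factorially with the number of cities, so this is only useful for checking a solver's answer on small instances.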

Determine parent, children from pairwise data

I need to determine parent/child relationships from some unusual data.
Flight numbers are marketing creations and they are odd. Flight # 22 by Airline X may refer to a singular trip between X and Y. Flight # 44 from the same airline may actually refer to multiple flights between city pairs. Example:
Flight 44: Dallas - Paris
Flight 44: Dallas - Chicago
Flight 44: Chicago - New York
Flight 44: New York - Paris
Flight 44: Chicago - Paris
Flight 44: Dallas - New York
Reality -- this is the way they work. When I pull the data from the "big list of flight numbers and city pairs" I get those 6 combinations for flight 44. I have passenger counts for each, so if there are 10 people flying Dallas - Paris, I need to take those 10 passengers and add them to the DAL - CHI, CHI - NY, and NY - PAR segments.
From a list of all the segments, I need to figure out "ahhh, this is a flight that goes from Dallas to Paris" --then when I see passenger loads I can increment the city-to-city actual loads accordingly like so:
- Value associated with AD -- > increment segments AB, BC, CD
- value associated with AC --> increment only segments AB, BC
- value associated with AB --> increment only segment AB
etc.
Assume I get a list of values in no order for flight 44 like this: (DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR). How do I figure out the parent child structure comparing these 4 values in these 6 combinations?
Formulation
Let a_i -> b_i be the ith entry in your list of pairs for flight 44, i = 1..M.
Let V be the set of all unique a_i and b_i values:
V = {a_i | i = 1..M} U {b_i | i = 1..M}
Let E be the set of all pairs (a_i, b_i):
E = {(a_i, b_i) | i = 1..M}
Then G = (V, E) is a directed acyclic graph where vertices V are cities and directed edges E correspond to entries a_i -> b_i in your list.
Algorithm
What you are looking for is a topological sort of the graph G. The linked Wikipedia page has pseudocode for this algorithm.
This will give you a linear ordering of the cities (in your example: [Dallas, Chicago, New York, Paris]) that is consistent with all of the ordering constraints present in your initial list. If your initial list contains fewer than |V| choose 2 pairs (meaning there is not a full set of constraints) then there will potentially be multiple consistent topological orderings of the cities in your set V.
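As a concrete illustration, here is a minimal sketch of Kahn's algorithm, one standard way to compute a topological sort. The function name topological_order and the extra uniqueness flag are additions for this example: the ordering is unique exactly when there is never more than one candidate city to place next.

```python
from collections import defaultdict, deque


def topological_order(pairs):
    """pairs: iterable of (origin, destination). Returns (order, is_unique)."""
    successors = defaultdict(set)
    indegree = {}
    for a, b in pairs:
        indegree.setdefault(a, 0)
        indegree.setdefault(b, 0)
        if b not in successors[a]:            # ignore duplicate pairs
            successors[a].add(b)
            indegree[b] += 1
    ready = deque(sorted(city for city, d in indegree.items() if d == 0))
    order, is_unique = [], True
    while ready:
        if len(ready) > 1:                    # more than one valid next city
            is_unique = False
        city = ready.popleft()
        order.append(city)
        for nxt in sorted(successors[city]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("the pairs contain a cycle")
    return order, is_unique


flight_44 = [("DAL", "PAR"), ("DAL", "CHI"), ("CHI", "NYC"),
             ("NYC", "PAR"), ("CHI", "PAR"), ("DAL", "NYC")]
print(topological_order(flight_44))  # (['DAL', 'CHI', 'NYC', 'PAR'], True)
```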
Note: This is a common-sense analysis, but see Timothy Shields's solution, which identifies the problem as a topological sorting problem, with known computational complexity and known conditions for uniqueness.
I will try to extract the core of the problem from your question in order to describe it formally.
In the above example, you actually have four nodes (cities), for brevity denoted as D, P, C, and NY. You have a set of ordered pairs (x, y), which are interpreted as "on that flight, node x precedes node y". Writing this as x<y, we actually have the following:
(for flight 044):
D < P
D < C
C < NY
NY < P
C < P
D < NY
From these constraints, we want to find an ordered tuple (x, y, z, w) such that x < y < z < w and the above constraints hold. We know that the solution is (x=D, y=C, z=NY, w=P).
Note: It might be that in your database, the first element in your set is always the "origin-destination pair" (in our case, D<P). But it does not change much on the analysis which follows.
How do we find this ordered tuple programmatically? I have fairly good knowledge of algorithms, but am not aware of a standard method for solving this (other users may help here). I am concerned about the uniqueness of the result. Requiring that the solution for that ordered tuple be unique could be a good integrity test for your data; otherwise you might subsequently increment the wrong segments.
Since uniqueness is the issue, I would suggest generating all the permutations of the nodes and displaying all the solutions that are feasible with respect to the given constraints.
A naive implementation could look like this:
import itertools
nodes = ['D', 'P', 'NY', 'C']
result = [ot
          for ot in itertools.permutations(nodes) # ot = ordered tuple
          if ot.index('D') < ot.index('P')
          if ot.index('D') < ot.index('C')
          if ot.index('C') < ot.index('NY')
          if ot.index('NY') < ot.index('P')
          if ot.index('C') < ot.index('P')
          if ot.index('D') < ot.index('NY')
          ]
print(result)
# displays: [('D', 'C', 'NY', 'P')]
If the number of nodes is low, this type of "naive" implementation may be sufficient. If the number is higher, I would suggest implementing it in such a way that the constraints are used effectively to prune the solution space (ask me if you need hints for this).
Construct a list of all cities that are either departures or destinations in your flight list. This gives four cities:
Dallas
Paris
Chicago
New York
Iterate over the flight list again and count the number of occurrences of each destination city:
0 Dallas
3 Paris
1 Chicago
2 New York
Sort the list by the destination count and you have the route:
Dallas -> Chicago -> New York -> Paris
Note: If the destination counts are not contiguous starting with zero (eg. 0, 1, 2, 3...) it points to either an inconsistent or incomplete departure/destination list for that flight.
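A short sketch of this counting approach, including the contiguity check from the note, could look like the following. The function name route_by_destination_count and the choice to raise ValueError on inconsistent data are assumptions for this example.

```python
from collections import Counter


def route_by_destination_count(pairs):
    """pairs: list of (departure, destination) for one flight number."""
    cities = {city for pair in pairs for city in pair}
    arrivals = Counter(dst for _, dst in pairs)      # a missing city counts as 0
    route = sorted(cities, key=lambda city: arrivals[city])
    # A complete, consistent list yields the counts 0, 1, 2, ... with no gaps or ties.
    if [arrivals[city] for city in route] != list(range(len(route))):
        raise ValueError("inconsistent or incomplete departure/destination list")
    return route


flight_44 = [("Dallas", "Paris"), ("Dallas", "Chicago"), ("Chicago", "New York"),
             ("New York", "Paris"), ("Chicago", "Paris"), ("Dallas", "New York")]
print(route_by_destination_count(flight_44))
# ['Dallas', 'Chicago', 'New York', 'Paris']
```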
Okay, take two: here's a function that will take a string such as the one you provided and topologically sort it as per that Wikipedia article.
import re
def create_nodes(segments):
    remaining_segments = re.findall(r'(\w*?)-(\w*?)[,)]', segments)
    nodes = []
    while len(remaining_segments) > 1:
        outgoing, incoming = zip(*remaining_segments)
        # the next stop is the city that only appears as a departure
        point = next(node for node in outgoing if node not in incoming)
        nodes.append(point)
        remaining_segments = [segment for segment in remaining_segments if segment[0] != point]
    last_segment = remaining_segments.pop()
    nodes.extend(last_segment)
    return nodes
Test:
>>> foo = '(DAL-CHI, CHI-NYC, NYC-PAR, DAL-NYC, DAL-PAR, CHI-PAR)'
>>> scratch.create_nodes(foo)
['DAL', 'CHI', 'NYC', 'PAR']
Note that this is not a perfect topological sort function for every use; however, for your specific use case of multiple-stop one-way journeys it should be effective.
Have you looked into using a dictionary with a list stored in it?
A dictionary is basically a hash table, and you can store a key (the beginning and end point, e.g. AD) and a value (the segments it needs to go through: [AB, BC, CD]).
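Once the order of the stops is known, that dictionary idea can be sketched as follows. The function name leg_loads and the passenger numbers are made up for this example; only the 10 passengers on Dallas - Paris come from the question.

```python
from collections import defaultdict


def leg_loads(route, passengers):
    """route: ordered list of stops; passengers: {(origin, destination): count}."""
    index = {city: i for i, city in enumerate(route)}
    loads = defaultdict(int)
    for (origin, destination), count in passengers.items():
        # Add the passengers to every consecutive leg between origin and destination.
        for k in range(index[origin], index[destination]):
            loads[(route[k], route[k + 1])] += count
    return dict(loads)


route = ["DAL", "CHI", "NYC", "PAR"]
counts = {("DAL", "PAR"): 10, ("DAL", "CHI"): 3, ("CHI", "PAR"): 2}
print(leg_loads(route, counts))
# {('DAL', 'CHI'): 13, ('CHI', 'NYC'): 12, ('NYC', 'PAR'): 12}
```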

Algorithm: Explanation of a graph based

In Futaba Kindergarten, where Shinchan studies, there are N students, s_0, s_1...s_(N-1), including Shinchan. Every student knows every other student directly or indirectly. Two students know each other directly if they are friends. Indirectly knowing each other means there is a third student who knows both of them. Knowing each other is a symmetric relation, i.e., if student s_a knows student s_b then student s_b also knows student s_a.
Ai-chan is a new admission in the class. She wants to be friends with all of them. But it would be very cumbersome to befriend each of the N students there. So she decided to befriend some of them such that every student in the class is either a friend of hers or a friend of a friend of hers.
Help her to select those students such that befriending them will complete her objective. The fewer students, the better.
Input
First line of input will contain two space separated integer, N M, number of students at Futaba Kindergarten, excluding Ai-chan, and number of pairs of students who are friend to each other, i.e. they knows each other directly. Then follows M lines. In each line there are two space separated integer, s_u s_v, such that student s_u and s_v are friend to each other.
Output
In first line print the total number, P, of such students. Then in next line print P space separated index of students, such that befriending them will help Ai-chan to achieve her objective.
Constraints:
1 <= N <= 10^5
1 <= M <= min(10^5, N*(N-1)/2)
0 <= s_u, s_v <= N-1
s_u != s_v
Each pair of students (s_u, s_v) knows each other, directly or indirectly.
Score: ((N-P)/N)*200
**Sample Input**
6 7
0 1
0 2
1 2
1 3
2 4
3 4
3 5
**Sample Output**
4
0 2 3 5
In my opinion, befriending only 1 and 3 will do the job. Am I missing something?
I am not looking for the solution, just an explanation of the sample input and output.
The solution is a simple greedy algorithm. Suppose that C is the set of students.
S = {}
R = {}
while (C != {}) {
    - sort the students in C by their number of friends
    - pick the student s with the highest number of friends
    - R = R + {s}
    - add s and the friends of s to the set S and remove them from C
}
print(R)
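A runnable sketch of this greedy loop is shown below. It assumes that "number of friends" means friends who are still in C, and it breaks ties by the lowest index; both choices are assumptions, as is the function name greedy_friend_selection. Being a heuristic, it always returns a valid selection but not necessarily the smallest one, and this simple version rescans C on every round, so it is only meant for small inputs.

```python
def greedy_friend_selection(n, friendships):
    """n students numbered 0..n-1; friendships is a list of (u, v) pairs."""
    friends = {s: set() for s in range(n)}
    for u, v in friendships:
        friends[u].add(v)
        friends[v].add(u)
    remaining = set(range(n))        # the set C: students not yet covered
    chosen = []                      # the set R: students Ai-chan befriends
    while remaining:
        # Pick the uncovered student with the most uncovered friends (lowest index on ties).
        s = max(remaining, key=lambda x: (len(friends[x] & remaining), -x))
        chosen.append(s)
        remaining -= friends[s] | {s}
    return chosen


edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4), (3, 5)]
print(greedy_friend_selection(6, edges))
```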
