This is my first question. I tried to find an answer for 2 days but I couldn't find what I was looking for.
Question: How can I minimize the number of matches between students from the same school?
I have a very practical case, I need to arrange a competition (tournament bracket)
but some of the participants might come from the same school.
Those from the same school should be put as far as possible from each other
for example: {A A A B B C} => {A B}, {A C}, {A B}
If more than half of the participants come from one school, there is no other way but to pair up two students from the same school.
for example: {A A A A B C} => {A B}, {A C}, {A A}
I don't expect code; some keywords or pseudocode describing how you would approach this would be of great help!
I tried digging into constraint-resolution algorithms and tournament bracket algorithms, but they don't consider minimising the number of matches between students from the same school.
Well, thank you so much in advance!
A simple algorithm (EDIT 2)
From the comments below: you have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
The idea
Sort the students by school, with the schools that have more students before the ones that have fewer, e.g. A B B B B C C -> B B B B C C A.
Distribute the students in two groups A and B as in a war card game: 1st student in A, 2nd student in B, 3rd student in A, 4th student in B, ...
Continue with groups A and B.
You have a recursion: the position of a player at level k-1 (for k = n-1 down to 0) is ((pos at level k) % 2) * 2^k + (pos at level k) // 2, so every even position goes to the left and every odd position goes to the right.
Python code
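Sort the players by the number of students in their school:
import collections
import math
n = int(math.log2(len(players)))  # n is the number of rounds
assert 2**n == len(players)  # the number of players must be a power of two
c = collections.Counter(p.school for p in players)
players_sorted_by_school_count = sorted(players, key=lambda p: (-c[p.school], p.school))  # ties broken by school so schoolmates stay adjacent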
Find the final position of every player:
players_sorted_for_tournament = [-1] * 2**n
for j, player in enumerate(players_sorted_by_school_count):
    pos = 0
    for e in range(n-1,-1,-1):
        if j % 2 == 1:
            pos += 2**e # to the right
        j = j // 2
    players_sorted_for_tournament[pos] = player
This should give groups that are diverse enough, but I'm not sure whether it's optimal or not. Waiting for comments.
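To make the idea easier to try out, here is a minimal self-contained sketch of the same seeding. It assumes players are represented only by their school label (a string) and that the number of players is a power of two; the function name `seed_bracket` is made up for this illustration.

```python
from collections import Counter

def seed_bracket(schools):
    """Place school labels in a bracket using the bit-reversal scheme described above."""
    n = len(schools).bit_length() - 1  # number of rounds
    if 2 ** n != len(schools):
        raise ValueError("the number of players must be a power of two")
    counts = Counter(schools)
    # Largest schools first; the label breaks ties so schoolmates stay adjacent.
    ordered = sorted(schools, key=lambda s: (-counts[s], s))
    bracket = [None] * len(ordered)
    for j, school in enumerate(ordered):
        pos = 0
        for e in range(n - 1, -1, -1):
            if j % 2 == 1:
                pos += 2 ** e  # odd index: go to the right half
            j //= 2
        bracket[pos] = school
    return bracket

bracket = seed_bracket(list("AAAABBCD"))
first_round = list(zip(bracket[0::2], bracket[1::2]))
print(bracket)
print(first_round)
```

In this small case no first-round match is between schoolmates, but that is one example and says nothing about optimality in general.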
First version: how to make pairs from students of different schools
Just put the students from the same school into a stack. You have as many stacks as schools. Now, sort your stacks by number of students. In your first example {A A A B B C}, you get:
A
A B
A B C
Now, take the two top elements from the two first stacks. The stack sizes have changed: if needed, reorder the stacks and continue. When you have only one stack, make pairs from this stack.
The idea is to keep as many "schools-stacks" as possible as long as possible: you spare the students of small stacks until you have no choice but to take them.
Steps with your second example, {A A A A B C}:
A
A
A
A B C => output A, B
A
A
A C => output A, C
A
A => output A A
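The stack procedure above can be sketched with a heap instead of re-sorting the stacks after every step. This is only an illustration: students are represented by their school label, and `pair_by_school` is a hypothetical name.

```python
import heapq
from collections import Counter

def pair_by_school(schools):
    """Repeatedly pair one student from each of the two largest stacks."""
    # heapq is a min-heap, so counts are stored negated to pop the largest stack first.
    heap = [(-count, school) for school, count in Counter(schools).items()]
    heapq.heapify(heap)
    pairs = []
    while len(heap) > 1:
        n1, s1 = heapq.heappop(heap)
        n2, s2 = heapq.heappop(heap)
        pairs.append((s1, s2))
        # Put the stacks back if they still contain students.
        if n1 + 1 < 0:
            heapq.heappush(heap, (n1 + 1, s1))
        if n2 + 1 < 0:
            heapq.heappush(heap, (n2 + 1, s2))
    if heap:
        # Only one school is left: its students have to face each other.
        n, s = heap[0]
        pairs.extend((s, s) for _ in range(-n // 2))
    return pairs

print(pair_by_school("AAABBC"))
print(pair_by_school("AAAABC"))
```

If the last stack holds an odd number of students, one student stays unpaired; the sketch simply ignores that student.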
It's a matching problem (EDIT 1)
I elaborate on the comments below. You have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
Your solution is to start with the set of all players and split it into two sets that are as diverse as possible. "Diverse" means here: the maximum number of different schools. To do so, you check all possible combinations of elements that split the set into two subsets of equal size. Then you recursively perform the same operation on those sets, until you arrive at the player level.
Another idea is to start with players and try to make pairs with other players from other school. Let's define a distance: 1 if two players are in the same school, 0 if they are in a different school. You want to make pairs with the minimum global distance.
This distance may be generalized for the pairs of players: take the number of common schools. That is: A B A B -> 2 (A & B), A B A C -> 1 (A), A B C D -> 0. You can imagine the distance between two sets (players, pairs, pairs of pairs, ...): the number of common schools. Now you can see this as a graph whose vertices are the sets (players, pairs, pairs of pairs, ...) and whose edges connect every pair of vertices with a weight that is the distance defined above. You are looking for a perfect matching (all vertices are matched) with a minimum weight.
The blossom algorithm or some of its variants seems to fit your needs, but it's probably overkill if the number of players is limited.
Create a two-dimensional array, where the first dimension is indexed by school and the second by the participants from that school.
Load the participants into it and you'll have everything you need in linear form.
For example:
School 1 ------- School 2 -------- School 3
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B
A ------------ B
A
A
In the example above, we will have 3 schools (first dimension), with school 1 having 7 participants (second dimension), school 2 having 5 participants and school 3 having 3 participants.
You can also create a second array containing the resulting combinations and, for each chosen pair, delete this pair from the initial array in a loop until it is completely empty and the result array is completely full.
I think the algorithm in this answer could help.
Basically: group the students by school, and use the error tracking idea behind Bresenham's Algorithm to distribute the schools as far apart as possible. Then you pull out pairs from the list.
There are N cars in a row numbered from 1 to N.
A person takes M photographs of cars. For each photograph, the cars appearing in it are given by tuple (i, j), which means all the cars from ith car to jth car will appear in that photograph.
Note that, all the photos need not cover every car. A car can appear in more than one photograph.
It is given that each photograph contains exactly 1 purple car. Find the maximum number of purple cars possible. If not possible print -1.
Input : First line contains N and M. Next line contains M pairs (x, y) which represent a photograph containing cars from xth car to yth car.
Output : Maximum number of purple cars possible.
Example :
Input:
5 1
(3 5)
Output : 3
Explanation : only one car from 3 to 5 can be purple. Car 1 and car 2 will be purple for maximizing the number of purple cars.
Input:
5 1
(4 4)
Output : 5
Input:
5 3
(1 4), (3 5), (3, 4)
Output : 1
Explanation : Either 3 or 4 can be a purple car.
Input:
5 2
(1, 4), (2, 5)
Output : 2
Explanation: Car 1 and Car 5 can be purple.
Input :
10 3
(1 5), (6, 10), (1, 10)
Output: -1
Explanation: It is impossible in this case for each interval to have exactly 1 purple car.
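For small inputs, the examples above can be checked with an exhaustive search. This is only a brute-force reference (exponential in N), not an intended solution; the function name is made up for illustration.

```python
from itertools import combinations

def max_purple_bruteforce(n, photos):
    """Try every set of purple cars, largest sets first; return -1 if none is valid."""
    for k in range(n, -1, -1):
        for purple in combinations(range(1, n + 1), k):
            chosen = set(purple)
            # Every photograph (x, y) must contain exactly one purple car.
            if all(sum(1 for car in range(x, y + 1) if car in chosen) == 1
                   for x, y in photos):
                return k
    return -1

print(max_purple_bruteforce(5, [(3, 5)]))
print(max_purple_bruteforce(5, [(1, 4), (3, 5), (3, 4)]))
print(max_purple_bruteforce(10, [(1, 5), (6, 10), (1, 10)]))
```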
If I'm not mistaken, the problem can be solved as a network problem as follows.
The node set of the network has a source node s and a terminal t; imagine the source at the leftmost position and the terminal at the rightmost position, with the flow going from left to right. Next to s put a node for each car and connect s to every car. Next to the car nodes create a node for each interval. Now start at the car nodes and create paths to t, traversing the interval node set; the path goes through every interval which contains the car.
Except for s and t, the flow in each node is restricted to be exactly 1 which models that exactly 1 car per interval must be colored purple; the arcs do not explicitly need to be constrained. Finally via some network flow algorithm, maximize the intensity of the flow from s to t. Color each car purple for which its node has a nonzero flow. If the instance does not permit a feasible flow, the initial problem instance is infeasible.
Notice that if a photograph is totally covered by another photograph, we only need to care about the outer photograph (trivial).
Second observation:
After removing all covered photographs (as stated above), we are left with cases like this:
We have n photographs, which can be divided into several clusters, each cluster being a group of m overlapping photographs (a1, b1), (a2, b2), ..., (am, bm) with a1 < a2 and b1 > a2, and in general ai < a(i + 1) and bi > a(i + 1).
It is always optimal to select the purple car for the first photograph in the range (a1, a2), and to continue selecting cars in this manner (in the first range which is not yet covered by any car). Proof:
If we select a car in the range (a1, a2), the next car can be selected in the range (b1, bm); if we instead choose a car in the range (a2, b1), the range available for the next car is smaller (trivial to see). So selecting a car in the range (a1, a2) gives the optimal result.
If we implement this as a sweep line algorithm, the time complexity will be O(m log m).
Note: this algorithm is only valid if the set of photographs is valid; otherwise, we need to check its validity first.
Update: As pointed out by PeterdeRivaz, we need to take care of one special case: when two or more photographs both cover one photograph, we need to join all of them.
Let's say we have a directed graph. We want to visit every node exactly once by traveling on the edges of this graph. Every node is annotated with one or more tags; some nodes may share tags, and even have the exact same set of tags. As we go along our walk, we are collecting a list of every distinct tag we have encountered - our objective is to find the walk which postpones acquisition of new tags as much as possible.
To restate this as a traveler analogy, let's say that a carpet salesman is trying to decide which supplier he should acquire his carpets from. He makes a list of all the carpet factories in the city. He makes an appointment with every factory, and collects samples of the kinds of carpet they make.
Let's say we have 3 factories, producing the following kinds of carpet:
F1: C1, C2, C3
F2: C1, C4
F3: C1, C4, C5
The salesman could take the following routes:
Start at F1, collect C1, C2, C3. Go to F2, collect C4 (since he already has C1). Go to F3, collect C5 (he already has C1 and C4).
Start at F1, collect C1, C2, C3. Go to F3, collect C4 and C5. Go to F2, collect nothing (since it turns out he already has all their carpets).
Start at F2, collect C1, C4. Go to F1, collect C2, C3. Go to F3 and collect C5.
Start at F2, collect C1, C4. Go to F3, collect C5. Go to F1 and collect C2, C3.
Start at F3, collect C1, C4, C5. Go to F1, collect C2, C3. Go to F2, collect nothing.
Start at F3, collect C1, C4, C5. Go to F2, collect nothing. Go to F1, collect C2, C3.
Note how sometimes, the salesman visits a factory even though he knows he has already collected a sample for every kind of carpet they produce. The analogy breaks down here a bit, but let's say he must visit them because it would be rude to not show up for his appointment.
Now, the carpet samples are heavy, and our salesman is traveling on foot. Distance by itself isn't hugely important (assume every edge has cost 1), but he doesn't want to carry around a whole bunch of samples any more than he needs to. So, he needs to plan his trip such that he visits the factories which have a lot of rare carpets (and where he will have to pick up a lot of new samples) last.
For the example paths above, here are the numbers of samples carried at each leg of the journey (columns 2-4), and the sum (column 5).
Route  Leg 1  Leg 2  Leg 3  Sum
1      0      3      4      7
2      0      3      5      8
3      0      2      4      6
4      0      2      3      5
5      0      3      5      8
6      0      3      3      6
We can see now that route 2 is very bad: first he had to carry 3 samples from F1 to F3, then he had to carry 5 samples from F3 to F2! Instead, he could have gone with route 4: he would first carry 2 samples from F2 to F3, and then 3 samples from F3 to F1.
Also, as shown in the last column, the sum of the samples carried through every edge is a good metric for how many samples he had to carry overall: The number of samples he is carrying cannot decrease, so visiting varied factories early on will necessarily inflate the sum, and a low sum is only possible by visiting similar factories with few carpets.
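The sums for the three-factory example can be reproduced with a few lines of Python. The sketch assumes every factory is reachable from every other one (the edges of the graph are ignored) and uses made-up names for the data and the helper.

```python
from itertools import permutations

# Carpets produced by each factory in the example.
factories = {
    "F1": {"C1", "C2", "C3"},
    "F2": {"C1", "C4"},
    "F3": {"C1", "C4", "C5"},
}

def carried_sum(route):
    """Sum of the number of samples carried on every leg of the route."""
    carried = set()
    total = 0
    # Nothing is carried to the first factory; samples from the last one are never carried.
    for name in route[:-1]:
        carried |= factories[name]
        total += len(carried)
    return total

for route in permutations(factories):
    print(route, carried_sum(route))

best_route = min(permutations(factories), key=carried_sum)
print("best:", best_route, carried_sum(best_route))
```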
Is this a known problem? Is there an algorithm to solve it?
Note: I would recommend being careful about making assumptions based on my example problem. I came up with it on the spot, and deliberately kept it small for brevity. It is certain there are many edge cases that it fails to catch.
As the size of the graph is small, we can consider using bit masks and dynamic programming to solve this problem (similar to how we solve the traveling salesman problem).
Assume that we have a total of 6 cities to visit. So the starting state is 0 and the ending state is 111111b, or 63 in decimal.
At each step, if the state is x, we can easily calculate the number of samples the salesman is carrying, and the cost of going from state x to state y will be the number of newly added samples from x to y times the number of unvisited cities.
public int cal(int mask) {
    if (mask == (1 << numberOfCity) - 1) { // all cities visited
        return 0;
    }
    Set<Integer> sampleSet = new HashSet<>(); // samples collected so far
    int left = 0; // number of unvisited cities
    for (int i = 0; i < numberOfCity; i++) {
        if (((1 << i) & mask) != 0) { // city i was visited
            sampleSet.addAll(citySample[i]);
        } else {
            left++;
        }
    }
    int cost = Integer.MAX_VALUE;
    for (int i = 0; i < numberOfCity; i++) {
        if (((1 << i) & mask) == 0) {
            int dif = 0; // number of new samples picked up at city i
            for (int sample : citySample[i]) {
                if (!sampleSet.contains(sample)) {
                    dif++;
                }
            }
            cost = Math.min(cost, dif * left + cal(mask | (1 << i)));
        }
    }
    return cost; // memoize on mask to turn this recursion into a real DP
}
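For comparison, here is a memoized Python sketch of the same bit-mask idea, run on the three-factory example from the question. It is an illustration under assumptions: all factories are mutually reachable, the function name is invented, and each new sample is weighted by `left - 1` (the number of legs still to walk) so that the result equals the sum from the question's table; weighting by `left` as in the code above shifts every route by the same constant and picks the same route.

```python
from functools import lru_cache

def min_total_carried(factories):
    """factories: list of sets of carpet ids; returns the minimal sum of carried samples."""
    n = len(factories)
    full = (1 << n) - 1

    @lru_cache(maxsize=None)
    def solve(mask):
        if mask == full:  # every factory visited
            return 0
        visited = [factories[i] for i in range(n) if mask >> i & 1]
        carried = set().union(*visited)  # samples collected so far
        left = n - len(visited)          # unvisited factories
        best = float("inf")
        for i in range(n):
            if not mask >> i & 1:
                new = len(factories[i] - carried)
                # New samples are carried on the remaining left - 1 legs.
                best = min(best, new * (left - 1) + solve(mask | 1 << i))
        return best

    return solve(0)

print(min_total_carried([{1, 2, 3}, {1, 4}, {1, 4, 5}]))
```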
In the case where there are edges between every pair of nodes, and each carpet is only available at one location, this looks tractable. If you pick up X carpets when there are Y steps to go, then the contribution from this to the final cost is XY. So you need to minimise SUM_i XiYi where Xi is the number of carpets picked up when you have Yi steps to go. You can do this by visiting the factories in increasing order of the number of carpets to be picked up at that factory. If you provide a schedule in which you pick up more carpets at A than B, and you visit A before B, I can improve it by swapping the times at which you visit A and B, so any schedule that does not follow this rule is not optimal.
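A quick way to sanity-check the exchange argument is to compare the sorted order against all permutations on a small instance. The sketch assumes disjoint carpet sets, so each factory is described only by how many carpets are picked up there; the numbers and function names are arbitrary illustrations.

```python
from itertools import permutations

def carrying_cost(order):
    """Sum of samples carried on each leg when visiting factories in this order."""
    carried = 0
    total = 0
    for picked_up in order[:-1]:  # samples from the last factory are never carried
        carried += picked_up
        total += carried
    return total

def best_order(counts):
    """Visit factories in increasing order of the number of carpets picked up."""
    return sorted(counts)

counts = [3, 1, 2, 5]
print(best_order(counts), carrying_cost(best_order(counts)))
print(min(carrying_cost(p) for p in permutations(counts)))
```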
This question was asked in TopCoder - SRM 577. Given 1 <= a < b <= 1000000, what is the minimum count of numbers to be inserted between a & b such that no two consecutive numbers will share a positive divisor greater than 1.
Example:
a = 2184; b = 2200. We need to insert 2 numbers 2195 & 2199 such that the condition holds true. (2184,2195,2199,2200)
a = 7; b= 42. One number is sufficient to insert between them. The number can be 11.
a = 17;b = 42. The GCD is already 1, so no need to insert any number.
Now, the interesting part is that for the given range [1,1000000] we never require more than 2 elements to be inserted between a and b. Even more, the 2 numbers are speculated to be a+1 and b-1, though this has yet to be proven.
Can anyone prove this?
Can it be extended to larger range of numbers also? Say, [1,10^18] etc
Doh, sorry. The counterexample I have is
a=3199611856032532876288673657174760
b=3199611856032532876288673657174860
(Would be nice if this stupid site allowed everyone to edit its posts)
Each number has some factorization. If a and b each have a small number of distinct prime factors (DPFs), and the distance between them is large, there will very likely be at least one number between them whose set of DPFs has no elements in common with either. This will be our one-number pick n, such that gcd(a,n) == 1 and gcd(n,b) == 1. The higher we go, the more prime factors there are, potentially, so the probability that gcd(a,b) == 1 already holds gets higher and higher, as does the probability of a one-number-in-between solution.
When will a one-number solution not be possible? When a and b are highly composite (each has a lot of DPFs) and are situated not too far from each other, so that each intermediate number has some prime factor in common with one or both of them. But gcd(n,n+1) == 1 for any n, always; so picking one of a+1 or b-1, specifically the one with the smallest number of DPFs, will decrease the size of the combined DPF set, and so picking one number between them will be possible. (... this is far from being rigorous though).
This is not a full answer, more like an illustration. Let's try this.
-- find a number between the two, that fulfills the condition
gg a b = let fs = union (fc a) (fc b)
         in filter (\n -> null $ intersect fs $ fc n) [a..b]
fc = factorize
Try it:
Main> gg 5 43
[6,7,8,9,11,12,13,14,16,17,18,19,21,22,23,24,26,27,28,29,31,32,33,34,36,37,38,39
,41,42]
Main> gg 2184 2300
[2189,2201,2203,2207,2209,2213,2221,2227,2237,2239,2243,2251,2257,2263,2267,2269
,2273,2279,2281,2287,2291,2293,2297,2299]
Plenty of possibilities for just one number to pick between 5 and 43, or between 2184 and 2300. But what about the given pair, 2184 and 2200?
Main> gg 2184 2200
[]
No single number exists to put in between them. But obviously, gcd(n, n+1) == 1:
Main> gg 2185 2200
[2187,2191,2193,2197,2199]
Main> gg 2184 2199
[2185,2189,2195]
So having picked one adjacent number, we indeed have plenty of possibilities for the 2nd number. Your question is, to prove that it is always the case.
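The same experiment can be repeated without a factorization routine, because a number shares no prime factor with a and b exactly when it is coprime to both. A small Python sketch of this (the function name is made up):

```python
from math import gcd

def coprime_to_both(a, b):
    """All n in [a, b] that share no prime factor with a or with b."""
    return [n for n in range(a, b + 1) if gcd(n, a) == 1 and gcd(n, b) == 1]

print(coprime_to_both(2184, 2200))
print(coprime_to_both(2185, 2200))
print(coprime_to_both(2184, 2199))
```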
Let's look at their factorizations:
Main> mapM_ (print.(id&&&factorize)) [2184..2200]
(2184,[2,2,2,3,7,13])
(2185,[5,19,23])
(2186,[2,1093])
(2187,[3,3,3,3,3,3,3])
(2188,[2,2,547])
(2189,[11,199])
(2190,[2,3,5,73])
(2191,[7,313])
(2192,[2,2,2,2,137])
(2193,[3,17,43])
(2194,[2,1097])
(2195,[5,439])
(2196,[2,2,3,3,61])
(2197,[13,13,13])
(2198,[2,7,157])
(2199,[3,733])
(2200,[2,2,2,5,5,11])
It is obvious that the higher the range, the easier it is to satisfy the condition, because the variety of contributing prime factors is greater.
(a+1) won't always work by itself - consider 2185, 2200 case (similarly, for 2184,2199 the (b-1) won't work).
So if we happen to get two highly composite numbers as our a and b, picking an adjacent number to either one will help, because usually it will have only few factors.
This answer addresses that part of the question which asks for a proof that a subset of {a,a+1,b-1,b} will always work. The question says: “Even more, the 2 numbers are speculated to be a+1 and b-1 though it yet to be proven. Can anyone prove this?”. This answer shows that no such proof can exist.
An example that disproves that a subset of {a,a+1,b-1,b} always works is {105, 106, 370, 371} = {3·5·7, 2·53, 2·5·37, 7·53}. Let (x,y) denote gcd(x,y). For this example, (a,b)=7, (a,b-1)=5, (a+1,b-1)=2, (a+1,b)=53, so all of the sets {a,b}; {a, a+1, b}; {a,b-1,b}; and {a, a+1, b-1,b} fail.
This example is a result of the following reasoning: We want to find a,b such that every subset of {a,a+1,b-1,b} fails. Specifically, we need the following four gcd's to be greater than 1: (a,b), (a,b-1), (a+1,b-1), (a+1,b). We can do so by finding some e,f that divide even number a+1 and then construct b such that odd b is divisible by f and by some factor of a, while even b-1 is divisible by e. In this case, e=2 and f=53 (as a consequence of arbitrarily taking a=3·5·7 so that a has several small odd-prime factors).
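The four gcd values for a = 105, b = 371 are easy to confirm mechanically. The snippet below checks every chain built from {a, a+1, b-1, b} that starts at a and ends at b; the helper name is an invention for this illustration.

```python
from math import gcd

a, b = 105, 371

def chain_ok(chain):
    """True if every pair of consecutive numbers in the chain is coprime."""
    return all(gcd(x, y) == 1 for x, y in zip(chain, chain[1:]))

candidates = [
    [a, b],
    [a, a + 1, b],
    [a, b - 1, b],
    [a, a + 1, b - 1, b],
]
results = [chain_ok(chain) for chain in candidates]
print(gcd(a, b), gcd(a, b - 1), gcd(a + 1, b - 1), gcd(a + 1, b))
print(results)
```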
a=3199611856032532876288673657174860
b=3199611856032532876288673657174960
appears to be a counterexample.
I have a list of elements, each one identified with a type, I need to reorder the list to maximize the minimum distance between elements of the same type.
The set is small (10 to 30 items), so performance is not really important.
There's no limit about the quantity of items per type or quantity of types, the data can be considered random.
For example, if I have a list of:
5 items of A
3 items of B
2 items of C
2 items of D
1 item of E
1 item of F
I would like to produce something like:
A, B, C, A, D, F, B, A, E, C, A, D, B, A
A has at least 2 items between occurrences
B has at least 4 items between occurrences
C has 6 items between occurrences
D has 6 items between occurrences
Is there an algorithm to achieve this?
-Update-
After exchanging some comments, I came to a definition of a secondary goal:
main goal: maximize the minimum distance between elements of the same type, considering only the type(s) with less distance.
secondary goal: maximize the minimum distance between elements on every type. IE: if a combination increases the minimum distance of a certain type without decreasing other, then choose it.
-Update 2-
About the answers.
There were a lot of useful answers, although none is a solution for both goals, especially the second one, which is tricky.
Some thoughts about the answers:
PengOne: Sounds good, although it doesn't provide a concrete implementation, and it doesn't always lead to the best result according to the second goal.
Evgeny Kluev: Provides a concrete implementation to the main goal, but it doesn't lead to the best result according to the secondary goal.
tobias_k: I liked the random approach, it doesn't always lead to the best result, but it's a good approximation and cost effective.
I tried a combination of Evgeny Kluev, backtracking, and tobias_k formula, but it needed too much time to get the result.
Finally, at least for my problem, I considered tobias_k to be the most adequate algorithm, for its simplicity and good results in a timely fashion. Probably, it could be improved using Simulated annealing.
First, you don't have a well-defined optimization problem yet. If you want to maximize the minimum distance between two items of the same type, that's well defined. If you want to maximize the minimum distance between two A's and between two B's and ... and between two Z's, then that's not well defined. How would you compare two solutions:
A's are at least 4 apart, B's at least 4 apart, and C's at least 2 apart
A's at least 3 apart, B's at least 3 apart, and C's at least 4 apart
You need a well-defined measure of "good" (or, more accurately, "better"). I'll assume for now that the measure is: maximize the minimum distance between any two of the same item.
Here's an algorithm that achieves a minimum distance of ceiling(N/n(A)) where N is the total number of items and n(A) is the number of items of instance A, assuming that A is the most numerous.
Order the item types A1, A2, ... , Ak where n(Ai) >= n(A{i+1}).
Initialize the list L to be empty.
For j from k to 1, distribute items of type Aj as uniformly as possible in L.
Example: Given the distribution in the question, the algorithm produces:
F
E, F
D, E, D, F
D, C, E, D, C, F
B, D, C, E, B, D, C, F, B
A, B, D, A, C, E, A, B, D, A, C, F, A, B
This sounded like an interesting problem, so I just gave it a try. Here's my super-simplistic randomized approach, done in Python:
import random

def optimize(items, quality_function, stop=1000):
    no_improvement = 0
    best = quality_function(items)  # quality of the starting arrangement
    while no_improvement < stop:
        # swap two random positions and keep the swap only if it improves the quality
        i = random.randint(0, len(items)-1)
        j = random.randint(0, len(items)-1)
        copy = items[:]
        copy[i], copy[j] = copy[j], copy[i]
        q = quality_function(copy)
        if q > best:
            items, best = copy, q
            no_improvement = 0
        else:
            no_improvement += 1
    return items
As already discussed in the comments, the really tricky part is the quality function, passed as a parameter to the optimizer. After some trying I came up with one that almost always yields optimal results. Thanks to pmoleri for pointing out how to make this a whole lot more efficient.
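def quality_maxmindist(items):
    s = 0
    for item in set(items):
        indcs = [i for i in range(len(items)) if items[i] == item]
        if len(indcs) > 1:
            # small gaps between equal items contribute large penalties
            s += sum(1./(indcs[i+1] - indcs[i]) for i in range(len(indcs)-1))
    return 1./s if s else float('inf')  # no repeated items: every order is equally good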
And here is a sample result:
>>> print optimize(items, quality_maxmindist)
['A', 'B', 'C', 'A', 'D', 'E', 'A', 'B', 'F', 'C', 'A', 'D', 'B', 'A']
Note that, passing another quality function, the same optimizer could be used for different list-rearrangement tasks, e.g. as a (rather silly) randomized sorter.
Here is an algorithm that only maximizes the minimum distance between elements of the same type and does nothing beyond that. The following list is used as an example:
AAAAA BBBBB CCCC DDDD EEEE FFF GG
Sort element sets by number of elements of each type in descending order. Actually only largest sets (A & B) should be placed to the head of the list as well as those element sets that have one element less (C & D & E). Other sets may be unsorted.
Reserve R last positions in the array for one element from each of the largest sets, divide the remaining array evenly between the S-1 remaining elements of the largest sets. This gives optimal distance: K = (N - R) / (S - 1). Represent target array as a 2D matrix with K columns and L = N / K full rows (and possibly one partial row with N % K elements). For example sets we have R = 2, S = 5, N = 27, K = 6, L = 4.
If matrix has S - 1 full rows, fill first R columns of this matrix with elements of the largest sets (A & B), otherwise sequentially fill all columns, starting from last one.
For our example this gives:
AB....
AB....
AB....
AB....
AB.
If we try to fill the remaining columns with other sets in the same order, there is a problem:
ABCDE.
ABCDE.
ABCDE.
ABCE..
ABD
The last 'E' is only 5 positions apart from the first 'E'.
Sequentially fill all columns, starting from last one.
For our example this gives:
ABFEDC
ABFEDC
ABFEDC
ABGEDC
ABG
Returning to linear array we have:
ABFEDCABFEDCABFEDCABGEDCABG
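The numbers in this example can be double-checked with a short script: one helper evaluates K = (N - R) / (S - 1) using integer division, the other measures the smallest distance between equal items in a finished arrangement. Both function names are made up for this illustration.

```python
from collections import Counter

def optimal_distance(items):
    """K = (N - R) // (S - 1) for the multiset of items, as defined above."""
    counts = Counter(items)
    n = len(items)
    s = max(counts.values())                        # size of the largest sets
    r = sum(1 for c in counts.values() if c == s)   # number of largest sets
    if s < 2:
        raise ValueError("no type occurs twice, so the distance is unbounded")
    return (n - r) // (s - 1)

def min_same_type_distance(sequence):
    """Smallest index difference between two equal items (None if no repeats)."""
    last_seen = {}
    best = None
    for pos, item in enumerate(sequence):
        if item in last_seen:
            d = pos - last_seen[item]
            best = d if best is None else min(best, d)
        last_seen[item] = pos
    return best

items = "AAAAABBBBBCCCCDDDDEEEEFFFGG"
result = "ABFEDCABFEDCABFEDCABGEDCABG"
print(optimal_distance(items), min_same_type_distance(result))
```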
Here is an attempt to use simulated annealing for this problem (C sources): http://ideone.com/OGkkc.
I believe you could see your problem like a bunch of particles that physically repel each other. You could iterate to a 'stable' situation.
Basic pseudo-code:
force( x, y ) = 1/distance(x,y)   if x.type==y.type   // only items of the same type repel
                0                 otherwise
nextposition( x, force ) = coined?(x) => same
                           else       => x + force
notconverged(row,newrow) = // simplistically
                           row!=newrow
row=[a,b,a,b,b,b,a,e];
newrow=nextposition(row);
while( notconverged(row,newrow) )
    row=newrow;
    newrow=nextposition(row);
I don't know if it converges, but it's an idea :)
I'm sure there may be a more efficient solution, but here is one possibility for you:
First, note that it is very easy to find an ordering which produces a minimum-distance-between-items-of-same-type of 1. Just use any random ordering, and the MDBIOST will be at least 1, if not more.
So, start off with the assumption that the MDBIOST will be 2. Do a recursive search of the space of possible orderings, based on the assumption that MDBIOST will be 2. There are a number of conditions you can use to prune branches from this search. Terminate the search if you find an ordering which works.
If you found one that works, try again, under the assumption that MDBIOST will be 3. Then 4... and so on, until the search fails.
UPDATE: It would actually be better to start with a high number, because that will constrain the possible choices more. Then gradually reduce the number, until you find an ordering which works.
Here's another approach.
If every item must be kept at least k places from every other item of the same type, then write down items from left to right, keeping track of the number of items left of each type. At each point put down an item with the largest number left that you can legally put down.
This will work for N items if there are no more than ceil(N / k) items of the same type, as it will preserve this property: after putting down k items we have k fewer items and we have put down at least one of each type that started with ceil(N / k) items of that type.
Given a clutch of mixed items you could work out the largest k you can support and then lay out the items to solve for this k.
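A minimal sketch of this greedy layout follows. It assumes that "at least k places apart" means an index difference of at least k, breaks ties between types alphabetically, and simply tries k from large to small instead of computing the largest supportable k directly; all names are illustrative.

```python
from collections import Counter

def arrange(items, k):
    """Greedy left-to-right layout; returns None if some slot cannot be filled."""
    remaining = Counter(items)
    last_pos = {}
    out = []
    for pos in range(len(items)):
        # Types that still have items and are far enough from their last use.
        legal = [t for t in sorted(remaining)
                 if remaining[t] > 0
                 and (t not in last_pos or pos - last_pos[t] >= k)]
        if not legal:
            return None
        choice = max(legal, key=lambda t: remaining[t])  # most items left
        out.append(choice)
        remaining[choice] -= 1
        last_pos[choice] = pos
    return out

def best_arrangement(items):
    """Largest k for which the greedy layout succeeds, with that layout."""
    for k in range(len(items), 0, -1):
        result = arrange(items, k)
        if result is not None:
            return k, result

items = list("AAAAABBBCCDDEF")
k, result = best_arrangement(items)
print(k, "".join(result))
```

The loop always terminates with a result because k = 1 places no restriction at all.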