How can I define a subset in CPLEX?

I have Set1, which ranges over 1..10, and Set2, which ranges over 0..11.
I want Set2 to be a subset of Set1 because I have a variable Y[Set1][Set1], and making Set1 the subset does not work well for my variables.

Related

Way to Calculate Distinct Partitions using Subsets of a Set containing only one Kind of element

We know three things:
n (the number of elements in the set)
k (the number of parts)
set S = {x, x, x, ..., x} (x repeated n times; x can be any integer value)
We have to find the number of distinct partitions of the set S into k parts.
Is there any formula or procedure to compute this result from the given values?
EXAMPLES:
Input: n = 3, k = 2
Output: 4
Explanation: Let the set be {0,0,0} (assuming x=0); we can partition
it into 2 subsets in the following ways:
{{0,0}, {0}}, {{0}, {0,0}}, {{0,0,0},{}},
{{},{0,0,0}}.
Note that {{0,0}, {0}} is made up of 2 subsets, namely {0,0}
and {0}, and it uses x (= 0) exactly n (= 3) times.
Input: n = 3, k = 1
Output: 1
Explanation: There is only one way {{1, 1, 1}} (assuming x=1)
Note:
I know I used the word "set" in the problem, but a set is defined as a collection of distinct elements. So you can consider it a multiset or an array, or assume that a set can hold repeated elements for this particular problem.
I am just trying to use the same terminology as the problem.
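Since every element is identical, a partition is determined only by how many copies land in each part. Judging by the n = 3, k = 2 example, the parts are ordered ({{0,0},{0}} and {{0},{0,0}} both count) and may be empty, which is the classic stars-and-bars count C(n+k-1, k-1). A minimal sketch under that reading, with a brute-force cross-check for small inputs; if parts were unordered or had to be non-empty, the formula would be different.

```python
import math
from itertools import product

def count_partitions(n, k):
    """Ordered ways to split n identical items into k (possibly empty) parts."""
    assert k >= 1
    # stars and bars: place k-1 separators among n+k-1 slots
    return math.comb(n + k - 1, k - 1)

def count_partitions_brute(n, k):
    """Brute-force check: count k-tuples of part sizes that sum to n."""
    return sum(1 for sizes in product(range(n + 1), repeat=k) if sum(sizes) == n)

print(count_partitions(3, 2), count_partitions(3, 1))  # 4 1
```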

Optimization - distribute participants far from each other

This is my first question. I tried to find an answer for 2 days but I couldn't find what I was looking for.
Question: How can I minimize the amount of matches between students from the same school
I have a very practical case: I need to arrange a competition (tournament bracket),
but some of the participants might come from the same school.
Those from the same school should be put as far as possible from each other
for example: {A A A B B C} => {A B}, {A C}, {A B}
If more than half of the participants come from one school, there is no other way but to pair up two students from the same school,
for example: {A A A A B C} => {A B}, {A C}, {A A}
I don't expect code; just some keywords or some pseudocode describing how you would approach this would be of great help!
I tried digging into constraint resolution algorithms and tournament bracket algorithms, but they don't consider minimising the amount of matches between students from same school.
Well, thank you so much in advance!
A simple algorithm (EDIT 2)
From the comments below: you have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
The idea
Sort the students by school, putting schools with more students before those with fewer students, e.g. A B B B B C C -> B B B B C C A.
Distribute the students into two groups A and B as when dealing cards: 1st student to A, 2nd student to B, 3rd student to A, 4th student to B, ...
Then repeat the same split recursively inside each of the two groups.
You have a recursion: the position of a player at level k-1 (k = n-1 down to 0) is ((pos at level k) % 2) * 2^k + (pos at level k) // 2 (every even position goes to the left, every odd one to the right).
Python code
Sort array by number of schools:
import collections
import math
n = int(math.log2(len(players)))  # n is the number of rounds
assert 2 ** n == len(players)  # the bracket needs a power-of-two number of players
c = collections.Counter(p.school for p in players)
# largest schools first; the school itself breaks ties so each school stays contiguous
players_sorted_by_school_count = sorted(players, key=lambda p: (-c[p.school], p.school))
Find the final position of every player:
players_sorted_for_tournament = [None] * 2 ** n
for j, player in enumerate(players_sorted_by_school_count):
    pos = 0
    for e in range(n - 1, -1, -1):
        if j % 2 == 1:
            pos += 2 ** e  # odd index at this level: goes to the right half
        j = j // 2
    players_sorted_for_tournament[pos] = player
This should give groups that are diverse enough, but I'm not sure whether it's optimal or not. Waiting for comments.
First version: how to make pairs from students of different schools
Just put the students from the same school into a stack. You have as many stacks as schools. Now, sort your stacks by number of students. In your first example {A A A B B C}, you get:
A
A B
A B C
Now, take the top element of each of the two largest stacks and pair them. The stack sizes have changed: if needed, reorder the stacks and continue. When you have only one stack left, make pairs from that stack.
The idea is to keep as many "schools-stacks" as possible as long as possible: you spare the students of small stacks until you have no choice but to take them.
Steps with your second example, {A A A A B C}:
A
A
A
A B C => output A, B
A
A
A C => output A, C
A
A => output A A
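The stack procedure can be sketched with a max-heap of school counts (Python's heapq is a min-heap, so counts are negated). This is a minimal sketch under my own choices: ties between equally large stacks are broken alphabetically, and an odd student left over in the last stack simply stays unpaired.

```python
import heapq
from collections import Counter

def pair_students(schools):
    """Greedily pair the top students of the two largest school stacks."""
    # heapq is a min-heap, so store negated counts to pop the largest stack first
    heap = [(-count, school) for school, count in Counter(schools).items()]
    heapq.heapify(heap)
    pairs = []
    while len(heap) >= 2:
        c1, s1 = heapq.heappop(heap)
        c2, s2 = heapq.heappop(heap)
        pairs.append((s1, s2))
        # put the shrunken stacks back if they still hold students
        for c, s in ((c1 + 1, s1), (c2 + 1, s2)):
            if c < 0:
                heapq.heappush(heap, (c, s))
    if heap:
        # only one school left: pair its students with each other
        c, s = heap[0]
        pairs.extend((s, s) for _ in range((-c) // 2))
    return pairs

print(pair_students("AAABBC"))  # [('A', 'B'), ('A', 'B'), ('A', 'C')]
print(pair_students("AAAABC"))  # [('A', 'B'), ('A', 'C'), ('A', 'A')]
```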
It's a matching problem (EDIT 1)
I elaborate on the comments below. You have a single elimination tournament. You must choose the places of the players in the tournament bracket. If you look at your bracket, you see: players, but also pairs of players (players that play the match 1 against each other), pairs of pairs of players (winner of pair 1 against winner of pair 2 for the match 2), and so on.
One solution is to start with the set of all players and split it into two sets that are as diverse as possible. "Diverse" here means: the maximum number of different schools. To do so, you check all possible ways to split the set into two subsets of equal size. Then you recursively perform the same operation on those sets, until you reach the player level.
Another idea is to start with players and try to pair them with players from other schools. Let's define a distance: 1 if two players are in the same school, 0 if they are in different schools. You want to make pairs with the minimum global distance.
This distance may be generalized for the pairs of players: take the number of common schools. That is: A B A B -> 2 (A & B), A B A C -> 1 (A), A B C D -> 0. You can imagine the distance between two sets (players, pairs, pairs of pairs, ...): the number of common schools. Now you can see this as a graph whose vertices are the sets (players, pairs, pairs of pairs, ...) and whose edges connect every pair of vertices with a weight that is the distance defined above. You are looking for a perfect matching (all vertices are matched) with a minimum weight.
The blossom algorithm or some of its variants seems to fit your needs, but it's probably overkill if the number of players is limited.
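To make the distance concrete, here is a small sketch: groups are tuples of school labels, the distance is the number of schools two groups share, and a minimum-weight perfect matching is found by exhaustive search with memoization. This is not the blossom algorithm; it is exponential and only meant for the limited player counts mentioned above. It returns (total distance, pairs of group indices).

```python
from functools import lru_cache

def common_schools(g1, g2):
    """Distance between two groups: how many schools appear in both."""
    return len(set(g1) & set(g2))

def min_weight_perfect_matching(groups):
    """Exhaustive minimum-weight perfect matching (exponential; small inputs only)."""
    if len(groups) % 2:
        raise ValueError("need an even number of groups")

    @lru_cache(maxsize=None)
    def best(remaining):
        # remaining: frozenset of indices that still need a partner
        if not remaining:
            return 0, ()
        i = min(remaining)  # always match the lowest remaining index first
        rest = remaining - {i}
        best_cost, best_pairs = None, ()
        for j in sorted(rest):
            cost, pairs = best(rest - {j})
            cost += common_schools(groups[i], groups[j])
            if best_cost is None or cost < best_cost:
                best_cost, best_pairs = cost, ((i, j),) + pairs
        return best_cost, best_pairs

    return best(frozenset(range(len(groups))))

players = [("A",), ("A",), ("A",), ("B",), ("B",), ("C",)]
print(min_weight_perfect_matching(players))
```

The same function can be reused one level up by passing pairs of players as groups, e.g. ("A", "B").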
Create a two-dimensional array, where the first dimension is the school and the second dimension is each participant from that school.
Fill it in and you'll have everything you need to process linearly.
For example:
School 1 ------- School 2 -------- School 3
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B ------------- C
A ------------ B
A ------------ B
A
A
In the example above, we will have 3 schools (first dimension), with school 1 having 7 participants (second dimension), school 2 having 5 participants and school 3 having 3 participants.
You can also create a second array containing the resulting combinations and, for each chosen pair, delete this pair from the initial array in a loop until it is completely empty and the result array is completely full.
I think the algorithm in this answer could help.
Basically: group the students by school, and use the error tracking idea behind Bresenham's Algorithm to distribute the schools as far apart as possible. Then you pull out pairs from the list.
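A loose adaptation of that error-tracking idea, under my own assumptions: at every slot each school's error grows by its student count; the school with the largest accumulated error (that still has students) takes the slot and pays back the total. Integer arithmetic keeps it exact, as in Bresenham. Consecutive slots are then paired. This spreads schools out, but it is not guaranteed to be optimal.

```python
from collections import Counter

def spread_schools(schools):
    """Order students so that each school is spaced out, Bresenham-style."""
    counts = Counter(schools)
    total = sum(counts.values())
    error = {school: 0 for school in counts}
    left = dict(counts)
    order = []
    for _ in range(total):
        for school in counts:
            error[school] += counts[school]  # each school is owed count/total of every slot
        pick = max((s for s in left if left[s] > 0), key=lambda s: error[s])
        error[pick] -= total  # it just used up one whole slot
        left[pick] -= 1
        order.append(pick)
    return order

def pairs_from(order):
    """Pull consecutive students out of the ordering as matches."""
    return list(zip(order[0::2], order[1::2]))

print(pairs_from(spread_schools("AAABBC")))  # [('A', 'B'), ('A', 'C'), ('B', 'A')]
```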

Given a set and a pairwise xor set, find the second set?

We take 2 sets of integers to generate a third set which contains the XOR of every element in the first set with every element in the second set.
Now as a problem we have been given the first set and the third set i.e. the set with xors, and we need to generate the second set.
It is guaranteed that there is only one possible answer for the inputs
For Example:
(using binary here for clarity)
Inputs:
Set1: {101, 111}
Set3: {001, 011}
then Set2, the solution set will be
Set2: {110, 100}
as, if we do Set1 ^ Set2
{011, 001, 001, 011}
Important Points:
the inputs are sets, not arrays, so no repetitions
that doesn't mean that there weren't repetitions when set3 was created, a^d may be equal to b^c
there is no size constraint that set1 and set2 have to be of same size.
Also, my test case isn't great: in it, it looks like we can simply compute set1 ^ set3 to get the answer, but that is clearly not the correct way.
If you have set1 = (a, b) and set2 = (c, d),
then you are given set3 = (a^c, a^d, b^c, b^d) and need to find set2.
So we know that the first element in set3 is a^c; to find c, we compute (a^c)^a => c.
Then, to find d, we pick b^d from set3 and compute (b^d)^b (with b from set1) to get d.
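Because sets are unordered, we can't actually know which element of set3 is a^c. A sketch of one way around that (my own reasoning, not from the question): fix any a in set1; every c in set2 must equal a ^ z for some z in set3, so those are the only candidates. Keep a candidate c only if x ^ c is in set3 for every x in set1. The kept candidates form a valid answer that contains set2, so when the answer is unique, as guaranteed, they are exactly set2 (otherwise this returns the largest valid set2). Cost: O(|set1| * |set3|) set lookups.

```python
def recover_second_set(set1, set3):
    """Find set2 such that {x ^ y for x in set1 for y in set2} == set3."""
    a = next(iter(set1))  # any fixed element of set1
    candidates = {a ^ z for z in set3}  # every member of set2 must be among these
    # keep c only if XOR-ing it with every element of set1 stays inside set3
    return {c for c in candidates if all((x ^ c) in set3 for x in set1)}

print(sorted(recover_second_set({0b101, 0b111}, {0b001, 0b011})))  # [4, 6], i.e. 100 and 110
```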

Determine conflict-free sets?

Suppose you have a bunch of sets, where each set contains a few subsets.
Set1 = { (banana, pineapple, orange), (apple, kale, cucumber), (onion, garlic) }
Set2 = { (banana, cucumber, garlic), (avocado, tomato) }
...
SetN = { ... }
The goal now is to select one subset from each set, such that each selected subset is conflict-free with every other selected subset. For this toy-size example, a possible solution would be to select (banana, pineapple, orange) (from Set1) and (avocado, tomato) (from Set2).
A conflict would occur if one selected the first subset of both Set1 and Set2, because banana would be contained in both subsets (which is not possible because it exists only once).
Even though there are many algorithms, I was unable to pick a suitable one. I'm somewhat stuck and would appreciate answers to the following questions:
1) How to find a suitable algorithm and represent this problem in such a way that it can be processed by the algorithm?
2) What might a possible solution for this toy-size example look like (any language is fine, I just want to get the idea)?
Edit1: I was also thinking about simulated annealing (returning one possible solution). This could be of interest to minimize, e.g., the overall cost of selecting the sets. However, I could not figure out how to write an appropriate problem description that takes the 'conflicts' into account.
This problem can be formulated as a generalized exact cover problem.
Create a new atom for each set of sets (Set1, Set2, etc.) and turn your input into an instance like so:
{Set1, banana, pineapple, orange}
{Set1, apple, kale, cucumber}
{Set1, onion, garlic}
{Set2, banana, cucumber, garlic}
{Set2, avocado, tomato}
...
making the Set* atoms primary (covered exactly once) and the other atoms secondary (covered at most once). Then you can solve it with a generalization of Knuth's Algorithm X.
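A minimal sketch of that formulation: plain recursive backtracking in the spirit of Algorithm X, without dancing links, so it is slower than Knuth's version. Rows are named subsets (the names "1a", "2b", etc. are hypothetical labels); the primary items are the Set* atoms, and every other item may be covered at most once.

```python
def solve_xcc(rows, primary):
    """Yield every choice of rows that covers each primary item exactly once
    and every other item at most once."""
    def candidates(item, used):
        # rows containing `item` that don't clash with anything already chosen
        return [name for name, items in rows.items()
                if item in items and not (items & used)]

    def search(uncovered, used, chosen):
        if not uncovered:
            yield list(chosen)
            return
        # branch on the primary item with the fewest options (Knuth's heuristic)
        item = min(uncovered, key=lambda it: len(candidates(it, used)))
        for name in candidates(item, used):
            chosen.append(name)
            yield from search(uncovered - rows[name], used | rows[name], chosen)
            chosen.pop()

    yield from search(frozenset(primary), frozenset(), [])

rows = {
    "1a": frozenset({"Set1", "banana", "pineapple", "orange"}),
    "1b": frozenset({"Set1", "apple", "kale", "cucumber"}),
    "1c": frozenset({"Set1", "onion", "garlic"}),
    "2a": frozenset({"Set2", "banana", "cucumber", "garlic"}),
    "2b": frozenset({"Set2", "avocado", "tomato"}),
}
for solution in solve_xcc(rows, {"Set1", "Set2"}):
    print(sorted(solution))
```

For this input it finds the same three selections as the Haskell answer below.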
Looking at the list of sets, I had the image of a maze with multiple entrances. The task is akin to tracing paths from top to bottom that are free of subset-intersections. The example in Haskell picks all entrances, and tries each path, returning those that succeed.
My understanding of how the code works (algorithm):
For each subset in the first set, pick each subset in the next set whose intersection with every subset in the accumulated result is empty. If no subset matches the criteria, abandon that branch. If there are no sets left to pick from, return that result. Call the function recursively for all chosen subsets (and the corresponding accumulated results).
import Data.List (intersect)
import Control.Monad (guard)
sets = [[["banana", "pineapple", "orange"], ["apple", "kale", "cucumber"], ["onion", "garlic"]]
       ,[["banana", "cucumber", "garlic"], ["avocado", "tomato"]]]
solve sets = solve' sets [] where
  solve' [] result = [result]
  solve' (set:rest) result = do
    subset <- set
    guard (all null (map (intersect subset) result))
    solve' rest (result ++ [subset])
OUTPUT:
*Main> solve sets
[[["banana","pineapple","orange"],["avocado","tomato"]]
,[["apple","kale","cucumber"],["avocado","tomato"]]
,[["onion","garlic"],["avocado","tomato"]]]

ideas for algorithm? sorting a list randomly with emphasis on variety

I have a table of items with [ID, ATTR1, ATTR2, ATTR3]. I'd like to select about half of the items, but try to get a random result set that is NOT clustered. In other words, there's a fairly even spread of ATTR1 values, ATTR2 values, and ATTR3 values. This does NOT necessarily represent the data as a whole, in other words, the total table may be generally concentrated on certain attribute values, but I'd like to select a subset with more variety. The attributes are not inter-related, so there's not really a correlation between ATTR1 and ATTR2.
As an example, imagine ATTR1 = "State". I'd like each line item in my subset to be from a different state, even if in the whole set, most of my data is concentrated on a few states. And for this to simultaneously be true of the other 2 attributes, too. (I realize that some tables might not make this possible, but there's enough data that it's unlikely to have no solution)
Any ideas for an efficient algorithm? Thanks! I don't really even know how to search for this :)
(by the way, it's OK if this requires pre-calculation or -indexing on the whole set, so long as I can draw out random varied subsets quickly)
Interesting problem. Since you want about half of the list, how about this:-
Create a list of half the values chosen entirely at random. Compute histograms for the value of ATTR1, ATTR2, ATTR3 for each of the chosen items.
:loop
Now randomly pick an item that's in the current list and an item that isn't.
If the new item increases the 'entropy' of the number of unique attributes in the histograms, keep it and update the histograms to reflect the change you just made.
Repeat N/2 times, or more depending on how much you want to push it towards covering every value rather than being random. You could also use 'simulated annealing' and gradually change the probability of accepting the swap, starting with 'sometimes allow a swap even if it makes things worse' and ending with 'only swap if it increases variety'.
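A minimal sketch of the swap loop, assuming each item is a tuple of attribute values and using "number of distinct values covered, summed over all attributes" as a simple stand-in for the entropy measure; the annealing variant is not shown. Zero-count histogram entries are deleted so that the length of each histogram is the number of distinct values present.

```python
import random
from collections import Counter

def varied_half(items, iterations=None, seed=None):
    """Start from a random half of `items`, then try random in/out swaps and
    keep a swap only if it increases attribute coverage."""
    rng = random.Random(seed)
    order = list(range(len(items)))
    rng.shuffle(order)
    half = len(items) // 2
    chosen, others = order[:half], order[half:]
    n_attr = len(items[0])
    hist = [Counter() for _ in range(n_attr)]  # one histogram per attribute

    def move(i, delta):
        # add (+1) or remove (-1) item i from the histograms
        for a in range(n_attr):
            value = items[i][a]
            hist[a][value] += delta
            if hist[a][value] == 0:
                del hist[a][value]

    def coverage():
        return sum(len(h) for h in hist)

    for i in chosen:
        move(i, +1)
    if iterations is None:
        iterations = len(items) // 2  # "repeat N/2 times"
    for _ in range(iterations):
        if not chosen or not others:
            break
        ci, oi = rng.randrange(len(chosen)), rng.randrange(len(others))
        before = coverage()
        move(chosen[ci], -1)
        move(others[oi], +1)
        if coverage() > before:
            chosen[ci], others[oi] = others[oi], chosen[ci]  # keep the swap
        else:
            move(others[oi], -1)  # undo it
            move(chosen[ci], +1)
    return [items[i] for i in chosen]
```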
I don't know (and I hope someone who does will answer). Here's what comes to mind: make up a distribution for MCMC putting the most weight on the subsets with 'variety'.
Assuming the items in your table are indexed by some form of ID, I would loop once for each of half the items in your table and use a random number generator to pick the ID each time.
IMHO
Finding variety is difficult but generating it is easy.
So we can generate a variety of combinations and
then search the table for records with those combinations.
If the table is sorted then searching also becomes easy.
Sample python code:
d = {}
d[('a',0,'A')]=0
d[('a',1,'A')]=1
d[('a',0,'A')]=2
d[('b',1,'B')]=3
d[('b',0,'C')]=4
d[('c',1,'C')]=5
d[('c',0,'D')]=6
d[('a',0,'A')]=7
print(d)
attr1 = ['a', 'b', 'c']
attr2 = [0, 1]
attr3 = ['A', 'B', 'C', 'D']
# no of items in
# attr2 < attr1 < attr3
# ;) reason for strange nesting of loops
for z in attr3:
    for x in attr1:
        for y in attr2:
            k = (x, y, z)
            if k in d:  # this combination exists in the table
                print('%s->%s' % (k, d[k]))
            else:  # no record with this combination
                print(k)
Output:
('a', 0, 'A')->7
('a', 1, 'A')->1
('b', 0, 'A')
('b', 1, 'A')
('c', 0, 'A')
('c', 1, 'A')
('a', 0, 'B')
('a', 1, 'B')
('b', 0, 'B')
('b', 1, 'B')->3
('c', 0, 'B')
('c', 1, 'B')
('a', 0, 'C')
('a', 1, 'C')
('b', 0, 'C')->4
('b', 1, 'C')
('c', 0, 'C')
('c', 1, 'C')->5
('a', 0, 'D')
('a', 1, 'D')
('b', 0, 'D')
('b', 1, 'D')
('c', 0, 'D')->6
('c', 1, 'D')
But assuming your table is very big (otherwise why would you need an algorithm? ;) and the data is fairly uniformly distributed, there will be more hits in a real scenario. In this dummy case there are too many misses, which makes the algorithm look inefficient.
Let's assume that ATTR1, ATTR2, and ATTR3 are independent random variables (over a uniform random item). (If ATTR1, ATTR2, and ATTR3 are only approximately independent, then this sample should be approximately uniform in each attribute.) To sample one item (VAL1, VAL2, VAL3) whose attributes are uniformly distributed, choose VAL1 uniformly at random from the set of values for ATTR1, choose VAL2 uniformly at random from the set of values for ATTR2 over items with ATTR1 = VAL1, choose VAL3 uniformly at random from the set of values for ATTR3 over items with ATTR1 = VAL1 and ATTR2 = VAL2.
To get a sample of distinct items, apply the above procedure repeatedly, deleting each item after it is chosen. Probably the best way to implement this would be a tree. For example, if we have
ID ATTR1 ATTR2 ATTR3
1 a c e
2 a c f
3 a d e
4 a d f
5 b c e
6 b c f
7 b d e
8 b d f
9 a c e
then the tree is, in JavaScript object notation,
{"a": {"c": {"e": [1, 9], "f": [2]},
"d": {"e": [3], "f": [4]}},
"b": {"c": {"e": [5], "f": [6]},
"d": {"e": [7], "f": [8]}}}
Deletion is accomplished recursively. If we sample id 4, then we delete it from its list at the leaf level. This list empties, so we delete the entry "f": [] from tree["a"]["d"]. If we now delete 3, then we delete 3 from its list, which empties, so we delete the entry "e": [] from tree["a"]["d"], which empties tree["a"]["d"], so we delete it in turn. In a good implementation, each item should take time O(# of attributes).
EDIT: For repeated use, reinsert the items into the tree after the whole sample is collected. This doesn't affect the asymptotic running time.
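A sketch of the tree and the sample-then-delete step, assuming rows are tuples (id, attr1, ..., attrK) like the table above. One caveat: `rng.choice(list(node))` copies the keys, so each level costs O(number of values) here rather than the O(1) a tuned implementation would aim for. Sampling from an empty tree raises an error.

```python
import random

def build_tree(rows):
    """Nest rows (id, attr1, ..., attrK) into dicts keyed by attribute value;
    the last level holds lists of ids."""
    tree = {}
    for row_id, *attrs in rows:
        node = tree
        for value in attrs[:-1]:
            node = node.setdefault(value, {})
        node.setdefault(attrs[-1], []).append(row_id)
    return tree

def sample_and_delete(tree, rng):
    """Walk down choosing a value uniformly at each level, pop a random id at
    the leaf, then prune every branch that became empty."""
    path, node = [], tree
    while isinstance(node, dict):
        key = rng.choice(list(node))
        path.append((node, key))
        node = node[key]
    row_id = node.pop(rng.randrange(len(node)))
    for parent, key in reversed(path):
        if parent[key]:  # still non-empty: stop pruning
            break
        del parent[key]
    return row_id

rows = [(1, "a", "c", "e"), (2, "a", "c", "f"), (3, "a", "d", "e"),
        (4, "a", "d", "f"), (5, "b", "c", "e"), (6, "b", "c", "f"),
        (7, "b", "d", "e"), (8, "b", "d", "f"), (9, "a", "c", "e")]
tree = build_tree(rows)
rng = random.Random(0)
print([sample_and_delete(tree, rng) for _ in range(4)])
```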
Idea #2.
Compute histograms for each attribute on the original table.
For each item, compute its uniqueness score = p(ATTR1) x p(ATTR2) x p(ATTR3) (multiply the probabilities of each attribute value it has).
Sort by uniqueness (the smallest product, i.e. the rarest combination, first).
Choose a probability distribution curve for your random numbers, ranging from picking only values in the first half of the set (a step function) to picking values evenly over the entire set (a flat line). Maybe a 1/x curve would work well for you in this case.
Pick values from the sorted list using your chosen probability curve.
This allows you to bias it towards more unique values or towards more evenness just by adjusting the probability curve you use to generate the random numbers.
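The steps above can be sketched as follows, assuming items are tuples of attribute values. The "1/x curve" is taken here as weights 1/(rank+1) over the sorted list, and picks are made without replacement; both are my own choices, and swapping in another weight function shifts the bias toward rarity or evenness.

```python
import random
from collections import Counter

def uniqueness_scores(items):
    """p(ATTR1) * p(ATTR2) * ... for each item; smaller means rarer."""
    n = len(items)
    hists = [Counter(column) for column in zip(*items)]  # one histogram per attribute
    scores = []
    for item in items:
        p = 1.0
        for hist, value in zip(hists, item):
            p *= hist[value] / n
        scores.append(p)
    return scores

def biased_pick(items, k, seed=None):
    """Pick k distinct items, favouring the rarest via 1/x weights over rank."""
    rng = random.Random(seed)
    scores = uniqueness_scores(items)
    ranked = sorted(range(len(items)), key=lambda i: scores[i])  # rarest first
    pool = list(range(len(ranked)))  # rank positions still available
    picked = []
    for _ in range(min(k, len(pool))):
        weights = [1.0 / (rank + 1) for rank in pool]  # the "1/x curve"
        j = rng.choices(range(len(pool)), weights=weights)[0]
        picked.append(items[ranked[pool.pop(j)]])
    return picked
```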
Taking your example, assign every possible 'State' a numeric value (say, between 1 and 9). Do the same for the other attributes.
Now, assuming you don't have more than 10 possible values for each attribute, multiply the value of ATTR3 by 100, ATTR2 by 1,000 and ATTR1 by 10,000. Add up the results, and you end up with something that resembles a crude hash of the item. Something like
10,000 * |ATTR1| + 1,000 * |ATTR2| + 100 * |ATTR3|
The advantage here is that you know that values from 10,000 to 19,999 have the same 'State' value; in other words, the first digit represents ATTR1. The same goes for ATTR2 and the other attributes.
You can sort all values and, using something like bucket sort, pick one of each type, checking that the digit you're considering hasn't been picked already.
An example: if your end values are
A: 15,700 = 10,000 * 1 + 1,000 * 5 + 100 * 7
B: 13,400 = 10,000 * 1 + 1,000 * 3 + 100 * 4
C: 13,200 = ...
D: 12,300
E: 11,400
F: 10,900
you know that all your values have the same ATTR1; 2 have the same ATTR2 (that being B and C); and 2 have the same ATTR3 (B, E).
This, of course, assumes I understood correctly what you want to do. It's Saturday night, after all.
ps: yes, I could have used 10 as the first multiplier, but the example would have been messier; and yes, it's clearly a naive example with lots of possible optimizations, which are left as an exercise for the reader.
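A small sketch of the encoding and the "pick one per digit" pass, assuming each attribute has already been mapped to a single digit (0-9). The greedy pass over the sorted codes is my simplification of the bucket-sort idea, not a tuned implementation.

```python
def encode(a1, a2, a3):
    """Pack three single-digit attribute codes (0-9) into one number."""
    return 10_000 * a1 + 1_000 * a2 + 100 * a3

def decode(code):
    """Recover the (ATTR1, ATTR2, ATTR3) digits from an encoded value."""
    return code // 10_000 % 10, code // 1_000 % 10, code // 100 % 10

def pick_distinct(codes):
    """Walk the sorted codes, keeping one only if none of its digits was used."""
    used = [set(), set(), set()]
    picked = []
    for code in sorted(codes):
        digits = decode(code)
        if all(d not in seen for d, seen in zip(digits, used)):
            picked.append(code)
            for d, seen in zip(digits, used):
                seen.add(d)
    return picked

codes = [15_700, 13_400, 13_200, 12_300, 11_400, 10_900]  # A..F from the example
print([decode(c) for c in codes])
print(pick_distinct(codes))  # [10900]: every value shares ATTR1 = 1
```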
It's a very interesting problem, for which I can see a number of applications. Notably for testing software: you get many 'main-flow' transactions, but only one is necessary to test that it works and you would prefer when selecting to get an extremely varied sample.
I don't think you really need a histogram structure, or at least only a binary one (absent/present).
{ ATTR1: [val1, val2], ATTR2: [i,j,k], ATTR3: [1,2,3] }
This is used in fact to generate a list of predicates:
Predicates = [ lambda x: x.attr1 == val1, lambda x: x.attr1 == val2,
lambda x: x.attr2 == i, ...]
This list will contain say N elements.
Now you wish to select K elements from this list. If K is less than N, that's fine; otherwise we duplicate the list i times, so that K <= N*i with i minimal, i.e. i = ceil(K/N) (note that this also works when K <= N, giving i == 1).
i = math.ceil(K / N)  # needs: import math
Predz = Predicates * i  # python's wonderful
And finally, pick up a predicate there, and look for an element that satisfies it... that's where randomness actually hits and I am less than adequate here.
Three remarks:
if K > N, you may want to select each predicate exactly i-1 times and then pick randomly from the list of predicates only to top off your selection, thus ensuring the over-representation of even the least common elements.
the attributes are completely uncorrelated this way; you may want to select patterns instead, since selecting on the third element being 3 will not give you the tuple (1,2,3) as such. A refinement would be to group some related attributes together, though it would probably increase the number of predicates generated.
for efficiency reasons, you should index the table by predicate category if you want an efficient select.
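A rough sketch of the predicate approach, assuming items are tuples indexed by attribute position and that `attr_values` lists the allowed values per attribute (both hypothetical input shapes). It shuffles the repeated predicate list and, for each predicate, picks a random unselected matching item. The "i-1 times then top off" variant from the first remark and the per-predicate index from the last remark are not implemented, so each predicate scan is linear.

```python
import math
import random

def predicate_sample(items, attr_values, k, seed=None):
    """Pick up to k items, each satisfying a different (attribute, value)
    predicate in turn, so every listed value gets a chance to appear."""
    rng = random.Random(seed)
    # one predicate per (attribute index, value); default args avoid the
    # late-binding pitfall of lambdas created in a loop
    predicates = [lambda x, a=a, v=v: x[a] == v
                  for a, values in enumerate(attr_values) for v in values]
    i = math.ceil(k / len(predicates))
    predz = predicates * i
    rng.shuffle(predz)
    remaining = list(items)
    picked = []
    for pred in predz:
        if len(picked) == k:
            break
        matches = [idx for idx, item in enumerate(remaining) if pred(item)]
        if matches:  # any matching item is equally likely
            picked.append(remaining.pop(rng.choice(matches)))
    return picked

items = [("a", 0), ("a", 1), ("b", 0), ("c", 1), ("b", 1), ("c", 0)]
print(predicate_sample(items, [["a", "b", "c"], [0, 1]], 3, seed=0))
```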
