Cowpatibility USACO - algorithm

I'm confused with the USACO Cowpatibility solution explanation and code.
The problem is defined here: http://usaco.org/index.php?page=viewproblem2&cpid=862.
Their solution is defined here: http://usaco.org/current/data/sol_cowpatibility_gold_dec18.html.
I know their linear solution requires the principle of inclusion and exclusion (PIE), and I understand the principle, but I am confused about how they implemented it. I'm looking for an explanation of these lines:
"This motivates the following inclusion-exclusion solution: for every subset of flavors, count how many pairs of cows that like all flavors within each subset. We add all the counts for subsets of size 1, then to avoid double-counting, we subtract all the counts for subsets of size 2. We then add all the counts of subsets of size 3, subtract all the counts of subsets of size 4, and add the counts of subsets of size 5."
How do they determine every possible subset, and what are these subsets? Why are there only 31N subsets? It would also be helpful if someone gives examples of what the subsets would be for their sample case.

They generated and stored subsets in order to keep track of how many cows like every flavor in each subset; from those counts you can tell how many pairs of cows have 1 flavor in common, 2 flavors in common, and so on up to all 5. To do this, they used a map.
Now, there are 31N subsets because each cow contributes the 31 nonempty subsets of its five favorite flavors. For example, Cow 1's favorite flavors of ice cream were 1, 2, 3, 4, 5, so the different subsets were:
{1, 0, 0, 0, 0} {2, 0, 0, 0, 0} {3, 0, 0, 0, 0} {4, 0, 0, 0, 0} {5, 0, 0, 0, 0}
{1, 2, 0, 0, 0} {1, 3, 0, 0, 0} {1, 4, 0, 0, 0} {1, 5, 0, 0, 0} {2, 3, 0, 0, 0}
{2, 4, 0, 0, 0} {2, 5, 0, 0, 0} {3, 4, 0, 0, 0} {3, 5, 0, 0, 0} {4, 5, 0, 0, 0}
{1, 2, 3, 0, 0} {1, 2, 4, 0, 0} {1, 2, 5, 0, 0} {1, 3, 4, 0, 0} {1, 3, 5, 0, 0}
{1, 4, 5, 0, 0} {2, 3, 4, 0, 0} {2, 3, 5, 0, 0} {2, 4, 5, 0, 0} {3, 4, 5, 0, 0}
{1, 2, 3, 4, 0} {1, 2, 3, 5, 0} {1, 2, 4, 5, 0} {1, 3, 4, 5, 0} {2, 3, 4, 5, 0}
{1, 2, 3, 4, 5}
As you can see, there are 31 subsets. (This is because there are 2^5 = 32 sets that can be made, including an empty set. 32 - 1 = 31.) Since N ≤ 50,000, you can generate 31N subsets. After scanning through the input, the code generated the subsets for each cow and added them to a map:
map<S5, int> subsets;
They mapped each combination to the number of times it was seen. Some examples of entries for the sample input would be:
{
[{1, 0, 0, 0, 0}, 2], # 2 cows, Cow 1 and Cow 2 both like flavor 1
[{8, 10, 0, 0, 0}, 2], # 2 cows, Cow 2 and Cow 3 both like flavors 8 and 10
[{50, 60, 80, 0, 0}, 1], # 1 cow, Cow 4 liked flavors 50, 60, 80
# and so on...
}
Finally, the algorithm applies the inclusion-exclusion principle based on the number of nonzero flavors in each subset. It iterates through the map entries; a subset seen c times contributes c * (c - 1) / 2 pairs of cows that all like that subset. If the subset has 1, 3, or 5 nonzero flavors, its pairs are added; if it has 2 or 4, they are subtracted. The result is the number of compatible pairs, which is then subtracted from N * (N - 1) / 2 to output the number of pairs of cows that aren't compatible.
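As a sketch of how that bookkeeping fits together (in Python rather than the C++ of the official solution, and written from the description above rather than from the official code):

```python
from collections import Counter
from itertools import combinations

def incompatible_pairs(cows):
    """Count pairs of cows that share no favorite flavor, via inclusion-exclusion."""
    subset_count = Counter()
    for flavors in cows:
        # Generate all 2^5 - 1 = 31 nonempty subsets of this cow's flavors.
        for k in range(1, len(flavors) + 1):
            for sub in combinations(sorted(flavors), k):
                subset_count[sub] += 1
    compatible = 0
    for sub, c in subset_count.items():
        pairs = c * (c - 1) // 2      # pairs of cows that all like this subset
        if len(sub) % 2 == 1:
            compatible += pairs       # subsets of size 1, 3, 5 are added
        else:
            compatible -= pairs       # subsets of size 2, 4 are subtracted
    n = len(cows)
    return n * (n - 1) // 2 - compatible

sample = [[1, 2, 3, 4, 5], [1, 2, 3, 10, 8], [10, 9, 8, 7, 6], [50, 60, 70, 80, 90]]
print(incompatible_pairs(sample))  # 4
```

On the sample input this returns 4: only the pairs (cow 0, cow 1) and (cow 1, cow 2) share a flavor, so 6 - 2 = 4 pairs are incompatible.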
I hope this explanation helps! Good luck for future contests!

There are 31N subsets in total because each cow likes exactly five flavors. Specifically, this line explains the subsets:
we can explicitly generate all the subsets of flavors where at least
one cow likes all the flavors in that subset
The way to do that is to iterate over all N cows and, for each one, construct the power set of the flavors it likes, excluding the empty set. The power set contains 2^5 = 32 sets, so removing the empty set leaves 31. Therefore, there are 31N sets in total.
An example is quite helpful here, taking the sample input:
4
1 2 3 4 5 # Cow 0
1 2 3 10 8 # Cow 1
10 9 8 7 6 # Cow 2
50 60 70 80 90 # Cow 3
The subsets will be:
{
{1}, {1, 2}, {1, 3}, ..., {2, 3, 4, 5}, {1, 2, 3, 4, 5}, # Cow 0
{1}, {1, 2}, {1, 3}, ..., {2, 3, 10, 8}, {1, 2, 3, 10, 8}, # Cow 1
...
}
Each cow generates 31 subsets. From there, the algorithm counts the number of cows that generate each specific subset (for example, {1} is generated by both cow 0 and cow 1, so we keep track of how many cows generate each subset) and applies inclusion-exclusion based on the subset size.
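To make the per-cow generation concrete, here is a small Python sketch (itertools.combinations stands in for whatever subset enumeration the actual solution uses):

```python
from itertools import combinations

def cow_subsets(flavors):
    """All 2^5 - 1 = 31 nonempty subsets of one cow's flavors, as sorted tuples."""
    f = sorted(flavors)
    return [sub for k in range(1, len(f) + 1) for sub in combinations(f, k)]

subs = cow_subsets([1, 2, 3, 10, 8])  # Cow 1 from the sample above
print(len(subs))     # 31
print((1,) in subs)  # True: the subset {1} is also generated by Cow 0
```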
Nice problem, I used to do USACO and they had really interesting problems which still stand out amongst the "clever" interview questions a lot of companies give. :)

Related

I have to generate 10000 numbers which are the result of 12 random numbers subtracted by 0.5 and then added together

I'm stuck here and can't write a Do loop that repeats this operation 10k times and collects the results in a list or array:
{((RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[]],
12])) - 0.5}
Do does things but it doesn't generate output. E.g.
Do[{1, 2, 3}, 2]
- no output -
Have it add to a list though...
alist = {};
Do[alist = Join[alist, {1, 2, 3}], 2]
alist
{1, 2, 3, 1, 2, 3}
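For comparison, the same "repeat and collect" pattern is natural in most languages; here is a Python sketch of the computation the question describes, with uniform draws substituted for the truncated normal purely for illustration:

```python
import random

# One sample: twelve random draws, each shifted by -0.5, then summed.
# (Uniform(0,1) draws are used here as a stand-in; summing twelve of
# them minus 0.5 is the classic quick approximation to a standard
# normal, which is likely the intent behind the construction.)
def one_sample():
    return sum(random.random() - 0.5 for _ in range(12))

samples = [one_sample() for _ in range(10000)]
```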

What is the fastest way to join multiple subsets that have similar elements?

I have a list with 500,000+ subsets, each having from 1 to 500 values (integers). So I have something like:
{1, 2, 3 }
{2, 3}
{4, 5}
{3, 6, 7}
{7, 9}
{8, 4}
{10, 11}
After running the code I would like to get:
{1, 2, 3, 6, 7, 9}
{4, 5, 8}
{10, 11}
I wrote simple code [here] that compares each subset to each subset; if they intersect, they are joined together, otherwise not.
It is fine on a small scale, but with a big amount of data it takes forever.
Please, could you advise any improvements?
P.S. I am not strong in maths or logic; big-O notation would be Greek to me. I am sorry.
You're trying to find the connected components in a graph, with each of your input sets representing a set of nodes that's fully connected. Here's a simple implementation:
sets = [{1, 2, 3}, {2, 3}, {4, 5}, {3, 6, 7}, {7, 9}, {8, 4}, {10, 11}]
allelts = set.union(*sets)
components = {X: {X} for X in allelts}
component = {X: X for X in allelts}
for S in sets:
    comp = sorted({component[X] for X in S})
    mergeto = comp[0]
    for mergefrom in comp[1:]:
        components[mergeto] |= components[mergefrom]
        for X in components[mergefrom]:
            component[X] = mergeto
        del components[mergefrom]
That results in components holding each component keyed by its minimum element, and component mapping every element to the key of its component:
>>> print(components)
{1: {1, 2, 3, 6, 7, 9}, 4: {8, 4, 5}, 10: {10, 11}}
>>> print(component)
{1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 1, 7: 1, 8: 4, 9: 1, 10: 10, 11: 10}
>>>
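The merging above can also be phrased as a disjoint-set (union-find) structure with path compression, which scales well to the 500k-set input; a sketch, assuming the same input format:

```python
def connected_components(sets):
    """Merge the given sets into connected components via union-find."""
    parent = {}

    def find(x):
        # Follow parent pointers to the root, compressing the path.
        root = x
        while parent[root] != root:
            root = parent[root]
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    for s in sets:
        it = iter(s)
        anchor = next(it)
        parent.setdefault(anchor, anchor)
        for x in it:
            parent.setdefault(x, x)
            # Union: attach x's root under anchor's root.
            ra, rx = find(anchor), find(x)
            if ra != rx:
                parent[rx] = ra

    comps = {}
    for x in parent:
        comps.setdefault(find(x), set()).add(x)
    return sorted(comps.values(), key=min)

print(connected_components([{1, 2, 3}, {2, 3}, {4, 5}, {3, 6, 7}, {7, 9}, {8, 4}, {10, 11}]))
```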

Definition lookup speed: a performance issue

I have the following problem.
I need to build a very large number of definitions(*) such as
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
f[{0,0,1,0}] = 3
f[{0,0,0,1}] = 2
...
f[{2,3,1,2}] = 4
...
f[{n1,n2,n3,n4}] = some integer
...
This is just an example. The length of the argument list does not need to be 4 but can be anything.
I realized that the lookup for each value slows down with exponential complexity when the length of the argument list increases. Perhaps this is not so strange, since it is clear that in principle there is a combinatorial explosion in how many definitions Mathematica needs to store.
Still, I expected Mathematica to be smart about this and extract values in constant time. Apparently it is not so.
Is there any way to speed up lookup time? This probably has to do with how Mathematica internally handles symbol definition lookups. Does it scan the definitions until it finds a match? It seems that it does.
All suggestions highly appreciated.
With best regards
Zoran
(*) I am working on a stochastic simulation software that generates all configurations of a system and needs to store how many times each configuration occurred. In that sense a list {n1, n2, ..., nT} describes a particular configuration of the system saying that there are n1 particles of type 1, n2 particles of type 2, ..., nT particles of type T. There can be exponentially many such configurations.
Could you give some detail on how you worked out that lookup time is exponential?
If it is indeed exponential, perhaps you could speed things up by using Hash on your keys (configurations), then storing key-value pairs in a list like {{key1,value1},{key2,value2}}, kept sorted by key and then using binary search (which should be log time). This should be very quick to code up but not optimum in terms of speed.
If that's not fast enough, one could think about setting up a proper hashtable implementation (which I thought was what the f[{0,1,0,1}]=3 approach did, without having checked).
But some toy example of the slowdown would be useful to proceed further...
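The sorted-list-plus-binary-search suggestion looks like this in Python (the key/value names are made up for illustration; Python's bisect plays the role of the binary search):

```python
import bisect

# Keep (key, value) pairs in two parallel lists sorted by key;
# lookups are then O(log n). Keys are configuration tuples,
# mirroring the f[{n1, n2, ...}] idea from the question.
table_keys = []
table_vals = []

def store(key, value):
    i = bisect.bisect_left(table_keys, key)
    if i < len(table_keys) and table_keys[i] == key:
        table_vals[i] = value          # overwrite an existing key
    else:
        table_keys.insert(i, key)      # insert keeps the list sorted
        table_vals.insert(i, value)

def lookup(key):
    i = bisect.bisect_left(table_keys, key)
    if i < len(table_keys) and table_keys[i] == key:
        return table_vals[i]
    return None

store((0, 1, 0, 1), 3)
store((2, 3, 1, 2), 4)
print(lookup((0, 1, 0, 1)))  # 3
```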
EDIT: I just tried
test[length_] := Block[{f},
Do[
f[RandomInteger[{0, 10}, 100]] = RandomInteger[{0, 10}];,
{i, 1, length}
];
f[{0, 0, 0, 0, 1, 7, 0, 3, 7, 8, 0, 4, 5, 8, 0, 8, 6, 7, 7, 0, 1, 6,
3, 9, 6, 9, 2, 7, 2, 8, 1, 1, 8, 4, 0, 5, 2, 9, 9, 10, 6, 3, 6,
8, 10, 0, 7, 1, 2, 8, 4, 4, 9, 5, 1, 10, 4, 1, 1, 3, 0, 3, 6, 5,
4, 0, 9, 5, 4, 6, 9, 6, 10, 6, 2, 4, 9, 2, 9, 8, 10, 0, 8, 4, 9,
5, 5, 9, 7, 2, 7, 4, 0, 2, 0, 10, 2, 4, 10, 1}] // timeIt
]
with timeIt defined to accurately time even short runs as follows:
timeIt::usage = "timeIt[expr] gives the time taken to execute expr,
repeating as many times as necessary to achieve a total time of \
1s";
SetAttributes[timeIt, HoldAll]
timeIt[expr_] := Module[{t = Timing[expr;][[1]], tries = 1},
While[t < 1.,
tries *= 2;
t = Timing[Do[expr, {tries}];][[1]];
];
Return[t/tries]]
and then
out = {#, test[#]} & /@ {10, 100, 1000, 10000, 100000, 100000};
ListLogLogPlot@out
(also for larger runs). So it seems constant time here.
Suppose you enter your information not like
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
but into an n1 x n2 x n3 x n4 array m like
m[[2,1,1,1]] = 1
m[[1,2,1,1]] = 2
etc.
(you could even enter values not as f[{1,0,0,0}]=1, but as f[{1,0,0,0},1] with
f[li_List, i_Integer] := Part[m, Apply[Sequence, li + 1]] = i;
f[li_List] := Part[m, Apply[Sequence, li + 1]];
where you have to initialize m e.g. by m = ConstantArray[0, {4, 4, 4, 4}];)
Let's compare timings:
testf[z_] :=
(
Do[ f[{n1, n2, n3, n4}] = RandomInteger[{1,100}], {n1,z}, {n2,z}, {n3,z},{n4,z}];
First[ Timing[ Do[ f[{n2, n4, n1, n3}], {n1, z}, {n2, z}, {n3, z}, {n4, z} ] ] ]
);
Framed[
ListLinePlot[
Table[{z, testf[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"DownValue approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"},ImageSize -> 500
]
]
Clear[f];
testf2[z_] :=
(
m = RandomInteger[{1, 100}, {z, z, z, z}];
f2[ni__Integer] := m[[Sequence @@ ({ni} + 1)]];
First[ Timing[ Do[ f2[n2, n4, n1, n3], {n1, z}, {n2, z}, {n3, z}, {n4, z}] ] ]
)
Framed[
ListLinePlot[
Table[{z, testf2[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"Matrix approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"}, ImageSize -> 500
]
]
gives the two timing plots. So for larger sets of information, a matrix approach seems clearly preferable.
Of course, if you have truly large data, say more GB than you have RAM, then you just
have to use a database and DatabaseLink.

Maximize function in Mathematica which counts elements

I reduced a debugging problem in Mathematica 8 to something similar to the following code:
f = Function[x,
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Count[list, x]
];
f[4]
Maximize[f[x], x, Integers]
Output:
4
{0, {x->0}}
So, while the maximum of function f is obtained when x equals 4 (as confirmed in the first output line), why does Maximize return x -> 0 (second output line)?
The reason for this behavior is easily found using Trace. Your function is evaluated inside Maximize with x still symbolic, and since your list does not contain the symbol x, Count returns zero. Effectively, you call Maximize[0, x, Integers], hence the result. One thing you can do is protect the function from immediate evaluation by using a pattern-defined function with a restrictive pattern, like this for example:
Clear[ff];
ff[x_?IntegerQ] :=
With[{list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5}}, Count[list, x]]
It appears that Maximize cannot easily deal with it; NMaximize, however, can:
In[73]:= NMaximize[{ff[x], Element[x, Integers]}, x]
Out[73]= {4., {x -> 4}}
But, generally, the Maximize family of functions seems not quite appropriate for this job. You may be better off explicitly computing the maximum, for example like this:
In[78]:= list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Extract[#, Position[#, Max[#], 1, 1] &[#[[All, 2]]]] &[Tally[list]]
Out[79]= {{4, 4}}
HTH
Try this:
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
First@Sort[Tally[list], #1[[2]] > #2[[2]] &]
Output:
{4, 4}
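The tally-then-take-the-maximum idea is not Mathematica-specific; for comparison, the same computation in Python with collections.Counter:

```python
from collections import Counter

# Tally the list and take the element with the highest count,
# matching the {value, count} output of the Tally-based answers.
data = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5]
value, count = Counter(data).most_common(1)[0]
print(value, count)  # 4 4
```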

Permutations distinct under given symmetry (Mathematica 8 group theory)

Given a list of integers like {2,1,1,0}, I'd like to list all permutations of that list that are not equivalent under a given group. For instance, using the symmetry of the square, the result would be {{2, 1, 1, 0}, {2, 1, 0, 1}}.
The approach below (Mathematica 8) generates all permutations, then weeds out the equivalent ones. I can't use it because I can't afford to generate all permutations. Is there a more efficient way?
Update: actually, the bottleneck is in DeleteCases. The following list {2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 0, 0, 0} has about a million permutations and takes 0.1 seconds to compute. Apparently there are supposed to be 1292 orderings after removing symmetries, but my approach doesn't finish in 10 minutes
removeEquivalent[{}] := {};
removeEquivalent[list_] := (
Sow[First[list]];
equivalents = Permute[First[list], #] & /@ GroupElements[group];
DeleteCases[list, Alternatives @@ equivalents]
);
nonequivalentPermutations[list_] := (
reaped = Reap@FixedPoint[removeEquivalent, Permutations@list];
reaped[[2, 1]]
);
group = DihedralGroup[4];
nonequivalentPermutations[{2, 1, 1, 0}]
What's wrong with:
nonequivalentPermutations[list_, group_] := Union[Permute[list, #] & /@ GroupElements[group]];
nonequivalentPermutations[{2,1,1,0},DihedralGroup[4]]
I don't have Mathematica 8, so I can't test this. I just have Mathematica 7.
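The canonicalization idea behind this answer (keep one representative per orbit, e.g. the lexicographically smallest image under the group) can be sketched in Python; the eight index permutations below are hand-written for the square, standing in for DihedralGroup[4]:

```python
from itertools import permutations

# The 8 symmetries of the square on positions (0, 1, 2, 3), written as
# index permutations: 4 rotations and 4 reflections.
SQUARE_GROUP = [
    (0, 1, 2, 3), (1, 2, 3, 0), (2, 3, 0, 1), (3, 0, 1, 2),  # rotations
    (3, 2, 1, 0), (2, 1, 0, 3), (1, 0, 3, 2), (0, 3, 2, 1),  # reflections
]

def canonical(t):
    """Lexicographically smallest image of t under the group action."""
    return min(tuple(t[i] for i in g) for g in SQUARE_GROUP)

def nonequivalent_permutations(items):
    """One representative per equivalence class of permutations."""
    return sorted({canonical(p) for p in permutations(items)})

print(len(nonequivalent_permutations((2, 1, 1, 0))))  # 2
```

Like the original approach, this still enumerates all permutations first, so it only demonstrates the canonicalization idea on small inputs; the 16-element case from the update would need a multiset-aware enumeration instead.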
I got an elegant and fast solution from Maxim Rytin, relying on ConnectedComponents function
Module[{gens, verts, edges},
gens = PermutationList /@ GroupGenerators@DihedralGroup[16];
verts =
Permutations#{2, 2, 2, 2, 2, 2, 2, 1, 1, 0, 0, 0, 0, 0, 0, 0};
edges = Join @@ (Transpose@{verts, verts[[All, #]]} &) /@ gens;
Length@ConnectedComponents@Graph[Rule @@@ Union@edges]] // Timing
