I have to generate 10000 numbers which are the result of 12 random numbers subtracted by 0.5 and then added together - random

I'm stuck here, and can't generate a do loop that repeats this operation 10k times and result in a list or array
{((RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[]],
12])) - 0.5}

Do does things but it doesn't generate output. E.g.
Do[{1, 2, 3}, 2]
- no output -
Have it add to a list though...
alist = {};
Do[alist = Join[alist, {1, 2, 3}], 2]
alist
{1, 2, 3, 1, 2, 3}

Related

Cowpatibility USACO

I'm confused with the USACO Cowpatibility solution explanation and code.
The problem is defined here: http://usaco.org/index.php?page=viewproblem2&cpid=862.
Their solution is defined here: http://usaco.org/current/data/sol_cowpatibility_gold_dec18.html.
I know their linear solution requires the property of inclusion and exclusion (PIE), and I understand this property, but I am confused about how they implemented it. I'm looking for an explanation of these lines:
"This motivates the following inclusion-exclusion solution: for every subset of flavors, count how many pairs of cows that like all flavors within each subset. We add all the counts for subsets of size 1, then to avoid double-counting, we subtract all the counts for subsets of size 2. We then add all the counts of subsets of size 3, subtract all the counts of subsets of size 4, and add the counts of subsets of size 5."
How do they determine every possible subset, and what are these subsets? Why are there only 31N subsets? It would also be helpful if someone gives examples of what the subsets would be for their sample case.
They generated and stored subsets in order to keep track of the number of pairs of cows with 1 flavor in common, 2 flavors in common, 3 flavors in common, 4 flavors in common, and all 5 flavors in common. To do this, they used a map.
Now, there are 31N subsets because for each cow, you can create 31 combinations of favorite flavors. For example, Cow 1's favorite flavors of ice cream were 1, 2, 3, 4, 5. So the different subsets were:
{1, 0, 0, 0, 0} {1, 3, 0, 0, 0} {2, 5, 0, 0, 0} {1, 2, 5, 0, 0}
{1, 2, 0, 0, 0} {1, 4, 0, 0, 0} {1, 3, 4, 0, 0} {2, 3, 5, 0, 0}
{1, 2, 3, 0, 0} {1, 5, 0, 0, 0} {1, 3, 5, 0, 0} {3, 4, 5, 0, 0}
{1, 2, 3, 4, 0} {2, 3, 0, 0, 0} {2, 3, 4, 0, 0} {1, 2, 3, 5, 0}
{1, 2, 3, 4, 5} {2, 4, 0, 0, 0} {2, 4, 5, 0, 0} {1, 3, 4, 5, 0}
{2, 3, 4, 5, 0} {2, 0, 0, 0, 0} {3, 0, 0, 0, 0} {4, 0, 0, 0, 0}
{5, 0, 0, 0, 0} {3, 4, 0, 0, 0} {3, 5, 0, 0, 0} {4, 5, 0, 0, 0}
{1, 4, 5, 0, 0} {1, 2, 4, 0, 0} {1, 2, 4, 5, 0}
As you can see, there are 31 subsets. (This is because there are 2^5 = 32 sets that can be made, including an empty set. 32 - 1 = 31.) Since N ≤ 50,000, you can generate 31N subsets. After scanning through the input, the code generated the subsets for each cow and added them to a map:
map<S5, int> subsets;
They mapped each combination to the number of times it was seen. Some examples of entries for the sample input would be:
{
[{1, 0, 0, 0, 0}, 2], # 2 cows, Cow 1 and Cow 2 both like flavor 1
[{8, 10, 0, 0, 0}, 2], # 2 cows, Cow 2 and Cow 3 both like flavors 8 and 10
[{50, 60, 80, 0, 0}, 1], # 1 cow, Cow 4 liked flavors 50, 60, 80
# and so on...
}
Finally, based the number of nonzero numbers in the subset, the algorithm applies the inclusion-exclusion principle. It simply iterates through all 31N subsets, and either adds or subtracts the count stored in the map for that subset. (If it was 1, 3, or 5 nonzero numbers the counts were added; else they were subtracted.) It then subtracts this answer from N * (N-1) / 2 to output the number of pairs of cows that aren't compatible.
I hope this explanation helps! Good luck for future contests!
There are 31N distinct subsets because each cow has five possible flavor choices. Specifically, this line explains the subsets:
we can explicitly generate all the subsets of flavors where at least
one cow likes all the flavors in that subset
The way to do that is to iterate over all N cows then construct the power set of flavors that they like, excluding the empty set. There are 2^5 sets in the power set, so removing the empty set results in 31. Therefore, there are 31N sets total.
An example is quite helpful here, taking the sample input:
4
1 2 3 4 5 # Cow 0
1 2 3 10 8 # Cow 1
10 9 8 7 6 # Cow 2
50 60 70 80 90 # Cow 3
The subsets will be:
{
{1}, {1, 2}, {1, 3}, ..., {2, 3, 4, 5}, {1, 2, 3, 4, 5}, # Cow 0
{1}, {1, 2}, {1, 3}, ..., {2, 3, 10, 8}, {1, 2, 3, 10, 8}, # Cow 1
...
}
Each cow generates 31 subsets. From there, the algorithm counts the number of cows that generate a specific subset (for example, note that {1} is generated by both cow 0 and 1, we just keep track of how many cows generate each subset), and applies inclusion-exclusion based on the subset size.
Nice problem, I used to do USACO and they had really interesting problems which still stand out amongst the "clever" interview questions a lot of companies give. :)

How to calculate a matrix formed by vector in Mathematica

I need to obtain a matrix vvT formed by a column vector v. i.e. the column vector v matrix times its transpose.
I found Mathematica doesn't support column vector. Please help.
Does this do what you want?
v = List /# Range#5;
vT = Transpose[v];
vvT = v.vT;
v // MatrixForm
vT // MatrixForm
vvT // MatrixForm
To get {1, 2, 3, 4, 5} into {{1}, {2}, {3}, {4}, {5}} you can use any of:
List /# {1, 2, 3, 4, 5}
{ {1, 2, 3, 4, 5} }\[Transpose]
Partition[{1, 2, 3, 4, 5}, 1]
You may find one of these more convenient than the others. Usually on long lists you will find Partition to be the fastest.
Also, your specific operation can be done in different ways:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x]
Syntactically shortest:

Deriving Mean of Values within a Tensor

I have a 20000 x 185 x 5 tensor, which looks like
{{{a1_1,a2_1,a3_1,a4_1,a5_1},{b1_1,b2_1,b3_1,b4_1,b5_1}...
(continue for 185 times)}
{{a1_2,a2_2,a3_2,a4_2,a5_2},{b1_2,b2_2,b3_2,b4_2,b5_2}...
...
...
...
{{a1_20000,a2_20000,a3_20000,a4_20000,a5_20000},
{b1_20000,b2_20000,b3_20000,b4_20000,b5_20000}... }}
The 20000 represents iteration number, the 185 represents individuals, and each individual has 5 attributes. I need to construct a 185 x 5 matrix that stores the mean value for each individual's 5 attributes, averaged across the 20000 iterations.
Not sure what the best way to do this is. I know Mean[ ] works on matrices, but with a Tensor, the derived values might not be what I need. Also, Mathematica ran out of memory if I tried to do Mean[tensor]. Please provide some help or advice. Thank you.
When in doubt, drop the size of the dimensions. (You can still keep them distinct to easily see where things end up.)
(* In[1]:= *) data = Array[a, {4, 3, 2}]
(* Out[1]= *) {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1],
a[1, 2, 2]}, {a[1, 3, 1], a[1, 3, 2]}}, {{a[2, 1, 1],
a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}, {a[2, 3, 1],
a[2, 3, 2]}}, {{a[3, 1, 1], a[3, 1, 2]}, {a[3, 2, 1],
a[3, 2, 2]}, {a[3, 3, 1], a[3, 3, 2]}}, {{a[4, 1, 1],
a[4, 1, 2]}, {a[4, 2, 1], a[4, 2, 2]}, {a[4, 3, 1], a[4, 3, 2]}}}
(* In[2]:= *) Dimensions[data]
(* Out[2]= *) {4, 3, 2}
(* In[3]:= *) means = Mean[data]
(* Out[3]= *) {
{1/4 (a[1, 1, 1] + a[2, 1, 1] + a[3, 1, 1] + a[4, 1, 1]),
1/4 (a[1, 1, 2] + a[2, 1, 2] + a[3, 1, 2] + a[4, 1, 2])},
{1/4 (a[1, 2, 1] + a[2, 2, 1] + a[3, 2, 1] + a[4, 2, 1]),
1/4 (a[1, 2, 2] + a[2, 2, 2] + a[3, 2, 2] + a[4, 2, 2])},
{1/4 (a[1, 3, 1] + a[2, 3, 1] + a[3, 3, 1] + a[4, 3, 1]),
1/4 (a[1, 3, 2] + a[2, 3, 2] + a[3, 3, 2] + a[4, 3, 2])}
}
(* In[4]:= *) Dimensions[means]
(* Out[4]= *) {3, 2}
Mathematica ran out of memory if I tried to do Mean[tensor]
This is probably because intermediate results are larger than the final result. This is likely if the elements are not type Real or Integer. Example:
a = Tuples[{x, Sqrt[y], z^x, q/2, Mod[r, 1], Sin[s]}, {2, 4}];
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
{109125576, 124244808}
{269465456, 376960648}
If they are, and are in packed array form, perhaps the elements are such that the array in unpacked during processing.
Here is an example where the tensor is a packed array of small numbers, and unpacking does not occur.
a = RandomReal[99, {20000, 185, 5}];
PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163012808, 163016952}
{163018944, 163026688}
Here is the same size of tensor with very large numbers.
a = RandomReal[$MaxMachineNumber, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163010680, 458982088}
{163122608, 786958080}
To elaborate a little on the other answers, there is no reason to expect Mathematica functions to operate materially differently on tensors than matrices because Mathemetica considers them both to be nested Lists, that are just of different nesting depth. How functions behave with lists depends on whether they're Listable, which you can check using Attributes[f], where fis the function you are interested in.
Your data list's dimensionality isn't actually that big in the scheme of things. Without seeing your actual data it is hard to be sure, but I suspect the reason you are running out of memory is that some of your data is non-numerical.
I don't know what you're doing incorrectly (your code will help). But Mean[] already works as you want it to.
a = RandomReal[1, {20000, 185, 5}];
b = Mean#a;
Dimensions#b
Out[1]= {185, 5}
You can even check that this is correct:
{Max#b, Min#b}
Out[2]={0.506445, 0.494061}
which is the expected value of the mean given that RandomReal uses a uniform distribution by default.
Assume you have the following data :
a = Table[RandomInteger[100], {i, 20000}, {j, 185}, {k, 5}];
In a straightforward manner You can find a table which stores the means of a[[1,j,k]],a[[2,j,k]],...a[[20000,j,k]]:
c = Table[Sum[a[[i, j, k]], {i, Length[a]}], {j, 185}, {k, 5}]/
Length[a] // N; // Timing
{37.487, Null}
or simply :
d = Total[a]/Length[a] // N; // Timing
{0.702, Null}
The second way is about 50 times faster.
c == d
True
To extend on Brett's answer a bit, when you call Mean on a n-dimensional tensor then it averages over the first index and returns an n-1 dimensional tensor:
a = RandomReal[1, {a1, a2, a3, ... an}];
Dimensions[a] (* This would have n entries in it *)
b = Mean[a];
Dimensions[b] (* Has n-1 entries, where averaging was done over the first index *)
In the more general case where you may wish to average over the i-th argument, you would have to transpose the data around first. For example, say you want to average the 3nd of 5 dimensions. You would need the 3rd element first, followed by the 1st, 2nd, 4th, 5th.
a = RandomReal[1, {5, 10, 2, 40, 10}];
b = Transpose[a, {2, 3, 4, 1, 5}];
c = Mean[b]; (* Now of dimensions {5, 10, 40, 10} *)
In other words, you would make a call to Transpose where you placed the i-th index as the first tensor index and moved everything before it ahead one. Anything that comes after the i-th index stays the same.
This tends to come in handy when your data comes in odd formats where the first index may not always represent different realizations of a data sample. I've had this come up, for example, when I had to do time averaging of large wind data sets where the time series came third (!) in terms of the tensor representation that was available.
You could imagine the generalizedTenorMean would look something like this then:
Clear[generalizedTensorMean];
generalizedTensorMean[A_, i_] :=
Module[{n = Length#Dimensions#A, ordering},
ordering =
Join[Table[x, {x, 2, i}], {1}, Table[x, {x, i + 1, n}]];
Mean#Transpose[A, ordering]]
This reduces to the plain-old-mean when i == 1. Try it out:
A = RandomReal[1, {2, 4, 6, 8, 10, 12, 14}];
Dimensions#A (* {2, 4, 6, 8, 10, 12, 14} *)
Dimensions#generalizedTensorMean[A, 1] (* {4, 6, 8, 10, 12, 14} *)
Dimensions#generalizedTensorMean[A, 7] (* {2, 4, 6, 8, 10, 12} *)
On a side note, I'm surprised that Mathematica doesn't support this by default. You don't always want to average over the first level of a list.

Maximize function in Mathematica which counts elements

I reduced a debugging problem in Mathematica 8 to something similar to the following code:
f = Function[x,
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Count[list, x]
];
f[4]
Maximize{f[x], x, Integers]
Output:
4
{0, {x->0}}
So, while the maximum o function f is obtained when x equals 4 (as confirmed in the first output line), why does Maximize return x->0 (output line 2)?
The reason for this behavior can be easily found using Trace. What happens is that your function is evaluated inside Maximize with still symbolic x, and since your list does not contain symbol x, results in zero. Effectively, you call Maximize[0,x,Integers], hence the result. One thing you can do is to protect the function from immediate evaluation by using pattern-defined function with a restrictive pattern, like this for example:
Clear[ff];
ff[x_?IntegerQ] :=
With[{list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5}}, Count[list, x]]
It appears that Maximize can not easily deal with it however, but NMaximize can:
In[73]:= NMaximize[{ff[x], Element[x, Integers]}, x]
Out[73]= {4., {x -> 4}}
But, generally, either of the Maximize family functions seem not quite appropriate for the job. You may be better off by explicitly computing the maximum, for example like this:
In[78]:= list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Extract[#, Position[#, Max[#], 1, 1] &[#[[All, 2]]]] &[Tally[list]]
Out[79]= {{4, 4}}
HTH
Try this:
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
First#Sort[Tally[list], #1[[2]] > #2[[2]] &]
Output:
{4, 4}

how to import a file into mathematica and reference a column by header name

I have a TSV file with many columns like so;
genename X1 X100 X103 X105 X115 X117 X120 X122 X123
Gene20728 0.415049 0.517868 0.820183 0.578081 0.30997 0.395181
I would like to import it into Mathematica, and then extract and sort a column.
i.e., I want to extract column ["X117"] and sort it, and output the sorted list.
table = Import["file.csv", "Table"];
x117 = Drop[table[[All, 7]], 1];
sorted = Sort[x117];
I do not think there is a built in method of achieving the smart structure you seem to be asking for.
Below is the what I think is the most straight forward implementation out of the various possible methods.
stringdata = "h1\th2\n1\t2\n3\t4\n5"
h1 h2
1 2
5 4
3
Clear[ImportColumnsByName];
ImportColumnsByName[filename_] :=
Module[{data, headings, columns, struc},
data = ImportString[filename, "TSV"];
headings = data[[1]];
columns = Transpose[PadRight[data[[2 ;; -1]]]];
MapThread[(struc[#1] = #2) &, {headings, columns}];
struc
]
Clear[test];
test = ImportColumnsByName[stringdata];
test["h1"]
test["h2"]
Sort[test["h1"]]
outputs:
{1, 3, 5}
{2, 4, 0}
{1, 3, 5}
Building on ragfield's solution, this is a more dynamic method, however every call to this structure makes a call to Position and Part.
Clear[ImportColumnsByName];
ImportColumnsByName[filename_] := Module[{data, temp},
data = PadRight#ImportString[filename, "Table"];
temp[heading_] :=
Rest[data[[All, Position[data[[1]], heading][[1, 1]]]]];
temp
]
Clear[test];
test = ImportColumnsByName[stringdata];
test["h1"]
test["h2"]
Sort[test["h1"]]
outputs:
{1, 3, 5}
{2, 4, 0}
{1, 3, 5}
Starting from ragfield's code:
table = Import["file.csv", "Table"];
colname = "X117"
x117 = Drop[table[[All, Position[tb[[1, All]], colname]//Flatten]],
1]//Flatten;
sorted = Sort[x117];
For processing Excel files from various sites I do variations on this:
data = {{"h1", "h2"}, {1, 2}, {3, 4}, {5, ""}};
find[x_String] := Cases[Transpose[data], {x, __}]
In[]=find["h1"]
Out[]={{"h1", 1, 3, 5}}
If it is ragged sort of data you can usually pad it readily enough to make it suitable for transposing. Additionally some of my sources are lazy with formatting, sometimes headers change case, sometimes there is an empty row before the header, and so on:
find2[x_String,data_List] :=
Cases[Transpose[data], {___,
y_String /;
StringMatchQ[StringTrim[y], x, IgnoreCase -> True], __}]
In[]=find2["H1",data]
Out[]={{"h1", 1, 3, 5}}
data2 = {{"", ""}, {"H1 ", "h2"}, {1, 2}, {3, 4}, {5, ""}};
In[]=find2["h1",data2]
Out[]={{,"H1 ", 1, 3, 5}}

Resources