How can I implement such a map-like operation in Mathematica?

I have a list and an arbitrary function taking 4 parameters, say {1, 11, 3, 13, 9, 0, 12, 7} and f[{x, y, z, w}] = {x + y, z + w}. I want to form a new list whose elements are the results of applying f to 4 consecutive elements of the original list, with the window advancing 2 positions each time. In this case the resulting list is:
{{12, 16}, {16, 9}, {9, 19}}
Note that the 4 and the 2 can change. How can I do this conveniently in Mathematica? I imagine it as something like Map, but I'm not sure how to relate the two.

There's an alternative to Map[f, Partition[...]]: Developer`PartitionMap, which works exactly like Map[f, Partition[list, n, ...]]. So your code would be
Needs["Developer`"]
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
PartitionMap[f, list, 4, 2]
giving the same result as Mark's answer.

f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
f /@ Partition[list, 4, 2]
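Since the 4 and the 2 can vary, the same pattern generalizes directly. A minimal sketch (blockMap is just a hypothetical helper name here, and Total stands in for any function that accepts a length-n block):
blockMap[f_, list_, n_, d_] := f /@ Partition[list, n, d]  (* n = block length, d = offset *)
blockMap[Total, {1, 11, 3, 13, 9, 0, 12, 7}, 4, 2]
(* {28, 25, 28} *)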

Related

What is the fastest way to join multiple subsets that have similar elements?

I have a list of 500,000+ subsets, each containing from 1 to 500 integer values. So I have something like:
{1, 2, 3 }
{2, 3}
{4, 5}
{3, 6, 7}
{7, 9}
{8, 4}
{10, 11}
After running the code I would like to get:
{1, 2, 3, 6, 7, 9}
{4, 5, 8}
{10, 11}
I wrote simple code that compares each subset to every other subset; if they intersect, they are joined together, otherwise not.
It is OK on a small scale, but with a large amount of data it takes forever.
Could you please suggest any improvements?
P.S. I am not strong in maths or logic; big O notation would be Greek to me. I am sorry.
You're trying to find the connected components of a graph, with each of your input sets representing a fully connected set of nodes. Here's a simple implementation in Python:
sets = [{1, 2, 3}, {2, 3}, {4, 5}, {3, 6, 7}, {7, 9}, {8, 4}, {10, 11}]
allelts = set.union(*sets)
# components: representative element -> set of all elements in that component
components = {X: {X} for X in allelts}
# component: element -> representative of the component it currently belongs to
component = {X: X for X in allelts}
for S in sets:
    comp = sorted({component[X] for X in S})
    mergeto = comp[0]
    for mergefrom in comp[1:]:
        components[mergeto] |= components[mergefrom]
        for X in components[mergefrom]:
            component[X] = mergeto
        del components[mergefrom]
That leaves components holding each component keyed by its minimum element, and component mapping every element to the representative of its component:
>>> print(components)
{1: {1, 2, 3, 6, 7, 9}, 4: {8, 4, 5}, 10: {10, 11}}
>>> print(component)
{1: 1, 2: 1, 3: 1, 4: 4, 5: 4, 6: 1, 7: 1, 8: 4, 9: 1, 10: 10, 11: 10}
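Since the question comes from a Mathematica context, the same graph idea can also be sketched in Wolfram Language (my sketch, assuming the input is a list called sets and a version with Graph, i.e. 8 or later; ConnectedComponents does all the merging):
sets = {{1, 2, 3}, {2, 3}, {4, 5}, {3, 6, 7}, {7, 9}, {8, 4}, {10, 11}};
edges = Flatten[UndirectedEdge @@@ Partition[#, 2, 1] & /@ sets]; (* linking consecutive elements is enough for connectivity *)
ConnectedComponents[Graph[Union @@ sets, edges]]
(* {{1, 2, 3, 6, 7, 9}, {4, 5, 8}, {10, 11}} -- ordering may differ *)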

How to calculate a matrix formed by vector in Mathematica

I need to obtain the matrix vvT formed from a column vector v, i.e. the column vector v times its transpose.
I found that Mathematica doesn't have a separate column-vector type. Please help.
Does this do what you want?
v = List /@ Range@5;
vT = Transpose[v];
vvT = v.vT;
v // MatrixForm
vT // MatrixForm
vvT // MatrixForm
To get {1, 2, 3, 4, 5} into {{1}, {2}, {3}, {4}, {5}} you can use any of:
List /@ {1, 2, 3, 4, 5}
{ {1, 2, 3, 4, 5} }\[Transpose]
Partition[{1, 2, 3, 4, 5}, 1]
You may find one of these more convenient than the others. Usually on long lists you will find Partition to be the fastest.
Also, your specific operation can be done in different ways:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x]
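As a small cross-check of my own, the Outer result agrees with the explicit column-times-row product, so either form can be used:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x] == Transpose[{x}].{x}  (* (n x 1) column dotted with (1 x n) row *)
(* True *)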

Deriving Mean of Values within a Tensor

I have a 20000 x 185 x 5 tensor, which looks like
{
 {{a1_1, a2_1, a3_1, a4_1, a5_1}, {b1_1, b2_1, b3_1, b4_1, b5_1}, ... (185 inner lists)},
 {{a1_2, a2_2, a3_2, a4_2, a5_2}, {b1_2, b2_2, b3_2, b4_2, b5_2}, ...},
 ...
 {{a1_20000, a2_20000, a3_20000, a4_20000, a5_20000}, {b1_20000, b2_20000, b3_20000, b4_20000, b5_20000}, ...}
}
The 20000 represents iteration number, the 185 represents individuals, and each individual has 5 attributes. I need to construct a 185 x 5 matrix that stores the mean value for each individual's 5 attributes, averaged across the 20000 iterations.
I'm not sure what the best way to do this is. I know Mean[] works on matrices, but with a tensor the derived values might not be what I need. Also, Mathematica ran out of memory when I tried Mean[tensor]. Please provide some help or advice. Thank you.
When in doubt, drop the size of the dimensions. (You can still keep them distinct to easily see where things end up.)
(* In[1]:= *) data = Array[a, {4, 3, 2}]
(* Out[1]= *) {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1],
a[1, 2, 2]}, {a[1, 3, 1], a[1, 3, 2]}}, {{a[2, 1, 1],
a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}, {a[2, 3, 1],
a[2, 3, 2]}}, {{a[3, 1, 1], a[3, 1, 2]}, {a[3, 2, 1],
a[3, 2, 2]}, {a[3, 3, 1], a[3, 3, 2]}}, {{a[4, 1, 1],
a[4, 1, 2]}, {a[4, 2, 1], a[4, 2, 2]}, {a[4, 3, 1], a[4, 3, 2]}}}
(* In[2]:= *) Dimensions[data]
(* Out[2]= *) {4, 3, 2}
(* In[3]:= *) means = Mean[data]
(* Out[3]= *) {
{1/4 (a[1, 1, 1] + a[2, 1, 1] + a[3, 1, 1] + a[4, 1, 1]),
1/4 (a[1, 1, 2] + a[2, 1, 2] + a[3, 1, 2] + a[4, 1, 2])},
{1/4 (a[1, 2, 1] + a[2, 2, 1] + a[3, 2, 1] + a[4, 2, 1]),
1/4 (a[1, 2, 2] + a[2, 2, 2] + a[3, 2, 2] + a[4, 2, 2])},
{1/4 (a[1, 3, 1] + a[2, 3, 1] + a[3, 3, 1] + a[4, 3, 1]),
1/4 (a[1, 3, 2] + a[2, 3, 2] + a[3, 3, 2] + a[4, 3, 2])}
}
(* In[4]:= *) Dimensions[means]
(* Out[4]= *) {3, 2}
Mathematica ran out of memory if I tried to do Mean[tensor]
This is probably because intermediate results are larger than the final result. This is likely if the elements are not type Real or Integer. Example:
a = Tuples[{x, Sqrt[y], z^x, q/2, Mod[r, 1], Sin[s]}, {2, 4}];
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
{109125576, 124244808}
{269465456, 376960648}
If they are, and the tensor is in packed-array form, perhaps the elements are such that the array is unpacked during processing.
Here is an example where the tensor is a packed array of small numbers, and unpacking does not occur.
a = RandomReal[99, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163012808, 163016952}
{163018944, 163026688}
Here is the same size of tensor with very large numbers.
a = RandomReal[$MaxMachineNumber, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163010680, 458982088}
{163122608, 786958080}
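As a side note (my addition, not part of the timings above): Developer`ToPackedArray is a quick way to see whether data packs at all; literally entered lists are typically unpacked until converted.
Needs["Developer`"]
small = {{1., 2.}, {3., 4.}};        (* hand-entered, so not packed *)
PackedArrayQ[small]                  (* False *)
PackedArrayQ[ToPackedArray[small]]   (* True *)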
To elaborate a little on the other answers: there is no reason to expect Mathematica functions to operate materially differently on tensors than on matrices, because Mathematica considers both to be nested Lists, just of different nesting depth. How functions behave with lists depends on whether they are Listable, which you can check using Attributes[f], where f is the function you are interested in.
Your data list's dimensionality isn't actually that big in the scheme of things. Without seeing your actual data it is hard to be sure, but I suspect the reason you are running out of memory is that some of your data is non-numerical.
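A quick way to test that hypothesis (a sketch of mine; data stands for your tensor) is to ask whether the whole thing is a fully numerical array:
Developer`PackedArrayQ[data]   (* False hints that packing failed somewhere *)
ArrayQ[data, 3, NumberQ]       (* True only if every entry is an explicit number *)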
I don't know what you're doing incorrectly (posting your code would help), but Mean[] already works the way you want it to.
a = RandomReal[1, {20000, 185, 5}];
b = Mean@a;
Dimensions@b
Out[1]= {185, 5}
You can even check that this is correct:
{Max@b, Min@b}
Out[2]={0.506445, 0.494061}
which is consistent with the expected value of 0.5, given that RandomReal draws from a uniform distribution on [0, 1] by default.
Assume you have the following data:
a = Table[RandomInteger[100], {i, 20000}, {j, 185}, {k, 5}];
In a straightforward manner you can build a table which stores the mean of a[[1, j, k]], a[[2, j, k]], ..., a[[20000, j, k]]:
c = Table[Sum[a[[i, j, k]], {i, Length[a]}], {j, 185}, {k, 5}]/
Length[a] // N; // Timing
{37.487, Null}
or simply:
d = Total[a]/Length[a] // N; // Timing
{0.702, Null}
The second way is about 50 times faster.
c == d
True
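As a cross-check (my addition), the built-in Mean gives exactly the same table, since it averages over the first level:
N[Mean[a]] == d
(* True *)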
To extend Brett's answer a bit: when you call Mean on an n-dimensional tensor, it averages over the first index and returns an (n-1)-dimensional tensor:
a = RandomReal[1, {a1, a2, a3, ... an}];
Dimensions[a] (* This would have n entries in it *)
b = Mean[a];
Dimensions[b] (* Has n-1 entries, where averaging was done over the first index *)
In the more general case, where you may wish to average over the i-th index, you have to transpose the data around first. For example, say you want to average over the 3rd of 5 dimensions. You would need the 3rd index first, followed by the 1st, 2nd, 4th, and 5th.
a = RandomReal[1, {5, 10, 2, 40, 10}];
b = Transpose[a, {2, 3, 1, 4, 5}];
c = Mean[b]; (* Now of dimensions {5, 10, 40, 10} *)
In other words, you make a call to Transpose that places the i-th index first and pushes every index before it one level deeper. Anything that comes after the i-th index stays where it was.
This tends to come in handy when your data comes in odd formats where the first index may not always represent different realizations of a data sample. I've had this come up, for example, when I had to do time averaging of large wind data sets where the time series came third (!) in terms of the tensor representation that was available.
You could imagine that a generalizedTensorMean would then look something like this:
Clear[generalizedTensorMean];
generalizedTensorMean[A_, i_] :=
 Module[{n = Length@Dimensions@A, ordering},
  ordering = Join[Table[x, {x, 2, i}], {1}, Table[x, {x, i + 1, n}]];
  Mean@Transpose[A, ordering]]
This reduces to the plain-old-mean when i == 1. Try it out:
A = RandomReal[1, {2, 4, 6, 8, 10, 12, 14}];
Dimensions@A (* {2, 4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 1] (* {4, 6, 8, 10, 12, 14} *)
Dimensions@generalizedTensorMean[A, 7] (* {2, 4, 6, 8, 10, 12} *)
On a side note, I'm surprised that Mathematica doesn't support this by default. You don't always want to average over the first level of a list.
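As an aside of my own (not from the answer above): the transpose can also be avoided by mapping Mean at level i - 1, since every part at that level is a sub-tensor whose first index is the original i-th index. A minimal sketch, with levelMean as a hypothetical name:
levelMean[A_, i_] := Map[Mean, A, {i - 1}]
A = RandomReal[1, {2, 4, 6, 8}];
Dimensions@levelMean[A, 1] (* {4, 6, 8} *)
Dimensions@levelMean[A, 3] (* {2, 4, 8} *)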

Definition lookup speed: a performance issue

I have the following problem.
I need to build a very large number of definitions(*) such as
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
f[{0,0,1,0}] = 3
f[{0,0,0,1}] = 2
...
f[{2,3,1,2}] = 4
...
f[{n1,n2,n3,n4}] = some integer
...
This is just an example. The length of the argument list does not need to be 4 but can be anything.
I noticed that looking up each value slows down exponentially as the length of the argument list increases. Perhaps this is not so strange, since it is clear that in principle there is a combinatorial explosion in how many definitions Mathematica needs to store.
Still, I had expected Mathematica to be smart about this and for value extraction to be constant-time. Apparently it is not.
Is there any way to speed up the lookup? This probably has to do with how Mathematica internally handles symbol definition lookups. Does it scan the list of definitions until it finds a match? It seems that it does.
All suggestions highly appreciated.
With best regards
Zoran
(*) I am working on a stochastic simulation software that generates all configurations of a system and needs to store how many times each configuration occurred. In that sense a list {n1, n2, ..., nT} describes a particular configuration of the system saying that there are n1 particles of type 1, n2 particles of type 2, ..., nT particles of type T. There can be exponentially many such configurations.
Could you give some detail on how you worked out that lookup time is exponential?
If it is indeed exponential, perhaps you could speed things up by using Hash on your keys (configurations), storing key-value pairs in a list like {{key1, value1}, {key2, value2}} kept sorted by key, and then using binary search (which should be log time). This should be very quick to code up, but not optimal in terms of speed.
If that's not fast enough, one could think about setting up a proper hashtable implementation (which I thought was what the f[{0,1,0,1}]=3 approach did, without having checked).
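For what it's worth, in versions that have Association (10 and later), a hash-backed store is essentially a one-liner; a sketch of mine, outside the original discussion:
store = <||>;                 (* Association: hashed key-value storage *)
store[{1, 0, 0, 0}] = 1;
store[{0, 1, 0, 0}] = 2;
store[{1, 0, 0, 0}]           (* 1, looked up in roughly constant time *)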
But some toy example of the slowdown would be useful to proceed further...
EDIT: I just tried
test[length_] := Block[{f},
  Do[
   f[RandomInteger[{0, 10}, 100]] = RandomInteger[{0, 10}],
   {i, 1, length}
   ];
  f[{0, 0, 0, 0, 1, 7, 0, 3, 7, 8, 0, 4, 5, 8, 0, 8, 6, 7, 7, 0, 1, 6,
     3, 9, 6, 9, 2, 7, 2, 8, 1, 1, 8, 4, 0, 5, 2, 9, 9, 10, 6, 3, 6,
     8, 10, 0, 7, 1, 2, 8, 4, 4, 9, 5, 1, 10, 4, 1, 1, 3, 0, 3, 6, 5,
     4, 0, 9, 5, 4, 6, 9, 6, 10, 6, 2, 4, 9, 2, 9, 8, 10, 0, 8, 4, 9,
     5, 5, 9, 7, 2, 7, 4, 0, 2, 0, 10, 2, 4, 10, 1}] // timeIt
  ]
with timeIt defined to accurately time even short runs as follows:
timeIt::usage = "timeIt[expr] gives the time taken to execute expr,
repeating as many times as necessary to achieve a total time of \
1s";
SetAttributes[timeIt, HoldAll]
timeIt[expr_] := Module[{t = Timing[expr;][[1]], tries = 1},
While[t < 1.,
tries *= 2;
t = Timing[Do[expr, {tries}];][[1]];
];
Return[t/tries]]
and then
out = {#, test[#]} & /@ {10, 100, 1000, 10000, 100000, 100000};
ListLogLogPlot@out
(also for larger runs). So it seems constant time here.
Suppose you enter your information not like
f[{1,0,0,0}] = 1
f[{0,1,0,0}] = 2
but into an n1 x n2 x n3 x n4 matrix m like
m[[2,1,1,1]] = 1
m[[1,2,1,1]] = 2
etc.
(you could even enter values not as f[{1,0,0,0}]=1, but as f[{1,0,0,0},1] with
f[li_List, i_Integer] := Part[m, Apply[Sequence, li + 1]] = i;
f[li_List] := Part[m, Apply[Sequence, li + 1]];
where you have to initialize m e.g. by m = ConstantArray[0, {4, 4, 4, 4}];)
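A quick usage sketch of those two definitions (my example, using the 4 x 4 x 4 x 4 initialization just mentioned):
m = ConstantArray[0, {4, 4, 4, 4}];
f[{1, 0, 0, 0}, 1];   (* store: sets m[[2, 1, 1, 1]] to 1 *)
f[{1, 0, 0, 0}]       (* retrieve: 1 *)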
Let's compare timings:
testf[z_] :=
(
Do[ f[{n1, n2, n3, n4}] = RandomInteger[{1,100}], {n1,z}, {n2,z}, {n3,z},{n4,z}];
First[ Timing[ Do[ f[{n2, n4, n1, n3}], {n1, z}, {n2, z}, {n3, z}, {n4, z} ] ] ]
);
Framed[
ListLinePlot[
Table[{z, testf[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"DownValue approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"},ImageSize -> 500
]
]
Clear[f];
testf2[z_] :=
(
m = RandomInteger[{1, 100}, {z, z, z, z}];
f2[ni__Integer] := m[[Sequence @@ ({ni} + 1)]];
First[ Timing[ Do[ f2[n2, n4, n1, n3], {n1, z}, {n2, z}, {n3, z}, {n4, z}] ] ]
)
Framed[
ListLinePlot[
Table[{z, testf2[z]}, {z, 22, 36, 2}],
PlotLabel -> Row[{"Matrix approach: ",
Round[MemoryInUse[]/1024.^2],
" MB needed"
}],
AxesLabel -> {"n1,n2,n3,n4", "time/s"}, ImageSize -> 500
]
]
gives two timing plots (not reproduced here). So for larger sets of information a matrix approach seems clearly preferable.
Of course, if you have truly large data, say more GB than you have RAM, then you just
have to use a database and DatabaseLink.

Maximize function in Mathematica which counts elements

I reduced a debugging problem in Mathematica 8 to something similar to the following code:
f = Function[x,
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Count[list, x]
];
f[4]
Maximize[f[x], x, Integers]
Output:
4
{0, {x->0}}
So, while the maximum of the function f is attained when x equals 4 (as confirmed by the first output line), why does Maximize return x -> 0 (second output line)?
The reason for this behavior is easily found using Trace. What happens is that your function is evaluated inside Maximize with a still-symbolic x, and since your list does not contain the symbol x, the Count comes out zero. Effectively you call Maximize[0, x, Integers], hence the result. One thing you can do is protect the function from premature evaluation by using a pattern-defined function with a restrictive pattern, for example like this:
Clear[ff];
ff[x_?IntegerQ] :=
With[{list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5}}, Count[list, x]]
It appears that Maximize cannot easily deal with it, however, but NMaximize can:
In[73]:= NMaximize[{ff[x], Element[x, Integers]}, x]
Out[73]= {4., {x -> 4}}
But generally, the Maximize family of functions seems not quite appropriate for this job. You may be better off explicitly computing the maximum, for example like this:
In[78]:= list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
Extract[#, Position[#, Max[#], 1, 1] &[#[[All, 2]]]] &[Tally[list]]
Out[79]= {{4, 4}}
HTH
Try this:
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
First@Sort[Tally[list], #1[[2]] > #2[[2]] &]
Output:
{4, 4}
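As a further aside (not from either answer above): the built-in Commonest returns the most frequent value directly, and Count recovers its multiplicity:
list = {1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5};
most = First@Commonest[list];
{most, Count[list, most]}
(* {4, 4} *)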

Resources