Related
Suppose you have a list of subsets S1,...,Sn of the integer range R={1,2,...,N}, and an integer k. Is there an efficient way to find a subset C of R of size k such that C is a subset of a maximal number of the Si?
As an example, let R={1,2,3,4} and k=2
S1={1,2,3}
S2={1,2,3}
S3={1,2,4}
S4={1,3,4}
Then I want to return either C={1,2} or C={1,3} (doesn't matter which).
I think your problem is NP-Hard. Consider the bipartite graph with the left nodes being your sets and the right nodes being the integers {1, ..., N}, with an edge between two nodes if the set contains the integer. Then, finding a common subset of size k, which is a subset of a maximal number of the Si, is equivalent to finding a complete bipartite subgraph K(i, k) with maximal number of edges i*k. If you could do this in polynomial time, then, you could find the complete bipartite subgraph K(i, j) with maximal number of edges i*j in polynomial time, by trying for each fixed k. But this problem in NP-Complete (Complete bipartite graph).
So, unless P=NP, your problem does not have a polynomial time algorithm.
Assuming I understand your question I believe this is straightforward for fairly small sets.
I will use Mathematica code for illustration, but the concept is universal.
I generate 10 random subsets of length 4, from the set {1 .. 8}:
ss = Subsets[Range#8, {4}] ~RandomSample~ 10
{{1, 3, 4, 6}, {2, 6, 7, 8}, {3, 5, 6, 7}, {2, 4, 6, 7}, {1, 4, 5, 8},
{2, 4, 6, 8}, {1, 2, 3, 8}, {1, 6, 7, 8}, {1, 2, 4, 7}, {1, 2, 5, 7}}
I convert these to a binary array of the presence of each number in each subset:
a = Normal#SparseArray[Join ## MapIndexed[Tuples[{##}] &, ss] -> 1];
Grid[a]
That is ten columns for ten subsets, and eight rows for elements {1 .. 8}.
Now generate all possible target subsets (size 3):
keys = Subsets[Union ## ss, {3}];
Take a "key" and extract those rows from the array and do a BitAnd operation (return 1 iff all columns equal 1), then count the number of ones. For example, for key {1, 6, 8} we have:
a[[{1, 6, 8}]]
After BitAnd:
Do this for each key:
counts = Tr[BitAnd ## a[[#]]] & /# keys;
Then find the position(s) of the maximum element of that list, and extract the corresponding parts of keys:
keys ~Extract~ Position[counts, Max#counts]
{{1, 2, 7}, {2, 4, 6}, {2, 4, 7}, {2, 6, 7}, {2, 6, 8}, {6, 7, 8}}
With adequate memory this process works quickly for a larger set. Starting with 50,000 randomly selected subsets of length 7 from {1 .. 30}:
ss = Subsets[Range#30, {7}] ~RandomSample~ 50000;
The maximum sub-subsets of length 4 are calculated in about nine seconds:
AbsoluteTiming[
a = Normal#SparseArray[Join ## MapIndexed[Tuples[{##}] &, ss] -> 1];
keys = Subsets[Union ## ss, {4}];
counts = Tr[BitAnd ## a[[#]]] & /# keys;
keys~Extract~Position[counts, Max#counts]
]
{8.8205045, {{2, 3, 4, 20},
{7, 10, 15, 18},
{7, 13, 16, 26},
{11, 21, 26, 28}}}
I should add that Mathematica is a high level language and these operations are on generic objects, therefore if this is done truly at the binary level this should be much faster, and more memory efficient.
I hope I don't misunderstand the problem... Here a solution in SWI-Prolog
:- module(subsets, [solve/0]).
:- [library(pairs),
library(aggregate)].
solve :-
problem(R, K, Subsets),
once(subset_of_maximal_number(R, K, Subsets, Subset)),
writeln(Subset).
problem(4, 2,
[[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
[[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
[2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).
subset_of_maximal_number(R, K, Subsets, Subset) :-
flatten(Subsets, Numbers),
findall(Num-Count,
( between(1, R, Num),
aggregate_all(count, member(Num, Numbers), Count)
), NumToCount),
transpose_pairs(NumToCount, CountToNumSortedR),
reverse(CountToNumSortedR, CountToNumSorted),
length(Subset, K), % list of free vars
prefix(SolutionsK, CountToNumSorted),
pairs_values(SolutionsK, Subset).
test output:
?- solve.
[1,3]
true ;
[7,6,2]
true.
edit: I think that the above solution is wrong, in the sense that what's returned could not be a subset of any of the input: here (a commented) solution without this problem:
:- module(subsets, [solve/0]).
:- [library(pairs),
library(aggregate),
library(ordsets)].
solve :-
problem(R, K, Subsets),
once(subset_of_maximal_number(R, K, Subsets, Subset)),
writeln(Subset).
problem(4, 2,
[[1,2,3], [1,2,3], [1,2,4], [1,3,4]]).
problem(8, 3,
[[1, 3, 4, 6], [2, 6, 7, 8], [3, 5, 6, 7], [2, 4, 6, 7], [1, 4, 5, 8],
[2, 4, 6, 8], [1, 2, 3, 8], [1, 6, 7, 8], [1, 2, 4, 7], [1, 2, 5, 7]]).
subset_of_maximal_number(R, K, Subsets, Subset) :-
flatten(Subsets, Numbers),
findall(Num-Count,
( between(1, R, Num),
aggregate_all(count, member(Num, Numbers), Count)
), NumToCount),
% actually sort by ascending # of occurrences
transpose_pairs(NumToCount, CountToNumSorted),
pairs_values(CountToNumSorted, PreferredRev),
% we need higher values first
reverse(PreferredRev, Preferred),
% empty slots to fill, preferred first
length(SubsetP, K),
select_k(Preferred, SubsetP),
% verify our selection it's an actual subset of any of subsets
sort(SubsetP, Subset),
once((member(S, Subsets), ord_subtract(Subset, S, []))).
select_k(_Subset, []).
select_k(Subset, [E|R]) :-
select(E, Subset, WithoutE),
select_k(WithoutE, R).
test:
?- solve.
[1,3]
true ;
[2,6,7]
true.
I need to obtain a matrix vvT formed by a column vector v. i.e. the column vector v matrix times its transpose.
I found Mathematica doesn't support column vector. Please help.
Does this do what you want?
v = List /# Range#5;
vT = Transpose[v];
vvT = v.vT;
v // MatrixForm
vT // MatrixForm
vvT // MatrixForm
To get {1, 2, 3, 4, 5} into {{1}, {2}, {3}, {4}, {5}} you can use any of:
List /# {1, 2, 3, 4, 5}
{ {1, 2, 3, 4, 5} }\[Transpose]
Partition[{1, 2, 3, 4, 5}, 1]
You may find one of these more convenient than the others. Usually on long lists you will find Partition to be the fastest.
Also, your specific operation can be done in different ways:
x = {1, 2, 3, 4, 5};
Outer[Times, x, x]
Syntactically shortest:
I have a list and an arbitrary function taking 4 parameters, let's say {1, 11, 3, 13, 9, 0, 12, 7} and f[{x,y,z,w}]={x+y, z+w}, what I want to do is to form a new list such that 4 consecutive elements in the original list are evaluated to get a new value as the new list's component, and the evaluation has to be done in every 2 positions in the original list, in this case, the resulting list is:
{{12, 16}, {16, 9}, {9, 19}}
Note here 4 and 2 can change. How to do this conveniently in Mathematica? I imagine this as something like Map, but not sure how to relate.
There's an alternative to Map[f, Partition[...]]: Developer`PartitionMap. Which works exactly like Map[f, Partition[list, n, ...]]. So, your code would be
Needs["Developer`"]
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
PartitionMap[f,list, 4, 2]
giving the same result as Mark's answer.
f[{x_, y_, z_, w_}] = {x + y, z + w};
list = {1, 11, 3, 13, 9, 0, 12, 7};
f /# Partition[list, 4, 2]
I have a 20000 x 185 x 5 tensor, which looks like
{{{a1_1,a2_1,a3_1,a4_1,a5_1},{b1_1,b2_1,b3_1,b4_1,b5_1}...
(continue for 185 times)}
{{a1_2,a2_2,a3_2,a4_2,a5_2},{b1_2,b2_2,b3_2,b4_2,b5_2}...
...
...
...
{{a1_20000,a2_20000,a3_20000,a4_20000,a5_20000},
{b1_20000,b2_20000,b3_20000,b4_20000,b5_20000}... }}
The 20000 represents iteration number, the 185 represents individuals, and each individual has 5 attributes. I need to construct a 185 x 5 matrix that stores the mean value for each individual's 5 attributes, averaged across the 20000 iterations.
Not sure what the best way to do this is. I know Mean[ ] works on matrices, but with a Tensor, the derived values might not be what I need. Also, Mathematica ran out of memory if I tried to do Mean[tensor]. Please provide some help or advice. Thank you.
When in doubt, drop the size of the dimensions. (You can still keep them distinct to easily see where things end up.)
(* In[1]:= *) data = Array[a, {4, 3, 2}]
(* Out[1]= *) {{{a[1, 1, 1], a[1, 1, 2]}, {a[1, 2, 1],
a[1, 2, 2]}, {a[1, 3, 1], a[1, 3, 2]}}, {{a[2, 1, 1],
a[2, 1, 2]}, {a[2, 2, 1], a[2, 2, 2]}, {a[2, 3, 1],
a[2, 3, 2]}}, {{a[3, 1, 1], a[3, 1, 2]}, {a[3, 2, 1],
a[3, 2, 2]}, {a[3, 3, 1], a[3, 3, 2]}}, {{a[4, 1, 1],
a[4, 1, 2]}, {a[4, 2, 1], a[4, 2, 2]}, {a[4, 3, 1], a[4, 3, 2]}}}
(* In[2]:= *) Dimensions[data]
(* Out[2]= *) {4, 3, 2}
(* In[3]:= *) means = Mean[data]
(* Out[3]= *) {
{1/4 (a[1, 1, 1] + a[2, 1, 1] + a[3, 1, 1] + a[4, 1, 1]),
1/4 (a[1, 1, 2] + a[2, 1, 2] + a[3, 1, 2] + a[4, 1, 2])},
{1/4 (a[1, 2, 1] + a[2, 2, 1] + a[3, 2, 1] + a[4, 2, 1]),
1/4 (a[1, 2, 2] + a[2, 2, 2] + a[3, 2, 2] + a[4, 2, 2])},
{1/4 (a[1, 3, 1] + a[2, 3, 1] + a[3, 3, 1] + a[4, 3, 1]),
1/4 (a[1, 3, 2] + a[2, 3, 2] + a[3, 3, 2] + a[4, 3, 2])}
}
(* In[4]:= *) Dimensions[means]
(* Out[4]= *) {3, 2}
Mathematica ran out of memory if I tried to do Mean[tensor]
This is probably because intermediate results are larger than the final result. This is likely if the elements are not type Real or Integer. Example:
a = Tuples[{x, Sqrt[y], z^x, q/2, Mod[r, 1], Sin[s]}, {2, 4}];
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
{109125576, 124244808}
{269465456, 376960648}
If they are, and are in packed array form, perhaps the elements are such that the array in unpacked during processing.
Here is an example where the tensor is a packed array of small numbers, and unpacking does not occur.
a = RandomReal[99, {20000, 185, 5}];
PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163012808, 163016952}
{163018944, 163026688}
Here is the same size of tensor with very large numbers.
a = RandomReal[$MaxMachineNumber, {20000, 185, 5}];
Developer`PackedArrayQ[a]
{MemoryInUse[], MaxMemoryUsed[]}
b = Mean[a];
{MemoryInUse[], MaxMemoryUsed[]}
True
{163010680, 458982088}
{163122608, 786958080}
To elaborate a little on the other answers, there is no reason to expect Mathematica functions to operate materially differently on tensors than matrices because Mathemetica considers them both to be nested Lists, that are just of different nesting depth. How functions behave with lists depends on whether they're Listable, which you can check using Attributes[f], where fis the function you are interested in.
Your data list's dimensionality isn't actually that big in the scheme of things. Without seeing your actual data it is hard to be sure, but I suspect the reason you are running out of memory is that some of your data is non-numerical.
I don't know what you're doing incorrectly (your code will help). But Mean[] already works as you want it to.
a = RandomReal[1, {20000, 185, 5}];
b = Mean#a;
Dimensions#b
Out[1]= {185, 5}
You can even check that this is correct:
{Max#b, Min#b}
Out[2]={0.506445, 0.494061}
which is the expected value of the mean given that RandomReal uses a uniform distribution by default.
Assume you have the following data :
a = Table[RandomInteger[100], {i, 20000}, {j, 185}, {k, 5}];
In a straightforward manner You can find a table which stores the means of a[[1,j,k]],a[[2,j,k]],...a[[20000,j,k]]:
c = Table[Sum[a[[i, j, k]], {i, Length[a]}], {j, 185}, {k, 5}]/
Length[a] // N; // Timing
{37.487, Null}
or simply :
d = Total[a]/Length[a] // N; // Timing
{0.702, Null}
The second way is about 50 times faster.
c == d
True
To extend on Brett's answer a bit, when you call Mean on a n-dimensional tensor then it averages over the first index and returns an n-1 dimensional tensor:
a = RandomReal[1, {a1, a2, a3, ... an}];
Dimensions[a] (* This would have n entries in it *)
b = Mean[a];
Dimensions[b] (* Has n-1 entries, where averaging was done over the first index *)
In the more general case where you may wish to average over the i-th argument, you would have to transpose the data around first. For example, say you want to average the 3nd of 5 dimensions. You would need the 3rd element first, followed by the 1st, 2nd, 4th, 5th.
a = RandomReal[1, {5, 10, 2, 40, 10}];
b = Transpose[a, {2, 3, 4, 1, 5}];
c = Mean[b]; (* Now of dimensions {5, 10, 40, 10} *)
In other words, you would make a call to Transpose where you placed the i-th index as the first tensor index and moved everything before it ahead one. Anything that comes after the i-th index stays the same.
This tends to come in handy when your data comes in odd formats where the first index may not always represent different realizations of a data sample. I've had this come up, for example, when I had to do time averaging of large wind data sets where the time series came third (!) in terms of the tensor representation that was available.
You could imagine the generalizedTenorMean would look something like this then:
Clear[generalizedTensorMean];
generalizedTensorMean[A_, i_] :=
Module[{n = Length#Dimensions#A, ordering},
ordering =
Join[Table[x, {x, 2, i}], {1}, Table[x, {x, i + 1, n}]];
Mean#Transpose[A, ordering]]
This reduces to the plain-old-mean when i == 1. Try it out:
A = RandomReal[1, {2, 4, 6, 8, 10, 12, 14}];
Dimensions#A (* {2, 4, 6, 8, 10, 12, 14} *)
Dimensions#generalizedTensorMean[A, 1] (* {4, 6, 8, 10, 12, 14} *)
Dimensions#generalizedTensorMean[A, 7] (* {2, 4, 6, 8, 10, 12} *)
On a side note, I'm surprised that Mathematica doesn't support this by default. You don't always want to average over the first level of a list.
Example:
list:={ Plus[1,1], Times[2,3] }
When looking at list, I get
{2,6}
I want to keep them unevaluated (as above) so that list returns
{ Plus[1,1], Times[2,3] }
Later I want to evaluate the functions in list sequence to get
{2,6}
The number of unevaluated functions in list is not known beforehand. Besides Plus, user defined functions like f[x_] may be stored in list
I hope the example is clear.
What is the best way to do this?
The best way is to store them in Hold, not List, like so:
In[255]:= f[x_] := x^2;
lh = Hold[Plus[1, 1], Times[2, 3], f[2]]
Out[256]= Hold[1 + 1, 2 3, f[2]]
In this way, you have full control over them. At some point, you may call ReleaseHold to evaluate them:
In[258]:= ReleaseHold#lh
Out[258]= Sequence[2, 6, 4]
If you want the results in a list rather than Sequence, you may use just List##lh instead. If you need to evaluate a specific one, simply use Part to extract it:
In[261]:= lh[[2]]
Out[261]= 6
If you insist on your construction, here is a way:
In[263]:= l:={Plus[1,1],Times[2,3],f[2]};
Hold[l]/.OwnValues[l]
Out[264]= Hold[{1+1,2 3,f[2]}]
EDIT
In case you have some functions/symbols with UpValues which can evaluate even inside Hold, you may want to use HoldComplete in place of Hold.
EDIT2
As pointed by #Mr.Wizard in another answer, sometimes you may find it more convenient to have Hold wrapped around individual items in your sequence. My comment here is that the usefulness of both forms is amplified once we realize that it is very easy to transform one into another and back. The following function will split the sequence inside Hold into a list of held items:
splitHeldSequence[Hold[seq___], f_: Hold] := List ## Map[f, Hold[seq]]
for example,
In[274]:= splitHeldSequence[Hold[1 + 1, 2 + 2]]
Out[274]= {Hold[1 + 1], Hold[2 + 2]}
grouping them back into a single Hold is even easier - just Apply Join:
In[275]:= Join ## {Hold[1 + 1], Hold[2 + 2]}
Out[275]= Hold[1 + 1, 2 + 2]
The two different forms are useful in diferrent circumstances. You can easily use things such as Union, Select, Cases on a list of held items without thinking much about evaluation. Once finished, you can combine them back into a single Hold, for example, to feed as unevaluated sequence of arguments to some function.
EDIT 3
Per request of #ndroock1, here is a specific example. The setup:
l = {1, 1, 1, 2, 4, 8, 3, 9, 27}
S[n_] := Module[{}, l[[n]] = l[[n]] + 1; l]
Z[n_] := Module[{}, l[[n]] = 0; l]
placing functions in Hold:
In[43]:= held = Hold[Z[1], S[1]]
Out[43]= Hold[Z[1], S[1]]
Here is how the exec function may look:
exec[n_] := MapAt[Evaluate, held, n]
Now,
In[46]:= {exec[1], exec[2]}
Out[46]= {Hold[{0, 1, 1, 2, 4, 8, 3, 9, 27}, S[1]], Hold[Z[1], {1, 1, 1, 2, 4, 8, 3, 9, 27}]}
Note that the original variable held remains unchanged, since we operate on the copy. Note also that the original setup contains mutable state (l), which is not very idiomatic in Mathematica. In particular, the order of evaluations matter:
In[61]:= Reverse[{exec[2], exec[1]}]
Out[61]= {Hold[{0, 1, 1, 2, 4, 8, 3, 9, 27}, S[1]], Hold[Z[1], {2, 1, 1, 2, 4, 8, 3, 9, 27}]}
Whether or not this is desired depends on the specific needs, I just wanted to point this out. Also, while the exec above is implemented according to the requested spec, it implicitly depends on a global variable l, which I consider a bad practice.
An alternative way to store functions suggested by #Mr.Wizard can be achieved e.g. like
In[63]:= listOfHeld = splitHeldSequence[held]
Out[63]= {Hold[Z1], Hold[S1]}
and here
In[64]:= execAlt[n_] := MapAt[ReleaseHold, listOfHeld, n]
In[70]:= l = {1, 1, 1, 2, 4, 8, 3, 9, 27} ;
{execAlt[1], execAlt[2]}
Out[71]= {{{0, 1, 1, 2, 4, 8, 3, 9, 27}, Hold[S[1]]}, {Hold[Z[1]], {1, 1, 1, 2, 4, 8, 3, 9, 27}}}
The same comments about mutability and dependence on a global variable go here as well. This last form is also more suited to query the function type:
getType[n_, lh_] := lh[[n]] /. {Hold[_Z] :> zType, Hold[_S] :> sType, _ :> unknownType}
for example:
In[172]:= getType[#, listOfHeld] & /# {1, 2}
Out[172]= {zType, sType}
The first thing that spings to mind is to not use List but rather use something like this:
SetAttributes[lst, HoldAll];
heldL=lst[Plus[1, 1], Times[2, 3]]
There will surely be lots of more erudite suggestions though!
You can also use Hold on every element that you want held:
a = {Hold[2 + 2], Hold[2*3]}
You can use HoldForm on either the elements or the list, if you want the appearance of the list without Hold visible:
b = {HoldForm[2 + 2], HoldForm[2*3]}
c = HoldForm#{2 + 2, 2*3}
{2 + 2, 2 * 3}
And you can recover the evaluated form with ReleaseHold:
a // ReleaseHold
b // ReleaseHold
c // ReleaseHold
Out[8]= {4, 6}
Out[9]= {4, 6}
Out[10]= {4, 6}
The form Hold[2+2, 2*3] or that of a and b above are good because you can easily add terms with e.g. Append. For b type is it logically:
Append[b, HoldForm[8/4]]
For Hold[2+2, 2*3]:
Hold[2+2, 2*3] ~Join~ Hold[8/4]
Another way:
lh = Function[u, Hold#u, {HoldAll, Listable}];
k = lh#{2 + 2, Sin[Pi]}
(*
->{Hold[2 + 2], Hold[Sin[\[Pi]]]}
*)
ReleaseHold#First#k
(*
-> 4
*)