This seems like it should be an obvious question, but the tutorials and documentation on lists are not forthcoming. Many of these issues stem from the sheer size of my text files (hundreds of MB) and my attempts to boil them down to something manageable by my system. As a result, I'm doing my work in segments and am now trying to combine the results.
I have multiple word frequency lists (~40 of them). The lists can either be taken through Import[ ] or as variables generated in Mathematica. Each list appears as the following and has been generated using the Tally[ ] and Sort[ ] commands:
{{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n", 16850}, {"in",
16164}, {"de", 14930}, {"a", 14660}, {"to", 14175}, {"la", 7347},
{"was", 6030}, {"l", 5981}, {"le", 5735}, <<51293>>, {"abattoir",
1}, {"abattement", 1}, {"abattagen", 1}, {"abattage", 1},
{"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1},
{"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}
Here is an example of the second file:
{{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of", 16262}, {"and",
14488}, {"to", 12726}, {"a", 12635}, {"in", 11141}, {"la", 10739},
{"et", 9016}, {"les", 8675}, {"le", 7748}, <<101032>>,
{"abattement", 1}, {"abattagen", 1}, {"abattage", 1}, {"abated",
1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1}, {"aase", 1},
{"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}
I want to combine them so that the frequency data aggregates: i.e. if the second file has 30,419 occurrences of 'the' and is joined to the first file, it should return that there are 72,635 occurrences (and so on as I move through the entire collection).
It sounds like you need GatherBy.
Suppose your two lists are named data1 and data2, then use
{#[[1, 1]], Total[#[[All, 2]]]} & /# GatherBy[Join[data1, data2], First]
This easily generalizes to any number of lists, not just two.
Try using a hash table, like this. First set things up:
ClearAll[freq];
freq[_] = 0;
Now eg freq["safas"] returns 0. Next, if the lists are defined as
lst1 = {{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n",
16850}, {"in", 16164}, {"de", 14930}, {"a", 14660}, {"to",
14175}, {"la", 7347}, {"was", 6030}, {"l", 5981}, {"le",
5735}, {"abattoir", 1}, {"abattement", 1}, {"abattagen",
1}, {"abattage", 1}, {"abated", 1}, {"abandonn", 1}, {"abaiss",
1}, {"aback", 1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah",
1}, {"aaa", 1}};
lst2 = {{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of",
16262}, {"and", 14488}, {"to", 12726}, {"a", 12635}, {"in",
11141}, {"la", 10739}, {"et", 9016}, {"les", 8675}, {"le",
7748}, {"abattement", 1}, {"abattagen", 1}, {"abattage",
1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback",
1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}};
you may run this
Scan[(freq[#[[1]]] += #[[2]]) &, lst1]
after which eg
freq["the"]
(*
42216
*)
and then the next list
Scan[(freq[#[[1]]] += #[[2]]) &, lst2]
after which eg
freq["the"]
72635
while still
freq["safas"]
(*
0
*)
Here is a direct Sow/Reap function:
Reap[#2~Sow~# & ### data1~Join~data2;, _, {#, Tr##2} &][[2]]
Here is a concise form of acl's method:
Module[{c},
c[_] = 0;
c[#] += #2 & ### data1~Join~data2;
{#[[1, 1]], #2} & ### Most#DownValues#c
]
This appears to be a bit faster than Szabolcs code on my system:
data1 ~Join~ data2 ~GatherBy~ First /.
{{{x_, a_}, {x_, b_}} :> {x, a + b}, {x : {_, _}} :> x}
There's an old saying, "if all you have is a hammer, everything becomes a nail." So, here's my hammer: SelectEquivalents.
This can be done a little quicker using SelectEquivalents:
SelectEquivalents[data1~Join~data2, #[[1]]&, #[[2]]&, {#1, Total[#2]}&]
In order, the first param is obviously just the joined lists, the second one is what they're grouped by (in this case the first element), the third param strips off the string leaving just the count, and the fourth param puts it back together with the string as #1 and the counts in a list as #2.
Try ReplaceRepeated.
Join the lists. Then use
//. {{f1___, {a_, c1_}, f2___, {a_, c2_}, f3___} -> {f1, f2, f3, {a, c1 + c2}}}
Related
So I'm pretty new to Mathematica, and am trying to learn to solve problems in a functional way. The problem I was solving was to list the ways in which I could sum elements from a list (with repetitions), so the sum is leq to some value. The code below solves this just fine.
i = {7.25, 7.75, 15, 19, 22};
m = 22;
getSum[l_List, n_List] := Total[Thread[{l, n}] /. {x_, y_} -> x y];
t = Prepend[Map[Range[0, Floor[m/#]] &, i], List];
Outer ## %;
Flatten[%, ArrayDepth[%] - 2];
Map[{#, getSum[i, #]} &, %];
DeleteCases[%, {_, x_} /; x > m || x == 0];
TableForm[Flatten /# SortBy[%, Last], 0,
TableHeadings -> {None, Append[i, "Total"]}]
However, the code check a lot of unneccesary cases, which could be a problem if m is higher of the list is longer. My question is simply what would be the most Mathematica-esque way of solving this problem, concerning both efficiency and code elegancy.
One simple though not optimal way is :
sol = Reduce[Dot[i, {a, b, c, d, e}] <= m, {a, b, c, d, e}, Integers];
at first try with a smaller i, say i = {7.25, 7.75} to get a feeling about whether you can use this.
You can improve speed by providing upper limits for the coefficients, like in
sol = Reduce[And ## {Dot[i, {a, b, c, d, e}] <= m,
Sequence ## Thread[{a, b, c, d, e} <= Quotient[m, i]]},
{a, b, c, d, e}, Integers]
How about
recurr[numbers_, boundary_] :=
Reap[memoryRecurr[0, {}, numbers, boundary]][[2, 1]];
memoryRecurr[_, _, {}, _] := Null;
memoryRecurr[sum_, numbers_, restNumbers_, diff_] :=
(
Block[
{presentNumber = First[restNumbers], restRest = Rest[restNumbers]}
,
If[
presentNumber <= diff
,
Block[{
newNumbers = Append[numbers, presentNumber],
newSum = sum + presentNumber
},
Sow[{newNumbers, newSum}];
memoryRecurr[
newSum,
newNumbers,
restRest,
diff - presentNumber
];
]
];
memoryRecurr[sum, numbers, restRest, diff]
];
);
So that
recurr[{1, 2, 3, 4, 5}, 7]
->
{{{1}, 1}, {{1, 2}, 3}, {{1, 2, 3}, 6}, {{1, 2, 4}, 7}, {{1, 3},
4}, {{1, 4}, 5}, {{1, 5}, 6}, {{2}, 2}, {{2, 3}, 5}, {{2, 4},
6}, {{2, 5}, 7}, {{3}, 3}, {{3, 4}, 7}, {{4}, 4}, {{5}, 5}}
Let
n=2^10 3^7 5^4...31^2...59^2 61...97
be the factorization of an integer such that the powers of primes are non-increasing.
I would like to write a code in Mathematica to find Min and Max of prime factor of n such that they have the same power.
for example I want a function which take r(the power) and give (at most two) primes in general. A specific answer for the above sample is
minwithpower[7]=3
maxwithpower[7]=3
minwithpower[2]=31
maxwithpower[2]=59
Any idea please.
Let n = 91065388654697452410240000 then
FactorInteger[n]
returns
{{2, 10}, {3, 7}, {5, 4}, {7, 4}, {31, 2}, {37, 2}, {59, 2}, {61, 1}, {97, 1}}
and the expression
Cases[FactorInteger[n], {_, 2}]
returns only those elements from the list of factors and coefficients where the coefficient is 2, ie
{{31, 2}, {37, 2}, {59, 2}}
Next, the expression
Cases[FactorInteger[n], {_, 2}] /. {{min_, _}, ___, {max_, _}} -> {min, max}
returns
{31, 59}
Note that this approach fails if the power you are interested in only occurs once in the output from FactorInteger, for example
Cases[FactorInteger[n], {_, 7}] /. {{min_, _}, ___, {max_, _}} -> {min, max}
returns
{{3, 7}}
but you should be able to fix that deficiency quite easily.
One solution is :
getSamePower[exp_, n_] := With[{powers =
Select[ReleaseHold[n /. {Times -> List, Power[a_, b_] -> {a, b}}], #[[2]] ==
exp &]},
If[Length[powers] == 1, {powers[[1, 1]], powers[[1, 1]]}, {Min[powers[[All, 1]]], Max[powers[[All, 1]]]}]]
to be used as :
getSamePower[7, 2^10 3^7 5^4 \[Pi]^1 31^2 E^1 59^2 61^1 I^1 97^1 // HoldForm]
(* {3, 3} *)
getSamePower[2, 2^10 3^7 5^4 \[Pi]^1 31^2 E^1 59^2 61^1 I^1 97^1 // HoldForm]
(* {31, 59} *)
How do I use Mathematica's Gather/Collect/Transpose functions to convert:
{ { {1, foo1}, {2, foo2}, {3, foo3} }, { {1, bar1}, {2, bar2}, {3, bar3} } }
to
{ {1, foo1, bar1}, {2, foo2, bar2}, {3, foo3, bar3} }
EDIT: Thanks! I was hoping there was a simple way, but I guess not!
Here is your list:
tst = {{{1, foo1}, {2, foo2}, {3, foo3}}, {{1, bar1}, {2, bar2}, {3, bar3}}}
Here is one way:
In[84]:=
Flatten/#Transpose[{#[[All,1,1]],#[[All,All,2]]}]&#
GatherBy[Flatten[tst,1],First]
Out[84]= {{1,foo1,bar1},{2,foo2,bar2},{3,foo3,bar3}}
EDIT
Here is a completely different version, just for fun:
In[106]:=
With[{flat = Flatten[tst,1]},
With[{rules = Dispatch[Rule###flat]},
Map[{#}~Join~ReplaceList[#,rules]&,DeleteDuplicates[flat[[All,1]]]]]]
Out[106]= {{1,foo1,bar1},{2,foo2,bar2},{3,foo3,bar3}}
EDIT 2
And here is yet another way, using linked lists and inner function to accumulate the results:
In[113]:=
Module[{f},f[x_]:={x};
Apply[(f[#1] = {f[#1],#2})&,tst,{2}];
Flatten/#Most[DownValues[f]][[All,2]]]
Out[113]= {{1,foo1,bar1},{2,foo2,bar2},{3,foo3,bar3}}
EDIT 3
Ok, for those who consider all of the above too complicated, here is a really simple rule - based solution:
In[149]:=
GatherBy[Flatten[tst, 1], First] /. els : {{n_, _} ..} :> {n}~Join~els[[All, 2]]
Out[149]= {{1, foo1, bar1}, {2, foo2, bar2}, {3, foo3, bar3}}
Perhaps easier:
tst = {{{1, foo1}, {2, foo2}, {3, foo3}}, {{1, bar1}, {2, bar2}, {3, bar3}}};
GatherBy[Flatten[tst, 1], First] /. {{k_, n_}, {k_, m_}} -> {k, n, m}
(*
-> {{1, foo1, bar1}, {2, foo2, bar2}, {3, foo3, bar3}}
*)
MapThread
If the "foo" and "bar" sublists are guaranteed to be aligned with one another (as they are in the example) and if you will consider using functions other than Gather/Collect/Transpose, then MapThread will suffice:
data={{{1,foo1},{2,foo2},{3,foo3}},{{1,bar1},{2,bar2},{3,bar3}}};
MapThread[{#1[[1]], #1[[2]], #2[[2]]}&, data]
result:
{{1, foo1, bar1}, {2, foo2, bar2}, {3, foo3, bar3}}
Pattern Matching
If the lists are not aligned, you could also use straight pattern matching and replacement (although I wouldn't recommend this approach for large lists):
data //.
{{h1___, {x_, foo__}, t1___}, {h2___, {x_, bar_}, t2___}} :>
{{h1, {x, foo, bar}, t1}, {h2, t2}} // First
Sow/Reap
A more efficient approach for unaligned lists uses Sow and Reap:
Reap[Cases[data, {x_, y_} :> Sow[y, x], {2}], _, Prepend[#2, #1] &][[2]]
Also just for fun ...
DeleteDuplicates /# Flatten /# GatherBy[Flatten[list, 1], First]
where
list = {{{1, foo1}, {2, foo2}, {3, foo3}}, {{1, bar1}, {2, bar2}, {3,
bar3}}}
Edit.
Some more fun ...
Gather[#][[All, 1]] & /# Flatten /# GatherBy[#, First] & #
Flatten[list, 1]
Here is how I would do it using the version of SelectEquivalents I posted in What is in your Mathematica tool bag?
l = {{{1, foo1}, {2, foo2}, {3, foo3}}, {{1, bar1}, {2, bar2}, {3, bar3}}};
SelectEquivalents[
l
,
MapLevel->2
,
TagElement->(#[[1]]&)
,
TransformElement->(#[[2]]&)
,
TransformResults->(Join[{#1},#2]&)
]
This method is quite generic. I used to use functions such as GatherBy before for treating huge lists I generate in Monte-Carlo simulations. Now with SelectEquivalents implementations for such operations are much more intuitive. Plus it is based on the combination Reap and Sow which is very fast in Mathematica.
Until the question is updated to be more clear and specific, I will assume what I want to, and suggest this:
UnsortedUnion ### #~Flatten~{2} &
See: UnsortedUnion
Maybe a bit overcomplicated, but:
lst = {{{1, foo1}, {2, foo2}, {3, foo3}}, {{1, bar1}, {2, bar2}, {3, bar3}}}
Map[
Flatten,
{Scan[Sow[#[[1]]] &,
Flatten[lst, 1]] // Reap // Last // Last // DeleteDuplicates,
Scan[Sow[#[[2]], #[[1]]] &,
Flatten[lst, 1]] // Reap // Last} // Transpose
]
(*
{{1, foo1, bar1}, {2, foo2, bar2}, {3, foo3, bar3}}
*)
Here's how this works:
Scan[Sow[#[[1]]] &,
Flatten[lst, 1]] // Reap // Last // Last // DeleteDuplicates
returns the unique first elements of each of your list items, in the order they were sown (since DeleteDuplicates never reorders elements). Then,
Scan[Sow[#[[2]], #[[1]]] &,
Flatten[lst, 1]] // Reap // Last
exploits the fact that Reap returns expressions sown with difference tags in different lists. So then put them together, and transpose.
This has the disadvantage that we scan twice.
EDIT:
This
Map[
Flatten,
{DeleteDuplicates##[[1]],
Rest[#]} &#Last#Reap[
Scan[(Sow[#[[1]]]; Sow[#[[2]], #[[1]]];) &,
Flatten[lst, 1]]] // Transpose
]
is (very) slightly faster, but is even less readable...
I have a list of pairs of values in mathematica, for example List= {{3,1},{5,4}}.
How do I change the first element (3 & 5) if the second element does not reach a threshold. For example, if the second parts are below 2 then i wish the first parts to go to zero. so that list then = {{0,1},{5,4}}. Some of these lists are extremely long so manually doing it is not an option, unfortunately.
Conceptually, the general way is to use Map. In your case, the code would be
In[13]:= lst = {{3, 1}, {5, 4}}
Out[13]= {{3, 1}, {5, 4}}
In[14]:= thr = 2
Out[14]= 2
In[15]:= Map[{If[#[[2]] < thr, 0, #[[1]]], #[[2]]} &, lst]
Out[15]= {{0, 1}, {5, 4}}
The # symbol here stands for the function argument. You can read more on pure functions here. Double square brackets stand for the Part extraction. You can make it a bit more concise by using Apply on level 1, which is abbreviated by ###:
In[27]:= {If[#2 < thr, 0, #], #2} & ### lst
Out[27]= {{0, 1}, {5, 4}}
Note however that the first method is several times faster for large numerical lists. An even faster, but somewhat more obscure method is this:
In[29]:= Transpose[{#[[All, 1]]*UnitStep[#[[All, 2]] - thr], #[[All, 2]]}] &[lst]
Out[29]= {{0, 1}, {5, 4}}
It is faster because it uses very optimized vectorized operations which apply to all sub-lists at once. Finally, if you want the ultimate performance, this procedural compiled to C version will be another factor of 2 faster:
fn = Compile[{{lst, _Integer, 2}, {threshold, _Real}},
Module[{copy = lst, i = 1},
For[i = 1, i <= Length[lst], i++,
If[copy[[i, 2]] < threshold, copy[[i, 1]] = 0]];
copy], CompilationTarget -> "C", RuntimeOptions -> "Speed"]
You use it as
In[32]:= fn[lst, 2]
Out[32]= {{0, 1}, {5, 4}}
For this last one, you need a C compiler installed on your machine.
Another alternative: Apply (###, Apply at level 1) and Boole (turns logical values in 1's and 0's):
lst = {{3, 1}, {5, 4}};
{#1 Boole[#2 >= 2], #2} & ### lst
An alternative approach might be to use substitution rules, and attach a condition (/;)
lst = {{3, 1}, {5, 4}};
lst /. {x_, y_ /; y < 2} -> {0, y}
output:
{{0, 1}, {5, 4}}
Assuming that your matrix is 2x2 and by second elemnt you mean the second row:
This should work:
If[A[[2, 1]] < 2 || A[[2, 2]] < 2, A[[2,1]] = 0 ]; A
You may have to change the variables, since your questions is kind of confusing. But that's the idea ;-)
I have two list operations which I would like to ask for help. The way I implemented them is not very elegant, so I want to learn from you experts.
1) Suppose I have two lists, one is like {{0,2,4},{1,3,2},{2,0,4}} and the other is {{1,3,7},{2,4,6},{3,1,9}}. I want to either based on the value, or based on some criterion to filter through the first list, and then get the corresponding elements in the second. For example, based on value which is non-zero, I want to get {{3,7},{2,4,6},{3,9}}. Based on the condition greater than 2, I want to get {{7},{4},{9}}.
2) I have a list such as {{{1,2},{1,1}},{{1,3},{2,4}},{{1,2},{2,3}},{{1,4},{3,3}}}. I want to form {{{1,2},{{1,1},{2,3}}},{{1,3},{{2,4}}},{{1,4},{{3,3}}}. That is, I want to group those second lists if the first element is the same. How can I do this in a beautiful way?
Many thanks.
For the first part, you want Pick:
In[27]:= Pick[{{1,3,7},{2,4,6},{3,1,9}},{{0,2,4},{1,3,2},{2,0,4}},_?Positive]
Out[27]= {{3,7},{2,4,6},{3,9}}
In[28]:= Pick[{{1,3,7},{2,4,6},{3,1,9}},{{0,2,4},{1,3,2},{2,0,4}},_?(#>2&)]
Out[28]= {{7},{4},{9}}
For the second question, GatherBy gets you most of the way there:
In[29]:= x = GatherBy[{{{1, 2}, {1, 1}}, {{1, 3}, {2, 4}}, {{1, 2},
{2, 3}}, {{1, 4}, {3, 3}}}, First]
Out[29]= {{{{1, 2}, {1, 1}}, {{1, 2}, {2, 3}}}, {{{1, 3},
{2, 4}}}, {{{1, 4}, {3, 3}}}}
And then you can apply a rule to clean things up a bit:
In[30]:= x /. l:{{a_, _}..} :> {a, Last /# l}
Out[30]= {{{1, 2}, {{1, 1}, {2, 3}}}, {{1, 3}, {{2, 4}}},
{{1, 4}, {{3, 3}}}}
As Michael said, Pick is definitely the way to go for the first one.
For the second part, I'd like to offer an alternative that lets you do this in one line: SelectEquivalents. (I know, rather self promoting, but I use this function a lot.) To get the result your looking for, simply enter
In[1] := SelectEquivalents[ <list>, First, Last, {#1,#2}& ]
Out[1]:= {{{1, 2}, {{1, 1}, {2, 3}}}, {{1, 3}, {{2, 4}}}, {{1, 4}, {{3, 3}}}}
Internally, SelectEquivalents uses Reap and Sow, so First tags each element in <list>, Last transforms the element into the form we wish to use, and {#1, #2}& returns a list with elements of the form {Tag, {<items with that tag>}}. The advantage is that you get to specify everything in one step getting you what you want without subsequent transformations.