NOTE: I am not a Mathematica programmer, but for a class I need to write expressions in it. I understand it is a functional language unlike C or Java.
I am trying to 'compare' (for lack of a better term) the digits of two irrational numbers. I then try to store whether or not each pair of digits is equal, as 1 or 0 respectively, in a list. However, the comparison list is never populated (the output is {}).
What is wrong with my logic in the For loop (aside from it being non-functionally programmed and inefficient)?
piDigits = RealDigits[N[Pi, 15000000]]
rootDigits = RealDigits[N[Sqrt[2],15000000]]
comparisonList = List[]
For[i = 1, i < Length[Part[piDigits, 0]], i++,
If[Part[piDigits, i] == Part[rootDigits, i] ,
Append[comparisonList, 1], Append[comparisonList, 0]]]
comparisonList
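For what it's worth, the likely culprits (hedged; I may be missing something else): Append returns a new list instead of modifying comparisonList in place, so AppendTo is needed; Part[piDigits, 0] extracts the head List rather than a length; and RealDigits returns {digitList, exponent}, so the digits themselves live in the first part. A minimal sketch under those assumptions (with far fewer digits, for speed):
piDigits = First@RealDigits[N[Pi, 1000]];        (* digit list only *)
rootDigits = First@RealDigits[N[Sqrt[2], 1000]];
comparisonList = {};
For[i = 1, i <= Length[piDigits], i++,
 If[piDigits[[i]] == rootDigits[[i]],
  AppendTo[comparisonList, 1],
  AppendTo[comparisonList, 0]]]
comparisonList
(* a functional alternative: Boole@Thread[piDigits == rootDigits] *)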
I am having some trouble developing a suitably fast binning algorithm in Mathematica. I have a large (~100k elements) data set of the form
T = {{x1, y1, z1}, {x2, y2, z2}, ...}
and I want to bin it into a 2D array of around 100x100 bins, with the bin value being given by the sum of the Z values that fall into each bin.
Currently I am iterating through each element of the table, using Select to pick out which bin it is supposed to be in based on lists of bin boundaries, and adding the z value to a list of values occupying that bin. At the end I map Total onto the list of bins, summing their contents (I do this because I sometimes want to do other things, like maximize).
I have tried using Gather and other such functions to do this, but the above method was ridiculously faster, though perhaps I am using Gather poorly. Anyway, it still takes a few minutes to do the sorting by my method, and I feel like Mathematica can do better. Does anyone have a nice, efficient algorithm handy?
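Roughly, the kind of per-element loop described above looks like the following (an illustrative sketch only, with hypothetical bin-edge lists xedges and yedges; not the actual code):
binSumsSlow[T_, xedges_, yedges_] :=
 Module[{bins = Table[{}, {Length[xedges] - 1}, {Length[yedges] - 1}], ix, iy},
  Do[
   ix = Count[Most[xedges], e_ /; e <= pt[[1]]]; (* x-bin index of this point *)
   iy = Count[Most[yedges], e_ /; e <= pt[[2]]]; (* y-bin index of this point *)
   AppendTo[bins[[ix, iy]], pt[[3]]],            (* collect its z value *)
   {pt, T}];
  Map[Total, bins, {2}]]                         (* sum each bin's contents *)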
Here is a method based on Szabolcs's post that is about an order of magnitude faster.
data = RandomReal[5, {500000, 3}];
(*500k values*)
zvalues = data[[All, 3]];
epsilon = 1*^-10;(*prevent 101 index*)
(*rescale and round (x,y) coordinates to index pairs in the 1..100 range*)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
SparseArray[
gb[[All, 1, 1]] ->
Total[gb[[All, All, 2]], {2}]]]; // AbsoluteTiming
Gives about {2.012217, Null}
AbsoluteTiming[
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 1}];
res3 = SparseArray[indexes -> zvalues];
System`SetSystemOptions[
"SparseArrayOptions" -> {"TreatRepeatedEntries" -> 0}];
]
Gives about {0.195228, Null}
res3 == res2
True
"TreatRepeatedEntries" -> 1 adds duplicate positions up.
I intend to do a rewrite of the code below because of Szabolcs' readability concerns. Until then, know that if your bins are regular, and you can use Round, Floor, or Ceiling (with a second argument) in place of Nearest, the code below will be much faster. On my system, it tests faster than the GatherBy solution also posted.
Assuming I understand your requirements, I propose:
data = RandomReal[100, {75, 3}];
bins = {0, 20, 40, 60, 80, 100};
Reap[
Sow[{#3, #2}, bins ~Nearest~ #] & @@@ data,
bins,
Reap[Sow[#, bins ~Nearest~ #2] & @@@ #2, bins, Tr@#2 &][[2]] &
][[2]] ~Flatten~ 1 ~Total~ {3} // MatrixForm
Refactored:
f[bins_] := Reap[Sow[{##2}, bins ~Nearest~ #] & @@@ #, bins, #2][[2]] &
bin2D[data_, X_, Y_] := f[X][data, f[Y][#2, #2~Total~2 &] &] ~Flatten~ 1 ~Total~ {3}
Use:
bin2D[data, xbins, ybins]
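As noted above, if the bins are regular, Round, Floor, or Ceiling with a second argument can replace Nearest. For instance, with bins of width 20 starting at 0, Floor with a step gives each point's lower bin edge directly:
Floor[{13.7, 25.2, 99.9}, 20]
(* {0, 20, 80} *)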
Here's my approach:
data = RandomReal[5, {500000, 3}]; (* 500k values *)
zvalues = data[[All, 3]];
epsilon = 1*^-10; (* prevent 101 index *)
(* rescale and round (x,y) coordinates to index pairs in the 1..100 range *)
indexes = 1 + Floor[(1 - epsilon) 100 Rescale[data[[All, {1, 2}]]]];
(* approach 1: create bin-matrix first, then fill up elements by adding zvalues *)
res1 = Module[
{result = ConstantArray[0, {100, 100}]},
Do[
AddTo[result[[##]], zvalues[[i]]] & @@ indexes[[i]],
{i, Length[indexes]}
];
result
]; // Timing
(* approach 2: gather zvalues by indexes, add them up, convert them to a matrix *)
res2 = Module[{gb = GatherBy[Transpose[{indexes, zvalues}], First]},
SparseArray[gb[[All, 1, 1]] -> (Total /@ gb[[All, All, 2]])]
]; // Timing
res1 == res2
These two approaches (res1 & res2) can handle 100k and 200k elements per second, respectively, on this machine. Is this sufficiently fast, or do you need to run this whole program in a loop?
Here's my approach, using the function SelectEquivalents defined in "What is in your Mathematica tool bag?", which is perfect for a problem like this one.
data = RandomReal[100, {75, 3}];
bins = Range[0, 100, 20];
binMiddles = (Most@bins + Rest@bins)/2;
nearest = Nearest[binMiddles];
SelectEquivalents[
data
,
TagElement -> ({First@nearest[#[[1]]], First@nearest[#[[2]]]} &)
,
TransformElement -> (#[[3]] &)
,
TransformResults -> (Total[#2] &)
,
TagPattern -> Flatten[Outer[List, binMiddles, binMiddles], 1]
,
FinalFunction -> (Partition[Flatten[# /. {} -> 0], Length[binMiddles]] &)
]
If you want to group according to more than two dimensions, you could use the following function in FinalFunction to give the resulting list the desired dimensions (I don't remember where I found it).
InverseFlatten[l_,dimensions_]:= Fold[Partition[#, #2] &, l, Most[Reverse[dimensions]]];
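For example, InverseFlatten rebuilds a 2 x 2 x 2 array from a flat list of eight elements:
InverseFlatten[Range[8], {2, 2, 2}]
(* {{{1, 2}, {3, 4}}, {{5, 6}, {7, 8}}} *)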
I want to test if a list contains consecutive integers.
consQ[a_] := Module[{ret = True},
  Do[If[i > 1 && a[[i]] != a[[i - 1]] + 1, ret = False; Break[]],
   {i, 1, Length[a]}];
  ret]
Although the function consQ does the job, I wonder if there is a better (shorter, faster) method of doing this, preferably in a functional programming style.
EDIT:
The function above maps lists with consecutive integers in decreasing sequence to False. I would like to change this to True.
Szabolcs' solution is probably what I'd do, but here's an alternative:
consQ[a : {___Integer}] := Most[a] + 1 === Rest[a]
consQ[_] := False
Note that these approaches differ in how they handle the empty list.
You could use
consQ[a_List ? (VectorQ[#, IntegerQ]&)] := Union@Differences[a] === {1}
consQ[_] = False
You may want to remove the test for integers if you know that every list you pass to it will only have integers.
EDIT: A little extra: if you use a very old version that doesn't have Differences, or wonder how to implement it,
differences[a_List] := Rest[a] - Most[a]
EDIT 2: The requested change:
consQ[a : {___Integer}] := MatchQ[Union@Differences[a], {1} | {-1} | {}]
consQ[_] = False
This works with both increasing and decreasing sequences, and gives True for a list of size 1 or 0 as well.
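For example, with this definition:
consQ[{3, 4, 5}]  (* True *)
consQ[{5, 4, 3}]  (* True: decreasing is now accepted *)
consQ[{7}]        (* True: single-element list *)
consQ[{1, 3, 5}]  (* False: the differences are {2, 2} *)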
More generally, you can test whether a list of numbers is equally spaced with something like equallySpacedQ[a_List] := Length@Union@Differences[a] == 1
I think the following is also fast, and comparing reversed lists does not take longer:
a = Range[10^7];
f[a_] := Range[Sequence @@ ##, Sign[-#[[1]] + #[[2]]]] &@{a[[1]], a[[-1]]} == a;
Timing[f[a]]
b = Reverse@a;
Timing[f[b]]
Edit
A short test for the fastest solutions so far:
a = Range[2 10^7];
Timing@consQSzab@a
Timing@consQBret@a
Timing@consQBeli@a
(*
{6.5,True}
{0.703,True}
{0.203,True}
*)
I like the solutions by the other two, but I'd be concerned about very long lists. Consider the data
d:dat[n_Integer?Positive]:= d = {1}~Join~Range[1, n]
which has its non-sequential point at the very beginning. Setting consQ1 for Brett's and consQ2 for Szabolcs, I get
AbsoluteTiming[#[dat[10000]]] & /@ {consQ1, consQ2}
{{0.000110, False}, {0.001091, False}}
Or, roughly a ten times difference between the two, which stays relatively consistent with multiple trials. This time can be cut in roughly half by short-circuiting the process using NestWhile:
Clear[consQ3]
consQ3[a : {__Integer}] :=
Module[{l = Length[a], i = 1},
NestWhile[# + 1 &, i,
(#2 <= l) && a[[#1]] + 1 == a[[#2]] &,
2] > l
]
which tests each pair and only continues if they return true. The timings
AbsoluteTiming[consQ3[dat[ 10000 ]]]
{0.000059, False}
with
{0.000076, False}
for consQ1. So, Brett's answer is fairly close, but occasionally it will take twice as long.
Edit: I moved the graphs of the timing data to a Community Wiki answer.
Fold can be used in a fairly concise expression that runs very quickly:
consQFold[a_] := (Fold[If[#2 == #1 + 1, #2, Return[False]] &, a[[1]]-1, a]; True)
Pattern-matching can be used to provide a very clear expression of intent at the cost of substantially slower performance:
consQMatch[{___, i_, j_, ___}] /; j - i != 1 := False
consQMatch[_] = True;
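For example, with these definitions:
consQMatch[{1, 2, 3, 4}]  (* True *)
consQMatch[{1, 2, 4, 5}]  (* False: the pair 2, 4 triggers the first rule *)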
Edit
consQFold, as written, works in Mathematica v8.0.4 but not in earlier versions of v8 or v7. To correct this problem, there are a couple of alternatives. The first is to explicitly name the Return point:
consQFold[a_] :=
(Fold[If[#2==#1+1, #2, Return[False,CompoundExpression]] &, a[[1]]-1, a]; True)
The second, as suggested by @Mr.Wizard, is to replace Return with Throw / Catch:
consQFold[a_] :=
Catch[Fold[If[#2 == #1 + 1, #2, Throw[False]]&, a[[1]]-1, a]; True]
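Either variant behaves the same way, for example:
consQFold[{1, 2, 3, 4}]  (* True *)
consQFold[{1, 2, 4, 5}]  (* False *)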
Since the timing seems to be rather important, I've moved the comparisons between the various methods to this Community Wiki answer.
The data used are simply lists of consecutive integers, with a single non-consecutive point, and they're generated via
d : dat[n_Integer?Positive] := (d = {1}~Join~Range[1, n])
d : dat[n_Integer?Positive, p_Integer?Positive] /; p <= n :=
Range[1, p]~Join~{p}~Join~Range[p + 1, n]
where the first form of dat[n] is equivalent to dat[n, 1]. The timing code is simple:
Clear[consQTiming]
Options[consQTiming] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250,500, 1000}};
consQTiming[fcns__, OptionPattern[]]:=
With[{rnd = RandomInteger[{1, #}, 100]},
With[{fcn = #},
Timing[ fcn[dat[10000, #]] & /@ rnd ][[1]]/100
] & /@ {fcns}
] & /@ OptionValue[NonConsecutivePoints]
It generates 100 random integers between 1 and each of {10, 25, 50, 100, 250, 500, 1000} and dat then uses each of those random numbers as the non-consecutive point in a list 10,000 elements long. Each consQ implementation is then applied to each list produced by dat, and the results are averaged. The plotting function is simply
Clear[PlotConsQTimings]
Options[PlotConsQTimings] = {
NonConsecutivePoints -> {10, 25, 50, 100, 250, 500, 1000}};
PlotConsQTimings[timings : { _?VectorQ ..}, OptionPattern[]] :=
ListLogLogPlot[
Thread[{OptionValue[NonConsecutivePoints], #}] & /@ Transpose[timings],
Frame -> True, Joined -> True, PlotMarkers -> Automatic
]
I timed the following functions consQSzabolcs1, consQSzabolcs2, consQBrett, consQRCollyer, consQBelisarius, consQWRFold, consQWRFold2, consQWRFold3, consQWRMatch, and MrWizard's version of consQBelisarius.
In ascending order of the leftmost timing: consQBelisarius, consQWizard, consQRCollyer, consQBrett, consQSzabolcs1, consQWRMatch, consQSzabolcs2, consQWRFold2, consQWRFold3, and consQWRFold.
Edit: Reran all of the functions with timeAvg (the second one) instead of Timing in consQTiming. I did still average over 100 runs, though. For the most part, there was no change, except for the lowest two, where there is some variation from run to run. So, take those two lines with a grain of salt, as their timings are practically identical.
I am now convinced that belisarius is trying to get my goat by writing intentionally convoluted code. :-p
I would write: f = Range[##, Sign[#2 - #]] & @@ #[[{1, -1}]] == # &
Also, I believe that WReach probably intended to write something like:
consQFold[a_] :=
Catch[
Fold[If[#2 === # + 1, #2, Throw@False] &, a[[1]] - 1, a];
True
]
This question is in a way a continuation of the question I asked here: Simple way to delete a matrix column in Mathematica, to which @belisarius and @Daniel provided very helpful answers.
What I am generally trying to do is to extract from a matrix A specific rows and columns, OR what remains after those specified are removed. This can be formally written as: find TakeOperator and DropOperator such that:
TakeOperator[A, {i1,...,ip}, {j1,...,jq}] = (A[[ik]][[jl]]) (1 <= k <= p, 1 <= l <= q) = Table[A[[ik]][[jl]], {k, p}, {l, q}]
We write Ic = {i'1,...,i'p'} = Complement[{1,...,Length[A]}, {i1,...,ip}]; Jc = {j'1,...,j'q'} = Complement[{1,...,Length[A]}, {j1,...,jq}];
DropOperator[A, {i1,...,ip}, {j1,...,jq}] = (A[[i'k']][[j'l']]) (1 <= k' <= p', 1 <= l' <= q') = Table[A[[i'k']][[j'l']], {k', p'}, {l', q'}]
While Table as described above does the trick, it is highly inefficient to use Table in that manner.
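For reference, a sketch of the Table formulation described above (illustrative only; the helper names are mine, not from the original question):
takeViaTable[A_, rows_, cols_] :=
 Table[A[[rows[[k]], cols[[l]]]], {k, Length[rows]}, {l, Length[cols]}]
dropViaTable[A_, rows_, cols_] :=
 takeViaTable[A,
  Complement[Range[Length[A]], rows],
  Complement[Range[Length[First[A]]], cols]]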
Just to give an idea, I took @belisarius' example:
In: First@Timing[a = RandomInteger[1000, {5000, 5000}];]
Out:0.218
In:Clear[b,c]
In: First@Timing[
b = Table[
If[i < 100, If[j < 100, a[[i]][[j]], a[[i]][[j + 1]]],
If[j < 100, a[[i + 1]][[j]], a[[i + 1]][[j + 1]]]], {i,
4999}, {j, 4999}]]
Out:140.807
In: First@Timing[c = Drop[a, {100}, {100}]]
Out:0.093
In:c===b
Out:True
Note: With respect to the use of Drop in the earlier post, I thought about using it as well, but when I checked the documentation, there was no suggestion of getting it done the way @belisarius and @Daniel suggested. If the documentation could be updated in that direction in future releases, it would be helpful.
Part directly supports lists of indices when slicing arrays. The following definitions exploit that:
takeOperator[a_?MatrixQ, rows_List, cols_List] :=
a[[rows, cols]]
dropOperator[a_?MatrixQ, rows_List, cols_List] :=
a[[##]] & @@ complementaryIndices[a, rows, cols]
complementaryIndices[a_?MatrixQ, rows_List, cols_List] :=
Complement @@@ Transpose @ {Range /@ Dimensions @ a, {rows, cols}}
Example use:
a = RandomInteger[1000, {5000, 5000}];
First @ Timing @ takeOperator[a, Range[1, 5000, 2], Range[1, 5000, 2]]
(* 0.016 *)
First @ Timing @ dropOperator[a, Range[1, 5000, 2], Range[1, 5000, 2]]
(* 0.015 *)
You can also use explicit ranges in a way that is fairly efficient. They may provide some more flexibility. Here is your example.
a = RandomInteger[1000, {5000, 5000}];
Timing[b = Drop[a, {101}, {101}];]
Out[66]= {0.041993, Null}
Timing[
c = a[[Join[Range[100], Range[102, 5000]],
Join[Range[100], Range[102, 5000]]]];]
Out[67]= {0.061991, Null}
c == b
Out[62]= True
I would also suggest use of Span except offhand I do not see how to get it to work in this setting.
Daniel Lichtblau
Wolfram Research
Hi
I have a list of numbers, for example k_1, k_2, ..., k_n, and f is a function.
Now I apply f to the list of numbers, and I need those numbers for which f is increasing, i.e.
f(k_i) > f(k_j) for any i > j.
I can get the resulting numbers k_i, each printed on a separate line, but I need the results together in one list, separated by commas or something similar, along with a count of the number of results.
For example:
k = {k1, k2, k3, k4, k5, k6, k7, k8, k9, k10};
count = 0;
i=1;
For[j = i, j <= 10, j++,
If[f[k[[j]]] - f[k[[i]]] > 0, i = j; Print["k", i];
count = count + 1]];
Print["count= ", count]
I got the result like:
k2
k3
k5
k9
count=4
but I need the results to be together:
{k2,k3,k5,k9}
count=4
any idea?
thanks
Instead of Print, you could use AppendTo, i.e.
list={};AppendTo[list,5]
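Applied to the loop in the question, that might look like the following (a sketch assuming the same f and k as above; it collects the elements instead of printing them):
results = {};
i = 1;
For[j = 1, j <= 10, j++,
 If[f[k[[j]]] - f[k[[i]]] > 0, i = j; AppendTo[results, k[[i]]]]];
results           (* e.g. {k2, k3, k5, k9} *)
Length[results]   (* e.g. 4 *)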
It might be good to start learning the functional programming approach, as Mathematica has the tools to make it efficient. Your code above might look something like this:
pairs = Partition[list, 2, 1];
increasingPairs = Select[pairs, f[First[#]] < f[Last[#]] &];
Last /@ increasingPairs
You seem to want the longest increasing subsequence. The simplest and most efficient way I am aware of to get it in Mathematica is the following:
lis[f_, vals_List] := LongestCommonSequence[#, Sort[#]] &[Map[f, vals]];
Example:
In[8]:= lis[# &, {5, 3, 6, 1, 5, 7}]
Out[8]= {5, 6, 7}
In principle, the answer is not unique - there may be several different longest increasing subsequences with the same length.