I'm very new to Mathematica, and I'm getting pretty frustrated with the errors I'm generating when it comes to creating a function. Below, I have a function I'm writing for 'centering' a matrix where rows correspond to examples, columns to features. The aim is to subtract from each element the mean of the column to which it belongs.
centerdata[datamat_] := (
numdatapoints =
Dimensions[datamat][[1]](*Get number of datapoints*)
numberfeatures =
Dimensions[datamat[[1]]][[1]](*Get number of datapoints*)
columnmean = ((Total[datamat])/numdatapoints)
For[i = 1, i < numdatapoints + 1, i++, (* For each row*)
For[j = 1, j < numfeatures + 1, j++, (* For each element*)
datum = datamat[[i]][[j]];
newval = (datum - (colmean[[j]]));
ReplacePart[datamat, {i, j} -> newval];
];
];
Return[datamat];
)
Running this function for a matrix, I get the following error:
"Set::write: Tag Times in 4 {5.84333,3.054,3.75867,1.19867} is Protected. >>
Set::write: "Tag Times in 4\ 150 is Protected."
Where {5.84333,3.054,3.75867,1.19867} is the first example in the data matrix and 150 is the number of examples in the matrix (I'm using the famous iris dataset, for anyone interested). These errors correspond to this code:
numdatapoints = Dimensions[datamat][[1]](*Get number of datapoints*)
numberfeatures = Dimensions[datamat[[1]]][[1]](*Get number of datapoints*)
Googling and toying with this error hasn't helped much as the replies in general relate to multiplication, which clearly isn't being done here.
Given a table (tab) of data the function Mean[tab] will return a list of the means of each column. Next, you want to subtract this (element-wise) from each row in the table, try this:
Map[Plus[-Mean[tab],#]&,tab]
I have a feeling that there is probably either an intrinsic statistical function to do this in one statement or that I am blind to a much simpler solution.
Since you are a beginner I suggest that you immediately read the documentation for:
Map, which is one of the fundamental operators in functional programming languages such as Mathematica pretends to be; and
pure functions whose use involves the cryptic symbols # and &.
If you are writing loops in Mathematica programs you are almost certainly mis-using the system.
Related
To make sure that this is not a duplicate, I have already checked this and this out.
I want to generate random numbers in a specific range including step size (not continuous distribution).
For example, I want to generate random numbers between -2 and 3 in which the step between two consecutive numbers is 0.02. (e.g. [-2 -1.98 -1.96 ... 2.69 2.98 3] so a generated number should be 2.96 not 2.95).
I have tried this:
a=-2*100;
b=3*100;
r = (b-a).*rand(5,1) + a;
for i=1:length(r)
if r(i) >= 0
if mod(fix(r(i)),2)
r(i)=ceil(r(i))/100;
else
r(i)=floor(r(i))/100;
end
else
if mod(fix(r(i)),2)
r(i)=floor(r(i))/100;
else
r(i)=ceil(r(i))/100;
end
end
end
and it works.
there is an alternative way to do this in MATLAB which is :
y = datasample(-2:0.02:3,5,'Replace',false)
I want to know:
How can I make my own implementation faster (improve the
performance)?
If the second method is faster (it looks faster to me), how can I
use similar implementation in C++?
Those previous answers do cover your case if you read carefully. For example, this one produces random numbers between limits with a step size of one. But let's generalize this to an arbitrary step size in case you can't figure out how to get there. There are several different ways. Here's one using randi where we use the default step size of one and the range from one to the number possible values as indices:
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randi(length(v),[5 1]))
If you look inside datasample (type edit datasample in your command window to view the code) you'll see that it's doing something very similar to this answer. In the case of the 'Replace' option being true see around line 135 (in R2013a at least).
If the 'Replace' option is false, as in your use of datasample above, then randperm actually needs to be used instead (see around line 159):
lo = 2;
hi = 3;
step = 0.02;
v = lo:step:hi;
r = v(randperm(length(v),51))
Because there is no replacement in this case, 51 is the maximum number of values that can be requested in a call and all values of r will be unique.
In C++ you should not use rand() if you're doing scientific computing and generating large numbers of random variates. Instead you should use a large period random number generator such as Mersenne Twister (the default in Matlab). C++11 includes a version of this generator as part of . More here in rand(). If you want something fast, you should try the Double precision SIMD-oriented Fast Mersenne Twister. You'll have to ask another question if you want to implement your code in C++.
The distribution you want is a simple transform of integers, so how about:
step = 0.02
r = randi([-2 3] / step, [5, 1]) * step;
In C++, rand() generates integers too, so it should be pretty obvious how to take a similar approach there.
Let
2|n, 3|n,..., p_i|n, p_ j|n,..., p_k|n
p_i < p_ j< ... < p_k
where all primes up to p_i divide n and
j > i+1
I want to write a code in Mathematica to find p_i and determine {2,3,5,...,p_i}.
thanks.
B = {};
n = 2^6 * 3^8 * 5^3 * 7^2 * 11 * 23 * 29;
For[i = 1, i <= k, i++,
If[Mod[n, Prime[i]] == 0, AppendTo[B, Prime[i]]
If[Mod[n, Prime[i + 1]] > 0, Break[]]]];
mep1= Max[B];
B
mep1
result is
{2,3,5,7,11}
11
I would like to write the code instead of B to get B[n], since I need to draw the graph of mep1[n] for given n.
If I understand your question and code correctly you want a list of prime factors of the integer n but only the initial part of that list which matches the initial part of the list of all prime numbers.
I'll first observe that what you've posted looks much more like C or one of its relatives than like Mathematica. In fact you don't seem to have used any of the power of Mathematica's in-built functions at all. If you want to really use Mathematica you need to start familiarising yourself with these functions; if that doesn't appeal stick to C and its ilk, it's a fairly useful programming language.
The first step I'd take is to get the prime factors of n like this:
listOfFactors = Transpose[FactorInteger[n]][[1]]
Look at the documentation for the details of what FactorInteger returns; here I'm using transposition and part to get only the list of prime factors and to drop their coefficients. You may not notice the use of the Part function, the doubled square brackets are the usual notation. Note also that I don't have Mathematica on this machine so my syntax may be a bit awry.
Next, you want only those elements of listOfFactors which match the corresponding elements in the list of all prime numbers. Do this in two steps. First, get the integers from 1 to k at which the two lists match:
matches = TakeWhile[Range[Length[listOfFactors]],(listOfFactors[[#]]==Prime[#])&]
and then
listOfFactors[[matches]]
I'll leave it to you to:
assemble these fragments into the function you want;
correct the syntactical errors I have made; and
figured out exactly what is going on in each (sub-)expression.
I make no warranty that this approach is the best approach in any general sense, but it makes much better use of Mathematica's intrinsic functionality than your own first try and will, I hope, point you towards better use of the system in future.
I was wondering if it is possible to use NMinimize from mathematica with an objective function, which contains random variables? E.g. I have a function with parameters which follow a distribution (normal and truncated normal). I want to fit its histogram to data that I have and constructed an objective function which now I need to minimize (so, the objective function depends on the mus and sigmas of the parameters and need to be determined). If I run my code, there is an error message: It claims the parameter for the NormalDistribution needs to be positive (If I plug in numbers for the mus and sigmas of my objective functionby hand, i don't get an error message).
So, I am wondering if it is not possible for NMinimize to handle a non-analytic function.
Thanks!
Here, I give you an example code (please note that the original function is more complicated)
listS and listT are both lists of event times. I want to fit the curve of my statistical model for the times (here, a very simple one, it consists of a truncated normal distribution) to the data I have.
For this I compare the survival curves and need to minimize the sum of the least squares.
My problem is that the function NMinimize doesn't seem to work. (Please note, that the original objective function consists of a more complicated function with parameters that are random variables)
(* Both lists are supposed to be the list of times *)
SurvivalS[listeS_, x_] := Module[{res, survivald},
survivald = SurvivalDistribution[listeS];
res = SurvivalFunction[survivald, x];
res]
Residuum[listeT_, listeS_] :=
Table[(SurvivalS[listeT, listeT[[i]]] - SurvivalS[listeS, listeT[[i]]]), {i,
1, dataN}];
LeastSquare[listeT_, listeS_] :=
Total[Function[x, x^2] /#
Residuum[listeT,
listeS]];(* objective function, here ist is the sum of least square *)
objectiveF[mu_, sigma_] :=
Piecewise[{{LeastSquare[listeT, listeS[mu, sigma]], mu > 0 && sigma > 0}},
20 (1 + (sigma + mu)^2)];
pool = 100; (* No. points from MonteCarlo *)
listeS[mu_, sigma_] := RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[mu, sigma]],pool];(* simulated data *)
listeT = Sort[RandomVariate[TruncatedDistribution[{0, 1}, NormalDistribution[.5, .9]],60]]; (* list of "measured" data *)
dataN = Length[listeT];
NMinimize[objectiveF[mu, .9], {{mu, .4}}]
The error message is: "RandomVariate::realprm: Parameter mu at position 1 in NormalDistribution[mu,0.9] is expected to be real. >>"
My question sounds more general, but I have a specific example. I have a list of data in form:
plotDataAll={{DateList1, integerValue1}, {DateList2, integerValue2}...}
The dates are sorted chronologically, that is plotDataAll[[2,1]] is a more recent time then plotDataAll[[1,1]].
I want to create plots of specific periods, 24h ago, 1 week ago, etc. For that I need just a portion of data. Here's how I got what I wanted:
mostRecentDate=Max[Map[AbsoluteTime, plotDataAll[[All,1]]]];
plotDataLast24h=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-86400.)&];
plotDataLastWeek=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-604800.)&];
plotDataLastMonth=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-2.592*^6)&];
plotDataLast6M=Select[plotDataAll,AbsoluteTime[#[[1]]]>(mostRecentDate-1.5552*^7)&];
Then I used DateListPlot to plot the data. This becomes slow if you need to do this for many sets of data.
What comes to my mind, if I could find the index of first element in list that satisfies the date condition, because it's chronologically sorted, the rest of them should satisfy the condition as well. So I would have:
plotDataLast24h=plotDataAll[[beginningIndexThatSatisfiesLast24h;;Length[plotDataAll]]
But how do I get the index of the first element that satisfies the condition?
If you have a faster way to do this, please share your answer. Also, if you have a simple, faster, but sub-optimal solution, that's fine too.
EDIT:
Time data is not in regular intervals.
If your data is at regular intervals you should be able to know how many elements constitute a day, week, etc. and use Part.
plotDataAll2[[knownIndex;;-1]]
or more specifically if the data was hourly:
plotDataAll2[[-25;;-1]]
would give you the last 24 hours. If the spacing is irregular then use Select or Pick. Date and time functions in Mma are horrendously slow unfortunately. If you are going to do a lot of date and time calculation better to do a conversion to AbsoluteTime just once and then work with that. You will also notice that your DateListPlots render much faster if you use AbsoluteTime.
plotDataAll2=plotDataAll;
plotDataAll2[[All,1]]=AbsoluteTime/#plotDataAll2[[All,1]];
mostRecentDate=plotDataAll2[[-1,1]]
On my computer Pick is about 3 times faster but there may be other improvements you can make to the code below:
selectInterval[data_, interval_] := (tmp = data[[-1, 1]] - interval;
Select[data, #[[1]] > tmp &])
pickInterval[data_, interval_] := (tmp = data[[-1, 1]] - interval;
Pick[data, Sign[data[[All, 1]] - tmp], 1])
So to find data within the last week:
Timing[selectInterval[plotDataAll2, 604800]]
Timing[pickInterval[plotDataAll2, 604800]]
The thing that you want to avoid is checking all the values in the data table. Since the data is sequential you can just start checking from the back and stop when you have found the correct index.
Schematically:
tab = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
i = j = Length#tab;
While[tab[[i]] > 5, --i];
tab[[i ;; j]]
-> {5, 6, 7, 8, 9}
sustitute > 5 for whatever you want to check for. I didn't have time to test this right now but in your case, e.g.,
maxDate=AbsoluteTime#plotDataAll[[-1,1]]; (* no need to find Max if data is sequential*)
i24h = iWeek = iMonth = iMax = Length#plotDataAll;
While[AbsoluteTime#plotDataAll[[i24h,1]] > maxDate-86400.,--i24h];
While[AbsoluteTime#plotDataAll[[iWeek,1]] > maxDate-604800.,--iWeek];
While[AbsoluteTime#plotDataAll[[iMonth,1]] > maxDate-2.592*^6.,--iMonth];
While[AbsoluteTime#plotDataAll[[i6Month,1]] > maxDate-1.5552*^7.,--i6Month];
Then, e.g.,
DateListPlot#plotDataAll[[i24h;;iMax]]
If you want to start somewhere in the middle of plotDataAll just use a While to first find the starting point and set iMax and maxDate apropriately.
For large data sets this may be one of the few instances where a loop construct is better than MMA's inbuilt functions. That, however, may be my own ignorance and if anyone here knows of a MMA inbuilt function that does this sort of "stop when match found" comparison better than While.
EDIT: Timing comparisons
I played around a bit with Mike's and my solution and compared it to the OP's method. Here is the toy code I used for each solution
tab = Range#1000000;
(* My solution *)
i = j = tab[[-1]];
While[tab[[i]] > j - 24, --i];
tab[[i ;; j]]
(* Mike's solution *)
tmp = tab[[-1]] - 24;
Pick[tab, Sign[tab[[All]] - tmp], 1]
(* Enedene's solution *)
j = tab[[-1]];
Select[tab, # > (j - 24) &]
Here are the results (OS X, MMA 8.0.4, Core2Duo 2.0GHz)
As you can see, Mike's solution has a definite advantage over enedene's solution but, as I surmised originally, the downside of using inbuilt functions like Pick is that they still perform a comparative check on all the element in a list which is highly superfluous in this instance. My solution has constant time due to the fact that no unneccessary checks are made.
I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True,
PlotRange->All]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
FindMaximum[convolve[n],{n,5,7}]
but the result is useless and inaccurate:
FindMaximum::fmmp:
Machine precision is insufficient to achieve the requested accuracy or
precision.
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
List(1.984759605826571e7,List(Rule(n,6.2831853071787975)))
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).
Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
Plus ## (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d=0 may be the best estimate you can get form the available data?
(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
{{a,Mean[data]},{b,(Max[data]-Min[data])/2},{c,2*f2*Pi/Length[data]},d}]
(* Result (using //InputForm):
FindMinimum::sszero:
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}}
*)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.