Cannot understand lmer error about the random factor

I tried to solve the problem by reading other answers, but did not find a solution.
I am fitting an lmer model:
MODHET <- lmer(PERC ~ SITE + TREAT + HET + TREAT*HET + (1|PINE), data = PRESU)
PERC is the percentage of predation. SITE is a categorical variable that I am using as a blocking factor; it identifies the site where I performed the experiment. TREAT is a categorical variable with 2 levels. HET is a continuous variable. There are 56 observations divided among 7 sites.
Maybe the problem is how I expressed the random factor. In every site I selected 8 pines out of 15 to perform the experiment, and I included pine identity as a categorical random factor. For instance, in site 1 the pines are called a1, a3, a7, etc., while in site 2 they are called b1, b4, b12, etc.
The output of the model is
Error: number of levels of each grouping factor must be < number of observations
I don't understand where the mistake is. Could it be how I named the pines?
I tried also
MODHET <- lmer(PERC ~ SITE + TREAT + HET + TREAT*HET + (1|SITE:PINE), data = PRESU)
but the output is the same.
I hope I have explained my problem well. I have read similar questions on this forum, but I still do not get the solution.
Thank you for your help

Use the argument control = lmerControl(check.nobs.vs.nRE = "ignore") in your lmer call to suppress this error. However, I guess this does not solve the actual problem: with 8 pines in each of 7 sites you have 56 pine levels for 56 observations, so the grouping factor PINE contains no real "groups". Probably SITE is your random intercept?
If you consider the pines nested as "subjects" within sites, then I would suggest the following formula:
MODHET <- lmer(PERC ~ TREAT*HET + (1|SITE), data = PRESU)
or,
MODHET <- lmer(PERC ~ TREAT*HET + (1 | SITE / PINE), data = PRESU)
But my answer may be wrong, I'm not sure whether I have enough information to fully understand what you're aiming at.
edit:
Sorry, the nesting was not correctly specified; I have fixed it in the formula above. See also this answer.
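(For reference: in lme4 syntax, (1 | SITE/PINE) expands to (1 | SITE) + (1 | SITE:PINE), i.e. a random intercept per site plus one per pine-within-site. Since your pines are already labelled uniquely across sites (a1, b4, ...), (1 | SITE) + (1 | PINE) would specify the same nesting.)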

How to include an array of weights to adjust importance of observed data in sm.tsa.UnobservedComponents?

I have used the following five lines to build a Kalman filter with your work for a smoothed pricing model, and it worked great:
mod = sm.tsa.UnobservedComponents(obs, 'local level')
lm = sm.OLS(obs, xlm, missing='drop').fit()
obs_noise = abs(lm.resid).mean()
params = [obs_noise, obs_noise / obs_noise_level]
mod_filter, mod_smooth = mod.filter(params), mod.smooth(params)
However, I would now like to adjust the filtering smoothness at certain times. For example, when the unemployment rate or an interest rate makes a big surge, I would like the output (Kalman filtered/smoothed) value to be closer to the observed value, while at most other times I will keep what comes from the model. So I have created an array in which a few items are greater than 1 and the others are exactly 1.
e.g.: ir_coeff = np.array([1,1,1,1,1.345,1.23,1.78,1,1,1])
What could be the best approach to achieve this? Thank you a lot in advance.
I have tried to include it in the output with a dot-product operation, but that is not a reasonable way to do it.
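One way to prototype the idea independently of statsmodels is a hand-rolled local-level Kalman filter in which the observation-noise variance is divided by the per-period weight, so entries of ir_coeff greater than 1 pull the filtered value toward the observation. A minimal sketch; the function name, the data, and all variances below are illustrative assumptions, not statsmodels API:
import numpy as np

# Local-level Kalman filter with per-period observation weights.
# Dividing the observation-noise variance by weights[t] makes the filter
# trust the observation more wherever weights[t] > 1, pulling the
# filtered value closer to the observed one.
def weighted_local_level_filter(obs, obs_var, level_var, weights):
    a, p = obs[0], obs_var                 # initial state mean and variance
    filtered = np.zeros(len(obs))
    for t, y in enumerate(obs):
        p = p + level_var                  # predict: level variance grows
        h_t = obs_var / weights[t]         # down-weighted observation noise
        k = p / (p + h_t)                  # Kalman gain
        a = a + k * (y - a)                # update toward the observation
        p = (1.0 - k) * p
        filtered[t] = a
    return filtered

obs = np.array([1.0, 1.2, 0.9, 1.1, 2.5, 2.6, 2.8, 1.2, 1.0, 1.1])
ir_coeff = np.array([1, 1, 1, 1, 1.345, 1.23, 1.78, 1, 1, 1])
print(weighted_local_level_filter(obs, obs_var=0.5, level_var=0.1, weights=ir_coeff))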

Parameters for dlib::find_min_bobyqa

I'm working on a C++ version of Matt Zucker's Page dewarping. So far everything works fine, but I have a problem with the optimization. In line 748 of the GitHub repo, Matt uses the optimize function from SciPy. My C++ equivalent is find_min_bobyqa from dlib.net. The code is:
auto f = [&](const column_vector& ppts) { return objective(dstpoints, ppts, keypoint_index); };
dlib::find_min_bobyqa(
    f,
    params,
    2 * params.nr() + 1,                              // npt - number of interpolation points: x.size() + 2 <= npt && npt <= (x.size()+1)*(x.size()+2)/2
    dlib::uniform_matrix<double>(params.nr(), 1, -2), // lower bound constraint
    dlib::uniform_matrix<double>(params.nr(), 1, 2),  // upper bound constraint
    1,                                                // initial trust region radius
    1e-5,                                             // stopping trust region radius
    4000                                              // max number of objective function evaluations
);
In my concrete example params is a dlib::column_vector of doubles with length 189. Every element of params is less than 2.0 and greater than -2.0. The function objective() returns a double value and works properly on its own, because I get the same value as in the Python version. But after running find_min_bobyqa I usually get the message:
terminate called after throwing an instance of 'dlib::bobyqa_failure', return from BOBYQA because the objective function has been called max_f_evals times.
I set max_f_evals to quite a big value to see if it optimizes at all, but it doesn't. I did some tweaking with the parameters, but without good results. How should I set the parameters of find_min_bobyqa to get the right solution?
I am very interested in this issue as well. Zucker's work, with very minor tweaks, is ideal for straightening sheet music images, and I was looking for ways to implement it on a mobile platform when I came across your question.
My research so far suggests that BOBYQA is not the equivalent of Powell's method in scipy: BOBYQA is constrained, while the one in scipy is not.
See these links for more information, and a possible way to compile the right supporting library - I would try UOBYQA or NEWUOA.
https://github.com/jacobwilliams/PowellOpt
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html#rdd2e1855725e-3
(See the Notes section)
EDIT: see C version here:
https://github.com/emmt/Algorithms/tree/master/newuoa
I wanted to post this as a comment, but I don't have enough points for that.
I am very interested in your progress. If you're willing, please keep me posted.
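For comparison, the SciPy call the C++ code is trying to reproduce is an unconstrained, derivative-free Powell minimization. A minimal sketch with a toy stand-in objective (the real objective in the repo takes dstpoints and the 189-element parameter vector):
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the repo's objective; only the optimizer call matters here.
def objective(params):
    return np.sum((params - 0.5) ** 2)

x0 = np.zeros(189)                               # same length as the dlib column_vector
res = minimize(objective, x0, method='Powell')   # unconstrained, derivative-free
print(res.success, res.fun)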
I finally solved this problem. I used the PRAXIS library, because it doesn't need derivative information and is fast.
I modified the code a little to fit my needs, and now it is a few seconds faster than the original version written in Python.

SSRS 2008: Using StDevP from multiple fields / Combining multiple fields in general

I'd like to calculate the standard deviation over two fields from the same dataset.
example:
MyField1 = 10, 10
MyField2 = 20
What I want now is the standard deviation of (10, 10, 20); the expected result is 4.7.
In SSRS I'd like to have something like this:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
Unfortunately this isn't possible, since (Fields!MyField1.Value + Fields!MyField2.Value) returns a single value and not a list of values. Is there no way to combine two fields from the same dataset into some kind of temporary dataset?
The only solutions I have are:
To create a new dataset that contains all values from both fields. But this is very annoying, because I need about twenty of those, and I have six report parameters that need to filter every query. => It would probably get very slow and annoying to maintain.
Write the formula by hand. But I don't really know how yet; StDevP is not that trivial to me. (A sketch of this by-hand approach follows after the link below.) This is how I did it with Avg, which is mathematically simpler:
=(SUM(Fields!MyField1.Value)+SUM(Fields!MyField2.Value))/2
found here: http://social.msdn.microsoft.com/Forums/is/sqlreportingservices/thread/7ff43716-2529-4240-a84d-42ada929020e
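For the by-hand option, the population standard deviation can be built entirely from sums, sums of squares, and counts, which can be aggregated per field just like the Avg workaround above. A quick check in Python on the question's example numbers:
import math

# Population variance = E[x^2] - E[x]^2, so only SUM(x), SUM(x^2) and the
# count of the combined values are needed.
field1 = [10, 10]                                             # MyField1 values
field2 = [20]                                                 # MyField2 values
n = len(field1) + len(field2)
s1 = sum(field1) + sum(field2)                                # combined SUM(x)
s2 = sum(v * v for v in field1) + sum(v * v for v in field2)  # combined SUM(x^2)
print(math.sqrt(s2 / n - (s1 / n) ** 2))                      # 4.714..., the expected 4.7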
Btw., I know that it's odd to make such a calculation, but this is what my customer wants, and I have to deliver somehow.
Thanks for any help.
StDevP is standard deviation.
Such an expression works fine for me:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value)
but it is the deviation of the single value (Fields!MyField1.Value + Fields!MyField2.Value), which is always 0.
You can look here for the formula:
standard deviation (wiki)
I believe that you need to calculate this over some group (or the full dataset); to do this, you need to set the scope in your StDevP:
=StDevP(Fields!MyField1.Value + Fields!MyField2.Value, "MyDataSet1")

Naive Bayesian and zero-frequency issue

I think I've implemented most of it correctly. One part confused me:
The zero-frequency problem:
Add 1 to the count for every attribute value-class combination (Laplace estimator) when an attribute value doesn’t occur with every class value.
Here's some of my client code:
// Classify
string text = "Claim your free Macbook now!";
double posteriorProbSpam = classifier.Classify(text, "spam");
Console.WriteLine("-------------------------");
double posteriorProbHam = classifier.Classify(text, "ham");
Now say the word 'free' is present in the training data somewhere:
//Training
classifier.Train("ham", "Attention: Collect your Macbook from store.");
*Lot more here*
classifier.Train("spam", "Free macbook offer expiring.");
But the word is present in my training data for the category 'spam' only, not in 'ham'. So when I go to calculate posteriorProbHam, what do I do when I come across the word 'free'?
Still add one. The reason: Naive Bayes models P("free" | spam) and P("free" | ham) as completely independent, so you want to estimate the probability of each completely independently. The Laplace estimator you're using for P("free" | spam) is (count("free" | spam) + 1) / count(spam); the estimator for P("free" | ham) is the same.
If you think about what it would mean not to add one, it wouldn't really make sense: seeing "free" once in ham would make it less likely to see "free" in spam.
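A minimal sketch of the add-one estimate (Python for brevity; the counts are made up, and this variant also adds the vocabulary size to the denominator, the usual form of the Laplace estimator):
from collections import Counter

# Word counts per class from training; "free" was never seen in ham.
spam_counts = Counter({"free": 3, "macbook": 2, "offer": 1})
ham_counts = Counter({"macbook": 1, "collect": 1, "store": 1})
vocab = set(spam_counts) | set(ham_counts)

def p_word_given_class(word, counts):
    # Add 1 to the word's count and the vocabulary size to the total,
    # so unseen words get a small non-zero probability instead of 0.
    return (counts[word] + 1) / (sum(counts.values()) + len(vocab))

print(p_word_given_class("free", spam_counts))   # seen in spam
print(p_word_given_class("free", ham_counts))    # unseen in ham, still > 0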

Algorithm to generate unique (possibly auto incremented) ids

I need to generate unique ids for my application and I am looking for suitable algorithms. I would prefer something like this --
YYYY + MM + DD + HH + MM + SS + <random salt> + <something derived from the preceding values>
For example:
20100128184544ewbhk4h3b45fdg544
I was thinking about using SHA-256 or something, but the resulting string should not be too long. I could use UUIDs, but again they are too long, and they are guaranteed to be unique only on one machine.
I would welcome suggestions and ideas. My programming language is Java.
Edit: The IDs need not be cryptographically secure. I am looking at simpler hashing algorithms, like the one by Dan Bernstein.
You could use SHA-256 and then take only the first 10 bytes of the result (or however many you like, balancing length against uniqueness).
So I have finally settled on this:
d = YYYYMMDDHHMMSS
hash = d + sha256(d + random_salt)[:10]
Thank you all for the response.
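For illustration, a runnable version of that settled scheme (Python for brevity, even though the question is Java; the 8-byte salt size is an arbitrary assumption):
import hashlib
import os
from datetime import datetime

# timestamp + first 10 hex chars of sha256(timestamp + random salt)
d = datetime.now().strftime("%Y%m%d%H%M%S")
salt = os.urandom(8).hex()            # salt size is an assumption
uid = d + hashlib.sha256((d + salt).encode()).hexdigest()[:10]
print(uid)                            # e.g. 20100128184544 plus 10 hex chars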
Try this:
java.security.MessageDigest
I think if you use SHA1(MD5(YYYYMMDDHHMMSS + YourSystemName + ClientName)) you'll be fine with 40 chars. ;)
