metafor: warning about non-positive sampling variances

I am trying to learn meta-regression using the metafor package. In running
one of the mixed regression models, I received an error indicating
"There are outcomes with non-positive sampling variances."
I am at a loss as to how to proceed with this error. I understand that certain
model statistics (e.g., I^2 and QE) cannot be computed due to the
presence of non-positive sampling variances. However, I am not sure whether
the results can be interpreted in the same way as they would be otherwise. I
also tried using other estimators and/or the unweighted option; the error
still persists.
Any suggestions would be much appreciated.

First of all, to clarify: You are getting a warning, not an error.
Aside from that, I can't think of many situations where it is reasonable to assume that the sampling variance is really equal to 0 in a particular study. I would first question whether this really makes sense. This is why the rma() function is generating this warning message -- to make the user aware of this situation and question whether this really is intended/reasonable.
But suppose that we really want to go through with this, then you have to use an estimator for tau^2 that can handle this (e.g., method="REML" -- which is actually the default). If the estimate of tau^2 ends up equal to 0 as well, then the model cannot be fitted at all (due to division by zero -- and then you get an error). If you do end up with a positive estimate of tau^2, then the results should be okay (but things like the Q-test, I^2, or H^2 cannot be computed then).
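As a toy illustration of the weighting involved (a sketch in Python with made-up numbers, not metafor's internals): random-effects weights have the form 1/(vi + tau^2), so a sampling variance of 0 only breaks things when the tau^2 estimate is 0 as well.

# Toy numbers, purely illustrative -- not metafor's internals.
vi = [0.10, 0.25, 0.0]  # sampling variances; the last study reports 0

for tau2 in (0.05, 0.0):  # positive vs. zero heterogeneity estimate
    try:
        weights = [1.0 / (v + tau2) for v in vi]
        print("tau2 =", tau2, "-> weights:", weights)
    except ZeroDivisionError:
        print("tau2 =", tau2, "-> cannot form the weights (division by zero)")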

Related

Prover9 "Some, but not all, of the requested proofs were found"

I'm running some lattice proofs through Prover9/Mace4. Prover9 is saying Exit: Time limit, plus the message in the title.
I've doubled the time limit from 60 to 120 seconds. Same message (in twice the time). The weird thing is:
- there's only one statement to prove. That is, only one label(goal) in the report (so what's with the "but not all"?)
- it does seem to have completed the proof, in that it shows a last line of $F.
- Mace4 can't find any counter-examples (I upped its time limit to 120 seconds).
- I've found some Google hits for that message, but they all seem to be in Chinese(?)
It's possible the axioms I've given are (mutually) recursive -- I'm trying to introduce a function and a nominated 'absorbing element' [**]; and that solving will need infinitary unification. Does Prover9 do that?
I'm happy to add the axioms and goal to this message. (I'm using a non-standard way to define the meet and join.) But first, are there any sanity checks I should go through?
[**] the absorbing element is neither lattice top nor lattice bottom; more like lattice left-corner. (The element will be lattice bottom just in case the lattice degenerates to two elements.) The function is a partial ordering 'at right angles' to top/bottom. The lattice I expect to be neither complemented nor distributive (again except when 2 elements).
I've reproduced this after much trying, but only by setting some strange option that I'm sure I wouldn't have touched. (The only option I usually change is the Time limit, and I Reset to defaults quite often, so that would have blatted any evidence.)
Here's my guess for what happened.
what's with the but not all?
You can enter multiple goals (providing they're all positive). [**]
With strange option settings, if Prover9 can prove the first but not the second, it'll keep trying until exhausted; but then only report the successful one -- with a $F. result OK.
If you double the Time limit, it'll still prove the first and still keep on trying for the second -- taking twice the time for the same outcome.
Mace4 will come across the first goal, and use up its time trying for a counter-example. There isn't one because it's provable. Again, doubling its Time limit will get the same outcome after twice as long.
[Note **] It's never that I intend to set multiple goals; but when I'm hacking/experimenting with axioms, I keep all the goals in the Goals: box so I can easily toggle un/comment. I guess I didn't comment-out one when I was uncommenting another.
The usual behaviour, as described in the manual, is that Prover9 reports success at the first goal it proves and doesn't go on to other goals. If there are multiple provable goals, it seems to choose the easiest/quickest(?), irrespective of position in the file.
But with max_proofs set to more than the default 1, Prover9 will keep trying. (There's also an auto_denials flag that has something to do with it which I don't understand.)
I've no idea how I set max_proofs -- I didn't recognise the Options/Limits sub-screen when I eventually found it. Weird.

Which statistic to use when testing for multicollinearity in python?

I've been reading a lot about multicollinearity but am still unsure whether to use the Durbin-Watson score, the eigenvalues or the variance inflation factor. I only have three independent variables and the eigenvalues are:
1.81768828 0.95241948 0.22989225
As I understood it, only values close to zero indicate multicollinearity. I wasn't sure if the last one (0.22) counts as "close to zero", but when checking its eigenvector, the result is:
-0.53977799 -0.44013805 0.71757802
and each of them would indicate collinearity, as they are NOT close to zero (this time it's the other way around). Am I correct so far?
The Durbin-Watson score is 1.93 (calculated through the summary() function from statsmodels with an added intercept). This does NOT show strong multicollinearity, right?
As nobody gives clear "cutoff" values, I am a bit confused as to which values count as "close to zero" and which do not.
Should I calculate the VIF as well, just to be extra sure?
Any help is much appreciated!
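If you do want to compute the VIF as well, here is a minimal sketch using statsmodels; the data below is randomly generated purely as a stand-in for your three independent variables.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # stand-in for the three independent variables
exog = sm.add_constant(X)       # include an intercept, as in the summary() call

# VIF for each predictor (skip index 0, the constant column)
vifs = [variance_inflation_factor(exog, i) for i in range(1, exog.shape[1])]
print(vifs)  # a common rule of thumb flags values above roughly 5-10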

How do I predict how a formula will behave with integers?

I am making some software that needs to work with integers.
I also need to apply some formula to those integers repeatedly over time (for example, doing x /= z several times in a row, an indefinite number of times).
All the tools, algorithms, and formulas I could think of or find either don't work with integers at all, or work as approximations at best.
Take the repeated x /= z as an example: you can theoretically calculate what x will be after the 10th time by doing x = x/(z^10), but that will be wrong if the result is fractional; you can use floor(x/(z^10)), but the result will STILL be wrong.
The plotting software I found also doesn't handle integers at all, or supports floor()/ceil() functions at best, and the result still runs into the problem from the previous paragraph.
So how do I do it?
Here's something to get you going for the iteration of x /= z, with every division below meaning integer division. Write x = a*z^3 + b*z^2 + c*z + d with 0 <= b, c, d < z. Then x/z = a*z^2 + b*z + c, next (x/z)/z = a*z + b, and finally ((x/z)/z)/z = a. Dividing directly instead gives x/(z^3) = a + b/z + c/z^2 + d/z^3 = a, since b*z^2 + c*z + d < z^3 and so all three terms are 0 with regard to integer division. In other words, for non-negative x and positive z, applying x /= z n times gives the same result as a single x/(z^n).
Now if x or z are negative, you can try and see whether this still holds; I did not invest the time to make the necessary case distinctions, but they should be fairly analogous.
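A quick way to convince yourself of the non-negative case (not a proof, just a brute-force check) is a few lines of Python, where // is integer (floor) division:

def iterate_div(x, z, n):
    # apply x //= z a total of n times
    for _ in range(n):
        x //= z
    return x

# spot-check the identity iterate_div(x, z, n) == x // z**n on a small range
for x in range(0, 1000):
    for z in (2, 3, 7):
        for n in (1, 2, 5, 10):
            assert iterate_div(x, z, n) == x // z**n
print("identity holds on the tested range")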
As Karoly Horvath mentions in a comment, without a clear specification of the kinds of functions for which you would like to find a shortcut to replace iterative evaluation, helping you out won't be possible since there are uncountably many functions over the integers, and the same approach won't work for all of them.

Is it worth it to rewrite an if statement to avoid branching?

Recently I realized I have been doing too much branching without considering the negative impact it has on performance, so I have made up my mind to learn all about avoiding branches. Here is a more extreme case, in an attempt to make the code have as few branches as possible.
Hence for the code
if(expression)
A = C; //A and C have to be the same type here obviously
expression can be A == B, or Q <= B; it could be anything that resolves to true or false, or I would like to think of it in terms of the result being 1 or 0 here.
I have come up with this non-branching version:
A += (expression)*(C-A); //Edited with thanks
So my question would be: is this a good solution that maximizes efficiency?
If yes why and if not why?
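To see what the rewrite actually computes, independent of any performance question, here is a quick sketch; Python is used only because its booleans act as 0 and 1, and all names are made up for illustration.

def branched(a, b, c):
    # original form: conditionally assign
    if a == b:
        a = c
    return a

def branchless(a, b, c):
    # rewritten form: the boolean (a == b) acts as 0 or 1
    return a + (a == b) * (c - a)

# spot-check that both forms agree on some integer inputs
for a in range(-5, 6):
    for b in range(-5, 6):
        for c in range(-5, 6):
            assert branched(a, b, c) == branchless(a, b, c)
print("both forms agree on the tested range")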
Depends on the compiler, instruction set, optimizer, etc. When you use a boolean expression as an int value, e.g., (A == B) * C, the compiler has to do the compare and then set some register to 0 or 1 based on the result. Some instruction sets might not have any way to do that other than branching. Generally speaking, it's better to write simple, straightforward code and let the optimizer figure it out, or find a different algorithm that branches less.
Jeez, no, don't do that!
Anyone who "penalize[s] [you] a lot for branching" would hopefully send you packing for using something that awful.
How is it awful, let me count the ways:
There's no guarantee you can multiply a quantity (e.g., C) by a boolean value (e.g., (A==B) yields true or false). Some languages will, some won't.
Anyone casually reading it is going to observe a calculation, not an assignment statement.
You're replacing a comparison and a conditional branch with two comparisons, two multiplications, a subtraction, and an addition. Seriously non-optimal.
It only works for integral numeric quantities. Try this with a wide variety of floating point numbers, or with an object, and if you're really lucky it will be rejected by the compiler/interpreter/whatever.
You should only ever consider doing this if you have analyzed the runtime behaviour of the program and determined that there is a frequent branch misprediction here, and that it is causing an actual performance problem. It makes the code much less clear, and it's not obvious that it would be any faster in general (this is something you would also have to measure, under the circumstances you are interested in).
After doing research, I came to the conclusion that when there is a bottleneck, it is worth using a timing profiler, as this kind of code is usually not portable and is mainly used for optimization.
An exact example I tried after reading the question below:
Why is it faster to process a sorted array than an unsorted array?
I tested my code in C++ using that setup and found that my implementation was actually slower due to the extra arithmetic.
HOWEVER!
For this case below
if(expression) //branched version
A += C;
//OR
A += (expression)*(C); //non-branching version
The timings were as follows:
Branched, sorted list: approximately 2 seconds.
Branched, unsorted list: approximately 10 seconds.
My implementation (whether sorted or unsorted): approximately 3 seconds.
This goes to show that in an unsorted bottleneck area, when we have a trivial branch that can simply be replaced by a single multiplication, it is probably worthwhile to consider the implementation I have suggested.
** Once again, this is mainly for areas that have been identified as the bottleneck. **

Expectation Maximization Reestimation

Typically, the re-estimation iterative procedure stops when lambda.bar - lambda is less than some epsilon value.
How exactly does one determine this epsilon value? I often only see it written as the generic epsilon symbol in papers, and never the actual value used, which I assume changes depending on the data.
So, for instance, if the lambda value of my first iteration was 5*10^-22, the second iteration was 1.3*10^-15, the third was 8.45*10^-15, the fourth was 1.65*10^-14, etc., how would I determine when the algorithm needed no more iterations?
Moreover, what if I were to apply the same algorithm to a different dataset? Would I need to change my epsilon definitions?
Sorry for the long question. Pretty puzzled by it... :)
"how would I determine when the algorithm needed no more iteratons?"
When you get a "good-enough" result within a reasonable amount of time. ;-)
"Moreover, what if I were to apply the same alogrithm to a different datset? would I need to
change my epsilon definitions?"
Yes, most probably.
If you can afford it, you can just let it iterate until the updated value <= the old value (it could be < due to floating point error). I would be inclined to go with this until I ran out of patience or cpu budget.
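As a sketch of such a stopping rule in Python (the update and objective functions here are hypothetical placeholders, not any particular EM implementation):

def run_em(params, em_update, log_likelihood, eps=1e-6, max_iters=1000):
    # em_update: hypothetical function performing one E-step + M-step
    # log_likelihood: hypothetical objective; EM should not decrease it
    prev = log_likelihood(params)
    for _ in range(max_iters):
        params = em_update(params)
        curr = log_likelihood(params)
        # stop when the relative improvement drops below eps;
        # a relative threshold depends less on the dataset's scale
        if curr - prev <= eps * abs(prev):
            break
        prev = curr
    return params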
