What does the sigma coefficient in logistic panel regression with pglm indicate?

I am estimating a panel data model (large N, small T) with a dichotomous outcome variable and a count variable as the independent variable, using pglm from the pglm package. In interpreting the output of a random effects logit model, I assumed that the coefficient sigma indicates the estimated variance of the individual random effects. However, since in my analysis this value is negative without any error or warning, I wondered whether I was interpreting it correctly.
Since the documentation of the pglm package does not provide any information about the interpretation of sigma, I am grateful for any help.
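One way to sanity-check how sigma should be read (not an answer to the sign question) is to fit the same specification with lme4, which reports the random-intercept standard deviation explicitly and can be compared against pglm's sigma. This is only a hedged sketch; the names y, x, id and the data frame d are placeholders, not anything from the post.
# Hypothetical cross-check (y, x, id, d are placeholder names)
library(lme4)
m <- glmer(y ~ x + (1 | id), data = d, family = binomial("logit"))
VarCorr(m)  # standard deviation (and variance) of the individual random intercept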


How to calculate the standard errors of ERGM predicted probabilities?

I'm having trouble estimating the standard errors of the predicted probabilities from an ERGM model in order to calculate a confidence interval. Getting the predicted probabilities is not a problem, but I want to get a sense of the uncertainty surrounding the predictions.
Below is a reproducible example based on the data set of marriage and business ties among Renaissance Florentine families.
library(statnet)
data(flo)
flomarriage <- network(flo,directed=FALSE)
flomarriage
flomarriage %v% "wealth" <- c(10,36,27,146,55,44,20,8,42,103,48,49,10,48,32,3)
flomarriage
gest <- ergm(flomarriage ~ edges +
absdiff("wealth"))
summary(gest)
plogis(coef(gest)[['edges']] + coef(gest)[['absdiff.wealth']]*10)
Based on the model, it is estimated that a wealth difference of 10 corresponds to a 0.182511 probability of a tie. My first question is, is this a correct interpretation? And my second question is, how does one calculate the standard error of this probability?
This is the correct interpretation for a dyad-independent model such as this one. For a dyad-dependent model, it would be the conditional probability given the rest of the network.
You can obtain the standard error of the prediction on the logit scale by rewriting your last line as a dot product of a weight vector and the coefficient vector:
eta <- sum(c(1,10)*coef(gest))
plogis(eta)
Then, since vcov(gest) is the covariance matrix of the parameter estimates, the variance of eta follows from the standard variance formulas (e.g., http://www.stat.cmu.edu/~larry/=stat401/lecture-13.pdf):
(var.eta <- c(1,10)%*%vcov(gest)%*%c(1,10))
You can then get the variance (and the standard error) of the predicted probability using the Delta Method (e.g., https://blog.methodsconsultants.com/posts/delta-method-standard-errors/). However, unless you require the standard error as such, as opposed to a confidence interval, my advice would be to first calculate the interval for eta (i.e., eta + qnorm(c(0.025,0.0975))*sqrt(var.eta)), then call plogis() on the endpoints.
Really very helpful answer! Many thanks for it.
Here's just a small, almost cosmetic note about the suggested formula: 1) the upper bound in qnorm should be 0.975, not 0.0975, and 2) use matrix multiplication before sqrt --> eta + qnorm(c(0.025,0.975))%*%sqrt(var.eta).
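Putting the answer and the correction together, here is a short sketch reusing gest, eta, and var.eta from above; drop() turns the 1x1 matrix into a scalar so that ordinary multiplication works.
# Delta method: Var(p) is approximately (p*(1-p))^2 * Var(eta) for the logistic link
p.hat <- plogis(eta)
se.p  <- p.hat * (1 - p.hat) * sqrt(drop(var.eta))   # standard error of the probability
# Or, as the answer recommends, build the interval on the eta scale and transform:
ci.eta <- eta + qnorm(c(0.025, 0.975)) * sqrt(drop(var.eta))
plogis(ci.eta)   # 95% confidence interval for the predicted probability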

How to find accuracy of matrix multiplication with floating-point numbers?

I am trying to analyze how floating-point computation becomes less accurate as the precision of the representation decreases. To do that, I want to perform simple matrix operations using different floating-point formats, such as float64, float32, and float16. Since float64 computation gives the most precise and accurate result of the three, I treat the float64 result as the expected (reference) result, i.e. error = 0.
The issue is that when I compare the calculated result with the expected result, I don't have a clear idea of how to combine all the individual element errors into a single metric. I know of some ways to go about it, such as taking the mean error or the sum of squared errors (SSE), but I wanted to know whether there is a standard way of calculating the overall error of a given matrix computation.
Perhaps a variant of the condition number can be helpful? See here: https://en.wikipedia.org/wiki/Condition_number#Matrices
if there was a standard way of calculating the overall error of a given matrix computation.
Consider the case when a matrix is size 1. Then we are in a familiar 1 dimension domain.
How should y_computed_as_float be compared with y_expected? Even in this one-dimensional case there is no single standard for comparing them as floating-point numbers. Subtract? Divide? It is often context-sensitive. So "no" to the OP's question.
Yet there are common practices, so a potential "yes" to the OP's question for select cases.
Floating point computations are often assessed by the difference between computed and math expected values scaled by the Unit in the last place*.
error = (y_computed_as_float - y_expected)/ulpf((float) y_expected);
For an N-by-N matrix, the overall matrix error could use a root mean square of the N^2 element errors.
* Scaling by ULP has some issues near each power of 2, and more near 0.0. There are ways to mitigate that, but we are getting into the weeds.
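As a concrete illustration of that aggregation step, here is a minimal R sketch. It uses a plain relative error per element rather than the ULP scaling shown above, and assumes computed and reference are placeholder names for same-sized matrices with a nonzero float64 reference.
rms_rel_error <- function(computed, reference) {
  rel <- (computed - reference) / abs(reference)  # per-element relative error
  sqrt(mean(rel^2))                               # RMS over all N^2 elements
}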

optimize integral f(x)exp(-x) from x=0,infinity

I need a robust integration algorithm for f(x)exp(-x) between x=0 and infinity, with f(x) a positive, differentiable function.
I do not know the array x a priori (it's an intermediate output of my routine). The x array is typically ~log-equispaced, but highly irregular.
Currently I'm using the Simpson algorithm, but my problem is that the domain is often highly undersampled by the x array, which produces unrealistic values for the integral.
On each run of my code I need to do this integration thousands of times (each with a different set of x values), so I need to find an efficient and robust way to integrate this function.
More details:
The x array can have between 2 and N points (N known). The first value is always x[0] = 0.0. The last point is always a value greater than a tunable threshold x_max (such that exp(-x_max) is approximately 0). I only know the values of f at the points x[i] (though the function itself is smooth).
My first idea was to do a Laguerre-Gauss quadrature integration. However, this algorithm seems to be highly unreliable when one does not use the optimal quadrature points.
My current idea is to add a set of auxiliary points, interpolating f, such that the Simpson algorithm becomes more stable. If I do this, is there an optimal selection of auxiliary points?
I'd appreciate any advice,
Thanks.
Set t=1-exp(-x), then dt = exp(-x) dx and the integral value is equal to
integral[ f(-log(1-t)) , t=0..1 ]
which you can evaluate with the standard Simpson formula and hopefully get good results.
Note that piecewise linear interpolation will always result in an order 2 error for the integral, as the result amounts to a trapezoid formula even if the method was Simpson. For better errors in the Simpson method you will need higher interpolation degrees, ideally cubic splines. Cubic Bezier polynomials with estimated derivatives to compute the control points could be a fast compromise.
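A minimal R sketch of that substitution, where x is the sample array from the question and fx is a placeholder name for the corresponding values of f; the interpolation is done with a cubic spline in t, per the note above.
integrate_f_exp <- function(x, fx) {
  t <- 1 - exp(-x)                            # t = 1 - exp(-x) maps [0, Inf) to [0, 1)
  g <- splinefun(t, fx, method = "natural")   # spline interpolant of f(-log(1 - t))
  integrate(g, lower = 0, upper = max(t))$value
}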

How to efficiently sample the continuous negative binomial distribution?

First, for context, I am working on a game where when you do something good you earn positive credits and when you do something bad you earn negative credits, and each credit corresponds to flipping a biased coin: if you get heads then something happens (good if it's a positive credit, bad if it's a negative credit), and otherwise nothing happens.
The deal is that I want to handle the case of multiple credits and fractional credits, and I would like to have flips use up credits so that if something good/bad happens then the leftover credits carry over. A straightforward way of doing this is to just perform a bunch of trials, and in particular for the case of fractional credits we can multiply the number of credits by X and the likelihood of something happening by 1/X (the distribution has the same expectation but slightly different weights); unfortunately, this places a practical limit on how many credits the user can get and also how many decimal places can be in the number of credits since this results in an unbounded amount of work.
What I would like to do is take advantage of the fact that I am sampling the continuous negative binomial distribution, i.e. the distribution of how many tails occur before we get a head: if f(X) is the distribution, then f(X) gives the probability that there will be X tails before the first head, where X need not be an integer. If I can sample this distribution, then I can compare X, the number of tails, with the number of credits: if X is greater, then we use up all of the credits but nothing happens, and if X is less than or equal, then something good happens and we subtract X from the number of credits. Furthermore, because the distribution is continuous, I can easily handle fractional credits.
Does anyone know of a way for me to be able to efficiently sample the continuous negative binomial distribution (that is, a function that generates random numbers from this distribution)?
This question may be better answered on StatsExchange, but here I will take a stab at it.
You are correct that trying to compute this directly will be computationally expensive, since you cannot avoid the beta and/or gamma function dependencies. The only statistically valid approximation I'm aware of applies when the number of successes s required is large and p is neither very small nor very large; then you can approximate the distribution with a normal distribution with special values for the mean and variance. You can read more here, but I'm guessing this approximation will not be generally applicable for you.
The negative binomial distribution can also be approximated as a mixture of Poisson distributions, but this doesn't save you from the gamma function dependency.
The only efficient class of negative binomial samplers that I'm aware of use optimized accept-reject techniques. Pages 10-11 of this PDF here describe the concept behind the method. Page 6 (page 295 internally) of this PDF here contains source code for sampling binomial deviates using related techniques. Note that even these methods still require random uniform deviates as well as sqrt(), log(), and gammln() calls. For small numbers of trials (less than 100, maybe?), I wouldn't be surprised at all if just simulating the trials with a fast random number generator is faster than even the accept-reject techniques. Definitely start by getting a fast PRNG; they are not all created equal.
Edit:
The following pseudo-code would probably be fairly efficient to draw a random discrete negative binomial-distributed value as long as p is not very large (too close to 1.0). It will return the number of trials required before reaching your first "desired" outcome (which is actually the first "failure" in terms of the distribution):
// assume p and r are the parameters of the neg. binomial dist.
// r = number of failures (you'll set it to one for your purpose)
// p = probability of a "success"
double rnd = _rnd.nextDouble();          // uniform draw in [0.0, 1.0)
int k = 0;                               // # of successes that occur before the 1st failure
double lastPmf = Math.pow(1 - p, r);     // pmf at k = 0
double cdf = lastPmf;
while (cdf < rnd)
{
    lastPmf *= p * (k + r) / (k + 1.0);  // recurrence: pmf(k+1) from pmf(k)
    cdf += lastPmf;
    k++;
}
return k;
// or return (k + 1) to also count the trial on which the failure occurred
Using the recurrence relationship saves over repeating the factorial independently at each step. I think using this, combined with limiting your fractional precision to 1 or 2 decimal places (so you only need to multiply by 10 or 100 respectively) might work for your purposes. You are drawing only one random number and the rest is just multiplications--it should be quite fast.
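For the r = 1 case the question actually needs, there is also a direct inversion route, distinct from the accept-reject methods above, that naturally gives non-integer values. A hedged R sketch, assuming p is the per-credit probability that the event fires: the continuous analog of "number of tails before the first head" has survival function S(x) = (1 - p)^x, i.e. an exponential with rate -log(1 - p).
r_continuous_tails <- function(n, p) {
  u <- runif(n)
  log(u) / log(1 - p)  # inverts S(x) = (1 - p)^x; fractional values come for free
}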

Frequency determination from sparsely sampled data

I'm observing a sinusoidally varying source, i.e. f(x) = a sin (bx + d) + c, and want to determine the amplitude a, offset c and period/frequency b - the shift d is unimportant. Measurements are sparse, with each source measured typically between 6 and 12 times, and observations are at (effectively) random times, with intervals between observations roughly between a quarter and ten times the period (just to stress, the spacing of observations is not constant for each source). In each source the offset c is typically quite large compared to the measurement error, while amplitudes vary - at one extreme they are only on the order of the measurement error, while at the other extreme they are about twenty times the error. Hopefully that fully outlines the problem; if not, please ask and I'll clarify.
Thinking naively about the problem, the average of the measurements will be a good estimate of the offset c, while half the range between the minimum and maximum value of the measured f(x) will be a reasonable estimate of the amplitude, especially as the number of measurements increases, so that the prospects of having observed the maximum offset from the mean improve. However, if the amplitude is small then it seems to me that there is little chance of accurately determining b, while the prospects should be better for large-amplitude sources even if they are only observed the minimum number of times.
Anyway, I wrote some code to do a least-squares fit to the data for the range of periods, and it identifies best-fit values of a, b and d quite effectively for the larger-amplitude sources. However, I see it finding a number of possible periods, and while one is the 'best' (in as much as it gives the minimum error-weighted residual) in the majority of cases the difference in the residuals for different candidate periods is not large. So what I would like to do now is quantify the possibility that the derived period is a 'false positive' (or, to put it slightly differently, what confidence I can have that the derived period is correct).
Does anybody have any suggestions on how best to proceed? One thought I had was to use a Monte Carlo approach: construct a large number of synthetic sources with known values of a, b and c, sample them at my actual measurement times, fit the resulting samples with my fitting code, and see what percentage of the time I recover the correct period. But that seems quite heavyweight, and I'm not sure it's particularly useful beyond giving a general feel for the false-positive rate.
And any advice for frameworks that might help? I have a feeling this is something that can likely be done in a line or two in Mathematica, but (a) I don't know it, and (b) I don't have access to it. I'm fluent in Java, competent in IDL and can probably figure out other things...
This looks tailor-made for working in the frequency domain. Apply a Fourier transform and identify the frequency based on where the power is located, which should be clear for a sinusoidal source.
ADDENDUM: To get an idea of how accurate your estimate is, I'd try a resampling approach such as cross-validation. I think this is the direction you're heading with the Monte Carlo idea; lots of work is out there, so hopefully that's a wheel you won't need to re-invent.
The trick here is to do what might at first seem to make the problem more difficult. Rewrite f in the equivalent form:
f(x) = a1*sin(b*x) + a2*cos(b*x) + c
This is based on the angle-addition identity for sin(u + v).
Recognize that if b is known, then the problem of estimating {a1, a2, c} is a simple LINEAR regression problem. So all you need to do is use a 1-variable minimization tool, working on the value of b, to minimize the sum of squares of the residuals from that linear regression model. There are many such univariate optimizers to be found.
Once you have those parameters, it is easy to find the parameter a in your original model, since that is all you care about.
a = sqrt(a1^2 + a2^2)
The scheme I have described is called partitioned least squares.
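A rough R sketch of that scheme, where x (observation times) and y (measurements) come from the question and b_lo, b_hi are an assumed search range for b. A grid search stands in for the univariate optimizer mentioned above, since the residual sum of squares is typically multimodal in b; b_hat could then be refined with optimize() over a narrow bracket around the grid minimum.
rss_for_b <- function(b, x, y) {
  fit <- lm(y ~ sin(b * x) + cos(b * x))   # linear in a1, a2, c once b is fixed
  sum(residuals(fit)^2)
}
fit_sine <- function(x, y, b_lo, b_hi, n_grid = 2000) {
  b_grid <- seq(b_lo, b_hi, length.out = n_grid)
  b_hat  <- b_grid[which.min(sapply(b_grid, rss_for_b, x = x, y = y))]
  fit    <- lm(y ~ sin(b_hat * x) + cos(b_hat * x))
  co     <- coef(fit)                      # (Intercept), a1, a2
  list(b = b_hat, a = sqrt(co[2]^2 + co[3]^2), c = unname(co[1]), fit = fit)
}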
If you have a reasonable estimate of the size and the nature of your noise (e.g. white Gaussian with SD sigma), you can
(a) invert the Hessian matrix to get an estimate of the error in your fitted parameters, and
(b) fairly easily derive a significance statistic for your fit residuals.
For (a), compare http://www.physics.utah.edu/~detar/phys6720/handouts/curve_fit/curve_fit/node6.html
For (b), assume that your measurement errors are independent and thus the variance of their sum is the sum of their variances.
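In the partitioned least-squares setting sketched after the previous answer, the Hessian-based error estimate for the linear parameters is available directly; this is only an illustration, and it is conditional on treating b as known exactly.
res <- fit_sine(x, y, b_lo, b_hi)    # fit_sine from the sketch above
vcov(res$fit)                        # inverse-Hessian (covariance) estimate for (c, a1, a2)
sqrt(diag(vcov(res$fit)))            # their standard errors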
