envelope function (spatstat) - error "unused arguments" - arguments

I would like to ask for your help in finding out why, when I use the envelope function, my arguments are not accepted and are reported as "unused arguments".
The data I'm using is a ppp object without marks, and I would like to create an L-function plot with both simulated data and my data.
Here is the code for my ppp data:
map2008MLW = ppp(xy2008_BNGppp$x, xy2008_BNGppp$y, window = IoM_polygon_MLWowin)
And then:
L2008 = Lest(map2008MLW,correction="Ripley")
OP = par(mar=c(5,5,4,4))
plot(L2008, . -r ~ r, ylab=expression(hat("L")), xlab = "d (m)"); par(OP)
L2008$iso = L2008$iso - L2008$r
L2008$theo = L2008$theo - L2008$r
# Desired number of simulations
n = 9999
# Desired significance level to display
p = 0.05
And at this point the envelope function doesn't seem very happy:
EL2008 = envelope(map2008MLW[W], Lest, nsim=n, rank=(p * (n + 1)))
Error in envelope(map2008MLW[W], Lest, nsim = n, rank = (p * (n + 1))) :
unused arguments (nsim = n, rank = (p * (n + 1)))
It seems like a generic error and I am not sure it is caused by the spatstat package. Please help me find a solution, as I can't proceed with my analyses.
Thank you very much,
Martina

The argument rank should be nrank.
Also, the relationship between the significance level and the argument nrank is not correct in the example. For a two-sided test, the significance level is alpha = 2 * nrank/(nsim + 1), so nrank = alpha * (nsim + 1)/2.
You have chosen p = 0.05, but rank = p * (n + 1) gives 500, which by the formula above corresponds to a significance level of 0.1. With nsim = 9999 you want nrank = 0.05 * 10000/2 = 250 to get a test with significance level 0.05.
Such a large number of simulations (9999) is unnecessary in this kind of application. Monte Carlo tests are valid with small values of nsim. In your example I would normally use nsim=39 and nrank=1.
See Chapter 10 of the spatstat book.
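To make the arithmetic concrete, the relationship can be checked with a tiny snippet (Python here purely for the calculation; the function name is made up, and the envelope call itself stays in R):

def nrank_for(alpha, nsim):
    # Two-sided Monte Carlo test: alpha = 2 * nrank / (nsim + 1)
    return int(alpha * (nsim + 1) / 2)

print(nrank_for(0.05, 9999))   # 250
print(nrank_for(0.05, 39))     # 1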

Why does finding the eigenvalues of a 4*4 matrix with z3py take so much time and not give any solutions?

I'm trying to calculate the eigenvalues of a 4*4 matrix called A in my code (I know that the eigenvalues are real values). All the elements of A are z3 expressions that need to be calculated from previous constraints. The code below is the last part of a long program that tries to compute matrix A and then its eigenvalues. The program is written as one piece, but I've split it into two parts in order to debug it: part 1, in which the code tries to find the matrix A, and part 2, which is the eigenvalue calculation. In part 1 the code works very fast and calculates A in less than a second, but when I add part 2, it doesn't give me any solution afterwards.
I was wondering what the reason could be. Is it because of the order of the polynomial (which is 4)? I would appreciate it if anyone could help me find an alternative way to calculate the eigenvalues or give me some hints on how to rewrite the code so it can solve the problem.
(Note that A2 in the actual code is a matrix with all of its elements defined as z3 expressions by previous constraints in the code. Here I've defined the elements as real values just to make the code executable. In this form the code gives a solution very quickly, but in the real situation it takes very long, like days.
For example, one of the elements of A looks roughly like this:
0 +
1*Vq0__1 +
2 * -Vd0__1 +
0 +
((5.5 * Iq0__1 - 0)/64/5) *
(0 +
0 * (Vq0__1 - 0) +
-521702838063439/62500000000000 * (-Vd0__1 - 0)) +
((.10 * Id0__1 - Etr_q0__1)/64/5) *
(0 +
521702838063439/62500000000000 * (Vq0__1 - 0) +
0.001 * (-Vd0__1 - 0)) +
0 +
0 + 0 +
0 +
((100 * Iq0__1 - 0)/64/5) * 0 +
((20 * Id0__1 - Etr_q0__1)/64/5) * 0 +
0 +
-5/64
All the variables in this example are z3 variables.)
from z3 import *
import numpy as np
def sub(*arg):
    counter = 0
    for matrix in arg:
        if counter == 0:
            counter += 1
            Sub = []
            for i in range(len(matrix)):
                Sub1 = []
                for j in range(len(matrix[0])):
                    Sub1 += [matrix[i][j]]
                Sub += [Sub1]
        else:
            row = len(matrix)
            colmn = len(matrix[0])
            for i in range(row):
                for j in range(colmn):
                    Sub[i][j] = Sub[i][j] - matrix[i][j]
    return Sub
Landa = RealVector('Landa', 2) # Eigenvalues considered as real values
LandaI0 = np.diag( [ Landa[0] for i in range(4)] ).tolist()
ALandaz3 = RealVector('ALandaz3', 4 * 4 )
############# Building ( A - \lambda * I ) to find the eigenvalues ############
A2 = [[1,2,3,4],
      [5,6,7,8],
      [3,7,4,1],
      [4,9,7,1]]
s = Solver()
for i in range(4):
    for j in range(4):
        s.add( ALandaz3[ 4 * i + j ] == sub(A2, LandaI0)[i][j] )
ALanda = [[ALandaz3[0],  ALandaz3[1],  ALandaz3[2],  ALandaz3[3]],
          [ALandaz3[4],  ALandaz3[5],  ALandaz3[6],  ALandaz3[7]],
          [ALandaz3[8],  ALandaz3[9],  ALandaz3[10], ALandaz3[11]],
          [ALandaz3[12], ALandaz3[13], ALandaz3[14], ALandaz3[15]]]
Determinant = (
    ALandaz3[0] * ALandaz3[5] * (ALandaz3[10] * ALandaz3[15] - ALandaz3[14] * ALandaz3[11]) -
    ALandaz3[1] * ALandaz3[4] * (ALandaz3[10] * ALandaz3[15] - ALandaz3[14] * ALandaz3[11]) +
    ALandaz3[2] * ALandaz3[4] * (ALandaz3[9] * ALandaz3[15] - ALandaz3[13] * ALandaz3[11]) -
    ALandaz3[3] * ALandaz3[4] * (ALandaz3[9] * ALandaz3[14] - ALandaz3[13] * ALandaz3[10]) )
tol = 0.001
s.add( And( Determinant >= -tol, Determinant <= tol ) ) # giving some flexibility instead of equalling to zero
print(s.check())
print(s.model())
Note that you seem to be using Z3 for a type of problem it absolutely isn't meant for. Z3 is a SAT/SMT solver. Such a solver works internally with a huge number of boolean equations. Integers and fractions can be converted to boolean expressions, but with general floats Z3 quickly reaches its limits. See here and here for a lot of typical examples, and note how floats are avoided.
Z3 can work in a limited way with floats, converting them to fractions, but it doesn't work with the approximations and accuracies that are needed in numerical algorithms. Therefore, the results are usually not what you are hoping for.
Finding eigenvalues is a typical numerical problem, where accuracy issues are very tricky. Python has libraries such as numpy and scipy to efficiently deal with those. See e.g. numpy.linalg.eig.
If, however your A2 matrix contains some symbolic expressions (and uses fractions instead of floats), sympy's matrix functions could be an interesting alternative.
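As a rough sketch of those two suggestions, using the sample A2 values from the question (illustrative code, not the original program):

import numpy as np
from sympy import Matrix

# Numeric route: once A2 holds plain numbers, np.linalg.eig returns the
# eigenvalues (and eigenvectors) essentially instantly.
A2 = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [3, 7, 4, 1],
      [4, 9, 7, 1]]
w, v = np.linalg.eig(np.array(A2, dtype=float))
print(w)

# Symbolic/exact route: if the entries are exact fractions or symbolic
# expressions, sympy can compute the eigenvalues of the 4x4 matrix exactly.
print(Matrix(A2).eigenvals())   # {eigenvalue: multiplicity, ...}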

How to compute and store the digits of sqrt(n) up to 10^6 decimal places?

I am doing research work for which I need to compute and store the square root of 2 up to 10^6 decimal places. I have googled this, but I only found a NASA page, and I don't know how they computed it. I used setprecision in C++, but that gives the result only up to around 50 places. What should I do?
NASA page link: https://apod.nasa.gov/htmltest/gifcity/sqrt2.1mil
I have also tried binary search, but that was not fruitful.
long double ans = sqrt(n);
cout<<fixed<<setprecision(50)<<ans<<endl;
You have various options here. You can work with an arbitrary-precision floating-point library (for example MPFR with C or C++, or mpmath or the built-in decimal library in Python). Provided you know what error guarantees that library gives, you can ensure that you get the correct decimal digits. For example, both MPFR and Python's decimal guarantee correct rounding here, but MPFR has the disadvantage (for your particular use-case of getting decimal digits) that it works in binary, so you'd also need to analyse the error induced by the binary-to-decimal conversion.
You can also work with pure integer methods, using an arbitrary-precision integer library (like GMP), or a language that supports arbitrary-precision integers out of the box (for example, Java with its BigInteger class: recent versions of Java provide a BigInteger.sqrt method): scale 2 by 10**2n, where n is the number of places after the decimal point that you need, take the integer square root (i.e., the integer part of the exact mathematical square root), and then scale back by 10**n. See below for a relatively simple but efficient algorithm for computing integer square roots.
The simplest out-of-the-box option here, if you're willing to use another language, is to use Python's decimal library. Here's all the code you need, assuming Python 3 (not Python 2, where this will be horribly slow).
>>> from decimal import Decimal, getcontext
>>> getcontext().prec = 10**6 + 1 # number of significant digits needed
>>> sqrt2_digits = str(Decimal(2).sqrt())
The str(Decimal(2).sqrt()) operation takes less than 10 seconds on my machine. Let's check the length, and the first and last hundred digits (we obviously can't reproduce the whole output here):
>>> len(sqrt2_digits)
1000002
>>> sqrt2_digits[:100]
'1.41421356237309504880168872420969807856967187537694807317667973799073247846210703885038753432764157'
>>> sqrt2_digits[-100:]
'2637136344700072631923515210207475200984587509349804012374947972946621229489938420441930169048412044'
There's a slight problem with this: the result is guaranteed to be correctly rounded, but that's rounded, not truncated. So that means that that final "4" digit could be the result of a final round up - that is, the actual digit in that position could be a "3", with an "8" or "9" (for example) following it.
We can get around this by computing a couple of extra digits, and then truncating them (after double checking that rounding of those extra digits doesn't affect the truncation).
>>> getcontext().prec = 10**6 + 3
>>> sqrt2_digits = str(Decimal(2).sqrt())
>>> sqrt2_digits[-102:]
'263713634470007263192351521020747520098458750934980401237494797294662122948993842044193016904841204391'
So indeed the millionth digit after the decimal point is a 3, not a 4. Note that if the last 3 digits computed above had been "400", we still wouldn't have known whether the millionth digit was a "3" or a "4", since that "400" could again be the result of a round up. In that case, you could compute another two digits and try again, and so on, stopping when you have an unambiguous output. (For further reading, search for "The table maker's dilemma".)
(Note that setting the decimal module's rounding mode to ROUND_DOWN does not work here, since the Decimal.sqrt method ignores the rounding mode.)
If you want to do this using pure integer arithmetic, Python 3.8 offers a math.isqrt function for computing exact integer square roots. In this case, we'd use it as follows:
>>> from math import isqrt
>>> sqrt2_digits = str(isqrt(2*10**(2*10**6)))
This takes a little longer: around 20 seconds on my laptop. Half of that time is for the binary-to-decimal conversion implicit in the str call. But this time, we got the truncated result directly, and didn't have to worry about the possibility of rounding giving us the wrong final digit(s).
Examining the results again:
>>> len(sqrt2_digits)
1000001
>>> sqrt2_digits[:100]
'1414213562373095048801688724209698078569671875376948073176679737990732478462107038850387534327641572'
>>> sqrt2_digits[-100:]
'2637136344700072631923515210207475200984587509349804012374947972946621229489938420441930169048412043'
This is a bit of a cheat, because (at the time of writing) Python 3.8 hasn't been released yet, although beta versions are available. But there's a pure Python version of the isqrt algorithm in the CPython source, that you can copy and paste and use directly. Here it is in full:
import operator
def isqrt(n):
    """
    Return the integer part of the square root of the input.
    """
    n = operator.index(n)
    if n < 0:
        raise ValueError("isqrt() argument must be nonnegative")
    if n == 0:
        return 0
    c = (n.bit_length() - 1) // 2
    a = 1
    d = 0
    for s in reversed(range(c.bit_length())):
        # Loop invariant: (a-1)**2 < (n >> 2*(c - d)) < (a+1)**2
        e = d
        d = c >> s
        a = (a << d - e - 1) + (n >> 2*c - e - d + 1) // a
    return a - (a*a > n)
The source also contains an explanation of the above algorithm and an informal proof of its correctness.
You can check that the results from the two methods above agree (modulo the extra decimal point in the first result). They're computed by completely different methods, so that acts as a sanity check on both methods.
You could use big integers, e.g. BigInteger in Java. Then you calculate the square root of 2e12 or 2e14 (note that sqrt(2) = 1.4142... and sqrt(200) = 14.142...). You can then use the Babylonian method to get all the digits: e.g. with S = 2e14, iterate x(n+1) = (x(n) + S / x(n)) / 2 and repeat until x(n) doesn't change, as sketched below. Maybe there are more efficient algorithms that converge faster.
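Below is a minimal sketch of that iteration in Python (whose built-in integers are arbitrary precision) rather than Java's BigInteger; the function name and stopping rule are just one common way to write the integer Newton iteration:

def sqrt_digits(n, digits):
    # Compute floor(sqrt(n * 10**(2*digits))), i.e. the digits of sqrt(n)
    # truncated after `digits` decimal places, in pure integer arithmetic.
    S = n * 10**(2 * digits)
    x = n * 10**digits              # starting guess >= sqrt(S) for n >= 1
    while True:
        x_next = (x + S // x) // 2  # Babylonian / Newton step
        if x_next >= x:             # no longer decreasing: x == isqrt(S)
            return x
        x = x_next

print(sqrt_digits(2, 50))           # first 51 digits of sqrt(2), without the decimal point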
// Input: a positive integer, the number of precise digits after the decimal point
// Output: a string representing the long float square root
function findSquareRoot(number, numDigits) {
    function get_power(x, y) {
        let result = 1n;
        for (let i = 0; i < y; i++) {
            result = result * BigInt(x);
        }
        return result;
    }

    let a = 5n * BigInt(number);
    let b = 5n;
    const precision_digits = get_power(10, numDigits + 1);

    while (b < precision_digits) {
        if (a >= b) {
            a = a - b;
            b = b + 10n;
        } else {
            a = a * 100n;
            b = (b / 10n) * 100n + 5n;
        }
    }

    let decimal_pos = Math.floor(Math.log10(number))
    if (decimal_pos == 0) decimal_pos = 1
    let result = (b / 100n).toString()
    result = result.slice(0, decimal_pos) + '.' + result.slice(decimal_pos)
    return result
}

How to use the trained char-rnn to generate words?

When the char-rnn is trained, the weights of the network are fixed. If I use the same first char, how can I get different sentences? For example, the two sentences "What is wrong?" and "What can I do for you?" have the same first character "W". Can the char-rnn generate these two different sentences?
Yes, you can get different results from the same state by sampling. Take a look at min-char-rnn by Andrej Karpathy. The sample code is at line 63:
def sample(h, seed_ix, n):
    """
    sample a sequence of integers from the model
    h is memory state, seed_ix is seed letter for first time step
    """
    x = np.zeros((vocab_size, 1))
    x[seed_ix] = 1
    ixes = []
    for t in xrange(n):
        h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h) + bh)
        y = np.dot(Why, h) + by
        p = np.exp(y) / np.sum(np.exp(y))
        ix = np.random.choice(range(vocab_size), p=p.ravel())
        x = np.zeros((vocab_size, 1))
        x[ix] = 1
        ixes.append(ix)
    return ixes
Starting from the same hidden vector h and seed char seed_ix, you'll have a deterministic distribution over the next char - p. But the result is random, because the code performs np.random.choice instead of np.argmax. If the distribution is highly peaked at some char, you'll still get the same outcome most of the time, but in most cases several next chars are highly probable and they will be sampled, thus changing the whole generated sequence.
Note that this isn't the only possible sampling procedure: temperature-based sampling is more popular. You can take a look at, for instance, this post for an overview.
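As a rough sketch of what temperature-based sampling typically looks like (illustrative code, not part of min-char-rnn; the function name is made up):

import numpy as np

def sample_with_temperature(logits, temperature=1.0):
    # Divide the logits by the temperature before the softmax:
    # temperature < 1 sharpens the distribution (more deterministic),
    # temperature > 1 flattens it (more varied samples).
    y = np.asarray(logits, dtype=float) / temperature
    p = np.exp(y - np.max(y))        # subtract the max for numerical stability
    p /= np.sum(p)
    return np.random.choice(len(p), p=p)

logits = [2.0, 1.0, 0.1]
print(sample_with_temperature(logits, temperature=0.5))   # usually index 0
print(sample_with_temperature(logits, temperature=2.0))   # more varied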

Matthews Correlation Coefficient yielding values outside of [-1,1]

I'm using the formula found on Wikipedia for calculating the Matthews correlation coefficient. It works fairly well most of the time, but I'm running into problems in my tool's implementation, and I'm not seeing the problem.
MCC = (TP*TN - FP*FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
where TP, TN, FP, and FN are the non-negative, integer counts of the appropriate fields,
which should only return values in [-1, 1].
My implementation is as follows:
double ret;
if ((TruePositives + FalsePositives) == 0 || (TruePositives + FalseNegatives) == 0 ||
    (TrueNegatives + FalsePositives) == 0 || (TrueNegatives + FalseNegatives) == 0)
    // To avoid dividing by zero
    ret = (double)(TruePositives * TrueNegatives -
                   FalsePositives * FalseNegatives);
else
{
    double num = (double)(TruePositives * TrueNegatives -
                          FalsePositives * FalseNegatives);
    double denom = (TruePositives + FalsePositives) *
                   (TruePositives + FalseNegatives) *
                   (TrueNegatives + FalsePositives) *
                   (TrueNegatives + FalseNegatives);
    denom = Math.Sqrt(denom);
    ret = num / denom;
}
return ret;
When I use this, as I said it works properly most of the time, but for instance if TP=280, TN = 273, FP = 67, and FN = 20, then we get:
MCC = ((280*273) - (67*20)) / sqrt(347*300*340*293) = 75100 / 42196.06 ≈ 1.78
Is this normal behavior of Matthews Correlation Coefficient? I'm a programmer by trade, so statistics aren't a part of my formal training. Also, I've looked at questions with answers, and none of them discuss this behavior. Is it a bug in my code or in the formula itself?
The code is clear and looks correct. (But one's eyes can always deceive.)
One concern is whether the output is guaranteed to lie between -1 and 1. Assuming all inputs are nonnegative, we can round the numerator up and the denominator down, thereby overestimating the result, by zeroing out all the "False*" terms, producing
TP*TN / Sqrt(TP*TN*TP*TN) = 1.
The lower limit is obtained similarly by zeroing out all the "True*" terms. Therefore, working code cannot produce a value larger than 1 in size unless it is presented with invalid input.
I therefore recommend placing a guard (such as an Assert statement) to assure the inputs are nonnegative. (Clearly it matters not in the preceding argument whether they are integral.) Place another assertion to check that the output is in the interval [-1,1]. Together, these will detect either or both of (a) invalid inputs or (b) an error in the calculation.
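A minimal sketch of that recommendation, in Python for brevity and with illustrative names (not the original tool's code):

import math

def mcc(tp, tn, fp, fn):
    # Guard the inputs, as suggested above.
    for count in (tp, tn, fp, fn):
        assert count >= 0, "counts must be nonnegative"
    num = tp * tn - fp * fn
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return float(num)            # mirrors the question's handling of this edge case
    result = num / math.sqrt(denom)
    # Guard the output, as suggested above.
    assert -1.0 <= result <= 1.0, "MCC outside [-1, 1]"
    return result

print(mcc(280, 273, 67, 20))         # about 0.74, well inside [-1, 1]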

John Tukey "median median" (or "resistant line") statistical test for R and linear regression

I'm searching for the John Tukey algorithm which computes a "resistant line" or "median-median line" for my linear regression in R.
A student on a mailing list explained this algorithm in these terms:
"The way it's calculated is to divide
the data into three groups, find the
x-median and y-median values (called
the summary point) for each group, and
then use those three summary points to
determine the line. The outer two
summary points determine the slope,
and an average of all of them
determines the intercept."
An article about John Tukey's median-median, for the curious: http://www.johndcook.com/blog/2009/06/23/tukey-median-ninther/
Do you have an idea of where I could find this algorithm or an R function for it? In which package?
Thanks a lot!
There's a description of how to calculate the median-median line here. An R implementation of that is
median_median_line <- function(x, y, data)
{
  if(!missing(data))
  {
    x <- eval(substitute(x), data)
    y <- eval(substitute(y), data)
  }
  stopifnot(length(x) == length(y))
  #Step 1
  one_third_length <- floor(length(x) / 3)
  groups <- rep(1:3, times = switch((length(x) %% 3) + 1,
    one_third_length,
    c(one_third_length, one_third_length + 1, one_third_length),
    c(one_third_length + 1, one_third_length, one_third_length + 1)
  ))
  #Step 2
  x <- sort(x)
  y <- sort(y)
  #Step 3
  median_x <- tapply(x, groups, median)
  median_y <- tapply(y, groups, median)
  #Step 4
  slope <- (median_y[3] - median_y[1]) / (median_x[3] - median_x[1])
  intercept <- median_y[1] - slope * median_x[1]
  #Step 5
  middle_prediction <- intercept + slope * median_x[2]
  intercept <- intercept + (median_y[2] - middle_prediction) / 3
  c(intercept = unname(intercept), slope = unname(slope))
}
To test it, here's an example:
dfr <- data.frame(
  time = c(.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61, .61, .68, .72, .72, .83, .88, .89),
  distance = c(12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7, 150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1))
median_median_line(time, distance, dfr)
# intercept     slope
#    -113.6     520.0
Note the slightly odd way of specifying the groups. The instructions are quite picky about how you define group sizes, so the more obvious method of cut(x, quantile(x, seq.int(0, 1, 1/3))) doesn't work.
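For a cross-check outside R, here is a rough numpy translation of the same five steps (illustrative only; it mirrors the grouping convention above for the n %% 3 == 1 case, which is what the example data has):

import numpy as np

def median_median_line(x, y):
    x, y = np.sort(x), np.sort(y)                                       # Step 2
    k, r = divmod(len(x), 3)
    sizes = {0: [k, k, k], 1: [k, k + 1, k], 2: [k + 1, k, k + 1]}[r]   # Step 1
    bounds = np.cumsum([0] + sizes)
    mx = [np.median(x[bounds[i]:bounds[i + 1]]) for i in range(3)]      # Step 3
    my = [np.median(y[bounds[i]:bounds[i + 1]]) for i in range(3)]
    slope = (my[2] - my[0]) / (mx[2] - mx[0])                           # Step 4
    intercept = my[0] - slope * mx[0]
    intercept += (my[1] - (intercept + slope * mx[1])) / 3              # Step 5
    return intercept, slope

time = [.16, .24, .25, .30, .30, .32, .36, .36, .50, .50, .57, .61,
        .61, .68, .72, .72, .83, .88, .89]
distance = [12.1, 29.8, 32.7, 42.8, 44.2, 55.8, 63.5, 65.1, 124.6, 129.7,
            150.2, 182.2, 189.4, 220.4, 250.4, 261.0, 334.5, 375.5, 399.1]
print(median_median_line(time, distance))   # close to the R result above: (-113.6, 520.0)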
I'm a little late to the party, but have you tried line() from the stats package?
From the helpfile:
Value
An object of class "tukeyline".
References
Tukey, J. W. (1977). Exploratory Data Analysis, Reading Massachusetts: Addison-Wesley.
As a member of the R Core team, I have now dug into the source code and also studied its history.
Conclusion: The C source code, added in 1996/1997 when R was still in alpha (around version 0.14alpha), already computed the quantiles not quite correctly... for some sample sizes.
More about this on the R mailing lists (not posted there yet).
