vim rand() is not deterministic. Is this expected?

According to :help rand(),
rand([{expr}])
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
Examples:
:echo rand()
:let seed = srand()
:echo rand(seed)
:echo rand(seed) % 16 " random number 0 - 15
It doesn't explain how the seed is changed each time rand() is called, but I expected it to change deterministically, because
C++'s std::rand() does so,
and Wikipedia says
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm...
However, in the code below, the value of a is deterministic but the values of b are not deterministic; they take different values when you restart the script.
let seed = srand(0)
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
let seed = [0, 1, 2, 3]
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
Is this expected behavior? I think it contradicts the documentation.
Environment:
~ $ vi --version
VIM - Vi IMproved 8.2 (2019 Dec 12, compiled Apr 30 2020 13:32:36)
Included patches: 1-664

The algorithm used in Vim is fully deterministic. The confusion comes from the fact that calling rand(seed) updates the seed "in place" but does not update any internal value(s). Therefore any subsequent rand() without an argument uses a separate internal seed value, which is initialized more or less randomly (the quality depends on the platform). So if you want to produce a fully deterministic sequence, you must consistently call rand(seed) with the same variable.
This behaviour is easy to deduce from Vim's source code. Also :h rand() says that:
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
If you find the wording misleading you can open an issue on github.

The documentation is badly written but the behavior is actually the expected one from the source code's perspective.
Analysis
rand() is defined as f_rand() in src/evalfunc.c. From the snippet at the end of this answer, we know some things:
f_rand() has only two sets of static variables: gx, ..., gw and initialized.
gx, ..., gw are the internal seeds. Their values are touched and referenced only when f_rand() is called with no argument (i.e. when argvars[0].v_type == VAR_UNKNOWN).
initialized remembers if f_rand() has ever been called with no argument and it is also touched and referenced only when f_rand() is called with no argument.
When f_rand() is called with a seed,
The value of the seed is used once and that is not saved as a static variable. In other words, the sentence "{expr} can be initialized by srand() and will be updated by rand()" in the documentation is nothing but a "lie"; {expr} is not remembered and thus not updated by the subsequent f_rand().
The value of the seed is updated in place via the pointers lx, ..., lw.
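To make the two independent states concrete, here is a small Python model of f_rand(), ported by hand from the C macros quoted at the end of this answer. It is an illustration, not Vim code: the class VimLikeRand and its init_value argument are hypothetical, and Vim's init_srand() (which seeds the global state from the system) is not modeled.

```python
MASK32 = 0xFFFFFFFF


def rotl(x, k):
    # ROTL(x, k) on a 32-bit unsigned value
    return ((x << k) | (x >> (32 - k))) & MASK32


def splitmix32(x):
    # One SPLITMIX32(x, z) step: returns (updated x, output z)
    x = (x + 0x9E3779B9) & MASK32
    z = ((x ^ (x >> 16)) * 0x85EBCA6B) & MASK32
    z = ((z ^ (z >> 13)) * 0xC2B2AE35) & MASK32
    return x, z ^ (z >> 16)


def xoshiro128ss(state):
    # SHUFFLE_XOSHIRO128STARSTAR: advance a 4-word list in place, return the 32-bit result
    x, y, z, w = (v & MASK32 for v in state)
    result = (rotl((y * 5) & MASK32, 7) * 9) & MASK32
    t = (y << 9) & MASK32
    z ^= x
    w ^= y
    y ^= z
    x ^= w
    z ^= t
    w = rotl(w, 11)
    state[:] = [x, y, z, w]
    return result


class VimLikeRand:
    """Hypothetical model of f_rand(): one hidden global state plus caller-owned seeds."""

    def __init__(self, init_value):
        # init_value stands in for whatever init_srand() would produce (not modeled)
        self._init_value = init_value & MASK32
        self._global = None  # plays the role of gx, gy, gz, gw and `initialized`

    def rand(self, seed=None):
        if seed is None:
            if self._global is None:
                x, words = self._init_value, []
                for _ in range(4):
                    x, out = splitmix32(x)
                    words.append(out)
                self._global = words
            return xoshiro128ss(self._global)
        # rand(seed): only the caller's list changes; the global state is untouched
        return xoshiro128ss(seed)
```

In this model, the question's seed [0, 1, 2, 3] yields 5760 and is rewritten to [2, 3, 514, 4096], while a later rand() with no argument never looks at that list.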
Conclusion
The sentence
{expr} can be initialized by srand() and will be updated by rand()
shall be modified to
{expr} can be initialized by srand() and will be updated by rand({expr}). You may want to store a seed into a variable and pass it to rand() since {expr} is not remembered in the function.
If you need the deterministic rand(), do this:
let seed = srand(0)
let a = rand(seed) "The value of `seed` is changed in place.
let b = rand(seed) "ditto
echo [a, b]
The Source Code of rand()
#define ROTL(x, k) ((x << k) | (x >> (32 - k)))
#define SPLITMIX32(x, z) ( \
z = (x += 0x9e3779b9), \
z = (z ^ (z >> 16)) * 0x85ebca6b, \
z = (z ^ (z >> 13)) * 0xc2b2ae35, \
z ^ (z >> 16) \
)
#define SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w) \
result = ROTL(y * 5, 7) * 9; \
t = y << 9; \
z ^= x; \
w ^= y; \
y ^= z, x ^= w; \
z ^= t; \
w = ROTL(w, 11);
/*
* "rand()" function
*/
static void
f_rand(typval_T *argvars, typval_T *rettv)
{
list_T *l = NULL;
static UINT32_T gx, gy, gz, gw;
static int initialized = FALSE;
listitem_T *lx, *ly, *lz, *lw;
UINT32_T x, y, z, w, t, result;
if (argvars[0].v_type == VAR_UNKNOWN)
{
// When no argument is given use the global seed list.
if (initialized == FALSE)
{
// Initialize the global seed list.
init_srand(&x);
gx = SPLITMIX32(x, z);
gy = SPLITMIX32(x, z);
gz = SPLITMIX32(x, z);
gw = SPLITMIX32(x, z);
initialized = TRUE;
}
SHUFFLE_XOSHIRO128STARSTAR(gx, gy, gz, gw);
}
else if (argvars[0].v_type == VAR_LIST)
{
l = argvars[0].vval.v_list;
if (l == NULL || list_len(l) != 4)
goto theend;
lx = list_find(l, 0L);
ly = list_find(l, 1L);
lz = list_find(l, 2L);
lw = list_find(l, 3L);
if (lx->li_tv.v_type != VAR_NUMBER) goto theend;
if (ly->li_tv.v_type != VAR_NUMBER) goto theend;
if (lz->li_tv.v_type != VAR_NUMBER) goto theend;
if (lw->li_tv.v_type != VAR_NUMBER) goto theend;
x = (UINT32_T)lx->li_tv.vval.v_number;
y = (UINT32_T)ly->li_tv.vval.v_number;
z = (UINT32_T)lz->li_tv.vval.v_number;
w = (UINT32_T)lw->li_tv.vval.v_number;
SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w);
lx->li_tv.vval.v_number = (varnumber_T)x;
ly->li_tv.vval.v_number = (varnumber_T)y;
lz->li_tv.vval.v_number = (varnumber_T)z;
lw->li_tv.vval.v_number = (varnumber_T)w;
}
else
goto theend;
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = (varnumber_T)result;
return;
theend:
semsg(_(e_invarg2), tv_get_string(&argvars[0]));
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = -1;
}


For given two integers A and B, find a pair of numbers X and Y such that A = X*Y and B = X xor Y

I'm struggling with a problem I found in a competitive programming book, which gives no solution. For given two integers A and B (each fits in a 64-bit integer type), where A is odd, find a pair of numbers X and Y such that A = X*Y and B = X xor Y.
My approach was to list all divisors of A and try pairing numbers under sqrt(A) with numbers over sqrt(A) that multiply up to A and see if their xor is equal to B. But I don't know if that's efficient enough.
What would be a good solution/algorithm to this problem?
You know that at least one factor is <= sqrt(A). Let's make that one X.
The length of X in bits will be about half the length of A.
The upper bits of X (the ones higher in value than sqrt(A)) are therefore all 0, so the corresponding bits in B must equal the corresponding bits in Y.
Knowing the upper bits of Y gives you a pretty small range for the corresponding factor X = A/Y. Calculate Xmin and Xmax corresponding to the largest and smallest possible values for Y, respectively. Remember that Xmax must also be <= sqrt(A).
Then just try all the possible Xs between Xmin and Xmax. There won't be too many, so it won't take very long.
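A Python sketch of this upper-bits idea (Python 3.8+ for math.isqrt); the function name is illustrative. In this sketch, the candidate range is small when Y is much larger than sqrt(A), but it can become large when X and Y are both close to sqrt(A), because then few of Y's bits are pinned down by B.

```python
from math import isqrt


def solve_by_high_bits(A, B):
    """Return every (X, Y) with X <= Y, X * Y == A and X ^ Y == B (A >= 1)."""
    root = isqrt(A)
    k = root.bit_length()            # X <= root < 2**k, so X has no bits at positions >= k
    y_high = (B >> k) << k           # ...therefore Y's bits at positions >= k equal B's
    y_max = y_high | ((1 << k) - 1)  # largest Y consistent with those upper bits
    x_min = max(1, A // y_max)       # the largest Y gives the smallest X
    x_max = root if y_high == 0 else min(root, A // y_high)
    solutions = []
    for x in range(x_min, x_max + 1):
        if A % x == 0 and x ^ (A // x) == B:
            solutions.append((x, A // x))
    return solutions
```

For example, solve_by_high_bits(15, 6) only has to try X = 2 and X = 3 before returning [(3, 5)].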
The other straightforward way to solve this problem relies on the fact that the lower n bits of XY and X xor Y depend only on the lower n bits of X and Y. Therefore, you can use the possible answers for the lower n bits to restrict the possible answers for the lower n+1 bits, until you're done.
I've worked out that, unfortunately, there can be more than one possibility for a single n. I don't know how often there will be a lot of possibilities, but it's probably not too often if at all, so this may be fine in a competitive context. Probabilistically, there will only be a few possibilities, since a solution for n bits will provide either 0 or two solutions for n+1 bits, with equal probability.
It seems to work out pretty well for random input. Here's the code I used to test it:
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class XorProduct // class name is arbitrary
{
    public static void solve(long A, long B)
    {
        List<Long> sols = new ArrayList<>();
        List<Long> prevSols = new ArrayList<>();
        sols.add(0L);
        long tests=0;
        System.out.print("Solving "+A+","+B+"... ");
        for (long bit=1; (A/bit)>=bit; bit<<=1)
        {
            tests += sols.size();
            {
                List<Long> t = prevSols;
                prevSols = sols;
                sols = t;
            }
            final long mask = bit|(bit-1);
            sols.clear();
            for (long prevx : prevSols)
            {
                long prevy = (prevx^B) & mask;
                if ((((prevx*prevy)^A)&mask) == 0)
                {
                    sols.add(prevx);
                }
                long x = prevx | bit;
                long y = (x^B)&mask;
                if ((((x*y)^A)&mask) == 0)
                {
                    sols.add(x);
                }
            }
        }
        tests += sols.size();
        {
            List<Long> t = prevSols;
            prevSols = sols;
            sols = t;
        }
        sols.clear();
        for (long testx: prevSols)
        {
            if (A/testx >= testx)
            {
                long testy = B^testx;
                if (testx * testy == A)
                {
                    sols.add(testx);
                }
            }
        }
        System.out.println("" + tests + " checks -> X=" + sols);
    }
    public static void main(String[] args)
    {
        Random rand = new Random();
        for (int range=Integer.MAX_VALUE; range > 32; range -= (range>>5))
        {
            long A = rand.nextLong() & Long.MAX_VALUE;
            long X = (rand.nextInt(range)) + 2L;
            X|=1;
            long Y = A/X;
            if (Y==0)
            {
                Y = rand.nextInt(65536);
            }
            Y|=1;
            solve(X*Y, X^Y);
        }
    }
}
You can see the results here: https://ideone.com/cEuHkQ
Looks like it usually only takes a couple thousand checks.
Here's a simple recursion that observes the rules we know: (1) the least significant bits of both X and Y are set since only odd multiplicands yield an odd multiple; (2) if we set X to have the highest set bit of B, Y cannot be greater than sqrt(A); and (3) set bits in X or Y according to the current bit in B.
The following Python code resulted in under 300 iterations for all but one of the random pairs I picked from Matt Timmermans' example code. But the first one took 231,199 iterations :)
from math import sqrt

def f(A, B):
    i = 64
    while not ((1<<i) & B):
        i = i - 1
    X = 1 | (1 << i)
    sqrtA = int(sqrt(A))
    j = 64
    while not ((1<<j) & sqrtA):
        j = j - 1
    if (j > i):
        i = j + 1
    memo = {"it": 0, "stop": False, "solution": []}
    def g(b, x, y):
        memo["it"] = memo["it"] + 1
        if memo["stop"]:
            return []
        if y > sqrtA or y * x > A:
            return []
        if b == 0:
            if x * y == A:
                memo["solution"].append((x, y))
                memo["stop"] = True
                return [(x, y)]
            else:
                return []
        bit = 1 << b
        if B & bit:
            return g(b - 1, x, y | bit) + g(b - 1, x | bit, y)
        else:
            return g(b - 1, x | bit, y | bit) + g(b - 1, x, y)
    g(i - 1, X, 1)
    return memo

vals = [
    (6872997084689100999, 2637233646), # 1048 checks with Matt's code
    (3461781732514363153, 262193934464), # 8756 checks with Matt's code
    (931590259044275343, 5343859294), # 4628 checks with Matt's code
    (2390503072583010999, 22219728382), # 5188 checks with Matt's code
    (412975927819062465, 9399702487040), # 8324 checks with Matt's code
    (9105477787064988985, 211755297373604352), # 3204 checks with Matt's code
    (4978113409908739575,67966612030), # 5232 checks with Matt's code
    (6175356111962773143,1264664368613886), # 3756 checks with Matt's code
    (648518352783802375, 6) # B smaller than sqrt(A)
]

for A, B in vals:
    memo = f(A, B)
    [(x, y)] = memo["solution"]
    print("x, y: %s, %s" % (x, y))
    print("A: %s" % A)
    print("x*y: %s" % (x * y))
    print("B: %s" % B)
    print("x^y: %s" % (x ^ y))
    print("%s iterations" % memo["it"])
    print("")
Output:
x, y: 4251585939, 1616572541
A: 6872997084689100999
x*y: 6872997084689100999
B: 2637233646
x^y: 2637233646
231199 iterations
x, y: 262180735447, 13203799
A: 3461781732514363153
x*y: 3461781732514363153
B: 262193934464
x^y: 262193934464
73 iterations
x, y: 5171068311, 180154313
A: 931590259044275343
x*y: 931590259044275343
B: 5343859294
x^y: 5343859294
257 iterations
x, y: 22180179939, 107776541
A: 2390503072583010999
x*y: 2390503072583010999
B: 22219728382
x^y: 22219728382
67 iterations
x, y: 9399702465439, 43935
A: 412975927819062465
x*y: 412975927819062465
B: 9399702487040
x^y: 9399702487040
85 iterations
x, y: 211755297373604395, 43
A: 9105477787064988985
x*y: 9105477787064988985
B: 211755297373604352
x^y: 211755297373604352
113 iterations
x, y: 68039759325, 73164771
A: 4978113409908739575
x*y: 4978113409908739575
B: 67966612030
x^y: 67966612030
69 iterations
x, y: 1264664368618221, 4883
A: 6175356111962773143
x*y: 6175356111962773143
B: 1264664368613886
x^y: 1264664368613886
99 iterations
x, y: 805306375, 805306369
A: 648518352783802375
x*y: 648518352783802375
B: 6
x^y: 6
59 iterations

Modular Inverse Calculation

Given arbitrary integers p, g, and r, and given y such that y = g^x mod p, where x is an unknown integer, how would one solve for C, where C = g^r • (g^x)^(-1) mod p?
My math is below, but when I input it in a verifier function, it says the answer is incorrect.
y•u ≡ 1 (mod p)
y•u = 1 + mp
uy - mp = 1
where u is the inverse of y and m is an integer (such an m must exist for the inverse to be defined)
If I understand correctly, you are looking for an inverse modpow. The math is like this:
ab = a^b % p
ab + c*p = a^b
log(ab+c*p)/log(a) = b
(ab+c*p)^(1/b) = a
where c is an integer, c = {0,1,2,3,4...}, converting between normal and modular arithmetic. In your case you want to compute b. The problem is that log(ab+c*p)/log(a) grows very slowly with increasing c if p is not much bigger than a, so in that case it is faster to try all values of b until one fits, something like this in C++:
//---------------------------------------------------------------------------
ALU32 alu;
DWORD modmul(DWORD a,DWORD b,DWORD p) // ans = a*b % p
{
DWORD ch,cl,c,d;
alu.mul(ch,cl,a,b);
alu.div(c,d,ch,cl,p);
return d;
}
//---------------------------------------------------------------------------
DWORD modinv(DWORD a,DWORD p) // a * ans % p = 1
{
DWORD b,c,db,dc,i=0;
db=p/a;
dc=db*a;
for (b=1,c=a;b<p;i++)
{
if (c==1) return b;
b+=db; c+=dc;
while (c<p){ b++; c+=a; }
c-=p;
}
return 0;
}
//---------------------------------------------------------------------------
DWORD modpow(DWORD a,DWORD b,DWORD p) // ans = a^b % p
{ // b is not mod(p) !
DWORD i,d=1;
for (a%=p,i=0;i<32;i++,b<<=1)
{
d=modmul(d,d,p);
if (DWORD(b&0x80000000)) d=modmul(d,a,p);
}
return d;
}
//---------------------------------------------------------------------------
DWORD imodpow(DWORD ab,DWORD a,DWORD p) // ab = a^ans % p
{ // ans is not mod(p) !
DWORD b,AB;
for (AB=1,b=0;;)
{
if (AB==ab) return b;
b++; if (!b) return 0;
AB=modmul(AB,a,p);
}
}
//---------------------------------------------------------------------------
Of course this is very SLOW, which is why it is used in cryptography...
Also beware that there are multiple valid solutions, and the first one found might not be the one you are seeking, so you may need to add additional conditions...
ALU32.h can be found in the answer to "Cant make value propagate through carry".
And the modular arithmetic is based on this: Modular arithmetics and NTT (finite field DFT) optimizations
Here is a sample for comparison (ignore the VCL and tbeg/tend/tstr functions):
DWORD a=87654321,b=12345678,p=0xC0000001,ab,bb;
tbeg(); ab=modpow(a,b,p); tend(); mm_log->Lines->Add(AnsiString().sprintf("%8u^%8u mod %u = %u ",a,b ,p,ab)+tstr(1));
tbeg(); bb=imodpow(ab,a,p); tend(); mm_log->Lines->Add(AnsiString().sprintf("%8u^%8u mod %u = %u ",a,bb,p,ab)+tstr(1));
and output:
87654321^12345678 mod 3221225473 = 3038293251 [ 0.002 ms]
87654321^12345678 mod 3221225473 = 3038293251 [ 421.910 ms]
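For comparison, the same brute-force search is short in Python, whose built-in three-argument pow() already performs fast modular exponentiation (the role of modpow above). This imodpow is a hand port for illustration; like the C++ version it returns the smallest matching exponent, which may differ from the one originally used.

```python
def imodpow(ab, a, p):
    """Smallest b >= 0 with pow(a, b, p) == ab % p, by brute force; None if there is none."""
    target = ab % p
    acc = 1 % p                      # a**0 mod p
    for b in range(p):               # the first p powers already contain every reachable residue
        if acc == target:
            return b
        acc = acc * a % p
    return None
```

For example, 2^5 mod 7 = 4 = 2^2 mod 7, so imodpow(pow(2, 5, 7), 2, 7) returns 2 rather than 5, which is exactly the multiple-solutions caveat below.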
PS.
There might be more advanced approaches from number theory if p is special (prime, a product of two primes, or even an n-th root of unity...), but that is a galaxy far, far away from my expertise.
[edit1]
From your newly posted question it is finally clearer that you really just wanted the modular inverse, which has nothing to do with imodpow. So what you want is this:
a*b % p = 1
where b is unknown, so simply try all b in increasing order (a*b % p is just the remainder after division by p), and if the result is 1 you have found your answer. I updated the code above with a modinv function doing exactly that, plus some optimization. However, I think there are even faster approaches for this using GCD or something similar.
Here is another test sample:
DWORD a=87654321,b=12345678,p=0xC0000001,ab,bb;
ab=modmul(a,b,p);
tbeg(); bb=modinv(b,p); tend(); mm_log->Lines->Add(AnsiString().sprintf(" 1/%8u mod %u = %u ",b,p,bb)+tstr(1));
tbeg(); a =modmul(b,bb,p); tend(); mm_log->Lines->Add(AnsiString().sprintf("%8u*%8u mod %u = %u ",b,bb,p,a)+tstr(1));
tbeg(); a =modmul(ab,bb,p); tend(); mm_log->Lines->Add(AnsiString().sprintf("%8u*%8u mod %u = %u ",ab,bb,p,a)+tstr(1));
And output:
1/12345678 mod 3221225473 = 165081805 [ 4.999 ms]
12345678*165081805 mod 3221225473 = 1 [ 0.000 ms]
652073126*165081805 mod 3221225473 = 87654321 [ 0.000 ms]
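The GCD-based approach hinted at above is the extended Euclidean algorithm, which finds the inverse in O(log p) steps instead of a linear search. A minimal Python sketch (the name modinv mirrors the C++ function but is otherwise independent of it); Python 3.8+ also provides pow(b, -1, p) for the same purpose.

```python
def modinv(b, p):
    """Inverse of b modulo p via the extended Euclidean algorithm; None if gcd(b, p) != 1."""
    old_r, r = b % p, p
    old_s, s = 1, 0
    # invariant: old_r == old_s * b (mod p) and r == s * b (mod p)
    while r:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
    if old_r != 1:                   # old_r is now gcd(b, p)
        return None
    return old_s % p
```

For the question's C = g^r • (g^x)^(-1) mod p, this gives pow(g, r, p) * modinv(y, p) % p without ever recovering x.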

Lua table hash indexing is faster than array indexing in my speed test. Why?

I am doing some tests to see where I can improve the performance of my lua code.
I was reading this document: https://www.lua.org/gems/sample.pdf
and I thought using integers as table indices should be considerably faster since it uses the array part of tables and does not require hashing.
So I've written this test program:
print('local x=0 local y=0 local z=0')
local x=0 local y=0 local z=0
t0 = os.clock()
for i=1,1e7 do
x = 1
y = 2
z = 3
end
print(os.clock()-t0 .. "\n")
print("tab = {1,2,3}")
tab = {1,2,3}
t0 = os.clock()
for i=1,1e7 do
tab[1] = 1
tab[2] = 2
tab[3] = 3
end
print(os.clock()-t0 .. "\n")
print("tab = {[1]=1,[2]=2,[3]=3}")
tab = {[1]=1,[2]=2,[3]=3}
t0 = os.clock()
for i=1,1e7 do
tab[1] = 1
tab[2] = 2
tab[3] = 3
end
print(os.clock()-t0 .. "\n")
print("tab = {a=1,b=2,c=3}")
tab = {a=1,b=2,c=3}
t0 = os.clock()
for i=1,1e7 do
tab.a = 1
tab.b = 2
tab.c = 3
end
print(os.clock()-t0 .. "\n")
print('tab = {["bli"]=1,["bla"]=2,["blu"]=3}')
tab = {["bli"]=1,["bla"]=2,["blu"]=3}
t0 = os.clock()
for i=1,1e7 do
tab["bli"] = 1
tab["bla"] = 2
tab["blu"] = 3
end
print(os.clock()-t0 .. "\n")
print("tab = {verylongfieldname=1,anotherevenlongerfieldname=2,superincrediblylongfieldname=3}")
tab = {verylongfieldname=1,anotherevenlongerfieldname=2,superincrediblylongfieldname=3}
t0 = os.clock()
for i=1,1e7 do
tab.verylongfieldname = 1
tab.anotherevenlongerfieldname = 2
tab.superincrediblylongfieldname = 3
end
print(os.clock()-t0 .. "\n")
print('local f = function(p1, p2, p3)')
local f = function(p1, p2, p3)
x = p1
y = p2
z = p3
return x,y,z
end
local a=0
local b=0
local c=0
t0 = os.clock()
for i=1,1e7 do
a,b,c = f(1,2,3)
end
print(os.clock()-t0 .. "\n")
print('local g = function(params)')
local g = function(params)
x = params.p1
y = params.p2
z = params.p3
return {x,y,z}
end
t0 = os.clock()
for i=1,1e7 do
t = g{p1=1, p2=2, p3=3}
end
print(os.clock()-t0 .. "\n")
I've ordered the blocks by what I expected to be increasing time consumption. (I wasn't sure about the function calls, that was just a test.) But here are the surprising results:
local x=0 local y=0 local z=0
0.093613
tab = {1,2,3}
0.678514
tab = {[1]=1,[2]=2,[3]=3}
0.83678
tab = {a=1,b=2,c=3}
0.62888
tab = {["bli"]=1,["bla"]=2,["blu"]=3}
0.733916
tab = {verylongfieldname=1,anotherevenlongerfieldname=2,superincrediblylongfieldname=3}
0.536726
local f = function(p1, p2, p3)
0.475592
local g = function(params)
3.576475
Even the long field names, which I expected to take the longest to hash, are faster than array access with integers. Am I doing something wrong?
The 6th page (actual page 20) of the document you linked explains what you are seeing.
If you write something like {[1] = true, [2] = true, [3] = true}, however, Lua is not smart enough to detect that the given expressions (literal numbers, in this case) describe array indices, so it creates a table with four slots in its hash part, wasting memory and CPU time.
You only gain a major benefit from the array part when you construct the table without explicit keys:
table = {1,2,3}
If you are reading/writing to a table or array that already exists you will not see a large deviation in processing time.
The example in the document includes the creation of the table inside the for loop:
for i = 1, 1000000 do
local a = {true, true, true}
a[1] = 1; a[2] = 2; a[3] = 3
end
Results with all variables declared local inside the loops (edit: lengthened the long strings beyond 40 bytes, as pointed out by siffiejoe):
local x=0 local y=0 local z=0
0.18
tab = {1,2,3}
3.089
tab = {[1]=1,[2]=2,[3]=3}
4.59
tab = {a=1,b=2,c=3}
3.79
tab = {["bli"]=1,["bla"]=2,["blu"]=3}
3.967
tab = {verylongfieldnameverylongfieldnameverylongfieldname=1,anotherevenlongerfieldnameanotherevenlongerfieldname=2,superincrediblylongfieldnamesuperincrediblylongfieldname=3}
4.013
local f = function(p1, p2, p3)
1.238
local g = function(params)
6.325
Additionally, Lua performs hashing differently for different key types.
The source code can be viewed in ltable.c from Lua 5.2.4, which contains the code discussed below.
The mainposition function decides which hash to perform:
/*
** returns the `main' position of an element in a table (that is, the index
** of its hash value)
*/
static Node *mainposition (const Table *t, const TValue *key) {
switch (ttype(key)) {
case LUA_TNUMBER:
return hashnum(t, nvalue(key));
case LUA_TLNGSTR: {
TString *s = rawtsvalue(key);
if (s->tsv.extra == 0) { /* no hash? */
s->tsv.hash = luaS_hash(getstr(s), s->tsv.len, s->tsv.hash);
s->tsv.extra = 1; /* now it has its hash */
}
return hashstr(t, rawtsvalue(key));
}
case LUA_TSHRSTR:
return hashstr(t, rawtsvalue(key));
case LUA_TBOOLEAN:
return hashboolean(t, bvalue(key));
case LUA_TLIGHTUSERDATA:
return hashpointer(t, pvalue(key));
case LUA_TLCF:
return hashpointer(t, fvalue(key));
default:
return hashpointer(t, gcvalue(key));
}
}
When the key is a lua_Number, mainposition calls hashnum:
/*
** hash for lua_Numbers
*/
static Node *hashnum (const Table *t, lua_Number n) {
int i;
luai_hashnum(i, n);
if (i < 0) {
if (cast(unsigned int, i) == 0u - i) /* use unsigned to avoid overflows */
i = 0; /* handle INT_MIN */
i = -i; /* must be a positive value */
}
return hashmod(t, i);
}
Here are the other hash implementations for the other types:
#define hashpow2(t,n) (gnode(t, lmod((n), sizenode(t))))
#define hashstr(t,str) hashpow2(t, (str)->tsv.hash)
#define hashboolean(t,p) hashpow2(t, p)
/*
** for some types, it is better to avoid modulus by power of 2, as
** they tend to have many 2 factors.
*/
#define hashmod(t,n) (gnode(t, ((n) % ((sizenode(t)-1)|1))))
#define hashpointer(t,p) hashmod(t, IntPoint(p))
These hashes resolve to two paths, hashpow2 and hashmod: LUA_TNUMBER keys go through hashnum and then hashmod, while LUA_TSHRSTR keys go through hashstr and then hashpow2.
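The practical difference between the two paths is plain integer arithmetic. A Python sketch of the two macros, where sizenode stands for sizenode(t) (a power of two in ltable.c) and the inputs are made-up example values rather than real Lua hashes:

```python
def hashpow2(h, sizenode):
    # lmod(h, sizenode): keep only the low bits; sizenode is a power of two
    return h & (sizenode - 1)


def hashmod(n, sizenode):
    # n % ((sizenode - 1) | 1): divide by an odd number instead of a power of two
    return n % ((sizenode - 1) | 1)
```

With 8 nodes, the keys 8, 16, 24 and 32 all land in slot 0 under hashpow2 but in slots 1, 2, 3 and 4 under hashmod, which is the "many 2 factors" problem the comment in ltable.c describes.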

Use two random functions to get a specific random function

There are two random functions f1(),f2().
f1() returns 1 with probability p1, and 0 with probability 1-p1.
f2() returns 1 with probability p2, and 0 with probability 1-p2.
I want to implement a new function f3() which returns 1 with probability p3 (a given probability) and 0 with probability 1-p3. In the implementation of f3(), we can use f1() and f2(), but no other random function.
If p3=0.5, an example implementation is:
int f3()
{
    do
    {
        int a = f1();
        int b = f1();
        if (a==b) continue;
        // when we reach here,
        // (a,b) == (1,0) with probability p1(1-p1)
        // (a,b) == (0,1) with probability (1-p1)p1
        if (a==1) return 1; // so this returns 1 with probability 0.5
        if (b==1) return 0;
    } while(1);
}
This implementation of f3() gives a random function that returns 1 with probability 0.5 and 0 with probability 0.5. But how do I implement f3() with p3=0.4? I have no idea.
Is this task possible, and if so, how can f3() be implemented?
Thanks in advance.
p1 = 0.77 -- arbitrary value between 0 and 1
function f1()
  if math.random() < p1 then
    return 1
  else
    return 0
  end
end
-- f1() is enough. We don't need f2()
p3 = 0.4 -- arbitrary value between 0 and 1
--------------------------
function f3()
  local left = 0
  local right = 1
  repeat
    local middle = left + (right - left) * p1
    if f1() == 1 then
      right = middle
    else
      left = middle
    end
    if right < p3 then -- completely below
      return 1
    elseif left >= p3 then -- completely above
      return 0
    end
  until false -- loop forever
end
This can be solved if p3 is a rational number.
We should use conditional probabilities for this.
For example, if you want to make this for p3=0.4, the method is the following:
Calculate the fractional form of p3. In our case it is p3=0.4=2/5.
Now generate as many random variables from the same distribution (let's say, from f1, we won't use f2 anyway) as the denominator, call them X1, X2, X3, X4, X5.
We should regenerate all these random X variables until their sum equals the numerator in the fractional form of p3.
Once this is achieved, just return X1 (or any other Xn, where n was chosen independently of the values of the X variables). Given that their sum equals the numerator, exactly two of the five X variables are 1, and every arrangement of those two 1s is equally likely, so the probability that X1 is 1 is exactly 2/5 = p3.
For irrational p3, this method cannot be used with f1 alone. I'm not sure, but I think it can be solved for p3 of the form p1*q+p2*(1-q), where q is rational, with a similar method: generate the appropriate number of Xs with distribution f1 and Ys with distribution f2 until they have a specific predefined sum, then return one of them. This still needs to be worked out in detail.
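A Python sketch of the conditioning method for rational p3. make_bernoulli is a hypothetical stand-in for f1() that uses Python's random module only to simulate it. p3 should be passed as an exact fraction such as "2/5" or Fraction(2, 5); a float like 0.4 would turn into a fraction with a huge denominator.

```python
import random
from fractions import Fraction


def make_bernoulli(p, rng):
    # Hypothetical stand-in for f1(): returns 1 with probability p (0 < p < 1)
    return lambda: 1 if rng.random() < p else 0


def f3_conditioned(f1, p3):
    """Return 1 with probability p3 (a rational k/n) using only calls to f1()."""
    p3 = Fraction(p3)
    k, n = p3.numerator, p3.denominator
    while True:
        xs = [f1() for _ in range(n)]   # n independent draws X1..Xn
        if sum(xs) == k:                # keep only batches with exactly k ones
            return xs[0]                # by symmetry, P(X1 == 1) == k / n
```

The expected number of batches is 1 / P(sum == k), so the method is slowest when p1 is far from p3.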
First, that's a nice brain teaser. I managed to solve the problem for p3 = 0.4, which is what you asked for! I think generalizing it is not so trivial. :D
Here is how, you can solve it for p3 = 0.4:
The intuition comes from your example. If we generate a number from f1() five times in an iteration (see the code below), we can get 32 kinds of results, like below:
1: 00000
2: 00001
3: 00010
4: 00011
.....
.....
32: 11111
Among these, there are exactly 10 results with exactly two 1s in them! Once you see this, the problem becomes simple: return 1 for 4 of those combinations and 0 for the other 6 (probability 0.4 means getting 1 four times out of ten). You can do that like below:
int f3()
{
    do {
        int a[5];
        int numberOfOneInA = 0;
        for(int i = 0; i < 5; i++){
            a[i] = f1();
            if(a[i] == 1){
                numberOfOneInA++;
            }
        }
        if (numberOfOneInA != 2) continue;
        else return a[0]; //out of 10 times, 4 times a[0] is 1!
    } while(1);
}
Waiting to see a generalised solution.
Cheers!
Here is an idea that will work when p3 is of a form a/2^n (a rational number with a denominator that is a power of 2).
Generate n random numbers with probability distribution of 0.5:
x1, x2, ..., xn
Interpret this as a binary number in the range 0...2^n-1; each number in this range has equal probability. If this number is less than a, return 1, else return 0.
Now, since this question is in a computer science context, it seems reasonable to assume that p3 is of the form a/2^n (this is a common representation of numbers in computers).
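A Python sketch of this idea, taking the fair bits from f1() with the question's own p3 = 0.5 trick (von Neumann's method); the function names are illustrative.

```python
def fair_bit_from(f1):
    """Von Neumann's trick (the p3 = 0.5 example in the question): biased bits -> fair bits."""
    def fair_bit():
        while True:
            a, b = f1(), f1()
            if a != b:          # (1, 0) and (0, 1) are equally likely
                return a
    return fair_bit


def f3_dyadic(fair_bit, a, n):
    """Return 1 with probability a / 2**n, using n fair bits."""
    value = 0
    for _ in range(n):
        value = (value << 1) | fair_bit()   # uniform integer in [0, 2**n - 1]
    return 1 if value < a else 0
```

Usage: f3_dyadic(fair_bit_from(f1), 3, 3) returns 1 with probability 3/8, because exactly 3 of the 8 equally likely 3-bit values are below 3.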
I implemented the ideas of anatolyg and Egor:
#include <cstdlib>

long long mystep = 0;     // counts rand_P2 loop iterations (anatolyg's method)
long long dashenstep = 0; // counts rand2_P3 loop iterations (Egor's method)

// named random01 to avoid clashing with the POSIX random() that <cstdlib> may declare
inline double random01(void)
{
    return static_cast<double>(rand()) / static_cast<double>(RAND_MAX);
}
const double p1 = 0.8;
int rand_P1(void)
{
    return random01() < p1;
}
int rand_P2(void)// fair bit: returns 0 or 1 with probability 0.5 each
{
    int x, y; while (1)
    {
        mystep++;
        x = rand_P1(); y = rand_P1();
        if (x ^ y) return x;
    }
}
double p3 = random01();
int rand_P3(void)//anatolyg's idea
{
    double tp = p3; int bit, x;
    while (1)
    {
        if (tp * 2 >= 1) {bit = 1; tp = tp * 2 - 1;}
        else {bit = 0; tp = tp * 2;}
        x = rand_P2();
        if (bit ^ x) return bit;
    }
}
int rand2_P3(void)//Egor's idea
{
    double left = 0, right = 1, mid;
    while (1)
    {
        dashenstep++;
        mid = left + (right - left) * p1;
        int x = rand_P1();
        if (x) right = mid; else left = mid;
        if (right < p3) return 1;
        if (left > p3) return 0;
    }
}
After working through the math, I get that, assuming p3 is uniformly distributed in [0,1), the expected number of loop iterations is (1-p1^2-(1-p1)^2)^(-1) for Egor's method and 2(1-p1^2-(1-p1)^2)^(-1) for anatolyg's.
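These expectations can be checked numerically. A Python sketch that ports both functions, counting Egor's loop iterations and anatolyg's rand_P2 iterations (what dashenstep and mystep count), and averages them with p3 drawn uniformly from [0, 1); the seed and trial count are arbitrary choices.

```python
import random


def egor_steps(f1, p1, p3):
    """Egor's interval method; returns (result, number of f1() calls)."""
    left, right, steps = 0.0, 1.0, 0
    while True:
        steps += 1
        mid = left + (right - left) * p1
        if f1():
            right = mid          # lower part, chosen with probability p1
        else:
            left = mid
        if right < p3:
            return 1, steps
        if left > p3:
            return 0, steps


def anatolyg_steps(f1, p3):
    """anatolyg's bit-comparison method; returns (result, rand_P2 loop iterations)."""
    tp, iterations = p3, 0
    while True:
        if tp * 2 >= 1:          # next binary digit of p3
            bit, tp = 1, tp * 2 - 1
        else:
            bit, tp = 0, tp * 2
        while True:              # rand_P2: von Neumann fair bit, two f1() calls per try
            iterations += 1
            x, y = f1(), f1()
            if x != y:
                break
        if bit != x:
            return bit, iterations


def average_steps(p1, trials, seed=0):
    """Average step counts over `trials` runs, drawing a fresh uniform p3 each time."""
    rng = random.Random(seed)
    f1 = lambda: 1 if rng.random() < p1 else 0
    egor = anat = 0
    for _ in range(trials):
        p3 = rng.random()
        egor += egor_steps(f1, p1, p3)[1]
        anat += anatolyg_steps(f1, p3)[1]
    return egor / trials, anat / trials
```

Note that 1-p1^2-(1-p1)^2 equals 2*p1*(1-p1), the probability that two f1() calls differ; with p1 = 0.8 the formulas give 3.125 and 6.25.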
Algorithmically, yes, the task is possible.
Programmatically it is possible too, but it is a complex problem.
Let's take an example.
Let
F1(1) = .5, which means F1(0) = .5
F2(1) = .8, which means F2(0) = .2
Suppose you need an F3 such that F3(1) = .128.
Let's try decomposing it:
.128
= (2^7)*(10^-3) // decompose this into known values
= (8/10)*(8/10)*(2/10)
= F2(1)&F2(1)*(20/100) // as no Fi(1) == 2/10
= F2(1)&F2(1)*(5/10)*(4/10)
= F2(1)&F2(1)&F1(1)*(40/100)
= F2(1)&F2(1)&F1(1)*(8/10)*(5/10)
= F2(1)&F2(1)&F1(1)&F2(1)&F1(1)
So F3(1) = .128 if we define F3() = F2()&F2()&F2()&F1()&F1(), with every call independent: .8*.8*.8*.5*.5 = .128.
Similarly, if you want F4(1) = .9, define F4() = F1() | F2().
F4 is 0 only when both calls return 0, so F4(0) = F1(0)*F2(0) = .5*.2 = .1, which means F4(1) = 1 - 0.1 = 0.9.
So combining f1 and f2 with the operations & (and), | (or), ! (not), and ^ (xor) can give you an F3 made purely out of f1 and f2,
although finding the combination that gives exactly the required probability may be a hard (possibly NP-hard) problem.
So, finally, the answer to your question of whether it is possible is yes, and this is one way of doing it; many optimizations can be applied on top of it.
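Because every call is independent, the probability of such an expression can be computed exactly before trying it. A small Python helper using exact fractions (the function names are illustrative):

```python
from fractions import Fraction
from functools import reduce


def p_and(*ps):
    """P(every independent call returns 1) for calls combined with &."""
    return reduce(lambda acc, p: acc * p, ps, Fraction(1))


def p_or(*ps):
    """P(at least one independent call returns 1) for calls combined with |."""
    return 1 - reduce(lambda acc, p: acc * (1 - p), ps, Fraction(1))


def p_not(p):
    """P(!call == 1)."""
    return 1 - p


def p_xor(p, q):
    """P(a ^ b == 1) for two independent calls."""
    return p * (1 - q) + q * (1 - p)
```

With F1 = 1/2 and F2 = 4/5, p_and(F2, F2, F2, F1, F1) is 16/125 (= .128) and p_or(F1, F2) is 9/10, matching the derivation above.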

Python performance: iteration and operations on nested lists

Problem
Hey folks. I'm looking for some advice on Python performance. Some background on my problem:
Given:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.
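One alternative that is not among the implementations discussed in this thread is a 2D difference array (the summed-area-table trick): record four ±1 marks per input, then take prefix sums once at the end, so each node is touched once instead of once per covering window. The sketch below is plain Python with no third-party packages; it assumes the simplified update rule from the question (increment, saturating at 255) and the window bounds used in f4, and the real update rules may not fit this pattern.

```python
def count_windows(x, y, coords, z, cap=255):
    """Final mesh values: each node's count of covering windows, saturated at `cap`.

    Window bounds follow f4: rows max(0, cx - z) .. min(cx + z, x) - 1 and
    columns max(0, cy - z) .. min(cy + z, y) - 1. Since every update adds 1
    unless the node is already at cap, the final value is min(count, cap).
    """
    # (x + 1) by (y + 1) difference grid; the extra row/column hold the "end" marks
    diff = [[0] * (y + 1) for _ in range(x + 1)]
    for cx, cy in coords:
        b, t = max(0, cx - z), min(cx + z, x)
        l, r = max(0, cy - z), min(cy + z, y)
        diff[b][l] += 1
        diff[b][r] -= 1
        diff[t][l] -= 1
        diff[t][r] += 1
    rows = []
    column_sums = [0] * y          # running sums down each column
    for i in range(x):
        running = 0                # running sum along row i of diff
        drow = diff[i]
        for j in range(y):
            running += drow[j]
            column_sums[j] += running
        rows.append([v if v < cap else cap for v in column_sums])
    return rows
```

Usage for the base case would be count_windows(1024, 1024, coords, 75) with 400 random (x, y) pairs in coords; no timings are claimed here.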
Current results
Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question
How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please keep the recommendations to native Python. I know I could move to a third-party package such as numpy, but I'm trying to avoid third-party packages. Also, I've generated random input coordinates and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
Thanks much!
f1 is the initial naive implementation: three nested for loops.
import random

def f1(x,y,n,z):
    rows = [[0]*y for i in xrange(x)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
                if rows[i][j] < 255: rows[i][j] += 1
f2 replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
    rows = [[0]*y for i in xrange(x)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            l = max(0, topleft[1])
            r = min(topleft[1]+(z*2), y)
            rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, specifically recommended against by Guido: local variable lookups are much faster than global or built-in variable lookups I hardcoded all but the reference to the main data structure itself to minimize that overhead.
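# module-level grid shared with g (needs module-level x and y, 1024 in the tests);
# note that it is not reset between calls to f3
rows = [[0]*y for i in xrange(x)]
def f3(x,y,n,z):
    inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
    map(g, inputs)  # Python 2 map() is eager, so g runs once per input and updates rows in place
def g(input):
    # z (75) and the grid size (1024) are hardcoded to avoid non-local lookups
    inputX, inputY = input
    topleft = (inputX - 75, inputY - 75)
    for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
        l = max(0, topleft[1])
        r = min(topleft[1]+(75*2), 1024)
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]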
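The cost of out-of-local-scope lookups can be illustrated by binding a global attribute to a local name once. A minimal sketch (Python 3 syntax; the function names are hypothetical) comparing a loop that looks up `random.random` on every call, one that aliases it first (as the later versions do with `rn = random.random`), and one that binds it as a default argument (the `x=x` style used in one of the answers). Timings vary by machine and Python version, so none are shown.

```python
import random
import timeit

def draw_global(n):
    # every iteration looks up the global `random`, then its `random` attribute
    return [random.random() for _ in range(n)]

def draw_local(n):
    # one attribute lookup up front; the loop then only reads the name `rn`
    rn = random.random
    return [rn() for _ in range(n)]

def draw_default(n, rn=random.random):
    # same idea, but the binding happens once, when `def` runs
    return [rn() for _ in range(n)]

if __name__ == "__main__":
    for f in (draw_global, draw_local, draw_default):
        t = timeit.timeit(lambda: f(10000), number=20)
        print("%-12s %.4f s" % (f.__name__, t))
```

All three draw from the same module-level generator, so with the same seed they return identical numbers; only the lookup cost differs.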
UPDATE3: ChristopheD also pointed out a couple of improvements.
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
    rn = random.random
    # g1 updates the module-level rows in place; the list of Nones built here is discarded
    [g1(px, py, z) for px, py in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]
def g1(x, y, z):
    # x, y are the point's coordinates here; 1024 is the hardcoded grid size
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
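def f4(x,y,n,z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__  # low 8 bits only: 256 would wrap to 0, so this caps only while counts stay below 256
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])  # increment without a cap...
    for i in range(x):
        rows[i] = map(sat, rows[i])  # ...then apply sat once at the end (Python 2: map returns a list)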
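Note that `(0xff).__and__` masks a value to its low 8 bits rather than capping it: 256 becomes 0 and 300 becomes 44. With the test parameters the average count is under 9 (400 × 150 × 150 / 1024², ignoring edge clipping), so reaching 256 is very unlikely and the results match a cap, but a clamp is safer if counts could grow. A minimal sketch (Python 3 syntax) contrasting the two; `mask` and `clamp` are names introduced here, with `functools.partial(min, 255)` as one possible native cap.

```python
from functools import partial

mask = (0xff).__and__      # v & 0xff: keeps the low 8 bits, wraps at 256
clamp = partial(min, 255)  # min(255, v): saturates at 255

values = [0, 254, 255, 256, 300]
# in Python 3, map() returns an iterator, hence list()
print(list(map(mask, values)))   # [0, 254, 255, 0, 44]
print(list(map(clamp, values)))  # [0, 254, 255, 255, 255]
```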
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds (improved by ChristopheD):
import timeit
def timing(f,x,y,n,z):
    # argument order matches f1..f4: f(x, y, n, z)
    fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, n, z)
    ctx = "from __main__ import %s" % f.__name__
    results = timeit.Timer(fn, ctx).timeit(10)  # total seconds for 10 calls
    return "%4.4s: %.3f" % (f.__name__, results / 10.0)
if __name__ == "__main__":
    print timing(f, 1024, 1024, 400, 75)
    #add more here.
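The harness above is Python 2 code (print statement, and it times functions that use `xrange`). A possible Python 3 equivalent, as a sketch: it times a callable with `timeit.repeat` and reports the best per-call average over several trials, the same best-of policy that `python -mtimeit` reports. `f_py3` is a hypothetical Python 3 port of the `f` version above, changed to return its grid so the result can be checked.

```python
import random
import timeit

def timing(f, x, y, n, z, number=10, repeat=3):
    # run f(x, y, n, z) `number` times per trial, `repeat` trials,
    # and report the best trial's average time per call
    trials = timeit.repeat(lambda: f(x, y, n, z), number=number, repeat=repeat)
    return "%s: %.3f s" % (f.__name__, min(trials) / number)

def f_py3(x, y, n, z):
    # Python 3 port of f: range instead of xrange, and the grid is returned
    rows = [[0] * y for i in range(x)]
    rn = random.random
    for i in range(n):
        topleft = (int(x * rn()) - z, int(y * rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1] + (z * 2), y)
        for u in range(max(0, topleft[0]), min(topleft[0] + (z * 2), x)):
            rows[u][l:r] = [j + (j < 255) for j in rows[u][l:r]]
    return rows

if __name__ == "__main__":
    # the question's test parameters; few calls, since each one is slow
    print(timing(f_py3, 1024, 1024, 400, 75, number=1, repeat=3))
```

Taking the minimum rather than the mean filters out trials slowed down by other activity on the machine.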
On my (slow-ish;-) first-day MacBook Air, 1.6 GHz Core 2 Duo, system Python 2.5 on Mac OS X 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):  # defaults bind op.py's module-level x, y, n, z once, as fast locals
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__  # low 8 bits only: 256 would wrap to 0, so this caps only while counts stay below 256
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
        rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
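The projection is a division of each timing by the machine-speed ratio. A back-of-envelope sketch, taking "a bit more than 1.9" as exactly 1.9 (an assumption, so the real projections would be slightly lower):

```python
# Alex's per-call timings in seconds, from the -mtimeit runs above
timings = {"f1": 5.58, "f2": 3.15, "f3": 3.0}
# Assumption: his machine is exactly 1.9x slower than the asker's
ratio = 1.9
projected = {name: t / ratio for name, t in timings.items()}
for name in sorted(projected):
    print("%s: %.2f s -> about %.2f s" % (name, timings[name], projected[name]))
```

Even the fastest pure-Python version projects to roughly 1.58 s, still above the 1.0 s target.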
With a simple C-coded extension, exte.c:
#include "Python.h"
static PyObject*
dopoint(PyObject* self, PyObject* args)
{
    int x, y, z, px, py;
    int b, t, l, r;
    int i, j;
    PyObject* rows;
    if(!PyArg_ParseTuple(args, "iiiiiO",
                         &x, &y, &z, &px, &py, &rows
                         ))
        return 0;
    /* clip the window [px-z, px+z) x [py-z, py+z) to the grid */
    b = px - z;
    if (b < 0) b = 0;
    t = px + z;
    if (t > x) t = x;
    l = py - z;
    if (l < 0) l = 0;
    r = py + z;
    if (r > y) r = y;
    for(i = b; i < t; ++i) {
        PyObject* row = PyList_GetItem(rows, i);        /* borrowed reference */
        for(j = l; j < r; ++j) {
            PyObject* pyitem = PyList_GetItem(row, j);  /* borrowed reference */
            long item = PyInt_AsLong(pyitem);
            if (item < 255) {
                PyObject* newitem = PyInt_FromLong(item + 1);  /* new reference */
                PyList_SetItem(row, j, newitem);  /* steals newitem, releases the old item */
            }
        }
    }
    Py_RETURN_NONE;
}
static PyMethodDef exteMethods[] = {
    {"dopoint", dopoint, METH_VARARGS, "process a point"},
    {0}  /* sentinel */
};
PyMODINIT_FUNC  /* Python 2 module init, exported portably */
initexte(void)
{
    Py_InitModule("exte", exteMethods);
}
(note: I haven't checked it carefully -- I think it doesn't leak memory, thanks to the correct interplay of reference stealing and borrowing, but it should be inspected very carefully before being put in production;-), we could do
import exte
def f4(x=x,y=y,n=n,z=z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). ((Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-)).
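Because the C extension skips type and error checks, a pure-Python reference with the same semantics is a cheap way to verify it. A sketch under these assumptions: `dopoint_py` and `cross_check` are hypothetical names, the grid is indexed `rows[i][j]` with `i < x` and `j < y` as in the C code, and the `exte` import only succeeds where exte.c has been built, which as written targets the Python 2 C API (`PyInt_AsLong`, `Py_InitModule`).

```python
import random

def dopoint_py(x, y, z, px, py, rows):
    # Pure-Python mirror of exte.dopoint: clip the window around (px, py)
    # to the grid, then increment every cell below 255 inside it.
    b = max(0, px - z)
    t = min(px + z, x)
    l = max(0, py - z)
    r = min(py + z, y)
    for i in range(b, t):
        row = rows[i]
        for j in range(l, r):
            if row[j] < 255:
                row[j] += 1

def cross_check(dopoint, x=64, y=48, z=10, n=2000, seed=0):
    # Feed the same seeded points to both implementations and compare grids.
    # The default n is large enough that interior cells are expected to hit the cap.
    rng = random.Random(seed)
    expected = [[0] * y for _ in range(x)]
    actual = [[0] * y for _ in range(x)]
    for _ in range(n):
        px, py = rng.randrange(x), rng.randrange(y)
        dopoint_py(x, y, z, px, py, expected)
        dopoint(x, y, z, px, py, actual)
    return expected == actual

if __name__ == "__main__":
    try:
        import exte  # built from exte.c (Python 2 C API)
    except ImportError:
        print("exte is not built; nothing to compare")
    else:
        print(cross_check(exte.dopoint))
```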
1. A (smaller) speedup could come from the initialization of your rows.
Replace
rows = []
for i in range(x):
    rows.append([0 for i in xrange(y)])
with
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by binding random.random to a local name (rn = random.random) before the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
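A quick way to check the initialization advice is to time both forms and confirm they build the same grid. A minimal sketch (Python 3 syntax, so `range` replaces `xrange`; the function names are hypothetical). The outer list comprehension still matters: `[[0] * y] * x` would create x references to one shared row, so incrementing one cell would appear to change every row.

```python
import timeit

def init_append(x, y):
    # original: build each row element by element, then append it
    rows = []
    for i in range(x):
        rows.append([0 for i in range(y)])
    return rows

def init_multiply(x, y):
    # suggested: [0] * y builds a row in a single C-level operation;
    # the comprehension still creates x independent row lists
    return [[0] * y for i in range(x)]

if __name__ == "__main__":
    for f in (init_append, init_multiply):
        t = timeit.timeit(lambda: f(1024, 1024), number=5)
        print("%-14s %.4f s" % (f.__name__, t))
```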
EDIT: some new timings with timeit (total seconds for 10 runs); this seems to provide only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
In your f3 rewrite, g can be simplified (the same applies to f4).
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
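This is ordinary loop-invariant hoisting: `l` and `r` depend only on the point and `z`, so they can be computed once per point instead of once per row. A minimal sketch (Python 3 syntax; `g_recompute`, `g_hoisted`, and `run` are hypothetical names) that checks both versions produce the same grid and times them. Since the list comprehension dominates each row update, the gain is likely small.

```python
import random
import timeit

SIZE = 1024  # grid side, hardcoded in g as in the question

def g_recompute(rows, px, py, z=75):
    # as posted: l and r are recomputed on every pass of the row loop
    for i in range(max(0, px - z), min(px + z, SIZE)):
        l = max(0, py - z)
        r = min(py + z, SIZE)
        rows[i][l:r] = [j + (j < 255) for j in rows[i][l:r]]

def g_hoisted(rows, px, py, z=75):
    # l and r depend only on py and z, so compute them once per point
    l = max(0, py - z)
    r = min(py + z, SIZE)
    for i in range(max(0, px - z), min(px + z, SIZE)):
        rows[i][l:r] = [j + (j < 255) for j in rows[i][l:r]]

def run(g, points):
    # fresh grid per run so both versions start from zeros
    rows = [[0] * SIZE for _ in range(SIZE)]
    for px, py in points:
        g(rows, px, py)
    return rows

if __name__ == "__main__":
    rng = random.Random(0)
    points = [(rng.randrange(SIZE), rng.randrange(SIZE)) for _ in range(100)]
    print(run(g_recompute, points) == run(g_hoisted, points))
    for g in (g_recompute, g_hoisted):
        t = timeit.timeit(lambda: run(g, points), number=3)
        print("%-12s %.3f s" % (g.__name__, t))
```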
Based on your f3 version, I played with the code. As l and r are constant for a given point, you can avoid computing them inside g1's loop. Also, using the new ternary if instead of min and max seems to be consistently faster, and I simplified the expressions involving topleft. On my system the code below appears to be about 20% faster.
def f3b(x,y,n,z):
    # g1 updates the module-level rows in place; the list of Nones built here is discarded
    [g1(px, py, z) for px, py in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]
def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
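Whether the conditional expression beats `max()`/`min()` depends on the interpreter, so it is worth measuring on your own setup. A minimal sketch (Python 3 syntax) that times two forms of g1's clipped left edge with `timeit.repeat`, keeping the best of several trials; the sample values `v = 500` and `z = 75` are arbitrary assumptions. One plausible reason the ternary wins is that it avoids a call to a builtin function, but the measured numbers are what count.

```python
import timeit

# Two ways to compute g1's clipped left edge, max(0, v - z).
# v = 500 and z = 75 are arbitrary sample values.
setup = "v, z = 500, 75"
stmts = {
    "max()": "max(0, v - z)",
    "ternary": "v - z if v - z > 0 else 0",
}

def bench(number=10**6, repeat=3):
    # best-of-`repeat` total seconds for `number` evaluations of each form
    return dict((name, min(timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)))
                for name, stmt in stmts.items())

if __name__ == "__main__":
    for name, t in sorted(bench().items()):
        print("%-8s %.3f s per million evaluations" % (name, t))
```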
You can create your own Python module in C, and control the performance as you want:
http://docs.python.org/extending/
