Related
According to :help rand(),
rand([{expr}])
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
Examples:
:echo rand()
:let seed = srand()
:echo rand(seed)
:echo rand(seed) % 16 " random number 0 - 15
It doesn't explain how the seed is changed every time rand() is called, but I expected it to be deterministically altered, because
C++'s std::rand() does so,
and Wikipedia says
A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm...
However, in the code below, the value of a is deterministic but the values of b are not deterministic; they take different values when you restart the script.
let seed = srand(0)
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
let seed = [0, 1, 2, 3]
let a = rand(seed) "deterministic
let b = rand() "not deterministic (why?)
echo [a, b]
Is this expected behavior? I think it contradicts the documentation.
Environment:
~ $ vi --version
VIM - Vi IMproved 8.2 (2019 Dec 12, compiled Apr 30 2020 13:32:36)
Included patches: 1-664
The algorithm used in Vim is fully deterministic. What creates the confusion is the fact that calling rand(seed) updates the seed "in place", but does not update any internal value(s). Therefore any subsequent rand() uses another internal seed value (more or less random; the quality depends on the platform). So if you want to produce a fully deterministic sequence, you must consistently invoke rand(seed) with the same variable.
This behaviour is easy to deduce from Vim's source code. Also, :h rand() says:
Return a pseudo-random Number generated with an xoshiro128**
algorithm using seed {expr}. The returned number is 32 bits,
also on 64 bits systems, for consistency.
{expr} can be initialized by srand() and will be updated by
rand(). If {expr} is omitted, an internal seed value is used
and updated.
If you find the wording misleading, you can open an issue on GitHub.
The documentation is badly worded, but the behavior is the expected one from the source code's perspective.
Analysis
rand() is defined as f_rand() in src/evalfunc.c. From the snippet at the end of this answer, we know some things:
f_rand() has only two sets of static variables: gx, ..., gw and initialized.
gx, ..., gw are the internal seeds. Their values are touched and referenced only when f_rand() is called with no argument (i.e. when argvars[0].v_type == VAR_UNKNOWN).
initialized remembers whether f_rand() has ever been called with no argument; it too is touched and referenced only when f_rand() is called with no argument.
When f_rand() is called with a seed,
The value of the seed is used once and is not saved in a static variable. In other words, the sentence "{expr} can be initialized by srand() and will be updated by rand()" in the documentation is nothing but a "lie": {expr} is not remembered and thus not updated by a subsequent f_rand() call.
The value of the seed is updated in place via the pointers lx, ..., lw.
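This determinism is easy to check outside Vim. Here is a minimal standalone C sketch of the same xoshiro128** step, adapted from the Vim macros quoted at the end of this answer (the seed words 0..3 are arbitrary); the same four seed words produce the same outputs on every run:
#include <stdint.h>
#include <stdio.h>

// One xoshiro128** step over four explicit state words, mirroring
// Vim's SHUFFLE_XOSHIRO128STARSTAR macro (quoted below).
static uint32_t rotl(uint32_t x, int k) { return (x << k) | (x >> (32 - k)); }

static uint32_t shuffle(uint32_t *x, uint32_t *y, uint32_t *z, uint32_t *w)
{
    uint32_t result = rotl(*y * 5, 7) * 9;
    uint32_t t = *y << 9;
    *z ^= *x;
    *w ^= *y;
    *y ^= *z;
    *x ^= *w;
    *z ^= t;
    *w = rotl(*w, 11);
    return result;
}

int main(void)
{
    uint32_t x = 0, y = 1, z = 2, w = 3;  // arbitrary but fixed seed words
    for (int i = 0; i < 3; i++)
        printf("%u\n", shuffle(&x, &y, &z, &w));  // identical on every run
    return 0;
}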
Conclusion
The sentence
{expr} can be initialized by srand() and will be updated by rand()
should be modified to
{expr} can be initialized by srand() and will be updated by rand({expr}). You may want to store a seed into a variable and pass it to rand() since {expr} is not remembered in the function.
If you need a deterministic rand(), do this:
let seed = srand(0)
let a = rand(seed) "The value of `seed` is changed in place.
let b = rand(seed) "ditto
echo [a, b]
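Running this script twice prints the same [a, b] both times: srand(0) always produces the same seed list, and each rand(seed) call reads and then rewrites that same four-number list in place.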
The Source Code of rand()
#define ROTL(x, k) ((x << k) | (x >> (32 - k)))
#define SPLITMIX32(x, z) ( \
z = (x += 0x9e3779b9), \
z = (z ^ (z >> 16)) * 0x85ebca6b, \
z = (z ^ (z >> 13)) * 0xc2b2ae35, \
z ^ (z >> 16) \
)
#define SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w) \
result = ROTL(y * 5, 7) * 9; \
t = y << 9; \
z ^= x; \
w ^= y; \
y ^= z, x ^= w; \
z ^= t; \
w = ROTL(w, 11);
/*
* "rand()" function
*/
static void
f_rand(typval_T *argvars, typval_T *rettv)
{
list_T *l = NULL;
static UINT32_T gx, gy, gz, gw;
static int initialized = FALSE;
listitem_T *lx, *ly, *lz, *lw;
UINT32_T x, y, z, w, t, result;
if (argvars[0].v_type == VAR_UNKNOWN)
{
// When no argument is given use the global seed list.
if (initialized == FALSE)
{
// Initialize the global seed list.
init_srand(&x);
gx = SPLITMIX32(x, z);
gy = SPLITMIX32(x, z);
gz = SPLITMIX32(x, z);
gw = SPLITMIX32(x, z);
initialized = TRUE;
}
SHUFFLE_XOSHIRO128STARSTAR(gx, gy, gz, gw);
}
else if (argvars[0].v_type == VAR_LIST)
{
l = argvars[0].vval.v_list;
if (l == NULL || list_len(l) != 4)
goto theend;
lx = list_find(l, 0L);
ly = list_find(l, 1L);
lz = list_find(l, 2L);
lw = list_find(l, 3L);
if (lx->li_tv.v_type != VAR_NUMBER) goto theend;
if (ly->li_tv.v_type != VAR_NUMBER) goto theend;
if (lz->li_tv.v_type != VAR_NUMBER) goto theend;
if (lw->li_tv.v_type != VAR_NUMBER) goto theend;
x = (UINT32_T)lx->li_tv.vval.v_number;
y = (UINT32_T)ly->li_tv.vval.v_number;
z = (UINT32_T)lz->li_tv.vval.v_number;
w = (UINT32_T)lw->li_tv.vval.v_number;
SHUFFLE_XOSHIRO128STARSTAR(x, y, z, w);
lx->li_tv.vval.v_number = (varnumber_T)x;
ly->li_tv.vval.v_number = (varnumber_T)y;
lz->li_tv.vval.v_number = (varnumber_T)z;
lw->li_tv.vval.v_number = (varnumber_T)w;
}
else
goto theend;
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = (varnumber_T)result;
return;
theend:
semsg(_(e_invarg2), tv_get_string(&argvars[0]));
rettv->v_type = VAR_NUMBER;
rettv->vval.v_number = -1;
}
I have implemented a FIR filter in Haskell. I don't know that much about FIR filters, and my code is heavily based on an existing C# implementation. Therefore, I have a feeling that my implementation has too much of a C# style and is not really Haskell-like. I would like to know if there is a more idiomatic Haskell way of implementing my code. Ideally, I'm looking for some combination of higher-order functions (map, filter, fold, etc.) that implements the algorithm.
My Haskell code looks like this:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
where
help i = if i >= (U.length b - 1) then loop i (U.length b - 1) else 0
loop yi bi = if bi < 0 then 0 else b !! bi * x !! (yi-bi) + loop yi (bi-1)
vec !! i = unsafeIndex vec i -- Shorthand for unsafeIndex
This code is based on the following C# code:
public float[] RunFilter(double[] x)
{
int M = coeff.Length;
int n = x.Length;
//y[n]=b0x[n]+b1x[n-1]+....bmx[n-M]
var y = new float[n];
for (int yi = 0; yi < n; yi++)
{
double t = 0.0f;
for (int bi = M - 1; bi >= 0; bi--)
{
if (yi - bi < 0) continue;
t += coeff[bi] * x[yi - bi];
}
y[yi] = (float) t;
}
return y;
}
As you can see, it's almost a straight copy. How can I turn my implementation into a more Haskell-like one? Do you have any ideas? The only thing I could come up with was using Vector.generate.
I know that the DSP library has an implementation available. But it uses lists and is way too slow for my use case. This Vector implementation is a lot faster than the one in DSP.
I've also tried implementing the algorithm using Repa. It is faster than the Vector implementation. Here is the result:
applyFIR :: V.Vector Float -> Array U DIM1 Float -> Array D DIM1 Float
applyFIR b x = R.traverse x id (\_ (Z :. i) -> if i >= len then loop i (len - 1) else 0)
where
len = V.length b
loop :: Int -> Int -> Float
loop yi bi = if bi < 0 then 0 else (V.unsafeIndex b bi) * x !! (Z :. (yi-bi)) + loop yi (bi-1)
arr !! i = unsafeIndex arr i
First of all, I don't think that your initial vector code is a faithful translation - that is, I think it disagrees with the C# code. For example, suppose that both "x" and "b" ("b" is coeff in C#) have length 3, and have all values of 1.0. Then for y[0] the C# code would produce x[0] * coeff[0], or 1.0. (it would hit continue for all other values of bi)
With your Haskell code, however, help 0 produces 0. Your Repa version seems to suffer from the same problem.
So let's start with a more faithful translation:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
where
help i = loop i (min i $ U.length b - 1)
loop yi bi = if bi < 0 then 0 else b !! bi * x !! (yi-bi) + loop yi (bi-1)
vec !! i = unsafeIndex vec i -- Shorthand for unsafeIndex
Now, you're basically doing a calculation like this for computing, say, y[3]:
... b[3] | b[2] | b[1] | b[0]
x[0] | x[1] | x[2] | x[3] | x[4] | x[5] | ....
multiply
b[3]*x[0]|b[2]*x[1] |b[1]*x[2] |b[0]*x[3]
sum
y[3] = b[3]*x[0] + b[2]*x[1] + b[1]*x[2] + b[0]*x[3]
So one way to think of what you're doing is "take the b vector, reverse it, and to compute spot i of the result, line b[0] up with x[i], multiply all the corresponding x and b entries, and compute the sum".
So let's do that:
applyFIR :: Vector Double -> Vector Double -> Vector Double
applyFIR b x = generate (U.length x) help
where
revB = U.reverse b
bLen = U.length b
help i = let sliceLen = min (i+1) bLen
bSlice = U.slice (bLen - sliceLen) sliceLen revB
xSlice = U.slice (i + 1 - sliceLen) sliceLen x
in U.sum $ U.zipWith (*) bSlice xSlice
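As a sanity check against the length-3, all-ones example above: for i = 0 this takes sliceLen = 1, pairs the last element of revB (which is b[0]) with x[0], and yields 1.0, matching the C# code.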
I'm quite new to Haskell, and to learn it better I started solving problems here and there, and I ended up with this (Project Euler 34).
145 is a curious number, as 1! + 4! + 5! = 1 + 24 + 120 = 145.
Find the sum of all numbers which are equal to the sum of the factorial >of their digits.
Note: as 1! = 1 and 2! = 2 are not sums they are not included.
I wrote a C and a Haskell brute-force solution.
Could someone explain why the Haskell version is ~15x slower than the C implementation (~6.5 s vs ~0.450 s), and how to possibly tune and speed up the Haskell solution?
unsigned int factorial(unsigned int d){
    // Digit factorials 0!..9! (this helper wasn't shown in the question;
    // a simple lookup is assumed here).
    static const unsigned int facs[10] =
        {1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880};
    return facs[d];
}
unsigned int solve(){
unsigned int result = 0;
unsigned int i=10;
while(i<2540161){
unsigned int sumOfFacts = 0;
unsigned int number = i;
while (number > 0) {
unsigned int d = number % 10;
number /= 10;
sumOfFacts += factorial(d);
}
if (sumOfFacts == i)
result += i;
i++;
}
return result;
}
Here is the Haskell solution:
--BRUTE FORCE SOLUTION
solve:: Int
solve = sum (filter (\x-> sfc x 0 == x) [10..2540160])
--sum factorial of digits
sfc :: Int -> Int -> Int
sfc 0 acc = acc
sfc n acc = sfc n' (acc+fc r)
where
n' = div n 10
r = mod n 10 --n-(10*n')
fc 0 =1
fc 1 =1
fc 2 =2
fc 3 =6
fc 4 =24
fc 5 =120
fc 6 =720
fc 7 =5040
fc 8 =40320
fc 9 =362880
First, compile with optimizations. With ghc-7.10.1 -O2 -fllvm, the Haskell version runs in 0.54 secs for me. This is already pretty good.
If we want to do even better, we should first replace div with quot and mod with rem. div and mod do some extra work, because they handle the rounding of negative numbers differently. Since we only have positive numbers here, we should switch to the faster functions.
Second, we should replace the pattern matching in fc with an array lookup. GHC uses a branching construct for Int patterns, and uses binary search when the number of cases is large enough. We can do better here with a lookup.
The new code looks like this:
import qualified Data.Vector.Unboxed as V
facs :: V.Vector Int
facs =
V.fromList [1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880]
--BRUTE FORCE SOLUTION
solve:: Int
solve = sum (filter (\x-> sfc x 0 == x) [10..2540160])
--sum factorial of digits
sfc :: Int -> Int -> Int
sfc 0 acc = acc
sfc n acc = sfc n' (acc + V.unsafeIndex facs r)
where
(n', r) = quotRem n 10
main = print solve
It runs in 0.095 seconds on my computer.
I have written the following code:
combinationsstring = "List of Combinations"
for a = 0, 65 do
for b = 0, 52 do
for c = 0, 40 do
for d = 0, 28 do
for e = 0, 19 do
for f = 0, 11 do
for g = 0, 4 do
if (((1.15^a)-1)+((20/3)*((1.15^b)-1))
+((100/3)*((1.15^c)-1))+(200*((1.15^d)-1))
+((2000/3)*((1.15^e)-1))+((8000/3)*((1.15^f)-1))
+((40000/3)*((1.15^g)-1))) < 10000 then
combinationsstring = combinationsstring
.."\n"..a..", "..b..", "..c..", "..d
..", "..e..", "..f..", "..g
end
end
end
end
end
end
end
end
local file = io.open("listOfCombinations.txt", "w")
file:write(combinationsstring)
file:close()
I need to find all the sets of values that satisfy the following inequality:
(((1.15^a)-1)+((20/3)*((1.15^b)-1))+
((100/3)*((1.15^c)-1))+(200*((1.15^d)-1))+
((2000/3)*((1.15^e)-1))+((8000/3)*((1.15^f)-1))+
((40000/3)*((1.15^g)-1))) < 10000
Each variable (a-g) is a non-negative integer. So I calculated the maximum value for each of the 7 variables (the maximum for each variable occurs when all the other values are 0). These maxima are 65, 52, 40, 28, 19, 11 and 4 respectively (a = 65, b = 52, and so on).
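(For example, the bound on a alone comes from 1.15^a - 1 < 10000, i.e. a < log(10001)/log(1.15) ≈ 65.9, so a ≤ 65; the bound on b comes from (20/3)*(1.15^b - 1) < 10000, i.e. 1.15^b < 1501, so b ≤ 52.)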
So I created 7 nested for loops (as shown in the code above), and in the innermost block I tested the 7 values to see if they fit the criterion; if they did, they were appended to a string. At the end, the program writes that final string, containing all the possible combinations, to a file.
The program works fine; however, there are about 3.1 billion computations carried out over the course of this run, and from some testing I found my computer averages about 3000 computations per second. That puts the total run time at about 12 days and 5 hours. I don't have that kind of time, so I spent all morning simplifying the equation being tested and removing unnecessary code, and the code above is my final result.
Is the nested-for-loop method I've used the optimal approach here? If it is, are there other ways I can speed it up? If not, can you tell me another way?
P.S. I am using Lua because it's the language I am most familiar with, but if you have other suggestions/examples, feel free to use your own language and I will try to adapt them for this program.
I don't speak Lua, but here are a few suggestions:
Before starting the loop on b, compute and store 1.15^a-1; maybe call it fooa.
Likewise, before starting the loop on c, compute fooa+(20/3)*(1.15^b-1); maybe call it foob.
Do similar things before starting each loop.
If foob, for instance, is at least 10000, break out of the loop; the stuff inside
can only make the result bigger.
This might be useless or worse in Lua, but do you really need to accumulate the result in a string? I don't know how Lua represents strings and does concatenation, but the concatenation might be hurting you badly. Try using a list or array data structure instead.
I'd also add that the nested loops are a perfectly sensible solution and, with the above modifications, exactly what I would do.
I would recommend a static language for brute-forcing things of this nature. I had a problem (this one) that I was having trouble with in Python, but the C++ brute-force 8-for-loop approach computed the solution in 30 seconds.
Since you also asked for solutions in different languages, here is a quick and dirty program in C++, also incorporating the suggestions by @tmyklebu.
#include <iostream>
#include <fstream>
#include <cmath>
int main()
{
std::ofstream os( "listOfCombinations.txt" );
using std::pow;
for( double a = 0; a <= 65; ++a ) {
double aa = (pow(1.15, a) - 1);
if ( aa > 10000 ) break;
for( double b = 0; b <= 52; ++b ) {
double bb = aa + (20.0 / 3) * (pow(1.15, b) - 1);
if ( bb > 10000 ) break;
for( double c = 0; c <= 40; ++c ) {
double cc = bb + (100.0 / 3) * (pow(1.15, c) - 1);
if ( cc > 10000 ) break;
// The following line provides some visual feedback for the
// user about the progress (it prints current a, b, and c
// values).
std::cout << a << " " << b << " " << c << std::endl;
for( double d = 0; d <= 28; ++d ) {
double dd = cc + 200 * ( pow(1.15, d) - 1);
if ( dd > 10000 ) break;
for( double e = 0; e <= 19; ++e ) {
double ee = dd + (2000.0 / 3) * (pow(1.15, e) - 1);
if ( ee > 10000 ) break;
for( double f = 0; f <= 11; ++f ) {
double ff = ee + (8000.0 / 3) * (pow(1.15, f) - 1);
if ( ff > 10000 ) break;
for( double g = 0; g <= 4; ++g ) {
double gg = ff + (40000.0 / 3) * (pow(1.15, g) - 1);
if ( gg >= 10000 ) break;
os << a << ", " << b << ", "
<< c << ", " << d << ", "
<< e << ", " << f << ", "
<< g << "\n";
}
}
}
}
}
}
}
return 0;
}
local res={}
combinationsstring = "List of Combinations"
--for a = 0, 65 do
a=0
for b = 0, 52 do
for c = 0, 40 do
for d = 0, 28 do
for e = 0, 19 do
for f = 0, 11 do
for g = 0, 4 do
if (((1.15^a)-1)+((20/3)*((1.15^b)-1))
+((100/3)*((1.15^c)-1))+(200*((1.15^d)-1))
+((2000/3)*((1.15^e)-1))+((8000/3)*((1.15^f)-1))
+((40000/3)*((1.15^g)-1))) < 10000 then
res[#res+1]={a,b,c,d,e,f,g}
end
end
end
end
end
end
end
--end
runs in 30 s on my machine and fills around 1 GB of memory. You can't fit 66 times that into the 32-bit Lua VM, and even in the 64-bit Lua VM the array part of tables is limited to 32-bit integer keys.
I've commented out the outermost loop, so you'll need around 30 s * 66 ≈ 33 min; I'd perhaps write the output to 66 different files. The results are held in a table first, which can then be concatenated. Check out:
local res={
{1,2,3,4,5,6,7},
{8,9,10,11,12,13,14}
}
for k,v in ipairs(res) do
-- either concatenate each line and produce a huge string
res[k]=table.concat(v,", ")
-- or write each line to a file in this loop
end
local text=table.concat(res,"\n")
print(text)
printing
1, 2, 3, 4, 5, 6, 7
8, 9, 10, 11, 12, 13, 14
The function a = 2 ^ b can quickly be calculated for any b by doing a = 1 << b.
What about the other way round, getting the value of b for any given a? It should be relatively fast, so logs are out of the question. Anything that's not O(1) is also bad.
I'd be happy with "can't be done" too, if it's simply not possible to do without logs or something search-like.
Build a look-up table. For 32-bit integers, there are only 32 entries so it is O(1).
Most architectures also have an instruction to find the position of the most significant bit of a number a, which is the value b. (GCC provides the __builtin_clz function for this.)
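As a sketch of how to use that builtin (a minimal example; note that __builtin_clz is undefined when its argument is 0):
#include <stdio.h>

int main(void)
{
    unsigned int a = 1u << 20;
    // The MSB position is 31 minus the leading-zero count.
    int b = 31 - __builtin_clz(a);  // undefined if a == 0
    printf("b = %d\n", b);          // prints b = 20
    return 0;
}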
For a BigInt, it can be computed in O(log a) by repeatedly dividing by 2.
int b = -1;
while (a != 0) {
a >>= 1;
++ b;
}
For this sort of thing I usually refer to this page with bit hacks:
Bit Twiddling Hacks
For example:
Find the log base 2 of an integer with a lookup table:
static const char LogTable256[256] =
{
#define LT(n) n, n, n, n, n, n, n, n, n, n, n, n, n, n, n, n
-1, 0, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3,
LT(4), LT(5), LT(5), LT(6), LT(6), LT(6), LT(6),
LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7), LT(7)
};
unsigned int v; // 32-bit word to find the log of
unsigned r; // r will be lg(v)
register unsigned int t, tt; // temporaries
if (tt = v >> 16)
{
r = (t = tt >> 8) ? 24 + LogTable256[t] : 16 + LogTable256[tt];
}
else
{
r = (t = v >> 8) ? 8 + LogTable256[t] : LogTable256[v];
}
There are also a couple of O(log(n)) algorithms given on that page.
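For instance, one of the O(lg N) variants from that page narrows the bit position down with a few compare-and-shift steps; here is a runnable adaptation for a 32-bit value (the test value 0x80 is just an example):
#include <stdio.h>

int main(void)
{
    unsigned int v = 0x80;   // example: find floor(log2(0x80))
    unsigned int r, shift;

    r     = (v > 0xFFFF) << 4; v >>= r;
    shift = (v > 0xFF)   << 3; v >>= shift; r |= shift;
    shift = (v > 0xF)    << 2; v >>= shift; r |= shift;
    shift = (v > 0x3)    << 1; v >>= shift; r |= shift;
    r |= (v >> 1);

    printf("%u\n", r);  // prints 7
    return 0;
}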
Some architectures have a "count leading zeros" instruction. For example, on ARM:
MOV R0,#0x80 # load R0 with (binary) 10000000
CLZ R1,R0 # R1 = number of leading zeros in R0, i.e. 24 (so b = 31 - 24 = 7)
This is O(1).
Or you can write:
while ((a >>= 1) > 0) b++;
For a fixed-width integer this is O(1). One could imagine it being expanded to:
b = (((a >> 1) > 0) ? 1 : 0) + (((a >> 2) > 0) ? 1 : 0) + ... + (((a >> 31) > 0) ? 1 : 0);
With compiler optimization, once ((a >> x) > 0) returns false, the rest won't be calculated. Also, comparing with 0 is faster than any other comparison. This fits the definition of big O, f(n) <= k * g(n), where k is at most 32 and g(n) = 1, hence O(1).
Reference: Big O notation
But in case you were using BigInteger, my code example would look like:
int b = 0;
String numberS = "306180206916083902309240650087602475282639486413"
+ "866622577088471913520022894784390350900738050555138105"
+ "234536857820245071373614031482942161565170086143298589"
+ "738273508330367307539078392896587187265470464";
BigInteger a = new BigInteger(numberS);
while ((a = a.shiftRight(1)).compareTo(BigInteger.ZERO) > 0) b++;
System.out.println("b is: " + b);
If a is a double rather than an int, then it is represented as a mantissa and an exponent. The exponent is the part you are looking for, as it is the integer part of the base-2 logarithm of the number.
If you can hack the binary representation then you can get the exponent out. Look up the IEEE standard to see where and how the exponent is stored.
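If you would rather not poke at the bits yourself, C's standard frexp from <math.h> hands you the exponent directly; a minimal sketch:
#include <math.h>
#include <stdio.h>

int main(void)
{
    double a = 1024.0;
    int exp;
    // frexp returns m in [0.5, 1) with a == m * 2^exp,
    // so for a > 0, floor(log2(a)) == exp - 1.
    frexp(a, &exp);
    printf("%d\n", exp - 1);  // prints 10
    return 0;
}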
For an integral value, if some method of getting the most-significant-bit position is not available, you can binary-search the bits for the uppermost 1, which is O(log numbits). Doing this may well be faster than converting to a double first.
In Java you can use Integer.numberOfLeadingZeros to compute the binary logarithm. It returns the number of leading zeros in the binary representation, so
floor(log2(x)) = 31 - numberOfLeadingZeros(x)
ceil(log2(x)) = 32 - numberOfLeadingZeros(x - 1)
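As a quick check, take x = 17: numberOfLeadingZeros(17) = 27, so floor(log2(17)) = 31 - 27 = 4, and ceil(log2(17)) = 32 - numberOfLeadingZeros(16) = 32 - 27 = 5.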
It can't be done without testing the high bit, but most modern FPUs support log2 so all is not lost.