Short version: common MT (and MT64) implementations have an extra tempering parameter 'd'. The papers I've looked at describing MT don't include it. Where is it from? When was it added?
Long version! :)
I'm looking at using a family of MTs with different parameters (for reasons). Nishimura published this paper describing a 64-bit MT implementation and a number of alternatives for the A matrix and the (u, s, t, l, b, c) tempering parameters. Like the original 32-bit MT paper, it describes the tempering procedure as:
y := x xor (x >> u)
y := y xor ((y << s) and b)
y := y xor ((y << t) and c)
z := y xor (y >> l)
However, real implementations and the parameters described in the MT Wikipedia page have an extra bitmask parameter (referred to as 'd'), applied in the first step of the tempering procedure:
x ^= (x >> 29) & 0x5555555555555555ULL;
x ^= (x << 17) & 0x71D67FFFEDA60000ULL;
x ^= (x << 37) & 0xFFF7EEE000000000ULL;
x ^= (x >> 43);
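For reference, here is the same tempering written as a small standalone C function with the role of 'd' made explicit (just a sketch built from the constants quoted above, not code from any of the papers):

#include <stdint.h>

/* Tempering from the quoted code, with the (u, d, s, b, t, c, l) parameters
   written out: (29, 0x5555555555555555, 17, 0x71D67FFFEDA60000,
                 37, 0xFFF7EEE000000000, 43). */
static uint64_t temper(uint64_t x)
{
    x ^= (x >> 29) & UINT64_C(0x5555555555555555);  /* y = x ^ ((x >> u) & d) */
    x ^= (x << 17) & UINT64_C(0x71D67FFFEDA60000);  /* y = y ^ ((y << s) & b) */
    x ^= (x << 37) & UINT64_C(0xFFF7EEE000000000);  /* y = y ^ ((y << t) & c) */
    x ^= (x >> 43);                                 /* z = y ^ (y >> l)       */
    return x;
}

Note that with d set to all ones, the first line degenerates to x ^= x >> 29, i.e. exactly the first step as the papers describe it, so omitting d is equivalent to choosing d = ~0.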
I don't want to just 'blindly' use the tempering parameters that exclude the parameter 'd' without understanding what it's for... but I can't seem to find any references for why this 'd' parameter was added. (Presumably, it's an improvement...)
Any ideas on what it's for, any references to why it was added?
Related
We are trying to translate a very simple program, given in pseudo-code, into Predicate Logic.
The program is straightforward and sequential; it does not contain loops.
It consists only of variable assignments and if-else statements.
Unfortunately, we do not have any good material to work from. It would be great if someone could share some
example conversions of simple five-line code snippets, or
links to free sources which describe the topic at a surface level. (We only cover predicate and propositional logic and do not want to dive much deeper into the logic space.)
Kind regards
UPDATE:
After enough research I found the solution and can share it, including examples.
The trick is to think of the program state as a tuple of all our variables, including a program counter which stands for the current instruction to be executed.
x = input;
x = x * 2;
if (y > 0)
    x = x * y;
else
    x = y;
We will form the predicate P(x, i, y, pc).
From here we can build premises, e.g.:
∀i ∀x ∀y (P(x, i, y, 1) ⇒ P(i, i, y, 2))
∀i ∀x ∀y (P(x, i, y, 2) ⇒ P(x*2, i, y, 3))
∀i ∀x ∀y (P(x, i, y, 3) ∧ y > 0 ⇒ P(x*y, i, y, 4))
∀i ∀x ∀y (P(x, i, y, 3) ∧ ¬(y > 0) ⇒ P(y, i, y, 4))
By incrementing the program counter we make sure that the premises are applied in order. Now we are able to construct a proof when given a premise for the input, e.g. P(x, 4, 7, 1).
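For example, starting from the premise P(x, 4, 7, 1) (input i = 4, y = 7, program counter 1), the premises above chain together as follows:

P(x, 4, 7, 1) ⇒ P(4, 4, 7, 2)            (x = input)
P(4, 4, 7, 2) ⇒ P(8, 4, 7, 3)            (x = x*2)
P(8, 4, 7, 3) ∧ 7 > 0 ⇒ P(56, 4, 7, 4)   (then-branch, x = x*y)

So at program counter 4 we can conclude P(56, 4, 7, 4), i.e. the final value of x is 56.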
I was reading the implementation of (^) in the standard Haskell library:
(^) :: (Num a, Integral b) => a -> b -> a
x0 ^ y0 | y0 < 0    = errorWithoutStackTrace "Negative exponent"
        | y0 == 0   = 1
        | otherwise = f x0 y0
    where -- f : x0 ^ y0 = x ^ y
          f x y | even y    = f (x * x) (y `quot` 2)
                | y == 1    = x
                | otherwise = g (x * x) ((y - 1) `quot` 2) x
          -- g : x0 ^ y0 = (x ^ y) * z
          g x y z | even y    = g (x * x) (y `quot` 2) z
                  | y == 1    = x * z
                  | otherwise = g (x * x) ((y - 1) `quot` 2) (x * z)
Now this part where g is defined seems odd to me. Why not just implement it like this:
expo :: (Num a, Integral b) => a -> b -> a
expo x0 y0
    | y0 == 0   = 1
    | y0 < 0    = errorWithoutStackTrace "Negative exponent"
    | otherwise = f x0 y0
  where
    f x y | even y    = f (x * x) (y `quot` 2)
          | y == 1    = x
          | otherwise = x * f x (y - 1)
But indeed, plugging in, say, 3^1000000 shows that (^) is about 0.04 seconds faster than expo.
Why is (^) faster than expo?
As the person who wrote the code, I can tell you why it's complex. :)
The idea is to be tail recursive to get loops, and also to perform the minimum number of multiplications. I don't like the complexity, so if you find a more elegant way please file a bug report.
A function is tail-recursive if the return value of a recursive call is returned as-is, without further processing. In expo, f is not tail-recursive, because of otherwise = x * f x (y-1): the return value of f is multiplied by x before it is returned. Both f and g in (^) are tail-recursive, because their return values are returned unmodified.
Why does this matter? Tail-recursive functions can be implemented much more efficiently than general recursive functions. Because the compiler doesn't need to create a new context (stack frame, what have you) for a recursive call, it can reuse the caller's context as the context of the recursive call. This saves a lot of the overhead of calling a function, much like inlining a function is more efficient than calling the function proper.
Whenever you see a bread-and-butter function in the standard library and it's implemented weirdly, the reason is almost always "because doing it like that triggers some special performance-critical optimization [possibly in a different version of the compiler]".
These odd workarounds are usually to "force" the compiler to notice that some specific, important optimization is possible (e.g., to force a particular argument to be considered strict, to allow worker/wrapper transformation, whatever). Typically some person has compiled their program, noticed it's epically slow, complained to the GHC devs, and they looked at the compiled code and thought "oh, GHC isn't seeing that it can inline that 3rd worker function... how do I fix that?" The result is that if you rephrase the code just slightly, the desired optimization then fires.
You say you tested it and there's not much speed difference. You didn't say for what type. (Is the exponent Int or Integer? What about the base? It's quite possible it makes a significant difference in some obscure case.)
Occasionally functions are also implemented weirdly to maintain strictness / laziness guarantees. (E.g., the library spec says it has to work a certain way, and implementing it the most obvious way would make the function more strict / less strict than the spec claims.)
I don't know what's up with this specific function, but I would suggest #chi is probably onto something.
Given the following two functions in C:
int f(int x, int y, int z) {
return (x & y) | ((~x) & z);
}
int g(int x, int y, int z) {
return z ^ (x & (y ^ z));
}
The results of the two functions are equal for all integer inputs.
I just wonder about the mathematics relating the two expressions.
I first saw the expression used in function f in the SHA-1 algorithm on Wikipedia:
http://en.wikipedia.org/wiki/Sha1
In the "SHA-1 pseudocode" part, inside the Main loop:
if 0 ≤ i ≤ 19 then
f = (b and c) or ((not b) and d)
k = 0x5A827999
...
Some open source implementations use the form in function g: z ^ (x & (y ^ z)).
I wrote a program that iterates over all possible values of x, y, and z, and all the results are equal.
How does one transform the form
(x & y) | ((~x) & z)
to the form
z ^ (x & (y ^ z))
mathematically, rather than just proving the equality?
Since bitwise operations are equivalent to boolean operations on the individual bits, you can prove the equivalence simply by enumerating the eight assignments of the {x, y, z} three-tuples.
Fill out the truth tables for each of these two functions, and then compare the eight positions to each other. If all eight positions match, the two functions are equivalent; otherwise, the functions are different.
You do not need to do it manually either: plug in both functions in three nested loops that give x, y, and z values from zero to one, inclusive, and compare the results of invoking f(x,y,z) to g(x,y,z).
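For instance, a quick brute-force version of that check might look like this (just a sketch; f and g are exactly the functions from the question):

#include <stdio.h>

int f(int x, int y, int z) { return (x & y) | ((~x) & z); }
int g(int x, int y, int z) { return z ^ (x & (y ^ z)); }

int main(void)
{
    int mismatches = 0;
    /* Enumerate the eight 0/1 assignments of (x, y, z) and compare f with g. */
    for (int x = 0; x <= 1; x++)
        for (int y = 0; y <= 1; y++)
            for (int z = 0; z <= 1; z++)
                if (f(x, y, z) != g(x, y, z))
                    mismatches++;
    printf("%d mismatches\n", mismatches);   /* prints "0 mismatches" */
    return 0;
}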
You can do this using a Karnaugh map. Given the truth table for z ^ (x & (y ^ z)), the Karnaugh map is:

        yz=00  yz=01  yz=11  yz=10
  x=0     0      1      1      0
  x=1     0      0      1      1

As can be seen, you can make two groups in the map: ~x & z along the top row and x & y along the bottom row, giving you (x & y) | (~x & z).
When I XOR any two numbers, I get either the absolute value of their difference or their sum.
I have searched a lot on Google trying to find a relevant formula for this, but no apparent formula or statement is available.
Example:
10 XOR 2 = 1010 XOR 0010 = 1000 (8)
1 XOR 2 = 01 XOR 10 = 11 (3)
Is it true for all the numbers?
No, it's not always true.
6    = 110
3    = 011
------------
XOR  = 101 = 5
SUM  = 9
DIFF = 3
This is by no means a complete analysis, but here's what I see:
For your first example, the least significant bits of 1010 are the same as the bits of 10, which is why XORing gives the difference.
For your second example, all the corresponding bits are different, which is why XORing gives the sum.
Why these properties hold should be fairly easy to see.
As shown by Dukeling's answer and CmdrMoozy's comment, it's not always true. As shown by your post, it's true at least sometimes. So here's a slightly more detailed analysis.
The +-side
Obviously, if (but not only if) (x & y) == 0 then (x ^ y) == x + y, because
x + y = (x ^ y) + ((x & y) << 1)
That accounts for 3^32 cases (for every bit position, there are 3 choices that result in a 0 after the AND) where (x ^ y) == (x + y).
Then there are the cases where (x & y) != 0. Those cases are precisely the cases such that
(x & y) == 0x80000000, because the carry out of the highest bit is the only carry that doesn't affect anything.
That adds 3^31 cases (for 31 bit positions there are 3 choices, for the highest bit there is only 1 choice).
The --side
For subtraction, there's the lesser known identity x - y == (x ^ y) - ((~x & y) << 1).
That's really not too different from addition, and the analysis is almost the same. This time, if (but not only if) (~x & y) == 0 then (x ^ y) == x - y. That ~ doesn't change the number of cases: still 3^32. Most of them are different cases than before, but not all (consider y = 0, then x can be anything).
There are again 3^31 extra cases, this time from (~x & y) == 0x80000000.
Both sides
The + and - sides aren't disjoint. Sometimes, x ^ y = x + y = x - y. That can only happen when either y = 0 or y = 0x80000000. If y = 0, x can be anything because (x & 0) == 0 and (~x & 0) == 0 for all x. If y = 0x80000000, x can again be anything, this time because x & 0x80000000 and ~x & 0x80000000 can both either work out to 0 or to 0x80000000, and both are fine.
That gives 2^33 cases where x ^ y = x + y = x - y.
It also gives (3^32 + 3^31) * 2 - 2^33 cases where x ^ y is x + y or x - y or both, which is 4941378580336984 or, in base 16, 118e285afb5158, which is also the answer given by this site.
That's a lot of cases, but only roughly 0.02679% of the total space of 2^64.
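If you want to convince yourself of the two identities used above, a small sanity check could look like this (a sketch; the test pairs are arbitrary):

#include <assert.h>
#include <stdint.h>

int main(void)
{
    /* A few arbitrary 32-bit pairs, including the edge cases discussed above. */
    uint32_t tests[][2] = { {10, 2}, {1, 2}, {6, 3},
                            {0x80000000u, 0x80000000u}, {0, 0xDEADBEEFu} };
    for (unsigned i = 0; i < sizeof tests / sizeof tests[0]; i++) {
        uint32_t x = tests[i][0], y = tests[i][1];
        /* Addition identity: x + y == (x ^ y) + ((x & y) << 1) */
        assert((uint32_t)(x + y) == (uint32_t)((x ^ y) + ((x & y) << 1)));
        /* Subtraction identity: x - y == (x ^ y) - ((~x & y) << 1) */
        assert((uint32_t)(x - y) == (uint32_t)((x ^ y) - ((~x & y) << 1)));
    }
    return 0;
}

Both assertions hold for every pair, since the identities are exact; the casts just keep everything in 32-bit wrap-around arithmetic.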
Actually, there's an interesting explanation for your observation, and it can be shown why you see this behaviour for so many numbers.
There's a relationship between a + b and a ^ b. It is given by:
a + b = a^b + 2*(a & b)
Hence,
a^b = a + b - 2*(a & b)
(where ^ is the bitwise XOR and & is bitwise AND)
See this link to get more of an idea about the above relation. Hence, for every a and b where a & b = 0, you get a + b = a ^ b, which explains the sum part. And when a & b is not 0, a ^ b = a + b - 2*(a & b); in your first example every set bit of b is also set in a, so a & b = b and a ^ b = a + b - 2*b = a - b, which explains the difference part. Hope it clarifies your question! :D
Let's assume N is a power of 2, i.e. N = 2^k for some k.
Then:
0 <= X < N --> N XOR X is always the sum N + X
N <= Y < 2^(k+1) --> N XOR Y is always the difference Y - N
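A quick illustration of those two claims, sketched in C with N = 8 (so k = 3) and arbitrarily chosen X = 5 and Y = 13:

#include <stdio.h>

int main(void)
{
    unsigned n = 8;               /* N = 2^3 */
    printf("%u\n", n ^ 5u);       /* 5 < N:           8 ^ 5  = 13 = 8 + 5  */
    printf("%u\n", n ^ 13u);      /* N <= 13 < 2^4:   8 ^ 13 = 5  = 13 - 8 */
    return 0;
}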
What would be an idiomatic way of generating a unique number (say, a 64-bit unsigned int) from two values, in such a way that the input values (also numbers, of the same type) could be regenerated from the number, as a Haskell function?
In C/C++ I would probably use something like
result = (((value1) << BITS) + ((value2) & ((1 << BITS) - 1)))
and, accordingly,
value1 = (result >> BITS)
and
value2 = (result & ((1 << BITS) - 1))
for regenerating the values, but I don't think I should be trying to use bitwise operations in Haskell.
After consideration, I simply abandoned the idea of using bitwise operations and resorted to Cantor's pairing function:
pair :: (Fractional a) => a -> a -> a
pair x y = (1 / 2) * (x + y) * (x + y + 1) + y

unpair :: (RealFrac a, Floating a) => a -> (a, a)
unpair z = (x, y)
  where
    q = (-1 / 2) + sqrt (1 / 4 + 2 * z)
    j = fromInteger (truncate q)
    y = z - ((1 / 2) * j * (j + 1))
    x = j - y
This is probably how I should have thought about it from the beginning. Thank you all very much for helping me better understand bit operations in Haskell, though.
You can use the exact same approach in Haskell. Bitwise operations can be found in Data.Bits and unsigned, fixed-size integer types in Data.Word. For example:
import Data.Bits
import Data.Word
combine :: Word32 -> Word32 -> Word64
combine a b = (fromIntegral a `shiftL` 32) + fromIntegral b
separate :: Word64 -> (Word32, Word32)
separate w = (fromIntegral $ w `shiftR` 32, fromIntegral $ w .&. 0xffffffff)
The thing that might trip you up compared to C is that Haskell never converts between different numeric types implicitly, so you need to use fromIntegral to convert between, e.g., 32-bit and 64-bit unsigned integers.