How does ConstantTimeByteEq work? - go

In Go's cryptography library, I found this function ConstantTimeByteEq. What does it do, and how does it work?
```go
// ConstantTimeByteEq returns 1 if x == y and 0 otherwise.
func ConstantTimeByteEq(x, y uint8) int {
	z := ^(x ^ y)
	z &= z >> 4
	z &= z >> 2
	z &= z >> 1

	return int(z)
}
```

x ^ y is x XOR y: each bit of the result is 1 where the corresponding bits of the arguments differ and 0 where they are the same:
x = 01010011
y = 00010011
x ^ y = 01000000
^(x ^ y) is the bitwise complement of that (unary ^ is bitwise NOT in Go), so each bit is 0 where the arguments differ and 1 where they match:
^(x ^ y) = 10111111 => z
Next, z is ANDed with right-shifted copies of itself. A right shift pads the left side of the number with zero bits:
z >> 4 = 00001011
The goal is to propagate any zero bit in z down to the lowest bit, so start ANDing:
z = 10111111
z >> 4 = 00001011
z & (z >> 4) = 00001011
Fold the new value again to move any zero further to the right:
z = 00001011
z >> 2 = 00000010
z & (z >> 2) = 00000010
Fold once more, down to the last bit:
z = 00000010
z >> 1 = 00000001
z & (z >> 1) = 00000000
On the other hand, if you have x == y initially, it goes like this:
z = 11111111
z & (z >> 4) = 00001111
z & (z >> 2) = 00000011
z & (z >> 1) = 00000001
So it really returns 1 when x == y, 0 otherwise.
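To check the walkthrough on all 65,536 input pairs, here is a small Python model of the same folding. constant_time_byte_eq is a hypothetical name, and Python integers give no timing guarantees, so this only demonstrates the logic, not the constant-time property.

```python
def constant_time_byte_eq(x: int, y: int) -> int:
    """Python model of the Go bit folding (logic only, not constant time)."""
    z = ~(x ^ y) & 0xFF  # Go's unary ^ on a uint8 is bitwise NOT; mask to 8 bits
    z &= z >> 4          # bit k (k < 4) is now z_k AND z_(k+4); bits 4..7 are 0
    z &= z >> 2          # bits 0..1 each hold the AND of four original bits
    z &= z >> 1          # bit 0 is the AND of all eight bits; all others are 0
    return z


# Exhaustive check: 1 exactly when the bytes are equal.
for a in range(256):
    for b in range(256):
        assert constant_time_byte_eq(a, b) == (1 if a == b else 0)
```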
A naive comparison may take a different amount of time depending on its inputs (for example, when the compiler emits a data-dependent branch). This function is written so that every call performs the same sequence of operations regardless of the values of its inputs. This way, an attacker can't use timing-based attacks.

It does exactly what the documentation says: it checks whether x and y are equal. Functionally it is just x == y, dead simple.
Doing x == y in this cryptic bit-fiddling way prevents timing side-channel attacks on algorithms: an x == y may get compiled to code that runs faster if x == y and slower if x != y (or the other way around), for example because of branch prediction in CPUs. An attacker can use this to learn something about the data handled by the cryptographic routines and thus compromise security.
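The same idea extends to whole byte strings: XOR each pair of bytes, OR the differences together, and test only the accumulated value at the end, so the loop never exits early at the first mismatch. The Python sketch below (constant_time_bytes_eq is a hypothetical name) shows the pattern, which resembles what Go's subtle.ConstantTimeCompare does for slices. CPython itself makes no timing guarantees; in real Python code, the standard library's hmac.compare_digest is the function to reach for.

```python
def constant_time_bytes_eq(a: bytes, b: bytes) -> bool:
    """Compare byte strings without exiting early on a mismatch (sketch)."""
    if len(a) != len(b):
        # Lengths are usually public, so returning early here leaks nothing new.
        return False
    diff = 0
    for x, y in zip(a, b):
        diff |= x ^ y  # any differing bit sticks; the loop always runs to the end
    return diff == 0
```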


For given two integers A and B, find a pair of numbers X and Y such that A = X*Y and B = X xor Y

I'm struggling with this problem, which I found in a competitive programming book that doesn't include a solution. For two given integers A and B (both fit in a 64-bit integer type), where A is odd, find a pair of numbers X and Y such that A = X*Y and B = X xor Y.
My approach was to list all divisors of A and try pairing numbers under sqrt(A) with numbers over sqrt(A) that multiply up to A and see if their xor is equal to B. But I don't know if that's efficient enough.
What would be a good solution/algorithm to this problem?
You know that at least one factor is <= sqrt(A). Let's make that one X.
The length of X in bits will be at most about half the length of A.
The upper bits of X (the ones higher in value than sqrt(A)) are therefore all 0, so the corresponding bits in B must have the same value as the corresponding bits in Y.
Knowing the upper bits of Y gives you a pretty small range for the corresponding factor X = A/Y. Calculate Xmin and Xmax corresponding to the largest and smallest possible values for Y, respectively. Remember that Xmax must also be <= sqrt(A).
Then just try all the possible Xs between Xmin and Xmax. There won't be too many, so it won't take very long.
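A minimal Python sketch of this approach, assuming A is odd and positive as in the problem statement (factor_by_high_bits is a hypothetical name). The loop is short when Y is much larger than X; when X and Y have similar bit lengths, B may have no bits at or above k, and the range of X grows to the order of sqrt(A), so "not too many" depends on the input.

```python
from math import isqrt


def factor_by_high_bits(A: int, B: int):
    """Return (X, Y) with X*Y == A, X ^ Y == B and X <= isqrt(A), or None."""
    root = isqrt(A)
    k = root.bit_length()        # X <= root < 2**k, so X has no bits at or above k
    high = (B >> k) << k         # hence Y's bits at or above k equal B's bits
    y_max = high + (1 << k) - 1  # largest Y consistent with those high bits
    x_min = A // y_max           # X = A / Y is smallest when Y is largest
    x_max = root if high == 0 else min(root, A // high)
    for x in range(x_min | 1, x_max + 1, 2):  # A is odd, so X must be odd too
        if A % x == 0 and x ^ (A // x) == B:
            return x, A // x
    return None
```

For example, factor_by_high_bits(15, 6) only has to try X = 3 and returns (3, 5).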
The other straightforward way to solve this problem relies on the fact that the lower n bits of X*Y and X xor Y depend only on the lower n bits of X and Y. Therefore, you can use the possible answers for the lower n bits to restrict the possible answers for the lower n+1 bits, until you're done.
I've worked out that, unfortunately, there can be more than one possibility for a single n. I don't know how often there will be a lot of possibilities, but it's probably not too often if at all, so this may be fine in a competitive context. Probabilistically, there will only be a few possibilities, since a solution for n bits will provide either 0 or two solutions for n+1 bits, with equal probability.
It seems to work out pretty well for random input. Here's the code I used to test it:
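```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class Solver { // wrapper class so the snippet compiles
    public static void solve(long A, long B) {
        List<Long> sols = new ArrayList<>();
        List<Long> prevSols = new ArrayList<>();
        sols.add(0L); // before any bit is fixed, the only candidate for X is 0
        long tests = 0;
        System.out.print("Solving " + A + "," + B + "... ");
        for (long bit = 1; (A / bit) >= bit; bit <<= 1) {
            tests += sols.size();
            List<Long> t = prevSols;
            prevSols = sols;
            sols = t;
            sols.clear();
            final long mask = bit | (bit - 1); // the low bits fixed so far
            for (long prevx : prevSols) {
                // Try extending each candidate with a 0 and with a 1 at this bit.
                long prevy = (prevx ^ B) & mask;
                if ((((prevx * prevy) ^ A) & mask) == 0) {
                    sols.add(prevx);
                }
                long x = prevx | bit;
                long y = (x ^ B) & mask;
                if ((((x * y) ^ A) & mask) == 0) {
                    sols.add(x);
                }
            }
        }
        tests += sols.size();
        List<Long> t = prevSols;
        prevSols = sols;
        sols = t;
        sols.clear();
        // Keep only full solutions with X <= sqrt(A).
        for (long testx : prevSols) {
            if (A / testx >= testx) {
                long testy = B ^ testx;
                if (testx * testy == A) {
                    sols.add(testx);
                }
            }
        }
        System.out.println("" + tests + " checks -> X=" + sols);
    }

    public static void main(String[] args) {
        Random rand = new Random();
        for (int range = Integer.MAX_VALUE; range > 32; range -= (range >> 5)) {
            long A = rand.nextLong() & Long.MAX_VALUE;
            long X = (rand.nextInt(range)) + 2L;
            X |= 1; // A must be odd, so both factors are odd
            long Y = A / X;
            if (Y == 0) {
                Y = rand.nextInt(65536);
            }
            Y |= 1;
            solve(X * Y, X ^ Y);
        }
    }
}
```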
You can see the results here:
Looks like it usually only takes a couple thousand checks.
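For experimenting without the Java harness, here is a minimal Python sketch of the same bit-by-bit idea (lift_bits is a hypothetical name). It assumes A is odd and positive, keeps every X whose low bits are consistent with A and B, and at the end returns all valid X <= isqrt(A). As noted above, the candidate list can grow on unlucky inputs.

```python
from math import isqrt


def lift_bits(A: int, B: int):
    """Return every X <= isqrt(A) with X * (X ^ B) == A, built bit by bit."""
    root = isqrt(A)
    width = root.bit_length()  # X <= isqrt(A) fits in this many bits
    candidates = [0]           # possible values of X's low bits so far
    for n in range(width):
        bit = 1 << n
        mask = (bit << 1) - 1  # the low n+1 bits
        extended = []
        for low in candidates:
            for x in (low, low | bit):  # bit n of X is 0 or 1
                y = (x ^ B) & mask      # low n+1 bits of Y follow from X and B
                if ((x * y) ^ A) & mask == 0:
                    extended.append(x)
        candidates = extended
    return [x for x in candidates if 0 < x <= root and x * (x ^ B) == A]
```

For instance, lift_bits(15, 6) keeps the candidates 1 and 3 after two bits and returns [3].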
Here's a simple recursion that observes the rules we know: (1) the least significant bits of both X and Y are set since only odd multiplicands yield an odd multiple; (2) if we set X to have the highest set bit of B, Y cannot be greater than sqrt(A); and (3) set bits in X or Y according to the current bit in B.
The following Python code resulted in under 300 iterations for all but one of the random pairs I picked from Matt Timmermans' example code. But the first one took 231,199 iterations :)
```python
from math import sqrt

def f(A, B):
    i = 64
    while not ((1<<i) & B):
        i = i - 1
    X = 1 | (1 << i)

    sqrtA = int(sqrt(A))

    j = 64
    while not ((1<<j) & sqrtA):
        j = j - 1

    if (j > i):
        i = j + 1

    memo = {"it": 0, "stop": False, "solution": []}

    def g(b, x, y):
        memo["it"] = memo["it"] + 1
        if memo["stop"]:
            return []

        if y > sqrtA or y * x > A:
            return []

        if b == 0:
            if x * y == A:
                memo["solution"].append((x, y))
                memo["stop"] = True
                return [(x, y)]
            return []

        bit = 1 << b

        if B & bit:
            return g(b - 1, x, y | bit) + g(b - 1, x | bit, y)
        return g(b - 1, x | bit, y | bit) + g(b - 1, x, y)

    g(i - 1, X, 1)
    return memo

vals = [
    (6872997084689100999, 2637233646),  # 1048 checks with Matt's code
    (3461781732514363153, 262193934464),  # 8756 checks with Matt's code
    (931590259044275343, 5343859294),  # 4628 checks with Matt's code
    (2390503072583010999, 22219728382),  # 5188 checks with Matt's code
    (412975927819062465, 9399702487040),  # 8324 checks with Matt's code
    (9105477787064988985, 211755297373604352),  # 3204 checks with Matt's code
    (4978113409908739575, 67966612030),  # 5232 checks with Matt's code
    (6175356111962773143, 1264664368613886),  # 3756 checks with Matt's code
    (648518352783802375, 6)  # B smaller than sqrt(A)
]

for A, B in vals:
    memo = f(A, B)
    [(x, y)] = memo["solution"]
    print("x, y: %s, %s" % (x, y))
    print("A: %s" % A)
    print("x*y: %s" % (x * y))
    print("B: %s" % B)
    print("x^y: %s" % (x ^ y))
    print("%s iterations" % memo["it"])
    print("")
```
Output:
x, y: 4251585939, 1616572541
A: 6872997084689100999
x*y: 6872997084689100999
B: 2637233646
x^y: 2637233646
231199 iterations
x, y: 262180735447, 13203799
A: 3461781732514363153
x*y: 3461781732514363153
B: 262193934464
x^y: 262193934464
73 iterations
x, y: 5171068311, 180154313
A: 931590259044275343
x*y: 931590259044275343
B: 5343859294
x^y: 5343859294
257 iterations
x, y: 22180179939, 107776541
A: 2390503072583010999
x*y: 2390503072583010999
B: 22219728382
x^y: 22219728382
67 iterations
x, y: 9399702465439, 43935
A: 412975927819062465
x*y: 412975927819062465
B: 9399702487040
x^y: 9399702487040
85 iterations
x, y: 211755297373604395, 43
A: 9105477787064988985
x*y: 9105477787064988985
B: 211755297373604352
x^y: 211755297373604352
113 iterations
x, y: 68039759325, 73164771
A: 4978113409908739575
x*y: 4978113409908739575
B: 67966612030
x^y: 67966612030
69 iterations
x, y: 1264664368618221, 4883
A: 6175356111962773143
x*y: 6175356111962773143
B: 1264664368613886
x^y: 1264664368613886
99 iterations
x, y: 805306375, 805306369
A: 648518352783802375
x*y: 648518352783802375
B: 6
x^y: 6
59 iterations

Solving linear equations

I have to find an integral solution of the equation ax+by=c such that x>=0, y>=0, and the value of (x+y) is minimal.
I know that if c%gcd(a,b)==0, integer solutions exist. How do I find the values of x and y?
My approach
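```python
ans = None
for x in range(c // a + 1):       # y >= 0 requires a*x <= c
    if (c - a * x) % b == 0:      # y = (c - a*x)/b must be an integer
        y = (c - a * x) // b
        if ans is None or x + y < ans:
            ans = x + y
```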
Is there a better way to do this, with better time complexity?
Using the Extended Euclidean Algorithm and the theory of linear Diophantine equations there is no need to search. Here is a Python 3 implementation:
```python
def egcd(a, b):
    s, t = 1, 0  # coefficients to express current a in terms of original a, b
    x, y = 0, 1  # coefficients to express current b in terms of original a, b
    q, r = divmod(a, b)
    while r > 0:
        a, b = b, r
        old_x, old_y = x, y
        x, y = s - q*x, t - q*y
        s, t = old_x, old_y
        q, r = divmod(a, b)
    return b, x, y

def smallestSolution(a, b, c):
    d, x, y = egcd(a, b)
    if c % d != 0:
        return "No integer solutions"
    u = a//d  # integer division
    v = b//d
    w = c//d
    x = w*x
    y = w*y
    k1 = -x//v if -x % v == 0 else 1 + -x//v  # k1 = ceiling(-x/v)
    x1 = x + k1*v  # x + k1*v is solution with smallest x >= 0
    y1 = y - k1*u
    if y1 < 0:
        return "No nonnegative integer solutions"
    k2 = y//u  # floor division
    x2 = x + k2*v  # y - k2*u is solution with smallest y >= 0
    y2 = y - k2*u
    if x2 < 0 or x1+y1 < x2+y2:
        return (x1, y1)
    return (x2, y2)
```
Typical run:
>>> smallestSolution(1001,2743,160485)
(111, 18)
The way it works: first use the extended Euclidean algorithm to find d = gcd(a,b) and one solution (x,y) of ax+by=c (the Bezout coefficients scaled by c/d). All other solutions are of the form (x+k*v, y-k*u), where u = a/d, v = b/d, and k is an arbitrary integer parameter. Since x+y is linear, it has no critical points, hence it is minimized in the first quadrant when either x is as small as possible or y is as small as possible. By appropriate use of floor and ceiling you can locate the integer points with either x as small as possible or y as small as possible. Just take the one with the smallest sum.
On Edit: My original code used the Python function math.ceil applied to -x/v. This is problematic for very large integers, because -x/v is computed as a float. I tweaked it so that the ceiling is computed with integer operations only. It can now handle arbitrarily large numbers:
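Writing the family of solutions out makes the endpoint argument concrete. Assuming a, b > 0 (so u, v > 0), with (x, y) the scaled particular solution from the code, u = a/d and v = b/d:

$$
\begin{aligned}
x_k &= x + k\,v, \qquad y_k = y - k\,u, \\
x_k + y_k &= (x + y) + k\,(v - u), \\
x_k \ge 0 &\iff k \ge k_1 = \left\lceil -x/v \right\rceil, \qquad
y_k \ge 0 \iff k \le k_2 = \left\lfloor y/u \right\rfloor .
\end{aligned}
$$

Nonnegative solutions exist exactly when k1 <= k2 (the y1 < 0 test in the code). Because the sum changes by v - u per step of k, it is smallest at k1 when b > a, at k2 when b < a, and constant when a = b; the final comparison in smallestSolution makes that choice numerically.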
>>> a = 236317407839490590865554550063
>>> b = 127372335361192567404918884983
>>> c = 475864993503739844164597027155993229496457605245403456517677648564321
>>> smallestSolution(a,b,c)
(2013668810262278187384582192404963131387, 120334243940259443613787580180)
>>> x,y = _
>>> a*x+b*y
Most of the computation takes place in running the extended Euclidean algorithm, which needs only O(log(min(a,b))) division steps.
First, assume a, b, c > 0, so:
a*x + b*y = c
x + y = min(xi + yi)
x, y >= 0
Then:
x = (c - b*y)/a
y = (c - a*x)/b
and, since x, y >= 0:
c - a*x >= 0  =>  x <= c/a
c - b*y >= 0  =>  y <= c/b
So a naive O(n) solution in C++ (n being the number of candidate values of y, about c/b) looks like this:
```cpp
void compute0(int &x,int &y,int a,int b,int c) // naive
{
    int xx,yy;
    xx=-1; yy=-1;
    for (y=0;;y++)
    {
        x = c - b*y;
        if (x<0) break;      // y out of range stop
        if (x%a) continue;   // non integer solution
        x/=a;
        if ((xx<0)||(x+y<=xx+yy)) { xx=x; yy=y; } // remember minimal solution
    }
    x=xx; y=yy;
}
```
If no solution is found, it returns -1,-1. If you think about the equation a bit, you should realize that the minimal solution is the one where x or y is minimal (which one depends on whether a < b). With that heuristic, we only need to increase the coordinate that should be minimal until the first solution is found, which speeds up the whole thing considerably:
```cpp
void compute1(int &x,int &y,int a,int b,int c)
{
    if (a<=b){ for (x=0,y=c;y>=0;x++,y-=a) if (y%b==0) { y/=b; return; } }
    else     { for (y=0,x=c;x>=0;y++,x-=b) if (x%a==0) { x/=a; return; } }
    x=-1; y=-1;
}
```
I measured this on my setup:
```
a=50  b=105  c=500000000
method   time           x        y      ax+by      x+y
naive    55.910 ms     10  4761900  500000000  4761910
opt       0.000 ms     10  4761900  500000000  4761910

a=105  b=50  c=500000000
method   time           x        y      ax+by      x+y
naive    99.214 ms  4761900      10  500000000  4761910
opt       0.000 ms  4761900      10  500000000  4761910
```
The roughly 2x difference between the two naive timings is because the ratio of the coefficients is about 2 (105/50 = 2.1) and, in the second run, the naive loop iterates over the worse coordinate.
Now just handle special cases when a,b,c are zero (to avoid division by zero)...
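To check the claim that the minimum sits at the smallest possible x (when a <= b) or the smallest possible y (when a > b), here is a small Python cross-check. min_sum_heuristic is a hypothetical port of compute1 and min_sum_brute tries every x; both assume a, b > 0 and c >= 0, so the zero cases mentioned above are not handled.

```python
def min_sum_heuristic(a, b, c):
    """Port of compute1: scan only the coordinate that should be minimal."""
    if a <= b:
        x, rest = 0, c
        while rest >= 0:          # rest = c - a*x
            if rest % b == 0:
                return x, rest // b
            x, rest = x + 1, rest - a
    else:
        y, rest = 0, c
        while rest >= 0:          # rest = c - b*y
            if rest % a == 0:
                return rest // a, y
            y, rest = y + 1, rest - b
    return None


def min_sum_brute(a, b, c):
    """Try every x in [0, c/a] and keep the pair with the smallest x + y."""
    best = None
    for x in range(c // a + 1):
        rem = c - a * x
        if rem % b == 0:
            y = rem // b
            if best is None or x + y < best[0] + best[1]:
                best = (x, y)
    return best
```

With a=50, b=105, c=500000000, the heuristic stops after 11 steps at (10, 4761900), matching the measurements above.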

Can anyone explain how this division algorithm works?

I saw this in an algorithm textbook. I am confused about the recursive call in the middle. If you can explain it with an example, such as 4/2, that would be great!
```
function divide(x, y)
Input: Two n-bit integers x and y, where y ≥ 1
Output: The quotient and remainder of x divided by y

    if x = 0: return (q, r) = (0, 0)
    (q, r) = divide(floor(x/2), y)
    q = 2 · q, r = 2 · r
    if x is odd: r = r + 1
    if r ≥ y: r = r − y, q = q + 1
    return (q, r)
```
It works through the binary digits of x. Each recursive call drops the lowest bit of x (a right shift); on the way back up, each level doubles q and r (a left shift), brings the dropped bit back into r, and subtracts y once if r has reached y. In other words, it is binary long division. A more interesting case than 4/2 is 13/3 (13 is 1101 in binary).
```
divide(13, 3)   // initial binary value - 1101
divide(6, 3)    // shift right - 110
divide(3, 3)    // shift right - 11
divide(1, 3)    // shift right - 1 (this is the most significant bit)
divide(0, 3)    // shift right - 0 (no more significant bits)
return (0, 0)   // roll it back up
return (0, 1)   // x = 1 is odd, so r = 0 + 1 = 1
return (1, 0)   // x = 3: r = 2 * 1 = 2; x is odd so r = 3; r >= y, so r = 0 and q = 0 + 1
return (2, 0)   // x = 6: q = 2 * 1 = 2; r = 2 * 0 = 0; x is even and r < y, so nothing else changes
return (4, 1)   // x = 13: q = 2 * 2 = 4; x is odd so r = 0 + 1 = 1
```
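A direct Python transcription of the textbook pseudocode makes it easy to replay traces like the one above (divide is used here as a plain function name, not a library API). Because the remainder from the recursive call is below y, doubling it and adding one bit keeps it below 2y, so a single subtraction is always enough.

```python
def divide(x, y):
    """Binary long division: return (q, r) with x == q*y + r and 0 <= r < y."""
    if x == 0:
        return 0, 0
    q, r = divide(x // 2, y)  # divide x with its lowest bit dropped
    q, r = 2 * q, 2 * r       # shift both results back left by one bit
    if x % 2 == 1:            # bring the dropped bit back into the remainder
        r += 1
    if r >= y:                # r < 2*y here, so one subtraction suffices
        r -= y
        q += 1
    return q, r
```

For the 4/2 example, the calls unwind as divide(1, 2) = (0, 1), divide(2, 2) = (1, 0), and finally divide(4, 2) = (2, 0).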

Can I avoid "rightward drift" in Haskell?

When I use an imperative language I often write code like
```
foo (x) {
    if (x < 0) return True;
    y = getForX(x);
    if (y < 0) return True;

    return x < y;
}
```
That is, I check conditions off one by one, breaking out of the block as soon as possible.
I like this because it keeps the code "flat" and obeys the principle of "end weight". I consider it to be more readable.
But in Haskell I would have written that as
```haskell
foo x = do
  if x < 0
    then return True
    else do
      y <- getForX x
      if y < 0
        then return True
        else return $ x < y
```
Which I don't like as much. I could use a monad that allows breaking out, but since I'm already using a monad I'd have to lift everything, which adds words I'd like to avoid if I can.
I suppose there's not really a perfect solution to this, but does anyone have any advice?
For your specific question: How about dangling do notation and the usage of logic?
```haskell
foo x =
  if x < 0 then return True else do
  y <- getForX x
  return $ y < 0 || x < y
```
Combined with what hammar said, you can even get more beautiful code:
```haskell
foo x
  | x < 0     = return True
  | otherwise = do
      y <- getForX x
      return $ y < 0 || x < y
```
Using patterns and guards can help a lot:
```haskell
foo x | x < 0 = return True
foo x = do
  y <- getForX x
  if y < 0
    then return True
    else return $ x < y
```
You can also introduce small helper functions in a where clause. That tends to help readability as well.
```haskell
foo x | x < 0 = return True
foo x = do
    y <- getForX x
    return $ bar y
  where
    bar y | y < 0     = True
          | otherwise = x < y
```
(Or if the code really is as simple as this example, use logic as FUZxxl suggested).
The best way to do this is using guards, but then you need to have the y value first in order to use it in the guard. That value has to come from getForX, which might be tucked away in some monad that you cannot get the value out of except through getForX (for example the IO monad), so you have to lift the pure function that uses guards into that monad. One way of doing this is by using liftM.
```haskell
import Control.Monad (liftM)

foo x = liftM go (getForX x)
  where
    go y | x < 0     = True
         | y < 0     = True
         | otherwise = x < y
```
Isn't it just
```haskell
foo x = x < y || y < 0 where y = getForX x
```
EDIT: As Owen pointed out, getForX is monadic, so my code above would not work. The version below should:
```haskell
foo x = do
  y <- getForX x
  return (x < y || y < 0)
```

Bit interleaving optimized for Ruby

Granted, optimizing bit twiddling in Ruby is a bit of a mismatch to begin with. That aside, I'm looking for a snippet or a gem that can interleave two arbitrary integer coords, optimized as well as possible for MRI (1.9), or a native gem.
Some approaches in C are:
As an example or starting point, here's "Interleave bits the obvious way" in Ruby, somewhat uglified to keep it from creating temporary arrays (each of which roughly doubles the runtime) and with a binary length method inlined for a further 6% decrease. (If you know neither input is ever zero, you can omit that check for a few percent more.)
```ruby
class Integer
  def interleave(y)
    z = 0
    bl = self > 0 ? Math.log2(self) : 1
    ybl = y > 0 ? Math.log2(y) : 1
    ((((bl <=> ybl) == -1) ? ybl : bl).floor + 1).times{|i| z |= (self & 1 << i) << i | (y & 1 << i) << (i + 1)}
    return z
  end
end
```
Results from a 2.66 GHz i5 with 1.9.2p180:
```ruby
require 'benchmark'

x = y = 0b11111111_11111111_11111111_11111111
Benchmark.bm { |bm| bm.report { 1000000.times { x.interleave(y) } } }
```
```
      user     system      total        real
 18.360000   0.010000  18.370000 ( 18.356196)
```
Surely there's a better way?
I included the zero fix from Wayne Conrad, albeit far uglier than his and only marginally faster. I also moved the floor and + 1 so that they are executed once instead of twice per call.
Here is a Gist of this with matching de-interleave.
Here's a quick & cheesy implementation to get you going until a good one comes along:
```ruby
def mortanize(x, y)
  xs, ys = [x, y].map do |n|
    n.to_s(2)
  end
  nbits = [xs, ys].map(&:size).max
  xs, ys = [xs, ys].map do |n|
    ('0' * (nbits - n.size) + n).chars
  end
  ys.zip(xs).join.to_i(2)
end
```
As you might expect, it's no speed demon. On my box, with MRI 1.8.7, it computes about 35,000 16-bit results per second. Yours computes 68,000 16-bit results per second. Or, see the next algorithm for 256,000 16-bit results per second.
If you're willing to trade a little memory and startup time for speed, then:
```ruby
def base_mortanize(x, y)
  xs, ys = [x, y].map do |n|
    n.to_s(2)
  end
  nbits = [xs, ys].map(&:size).max
  xs, ys = [xs, ys].map do |n|
    ('0' * (nbits - n.size) + n).chars
  end
  ys.zip(xs).join.to_i(2)
end

MORTON_TABLE_X = (0..255).map do |x|
  base_mortanize(x, 0)
end

MORTON_TABLE_Y = (0..255).map do |y|
  base_mortanize(0, y)
end

def mortanize(x, y)
  z = []
  while (x > 0 || y > 0)
    z << (MORTON_TABLE_X[x & 0xff] | MORTON_TABLE_Y[y & 0xff])
    x >>= 8
    y >>= 8
  end
  z.reverse.inject(0) do |result, word|
    result << 16 | word
  end
end
```
This one computes 256,000 16-bit results per second.
There's a bug in your answer if either argument is zero. Here's one possible fix for it. First define this function:
```ruby
def bit_size(x)
  return 1 if x == 0
  Math.log2(x).floor + 1
end
```
And then, inside interleave, replace:
```ruby
z, bl, ybl = 0, (Math.log2(self)).floor + 1, (Math.log2(y)).floor + 1
```
with:
```ruby
z = 0
bl = bit_size(self)
ybl = bit_size(y)
```
Here is the rspec test case I used:
```ruby
describe "mortanize" do
  it "should interleave integers" do
    mortanize(0, 0).should eql 0
    mortanize(0, 1).should eql 2
    mortanize(1, 0).should eql 1
    mortanize(0xf, 0x3).should eql 0x5f
    mortanize(0x3, 0xf).should eql 0xaf
    mortanize(0xf, 0x0).should eql 0x55
    mortanize(0x0, 0xf).should eql 0xaa
    mortanize(0x3, 0xc).should eql 0xa5
    mortanize(0xf, 0xf).should eql 0xff
    mortanize(0x1234, 0x4321).should eql 0x210e0d12
  end
end
```
Here's another solution, benchmarked about 50% faster than the accepted one, and for 16-bit integers (where the first one only does 8-bit):
```ruby
Magic = [0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF]

# Interleave lower 16 bits of x and y, so the bits of x
# are in the even positions and bits from y in the odd;
# z gets the resulting 32-bit Morton Number.
# x and y must initially be less than 65536.
# Rubyfied from
def _interleave_bits_16b(x,y)
  x = (x | (x << 8)) & Magic[3]
  x = (x | (x << 4)) & Magic[2]
  x = (x | (x << 2)) & Magic[1]
  x = (x | (x << 1)) & Magic[0]
  y = (y | (y << 8)) & Magic[3]
  y = (y | (y << 4)) & Magic[2]
  y = (y | (y << 2)) & Magic[1]
  y = (y | (y << 1)) & Magic[0]
  z = x | (y << 1)
end
```
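The same magic-mask technique also runs in reverse, which gives a matching de-interleave. Here is a Python sketch for cross-checking results; part1by1, compact1by1, interleave16 and deinterleave16 are hypothetical names. Like the Ruby version, it handles only 16-bit inputs, with x in the even bits and y in the odd bits, and the expected values in the check come from the rspec test above.

```python
# Masks for spreading 16 bits into the even bits of a 32-bit word
# (the same constants as the Ruby Magic array).
MASKS = (0x55555555, 0x33333333, 0x0F0F0F0F, 0x00FF00FF)


def part1by1(n):
    """Spread the low 16 bits of n so that bit i moves to bit 2*i."""
    n &= 0xFFFF
    n = (n | (n << 8)) & MASKS[3]
    n = (n | (n << 4)) & MASKS[2]
    n = (n | (n << 2)) & MASKS[1]
    n = (n | (n << 1)) & MASKS[0]
    return n


def compact1by1(n):
    """Inverse of part1by1: gather the even bits of n into the low 16 bits."""
    n &= MASKS[0]
    n = (n | (n >> 1)) & MASKS[1]
    n = (n | (n >> 2)) & MASKS[2]
    n = (n | (n >> 4)) & MASKS[3]
    n = (n | (n >> 8)) & 0x0000FFFF
    return n


def interleave16(x, y):
    """32-bit Morton number: x in the even bits, y in the odd bits."""
    return part1by1(x) | (part1by1(y) << 1)


def deinterleave16(z):
    """Recover (x, y) from a value produced by interleave16."""
    return compact1by1(z), compact1by1(z >> 1)
```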
If you have an implementation already in C, you can use FFI; otherwise, you can write it directly with the help of RubyInline.
