Can I avoid "rightward drift" in Haskell? - coding-style

When I use an imperative language I often write code like
foo (x) {
    if (x < 0) return True;
    y = getForX(x);
    if (y < 0) return True;
    return x < y;
}
That is, I check conditions off one by one, breaking out of the block as soon
as possible.
I like this because it keeps the code "flat" and obeys the principle of "end
weight". I consider it to be more readable.
But in Haskell I would have written that as
foo x = do
    if x < 0
        then return True
        else do
            y <- getForX x
            if y < 0
                then return True
                else return $ x < y
Which I don't like as much. I could use a monad that allows breaking out, but
since I'm already using a monad I'd have to lift everything, which adds words
I'd like to avoid if I can.
I suppose there's not really a perfect solution to this but does anyone have
any advice?

For your specific question: How about dangling do notation and the usage of logic?
foo x = do
    if x < 0 then return True else do
        y <- getForX x
        return $ y < 0 || x < y
Edit
Combined with what hammar said, you can even get more beautiful code:
foo x | x < 0     = return True
      | otherwise = do y <- getForX x
                       return $ y < 0 || x < y

Using patterns and guards can help a lot:
foo x | x < 0 = return True
foo x = do
    y <- getForX x
    if y < 0
        then return True
        else return $ x < y
You can also introduce small helper functions in a where clause. That tends to help readability as well.
foo x | x < 0 = return True
foo x = do
    y <- getForX x
    return $ bar y
  where
    bar y | y < 0     = True
          | otherwise = x < y
(Or if the code really is as simple as this example, use logic as FUZxxl suggested).

The best way to do this is with guards, but you need the y value before you can use it in a guard. That value has to come from getForX, which might live in a monad you cannot extract values from (for example, the IO monad), so you have to lift the pure function that uses the guards into that monad. One way of doing this is with liftM.
foo x = liftM go (getForX x)
  where
    go y | x < 0     = True
         | y < 0     = True
         | otherwise = x < y

Isn't it just
foo x = x < y || y < 0 where y = getForX x
EDIT: As Owen pointed out - getForX is monadic so my code above would not work. The below version probably should:
foo x = do
    y <- getForX x
    return (x < y || y < 0)

Related

A simple prolog program

I want to implement the following function f(x,y) in Prolog
f(x,y) = a*x+b*y
where a = 1 if x > 0; a = -1 if x < 0; a = 0 if x = 0
and b = -1 if y > 0; b = 1 if y < 0; b = 0 if y = 0
For example,
f(2,-1) = 1*2 + 1*(-1) = 1
f(-2,-1) = (-1)*(-2) + 1*(-1) = 1
f(0,0) = 0*0 + 0*0 = 0
Any one can help?
How about using the following formulation?
f(X,Y,Result) :-
    Result is abs(X) - abs(Y).
Let's run some queries:
?- f(0,0,0).
true.
?- f(-2,-1,1).
true.
?- f(2,-1,1).
true.
(Assuming you have a typo when defining y, y>0 not y>=0.)
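To see why the abs formulation matches the piecewise definition, here is a small Python sketch (not Prolog) that compares the two over a range of integers. The names sign, f_cases and f_abs are made up for illustration, and it assumes the corrected definition of b, i.e. b = -1 only for y > 0.

```python
def sign(v):
    """Return 1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)

def f_cases(x, y):
    """Piecewise definition from the question: a*x + b*y."""
    a = sign(x)    # 1 if x > 0, -1 if x < 0, 0 if x == 0
    b = -sign(y)   # -1 if y > 0, 1 if y < 0, 0 if y == 0
    return a * x + b * y

def f_abs(x, y):
    """Closed form used in the answer above."""
    return abs(x) - abs(y)

# Collect every pair in a small range where the two definitions disagree.
mismatches = [(x, y) for x in range(-5, 6) for y in range(-5, 6)
              if f_cases(x, y) != f_abs(x, y)]
print(mismatches)  # an empty list means they agree on this range
```

This only checks a finite range, but the reason it works in general is that sign(x)*x is abs(x) for every integer x.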
You need to define a relation between the input vars and the result of the function. Prolog can then answer yes/true with substitutions or no/false.
f(X,Y,Answer):-
    a_is(X,A),
    b_is(Y,B),
    Answer is A*X+B*Y.
a_is(X,1):-
    X>0.
a_is(X,-1):-
    X<0.
a_is(0,0).
b_is(Y,1):-
    Y<0.
b_is(Y,-1):-
    Y>0.
b_is(0,0).
Example:
?- f(2,-1,Answer).
Answer = 1 ;
false.
Shouldn't be much more complicated than this one-liner:
f(X,Y,Z) :- Z is sign(X)*X + -sign(Y)*Y.

How does ConstantTimeByteEq work?

In Go's cryptography library, I found this function ConstantTimeByteEq. What does it do, and how does it work?
// ConstantTimeByteEq returns 1 if x == y and 0 otherwise.
func ConstantTimeByteEq(x, y uint8) int {
    z := ^(x ^ y)
    z &= z >> 4
    z &= z >> 2
    z &= z >> 1
    return int(z)
}
x ^ y is x XOR y, where the result is 1 when the arguments are different and 0 when the arguments are the same:
x = 01010011
y = 00010011
x ^ y = 01000000
^(x ^ y) negates this, i.e., you get 0 when the arguments are different and 1 otherwise:
^(x ^ y) = 10111111 => z
Then we start shifting z to the right for masking its bits by itself. A shift pads the left side of the number with zero bits:
z >> 4 = 00001011
With the goal of propagating any zeros in z to the result, start ANDing:
z = 10111111
z >> 4 = 00001011
z & (z >> 4) = 00001011
also fold the new value to move any zero to the right:
z = 00001011
z >> 2 = 00000010
z & (z >> 2) = 00000010
further fold to the last bit:
z = 00000010
z >> 1 = 00000001
z & (z >> 1) = 00000000
On the other hand, if you have x == y initially, it goes like this:
z            = 11111111
z & (z >> 4) = 00001111
z & (z >> 2) = 00000011
z & (z >> 1) = 00000001
So it really returns 1 when x == y, 0 otherwise.
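The folding steps can be replayed in a few lines of Python. This is only a sketch of the bit logic: Python integers are not fixed-width, so the result of the NOT is masked to 8 bits by hand, and nothing here is constant-time. The function name is a transliteration for illustration, not part of any library.

```python
def constant_time_byte_eq(x, y):
    """Mirror the bit logic of Go's ConstantTimeByteEq for 8-bit inputs."""
    z = ~(x ^ y) & 0xFF   # 1 bits where x and y agree; the mask emulates uint8
    z &= z >> 4           # fold the upper nibble onto the lower nibble
    z &= z >> 2           # fold 4 bits down to 2
    z &= z >> 1           # fold 2 bits down to 1
    return z              # bit 0 is now the AND of all 8 agreement bits

print(constant_time_byte_eq(0b01010011, 0b00010011))  # the worked example: 0
print(constant_time_byte_eq(0b01010011, 0b01010011))  # equal inputs: 1
```

Because every shift pads with zeros from the left, all bits above bit 0 end up cleared, which is why the result is exactly 0 or 1 rather than some other nonzero value.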
Generally, if both x and y are zero the comparison can take less time than other cases. This function tries to make it so that all calls take the same time regardless of the values of its inputs. This way, an attacker can't use timing based attacks.
It does exactly what the documentation says: it checks whether x and y are equal. From a functional point of view it is just x == y, dead simple.
Doing x == y in this cryptic bit-fiddling way prevents timing side-channel attacks on algorithms: x == y may get compiled to code that runs faster if x = y and slower if x != y (or the other way around) due to branch prediction in CPUs. An attacker can use this to learn something about the data handled by the cryptographic routines and thus compromise security.

Is {true} x := y { x = y } a valid Hoare triple?

I am not sure that
{ true } x := y { x = y }
is a valid Hoare triple.
I am not sure one is allowed to reference a variable (in this case, y), without explicitly defining it first either in the triple program body or in the pre-condition.
{ y=1 } x := y { x = y } //valid
{true} y := 1; x := y { x = y } //valid
How is it?
I am not sure that
{ true } x := y { x = y }
is a valid Hoare triple.
The triple should be read as follows:
"Regardless of starting state, after executing x:=y x equals y."
and it does hold. The formal argument for why it holds is that
the weakest precondition of x := y given postcondition { x = y } is { y = y }, and
{ true } implies { y = y }.
However, I completely understand why you feel uneasy about this triple, and you're worried for a good reason!
The triple is badly formulated because the pre- and post condition do not provide a useful specification. Why? Because (as you've discovered) x := 0; y := 0 also satisfies the spec, since x = y holds after execution.
Clearly, x := 0; y := 0 is not a very useful implementation, and the reason it still satisfies the specification is (in my view) a bug in the specification.
How to fix this:
The "correct" way of expressing the specification is to make sure the specification is self-contained by using some meta variables that the program can't possibly access (x₀ and y₀ in this case):
{ x=x₀ ∧ y=y₀ } x := y { x=y₀ ∧ y=y₀ }
Here x := 0; y := 0 no longer satisfies the post condition.
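The difference between the two specifications can be checked by brute force over a small state space. The Python sketch below is not a proof, only an illustration: a state is modelled as a pair (x, y), a program as a function from state to state, and all names (assign_x_y, zero_both, STATES, the two spec checkers) are hypothetical.

```python
from itertools import product

def assign_x_y(state):
    """The program x := y."""
    x, y = state
    return (y, y)

def zero_both(state):
    """The program x := 0; y := 0."""
    return (0, 0)

# A small, finite set of starting states (x, y) to test against.
STATES = list(product(range(-3, 4), repeat=2))

def satisfies_weak_spec(program):
    """{ true } program { x = y }: x equals y afterwards, from any start."""
    return all(program(s)[0] == program(s)[1] for s in STATES)

def satisfies_strong_spec(program):
    """{ x=x0 and y=y0 } program { x=y0 and y=y0 }."""
    return all(program((x0, y0)) == (y0, y0) for x0, y0 in STATES)

for prog in (assign_x_y, zero_both):
    print(prog.__name__, satisfies_weak_spec(prog), satisfies_strong_spec(prog))
```

On this state space both programs pass the weak specification, but only x := y passes the one that pins the result to the initial value of y.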
{ true } x := y { x = y } is a valid Hoare triple. The reason is as follows:
x := y is an assignment, so substitute y for x in the postcondition.
This gives the precondition {y=y}, which is implied by {true}.
In other words, {true} => {y=y}.
If x := y, then Q. Q.E.D.

Can You Use Arithmetic Operators to Flip Between 0 and 1

Is there a way without using logic and bitwise operators, just arithmetic operators, to flip between integers with the value 0 and 1?
ie.
variable ?= variable will make the variable 1 if it is 0, or 0 if it is 1.
x = 1 - x
Will switch between 0 and 1.
Edit: I misread the question, thought the OP could use any operator
A Few more...(ignore these)
x ^= 1 // bitwise operator
x = !x // logical operator
x = (x <= 0) // kinda the same as x != 1
Without using an operator?
int arr[] = {1,0}
x = arr[x]
Yet another way:
x = (x + 1) % 2
Assuming that it is initialized as a 0 or 1:
x = 1 - x
Comedy variation on st0le's second method
x = "\1"[x]
Another way to flip a bit.
x = ABS(x - 1) // the absolute of (x - 1)
int flip(int i){
    return 1 - i;
}
Just for a bit of variety:
x = 1 / (x + 1);
x = (x == 0);
x = (x != 1);
Not sure whether you consider == and != to be arithmetic operators. Probably not, and obviously although they work in C, more strongly typed languages wouldn't convert the result to integer.
you can simply try this
+(!0) // output:1
+(!1) // output:0
You can use simple:
abs(x-1)
or just:
int(not x)
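For a quick comparison, the arithmetic-only suggestions above can be checked side by side. This is a Python sketch rather than C, so the integer division is written as //; the dictionary name flips is made up for illustration, and all variants assume x starts as exactly 0 or 1.

```python
# Each entry maps a description to a function that should flip 0 <-> 1.
flips = {
    "1 - x": lambda x: 1 - x,
    "(x + 1) % 2": lambda x: (x + 1) % 2,
    "abs(x - 1)": lambda x: abs(x - 1),
    "1 // (x + 1)": lambda x: 1 // (x + 1),  # integer division, like / on C ints
}

for name, flip in flips.items():
    # Print the result for both legal inputs.
    print(name, flip(0), flip(1))
```

Outside the inputs 0 and 1 the variants disagree (for example, 1 - x gives -1 for x = 2 while (x + 1) % 2 gives 1), so which one to pick depends on what should happen with unexpected values.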

Python performance: iteration and operations on nested lists

Problem Hey folks. I'm looking for some advice on python performance. Some background on my problem:
Given:
A (x,y) mesh of nodes each with a value (0...255) starting at 0
A list of N input coordinates each at a specified location within the range (0...x, 0...y)
A value Z that defines the "neighborhood" in count of nodes
Increment the value of the node at the input coordinate and the node's neighbors. Neighbors beyond the mesh edge are ignored. (No wrapping)
BASE CASE: A mesh of size 1024x1024 nodes, with 400 input coordinates and a range Z of 75 nodes.
Processing should be O(x*y*Z*N). I expect x, y and Z to remain roughly around the values in the base case, but the number of input coordinates N could increase up to 100,000. My goal is to minimize processing time.
Current results Between my start and the comments below, we've got several implementations.
Running speed on my 2.26 GHz Intel Core 2 Duo with Python 2.6.1:
f1: 2.819s
f2: 1.567s
f3: 1.593s
f: 1.579s
f3b: 1.526s
f4: 0.978s
f1 is the initial naive implementation: three nested for loops.
f2 replaces the inner for loop with a list comprehension.
f3 is based on Andrei's suggestion in the comments and replaces the outer for with map()
f is Chris's suggestion in the answers below
f3b is kriss's take on f3
f4 is Alex's contribution.
Code is included below for your perusal.
Question How can I further reduce the processing time? I'd prefer sub-1.0s for the test parameters.
Please, keep the recommendations to native Python. I know I can move to a third-party package such as numpy, but I'm trying to avoid any third party packages. Also, I've generated random input coordinates, and simplified the definition of the node value updates to keep our discussion simple. The specifics have to change slightly and are outside the scope of my question.
thanks much!
f1 is the initial naive implementation: three nested for loops.
def f1(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            for j in xrange(max(0, topleft[1]), min(topleft[1]+(z*2), y)):
                # cap at 255: only increment values strictly below the limit
                if rows[i][j] < 255: rows[i][j] += 1
f2 replaces the inner for loop with a list comprehension.
def f2(x,y,n,z):
    rows = [[0]*x for i in xrange(y)]
    for i in range(n):
        inputX, inputY = (int(x*random.random()), int(y*random.random()))
        topleft = (inputX - z, inputY - z)
        for i in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            l = max(0, topleft[1])
            r = min(topleft[1]+(z*2), y)
            rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
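As a sanity check on the slice trick, here is a small Python 3 sketch (the code in this post is Python 2) that applies the same update with nested loops and with slice assignment and compares the results on a tiny mesh. The helper names bump_loops and bump_slices and the test points are hypothetical, and it assumes the half-open window [c - z, c + z) that the code above uses.

```python
def bump_loops(rows, cx, cy, z):
    """Increment every cell in the window around (cx, cy), capped at 255."""
    x, y = len(rows), len(rows[0])
    for i in range(max(0, cx - z), min(cx + z, x)):
        for j in range(max(0, cy - z), min(cy + z, y)):
            if rows[i][j] < 255:
                rows[i][j] += 1

def bump_slices(rows, cx, cy, z):
    """Same update, but each row segment is rebuilt with one slice assignment."""
    x, y = len(rows), len(rows[0])
    l, r = max(0, cy - z), min(cy + z, y)   # column bounds do not depend on i
    for i in range(max(0, cx - z), min(cx + z, x)):
        # (v < 255) is True/False, i.e. 1/0, so values stop growing at 255
        rows[i][l:r] = [v + (v < 255) for v in rows[i][l:r]]

points = [(0, 0), (3, 4), (7, 7), (3, 4)]   # includes edge cells and a repeat
a = [[0] * 8 for _ in range(8)]
b = [[0] * 8 for _ in range(8)]
for cx, cy in points:
    bump_loops(a, cx, cy, 2)
    bump_slices(b, cx, cy, 2)
print(a == b)
```

The slice version does the same work per cell but moves the inner loop into the list comprehension and the slice assignment, which is where the speedup between f1 and f2 comes from.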
UPDATE: f3 is based on Andrei's suggestion in the comments and replaces the outer for with map(). My first hack at this requires several out-of-local-scope lookups, which Guido specifically recommends against: local variable lookups are much faster than global or built-in variable lookups. I hardcoded all but the reference to the main data structure itself to minimize that overhead.
rows = [[0]*x for i in xrange(y)]
def f3(x,y,n,z):
    inputs = [(int(x*random.random()), int(y*random.random())) for i in range(n)]
    rows = map(g, inputs)
def g(input):
    inputX, inputY = input
    topleft = (inputX - 75, inputY - 75)
    for i in xrange(max(0, topleft[0]), min(topleft[0]+(75*2), 1024)):
        l = max(0, topleft[1])
        r = min(topleft[1]+(75*2), 1024)
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE3: ChristopheD also pointed out a couple of improvements.
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
UPDATE4: kriss added a few improvements to f3, replacing min/max with the new ternary operator syntax.
def f3b(x,y,n,z):
    rn = random.random
    rows = [g1(x, y, z) for x, y in [(int(x*rn()), int(y*rn())) for i in xrange(n)]]
def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
UPDATE5: Alex weighed in with his substantive revision, adding a separate map() operation to cap the values at 255 and removing all non-local-scope lookups. The perf differences are non-trivial.
def f4(x,y,n,z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
        rows[i] = map(sat, rows[i])
Also, since we all seem to be hacking around with variations, here's my test harness to compare speeds: (improved by ChristopheD)
import timeit
# parameter order matches the functions under test: f(x, y, n, z)
def timing(f, x, y, n, z):
    fn = "%s(%d,%d,%d,%d)" % (f.__name__, x, y, n, z)
    ctx = "from __main__ import %s" % f.__name__
    results = timeit.Timer(fn, ctx).timeit(10)
    return "%4.4s: %.3f" % (f.__name__, results / 10.0)
if __name__ == "__main__":
    print timing(f, 1024, 1024, 400, 75)
    #add more here.
On my (slow-ish;-) first-day Macbook Air, 1.6GHz Core 2 Duo, system Python 2.5 on MacOSX 10.5, after saving your code in op.py I see the following timings:
$ python -mtimeit -s'import op' 'op.f1()'
10 loops, best of 3: 5.58 sec per loop
$ python -mtimeit -s'import op' 'op.f2()'
10 loops, best of 3: 3.15 sec per loop
So, my machine is slower than yours by a factor of a bit more than 1.9.
The fastest code I have for this task is:
def f3(x=x,y=y,n=n,z=z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    inc = (1).__add__
    sat = (0xff).__and__
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        b = max(0, inputX - z)
        t = min(inputX + z, x)
        l = max(0, inputY - z)
        r = min(inputY + z, y)
        for i in range(b, t):
            rows[i][l:r] = map(inc, rows[i][l:r])
    for i in range(x):
        rows[i] = map(sat, rows[i])
which times as:
$ python -mtimeit -s'import op' 'op.f3()'
10 loops, best of 3: 3 sec per loop
so, a very modest speedup, projecting to more than 1.5 seconds on your machine - well above the 1.0 you're aiming for:-(.
With a simple C-coded extension, exte.c...:
#include "Python.h"
static PyObject*
dopoint(PyObject* self, PyObject* args)
{
    int x, y, z, px, py;
    int b, t, l, r;
    int i, j;
    PyObject* rows;
    if(!PyArg_ParseTuple(args, "iiiiiO",
                         &x, &y, &z, &px, &py, &rows
                         ))
        return 0;
    b = px - z;
    if (b < 0) b = 0;
    t = px + z;
    if (t > x) t = x;
    l = py - z;
    if (l < 0) l = 0;
    r = py + z;
    if (r > y) r = y;
    for(i = b; i < t; ++i) {
        PyObject* row = PyList_GetItem(rows, i);
        for(j = l; j < r; ++j) {
            PyObject* pyitem = PyList_GetItem(row, j);
            long item = PyInt_AsLong(pyitem);
            if (item < 255) {
                PyObject* newitem = PyInt_FromLong(item + 1);
                PyList_SetItem(row, j, newitem);
            }
        }
    }
    Py_RETURN_NONE;
}
static PyMethodDef exteMethods[] = {
    {"dopoint", dopoint, METH_VARARGS, "process a point"},
    {0}
};
void
initexte()
{
    Py_InitModule("exte", exteMethods);
}
(note: I haven't checked it carefully -- I think it doesn't leak memory due to the correct interplay of reference stealing and borrowing, but it should be code inspected very carefully before being put in production;-), we could do
import exte
def f4(x=x,y=y,n=n,z=z):
    rows = [[0]*y for i in range(x)]
    rr = random.randrange
    for i in range(n):
        inputX, inputY = rr(x), rr(y)
        exte.dopoint(x, y, z, inputX, inputY, rows)
and the timing
$ python -mtimeit -s'import op' 'op.f4()'
10 loops, best of 3: 345 msec per loop
shows an acceleration of 8-9 times, which should put you in the ballpark you desire. I've seen a comment saying you don't want any third-party extension, but, well, this tiny extension you could make entirely your own;-). (Not sure what licensing conditions apply to code on Stack Overflow, but I'll be glad to re-release this under the Apache 2 license or the like, if you need that;-).
1. A (smaller) speedup could definitely be the initialization of your rows...
Replace
rows = []
for i in range(x):
    rows.append([0 for i in xrange(y)])
with
rows = [[0] * y for i in xrange(x)]
2. You can also avoid some lookups by moving random.random out of the loops (saves a little).
3. EDIT: after corrections -- you could arrive at something like this:
def f(x,y,n,z):
    rows = [[0] * y for i in xrange(x)]
    rn = random.random
    for i in xrange(n):
        topleft = (int(x*rn()) - z, int(y*rn()) - z)
        l = max(0, topleft[1])
        r = min(topleft[1]+(z*2), y)
        for u in xrange(max(0, topleft[0]), min(topleft[0]+(z*2), x)):
            rows[u][l:r] = [j+(j<255) for j in rows[u][l:r]]
EDIT: some new timings with timeit (10 runs) -- seems this provides only minor speedups:
import timeit
print timeit.Timer("f1(1024,1024,400,75)", "from __main__ import f1").timeit(10)
print timeit.Timer("f2(1024,1024,400,75)", "from __main__ import f2").timeit(10)
print timeit.Timer("f(1024,1024,400,75)", "from __main__ import f").timeit(10)
f1 21.1669280529
f2 12.9376120567
f 11.1249599457
In your f3 rewrite, g can be simplified. (This can also be applied to f4.)
You have the following code inside a for loop.
l = max(0, topleft[1])
r = min(topleft[1]+(75*2), 1024)
However, it appears that those values never change inside the for loop. So calculate them once, outside the loop instead.
Based on your f3 version I played with the code. As l and r are constants, you can avoid computing them inside the g1 loop. Using the new ternary if instead of min and max also seems to be consistently faster. I also simplified the expression with topleft. On my system it appears to be about 20% faster with the code below.
def f3b(x,y,n,z):
    rows = [g1(x, y, z) for x, y in [(int(x*random.random()), int(y*random.random())) for i in range(n)]]
def g1(x, y, z):
    l = y - z if y - z > 0 else 0
    r = y + z if y + z < 1024 else 1024
    for i in xrange(x - z if x - z > 0 else 0, x + z if x + z < 1024 else 1024 ):
        rows[i][l:r] = [j+(j<255) for j in rows[i][l:r]]
You can create your own Python module in C, and control the performance as you want:
http://docs.python.org/extending/
