Performance penalty using anonymous function in Julia

I have noticed that there is a performance penalty associated with using anonymous functions in Julia. To illustrate, I have two implementations of quicksort (taken from the micro-performance benchmarks in the Julia distribution). The first sorts in ascending order:
function qsort!(a, lo, hi)
    i, j = lo, hi
    while i < hi
        pivot = a[(lo+hi)>>>1]
        while i <= j
            while a[i] < pivot; i += 1; end
            while pivot < a[j]; j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i+1, j-1
            end
        end
        if lo < j; qsort!(a, lo, j); end
        lo, j = i, hi
    end
    return a
end
The second takes an additional parameter: an anonymous function that can be used to specify ascending or descending sort, or a comparison for more exotic types:
function qsort_generic!(a, lo, hi, op=(x,y)->x<y)
    i, j = lo, hi
    while i < hi
        pivot = a[(lo+hi)>>>1]
        while i <= j
            while op(a[i], pivot); i += 1; end
            while op(pivot, a[j]); j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i+1, j-1
            end
        end
        if lo < j; qsort_generic!(a, lo, j, op); end
        lo, j = i, hi
    end
    return a
end
There is a significant performance penalty when sorting Arrays of Int64, with the default version an order of magnitude faster. Here are times for sorting arrays of length N in seconds:
N        qsort_generic   qsort
2048     0.00125         0.00018
4096     0.00278         0.00029
8192     0.00615         0.00061
16384    0.01184         0.00119
32768    0.04482         0.00247
65536    0.07773         0.00490
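For reference, timings like these can be collected with a small harness along the following lines (my sketch, not the script actually used for the numbers above):

# Hypothetical timing harness; assumes qsort! and qsort_generic! as defined above.
for N in (2048, 4096, 8192, 16384, 32768, 65536)
    a = rand(Int64, N)
    qsort!(copy(a), 1, N)                 # warm-up so compilation isn't timed
    qsort_generic!(copy(a), 1, N)
    t_gen = @elapsed qsort_generic!(copy(a), 1, N)
    t_def = @elapsed qsort!(copy(a), 1, N)
    println(N, "  ", t_gen, "  ", t_def)
end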
The question is: Is this due to limitations in the compiler that will be ironed out in time, or is there an idiomatic way to pass functors/anonymous functions that should be used in cases like this?
Update: From the answers it looks like this is something that will be fixed in the compiler.
In the meantime, there were two suggested workarounds. Both approaches are fairly straightforward, though they do start to feel like the sort of jiggery-pokery that you have to use in C++ (though not on the same scale of awkward).
The first is the FastAnonymous package suggested by @Toivo Henningsson. I didn't try this approach, but it looks good.
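(For reference, usage looks roughly like this; the @anon macro is from the package's README, and the snippet is untested here:)

using FastAnonymous
f = @anon x -> x + 2   # builds a specialized, concretely-typed "anonymous function"
f(3)                   # returns 5, with no call overhead

The same idea would apply to a two-argument comparator for qsort_generic!.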
I tried out the second method suggested by @simonstar, which gave me performance equivalent to the non-generic qsort! implementation:
abstract OrderingOp
immutable AscendingOp <: OrderingOp end
immutable DescendingOp <: OrderingOp end

evaluate(::AscendingOp, x, y) = x < y
evaluate(::DescendingOp, x, y) = x > y

function qsort_generic!(a, lo, hi, op=AscendingOp())
    i, j = lo, hi
    while i < hi
        pivot = a[(lo+hi)>>>1]
        while i <= j
            while evaluate(op, a[i], pivot); i += 1; end
            while evaluate(op, pivot, a[j]); j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i+1, j-1
            end
        end
        if lo < j; qsort_generic!(a, lo, j, op); end
        lo, j = i, hi
    end
    return a
end
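For example (v here is just a made-up test vector):

v = rand(Int64, 10^5)
qsort_generic!(v, 1, length(v))                  # ascending (default op)
qsort_generic!(v, 1, length(v), DescendingOp())  # descending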
Thanks everyone for the help.

It's a problem and will be fixed with an upcoming type system overhaul.
Update: This has now been fixed in the 0.5 version of Julia.

As others have noted, the code you've written is idiomatic Julia and will someday be fast, but the compiler isn't quite there yet. Besides using FastAnonymous, another option is to pass types instead of anonymous functions. For this pattern, you define an immutable type with no fields and a method (let's call it evaluate) that accepts an instance of the type and some arguments. Your sorting function then accepts an op object instead of a function and calls evaluate(op, x, y) instead of op(x, y). Because methods are specialized on their input types, there is no runtime overhead to the abstraction. This is the basis for reductions and for the specification of sort order in the standard library, as well as in NumericExtensions.
For example:
immutable AscendingSort; end
evaluate(::AscendingSort, x, y) = x < y

function qsort_generic!(a, lo, hi, op=AscendingSort())
    i, j = lo, hi
    while i < hi
        pivot = a[(lo+hi)>>>1]
        while i <= j
            while evaluate(op, a[i], pivot); i += 1; end
            while evaluate(op, pivot, a[j]); j -= 1; end
            if i <= j
                a[i], a[j] = a[j], a[i]
                i, j = i+1, j-1
            end
        end
        if lo < j; qsort_generic!(a, lo, j, op); end
        lo, j = i, hi
    end
    return a
end
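Usage is then, for example, qsort_generic!(v, 1, length(v), AscendingSort()) for a vector v: the sort is compiled specialized on AscendingSort, so the comparison carries no runtime overhead.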

Yes, it's due to limitations in the compiler, and there are plans to fix it, see e.g. this issue. In the meantime, the FastAnonymous package might provide a workaround.
The way you have done it looks pretty idiomatic; there's unfortunately no magic trick that you are missing (except possibly the FastAnonymous package).

Related

Calculating Big "O" for the following example

Let's say I have the following code sample:

int number;
for (int i = 0; i < A; i++)
    for (int j = 0; j < B; j++)
        if (i == j) // some condition...
            do {
                number = rand();
            } while (number > 100);

I would like to know the Big "O" for this example. The outer loops are O(A * B), but I'm not sure what to think about the do-while loop and its Big "O". In the worst-case scenario it can be an infinite loop, and in the best case it is O(1) and can be ignored.
Edit: updated condition inside the if statement (replaced function call with a simple comparison).
Since rand() returns a value in the fixed range [0, RAND_MAX], each do-while iteration terminates with a fixed positive probability P(rand() <= 100). The number of iterations is therefore geometrically distributed, so the do-while statement is O(1) in expectation (the worst case is unbounded, but an infinite loop has probability zero). With the edited condition i == j, nothing depends on an unknown function any more, so the expected total complexity is O(A * B), dominated by the two nested loops.
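In symbols: writing p = P(rand() > 100), a constant strictly below 1 for any fixed RAND_MAX, the expected number of do-while iterations is

E[iterations] = sum over t >= 1 of t * (1 - p) * p^(t-1) = 1 / (1 - p)

which is a constant independent of A and B.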

Invert Arnold's Cat map - negative array indexes

I'm trying to implement Arnold's Cat map for N*N images using the following formula:

for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        desMatrix[(i + j) % N][(i + 2 * j) % N] = srcMatrix[i][j];
    }
}
To invert the process I do:
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        srcMatrix[(j - i) % N][(2*i - j) % N] = desMatrix[i][j];
    }
}
Is the implementation correct?
It seems to me that for certain values of i and j I might get negative indexes from (j - i) and (2*i - j); how should I handle those cases, since array indexes must be non-negative?
In general, when a modulo (%) operation needs to work on indexes that can go negative, you can simply add the modulus as many times as needed, since
x % N == (x + a*N) % N
for every natural a. In this case you have i and j constrained to [0, N), so (j - i) is always greater than -N, and you can write (N + j - i) to ensure the result is non-negative even when j is 0 and i is N-1. By the same token, (N + 2*i - j) is always non-negative.
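If you'd rather not reason about which multiple of N to add, a small helper covers every case (a hypothetical mod function, not part of your code):

/* Non-negative remainder; valid for any int x and any N > 0. */
static inline int mod(int x, int N) {
    return ((x % N) + N) % N;
}

/* The inverse step then becomes: */
srcMatrix[mod(j - i, N)][mod(2*i - j, N)] = desMatrix[i][j];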
In this case, though, this is not necessary. To invert your map, you simply repeat the forward step with the assignment reversed. Since the matrix has unit determinant and the map is area-preserving, you're assured that you'll get all your points back (i.e. covering M(i+1) yields a covering of M(i)):
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        newMatrix[i][j] = desMatrix[(i + j) % N][(i + 2 * j) % N];
    }
}
At this point newMatrix and srcMatrix ought to be identical.
(Actually, you're already running the reverse transformation as your forward one. The loop I set up to reverse yours is the commonly used form of the forward transformation.)

Selection Sort in Dafny

I am trying to implement selection sort in Dafny.
My sorted predicate and FindMin method do work, but selectionsort itself contains assertions which Dafny will not prove, even though I believe they are correct.
Here is my program:
predicate sorted(a: array<int>, i: int)
    requires a != null;
    requires 0 <= i <= a.Length;
    reads a;
{
    forall k :: 0 < k < i ==> a[k-1] < a[k]
}

method FindMin(a: array<int>, i: int) returns (m: int)
    requires a != null;
    requires 0 <= i < a.Length;
    ensures i <= m < a.Length;
    ensures forall k :: i <= k < a.Length ==> a[k] >= a[m];
{
    var j := i;
    m := i;
    while (j < a.Length)
        decreases a.Length - j;
        invariant i <= j <= a.Length;
        invariant i <= m < a.Length;
        invariant forall k :: i <= k < j ==> a[k] >= a[m];
    {
        if (a[j] < a[m]) { m := j; }
        j := j + 1;
    }
}

method selectionsort(a: array<int>) returns (s: array<int>)
    requires a != null;
    modifies a;
    ensures s != null;
    ensures sorted(s, s.Length);
{
    var c, m := 0, 0;
    var t;
    s := a;
    assert s != null;
    assert s.Length == a.Length;
    while (c < s.Length)
        decreases s.Length - c;
        invariant 0 <= c <= s.Length;
        invariant c-1 <= m <= s.Length;
        invariant sorted(s, c);
    {
        m := FindMin(s, c);
        assert forall k :: c <= k < s.Length ==> s[k] >= s[m];
        assert forall k :: 0 <= k < c ==> s[k] <= s[m];
        assert s[c] >= s[m];
        t := s[c];
        s[m] := t;
        s[c] := s[m];
        assert s[m] >= s[c];
        assert forall k :: c <= k < s.Length ==> s[k] >= s[c];
        c := c + 1;
        assert c+1 < s.Length ==> s[c-1] <= s[c];
    }
}
Why is this wrong? What does "postcondition may not hold" mean? Could Dafny give a counterexample?
You seem to understand the basic idea behind loop invariants, which is needed to verify programs using Dafny.
Your program is not correct. One way to discover this is to use the verification debugger inside the Dafny IDE in Visual Studio. Click on the last error reported (the assertion on the line before the increment of c) and you will see that the upper half of the array contains an element that is smaller than both s[c] and s[m]. Then select the program points around your 3-statement swap operation and you will notice that your swap does not actually swap.
To fix the swap, exchange the second and third statement of the 3-statement swap. Better yet, make use of Dafny's multiple assignment statement, which makes the code easier to get right:
s[c], s[m] := s[m], s[c];
There are two other problems. One is that the second assertion inside the loop does not verify:
assert forall k :: 0 <= k < c ==> s[k] <= s[m];
While s[m] is the smallest element in the upper part of the array, the loop invariant needs to document that the elements in the lower part of the array are no greater than the elements in the upper part--an essential property of the selection sort algorithm. The following loop invariant does the trick:
invariant forall k, l :: 0 <= k < c <= l < a.Length ==> s[k] <= s[l];
Finally, the complaint about the property sorted(s,c) not being maintained by the loop stems from the fact that you defined sorted as strictly increasing, which swapping will never achieve unless the array's elements are initially all distinct. Dafny thus points out a design decision that you have to make about your sorting routine. You can either decide that your selectionsort method will apply only to arrays with no duplicate elements, which you do by adding
forall k, l :: 0 <= k < l < a.Length ==> a[k] != a[l];
as a precondition to (and loop invariant in) selectionsort. Or, more conventionally, you can fix your definition of sorted to replace a[k-1] < a[k] with a[k-1] <= a[k].
To clean up your code a little, you can now delete all assert statements and the declaration of t. Since m is used only inside the loop, you can move the declaration of m to the statement that calls FindMin, which also makes it evident that the loop invariant c-1 <= m <= s.Length is not needed. The two decreases clauses can be omitted; for your program, Dafny will supply these automatically. Lastly, your selectionsort method modifies the given array in place, so there is no real reason to return the reference a in the out-parameter s; instead, you can just omit the out-parameter and replace s by a everywhere.
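Putting those changes together, the cleaned-up method might look like this (a sketch only, assuming the <= version of sorted and the FindMin above; not re-verified here):

method selectionsort(a: array<int>)
    requires a != null;
    modifies a;
    ensures sorted(a, a.Length);
{
    var c := 0;
    while (c < a.Length)
        invariant 0 <= c <= a.Length;
        invariant sorted(a, c);
        invariant forall k, l :: 0 <= k < c <= l < a.Length ==> a[k] <= a[l];
    {
        var m := FindMin(a, c);
        a[c], a[m] := a[m], a[c];  // multiple assignment: an actual swap
        c := c + 1;
    }
}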

Using multiple genvar in Verilog loop

Is it possible to use different genvars in a loop? Is there an alternative way to achieve this?
I tried this example:
genvar i;
genvar j;
genvar k;
generate
    k = 0;
    for (i = 0; i < N; i = i + 1)
    begin: firstfor
        for (j = 0; j < N; j = j + 1)
        begin: secondfor
            if (j > i)
            begin
                assign a[i+j*N] = in[i] && p[k];
                k = k + 1;
            end
        end
    end
endgenerate
And when I run "Check Syntax" it displays this error:
Syntax error near "=". (k=k+1)
I like this question because, unless you are very familiar with generates, it looks like it should work; however, there is a similar question which tries to use an extra genvar.
The syntax is not allowed because of how the generates are unrolled. Integers can only be used inside always/initial processes.
If it is just combinatorial wiring rather than parametrised instantiation, you might be able to do what you need using plain integers (I would not normally recommend this):
integer i;
integer j;
integer k;
localparam N = 2;

reg [N*N-1:0] a;
reg [N*N-1:0] in;
reg [N*N-1:0] p;

always @* begin
    k = 0;
    for (i = 0; i < N; i = i + 1) begin: firstfor
        for (j = 0; j < N; j = j + 1) begin: secondfor
            if (j > i) begin
                a[i+j*N] = in[i] && p[k];
                k = k + 1;
            end
        end
    end
end
Not sure how synthesis will like this, but the assignments are static, so it might work.
It is possible to avoid always @* when you want to do more advanced math with genvar loops: use a localparam and a function.
Make k a localparam derived from the genvars with a function, and use k as originally intended.
The getk function seems to violate the principles of code reuse by basically recreating the loops from the generate block, but getk allows each unrolled loop iteration to derive the immutable localparam k from genvars i and j. There is no separate accumulating variable k that is tracked across all the unrolled loops. Both iverilog and ncvlog are happy with this.
(Note that the original example could optimize with j=i+1 as well, but there is still an issue with deriving k.)
module top();
    localparam N = 4;

    function automatic integer getk;
        input integer istop;
        input integer jstop;
        integer i, j, k;
        begin
            k = 0;
            for (i = 0; i <= istop; i = i + 1) begin: firstfor
                for (j = i + 1; j < ((i == istop) ? jstop : N); j = j + 1) begin: secondfor
                    k = k + 1;
                end
            end
            getk = k;
        end
    endfunction

    genvar i, j;
    generate
        for (i = 0; i < N; i = i + 1) begin: firstfor
            for (j = i + 1; j < N; j = j + 1) begin: secondfor
                localparam k = getk(i, j);
                initial $display("Created i=%0d j=%0d k=%0d", i, j, k);
            end
        end
    endgenerate
endmodule
Output:
$ iverilog tmptest.v
$ ./a.out
Created i=0 j=1 k=0
Created i=0 j=2 k=1
Created i=0 j=3 k=2
Created i=1 j=2 k=3
Created i=1 j=3 k=4
Created i=2 j=3 k=5
I discovered the 'trick' of using functions to derive values from the genvars here:
https://electronics.stackexchange.com/questions/53327/generate-gate-with-a-parametrized-number-of-inputs

Efficient Mergesort Confusion

I have been reading a mergesort example (the efficient one) since yesterday and I still can't understand how it works, despite looking at the code:
private static void sort(int[] list) {
    a = list;
    int n = a.length;
    // according to variant either/or:
    b = new int[n];
    b = new int[(n + 1) / 2];
    mergesort(0, n - 1);
}
private static void mergesort(int first, int last) {
    if (first < last) {
        int mid = (first + last) / 2;
        mergesort(first, mid);
        mergesort(mid + 1, last);
        merge(first, mid, last);
    }
}
No problem understanding the algorithm up until this point but the confusion is in the following method:
private static void merge(int first, int mid, int last) {
    int i, j, k;
    i = 0;
    j = first;
    while (j <= mid)
        b[i++] = a[j++]; // *j's value is now mid*
    i = 0; // *i is reset to 0, nothing's been done to j*
    k = first;
    // *before entering the following while loop, j still carries mid's value*
    while (k < j && j <= last)
        if (b[i] <= a[j])
            a[k++] = b[i++];
        else
            a[k++] = a[j++];
    // copy back remaining elements of first half (if any)
    while (k < j)
        a[k++] = b[i++];
}
Entering the second while loop while (k < j && j <= last) is where I don't understand how this sorting works. From what I understood, the first half of the array a is already copied to the auxiliary array b, and now we want to arrange the entire array by comparing a[j++] (the second half) to the auxiliary array b[i++] so that we can get the smaller array element and place it in array a to sort the array in ascending order.
But why while (k < j && j <= last)? k < j sounds logical enough because we need to get all the values back from the auxiliary array but why j <= last? And why can't we just do while (k <= last) ?
And also, could somebody please affirm that my understanding of j's value in the above code is correct?
k < j means the auxiliary array b still contains elements that have not been merged back.
j <= last means the second half of a still contains elements.
We cannot use k <= last here, because we could then access a beyond its bounds: the comparison b[i] <= a[j] would still be evaluated after j becomes last + 1.
As for j's value: after the first while loop, j is mid + 1, not mid; the loop only exits once j has moved past mid.
Too long for comments, added here:
This variant is useful when available memory is limited (large datasets). It is mentioned in some tutorials (I met it in J. Bucknall's book about algorithms in Delphi). It is stable: the <= in if (b[i] <= a[j]) preserves the order of equal elements. It is usually not faster, though, because it is better not to copy data at all and instead, for example, swap the roles of the source and destination arrays (pointers) at every stage.
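To see the whole thing run, here is the question's code assembled into a minimal self-contained class (the class name, field declarations, and test array are mine, and I kept only the half-size variant of b):

public class MergeSortDemo {
    private static int[] a, b;

    private static void sort(int[] list) {
        a = list;
        int n = a.length;
        b = new int[(n + 1) / 2]; // half-size auxiliary array variant
        mergesort(0, n - 1);
    }

    private static void mergesort(int first, int last) {
        if (first < last) {
            int mid = (first + last) / 2;
            mergesort(first, mid);
            mergesort(mid + 1, last);
            merge(first, mid, last);
        }
    }

    private static void merge(int first, int mid, int last) {
        int i = 0, j = first;
        while (j <= mid)
            b[i++] = a[j++];       // copy first half into b; afterwards j == mid + 1
        i = 0;
        int k = first;
        while (k < j && j <= last) // merge b with the second half of a
            if (b[i] <= a[j])
                a[k++] = b[i++];
            else
                a[k++] = a[j++];
        while (k < j)              // copy back leftovers of the first half
            a[k++] = b[i++];
    }

    public static void main(String[] args) {
        int[] xs = {5, 2, 9, 1, 5, 6};
        sort(xs);
        System.out.println(java.util.Arrays.toString(xs)); // [1, 2, 5, 5, 6, 9]
    }
}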
