Predicting the running time of Coq code extracted to Haskell - performance

I have the following version of isPrime written (and proved) in Coq.
It takes around 30 seconds for Compute (isPrime 330)
to finish on my machine.
The extracted Haskell code takes around 1 second to verify that 9767 is prime.
According to a comment in this post,
the timing difference means nothing, but I wonder why that is.
Is there any other way to predict performance when extracting Coq code? After all, performance sometimes does matter, and it's quite hard to change Coq source once you have labored to prove it correct.
Here is my Coq code:
(***********)
(* IMPORTS *)
(***********)
Require Import Coq.Arith.PeanoNat.
(************)
(* helper' *)
(************)
Fixpoint helper' (p m n : nat) : bool :=
  match m with
  | 0 => false
  | 1 => false
  | S m' => (orb ((mult m n) =? p) (helper' p m' n))
  end.
(**********)
(* helper *)
(**********)
Fixpoint helper (p m : nat) : bool :=
  match m with
  | 0 => false
  | S m' => (orb ((mult m m) =? p) (orb (helper' p m' m) (helper p m')))
  end.
(***********)
(* isPrime *)
(***********)
Fixpoint isPrime (p : nat) : bool :=
  match p with
  | 0 => false
  | 1 => false
  | S p' => (negb (helper p p'))
  end.
(***********************)
(* Compute isPrime 330 *)
(***********************)
Compute (isPrime 330).
(********************************)
(* Extraction Language: Haskell *)
(********************************)
Extraction Language Haskell.
(***************************)
(* Use Haskell basic types *)
(***************************)
Require Import ExtrHaskellBasic.
(****************************************)
(* Use Haskell support for Nat handling *)
(****************************************)
Require Import ExtrHaskellNatNum.
Extract Inductive Datatypes.nat => "Prelude.Integer" ["0" "succ"]
"(\fO fS n -> if n Prelude.== 0 then fO () else fS (n Prelude.- 1))".
(***************************)
(* Extract to Haskell file *)
(***************************)
Extraction "/home/oren/GIT/CoqIt/FOLDER_2_PRESENTATION/FOLDER_2_EXAMPLES/EXAMPLE_03_PrintPrimes_Performance_Haskell.hs" isPrime.

Your Coq code is using a Peano encoding of the naturals. The evaluation of mult 2 2 literally proceeds by the reduction:
mult (S (S 0)) (S (S 0))
= (S (S 0)) + mult (S 0) (S (S 0))
= (S (S 0)) + ((S (S 0)) + mult 0 (S (S 0)))
= (S (S 0)) + ((S (S 0)) + 0)
= (S (S 0)) + (S ((S 0) + 0))
= (S (S 0)) + (S (S (0 + 0)))
= (S (S 0)) + (S (S 0))
= S ((S 0) + (S (S 0)))
= S (S (0 + (S (S 0))))
= S (S (S (S 0)))
and then checking the equality mult 2 2 =? 5 proceeds by the further reduction:
(S (S (S (S 0)))) =? (S (S (S (S (S 0)))))
(S (S (S 0))) =? (S (S (S (S 0))))
(S (S 0)) =? (S (S (S 0)))
(S 0) =? (S (S 0))
0 =? (S 0)
false
Meanwhile, on the Haskell side, the evaluation of 2 * 2 == 5 proceeds by multiplying two Integers and comparing them to another Integer. This is somewhat faster. ;)
What's incredible here is that Coq's evaluation of isPrime 330 only takes 30 seconds instead of, say, 30 years.
I don't know what to say about predicting the speed of extracted code, except to say that primitive operations on Peano numbers will be massively accelerated, and other code will probably be modestly faster, simply because a lot of work has gone into making GHC generate fast code, and performance hasn't been an emphasis in Coq's development.

Related

How to use Math.Pow with integers in Golang

I keep getting the error "cannot use a (type int) as type float64 in argument to math.Pow, cannot use x (type int) as type float64 in argument to math.Pow,
invalid operation: math.Pow(a, x) % n (mismatched types float64 and int)"
func pPrime(n int) bool {
    var nm1 int = n - 1
    var x int = nm1 / 2
    a := 1
    for a < n {
        if (math.Pow(a, x)) % n == nm1 {
            return true
        }
    }
    return false
}
func powInt(x, y int) int {
    return int(math.Pow(float64(x), float64(y)))
}
In case you have to reuse it, this keeps things a little cleaner.
If your inputs are int and the output is always expected to be int, then you're working with the platform's native integer size (64-bit on most modern machines). It's more efficient to write your own function to handle this multiplication rather than using math.Pow, which, as mentioned in the other answers, expects and returns 64-bit floating-point (float64) values.
Here's a benchmark comparison for 15^15 (a value that still fits in an int64 but is far beyond 32-bit range):
// IntPow calculates n to the mth power. Since the result is an int, it is assumed that m is a positive power
func IntPow(n, m int) int {
    if m == 0 {
        return 1
    }
    result := n
    for i := 2; i <= m; i++ {
        result *= n
    }
    return result
}
// MathPow calculates n to the mth power with the math.Pow() function
func MathPow(n, m int) int {
    return int(math.Pow(float64(n), float64(m)))
}
The result:
go test -cpu=1 -bench=.
goos: darwin
goarch: amd64
pkg: pow
BenchmarkIntPow15 195415786 6.06 ns/op
BenchmarkMathPow15 40776524 27.8 ns/op
I believe the best solution is to write your own function similar to IntPow(n, m int) shown above. My benchmarks show that it runs more than 4x faster on a single CPU core compared to using math.Pow.
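The answer shows only the benchmark output, not the benchmark functions themselves; here is a minimal sketch of what such a harness might look like with Go's testing package (the file layout and the sink variable are assumptions, not part of the original answer; the function and package names are chosen to match the output above):

package pow

import (
    "math"
    "testing"
)

// IntPow and MathPow are the functions from the answer above,
// repeated here so the file compiles on its own.
func IntPow(n, m int) int {
    if m == 0 {
        return 1
    }
    result := n
    for i := 2; i <= m; i++ {
        result *= n
    }
    return result
}

func MathPow(n, m int) int {
    return int(math.Pow(float64(n), float64(m)))
}

// sink keeps the compiler from optimizing the benchmarked calls away.
var sink int

func BenchmarkIntPow15(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = IntPow(15, 15)
    }
}

func BenchmarkMathPow15(b *testing.B) {
    for i := 0; i < b.N; i++ {
        sink = MathPow(15, 15)
    }
}

Running go test -cpu=1 -bench=. in that package directory then produces output of the shape shown above.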
Since nobody has mentioned it yet: an efficient (logarithmic) way to compute Pow(x, n) for integers x and n, if you want to implement it yourself, is as follows:
// Assumption: n >= 0
func PowInts(x, n int) int {
    if n == 0 {
        return 1
    }
    if n == 1 {
        return x
    }
    y := PowInts(x, n/2)
    if n%2 == 0 {
        return y * y
    }
    return x * y * y
}
If you want the exact exponentiation of integers, use (*big.Int).Exp. You're likely to overflow int64 pretty quickly with powers larger than two.
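For illustration, a minimal sketch of plain (non-modular) exponentiation with (*big.Int).Exp; per the math/big documentation, passing nil as the modulus computes x^y exactly:

package main

import (
    "fmt"
    "math/big"
)

func main() {
    // 2^100 overflows int64, but big.Int computes it exactly.
    result := new(big.Int).Exp(big.NewInt(2), big.NewInt(100), nil)
    fmt.Println(result) // 1267650600228229401496703205376
}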

Porting MeiYan hash function to Go

I wanted to port a state-of-the-art hash function, MeiYan, from C to Go. (As far as I know this is one of the best, if not the best, hash functions for hash tables in terms of speed and collision rate; it beats MurMur at least.)
I am new to Go, just spent one weekend with it, and came up with this version:
func meiyan(key *byte, count int) uint32 {
    type P *uint32
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := ((*(*uint32)(unsafe.Pointer(key))) << 5)
        b := ((*(*uint32)(unsafe.Pointer(key))) >> 27)
        c := *(*uint32)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*(*uint16)(unsafe.Pointer(key)))) * 0xad3e7
        key = (*byte)(unsafe.Pointer(uintptr(unsafe.Pointer(key)) + 2))
    }
    if (count & 1) != 0 {
        h = h ^ uint32(*key)
        h = h * 0xad3e7
    }
    return h ^ (h >> 16)
}
Looks messy, but I do not think I can make it look better. Now I measure the speed and it is frustratingly slow: 3 times slower than C/C++ when compiled with gccgo -O3. Can this be made faster? Is this as good as the compiler can make it, or is the unsafe.Pointer conversion just as slow as it gets? In fact this surprised me, because I have seen that some other number-crunching style code was just as fast as C or even faster. Am I doing something inefficiently here?
Here is the original C code I am porting from:
u32 meiyan(const char *key, int count) {
    typedef u32* P;
    u32 h = 0x811c9dc5;
    while (count >= 8) {
        h = (h ^ ((((*(P)key) << 5) | ((*(P)key) >> 27)) ^ *(P)(key + 4))) * 0xad3e7;
        count -= 8;
        key += 8;
    }
    #define tmp h = (h ^ *(u16*)key) * 0xad3e7; key += 2;
    if (count & 4) { tmp tmp }
    if (count & 2) { tmp }
    if (count & 1) { h = (h ^ *key) * 0xad3e7; }
    #undef tmp
    return h ^ (h >> 16);
}
Here is how I measure speed:
func main() {
    T := time.Now().UnixNano() / 1e6
    buf := []byte("Hello World!")
    var controlSum uint64 = 0
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(&buf[0], 12))
    }
    fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
    fmt.Println("controlSum:", controlSum)
}
After some careful research I found out why my code was slow, and improved it so it is now faster than the C version in my tests:
package main

import (
    "fmt"
    "time"
    "unsafe"
)

func meiyan(key *byte, count int) uint32 {
    type un unsafe.Pointer
    type p32 *uint32
    type p16 *uint16
    type p8 *byte
    var h uint32 = 0x811c9dc5
    for count >= 8 {
        a := *p32(un(key)) << 5
        b := *p32(un(key)) >> 27
        c := *p32(un(uintptr(un(key)) + 4))
        h = (h ^ ((a | b) ^ c)) * 0xad3e7
        count -= 8
        key = p8(un(uintptr(un(key)) + 8))
    }
    if (count & 4) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 2) != 0 {
        h = (h ^ uint32(*p16(un(key)))) * 0xad3e7
        key = p8(un(uintptr(un(key)) + 2))
    }
    if (count & 1) != 0 {
        h = h ^ uint32(*key)
        h = h * 0xad3e7
    }
    return h ^ (h >> 16)
}

func main() {
    T := time.Now().UnixNano() / 1e6
    buf := []byte("ABCDEFGHABCDEFGH")
    var controlSum uint64 = 0
    start := &buf[0]
    size := len(buf)
    for x := 123; x < 1e8; x++ {
        controlSum += uint64(meiyan(start, size))
    }
    fmt.Println(time.Now().UnixNano()/1e6-T, "ms")
    fmt.Println("controlSum:", controlSum)
}
The hash function itself was already fast; dereferencing the slice on each iteration is what made it slow. Taking &buf[0] inside the loop was replaced with start := &buf[0] computed once outside the loop, and start is then passed on each iteration.
The implementation from NATS looks impressive! On my machine, for data of length 30 bytes, it does 157175656.56 ops/sec (6.36 ns/op)! Take a look at it; you might find some ideas.

How to get the quotient and remainder efficiently without using "/" and "%"?

I have implemented a simple function which returns the quotient and remainder when the divisor is a power of 10:
func getQuotientAndRemainder(num int64, digits uint) (int64, int64) {
    divisor := int64(math.Pow(10, float64(digits)))
    if num >= divisor {
        return num / divisor, num % divisor
    } else {
        return 0, num
    }
}
Just curious: apart from using the / and % operators directly, is there any better algorithm to get the quotient and remainder? Or is there one only in the case when the divisor is a power of 10?
return num / divisor, num % divisor
The "algorithm" is sound and written in arguably the best way possible: expressively. If anything, this part of your code may be overly complicated:
int64(math.Pow(10, float64(digits)))
Converting to and from float64 is arguably sub-optimal. Also, 10 to the power of anything greater than 18 will overflow int64. I suggest you add a sanity check and replace the code with a multiplying loop and measure its performance.
But then: if performance is your concern, just implement it in assembly.
Obviously, you should run some Go benchmarks: Benchmarks, Package testing.
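A minimal sketch of what that multiplying-loop suggestion might look like; pow10 is a hypothetical helper, not code from the answer, and the 18-digit bound comes from 10^19 no longer fitting in an int64:

package main

import "fmt"

// pow10 replaces int64(math.Pow(10, float64(digits))) with a multiplying loop
// and a sanity check: anything above 18 digits would overflow int64.
func pow10(digits uint) (int64, bool) {
    if digits > 18 {
        return 0, false
    }
    p := int64(1)
    for i := uint(0); i < digits; i++ {
        p *= 10
    }
    return p, true
}

func main() {
    divisor, ok := pow10(3)
    fmt.Println(ok, 1234/divisor, 1234%divisor) // true 1 234
}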
Your solution doesn't look very efficient. Try this:
package main
import "fmt"
func pow(base, exp int64) int64 {
    p := int64(1)
    for exp > 0 {
        if exp&1 != 0 {
            p *= base
        }
        exp >>= 1
        base *= base
    }
    return p
}

func divPow(n, base, exp int64) (q int64, r int64) {
    p := pow(base, exp)
    q = n / p
    r = n - q*p
    return q, r
}

func main() {
    fmt.Println(divPow(42, 10, 1))
    fmt.Println(divPow(-42, 10, 1))
}
Output:
4 2
-4 -2
Benchmark:
BenchmarkDivPow 20000000 77.4 ns/op
BenchmarkGetQuotientAndRemainder 5000000 296 ns/op

Is there a pow method in bigInt package in Go

I am looking at the documentation of big-integer arithmetic in Go and trying to find a method suitable for calculating a^n (something like pow(a, n) in Python).
To my surprise, among some straightforward functions like GCD and Binomial, and some not-so-straightforward ones like ModInverse, I cannot find pow. Am I missing it, or should I write my own?
func (z *Int) Exp(x, y, m *Int) *Int
Exp sets z = x^y mod |m| (i.e. the sign of m is ignored), and returns z. If y <= 0, the result is 1 mod |m|; if m == nil or m == 0, z = x^y. See Knuth, volume 2, section 4.6.3.
Because I had almost finished my own implementation (Daniel's recommendation did not work for me, because you always have to provide a modulus there), I am adding it here in case someone would like to see how it might be implemented efficiently. Here is the Go Playground and my function:
func powBig(a, n int) *big.Int {
    tmp := big.NewInt(int64(a))
    res := big.NewInt(1)
    for n > 0 {
        temp := new(big.Int)
        if n%2 == 1 {
            temp.Mul(res, tmp)
            res = temp
        }
        temp = new(big.Int)
        temp.Mul(tmp, tmp)
        tmp = temp
        n /= 2
    }
    return res
}
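A small usage sketch of powBig (it assumes the function above is in the same file, with "fmt" and "math/big" imported; the printed values are easy to check by hand):

func main() {
    fmt.Println(powBig(2, 10)) // 1024
    fmt.Println(powBig(3, 4))  // 81
    fmt.Println(powBig(2, 64)) // 18446744073709551616, already past the int64 range
}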

A generic quicksort in Scala

I've been playing around with Scala recently and was thinking about how to implement a generic version of quicksort in it (just to get a better feeling for the language).
I came up with something like this:
object Main {
  def qs[T](a: List[T], f: (T, T) => Boolean): List[T] = {
    if (a == Nil) return a
    val (l, g) = a drop 1 partition (f(a(0), (_: T)))
    qs(l, f) ::: List(a(0)) ::: qs(g, f)
  }

  def main(args: Array[String]): Unit = {
    val a = List(5, 3, 2, 1, 7, 8, 9, 4, 6)
    val qsInt = qs(_: List[Int], (_: Int) > (_: Int))
    println(qsInt(a))
  }
}
This is not as generic as I wanted it to be, since I have to explicitly state how to order the elements rather than just doing something like
val (l, g) = a drop 1 partition (a(0) >)
How can I tell the compiler that T only needs to implement the greater-than operator to be sortable by this function?
Regards
def qsort[T <% Ordered[T]](list: List[T]): List[T] = {
  list match {
    case Nil => Nil
    case x :: xs =>
      val (before, after) = xs partition (_ < x)
      qsort(before) ++ (x :: qsort(after))
  }
}
Since Roger covered the Ordered case, let me cover Ordering:
def qsort[T](list: List[T])(implicit ord: Ordering[T]): List[T] = list match {
  // import ord._ // enables "_ < x" syntax
  case Nil => Nil
  case x :: xs =>
    val (before, after) = xs partition (ord.lt(_, x))
    qsort(before) ::: x :: qsort(after)
}
Using Ordering has two main advantages:
The T type does not need to have been created as Ordered.
One can easily provide alternate orderings.
For instance, on Scala 2.8:
def sortIgnoreCase(strs: List[String]) = {
  val myOrdering = Ordering.fromLessThan { (x: String, y: String) =>
    x.toLowerCase < y.toLowerCase
  }
  qsort(strs)(myOrdering)
}
