I'm new to Scala, and I wonder whether it is possible to define a generic math function that works with both BigInt and Int such that, in the Int case, the arguments are treated as primitives (without any boxing and unboxing in the function body).
So, for example, I can do something like:
def foo[@specialized(Int) T: Numeric](a: T, b: T) = {
val n = implicitly[Numeric[T]]
import n._
//some code with the use of operators '+-*/'
a * b - a * a + b * b * b
}
//works for primitive Int
val i1 : Int = 1
val i2 : Int = 2
val i3 : Int = foo(i1, i2)
//works for BigInt
val b1 : BigInt = BigInt(1)
val b2 : BigInt = BigInt(2)
val b3 : BigInt = foo(b1, b2)
Here in foo I can use all the math operators for both primitive Ints and BigInts (which is what I need). However, the Int version of foo compiles to the following:
public int foo$mIc$sp(int a, int b, Numeric<Object> evidence$1) {
Numeric n = (Numeric)Predef..MODULE$.implicitly(evidence$1);
return BoxesRunTime.unboxToInt((Object)n.mkNumericOps(n.mkNumericOps(n.mkNumericOps((Object)BoxesRunTime.boxToInteger((int)a)).$times((Object)BoxesRunTime.boxToInteger((int)b))).$minus(n.mkNumericOps((Object)BoxesRunTime.boxToInteger((int)a)).$times((Object)BoxesRunTime.boxToInteger((int)a)))).$plus(n.mkNumericOps(n.mkNumericOps((Object)BoxesRunTime.boxToInteger((int)b)).$times((Object)BoxesRunTime.boxToInteger((int)b))).$times((Object)BoxesRunTime.boxToInteger((int)b))));
}
instead of plain:
//this is what I really need and expect from `@specialized(Int)`
public int foo$mIc$sp(int a, int b) {
return a * b - a * a + b * b * b;
}
which makes @specialized(Int) useless, because the performance is unacceptably low with all these (un)boxings and unnecessary n.mkNumericOps(...) invocations.
So, is there a way to implement a generic function like foo that compiles down to the plain primitive code above for primitive types?
The problem is that the Numeric typeclass is not specialized.
If you want to do generic math with high performance, I highly recommend the spire math library.
It has a very elaborate mathematical type class hierarchy, instead of just Numeric.
Here is how your example would look using spire:
import spire.implicits._ // typeclass instances etc.
import spire.syntax._ // syntax such as +-*/
import spire.algebra._ // typeclasses such as Field
def foo[@specialized T: Field](a: T, b: T) = {
//some code with the use of operators '+-*/'
a * b - a * a + b * b * b
}
Here you are saying that there has to be a Field instance for T. Field refers to the algebraic concept.
Spire is highly modular:
spire.algebra contains many well-known algebraic concepts such as groups, fields etc., encoded as Scala typeclasses
spire.syntax contains the implicit conversions to add operators and other syntax to types for which typeclass instances are available
spire.implicits contains instances for the typeclasses in spire.algebra for common types such as JVM primitives.
This is why you need the three imports.
Regarding the performance: if your code is specialized, and you are using primitives, the performance will be exactly the same as working with primitives directly.
Here is the code of the foo method when specialized for Int:
public int foo$mIc$sp(int, int, spire.algebra.Field<java.lang.Object>);
Code:
0: aload_3
1: aload_3
2: aload_3
3: iload_1
4: iload_2
5: invokeinterface #116, 3 // InterfaceMethod spire/algebra/Field.times$mcI$sp:(II)I
10: aload_3
11: iload_1
12: iload_1
13: invokeinterface #116, 3 // InterfaceMethod spire/algebra/Field.times$mcI$sp:(II)I
18: invokeinterface #119, 3 // InterfaceMethod spire/algebra/Field.minus$mcI$sp:(II)I
23: aload_3
24: aload_3
25: iload_2
26: iload_2
27: invokeinterface #116, 3 // InterfaceMethod spire/algebra/Field.times$mcI$sp:(II)I
32: iload_2
33: invokeinterface #116, 3 // InterfaceMethod spire/algebra/Field.times$mcI$sp:(II)I
38: invokeinterface #122, 3 // InterfaceMethod spire/algebra/Field.plus$mcI$sp:(II)I
43: ireturn
Note that there is no boxing, and the invokeinterface calls will be inlined by the JVM.
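For completeness, a minimal usage sketch (the name bar and the Ring bound are mine, not part of the answer above): the body only needs +, - and *, so a Ring constraint is enough, and unlike Field, spire also provides Ring instances for BigInt, which covers the original Int-plus-BigInt requirement:
import spire.implicits._
import spire.syntax._
import spire.algebra._

// Same body as foo, but constrained by Ring, which is all that + - * need.
def bar[@specialized(Int) T: Ring](a: T, b: T): T =
  a * b - a * a + b * b * b

val r1: Int = bar(1, 2)                    // specialized: runs on unboxed Ints
val r2: BigInt = bar(BigInt(1), BigInt(2)) // uses the Ring[BigInt] instance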
Consider:
#include "share/atspre_staload.hats"
fun only_zero(n: int(0)): void =
println!("This is definitely zero: ", n)
fun less_than{n,m:int | n < m}(n: int(n), m: int(m)): void =
println!(n, " is less than ", m)
implement main0() = (
only_zero(zeroify(n));
only_zero(m);
less_than(b, a);
less_than(f, e) where { val (f, e) = make_less_than((d, c)) };
) where {
val n = 5
val m = ~5
val (a, b, c, d) = (1, 2, 3, 4)
extern castfn zeroify{n:int}(n: int(n)): int(0)
extern praxi lemma_this_is_zero{n:int}(n: int(n)): [n == 0] void
extern castfn make_less_than{n,m:int}(t: (int(n), int(m))): [o,p:int | o < p] (int(o), int(p))
extern praxi lemma_less_than{n,m:int}(n: int(n), m: int(m)): [n < m] void
prval _ = lemma_this_is_zero(m)
}
which has this output:
This is definitely zero: 5
This is definitely zero: -5
2 is less than 1
4 is less than 3
Are there cases that demand one of these over the other?
If you use 'castfn', you need to make sure that there is a corresponding implicit cast function in the target language. For instance, int2double is a castfn if C is the target language.
On the other hand, praxi/prfun is completely erased, having no trace in the generated code.
I would say that praxi/prfun is more general, but int2double is definitely not a praxi/prfun.
I'm learning how the AES implementation in Go works, and I do not understand how the middle rounds work when encrypting a block in https://github.com/golang/go/blob/master/src/crypto/aes/block.go:
// Middle rounds shuffle using tables.
// Number of rounds is set by length of expanded key.
nr := len(xk)/4 - 2 // - 2: one above, one more below
k := 4
for r := 0; r < nr; r++ {
t0 = xk[k+0] ^ te0[uint8(s0>>24)] ^ te1[uint8(s1>>16)] ^ te2[uint8(s2>>8)] ^ te3[uint8(s3)]
t1 = xk[k+1] ^ te0[uint8(s1>>24)] ^ te1[uint8(s2>>16)] ^ te2[uint8(s3>>8)] ^ te3[uint8(s0)]
t2 = xk[k+2] ^ te0[uint8(s2>>24)] ^ te1[uint8(s3>>16)] ^ te2[uint8(s0>>8)] ^ te3[uint8(s1)]
t3 = xk[k+3] ^ te0[uint8(s3>>24)] ^ te1[uint8(s0>>16)] ^ te2[uint8(s1>>8)] ^ te3[uint8(s2)]
k += 4
s0, s1, s2, s3 = t0, t1, t2, t3
}
I understand that this code does the SubBytes, ShiftRows, MixColumns, and AddRoundKey steps of AES, but I do not understand how it does so using the te0, te1, te2, te3 arrays. These are precomputed arrays defined in https://github.com/golang/go/blob/master/src/crypto/aes/const.go.
Can someone explain to me how these arrays were precomputed? Thank you so much for your help.
I have found an answer to my question in:
https://crypto.stackexchange.com/questions/19175/efficient-aes-use-of-t-tables
Generating AES (AES-256) Lookup Tables
https://golang.org/src/crypto/aes/aes_test.go
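In short (a sketch in Scala rather than Go, assuming sbox holds the standard 256-entry AES S-box), the tables can be generated as below; the test in Go's aes_test.go checks exactly this relationship:

// Multiplication in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1 (0x11b).
def gfMul(x: Int, y: Int): Int = {
  var a = x; var b = y; var p = 0
  while (b != 0) {
    if ((b & 1) != 0) p ^= a          // add a when the low bit of b is set
    a <<= 1
    if ((a & 0x100) != 0) a ^= 0x11b  // reduce modulo the AES polynomial
    b >>= 1
  }
  p
}

// te0 fuses SubBytes with one MixColumns column [2, 1, 1, 3] into a 32-bit word:
// te0(x) = [2*S(x), S(x), S(x), 3*S(x)], e.g. te0(0) = 0xc66363a5 since S(0) = 0x63.
val te0 = Array.tabulate(256) { i =>
  val s = sbox(i)
  (gfMul(s, 2) << 24) | (s << 16) | (s << 8) | gfMul(s, 3)
}

// te1..te3 are byte-wise right rotations of te0, one per column position, so that
// each round reduces to four table lookups and XORs per output word.
val te1 = te0.map(w => (w << 24) | (w >>> 8))
val te2 = te1.map(w => (w << 24) | (w >>> 8))
val te3 = te2.map(w => (w << 24) | (w >>> 8))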
So I have this enum that defines different view positions on a view controller when a side bar menu is presented. I need to add, subtract, multiply, or divide the different values based on different situations. How exactly do I define methods that allow me to use the -, +, *, or / operators on the values in the enum? I can find plenty of examples that use the comparison operator ==, but I haven't been able to find any that use >=, which I also need to be able to do.
Here is the enum
enum FrontViewPosition: Int {
case None
case LeftSideMostRemoved
case LeftSideMost
case LeftSide
case Left
case Right
case RightMost
case RightMostRemoved
}
Now I'm trying to use these operators in functions like so.
func getAdjustedFrontViewPosition(_ frontViewPosition: FrontViewPosition, forSymetry symetry: Int) {
var frontViewPosition = frontViewPosition
if symetry < 0 {
frontViewPosition = .Left + symetry * (frontViewPosition - .Left)
}
}
Also in another function like so.
func rightRevealToggle(animated: Bool) {
var toggledFrontViewPosition: FrontViewPosition = .Left
if self.frontViewPosition >= .Left {
toggledFrontViewPosition = .LeftSide
}
self.setFrontViewPosition(toggledFrontViewPosition, animated: animated)
}
I know that I need to define the functions that implement these operators; I just don't understand how to go about doing it. A little help would be greatly appreciated.
The type you are trying to define has a similar algebra to pointers in that you can add an offset to a pointer to get a pointer and subtract two pointers to get a difference. Define these two operators on your enum and your other functions will work.
Any operators over your type should produce results in your type. There are different ways to achieve this, depending on your requirements. Here we shall treat your type as a wrap-around ("modulo") one: add 1 to the last literal and you get the first. To do this we use raw values from 0 to n for your type's literals and use modulo arithmetic.
First we need a modulo operator which always returns a non-negative result; Swift's % can return a negative one, which is not what is required for modulo arithmetic.
infix operator %% : MultiplicationPrecedence
func %%(_ a: Int, _ n: Int) -> Int
{
precondition(n > 0, "modulus must be positive")
let r = a % n
return r >= 0 ? r : r + n
}
Now your enum, assigning suitable raw values:
enum FrontViewPosition: Int
{
case None = 0
case LeftSideMostRemoved = 1
case LeftSideMost = 2
case LeftSide = 3
case Left = 4
case Right = 5
case RightMost = 6
case RightMostRemoved = 7
Now we define the appropriate operators.
For addition we can add an integer to a FrontViewPosition and get a FrontViewPosition back. To do this we convert to raw values, add, and then reduce modulo 8 to wrap-around. Note the need for a ! to return a non-optional FrontViewPosition - this will always succeed due to the modulo math:
static func +(_ x : FrontViewPosition, _ y : Int) -> FrontViewPosition
{
return FrontViewPosition(rawValue: (x.rawValue + y) %% 8)!
}
For subtraction we return the integer difference between two FrontViewPosition values:
static func -(_ x : FrontViewPosition, _ y : FrontViewPosition) -> Int
{
return x.rawValue - y.rawValue
}
}
You can define further operators as needed, say a subtraction operator which takes a FrontViewPosition and an Int and returns a FrontViewPosition. For the >= comparison from your question, conform the enum to Comparable by defining a < that compares raw values; Swift then provides >, <= and >= for free.
HTH
An enum can also have functions:
enum Tst:Int {
case A = 10
case B = 20
case C = 30
static func + (t1:Tst,t2:Tst) -> Tst {
return Tst.init(rawValue: t1.rawValue + t2.rawValue)! // careful: the force-unwrap traps if the sum is not a valid raw value!
}
}
var a = Tst.A
var b = Tst.B
var c = a+b
This kind of comparison is common when writing loops.
I was wondering if i >= 0 needs more CPU cycles, as it has two conditions (greater than OR equal to), when compared to i > -1. Is one known to be better than the other, and if so, why?
This is not correct. The JIT will implement both tests as a single machine language instruction.
And the number of CPU clock cycles is not determined by the number of comparisons to zero or -1, because the CPU should do one comparison and set flags to indicate whether the result of the comparison is <, > or =.
It's possible that one of those instructions will be more efficient on certain processors, but this kind of micro-optimization is almost always not worth doing. (It's also possible that the JIT - or javac - will actually generate the same instructions for both tests.)
On the contrary, comparisons (including non-strict ones) with zero take one bytecode instruction less. The x86 architecture supports conditional jumps after any arithmetic or loading operation, and this is reflected in the Java bytecode instruction set: there is a group of instructions that compare the value on top of the stack with zero and jump: ifeq/ifgt/ifge/iflt/ifle/ifne. (See the full list). Comparison with -1 requires an additional iconst_m1 operation (loading the -1 constant onto the stack).
Here are two loops with different comparisons:
@GenerateMicroBenchmark
public int loopZeroCond() {
int s = 0;
for (int i = 1000; i >= 0; i--) {
s += i;
}
return s;
}
@GenerateMicroBenchmark
public int loopM1Cond() {
int s = 0;
for (int i = 1000; i > -1; i--) {
s += i;
}
return s;
}
The second version is one byte longer:
public int loopZeroCond();
Code:
0: iconst_0
1: istore_1
2: sipush 1000
5: istore_2
6: iload_2
7: iflt 20 // compares with zero and branches in a single instruction
10: iload_1
11: iload_2
12: iadd
13: istore_1
14: iinc 2, -1
17: goto 6
20: iload_1
21: ireturn
public int loopM1Cond();
Code:
0: iconst_0
1: istore_1
2: sipush 1000
5: istore_2
6: iload_2
7: iconst_m1 // extra instruction: push the constant -1
8: if_icmple 21 // compare the two stack values and branch
11: iload_1
12: iload_2
13: iadd
14: istore_1
15: iinc 2, -1
18: goto 6
21: iload_1
22: ireturn
The zero-comparison version is slightly more performant on my machine (to my surprise; I expected the JIT to compile these loops into identical assembly):
Benchmark Mode Thr Mean Mean error Units
t.LoopCond.loopM1Cond avgt 1 0,319 0,004 usec/op
t.LoopCond.loopZeroCond avgt 1 0,302 0,004 usec/op
Conclusion
Compare with zero whenever sensible.
This comes up regularly: functions coded up using generics are significantly slower in Scala. See the example below. The type-specific version performs about a third faster than the generic version. This is doubly surprising given that the generic component is outside of the expensive loop. Is there a known explanation for this?
def xxxx_flttn[T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
val I = v.length
if (I <= 0) Array.ofDim[T](0)
else {
val J = v(0).length
for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
val flt = Array.ofDim[T](I * J)
for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
flt
}
}
def flttn(v: Array[Array[Double]]): Array[Double] = {
val I = v.length
if (I <= 0) Array.ofDim[Double](0)
else {
val J = v(0).length
for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
val flt = Array.ofDim[Double](I * J)
for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
flt
}
}
You can't really tell what you're measuring here--not very well, anyway--because the for loop isn't as fast as a pure while loop, and the inner operation is quite inexpensive. If we rewrite the code with while loops--the key double-iteration being
var i = 0
while (i<I) {
var j = 0
while (j<J) {
flt(i * J + j) = v(i)(j)
j += 1
}
i += 1
}
flt
then we see that the bytecode for the generic case is actually dramatically different. Non-generic:
133: checkcast #174; //class "[D"
136: astore 6
138: iconst_0
139: istore 5
141: iload 5
143: iload_2
144: if_icmpge 191
147: iconst_0
148: istore 4
150: iload 4
152: iload_3
153: if_icmpge 182
// The stuff above implements the loop; now we do the real work
156: aload 6
158: iload 5
160: iload_3
161: imul
162: iload 4
164: iadd
165: aload_1
166: iload 5
168: aaload // v(i)
169: iload 4
171: daload // v(i)(j)
172: dastore // flt(.) = _
173: iload 4
175: iconst_1
176: iadd
177: istore 4
// Okay, done with the inner work, time to jump around
179: goto 150
182: iload 5
184: iconst_1
185: iadd
186: istore 5
188: goto 141
It's just a bunch of jumps and low-level operations (daload and dastore being the key ones that load and store a double from an array). If we look at the key inner part of the generic bytecode, it instead looks like
160: getstatic #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
163: aload 7
165: iload 6
167: iload 4
169: imul
170: iload 5
172: iadd
173: getstatic #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
176: aload_1
177: iload 6
179: aaload
180: iload 5
182: invokevirtual #107; //Method scala/runtime/ScalaRunTime$.array_apply:(Ljava/lang/Object;I)Ljava/lang/Object;
185: invokevirtual #111; //Method scala/runtime/ScalaRunTime$.array_update:(Ljava/lang/Object;ILjava/lang/Object;)V
188: iload 5
190: iconst_1
191: iadd
192: istore 5
which, as you can see, has to call methods to do the array apply and update. The bytecode for that is a huge mess of stuff like
2: aload_3
3: instanceof #98; //class "[Ljava/lang/Object;"
6: ifeq 18
9: aload_3
10: checkcast #98; //class "[Ljava/lang/Object;"
13: iload_2
14: aaload
15: goto 183
18: aload_3
19: instanceof #100; //class "[I"
22: ifeq 37
25: aload_3
26: checkcast #100; //class "[I"
29: iload_2
30: iaload
31: invokestatic #106; //Method scala/runtime/BoxesRunTime.boxToInteger:
34: goto 183
37: aload_3
38: instanceof #108; //class "[D"
41: ifeq 56
44: aload_3
45: checkcast #108; //class "[D"
48: iload_2
49: daload
50: invokestatic #112; //Method scala/runtime/BoxesRunTime.boxToDouble:(
53: goto 183
which basically has to test each type of array and box it if it's the type you're looking for. Double is pretty near the front (3rd of 10), but it's still a pretty major overhead, even if the JVM can recognize that the code ends up being box/unbox and therefore doesn't actually need to allocate memory. (I'm not sure it can do that, but even if it could it wouldn't solve the problem.)
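To see that dispatch in source form, here is roughly what such a generic array read amounts to (a simplified sketch of the behavior, not the actual ScalaRunTime code; the method name is mine):

// Test the runtime class of the array, read, and let the compiler box the primitive.
def arrayApplySketch(xs: AnyRef, idx: Int): Any = xs match {
  case a: Array[AnyRef] => a(idx)
  case a: Array[Int]    => a(idx) // boxed to java.lang.Integer
  case a: Array[Double] => a(idx) // boxed to java.lang.Double
  // ... one case per remaining primitive array type ...
  case _ => throw new IllegalArgumentException("not an array: " + xs)
}

Every generic element access pays for this chain of instanceof tests plus a boxing step, which is the overhead the question measures.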
So, what to do? You can try [@specialized T], which will expand your code tenfold for you, as if you wrote each primitive array operation by yourself. Specialization is buggy in 2.9 (it should be less so in 2.10), though, so it may not work the way you hope. If speed is of the essence--well, first, write while loops instead of for loops (or at least compile with -optimise, which helps for loops out by a factor of two or so!), and then consider either specialization or writing the code by hand for the types you require.
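For reference, a sketch of the specialized while-loop variant (the name flttnSpec is mine, the ragged-row check is omitted for brevity, and as noted above 2.9's specialization may still misbehave):

def flttnSpec[@specialized(Double) T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
  val I = v.length
  if (I <= 0) Array.ofDim[T](0)
  else {
    val J = v(0).length
    val flt = Array.ofDim[T](I * J)
    var i = 0
    while (i < I) {
      var j = 0
      while (j < J) {
        flt(i * J + j) = v(i)(j) // compiles to daload/dastore in the Double variant
        j += 1
      }
      i += 1
    }
    flt
  }
}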
This is due to boxing, which happens when you apply the generic to a primitive type and use arrays of it (or when the type appears plain in method signatures or as a member).
Example
In the following trait, after compilation, the process method will take an erased Array[Any].
trait Foo[A]{
def process(as: Array[A]): Int
}
If you choose A to be a value/primitive type, like Double, it has to be boxed. When writing the trait in a non-generic way (e.g. with A = Double), process is compiled to take an Array[Double], which is a distinct array type on the JVM. This is more efficient: in order to store a Double inside the Array[Any], the Double has to be wrapped (boxed) into an object, a reference to which gets stored inside the array. The special Array[Double] can store the Double directly in memory as a 64-bit value.
The @specialized annotation
If you feel adventurous, you can try the @specialized annotation (it's pretty buggy and crashes the compiler often). This makes scalac compile special versions of a class for all or selected primitive types. This only makes sense if the type parameter appears plain in type signatures (get(a: A), but not get(as: Seq[A])) or as a type parameter to Array. I think you'll receive a warning if specialization is pointless.
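To make that concrete, here is the trait from the example with the annotation applied (a sketch; Double is chosen to match the discussion above):

// scalac additionally emits a specialized variant of Foo whose process
// method takes a real Array[Double], avoiding the boxing described above.
trait Foo[@specialized(Double) A] {
  def process(as: Array[A]): Int
}

// Code written against Foo[Double] then works on unboxed 64-bit values:
val fooD = new Foo[Double] {
  def process(as: Array[Double]): Int = as.length
}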
If you feel adventerous, you can try the #specialized keyword (it's pretty buggy and crashes the compiler often). This makes scalac compile special versions of a class for all or selected primitive types. This only makes sense, if the type parameter appears plain in type signatures (get(a: A), but not get(as: Seq[A])) or as a type paramter to Array. I think you'll receive a warning if speicialization is pointless.