How is pattern matching in Scala implemented at the bytecode level?

How is pattern matching in Scala implemented at the bytecode level?
Is it like a series of if (x instanceof Foo) constructs, or something else? What are its performance implications?
For example, given the following code (from Scala By Example pages 46-48), what would the equivalent Java code for the eval method look like?
abstract class Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr
def eval(e: Expr): Int = e match {
case Number(x) => x
case Sum(l, r) => eval(l) + eval(r)
}
P.S. I can read Java bytecode, so a bytecode representation would be good enough for me, but it would probably be better for other readers to see what it would look like as Java code.
P.P.S. Does the book Programming in Scala give an answer to this and similar questions about how Scala is implemented? I have ordered the book, but it has not yet arrived.

The low level can be explored with a disassembler, but the short answer is that it's a bunch of if/elses where the predicate depends on the pattern:
case Sum(l,r) // instanceof check, followed by fetching the two arguments and assigning them to two variables l and r (but see below about custom extractors)
case "hello" // equality check
case _ : Foo // instanceof check
case x => // assignment to a fresh variable
case _ => // do nothing, this is the tail else on the if/else
There's much more that you can do with patterns, such as or-patterns and combinations like "case Foo(45, x)", but generally those are just logical extensions of what I just described. Patterns can also have guards, which are additional constraints on the predicates. There are also cases where the compiler can optimize pattern matching, e.g. when there's some overlap between cases it might coalesce things a bit. Advanced patterns and optimization are an active area of work in the compiler, so don't be surprised if the bytecode improves substantially over these basic rules in current and future versions of Scala.
Beyond all that, you can write your own custom extractors in addition to or instead of the default ones Scala uses for case classes. If you do, then the cost of the pattern match is the cost of whatever the extractor does. A good overview is found in http://lamp.epfl.ch/~emir/written/MatchingObjectsWithPatterns-TR.pdf

James (above) said it best. However, if you're curious it's always a good exercise to look at the disassembled bytecode. You can also invoke scalac with the -print option, which will print your program with all Scala-specific features removed. It's basically Java in Scala's clothing. Here's the relevant scalac -print output for the code snippet you gave:
def eval(e: Expr): Int = {
<synthetic> val temp10: Expr = e;
if (temp10.$isInstanceOf[Number]())
temp10.$asInstanceOf[Number]().n()
else
if (temp10.$isInstanceOf[Sum]())
{
<synthetic> val temp13: Sum = temp10.$asInstanceOf[Sum]();
Main.this.eval(temp13.e1()).+(Main.this.eval(temp13.e2()))
}
else
throw new MatchError(temp10)
};
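Since the question asked what this looks like as Java, here is a hand-written sketch that mirrors the scalac -print output above. It is not compiler output. The class layout is an assumption (real case classes also get equals, hashCode, toString and so on), and IllegalArgumentException stands in for scala.MatchError.

```java
// Hand-written Java approximation of the Scala example; names mirror the question.
final class EvalDemo {
    static abstract class Expr {}

    static final class Number extends Expr {
        private final int n;
        Number(int n) { this.n = n; }
        int n() { return n; }
    }

    static final class Sum extends Expr {
        private final Expr e1;
        private final Expr e2;
        Sum(Expr e1, Expr e2) { this.e1 = e1; this.e2 = e2; }
        Expr e1() { return e1; }
        Expr e2() { return e2; }
    }

    // Same shape as the printed tree: type test, cast, accessor calls, final else throws.
    static int eval(Expr e) {
        if (e instanceof Number) {
            return ((Number) e).n();
        } else if (e instanceof Sum) {
            Sum s = (Sum) e;
            return eval(s.e1()) + eval(s.e2());
        } else {
            // Stand-in for "throw new MatchError(temp10)".
            throw new IllegalArgumentException("no case matched: " + e);
        }
    }

    public static void main(String[] args) {
        Expr expr = new Sum(new Number(1), new Sum(new Number(2), new Number(3)));
        System.out.println(eval(expr)); // prints 6
    }
}
```

The cost per match is therefore one type test per case tried, in source order, plus a cast and accessor calls for the case that matches.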

Since version 2.8, Scala has had the @switch annotation. It makes the compiler check that the pattern match is compiled into a tableswitch or lookupswitch, and complain if it instead becomes a series of conditional tests.

To expand on @Zifre's comment: if you are reading this in the future and the Scala compiler has adopted new compilation strategies and you want to know what they are, here's how you find out what it does.
Copy-paste your match code into a self-contained example file. Run scalac on that file. Then run javap -v -c theClassName$.class.
For example, I put the following into /tmp/question.scala:
object question {
abstract class Expr
case class Number(n: Int) extends Expr
case class Sum(e1: Expr, e2: Expr) extends Expr
def eval(e: Expr): Int = e match {
case Number(x) => x
case Sum(l, r) => eval(l) + eval(r)
}
}
Then I ran scalac question.scala, which produced a bunch of *.class files. Poking around a bit, I found the match statement inside question$.class. The javap -c -v question$.class output is available below.
Since we're looking for a conditional control flow construct, knowing about the Java bytecode instruction set suggests that looking for "if" should be a good place to start.
In two locations we find a pair of consecutive lines of the form instanceof <something>; ifeq <somewhere>, which means: if the most recently computed value is not an instance of something, then go to somewhere. (ifeq is jump if zero, and instanceof gives you a zero to represent false.)
If you follow the control flow around, you'll see that it agrees with the answer given by @Jorge Ortiz: we do if (blah instanceof something) { ... } else if (blah instanceof somethingelse) { ... }.
Here is the javap -c -v question$.class output:
Classfile /tmp/question$.class
Last modified Nov 20, 2020; size 956 bytes
MD5 checksum cfc788d4c847dad0863a797d980ad2f3
Compiled from "question.scala"
public final class question$
minor version: 0
major version: 50
flags: (0x0031) ACC_PUBLIC, ACC_FINAL, ACC_SUPER
this_class: #2 // question$
super_class: #4 // java/lang/Object
interfaces: 0, fields: 1, methods: 3, attributes: 4
Constant pool:
#1 = Utf8 question$
#2 = Class #1 // question$
#3 = Utf8 java/lang/Object
#4 = Class #3 // java/lang/Object
#5 = Utf8 question.scala
#6 = Utf8 MODULE$
#7 = Utf8 Lquestion$;
#8 = Utf8 <clinit>
#9 = Utf8 ()V
#10 = Utf8 <init>
#11 = NameAndType #10:#9 // "<init>":()V
#12 = Methodref #2.#11 // question$."<init>":()V
#13 = Utf8 eval
#14 = Utf8 (Lquestion$Expr;)I
#15 = Utf8 question$Number
#16 = Class #15 // question$Number
#17 = Utf8 n
#18 = Utf8 ()I
#19 = NameAndType #17:#18 // n:()I
#20 = Methodref #16.#19 // question$Number.n:()I
#21 = Utf8 question$Sum
#22 = Class #21 // question$Sum
#23 = Utf8 e1
#24 = Utf8 ()Lquestion$Expr;
#25 = NameAndType #23:#24 // e1:()Lquestion$Expr;
#26 = Methodref #22.#25 // question$Sum.e1:()Lquestion$Expr;
#27 = Utf8 e2
#28 = NameAndType #27:#24 // e2:()Lquestion$Expr;
#29 = Methodref #22.#28 // question$Sum.e2:()Lquestion$Expr;
#30 = NameAndType #13:#14 // eval:(Lquestion$Expr;)I
#31 = Methodref #2.#30 // question$.eval:(Lquestion$Expr;)I
#32 = Utf8 scala/MatchError
#33 = Class #32 // scala/MatchError
#34 = Utf8 (Ljava/lang/Object;)V
#35 = NameAndType #10:#34 // "<init>":(Ljava/lang/Object;)V
#36 = Methodref #33.#35 // scala/MatchError."<init>":(Ljava/lang/Object;)V
#37 = Utf8 this
#38 = Utf8 e
#39 = Utf8 Lquestion$Expr;
#40 = Utf8 x
#41 = Utf8 I
#42 = Utf8 l
#43 = Utf8 r
#44 = Utf8 question$Expr
#45 = Class #44 // question$Expr
#46 = Methodref #4.#11 // java/lang/Object."<init>":()V
#47 = NameAndType #6:#7 // MODULE$:Lquestion$;
#48 = Fieldref #2.#47 // question$.MODULE$:Lquestion$;
#49 = Utf8 question
#50 = Class #49 // question
#51 = Utf8 Sum
#52 = Utf8 Expr
#53 = Utf8 Number
#54 = Utf8 Code
#55 = Utf8 LocalVariableTable
#56 = Utf8 LineNumberTable
#57 = Utf8 StackMapTable
#58 = Utf8 SourceFile
#59 = Utf8 InnerClasses
#60 = Utf8 ScalaInlineInfo
#61 = Utf8 Scala
{
public static final question$ MODULE$;
descriptor: Lquestion$;
flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL
public static {};
descriptor: ()V
flags: (0x0009) ACC_PUBLIC, ACC_STATIC
Code:
stack=1, locals=0, args_size=0
0: new #2 // class question$
3: invokespecial #12 // Method "<init>":()V
6: return
public int eval(question$Expr);
descriptor: (Lquestion$Expr;)I
flags: (0x0001) ACC_PUBLIC
Code:
stack=3, locals=9, args_size=2
0: aload_1
1: astore_2
2: aload_2
3: instanceof #16 // class question$Number
6: ifeq 27
9: aload_2
10: checkcast #16 // class question$Number
13: astore_3
14: aload_3
15: invokevirtual #20 // Method question$Number.n:()I
18: istore 4
20: iload 4
22: istore 5
24: goto 69
27: aload_2
28: instanceof #22 // class question$Sum
31: ifeq 72
34: aload_2
35: checkcast #22 // class question$Sum
38: astore 6
40: aload 6
42: invokevirtual #26 // Method question$Sum.e1:()Lquestion$Expr;
45: astore 7
47: aload 6
49: invokevirtual #29 // Method question$Sum.e2:()Lquestion$Expr;
52: astore 8
54: aload_0
55: aload 7
57: invokevirtual #31 // Method eval:(Lquestion$Expr;)I
60: aload_0
61: aload 8
63: invokevirtual #31 // Method eval:(Lquestion$Expr;)I
66: iadd
67: istore 5
69: iload 5
71: ireturn
72: new #33 // class scala/MatchError
75: dup
76: aload_2
77: invokespecial #36 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
80: athrow
LocalVariableTable:
Start Length Slot Name Signature
0 81 0 this Lquestion$;
0 81 1 e Lquestion$Expr;
20 61 4 x I
47 34 7 l Lquestion$Expr;
54 27 8 r Lquestion$Expr;
LineNumberTable:
line 6: 0
line 7: 2
line 8: 27
line 6: 69
StackMapTable: number_of_entries = 3
frame_type = 252 /* append */
offset_delta = 27
locals = [ class question$Expr ]
frame_type = 254 /* append */
offset_delta = 41
locals = [ top, top, int ]
frame_type = 248 /* chop */
offset_delta = 2
}
SourceFile: "question.scala"
InnerClasses:
public static #51= #22 of #50; // Sum=class question$Sum of class question
public static abstract #52= #45 of #50; // Expr=class question$Expr of class question
public static #53= #16 of #50; // Number=class question$Number of class question
ScalaInlineInfo: length = 0xE (unknown attribute)
01 01 00 02 00 0A 00 09 01 00 0D 00 0E 01
Scala: length = 0x0 (unknown attribute)

Related

Efficient string concatenation in Scala

The JVM optimizes String concatenation with + and replaces it with a StringBuilder. This should be the same in Scala. But what happens if strings are concatenated with ++=?
var x = "x"
x ++= "y"
x ++= "z"
As far as I know, this method treats strings like char sequences, so even if the JVM would create a StringBuilder it would lead to many method calls, right? Would it be better to use a StringBuilder instead?
To what type is the String converted implicitly?
There is a huge HUGE difference in time taken.
If you add strings repeatedly using += you do not optimize away the O(n^2) cost of creating incrementally longer strings. So for adding one or two you won't see a difference, but it doesn't scale; by the time you get to adding 100 (short) strings, using a StringBuilder is over 20x faster. (Precise data: 1.3 us vs. 27.1 us to add the string representations of the numbers 0 to 100; timings should be reproducible to about ±5% and of course are for my machine.)
Using ++= on a var String is far far worse yet, because you are then instructing Scala to treat a string as a character-by-character collection which then requires all sorts of wrappers to make the String look like a collection (including boxed character-by-character addition using the generic version of ++!). Now you're 16x slower again on 100 additions! (Precise data: 428.8 us for ++= on a var string instead of +='s 26.7 us.)
If you write a single statement with a bunch of +es then the Scala compiler will use a StringBuilder and end up with an efficient result (Data: 1.8 us on non-constant strings pulled out of an array).
So, if you add strings with anything other than + in line, and you care about efficiency, use a StringBuilder. Definitely don't use ++= to add another String to a var String; there just isn't any reason to do it, and there's a big runtime penalty.
(Note--very often you don't care at all how efficient your string additions are! Don't clutter your code with extra StringBuilders unless you have reason to suspect that this particular code path is getting called a lot.)
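For concreteness, here is a minimal sketch of the two loop shapes being compared. It is written in Java on the assumption that the same reasoning carries over, since a Scala String is a java.lang.String. The class and method names are made up for illustration, and it makes no timing claims of its own.

```java
// Illustrative only: both methods build the same string, with different copying costs.
final class ConcatDemo {
    // Each += allocates a new String and copies everything accumulated so far,
    // so the total work grows quadratically with the number of pieces.
    static String withPlusEquals(int count) {
        String s = "";
        for (int i = 0; i < count; i++) {
            s += i;
        }
        return s;
    }

    // A single StringBuilder appends into one growing buffer (amortized linear work).
    static String withBuilder(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlusEquals(5));                        // prints 01234
        System.out.println(withBuilder(5).equals(withPlusEquals(5))); // prints true
    }
}
```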
Actually, the inconvenient truth is StringOps usually remains an allocation:
scala> :pa
// Entering paste mode (ctrl-D to finish)
class Concat {
var x = "x"
x ++= "y"
x ++= "z"
}
// Exiting paste mode, now interpreting.
defined class Concat
scala> :javap -prv Concat
Binary file Concat contains $line3.$read$$iw$$iw$Concat
Size 1211 bytes
MD5 checksum 1900522728cbb0ed0b1d3f8b962667ad
Compiled from "<console>"
public class $line3.$read$$iw$$iw$Concat
SourceFile: "<console>"
[snip]
public $line3.$read$$iw$$iw$Concat();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=6, locals=1, args_size=1
0: aload_0
1: invokespecial #19 // Method java/lang/Object."<init>":()V
4: aload_0
5: ldc #20 // String x
7: putfield #10 // Field x:Ljava/lang/String;
10: aload_0
11: new #22 // class scala/collection/immutable/StringOps
14: dup
15: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
18: aload_0
19: invokevirtual #30 // Method x:()Ljava/lang/String;
22: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
25: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
28: new #22 // class scala/collection/immutable/StringOps
31: dup
32: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
35: ldc #38 // String y
37: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
40: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
43: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
46: invokevirtual #42 // Method scala/Predef$.StringCanBuildFrom:()Lscala/collection/generic/CanBuildFrom;
49: invokevirtual #46 // Method scala/collection/immutable/StringOps.$plus$plus:(Lscala/collection/GenTraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;
52: checkcast #48 // class java/lang/String
55: invokevirtual #50 // Method x_$eq:(Ljava/lang/String;)V
See more demonstration at this answer.
Edit: To say more, you're building the String on each reassignment, so, no you're not using a single StringBuilder.
However, the optimization is done by javac and not the JIT compiler, so to compare fruits of the same kind:
public class Strcat {
public String strcat(String s) {
String t = " hi ";
String u = " by ";
return s + t + u; // OK
}
public String strcat2(String s) {
String t = s + " hi ";
String u = t + " by ";
return u; // bad
}
}
whereas
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> :se -Xprint:typer
scala> class K { def f(s: String, t: String, u: String) = s ++ t ++ u }
[[syntax trees at end of typer]] // <console>
def f(s: String, t: String, u: String): String = scala.this.Predef.augmentString(scala.this.Predef.augmentString(s).++[Char, String](scala.this.Predef.augmentString(t))(scala.this.Predef.StringCanBuildFrom)).++[Char, String](scala.this.Predef.augmentString(u))(scala.this.Predef.StringCanBuildFrom)
is bad. Or, worse, to unroll Rex's explanation:
"abc" ++ "def"
augmentString("abc").++[Char, String](augmentString("def"))(StringCanBuildFrom)
collection.mutable.StringBuilder.newBuilder ++= new WrappedString(augmentString("def"))
val b = collection.mutable.StringBuilder.newBuilder
new WrappedString(augmentString("def")) foreach b.+=
As Rex explained, StringBuilder overrides ++=(String) but not Growable.++=(Traversable[Char]).
In case you've ever wondered what unaugmentString is for:
28: invokevirtual #40 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
31: invokevirtual #43 // Method scala/Predef$.unaugmentString:(Ljava/lang/String;)Ljava/lang/String;
34: invokespecial #46 // Method scala/collection/immutable/WrappedString."<init>":(Ljava/lang/String;)V
And just to show that you do finally call unadorned +=(Char) but after boxing and unboxing:
public final scala.collection.mutable.StringBuilder apply(char);
flags: ACC_PUBLIC, ACC_FINAL
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: getfield #19 // Field b$1:Lscala/collection/mutable/StringBuilder;
4: iload_1
5: invokevirtual #24 // Method scala/collection/mutable/StringBuilder.$plus$eq:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun$1;
0 9 1 x C
LineNumberTable:
line 9: 0
public final java.lang.Object apply(java.lang.Object);
flags: ACC_PUBLIC, ACC_FINAL, ACC_BRIDGE, ACC_SYNTHETIC
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: aload_1
2: invokestatic #35 // Method scala/runtime/BoxesRunTime.unboxToChar:(Ljava/lang/Object;)C
5: invokevirtual #37 // Method apply:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun$1;
0 9 1 v1 Ljava/lang/Object;
LineNumberTable:
line 9: 0
A good laugh does get some oxygen into the bloodstream.

NPP_NewStream: seekable set to 0 (false) for local file

I am trying to implement an NPAPI plugin with streaming capabilities (NP_SEEK+NPN_RequestRead). No matter what I try the boolean NPBool seekable is always set to 0 (false).
I am starting firefox (iceweasel on Debian) from the command line on a local file:
$ iceweasel test1.html
If I attach gdb to the npapi plugin here is what I see:
(gdb)
#2 0x00007f7e9da54e14 in mozilla::plugins::BrowserStreamChild::StreamConstructed (this=0x7f7e925cf310, mimeType=..., seekable=<optimized out>, stype=<optimized out>)
at /tmp/buildd/iceweasel-24.6.0esr/dom/plugins/ipc/BrowserStreamChild.cpp:62
62 &mStream, seekable, stype);
(gdb)
#3 0x00007f7e9da5688e in mozilla::plugins::PluginInstanceChild::AnswerPBrowserStreamConstructor (this=<optimized out>, aActor=<optimized out>, url=...,
length=<optimized out>, lastmodified=<optimized out>, notifyData=<optimized out>, headers=..., mimeType=..., seekable=@0x7fff25ed51df: false, rv=0x7fff25ed51e0,
stype=0x7fff25ed51e2) at /tmp/buildd/iceweasel-24.6.0esr/dom/plugins/ipc/PluginInstanceChild.cpp:2285
2285 ->StreamConstructed(mimeType, seekable, stype);
(gdb) p seekable
$1 = (const bool &) @0x7fff25ed51df: false
(gdb) up
#4 0x00007f7e9da8f77f in mozilla::plugins::PPluginInstanceChild::OnCallReceived (this=0x7f7e925f2c00, __msg=..., __reply=@0x7fff25ed5470: 0x0)
at /tmp/buildd/iceweasel-24.6.0esr/build-xulrunner/ipc/ipdl/PPluginInstanceChild.cpp:2479
warning: Source file is more recent than executable.
2479 if ((!(AnswerPBrowserStreamConstructor(actor, url, length, lastmodified, notifyData, headers, mimeType, seekable, (&(rv)), (&(stype)))))) {
(gdb) list -
2469 if ((!(actor))) {
2470 return MsgValueError;
2471 }
2472 (actor)->mId = RegisterID(actor, (__handle).mId);
2473 (actor)->mManager = this;
2474 (actor)->mChannel = mChannel;
2475 (mManagedPBrowserStreamChild).InsertElementSorted(actor);
2476 (actor)->mState = mozilla::plugins::PBrowserStream::__Start;
2477
2478 int32_t __id = mId;
(gdb) list -
2459 FatalError("Error deserializing 'bool'");
2460 return MsgValueError;
2461 }
2462 (__msg).EndRead(__iter);
2463 if ((!(PPluginInstance::Transition(mState, Trigger(Trigger::Send, PPluginInstance::Msg_PBrowserStreamConstructor__ID), (&(mState)))))) {
2464 NS_WARNING("bad state transition!");
2465 }
2466 NPError rv;
2467 uint16_t stype;
2468 actor = AllocPBrowserStream(url, length, lastmodified, notifyData, headers, mimeType, seekable, (&(rv)), (&(stype)));
(gdb) list -
2449 }
2450 if ((!(Read((&(headers)), (&(__msg)), (&(__iter)))))) {
2451 FatalError("Error deserializing 'nsCString'");
2452 return MsgValueError;
2453 }
2454 if ((!(Read((&(mimeType)), (&(__msg)), (&(__iter)))))) {
2455 FatalError("Error deserializing 'nsCString'");
2456 return MsgValueError;
2457 }
2458 if ((!(Read((&(seekable)), (&(__msg)), (&(__iter)))))) {
(gdb) up
#5 0x00007f7e9da868f0 in mozilla::plugins::PPluginModuleChild::OnCallReceived (this=<optimized out>, __msg=..., __reply=@0x7fff25ed5470: 0x0)
at /tmp/buildd/iceweasel-24.6.0esr/build-xulrunner/ipc/ipdl/PPluginModuleChild.cpp:1023
warning: Source file is more recent than executable.
1023 return (__routed)->OnCallReceived(__msg, __reply);
(gdb) list -
1013 PPluginModuleChild::OnCallReceived(
1014 const Message& __msg,
1015 Message*& __reply)
1016 {
1017 int32_t __route = (__msg).routing_id();
1018 if ((MSG_ROUTING_CONTROL) != (__route)) {
1019 ChannelListener* __routed = Lookup(__route);
1020 if ((!(__routed))) {
1021 return MsgRouteError;
1022 }
(gdb) bt
If I copy test1.html over to /var/www, and then point to http://localhost/test1.html everything works as expected.
However, the documentation mentions:
seekable
Boolean indicating whether the stream is seekable:
true: Seekable. Stream supports random access through calls to NPN_RequestRead (for example, local files or HTTP servers that support byte-range requests).
The documentation is outright lying.
The seekable flag in the call to NPP_NewStream originates from OnStartBinding, which calls into nsPluginStreamListenerPeer::IsSeekable, which just returns nsPluginStreamListenerPeer::mSeekable.
The only time mSeekable is ever set true is when (source):
The stream is http (https, spdy)
The http response has no Content-Encoding
The http response provides a Content-Length.
The http response has Accept-Ranges: bytes (omitting the header is not supported)
For all other stream types (incl. file://) and http streams not matching the requirements the seekable flag is hence always false.
Moreover, NPN_RequestRead is only implemented for http streams, but doesn't actually care about seekable and furthermore does not actually check if the server returns 206.
Conclusion
You can only use NP_SEEK streams with http (https, spdy). This is why stuff works from http://localhost, but not from a local file (file://).
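The conditions listed above can be restated as a single predicate. The sketch below is a hypothetical helper, not Mozilla's code: the name isSeekable, the lower-cased header map, and the handling of only http/https as schemes are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical restatement of the rules from the answer; not taken from the browser source.
final class SeekableCheck {
    // Assumes header names in the map are already lower-cased by the caller.
    static boolean isSeekable(String scheme, Map<String, String> headers) {
        String s = scheme.toLowerCase(Locale.ROOT);
        if (!s.equals("http") && !s.equals("https")) {
            return false; // file:// and every other stream type is never seekable
        }
        if (headers.containsKey("content-encoding")) {
            return false; // an encoded response disqualifies the stream
        }
        if (!headers.containsKey("content-length")) {
            return false; // the length must be known up front
        }
        String ranges = headers.get("accept-ranges");
        // The header must be present and say "bytes"; omitting it is not enough.
        return ranges != null && ranges.trim().equalsIgnoreCase("bytes");
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("content-length", "1024");
        headers.put("accept-ranges", "bytes");
        System.out.println(isSeekable("http", headers)); // prints true
        System.out.println(isSeekable("file", headers)); // prints false
    }
}
```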

Does Inheritance in implicit value classes introduce an overhead?

I want to apply scala's value classes to one of my projects because they enable me to enrich certain primitive types without great overhead (I hope) and stay type-safe.
object Position {
implicit class Pos( val i: Int ) extends AnyVal with Ordered[Pos] {
def +( p: Pos ): Pos = i + p.i
def -( p: Pos ): Pos = if ( i - p.i < 0 ) 0 else i - p.i
def compare( p: Pos ): Int = i - p.i
}
}
My question: Will the inheritance of Ordered force the allocation of Pos objects whenever I use them (thereby introduce great overhead) or not? If so: Is there a way to circumvent this?
Every time Pos is treated as an Ordered[Pos], an allocation will happen.
There are several cases when allocation has to happen, see http://docs.scala-lang.org/overviews/core/value-classes.html#when_allocation_is_necessary.
So when doing something as simple as calling <, you will get allocations:
val x = Pos( 1 )
val y = Pos( 2 )
x < y // x & y promoted to an actual instance (allocation)
The relevant rules are (quoted from the above article):
Whenever a value class is treated as another type, including a universal trait, an instance of the actual value class must be instantiated
and:
Another instance of this rule is when a value class is used as a type argument.
Disassembling the above code snippet confirms this:
0: aload_0
1: iconst_1
2: invokevirtual #21 // Method Pos:(I)I
5: istore_1
6: aload_0
7: iconst_2
8: invokevirtual #21 // Method Pos:(I)I
11: istore_2
12: new #23 // class test/Position$Pos
15: dup
16: iload_1
17: invokespecial #26 // Method test/Position$Pos."<init>":(I)V
20: new #23 // class test/Position$Pos
23: dup
24: iload_2
25: invokespecial #26 // Method test/Position$Pos."<init>":(I)V
28: invokeinterface #32, 2 // InterfaceMethod scala/math/Ordered.$less:(Ljava/lang/Object;)Z
As can be seen, we do have two instances of the "new" opcode for class Position$Pos.
UPDATE: to avoid the allocation in simple cases like this, you can manually override each method (even if they only forward to the original implementation):
override def < (that: Pos): Boolean = super.<(that)
override def > (that: Pos): Boolean = super.>(that)
override def <= (that: Pos): Boolean = super.<=(that)
override def >= (that: Pos): Boolean = super.>=(that)
This will remove the allocation when doing x < y, for example.
However, this still leaves the cases when Pos is treated as an Ordered[Pos] (as when passed to a method taking an Ordered[Pos] or an Ordered[T] with T being a type parameter). In this particular case, you will still get an allocation, and there is no way around that.
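As a rough mental model, the difference between the two call paths can be emulated in plain Java. This is only an analogy under stated assumptions: the static methods stand in for the unboxed forms the compiler can use when a method is defined directly on the value class, the nested Ordered interface stands in for scala.math.Ordered, and the allocations counter exists purely for the demonstration.

```java
// Emulation of boxed vs. unboxed calls; all names here are illustrative.
final class ValueClassDemo {
    static int allocations = 0; // demo counter, bumped by every Pos constructed

    interface Ordered<T> {
        int compare(T that);
        // Inherited operator: needs a receiver object and an argument object.
        default boolean less(T that) { return compare(that) < 0; }
    }

    static final class Pos implements Ordered<Pos> {
        final int i;
        Pos(int i) { this.i = i; allocations++; }
        @Override public int compare(Pos that) { return compareExtension(i, that.i); }
    }

    // Unboxed path: works on the underlying ints, nothing is allocated.
    static int compareExtension(int self, int that) { return Integer.compare(self, that); }
    static boolean lessExtension(int self, int that) { return compareExtension(self, that) < 0; }

    // Boxed path: calling through the interface forces both sides to become objects.
    static boolean lessViaOrdered(int x, int y) {
        Ordered<Pos> boxedX = new Pos(x);
        return boxedX.less(new Pos(y));
    }

    public static void main(String[] args) {
        System.out.println(lessExtension(1, 2) + " " + allocations);  // prints true 0
        System.out.println(lessViaOrdered(1, 2) + " " + allocations); // prints true 2
    }
}
```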

Is this a bug in Scala 2.9.1 lazy implementation or just an artifact of decompilation

I am considering using Scala on a pretty computationally intensive program. Profiling the C++ version of our code reveals that we could benefit significantly from lazy evaluation. I have tried it out in Scala 2.9.1 and really like it. However, when I ran the class through a decompiler the implementation didn't look quite right. I'm assuming that it's an artifact of the decompiler, but I wanted to get a more conclusive answer...
consider the following trivial example:
class TrivialAngle(radians : Double)
{
lazy val sin = math.sin(radians)
}
when I decompile it, I get this:
import scala.ScalaObject;
import scala.math.package.;
import scala.reflect.ScalaSignature;
@ScalaSignature(bytes="omitted")
public class TrivialAngle
implements ScalaObject
{
private final double radians;
private double sin;
public volatile int bitmap$0;
public double sin()
{
if ((this.bitmap$0 & 0x1) == 0);
synchronized (this)
{
if (
(this.bitmap$0 & 0x1) == 0)
{
this.sin = package..MODULE$.sin(this.radians);
this.bitmap$0 |= 1;
}
return this.sin;
}
}
public TrivialAngle(double radians)
{
}
}
To me, the return block is in the wrong spot, and you will always acquire the lock. This can't be what the real code is doing, but I am unable to confirm this. Can anyone confirm or deny that I have a bogus decompilation, and that the lazy implementation is somewhat reasonable (ie, only locks when it is computing the value, and doesn't acquire the lock for subsequent calls?)
Thanks!
For reference, this is the decompiler I used:
http://java.decompiler.free.fr/?q=jdgui
scala -Xprint:jvm reveals the true story:
[[syntax trees at end of jvm]]// Scala source: lazy.scala
package <empty> {
class TrivialAngle extends java.lang.Object with ScalaObject {
@volatile protected var bitmap$0: Int = 0;
<paramaccessor> private[this] val radians: Double = _;
lazy private[this] var sin: Double = _;
<stable> <accessor> lazy def sin(): Double = {
if (TrivialAngle.this.bitmap$0.&(1).==(0))
{
TrivialAngle.this.synchronized({
if (TrivialAngle.this.bitmap$0.&(1).==(0))
{
TrivialAngle.this.sin = scala.math.`package`.sin(TrivialAngle.this.radians);
TrivialAngle.this.bitmap$0 = TrivialAngle.this.bitmap$0.|(1);
()
};
scala.runtime.BoxedUnit.UNIT
});
()
};
TrivialAngle.this.sin
};
def this(radians: Double): TrivialAngle = {
TrivialAngle.this.radians = radians;
TrivialAngle.super.this();
()
}
}
}
It's a (since JVM 1.5) safe, and very fast, double checked lock.
More details:
What's the (hidden) cost of Scala's lazy val?
Be aware that if you have multiple lazy val members in a class, only one of them can be initialized at once, as they are guarded by synchronized(this) { ... }.
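To make that caveat concrete, here is a hand-written Java sketch of a class with two lazy fields that share one bitmap and one monitor, following the scheme printed above. It is an approximation rather than compiler output; the field names, the compute helper and the computations counter are assumptions added for the demonstration.

```java
// Two lazily initialized fields guarded by the same bitmap and the same lock.
final class TwoLazyFields {
    private volatile int bitmap = 0; // bit 0 guards a, bit 1 guards b
    private String a;
    private String b;
    int computations = 0;            // demo counter, only written while holding the lock

    String a() {
        if ((bitmap & 1) == 0) {             // fast path: no lock once initialized
            synchronized (this) {
                if ((bitmap & 1) == 0) {     // re-check under the lock
                    a = compute("a");
                    bitmap |= 1;
                }
            }
        }
        return a;
    }

    String b() {
        if ((bitmap & 2) == 0) {
            synchronized (this) {            // same monitor as a(): initializers cannot overlap
                if ((bitmap & 2) == 0) {
                    b = compute("b");
                    bitmap |= 2;
                }
            }
        }
        return b;
    }

    private String compute(String name) {
        computations++;
        return name + "!";
    }

    public static void main(String[] args) {
        TwoLazyFields t = new TwoLazyFields();
        System.out.println(t.a() + " " + t.a() + " " + t.b()); // prints a! a! b!
        System.out.println(t.computations);                    // prints 2
    }
}
```

Because both initializers take the lock on this, a slow initializer for one field delays the first access to the other, even though the two values are unrelated.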
What I get with javap -c does not correspond to your decompile. In particular, there is no monitor enter when the field is found to be initialized. Version 2.9.1 too. There is still the memory barrier implied by the volatile access of course, so it does not come completely free. Comments starting with /// are mine
public double sin();
Code:
0: aload_0
1: getfield #14; //Field bitmap$0:I
4: iconst_1
5: iand
6: iconst_0
7: if_icmpne 54 /// if (bitmap$0 & 1) != 0 goto 54, skip lock
10: aload_0
11: dup
12: astore_1
13: monitorenter
/// 14 to 52 reasonably equivalent to synchronized block
/// in your decompiled code, without the return
53: monitorexit
54: aload_0
55: getfield #27; //Field sin:D
58: dreturn /// return outside lock
59: aload_1 /// (this would be the finally implied by the lock)
60: monitorexit
61: athrow
Exception table:
from to target type
14 54 59 any

What's the (hidden) cost of Scala's lazy val?

One handy feature of Scala is lazy val, where the evaluation of a val is delayed until it's necessary (at first access).
Of course, a lazy val must have some overhead - somewhere Scala must keep track of whether the value has already been evaluated and the evaluation must be synchronized, because multiple threads might try to access the value for the first time at the same time.
What exactly is the cost of a lazy val - is there a hidden boolean flag associated with a lazy val to keep track if it has been evaluated or not, what exactly is synchronized and are there any more costs?
In addition, suppose I do this:
class Something {
lazy val (x, y) = { ... }
}
Is this the same as having two separate lazy vals x and y or do I get the overhead only once, for the pair (x, y)?
This is taken from the scala mailing list and gives implementation details of lazy in terms of Java code (rather than bytecode):
class LazyTest {
lazy val msg = "Lazy"
}
is compiled to something equivalent to the following Java code:
class LazyTest {
public int bitmap$0;
private String msg;
public String msg() {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
msg = "Lazy";
}
}
bitmap$0 = bitmap$0 | 1;
}
}
return msg;
}
}
It looks like the compiler arranges for a class-level bitmap int field to flag multiple lazy fields as initialized (or not), and initializes the target field in a synchronized block if the relevant bit of the bitmap (tested with a bitwise AND) indicates that it is necessary.
Using:
class Something {
lazy val foo = getFoo
def getFoo = "foo!"
}
produces sample bytecode:
0 aload_0 [this]
1 getfield blevins.example.Something.bitmap$0 : int [15]
4 iconst_1
5 iand
6 iconst_0
7 if_icmpne 48
10 aload_0 [this]
11 dup
12 astore_1
13 monitorenter
14 aload_0 [this]
15 getfield blevins.example.Something.bitmap$0 : int [15]
18 iconst_1
19 iand
20 iconst_0
21 if_icmpne 42
24 aload_0 [this]
25 aload_0 [this]
26 invokevirtual blevins.example.Something.getFoo() : java.lang.String [18]
29 putfield blevins.example.Something.foo : java.lang.String [20]
32 aload_0 [this]
33 aload_0 [this]
34 getfield blevins.example.Something.bitmap$0 : int [15]
37 iconst_1
38 ior
39 putfield blevins.example.Something.bitmap$0 : int [15]
42 getstatic scala.runtime.BoxedUnit.UNIT : scala.runtime.BoxedUnit [26]
45 pop
46 aload_1
47 monitorexit
48 aload_0 [this]
49 getfield blevins.example.Something.foo : java.lang.String [20]
52 areturn
53 aload_1
54 monitorexit
55 athrow
Values initialized in tuples like lazy val (x,y) = { ... } have nested caching via the same mechanism. The tuple result is lazily evaluated and cached, and an access of either x or y will trigger the tuple evaluation. Extraction of the individual value from the tuple is done independently and lazily (and cached). So the above double-instantiation code generates an x, y, and an x$1 field of type Tuple2.
With Scala 2.10, a lazy value like:
class Example {
lazy val x = "Value";
}
is compiled to byte code that resembles the following Java code:
public class Example {
private String x;
private volatile boolean bitmap$0;
public String x() {
if(this.bitmap$0 == true) {
return this.x;
} else {
return x$lzycompute();
}
}
private String x$lzycompute() {
synchronized(this) {
if(this.bitmap$0 != true) {
this.x = "Value";
this.bitmap$0 = true;
}
return this.x;
}
}
}
Note that the bitmap is represented by a boolean. If you add another lazy field, the compiler widens the bitmap so that it can represent at least two flags, i.e. to a byte. This just goes on for huge classes.
But you might wonder why this works? The thread-local caches must be cleared when entering a synchronized block such that the non-volatile x value is flushed into memory. This blog article gives an explanation.
Scala SIP-20 proposes a new implementation of lazy val, which is more correct but ~25% slower than the "current" version.
The proposed implementation looks like:
class LazyCellBase { // in a Java file - we need a public bitmap_0
public static AtomicIntegerFieldUpdater<LazyCellBase> arfu_0 =
AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
public volatile int bitmap_0 = 0;
}
final class LazyCell extends LazyCellBase {
import LazyCellBase._
var value_0: Int = _
@tailrec final def value(): Int = (arfu_0.get(this): @switch) match {
case 0 =>
if (arfu_0.compareAndSet(this, 0, 1)) {
val result = 0
value_0 = result
@tailrec def complete(): Unit = (arfu_0.get(this): @switch) match {
case 1 =>
if (!arfu_0.compareAndSet(this, 1, 3)) complete()
case 2 =>
if (arfu_0.compareAndSet(this, 2, 3)) {
synchronized { notifyAll() }
} else complete()
}
complete()
result
} else value()
case 1 =>
arfu_0.compareAndSet(this, 1, 2)
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 2 =>
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 3 => value_0
}
}
As of June 2013 this SIP hasn't been approved. I expect that it's likely to be approved and included in a future version of Scala based on the mailing list discussion. Consequently, I think you'd be wise to heed Daniel Spiewak's observation:
Lazy val is *not* free (or even cheap). Use it only if you absolutely
need laziness for correctness, not for optimization.
I've written a post with regard to this issue https://dzone.com/articles/cost-laziness
In a nutshell, the penalty is so small that in practice you can ignore it.
Given the bytecode generated by Scala for lazy, it can suffer from the thread-safety problem described for double-checked locking: http://www.javaworld.com/javaworld/jw-05-2001/jw-0525-double.html?page=1
