Does Inheritance in implicit value classes introduce an overhead? - performance

I want to apply scala's value classes to one of my projects because they enable me to enrich certain primitive types without great overhead (I hope) and stay type-safe.
object Position {
implicit class Pos( val i: Int ) extends AnyVal with Ordered[Pos] {
def +( p: Pos ): Pos = i + p.i
def -( p: Pos ): Pos = if ( i - p.i < 0 ) 0 else i - p.i
def compare( p: Pos ): Int = i - p.i
}
}
My question: Will the inheritance of Ordered force the allocation of Pos objects whenever I use them (thereby introduce great overhead) or not? If so: Is there a way to circumvent this?

Everytime Pos will be treated as an Ordered[Pos], allocation will happen.
There are several cases when allocation has to happen, see http://docs.scala-lang.org/overviews/core/value-classes.html#when_allocation_is_necessary.
So when doing something as simple as calling <, you will get allocations:
val x = Pos( 1 )
val y = Pos( 2 )
x < y // x & y promoted to an actual instance (allocation)
The relevant rules are (quoted from the above article):
Whenever a value class is treated as another type, including a universal trait, an instance of the actual value class must be instantiated
and:
Another instance of this rule is when a value class is used as a type argument.
Disassembling the above code snippet confirms this:
0: aload_0
1: iconst_1
2: invokevirtual #21 // Method Pos:(I)I
5: istore_1
6: aload_0
7: iconst_2
8: invokevirtual #21 // Method Pos:(I)I
11: istore_2
12: new #23 // class test/Position$Pos
15: dup
16: iload_1
17: invokespecial #26 // Method test/Position$Pos."<init>":(I)V
20: new #23 // class test/Position$Pos
23: dup
24: iload_2
25: invokespecial #26 // Method test/Position$Pos."<init>":(I)V
28: invokeinterface #32, 2 // InterfaceMethod scala/math/Ordered.$less:(Ljava/lang/Object;)Z
As can be seen we do have two instances of the "new" opcode for class Position$Pos
UPDATE: to avoid the allocation in simples cases like this, you can manually override each method (even if they only forward to the originlal implementation):
override def < (that: Pos): Boolean = super.<(that)
override def > (that: Pos): Boolean = super.>(that)
override def <= (that: Pos): Boolean = super.<=(that)
override def >= (that: Pos): Boolean = super.>=(that)
This will remove the allocation when doing x < y by example.
However, this still leaves the cases when Pos is treated as an Ordered[Pos] (as when passed to a method taking a Ordered[Pos] or an Ordered[T] with T being a type parameter). In this particular case, you will still get an allocation and there no way around that.

Related

Scala: call by name and call by value performance

Consider the following code:
class A(val name: String)
Compare the two wrappers of A:
class B(a: A){
def name: String = a.name
}
and
class B1(a: A){
val name: String = a.name
}
B and B1 have the same functionality. How is the memory efficiency and computation efficiency comparison between them? Will Scala compiler view them as the same thing?
First, I'll say that micro optimizations questions are usually tricky to answer for their lack of context. Additionally, this question has nothing to do with call by name or call by value, as non of your examples are call by name.
Now, let's compile your code with scalac -Xprint:typer and see what gets emitted:
class B extends scala.AnyRef {
<paramaccessor> private[this] val a: A = _;
def <init>(a: A): B = {
B.super.<init>();
()
};
def name: String = B.this.a.name
};
class B1 extends scala.AnyRef {
<paramaccessor> private[this] val a: A = _;
def <init>(a: A): B1 = {
B1.super.<init>();
()
};
private[this] val name: String = B1.this.a.name;
<stable> <accessor> def name: String = B1.this.name
};
In class B, we hold a reference to a, and have a method name which calls the name value on A.
In class B1, we store name locally, since it is a value of B1 directly, not a method. By definition, val declarations in have a method generated for them and that is how they're accessed.
This boils down to the fact that B1 holds an additional reference to the name string allocated by A. Is this significant in any way from a performance perspective? I don't know. It looks negligible to me under general question you've posted, but I wouldn't be to bothered with this unless you've profiled your application and found this a bottleneck.
Lets take this one step further, and run a simple JMH micro benchmark on this:
[info] Benchmark Mode Cnt Score Error Units
[info] MicroBenchClasses.testB1Access thrpt 50 296.291 ± 20.787 ops/us
[info] MicroBenchClasses.testBAccess thrpt 50 303.866 ± 5.435 ops/us
[info] MicroBenchClasses.testB1Access avgt 9 0.004 ± 0.001 us/op
[info] MicroBenchClasses.testBAccess avgt 9 0.003 ± 0.001 us/op
We see that call times are identical, since in both times we're invoking a method. One thing we can notice is that the throughput on B is higher, why is that? Lets look at the byte code:
B:
public java.lang.String name();
Code:
0: aload_0
1: getfield #20 // Field a:Lcom/testing/SimpleTryExample$A$1;
4: invokevirtual #22 // Method com/testing/SimpleTryExample$A$1.name:()Ljava/lang/String;
7: areturn
B1:
public java.lang.String name();
Code:
0: aload_0
1: getfield #19 // Field name:Ljava/lang/String;
4: areturn
It isn't trivial to understand why a getfield would be slower than a invokevirtual, but in the end the JIT may inline the getter call to name. This goes to show you that you should take nothing for granted, benchmark everything.
Code for test:
import java.util.concurrent.TimeUnit
import org.openjdk.jmh.annotations._
/**
* Created by Yuval.Itzchakov on 19/10/2017.
*/
#State(Scope.Thread)
#Warmup(iterations = 3, time = 1)
#Measurement(iterations = 3)
#BenchmarkMode(Array(Mode.AverageTime, Mode.Throughput))
#OutputTimeUnit(TimeUnit.MICROSECONDS)
#Fork(3)
class MicroBenchClasses {
class A(val name: String)
class B(a: A){
def name: String = a.name
}
class B1(a: A){
val name: String = a.name
}
var b: B = _
var b1: B1 = _
#Setup
def setup() = {
val firstA = new A("yuval")
val secondA = new A("yuval")
b = new B(firstA)
b1 = new B1(secondA)
}
#Benchmark
def testBAccess(): String = {
b.name
}
#Benchmark
def testB1Access(): String = {
b1.name
}
}

Efficient string concatenation in Scala

The JVM optimzes String concatenation with + and replaces it with a StringBuilder. This should be the same in Scala. But what happens if strings are concatenated with ++=?
var x = "x"
x ++= "y"
x ++= "z"
As far as I know this methods treats strings like char seqences, so even if the JVM would create a StringBuilder it would lead to many method calls, right? Would it be better to use a StringBuilder instead?
To what type is the String converted implicitly?
There is a huge HUGE difference in time taken.
If you add strings repeatedly using += you do not optimize away the O(n^2) cost of creating incrementally longer strings. So for adding one or two you won't see a difference, but it doesn't scale; by the time you get to adding 100 (short) strings, using a StringBuilder is over 20x faster. (Precise data: 1.3 us vs. 27.1 us to add the string representations of the numbers 0 to 100; timings should be reproducible to about += 5% and of course are for my machine.)
Using ++= on a var String is far far worse yet, because you are then instructing Scala to treat a string as a character-by-character collection which then requires all sorts of wrappers to make the String look like a collection (including boxed character-by-character addition using the generic version of ++!). Now you're 16x slower again on 100 additions! (Precise data: 428.8 us for ++= on a var string instead of +='s 26.7 us.)
If you write a single statement with a bunch of +es then the Scala compiler will use a StringBuilder and end up with an efficient result (Data: 1.8 us on non-constant strings pulled out of an array).
So, if you add strings with anything other than + in line, and you care about efficiency, use a StringBuilder. Definitely don't use ++= to add another String to a var String; there just isn't any reason to do it, and there's a big runtime penalty.
(Note--very often you don't care at all how efficient your string additions are! Don't clutter your code with extra StringBuilders unless you have reason to suspect that this particular code path is getting called a lot.)
Actually, the inconvenient truth is StringOps usually remains an allocation:
scala> :pa
// Entering paste mode (ctrl-D to finish)
class Concat {
var x = "x"
x ++= "y"
x ++= "z"
}
// Exiting paste mode, now interpreting.
defined class Concat
scala> :javap -prv Concat
Binary file Concat contains $line3.$read$$iw$$iw$Concat
Size 1211 bytes
MD5 checksum 1900522728cbb0ed0b1d3f8b962667ad
Compiled from "<console>"
public class $line3.$read$$iw$$iw$Concat
SourceFile: "<console>"
[snip]
public $line3.$read$$iw$$iw$Concat();
descriptor: ()V
flags: ACC_PUBLIC
Code:
stack=6, locals=1, args_size=1
0: aload_0
1: invokespecial #19 // Method java/lang/Object."<init>":()V
4: aload_0
5: ldc #20 // String x
7: putfield #10 // Field x:Ljava/lang/String;
10: aload_0
11: new #22 // class scala/collection/immutable/StringOps
14: dup
15: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
18: aload_0
19: invokevirtual #30 // Method x:()Ljava/lang/String;
22: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
25: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
28: new #22 // class scala/collection/immutable/StringOps
31: dup
32: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
35: ldc #38 // String y
37: invokevirtual #34 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
40: invokespecial #36 // Method scala/collection/immutable/StringOps."<init>":(Ljava/lang/String;)V
43: getstatic #28 // Field scala/Predef$.MODULE$:Lscala/Predef$;
46: invokevirtual #42 // Method scala/Predef$.StringCanBuildFrom:()Lscala/collection/generic/CanBuildFrom;
49: invokevirtual #46 // Method scala/collection/immutable/StringOps.$plus$plus:(Lscala/collection/GenTraversableOnce;Lscala/collection/generic/CanBuildFrom;)Ljava/lang/Object;
52: checkcast #48 // class java/lang/String
55: invokevirtual #50 // Method x_$eq:(Ljava/lang/String;)V
See more demonstration at this answer.
Edit: To say more, you're building the String on each reassignment, so, no you're not using a single StringBuilder.
However, the optimization is done by javac and not the JIT compiler, so to compare fruits of the same kind:
public class Strcat {
public String strcat(String s) {
String t = " hi ";
String u = " by ";
return s + t + u; // OK
}
public String strcat2(String s) {
String t = s + " hi ";
String u = t + " by ";
return u; // bad
}
}
whereas
$ scala
Welcome to Scala version 2.11.2 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_11).
Type in expressions to have them evaluated.
Type :help for more information.
scala> :se -Xprint:typer
scala> class K { def f(s: String, t: String, u: String) = s ++ t ++ u }
[[syntax trees at end of typer]] // <console>
def f(s: String, t: String, u: String): String = scala.this.Predef.augmentString(scala.this.Predef.augmentString(s).++[Char, String](scala.this.Predef.augmentString(t))(scala.this.Predef.StringCanBuildFrom)).++[Char, String](scala.this.Predef.augmentString(u))(scala.this.Predef.StringCanBuildFrom)
is bad. Or, worse, to unroll Rex's explanation:
"abc" ++ "def"
augmentString("abc").++[Char, String](augmentString("def"))(StringCanBuildFrom)
collection.mutable.StringBuilder.newBuilder ++= new WrappedString(augmentString("def"))
val b = collection.mutable.StringBuilder.newBuilder
new WrappedString(augmentString("def")) foreach b.+=
As Rex explained, StringBuilder overrides ++=(String) but not Growable.++=(Traversable[Char]).
In case you've ever wondered what unaugmentString is for:
28: invokevirtual #40 // Method scala/Predef$.augmentString:(Ljava/lang/String;)Ljava/lang/String;
31: invokevirtual #43 // Method scala/Predef$.unaugmentString:(Ljava/lang/String;)Ljava/lang/String;
34: invokespecial #46 // Method scala/collection/immutable/WrappedString."<init>":(Ljava/lang/String;)V
And just to show that you do finally call unadorned +=(Char) but after boxing and unboxing:
public final scala.collection.mutable.StringBuilder apply(char);
flags: ACC_PUBLIC, ACC_FINAL
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: getfield #19 // Field b$1:Lscala/collection/mutable/StringBuilder;
4: iload_1
5: invokevirtual #24 // Method scala/collection/mutable/StringBuilder.$plus$eq:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun$1;
0 9 1 x C
LineNumberTable:
line 9: 0
public final java.lang.Object apply(java.lang.Object);
flags: ACC_PUBLIC, ACC_FINAL, ACC_BRIDGE, ACC_SYNTHETIC
Code:
stack=2, locals=2, args_size=2
0: aload_0
1: aload_1
2: invokestatic #35 // Method scala/runtime/BoxesRunTime.unboxToChar:(Ljava/lang/Object;)C
5: invokevirtual #37 // Method apply:(C)Lscala/collection/mutable/StringBuilder;
8: areturn
LocalVariableTable:
Start Length Slot Name Signature
0 9 0 this L$line10/$read$$iw$$iw$$anonfun$1;
0 9 1 v1 Ljava/lang/Object;
LineNumberTable:
line 9: 0
A good laugh does get some oxygen into the bloodstream.

Is this a bug in Scala 2.9.1 lazy implementation or just an artifact of decompilation

I am considering using Scala on a pretty computationally intensive program. Profiling the C++ version of our code reveals that we could benefit significantly from Lazy evaluation. I have tried it out in Scala 2.9.1 and really like it. However, when I ran the class through a decompiler the implemenation didn't look quite right. I'm assuming that it's an artifact of the decompiler, but I wanted to get a more conclusive answer...
consider the following trivial example:
class TrivialAngle(radians : Double)
{
lazy val sin = math.sin(radians)
}
when I decompile it, I get this:
import scala.ScalaObject;
import scala.math.package.;
import scala.reflect.ScalaSignature;
#ScalaSignature(bytes="omitted")
public class TrivialAngle
implements ScalaObject
{
private final double radians;
private double sin;
public volatile int bitmap$0;
public double sin()
{
if ((this.bitmap$0 & 0x1) == 0);
synchronized (this)
{
if (
(this.bitmap$0 & 0x1) == 0)
{
this.sin = package..MODULE$.sin(this.radians);
this.bitmap$0 |= 1;
}
return this.sin;
}
}
public TrivialAngle(double radians)
{
}
}
To me, the return block is in the wrong spot, and you will always acquire the lock. This can't be what the real code is doing, but I am unable to confirm this. Can anyone confirm or deny that I have a bogus decompilation, and that the lazy implementation is somewhat reasonable (ie, only locks when it is computing the value, and doesn't acquire the lock for subsequent calls?)
Thanks!
For reference, this is the decompiler I used:
http://java.decompiler.free.fr/?q=jdgui
scala -Xprint:jvm reveals the true story:
[[syntax trees at end of jvm]]// Scala source: lazy.scala
package <empty> {
class TrivialAngle extends java.lang.Object with ScalaObject {
#volatile protected var bitmap$0: Int = 0;
<paramaccessor> private[this] val radians: Double = _;
lazy private[this] var sin: Double = _;
<stable> <accessor> lazy def sin(): Double = {
if (TrivialAngle.this.bitmap$0.&(1).==(0))
{
TrivialAngle.this.synchronized({
if (TrivialAngle.this.bitmap$0.&(1).==(0))
{
TrivialAngle.this.sin = scala.math.`package`.sin(TrivialAngle.this.radians);
TrivialAngle.this.bitmap$0 = TrivialAngle.this.bitmap$0.|(1);
()
};
scala.runtime.BoxedUnit.UNIT
});
()
};
TrivialAngle.this.sin
};
def this(radians: Double): TrivialAngle = {
TrivialAngle.this.radians = radians;
TrivialAngle.super.this();
()
}
}
}
It's a (since JVM 1.5) safe, and very fast, double checked lock.
More details:
What's the (hidden) cost of Scala's lazy val?
Be aware that if you have multiple lazy val members in a class, only one of them can be initialized at once, as they are guarded by synchronized(this) { ... }.
What I get with javap -c does not correspond to your decompile. In particular, there is no monitor enter when the field is found to be initialized. Version 2.9.1 too. There is still the memory barrier implied by the volatile access of course, so it does not come completely free. Comments starting with /// are mine
public double sin();
Code:
0: aload_0
1: getfield #14; //Field bitmap$0:I
4: iconst_1
5: iand
6: iconst_0
7: if_icmpne 54 /// if getField & 1 == O goto 54, skip lock
10: aload_0
11: dup
12: astore_1
13: monitorenter
/// 14 to 52 reasonably equivalent to synchronized block
/// in your decompiled code, without the return
53: monitorexit
54: aload_0
55: getfield #27; //Field sin:D
58: dreturn /// return outside lock
59: aload_1 /// (this would be the finally implied by the lock)
60: monitorexit
61: athrow
Exception table:
from to target type
14 54 59 any

What's the (hidden) cost of Scala's lazy val?

One handy feature of Scala is lazy val, where the evaluation of a val is delayed until it's necessary (at first access).
Of course, a lazy val must have some overhead - somewhere Scala must keep track of whether the value has already been evaluated and the evaluation must be synchronized, because multiple threads might try to access the value for the first time at the same time.
What exactly is the cost of a lazy val - is there a hidden boolean flag associated with a lazy val to keep track if it has been evaluated or not, what exactly is synchronized and are there any more costs?
In addition, suppose I do this:
class Something {
lazy val (x, y) = { ... }
}
Is this the same as having two separate lazy vals x and y or do I get the overhead only once, for the pair (x, y)?
This is taken from the scala mailing list and gives implementation details of lazy in terms of Java code (rather than bytecode):
class LazyTest {
lazy val msg = "Lazy"
}
is compiled to something equivalent to the following Java code:
class LazyTest {
public int bitmap$0;
private String msg;
public String msg() {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
if ((bitmap$0 & 1) == 0) {
synchronized (this) {
msg = "Lazy";
}
}
bitmap$0 = bitmap$0 | 1;
}
}
return msg;
}
}
It looks like the compiler arranges for a class-level bitmap int field to flag multiple lazy fields as initialized (or not) and initializes the target field in a synchronized block if the relevant xor of the bitmap indicates it is necessary.
Using:
class Something {
lazy val foo = getFoo
def getFoo = "foo!"
}
produces sample bytecode:
0 aload_0 [this]
1 getfield blevins.example.Something.bitmap$0 : int [15]
4 iconst_1
5 iand
6 iconst_0
7 if_icmpne 48
10 aload_0 [this]
11 dup
12 astore_1
13 monitorenter
14 aload_0 [this]
15 getfield blevins.example.Something.bitmap$0 : int [15]
18 iconst_1
19 iand
20 iconst_0
21 if_icmpne 42
24 aload_0 [this]
25 aload_0 [this]
26 invokevirtual blevins.example.Something.getFoo() : java.lang.String [18]
29 putfield blevins.example.Something.foo : java.lang.String [20]
32 aload_0 [this]
33 aload_0 [this]
34 getfield blevins.example.Something.bitmap$0 : int [15]
37 iconst_1
38 ior
39 putfield blevins.example.Something.bitmap$0 : int [15]
42 getstatic scala.runtime.BoxedUnit.UNIT : scala.runtime.BoxedUnit [26]
45 pop
46 aload_1
47 monitorexit
48 aload_0 [this]
49 getfield blevins.example.Something.foo : java.lang.String [20]
52 areturn
53 aload_1
54 monitorexit
55 athrow
Values initialed in tuples like lazy val (x,y) = { ... } have nested caching via the same mechanism. The tuple result is lazily evaluated and cached, and an access of either x or y will trigger the tuple evaluation. Extraction of the individual value from the tuple is done independently and lazily (and cached). So the above double-instantiation code generates an x, y, and an x$1 field of type Tuple2.
With Scala 2.10, a lazy value like:
class Example {
lazy val x = "Value";
}
is compiled to byte code that resembles the following Java code:
public class Example {
private String x;
private volatile boolean bitmap$0;
public String x() {
if(this.bitmap$0 == true) {
return this.x;
} else {
return x$lzycompute();
}
}
private String x$lzycompute() {
synchronized(this) {
if(this.bitmap$0 != true) {
this.x = "Value";
this.bitmap$0 = true;
}
return this.x;
}
}
}
Note that the bitmap is represented by a boolean. If you add another field, the compiler will increase the size of the field to being able to represent at least 2 values, i.e. as a byte. This just goes on for huge classes.
But you might wonder why this works? The thread-local caches must be cleared when entering a synchronized block such that the non-volatile x value is flushed into memory. This blog article gives an explanation.
Scala SIP-20 proposes a new implementation of lazy val, which is more correct but ~25% slower than the "current" version.
The proposed implementation looks like:
class LazyCellBase { // in a Java file - we need a public bitmap_0
public static AtomicIntegerFieldUpdater<LazyCellBase> arfu_0 =
AtomicIntegerFieldUpdater.newUpdater(LazyCellBase.class, "bitmap_0");
public volatile int bitmap_0 = 0;
}
final class LazyCell extends LazyCellBase {
import LazyCellBase._
var value_0: Int = _
#tailrec final def value(): Int = (arfu_0.get(this): #switch) match {
case 0 =>
if (arfu_0.compareAndSet(this, 0, 1)) {
val result = 0
value_0 = result
#tailrec def complete(): Unit = (arfu_0.get(this): #switch) match {
case 1 =>
if (!arfu_0.compareAndSet(this, 1, 3)) complete()
case 2 =>
if (arfu_0.compareAndSet(this, 2, 3)) {
synchronized { notifyAll() }
} else complete()
}
complete()
result
} else value()
case 1 =>
arfu_0.compareAndSet(this, 1, 2)
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 2 =>
synchronized {
while (arfu_0.get(this) != 3) wait()
}
value_0
case 3 => value_0
}
}
As of June 2013 this SIP hasn't been approved. I expect that it's likely to be approved and included in a future version of Scala based on the mailing list discussion. Consequently, I think you'd be wise to heed Daniel Spiewak's observation:
Lazy val is *not* free (or even cheap). Use it only if you absolutely
need laziness for correctness, not for optimization.
I've written a post with regard to this issue https://dzone.com/articles/cost-laziness
In nutshell, the penalty is so small that in practice you can ignore it.
given the bycode generated by scala for lazy, it can suffer thread safety problem as mentioned in double check locking http://www.javaworld.com/javaworld/jw-05-2001/jw-0525-double.html?page=1

ruby win32api & structs (VerQueryValue)

I am trying to call the standard Win32 API functions to get file version info, using the win32-api library.
The 3 version.dll functions are GetFileVersionInfoSize, GetFileVersionInfo, and VerQueryValue. Then I call RtlMoveMemory in kernel32.dll to get a copy of the VS_FIXEDFILEINFO struct (see Microsoft documentation: http://msdn.microsoft.com/en-us/library/ms646997%28VS.85%29.aspx).
I drew from an example I saw using VB: http://support.microsoft.com/kb/139491.
My problem is that the data that finally gets returned doesn't seem to match the expected struct, in fact it doesn't even return a consistent value. I suspect the data is getting mangled at some point, probably in VerQueryValue or RtlMoveMemory.
Here is the code:
GetFileVersionInfoSize = Win32::API.new('GetFileVersionInfoSize','PP','I','version.dll')
GetFileVersionInfo = Win32::API.new('GetFileVersionInfo','PIIP','I', 'version.dll')
VerQueryValue = Win32::API.new('VerQueryValue','PPPP','I', 'version.dll')
RtlMoveMemory = Win32::API.new('RtlMoveMemory', 'PPI', 'V', 'kernel32.dll')
buf = [0].pack('L')
version_size = GetFileVersionInfoSize.call(myfile + "\0", buf)
raise Exception.new if version_size == 0 #TODO
version_info = 0.chr * version_size
version_ok = GetFileVersionInfo.call(file, 0, version_size, version_info)
raise Exception.new if version_ok == 0 #TODO
addr = [0].pack('L')
size = [0].pack('L')
query_ok = VerQueryValue.call(version_info, "\\\0", addr, size)
raise Exception.new if query_ok == 0 #TODO
# note that at this point, size == 4 -- is that right?
fixed_info = Array.new(13,0).pack('L*')
RtlMoveMemory.call(fixed_info, addr, fixed_info.length)
# fixed_info.unpack('L*') #=> seemingly random data, usually only the first two dwords' worth and the rest 0.
This is the full code I got to work, in case others are looking for such a function.
Returns an array with four parts of product/file version number (i.e., what is called "File Version" in a dll file properties window):
def file_version ref, options = {}
options = {:path => LIBDIR, :extension => 'dll'}.merge(options)
begin
file = File.join(ROOT, options[:path],"#{ref}.#{options[:extension]}").gsub(/\//,"\\")
buf = [0].pack('L')
version_size = GetFileVersionInfoSize.call(file + "\0", buf)
raise Exception.new if version_size == 0 #TODO
version_info = 0.chr * version_size
version_ok = GetFileVersionInfo.call(file, 0, version_size, version_info)
raise Exception.new if version_ok == 0 #TODO
addr = [0].pack('L')
size = [0].pack('L')
query_ok = VerQueryValue.call(version_info, "\\\0", addr, size)
raise Exception.new if query_ok == 0 #TODO
fixed_info = Array.new(18,0).pack('LSSSSSSSSSSLLLLLLL')
RtlMoveMemory.call(fixed_info, addr.unpack('L')[0], fixed_info.length)
fixed_info.unpack('LSSSSSSSSSSLLLLLLL')[5..8].reverse
rescue
[]
end
end
The answer in https://stackoverflow.com/a/2224681/3730446 isn't strictly correct: the VS_FIXEDFILEINFO struct contains separate FileVersion and ProductVersion. That code returns a version number consisting of the two more-significant components of the ProductVersion and the two less-significant components of the FileVersion. Most times I've seen, that wouldn't matter because both Product- and FileVersion have the same value, but you never know what you might encounter in the wild.
We can see this by comparing the VS_FIXEDFILEINFO struct from http://msdn.microsoft.com/en-us/library/windows/desktop/ms646997(v=vs.85).aspx and the format string we're using to pack and unpack the buffer:
typedef struct tagVS_FIXEDFILEINFO {
DWORD dwSignature; // 0: L
DWORD dwStrucVersion; // 1: S
// 2: S
DWORD dwFileVersionMS; // 3: S
// 4: S
DWORD dwFileVersionLS; // 5: S
// 6: S
DWORD dwProductVersionMS; // 7: S
// 8: S
DWORD dwProductVersionLS; // 9: S
// 10: S
DWORD dwFileFlagsMask; // 11: L
DWORD dwFileFlags; // 12: L
DWORD dwFileOS; // 13: L
DWORD dwFileType; // 14: L
DWORD dwFileSubtype; // 15: L
DWORD dwFileDateMS; // 16: L
DWORD dwFileDateLS; // 17: L
} VS_FIXEDFILEINFO;
Subscripts 5 to 8, then, consists of dwFileVersionLS and dwProductVersionMS. Getting FileVersion and ProductVersion correctly would look like this:
info = fixed_info.unpack('LSSSSSSSSSSLLLLLLL')
file_version = [ info[4], info[3], info[6], info[5] ]
product_version = [ info[8], info[7], info[10], info[9] ]

Resources