Boost preprocessor - strange result - visual-studio-2010

Check the following macro:
#define INPUT (char, "microsecond", "us")(int, "millisecond", "ms")(int, "second", "s")(int, "minute", "min")(float, "hour", "h")
The goal is to add double parentheses around each tuple, resulting in:
((char, "microsecond", "us"))((int, "millisecond", "ms"))((int, "second", "s"))((int, "minute", "min"))((float, "hour", "h"))
Now I use the following macros to do this job:
#define ADD_PAREN_1(A, B, C) ((A, B, C)) ADD_PAREN_2
#define ADD_PAREN_2(A, B, D) ((A, B, C)) ADD_PAREN_1
#define ADD_PAREN_1_END
#define ADD_PAREN_2_END
#define OUTPUT0 ADD_PAREN_1 INPUT
#define OUTPUT1 BOOST_PP_CAT( OUTPUT0, _END )
The result is as follows:
OUTPUT0 is fine:
((char, "microsecond", "us")) ((int, "millisecond", C)) ((int, "second", "s")) ((int, "minute", C)) ((float, "hour", "h")) ADD_PAREN_2
But when BOOST_PP_CAT is called the result of OUTPUT1 is:
float
I do not understand this behaviour. Any hints?
Note: I am using Visual Studio 2010.

The preprocessor works by scanning and expanding. So when it expands your OUTPUT0 macro, it gives:
ADD_PAREN_1 INPUT
^
Then it scans the next token to see if it is a parenthesis, and if it is, it will invoke ADD_PAREN_1 as a function macro. However, it will only see INPUT, so it doesn't invoke ADD_PAREN_1. Next it scans and expands the next token:
ADD_PAREN_1 INPUT
^
Which will result in this:
ADD_PAREN_1 (char, "microsecond", "us")(int, "millisecond", "ms")(int, "second", "s")(int, "minute", "min")(float, "hour", "h")
^
Next when you try to use OUTPUT1, it will expand to this:
BOOST_PP_CAT( OUTPUT0, _END )
BOOST_PP_CAT will expand OUTPUT0 and then concatenate the tokens, so you will ultimately get this:
ADD_PAREN_1 (char, "microsecond", "us")(int, "millisecond", "ms")(int, "second", "s")(int, "minute", "min")(float, "hour", "h") ## _END
As you can see, you are concatenating a parenthesis with _END, which is not allowed and results in a compiler error. In Visual Studio you may see different results, as its preprocessor works in mysterious ways.
Ultimately, to make it work you just need to apply an extra scan in the OUTPUT0 macro, something like this:
#define X(x) x
#define OUTPUT0 X(ADD_PAREN_1 INPUT)
This will work in standard-conforming C preprocessors. I don't know whether it will work in Visual Studio (I don't have access to it right now to check), but I do know this works:
#define ADD_PAREN(x) BOOST_PP_CAT(ADD_PAREN_1 x, _END)
#define ADD_PAREN_1(A, B, C) ((A, B, C)) ADD_PAREN_2
#define ADD_PAREN_2(A, B, C) ((A, B, C)) ADD_PAREN_1
#define ADD_PAREN_1_END
#define ADD_PAREN_2_END
#define OUTPUT1 ADD_PAREN(INPUT)
This is similar to how Boost does it. See how the BOOST_FUSION_ADAPT_ASSOC_STRUCT_FILLER macro is used in Boost.Fusion.
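The alternating-macro technique above can be tried without Boost. The following is a minimal sketch, not the answerer's code: CAT and STR are local stand-ins assumed to behave like BOOST_PP_CAT and BOOST_PP_STRINGIZE, the tuples are shortened to two elements, and the helper functions strip_spaces, wrapped_sequence, and wrapped_equals are hypothetical names introduced only to inspect the result. It assumes a conforming preprocessor such as GCC or Clang.

```c
#include <string.h>

/* Local stand-ins for BOOST_PP_CAT and BOOST_PP_STRINGIZE (assumed names).
   The extra level lets the arguments expand before ## or # is applied. */
#define CAT(a, b) CAT_I(a, b)
#define CAT_I(a, b) a ## b
#define STR(x) STR_I(x)
#define STR_I(x) #x

/* A shortened, two-element version of the tuples from the question. */
#define INPUT (char, "us")(int, "ms")(float, "h")

/* ADD_PAREN_1 and ADD_PAREN_2 invoke each other alternately. The macro
   name left over at the end is pasted with _END and expands to nothing. */
#define ADD_PAREN(seq) CAT(ADD_PAREN_1 seq, _END)
#define ADD_PAREN_1(A, B) ((A, B)) ADD_PAREN_2
#define ADD_PAREN_2(A, B) ((A, B)) ADD_PAREN_1
#define ADD_PAREN_1_END
#define ADD_PAREN_2_END

/* Copies src to dst without spaces, because the exact spacing produced by
   stringizing can differ between preprocessors. dst needs strlen(src) + 1. */
void strip_spaces(char *dst, const char *src)
{
    for (; *src != '\0'; src++)
        if (*src != ' ')
            *dst++ = *src;
    *dst = '\0';
}

/* Returns the text of the wrapped sequence. */
const char *wrapped_sequence(void)
{
    return STR(ADD_PAREN(INPUT));
}

/* Returns 1 if the wrapped sequence, ignoring spaces, equals expected. */
int wrapped_equals(const char *expected)
{
    char buf[256];
    strip_spaces(buf, wrapped_sequence());
    return strcmp(buf, expected) == 0;
}
```

Because the sequence is passed as a macro argument, it is fully expanded before ADD_PAREN_1 is rescanned next to it, which is the extra scan the original OUTPUT0 was missing.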

Related

What is the GCC documentation and example saying about inline asm and not using early clobbers so a pointer shares a register with a mem input?

The GCC documentation (https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers-and-Scratch-Registers-1) contains the following PowerPC example and description:
static void
dgemv_kernel_4x4 (long n, const double *ap, long lda,
const double *x, double *y, double alpha)
{
double *a0;
double *a1;
double *a2;
double *a3;
__asm__
(
/* lots of asm here */
"#n=%1 ap=%8=%12 lda=%13 x=%7=%10 y=%0=%2 alpha=%9 o16=%11\n"
"#a0=%3 a1=%4 a2=%5 a3=%6"
:
"+m" (*(double (*)[n]) y),
"+&r" (n), // 1
"+b" (y), // 2
"=b" (a0), // 3
"=&b" (a1), // 4
"=&b" (a2), // 5
"=&b" (a3) // 6
:
"m" (*(const double (*)[n]) x),
"m" (*(const double (*)[]) ap),
"d" (alpha), // 9
"r" (x), // 10
"b" (16), // 11
"3" (ap), // 12
"4" (lda) // 13
:
"cr0",
"vs32","vs33","vs34","vs35","vs36","vs37",
"vs40","vs41","vs42","vs43","vs44","vs45","vs46","vs47"
);
}
... On the other hand, ap can’t be the same as any of the other inputs, so an early-clobber
on a0 is not needed. It is also not desirable in this case. An
early-clobber on a0 would cause GCC to allocate a separate register
for the "m" (*(const double (*)[]) ap) input. Note that tying an
input to an output is the way to set up an initialized temporary
register modified by an asm statement. An input not tied to an output
is assumed by GCC to be unchanged...
I am totally confused about this description:
In the code there is no relationship between "m" (*(const double (*)[]) ap) and "=b" (a0). "=b" (a0) will share a register with "3" (ap), which holds the address of the input parameter, and "m" (*(const double (*)[]) ap) is the content of the first element of ap, so why would an early-clobber on a0 affect "m" (*(const double (*)[]) ap)?
Even if GCC allocates a new register for "m" (*(const double (*)[]) ap), I still don't understand what the problem is. Since "=b" (a0) is tied to "3" (ap), can't we still read and write through the register allocated for "=b" (a0)?
This is an efficiency consideration, not correctness, stopping GCC from wasting instructions (and creating register pressure).
"m" (*(const double (*)[]) ap) isn't the first element, it's an arbitrary-length array, letting the compiler know that the entire array object is an input. But it's a dummy input; the asm template won't actually use that operand, instead looping over the array via the pointer input "3" (ap).
See How can I indicate that the memory *pointed* to by an inline ASM argument may be used? for more about this technique.
But "m" inputs are real inputs that have to expand to a valid addressing mode if the template does use them, including after early-clobber outputs have been written.
With "=&b" (a0) / "3" (ap), GCC couldn't pick the same register as the base for an addressing mode for "m" (*(const double (*)[]) ap).
So it would have to waste an instruction ahead of the asm statement copying the address to another register, and it would waste that integer register as well.
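The shape of the dummy memory operand described above can be sketched in a target-independent way. This is only an illustration under stated assumptions: the asm template is deliberately empty, so no instructions are emitted and nothing PowerPC-specific is shown; the function name sum_after_asm is hypothetical; and n is assumed to be greater than zero, since it is used as an array bound. It relies on the GCC extended asm syntax, which Clang also accepts.

```c
#include <stddef.h>

/* Sums n longs (n > 0). The asm statement has an empty template, so it
   emits no instructions; it only shows how the operands are declared. */
long sum_after_asm(const long *arr, size_t n)
{
    const long *p = arr;  /* pointer a real asm loop would advance */

    __asm__ volatile (
        ""  /* placeholder for a loop that walks the array through %0 */
        : "+r" (p)                        /* pointer: read and written */
        : "m" (*(const long (*)[n]) arr)  /* dummy input: all n elements are read */
    );

    /* The empty template leaves p unchanged, so it still points at arr. */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

The "m" operand is never referenced in the template. Its only job is to tell the compiler that the whole array is an input, so stores to the array are not reordered past or dropped before the asm statement.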

Is there any difference in expressiveness between an extern praxi and an extern castfn?

Consider:
#include "share/atspre_staload.hats"
fun only_zero(n: int(0)): void =
println!("This is definitely zero: ", n)
fun less_than{n,m:int | n < m}(n: int(n), m: int(m)): void =
println!(n, " is less than ", m)
implement main0() = (
only_zero(zeroify(n));
only_zero(m);
less_than(b, a);
less_than(f, e) where { val (f, e) = make_less_than((d, c)) };
) where {
val n = 5
val m = ~5
val (a, b, c, d) = (1, 2, 3, 4)
extern castfn zeroify{n:int}(n: int(n)): int(0)
extern praxi lemma_this_is_zero{n:int}(n: int(n)): [n == 0] void
extern castfn make_less_than{n,m:int}(t: (int(n), int(m))): [o,p:int | o < p] (int(o), int(p))
extern praxi lemma_less_than{n,m:int}(n: int(n), m: int(m)): [n < m] void
prval _ = lemma_this_is_zero(m)
}
which has this output:
This is definitely zero: 5
This is definitely zero: -5
2 is less than 1
4 is less than 3
Are there cases that demand one of these over the other?
If you use 'castfn', you need to make sure that there is a corresponding implicit cast function in the target language. For instance, int2double is a castfn if C is the target language.
On the other hand, praxi/prfun is completely erased, having no trace in the generated code.
I would say that praxi/prfun is more general, but int2double is definitely not a praxi/prfun.

inline assembly instruction with two return registers

I have a custom instruction for a processor. It has two return registers and two operands, like this:
MINMAX rdMin, rdMax, rs1, rs2
It returns the minimum and maximum of rs1 and rs2. I have verified this instruction using an assembly program, and it works fine. Now I want to use this instruction from GCC using inline assembly. I tried the following code, but it did not give the correct values of rdMin and rdMax. Is there a mistake in the syntax?
int main() {
unsigned int array[10] = { 45, 75,0,0,0,0,0,0,0};
int op1=16,op2=18,out,out1,out2;
//asm for AVG rd, rs1, rs2
__asm__ volatile (
"avg %[my_out], %[my_op1], %[my_op2]\n"
: [my_out] "=&r" (out)
: [my_op1] "r" (op1),[my_op2] "r" (op2)
);
//asm for MinMax rdMin, rdMax, rs1, rs2
__asm__ volatile (
"minmax %[my_out1], %[my_out2], %[my_op1], %[my_op2]\n"
: [my_out1] "=r" (out1), [my_out2] "=r" (out2)
: [my_op1] "r" (op1), [my_op2] "r" (op2)
);
array[3] = out;
array[4] = out1;
array[5] = out2;
return 0;
}
Thanks.

Change the parsing language

I'm using a modal-SAT solver. Unfortunately, this solver uses Flex and Bison, two tools that I haven't mastered.
I want to change its input syntax to another one, but I'm having trouble doing so, even after going through tutorials on Flex and Bison.
So here is the problem:
I want to be able to parse modal logic formulas.
In the previous notation, such formulas were written like this:
(NOT (IMP (AND (ALL R0 (IMP C0 C1)) (ALL R0 C0)) (ALL R0 C1)))
And here are the Flex/Bison files used to parse them:
alc.y
%{
#include "fnode.h"
#define YYMAXDEPTH 1000000
fnode_t *formula_as_tree;
%}
%union {
int l;
int i;
fnode_t *f;
}
/* Tokens and types */
%token LP RP
%token ALL SOME
%token AND IMP OR IFF NOT
%token TOP BOT
%token RULE CONC
%token <l> NUM
%type <f> formula
%type <f> boolean_expression rule_expression atomic_expression
%type <f> other
%type <i> uboolop bboolop nboolop ruleop
%type <l> rule
%% /* Grammar rules */
input: formula {formula_as_tree = $1;}
;
formula: boolean_expression {$$ = $1;}
| rule_expression {$$ = $1;}
| atomic_expression {$$ = $1;}
;
boolean_expression: LP uboolop formula RP
{$$ = Make_formula_nary($2,empty_code,$3);}
| LP bboolop formula formula RP
{$$ = Make_formula_nary($2,empty_code, Make_operand_nary($3,$4));}
| LP nboolop formula other RP
{$$ = Make_formula_nary($2,empty_code,Make_operand_nary($3,$4));}
;
rule_expression: LP ruleop rule formula RP {$$ = Make_formula_nary($2,$3,$4);}
;
atomic_expression: CONC NUM {$$ = Make_formula_nary(atom_code,$2,Make_empty());}
| TOP {$$ = Make_formula_nary(top_code,empty_code,Make_empty());}
| BOT {$$ = Make_formula_nary(bot_code,empty_code,Make_empty());}
;
other: formula other {$$ = Make_operand_nary($1,$2);}
| {$$ = Make_empty();}
;
uboolop: NOT {$$ = not_code;}
;
bboolop: IFF {$$ = iff_code;}
| IMP {$$ = imp_code;}
;
nboolop: AND {$$ = and_code;}
| OR {$$ = or_code;}
;
ruleop: SOME {$$ = dia_code;}
| ALL {$$ = box_code;}
rule: RULE NUM {$$ = $2;}
;
%% /* End of grammar rules */
int yyerror(char *s)
{
printf("%s\n", s);
exit(0);
}
alc.lex
%{
#include <stdio.h>
#include "fnode.h"
#include "y.tab.h"
int number;
%}
%%
[ \n\t] ;
"(" return LP;
")" return RP;
"ALL" return ALL;
"SOME" return SOME;
"AND" return AND;
"IMP" return IMP;
"OR" return OR;
"IFF" return IFF;
"NOT" return NOT;
"TOP" return TOP;
"BOTTOM" return BOT;
"R" return RULE;
"C" return CONC;
0|[1-9][0-9]* {
sscanf(yytext,"%d",&number);
yylval.l=number;
return NUM;
}
. {
/* Error function */
fprintf(stderr,"Illegal character\n");
return -1;
}
%%
Now, let's write our example in the new syntax that I want to use:
begin
(([r0](~p0 | p1) & [r0]p0) | [r0]p1)
end
The major problems blocking me from parsing this new syntax correctly are:
IMP (A B) is now written ~A | B (as in Boolean logic, (A => B) <=> (~A v B)).
ALL R0 is now written [r0].
SOME R0 is now written <r0>.
IFF (A B) is now written (~B | A) & (~A | B). (IFF stands for "if and only if".)
Here is the short list of the new symbols, even though I don't know how to parse them:
"(" return LP;
")" return RP;
"[]" return ALL;
"<>" return SOME;
"&" return AND;
"IMP" return IMP;
"|" return OR;
"IFF" return IFF;
"~" return NOT;
"true" return TOP;
"false" return BOT;
"r" return RULE;
"p" return CONC;
I assume that only these two files will need to change, because the solver should still be able to read the previous syntax by compiling the source code with the other .y and .lex files.
But I'm asking for your help to know exactly how to write it down.
Thanks in advance!
Tommi Junttila's BC Package implements a language for Boolean expressions and circuits using Bison and Flex.
Studying its source files won't fully replace going through a proper Bison/Flex tutorial, but it should certainly give you a good start.
For anyone who has the exact same problem (I assume that this problem is quite rare :) ):
With the right vocabulary, it's much easier to google the problem and find a solution.
The first notation
(NOT (IMP (AND (ALL R0 (IMP C0 C1)) (ALL R0 C0)) (ALL R0 C1)))
is the ALC format.
The other notation
begin
(([r0](~p0 | p1) & [r0]p0) | [r0]p1)
end
is the InToHyLo format.
And there is a tool called the formula translation tool ("ftt"), developed and bundled with Spartacus (http://www.ps.uni-saarland.de/spartacus/). It can translate between all the formats of the provers.
Using this tool is a small hack that avoids dealing with Flex and Bison.
One just needs to translate a problem from one format to the other; the problems will be equivalent, and the translation is very fast.
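For readers who would rather adapt the scanner than translate files, here is a hand-written sketch of how the lexemes of the new notation could be split into tokens. It is not the solver's code: the token names and the function next_token are hypothetical, and in the real project the token codes would come from the Bison-generated header. Note that a Flex rule such as "[]" would not match [r0], because the relation name sits between the brackets, so this sketch returns the opening and closing brackets as separate tokens and leaves it to the grammar to combine them.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token codes; real ones would come from Bison's header. */
enum tok { T_EOF, T_LP, T_RP, T_LBOX, T_RBOX, T_LDIA, T_RDIA,
           T_AND, T_OR, T_NOT, T_TOP, T_BOT, T_RULE, T_CONC,
           T_BEGIN, T_END, T_ERR };

/* Scans one token starting at *src and advances *src past it.
   For r<N> and p<N> tokens, the index N is stored in *num. */
enum tok next_token(const char **src, long *num)
{
    static const struct { const char *text; enum tok code; } words[] = {
        {"begin", T_BEGIN}, {"end", T_END}, {"true", T_TOP}, {"false", T_BOT}
    };
    const char *s = *src;

    while (isspace((unsigned char)*s))
        s++;
    if (*s == '\0') {
        *src = s;
        return T_EOF;
    }

    /* Keywords of the new notation. */
    for (size_t i = 0; i < sizeof words / sizeof words[0]; i++) {
        size_t len = strlen(words[i].text);
        if (strncmp(s, words[i].text, len) == 0) {
            *src = s + len;
            return words[i].code;
        }
    }

    /* r<N> names a relation, p<N> names a proposition. */
    if ((*s == 'r' || *s == 'p') && isdigit((unsigned char)s[1])) {
        char *end;
        *num = strtol(s + 1, &end, 10);
        *src = end;
        return *s == 'r' ? T_RULE : T_CONC;
    }

    /* Single-character operators and brackets. */
    *src = s + 1;
    switch (*s) {
    case '(': return T_LP;
    case ')': return T_RP;
    case '[': return T_LBOX;
    case ']': return T_RBOX;
    case '<': return T_LDIA;
    case '>': return T_RDIA;
    case '&': return T_AND;
    case '|': return T_OR;
    case '~': return T_NOT;
    default:  return T_ERR;
    }
}
```

With tokens like these, a grammar rule for the box operator would read an opening bracket, a relation, a closing bracket, and then a formula, instead of the old parenthesized ALL form.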

performance of static member constraint functions

I'm trying to learn static member constraints in F#. From reading Tomas Petricek's blog post, I understand that writing an inline function that "uses only operations that are themselves written using static member constraints" will make my function work correctly for all numeric types that satisfy those constraints. A related question indicates that inline works somewhat similarly to C++ templates, so I wasn't expecting any performance difference between these two functions:
let MultiplyTyped (A : double[,]) (B : double[,]) =
let rA, cA = (Array2D.length1 A) - 1, (Array2D.length2 A) - 1
let cB = (Array2D.length2 B) - 1
let C = Array2D.zeroCreate<double> (Array2D.length1 A) (Array2D.length2 B)
for i = 0 to rA do
for k = 0 to cA do
for j = 0 to cB do
C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]
C
let inline MultiplyGeneric (A : 'T[,]) (B : 'T[,]) =
let rA, cA = Array2D.length1 A - 1, Array2D.length2 A - 1
let cB = Array2D.length2 B - 1
let C = Array2D.zeroCreate<'T> (Array2D.length1 A) (Array2D.length2 B)
for i = 0 to rA do
for k = 0 to cA do
for j = 0 to cB do
C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]
C
Nevertheless, to multiply two 1024 x 1024 matrices, MultiplyTyped completes in an average of 2550 ms on my machine, whereas MultiplyGeneric takes about 5150 ms. I originally thought that zeroCreate was at fault in the generic version, but changing that line to the one below didn't make a difference.
let C = Array2D.init<'T> (Array2D.length1 A) (Array2D.length2 B) (fun i j -> LanguagePrimitives.GenericZero)
Is there something I'm missing here to make MultiplyGeneric perform the same as MultiplyTyped? Or is this expected?
edit: I should mention that this is VS2010, F# 2.0, Win7 64bit, release build. Platform target is x64 (to test larger matrices) - this makes a difference: x86 produces similar results for the two functions.
Bonus question: the type inferred for MultiplyGeneric is the following:
val inline MultiplyGeneric :
^T [,] -> ^T [,] -> ^T [,]
when ( ^T or ^a) : (static member ( + ) : ^T * ^a -> ^T) and
^T : (static member ( * ) : ^T * ^T -> ^a)
Where does the ^a type come from?
edit 2: here's my testing code:
let r = new System.Random()
let A = Array2D.init 1024 1024 (fun i j -> r.NextDouble())
let B = Array2D.init 1024 1024 (fun i j -> r.NextDouble())
let test f =
let sw = System.Diagnostics.Stopwatch.StartNew()
f() |> ignore
sw.Stop()
printfn "%A" sw.ElapsedMilliseconds
for i = 1 to 5 do
test (fun () -> MultiplyTyped A B)
for i = 1 to 5 do
test (fun () -> MultiplyGeneric A B)
Good question. I'll answer the easy part first: the ^a is just part of the natural generalization process. Imagine you had a type like this:
type T = | T with
static member (+)(T, i:int) = T
static member (*)(T, T) = 0
Then you can still use your MultiplyGeneric function with arrays of this type: multiplying elements of A and B will give you ints, but that's okay because you can still add them to elements of C and get back values of type T to store back into C.
As to your performance question, I'm afraid I don't have a great explanation. Your basic understanding is right - using MultiplyGeneric with double[,] arguments should be equivalent to using MultiplyTyped. If you use ildasm to look at the IL the compiler generates for the following F# code:
let arr = Array2D.zeroCreate 1024 1024
let f1 = MultiplyTyped arr
let f2 = MultiplyGeneric arr
let timer = System.Diagnostics.Stopwatch()
timer.Start()
f1 arr |> ignore
printfn "%A" timer.Elapsed
timer.Restart()
f2 arr |> ignore
printfn "%A" timer.Elapsed
then you can see that the compiler really does generate identical code for each of them, putting the inlined code for MultiplyGeneric into an internal static function. The only difference that I see in the generated code is in the names of locals, and when running from the command line I get roughly equal elapsed times. However, running from FSI I see a difference similar to what you've reported.
It's not clear to me why this would be. As I see it there are two possibilities:
FSI's code generation may be doing something slightly different than the static compiler
The CLR's JIT compiler may treat code generated at runtime slightly differently from compiled code. For instance, as I mentioned, my code above using MultiplyGeneric actually results in an internal method that contains the inlined body. Perhaps the CLR's JIT handles the difference between public and internal methods differently when they are generated at runtime than when they are in statically compiled code.
I'd like to see your benchmarks. I don't get the same results (VS 2012 F# 3.0 Win 7 64-bit).
let m = Array2D.init 1024 1024 (fun i j -> float i * float j)
let test f =
let sw = System.Diagnostics.Stopwatch.StartNew()
f() |> ignore
sw.Stop()
printfn "%A" sw.Elapsed
test (fun () -> MultiplyTyped m m)
> 00:00:09.6013188
test (fun () -> MultiplyGeneric m m)
> 00:00:09.1686885
Decompiling with Reflector, the functions look identical.
Regarding your last question, the least restrictive constraint is inferred. In this line
C.[i,j] <- C.[i,j] + A.[i,k] * B.[k,j]
because the result type of A.[i,k] * B.[k,j] is unspecified, and is passed immediately to (+), an extra type could be involved. If you want to tighten the constraint you can replace that line with
let temp : 'T = A.[i,k] * B.[k,j]
C.[i,j] <- C.[i,j] + temp
That will change the signature to
val inline MultiplyGeneric :
A: ^T [,] -> B: ^T [,] -> ^T [,]
when ^T : (static member ( * ) : ^T * ^T -> ^T) and
^T : (static member ( + ) : ^T * ^T -> ^T)
EDIT
Using your test, here's the output:
//MultiplyTyped
00:00:09.9904615
00:00:09.5489653
00:00:10.0562346
00:00:09.7023183
00:00:09.5123992
//MultiplyGeneric
00:00:09.1320273
00:00:08.8195283
00:00:08.8523408
00:00:09.2496603
00:00:09.2950196
Here's the same test on ideone (with a few minor changes to stay within the time limit: 512x512 matrix and one test iteration). It runs F# 2.0 and produced similar results.
