Is this an storage-reuse (aliasing) bug in gcc 6.2? - gcc

From what I can tell, the following code should have 100% defined behavior under any reasonable reading of the Standard for platforms which define int64_t, and where long long has the same size and representation, regardless of whether or not long long is recognized as alias-compatible.
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
typedef long long T1;
typedef int64_t T2;
#define T1FMT "%lld"
#define T1VALUE 1
#define T2VALUE 2
T1 blah3(void *p1, void *p2)
{
T1 *t1p, *t1p2;
T2 *t2p;
T1 temp;
t1p = p1;
t2p = p2;
*t1p = T1VALUE; // Write as T1
*t2p = T2VALUE; // Write as T2
temp = *t2p; // Read as T2
t1p2 = (T1*)t2p; // Visible T2 to T1 pointer conversion
*t1p2 = temp; // Write as T1
return *t1p; // Read as T1
}
T1 test3(void)
{
void *p = malloc(sizeof (T1) + sizeof (T2));
T1 result = blah3(p,p);
free(p);
return result;
}
int main(void)
{
T1 result = test3();
printf("The result is " T1FMT "\n", result);
return 0;
}
See code at https://godbolt.org/g/75oLGx (GCC 6.2 x86-64 using -std=c99 -x c -O2)
Correct code for test3 should allocate some storage, then:
Writes a long long with value 1.
Sets the Effective Type of the storage to int64_t by writing an int64_t with value 2.
Reads the storage as int64_t (its Effective Type), which should yield 2
Sets the effective type of the storage to long long by storing the a long long with the aforementioned value (which should be 2).
Read the storage as type long long, which should yield 2.
The gcc x86-64 6.2 at the godbolt site, however, does not yield 2; instead it yields 1. I didn't find any other combination of types for which gcc behaves
like this. I think what's happening is that gcc is deciding that the store to *t1p2 can be omitted because it has no effect, but it's failing to recognize that the store did have the effect of changing the Effective Type of the storage from int64_t to long long.
While I consider questionable the decision not to recognize int64_t and long long as being alias-compatible, I see nothing in the Standard that would justify gcc's failure to recognize the reuse of the storage to hold the value 2 after it had previously held the value 1. Nothing is ever read as any type other than the one with which it was written, but I think gcc is deciding that the two pointers passed to "blah" can't alias.
Am I missing anything or is that an outright bug?

The code does not violate the strict aliasing rule, as you explain. In fact, T1 and T2 could be any types (such that the assignments are not type mismatches), there's no need for them to have the same size or anything.
I would agree that outputting 1 is a compiler bug. On the godbolt site, all versions of gcc seem to have the bug, whereas clang gives the correct output.
However, with a local installation of gcc 4.9.2 (x86_64-win32-seh-rev1), I get the correct output of 2. So it seems that the problem only exists in some builds of gcc.

Related

How to optimize SYCL kernel

I'm studying SYCL at university and I have a question about performance of a code.
In particular I have this C/C++ code:
And I need to translate it in a SYCL kernel with parallelization and I do this:
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>
using namespace sycl;
constexpr int size = 131072; // 2^17
int main(int argc, char** argv) {
//Create a vector with size elements and initialize them to 1
std::vector<float> dA(size);
try {
queue gpuQueue{ gpu_selector{} };
buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
gpuQueue.submit([&](handler& cgh) {
accessor inA{ bufA,cgh };
cgh.parallel_for(range<1>(size),
[=](id<1> i) { inA[i] = inA[i] + 2; }
);
});
gpuQueue.wait_and_throw();
}
catch (std::exception& e) { throw e; }
So my question is about c value, in this case I use directly the value two but this will impact on the performance when I'll run the code? I need to create a variable or in this way is correct and the performance are good?
Thanks in advance for the help!
Interesting question. In this case the value 2 will be a literal in the instruction in your SYCL kernel - this is as efficient as it gets, I think! There's the slight complication that you have an implicit cast from int to float. My guess is that you'll probably end up with a float literal 2.0 in your device assembly. Your SYCL device won't have to fetch that 2 from memory or cast at runtime or anything like that, it just lives in the instruction.
Equally, if you had:
constexpr int c = 2;
// the rest of your code
[=](id<1> i) { inA[i] = inA[i] + c; }
// etc
The compiler is almost certainly smart enough to propagate the constant value of c into the kernel code. So, again, the 2.0 literal ends up in the instruction.
I compiled your example with DPC++ and extracted the LLVM IR, and found the following lines:
%5 = load float, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17
%add.i = fadd float %5, 2.000000e+00
store float %add.i, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17
This shows a float load & store to/from the same address, with an 'add 2.0' instruction in between. If I modify to use the variable c like I demonstrated, I get the same LLVM IR.
Conclusion: you've already achieved maximum efficiency, and compilers are smart!

What is causing this error: SSE register return with SSE disabled?

I'm new to kernel development, and I need to write a Linux kernel module that performs several matrix multiplications (I'm working on an x64_64 platform). I'm trying to use fixed-point values for these operations, however during compilation, the compiler encounters this error:
error: SSE register return with SSE disabled
I don't know that much about SSE or this issue in particular, but from what i've found and according to most answers to questions about this problem, it is related to the usage of Floating-Point (FP) arithmetic in kernel space, which seems to be rarely a good idea (hence the utilization of Fixed-Point arithmetics). This error seems weird to me because I'm pretty sure I'm not using any FP values or operations, however it keeps popping up and in some ways that seem weird to me. For instance, I have this block of code:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
#define FIXED_TO_DOUBLE(x) ((x) / (1 << scale))
#define MULT(x, y) ((((x) >> 8) * ((y) >> 8)) >> 0)
#define DIV(x, y) (((x) << 8) / (y) << 8)
#define OUTPUT_ROWS 6
#define OUTPUT_COLUMNS 2
struct matrix {
int rows;
int cols;
double *data;
};
double outputlayer_weights[OUTPUT_ROWS * OUTPUT_COLUMNS] =
{
0.7977986, -0.77172316,
-0.43078753, 0.67738613,
-1.04312621, 1.0552227 ,
-0.32619684, 0.14119884,
-0.72325027, 0.64673559,
0.58467862, -0.06229197
};
...
void matmul (struct matrix *A, struct matrix *B, struct matrix *C) {
int i, j, k, a, b, sum, fixed_prod;
if (A->cols != B->rows) {
return;
}
for (i = 0; i < A->rows; i++) {
for (j = 0; j < B->cols; j++) {
sum = 0;
for (k = 0; k < A->cols; k++) {
a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
b = DOUBLE_TO_FIXED(B->data[k * B->rows + j]);
fixed_prod = MULT(a, b);
sum += fixed_prod;
}
/* Commented the following line, causes error */
//C->data[i * C->rows + j] = sum;
}
}
}
...
static int __init insert_matmul_init (void)
{
printk(KERN_INFO "INSERTING MATMUL");
return 0;
}
static void __exit insert_matmul_exit (void)
{
printk(KERN_INFO "REMOVING MATMUL");
}
module_init (insert_matmul_init);
module_exit (insert_matmul_exit);
which compiles with no errors (I left out code that I found irrelevant to the problem). I have made sure to comment any error-prone lines to get to a point where the program can be compiled with no errors, and I am trying to solve each of them one by one. However, when uncommenting this line:
C->data[i * C->rows + j] = sum;
I get this error message in a previous (unmodified) line of code:
error: SSE register return with SSE disabled
sum += fixed_prod;
~~~~^~~~~~~~~~~~~
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error. Maybe my fixed-point implementation is flawed (I'm no expert in that matter either), or maybe I'm missing something obvious. Just in case, I have tested the same logic in a user-space program (using Floating-Point values) and it seems to work fine. In either case, any help in solving this issue would be appreciated. Thanks in advance!
Edit: I have included the definition of matrix and an example matrix. I have been using the default kbuild command for building external modules, here is what my Makefile looks like:
obj-m = matrix_mult.o
KVERSION = $(shell uname -r)
all:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Linux compiles kernel code with -mgeneral-regs-only on x86, which produces this error in functions that do anything with FP or SIMD. (Except via inline asm, because then the compiler doesn't see the FP instructions, only the assembler does.)
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error.
GCC optimizes whole functions when optimization is enabled, and you are using FP inside that function. You're doing FP multiply and truncating conversion to integer with your macro and assigning the result to an int, since the MCVE you eventually provided shows struct matrix containing double *data.
If you stop the compiler from using FP instructions (like Linux does by building with -mgeneral-regs-only), it refuses to compile your file instead of doing software floating-point.
The only odd thing is that it pins down the error to an integer += instead of one of the statements that compiles to a mulsd and cvttsd2si
If you disable optimization (-O0 -mgeneral-regs-only) you get a more obvious location for the same error (https://godbolt.org/z/Tv5nG6nd4):
<source>: In function 'void matmul(matrix*, matrix*, matrix*)':
<source>:9:33: error: SSE register return with SSE disabled
9 | #define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
| ~~~~~^~~~~~~~~~~~~~~
<source>:46:21: note: in expansion of macro 'DOUBLE_TO_FIXED'
46 | a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
| ^~~~~~~~~~~~~~~
If you really want to know what's going on with the GCC internals, you could dig into it with -fdump-tree-... options, e.g. on the Godbolt compiler explorer there's a dropdown for GCC Tree / RTL output that would let you look at the GIMPLE or RTL internal representation of your function's logic after various analyzer passes.
But if you just want to know whether there's a way to make this function work, no obviously not, unless you compile a file without -mgeneral-registers-only. All functions in a file compiled that way must only be called by callers that have used kernel_fpu_begin() before the call. (and kernel_fpu_end after).
You can't safely use kernel_fpu_begin inside a function compiled to allow it to use SSE / x87 registers; it might already have corrupted user-space FPU state before calling the function, after optimization. The symptom of getting this wrong is not a fault, it's corrupting user-space state, so don't assume that happens to work = correct. Also, depending on how GCC optimizes, the code-gen might be fine with your version, but might be broken with earlier or later GCC or clang versions. I somewhat expect that kernel_fpu_begin() at the top of this function would get called before the compiler did anything with FP instructions, but that doesn't mean it would be safe and correct.
See also Generate and optimize FP / SIMD code in the Linux Kernel on files which contains kernel_fpu_begin()?
Apparently -msse2 overrides -mgeneral-regs-only, so that's probably just an alias for -mno-mmx -mno-sse and whatever options disables x87. So you might be able to use __attribute__((target("sse2"))) on a function without changing build options for it, but that would be x86-specific. Of course, so is -mgeneral-regs-only. And there isn't a -mno-general-regs-only option to override the kernel's normal CFLAGS.
I don't have a specific suggestion for the best way to set up a build option if you really do think it's worth using kernel_fpu_begin at all, here (rather than using fixed-point the whole way through).
Obviously if you do save/restore the FPU state, you might as well use it for the loop instead of using FP to convert to fixed-point and back.

link functions with mismatching signature

I'm playing around with gcc and g++ compiler and trying to compile some C code within those, my purpose is to see how the compiler / linker enforces that when linking a model with some function declaration to a model with that implementation of that function, the correct function are linked ( in terms of parameters passed and values returned )
for example let's take a look at this code
#include <stdio.h>
extern int foo(int b, int c);
int main()
{
int f = foo(5, 8);
printf("%d",f);
}
after compilation within my symbol table I'd have a symbol for foo, but within the elf file format there is not place that describes the arguments taken and the function signature, ( int(int,int) ), so basically if I write some other code such as this:
char foo(int a, int b, int c)
{
return (char) ( a + b + c );
}
compile that model it'll also have some symbol called foo, what if I link these models together, what's gonna happen? I have never thought of this, and how would a compiler overcome this weakness... I know that within g++ the compiler generates some prefix for every symbol regarding to it's namespace, but does it also take in mind the signature? If anyone has ever encountered this it would be great if he could shed some light upon this problem
The problem is solved with name mangling.
In compiler construction, name mangling (also called name decoration)
is a technique used to solve various problems caused by the need to
resolve unique names for programming entities in many modern
programming languages.
It provides a way of encoding additional information in the name of a
function, structure, class or another datatype in order to pass more
semantic information from the compilers to linkers.
The need arises where the language allows different entities to be
named with the same identifier as long as they occupy a different
namespace (where a namespace is typically defined by a module, class,
or explicit namespace directive) or have different signatures (such as
function overloading).
Note the simple example:
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart
from the name. If they were natively translated into C with no
changes, the result would be an error — C does not permit two
functions with the same name. The C++ compiler therefore will encode
the type information in the symbol name, the result being something
resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name
mangling applies to all symbols.
Wow, I've kept exploring and testing it on my own and I came up with a solution which quietly amazed my mind,
so I wrote the following code and compiled it on a gcc compiler
main.c
#include <stdio.h>
extern int foo(int a, char b);
int main()
{
int g = foo(5, 6);
printf("%d", g);
return 0;
}
foo.c
typedef struct{
int a;
int b;
char c;
char d;
} mystruct;
mystruct foo(int a, int b)
{
mystruct myl;
my.a = a;
my.b = a + 1;
my.c = (char) b;
my.d = (char b + 1;
return my1;
}
now I compiled foo.c to foo.o with gcc firstly and checked the symbol table using
readelf and I had some entry called foo
also after that I compiled main.c to main.o checked the symbol table and it also had some entry called foo, I linked those two together and surprisingly it worked, I ran main.o and obviously encountered some segmentation fault, which makes sense as the actual implementation of foo as implemented in foo.o probably expects three parameters (first one should be struct adders), a parameter which isn't passed in main.o under it's definition to foo then the actual implementation accesses some memory that doesn't belong to it from the stack frame of main, then tries accessing addresses that it thought it got, and ends up with segmentation fault, that's fine,
now I compiled both models again with g++ and not gcc and what came up was amazing.. I found out that the symbol entry under foo.o was _Z3fooii and under main.o it was _Z3fooic, now my guess is that the ii suffix means int int and ic suffix means int char which probably refers to the parameters that should be passed to function hence allowing the compiler to know some function deceleration gets the actual implementation. so I changed my foo declaration in main.c to
extern int foo(int a, int b);
re-compiled and this time got the symbol _Z3fooii, I linked both models again and amazingly this time it worked, I tried running it and again encountered segmentation fault, which again also makes sense as the compiler wont always even authorize correct return values.. anyways what was my original thought - that g++ includes function signature within symbol name and thus enforces the linker to give function implementation get correct parameters to correct function declaration

How to use arrays in program (global) scope in OpenCL

AMD OpenCL Programming Guide, Section 6.3 Constant Memory Optimization:
Globally scoped constant arrays. These arrays are initialized,
globally scoped, and in the constant address space (as specified in
section 6.5.3 of the OpenCL specification). If the size of an array is
below 64 kB, it is placed in hardware constant buffers; otherwise, it
uses global memory. An example of this is a lookup table for math
functions.
I want to use this "globally scoped constant array". I have such code in pure C
#define SIZE 101
int *reciprocal_table;
int reciprocal(int number){
return reciprocal_table[number];
}
void kernel(int *output)
{
for(int i=0; i < SIZE; i+)
output[i] = reciprocal(i);
}
I want to port it into OpenCL
__kernel void kernel(__global int *output){
int gid = get_global_id(0);
output[gid] = reciprocal(gid);
}
int reciprocal(int number){
return reciprocal_table[number];
}
What should I do with global variable reciprocal_table? If I try to add __global or __constant to it I get an error:
global variable must be declared in addrSpace constant
I don't want to pass __constant int *reciprocal_table from kernel to reciprocal. Is it possible to initialize global variable somehow? I know that I can write it down into code, but does other way exist?
P.S. I'm using AMD OpenCL
UPD Above code is just an example. I have real much more complex code with a lot of functions. So I want to make array in program scope to use it in all functions.
UPD2 Changed example code and added citation from Programming Guide
#define SIZE 2
int constant array[SIZE] = {0, 1};
kernel void
foo (global int* input,
global int* output)
{
const uint id = get_global_id (0);
output[id] = input[id] + array[id];
}
I can get the above to compile with Intel as well as AMD. It also works without the initialization of the array but then you would not know what's in the array and since it's in the constant address space, you could not assign any values.
Program global variables have to be in the __constant address space, as stated by section 6.5.3 in the standard.
UPDATE Now, that I fully understood the question:
One thing that worked for me is to define the array in the constant space and then overwrite it by passing a kernel parameter constant int* array which overwrites the array.
That produced correct results only on the GPU Device. The AMD CPU Device and the Intel CPU Device did not overwrite the arrays address. It also is probably not compliant to the standard.
Here's how it looks:
#define SIZE 2
int constant foo[SIZE] = {100, 100};
int
baz (int i)
{
return foo[i];
}
kernel void
bar (global int* input,
global int* output,
constant int* foo)
{
const uint id = get_global_id (0);
output[id] = input[id] + baz (id);
}
For input = {2, 3} and foo = {0, 1} this produces {2, 4} on my HD 7850 Device (Ubuntu 12.10, Catalyst 9.0.2). But on the CPU I get {102, 103} with either OCL Implementation (AMD, Intel). So I can not stress, how much I personally would NOT do this, because it's only a matter of time, before this breaks.
Another way to achieve this is would be to compute .h files with the host during runtime with the definition of the array (or predefine them) and pass them to the kernel upon compilation via a compiler option. This, of course, requires recompilation of the clProgram/clKernel for every different LUT.
I struggled to get this work in my own program some time ago.
I did not find any way to initialize a constant or global scope array from the host via some clEnqueueWriteBuffer or so. The only way is to write it explicitely in your .cl source file.
So here my trick to initialize it from the host is to use the fact that you are actually compiling your source from the host, which also means you can alter your src.cl file before compiling it.
First my src.cl file reads:
__constant double lookup[SIZE] = { LOOKUP }; // precomputed table (in constant memory).
double func(int idx) {
return(lookup[idx])
}
__kernel void ker1(__global double *in, __global double *out)
{
... do something ...
double t = func(i)
...
}
notice the lookup table is initialized with LOOKUP.
Then, in the host program, before compiling your OpenCL code:
compute the values of my lookup table in host_values[]
on your host, run something like:
char *buf = (char*) malloc( 10000 );
int count = sprintf(buf, "#define LOOKUP "); // actual source generation !
for (int i=0;i<SIZE;i++) count += sprintf(buf+count, "%g, ",host_values[i]);
count += sprintf(buf+count,"\n");
then read the content of your source file src.cl and place it right at buf+count.
you now have a source file with an explicitely defined lookup table that you just computed from the host.
compile your buffer with something like clCreateProgramWithSource(context, 1, (const char **) &buf, &src_sz, err);
voilà !
It looks like "array" is a look-up table of sorts. You'll need to clCreateBuffer and clEnqueueWriteBuffer so the GPU has a copy of it to use.

Construct a 'long long'

How do you construct a long long in gcc, similar to constructing an int via int()
The following fails in gcc (4.6.3 20120306) (but passes on MSVC for example).
myFunctionCall(someValue, long long());
with error expected primary-expression before 'long' (the column position indicates the first long is the location).
A simple change
myFunctionCall(someValue, (long long)int());
works fine - that is construct an int and cast to long long - indicating that gcc doesn't like the long long ctor.
Summary Solution
To summarize the brilliant explanation below from #birryree:
many compilers don't support long long() and it may not be standards compliant
constructing long long is equivalent to the literal 0LL, so use myFunctionCall(someValue, 0LL)
alternatively use a typedef long_long_t long long then long_long_t()
lastly, consider using uint64_t if you are after a type that is exactly 64 bits on any platform, rather than a type that is at least 64 bits, but may vary on different platforms.
I wanted a definitive answer on what the expected behavior was, so I posted a question on comp.lang.c++.moderated and got some great answers in return. So a thank you goes out to Johannes Schaub, Alf P. Steinbach (both from SO), and Francis Glassborrow for some information
This is not a bug in GCC - in fact it will break across multiple compilers - GCC 4.6, GCC 4.7, and Clang complain about similar errors like primary expression expected before '(' if you try this syntax:
long long x = long long();
Some primitives have spaces, and that is not allowed if you want to use the constructor-style initialization because of binding (long() is bound, but long long() has a free long). Types with spaces in them (like long long) can not use the type()-construction form.
MSVC is more permissive here, though technically non-standard compliant (and it's not a language extension that you can disable).
Solutions/Workarounds
There are alternatives for what you want to do:
Use 0LL as your value in place of attempting long long() - they would produce the same value.
This is how most code will be written too, so it will be most understandable to anyone else reading your code.
From your comments it seems like you really want long long, so you can typedef yourself to always guarantee you have a long long type, like this:
int main() {
typedef long long MyLongLong;
long long x = MyLongLong(); // or MyLongLong x = MyLongLong();
}
Use a template to get around needing explicit naming:
template<typename TypeT>
struct Type { typedef TypeT T(); };
// call it like this:
long long ll = Type<long long>::T();
As I mentioned in my comments, you can use an aliased type, like int64_t (from <cstdint>), which across common platforms is a typedef long long int64_t. This is a more platform dependent than the previous items in this list.
int64_t is a fixed-width type that is 64-bits, which is typically how wide long long is on platforms like linux-x86 and windows-x86. long long is at least 64-bit wide, but can be longer. If your code will only run on certain platforms, or if you really need a fixed-width type, this might be a viable choice.
C++11 Solutions
Thanks to the C++ newsgroup, I learned some additional ways of doing what you want to do, but unfortunately they're only in the realm of C++11 (and MSVC10 doesn't support either, and only very new compilers either way would):
The {} way:
long long ll{}; // does the zero initialization
Using what Johannes refers to as the 'bord tools' in C++11 with std::common_type<T>
#include <type_traits>
int main() {
long long ll = std::common_type<long long>::type();
}
So is there a real difference between () and initializing with 0 for POD types?
You say this in a comment:
I don't think default ctor returns zero always - more typical behaviour is to leave memory untouched.
Well, for primitive types, that is not true at all.
From Section 8.5 of the ISO C++ Standard/2003 (don't have 2011, sorry, but this information didn't change too much):
To default-initialize an object of type T means:
— if T is a non-POD class type (clause 9), the default constructor for T is called (and the initialization is ill-formed if T has no accessible default
constructor);
— if T is an array type, each element is
default-initialized;
— otherwise, the object is zero-initialized.
The last clause is most important here because long long, unsigned long, int, float, etc. are all scalar/POD types, and so calling things like this:
int x = int();
Is exactly the same as doing this:
int x = 0;
Generated code example
Here is a more concrete example of what actually happens in code:
#include <iostream>
template<typename T>
void create_and_print() {
T y = T();
std::cout << y << std::endl;
}
int main() {
create_and_print<unsigned long long>();
typedef long long mll;
long long y = mll();
long long z = 0LL;
int mi = int();
}
Compile this with:
g++ -fdump-tree-original construction.cxx
And I get this in the generated tree dump:
;; Function int main() (null)
;; enabled by -tree-original
{
typedef mll mll;
long long int y = 0;
long long int z = 0;
int mi = 0;
<<cleanup_point <<< Unknown tree: expr_stmt
create_and_print<long long unsigned int> () >>>>>;
<<cleanup_point long long int y = 0;>>;
<<cleanup_point long long int z = 0;>>;
<<cleanup_point int mi = 0;>>;
}
return <retval> = 0;
;; Function void create_and_print() [with T = long long unsigned int] (null)
;; enabled by -tree-original
{
long long unsigned int y = 0;
<<cleanup_point long long unsigned int y = 0;>>;
<<cleanup_point <<< Unknown tree: expr_stmt
(void) std::basic_ostream<char>::operator<< ((struct __ostream_type *) std::basic_ostream<char>::operator<< (&cout, y), endl) >>>>>;
}
Generated Code Implications
So from the code tree generated above, notice that all my variables are just being initialized with 0, even if I use constructor-style default initialization, like with int mi = int(). GCC will generate code that just does int mi = 0.
My template function that just attempts to do default construction of some passed in typename T, where T = unsigned long long, also produced just a 0-initialization code.
Conclusion
So in conclusion, if you want to default construct primitive types/PODs, it's like using 0.

Resources