Different compiler behavior with C++11 - c++11

The following code
#include <vector>
#include <complex>
#include <algorithm>
template<class K>
inline void conjVec(int m, K* const in) {
static_assert(std::is_same<K, double>::value || std::is_same<K, std::complex<double>>::value, "");
if(!std::is_same<typename std::remove_pointer<K>::type, double>::value)
#ifndef OK
std::for_each(in, in + m, [](K& z) { z = std::conj(z); });
#else
std::for_each(reinterpret_cast<std::complex<double>*>(in), reinterpret_cast<std::complex<double>*>(in) + m, [](std::complex<double>& z) { z = std::conj(z); });
#endif
}
int main(int argc, char* argv[]) {
std::vector<double> nums;
nums.emplace_back(1.0);
conjVec(nums.size(), nums.data());
return 0;
}
compiles fine on Linux with
Debian clang version 3.5.0-9
gcc version 4.9.1
icpc version 15.0.1
and on Mac OS X with
gcc version 4.9.2
but not with
clang-600.0.56
icpc version 15.0.1
except if the macro OK is defined. I don't know which are the faulty compilers, could someone let me know ? Thanks.
PS: here is the error
10:48: error: assigning to 'double' from incompatible type 'complex<double>'
std::for_each(in, in + m, [](K& z) { z = std::conj(z); });

The difference is that on Linux, you're using libstdc++ and glibc, and on MacOS you're using libc++ and whatever CRT MacOS uses.
The MacOS version is correct. (Also, your workaround is completely broken and insanely dangerous.)
Here's what I think happens.
There are multiple overloads of conj in the environment. C++98 brings in a single template, which takes a std::complex<F> and returns the same type. Because this template needs F to be deduced, it doesn't work when calling conj with a simple floating point number, so C++11 added overloads of conj which take float, double and long double, and return the appropriate std::complex instantiation.
Then there's a global function from the C99 library, ::conj, which takes a C99 double complex and returns the same.
libstdc++ doesn't yet provide the new C++11 conj overloads, as far as I can see. The C++ version of conj isn't called. It appears, however, that somehow ::conj found its way into the std namespace, and gets called. The double you pass is implicitly converted to a double complex by adding a zero imaginary part. conj negates that zero. The result double complex is implicitly converted back to a double by discarding the imaginary component. (Yes, that's an implicit conversion in C99. No, I don't know what they were thinking.) The result can be assigned to z.
libc++ provides the new overloads. The one taking a double is chosen. It returns a std::complex<double>. This class has no implicit conversion to double, so the assignment to z gives you an error.
The bottom line is this: your code makes absolutely no sense. A vector<double> isn't a vector<complex<double>> and shouldn't be treated as one. Calling conj on double doesn't make sense. Either it doesn't compile, or it's a no-op. (libc++'s conj(double) is in fact implemented by simply constructing a complex<double> with a zero imaginary part.) And wildly reinterpret_casting your way around compile errors is horrible.

Sebastian Redl's answer explains why your code didn't compile with libc++ but did with libstdc++. if is not the static if that exists in some languages; even if the code in an if branch is 100% dead, it must still be valid code.
In any event, this feels like a massive amount of unnecessary complexity to me. Not everything has to be a template. Especially when your template can only be used with two types, and when used with one of those two it's a no-op.
Compare:
template<class K>
inline void conjVec(int m, K* const in) {
static_assert(std::is_same<K, double>::value || std::is_same<K, std::complex<double>>::value, "");
if(!std::is_same<K, double>::value)
std::for_each(reinterpret_cast<std::complex<double>*>(in), reinterpret_cast<std::complex<double>*>(in) + m, [](std::complex<double>& z) { z = std::conj(z); });
}
with:
inline void conjVec(int m, double* const in) {}
inline void conjVec(int m, std::complex<double>* const in) {
std::for_each(in, in + m, [](std::complex<double>& z) { z = std::conj(z); });
}
I know which one I would prefer.

Related

What is causing this error: SSE register return with SSE disabled?

I'm new to kernel development, and I need to write a Linux kernel module that performs several matrix multiplications (I'm working on an x64_64 platform). I'm trying to use fixed-point values for these operations, however during compilation, the compiler encounters this error:
error: SSE register return with SSE disabled
I don't know that much about SSE or this issue in particular, but from what i've found and according to most answers to questions about this problem, it is related to the usage of Floating-Point (FP) arithmetic in kernel space, which seems to be rarely a good idea (hence the utilization of Fixed-Point arithmetics). This error seems weird to me because I'm pretty sure I'm not using any FP values or operations, however it keeps popping up and in some ways that seem weird to me. For instance, I have this block of code:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
#define FIXED_TO_DOUBLE(x) ((x) / (1 << scale))
#define MULT(x, y) ((((x) >> 8) * ((y) >> 8)) >> 0)
#define DIV(x, y) (((x) << 8) / (y) << 8)
#define OUTPUT_ROWS 6
#define OUTPUT_COLUMNS 2
struct matrix {
int rows;
int cols;
double *data;
};
double outputlayer_weights[OUTPUT_ROWS * OUTPUT_COLUMNS] =
{
0.7977986, -0.77172316,
-0.43078753, 0.67738613,
-1.04312621, 1.0552227 ,
-0.32619684, 0.14119884,
-0.72325027, 0.64673559,
0.58467862, -0.06229197
};
...
void matmul (struct matrix *A, struct matrix *B, struct matrix *C) {
int i, j, k, a, b, sum, fixed_prod;
if (A->cols != B->rows) {
return;
}
for (i = 0; i < A->rows; i++) {
for (j = 0; j < B->cols; j++) {
sum = 0;
for (k = 0; k < A->cols; k++) {
a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
b = DOUBLE_TO_FIXED(B->data[k * B->rows + j]);
fixed_prod = MULT(a, b);
sum += fixed_prod;
}
/* Commented the following line, causes error */
//C->data[i * C->rows + j] = sum;
}
}
}
...
static int __init insert_matmul_init (void)
{
printk(KERN_INFO "INSERTING MATMUL");
return 0;
}
static void __exit insert_matmul_exit (void)
{
printk(KERN_INFO "REMOVING MATMUL");
}
module_init (insert_matmul_init);
module_exit (insert_matmul_exit);
which compiles with no errors (I left out code that I found irrelevant to the problem). I have made sure to comment any error-prone lines to get to a point where the program can be compiled with no errors, and I am trying to solve each of them one by one. However, when uncommenting this line:
C->data[i * C->rows + j] = sum;
I get this error message in a previous (unmodified) line of code:
error: SSE register return with SSE disabled
sum += fixed_prod;
~~~~^~~~~~~~~~~~~
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error. Maybe my fixed-point implementation is flawed (I'm no expert in that matter either), or maybe I'm missing something obvious. Just in case, I have tested the same logic in a user-space program (using Floating-Point values) and it seems to work fine. In either case, any help in solving this issue would be appreciated. Thanks in advance!
Edit: I have included the definition of matrix and an example matrix. I have been using the default kbuild command for building external modules, here is what my Makefile looks like:
obj-m = matrix_mult.o
KVERSION = $(shell uname -r)
all:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Linux compiles kernel code with -mgeneral-regs-only on x86, which produces this error in functions that do anything with FP or SIMD. (Except via inline asm, because then the compiler doesn't see the FP instructions, only the assembler does.)
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error.
GCC optimizes whole functions when optimization is enabled, and you are using FP inside that function. You're doing FP multiply and truncating conversion to integer with your macro and assigning the result to an int, since the MCVE you eventually provided shows struct matrix containing double *data.
If you stop the compiler from using FP instructions (like Linux does by building with -mgeneral-regs-only), it refuses to compile your file instead of doing software floating-point.
The only odd thing is that it pins down the error to an integer += instead of one of the statements that compiles to a mulsd and cvttsd2si
If you disable optimization (-O0 -mgeneral-regs-only) you get a more obvious location for the same error (https://godbolt.org/z/Tv5nG6nd4):
<source>: In function 'void matmul(matrix*, matrix*, matrix*)':
<source>:9:33: error: SSE register return with SSE disabled
9 | #define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
| ~~~~~^~~~~~~~~~~~~~~
<source>:46:21: note: in expansion of macro 'DOUBLE_TO_FIXED'
46 | a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
| ^~~~~~~~~~~~~~~
If you really want to know what's going on with the GCC internals, you could dig into it with -fdump-tree-... options, e.g. on the Godbolt compiler explorer there's a dropdown for GCC Tree / RTL output that would let you look at the GIMPLE or RTL internal representation of your function's logic after various analyzer passes.
But if you just want to know whether there's a way to make this function work, no obviously not, unless you compile a file without -mgeneral-registers-only. All functions in a file compiled that way must only be called by callers that have used kernel_fpu_begin() before the call. (and kernel_fpu_end after).
You can't safely use kernel_fpu_begin inside a function compiled to allow it to use SSE / x87 registers; it might already have corrupted user-space FPU state before calling the function, after optimization. The symptom of getting this wrong is not a fault, it's corrupting user-space state, so don't assume that happens to work = correct. Also, depending on how GCC optimizes, the code-gen might be fine with your version, but might be broken with earlier or later GCC or clang versions. I somewhat expect that kernel_fpu_begin() at the top of this function would get called before the compiler did anything with FP instructions, but that doesn't mean it would be safe and correct.
See also Generate and optimize FP / SIMD code in the Linux Kernel on files which contains kernel_fpu_begin()?
Apparently -msse2 overrides -mgeneral-regs-only, so that's probably just an alias for -mno-mmx -mno-sse and whatever options disables x87. So you might be able to use __attribute__((target("sse2"))) on a function without changing build options for it, but that would be x86-specific. Of course, so is -mgeneral-regs-only. And there isn't a -mno-general-regs-only option to override the kernel's normal CFLAGS.
I don't have a specific suggestion for the best way to set up a build option if you really do think it's worth using kernel_fpu_begin at all, here (rather than using fixed-point the whole way through).
Obviously if you do save/restore the FPU state, you might as well use it for the loop instead of using FP to convert to fixed-point and back.

complex and double complex methods not working in visual studio c project

The below code is throwing error in Visual Studio C project. That same code is working in Linux with GCC compiler. Please let me know any solution to execute properly in Windows.
#include<stdio.h>
#include <complex.h>
#include <math.h>
typedef struct {
float r, i;
} complex_;
double c_abs(complex_* z)
{
return (cabs(z->r + I * z->i));
}
int main()
{
complex_ number1 = { 3.0, 4.0 };
double d = c_abs(&number1);
printf("The absolute value of %f + %fi is %f\n", number1.r, number1.i, d);
return 0;
}
The error I am getting was
C2088: '*': illegal for struct
Here what t I observed the I macro not working properly...
So is there any other way we can handle in Windows?
error C2088: '*': illegal for struct
This is the error MSVC returns when compiling the code (as C) for this line.
return (cabs(z->r + I * z->i));
The MSVC C compiler does not have a native complex type, and (quoting the docs) "therefore the Microsoft implementation uses structure types to represent complex numbers".
The imaginary unit I a.k.a. _Complex_I is defined as an _Fcomplex structure in <complex.h>, which explains compile error C2088, since there is no operator * to multiply that structure with a float value.
Even if the multiplication worked out by some magic, the end result would be a float value being passed into the cabs call, but MSVC declares cabs as double cabs(_Dcomplex z); and there is no automatic conversion from float to _Dcomplex so the call would still fail to compile.
What would work with MSVC, however, is replace that line with the following, which constructs a _Dcomplex on the fly from the float real and imaginary parts.
return cabs(_Dcomplex{z->r, z->i});

link functions with mismatching signature

I'm playing around with gcc and g++ compiler and trying to compile some C code within those, my purpose is to see how the compiler / linker enforces that when linking a model with some function declaration to a model with that implementation of that function, the correct function are linked ( in terms of parameters passed and values returned )
for example let's take a look at this code
#include <stdio.h>
extern int foo(int b, int c);
int main()
{
int f = foo(5, 8);
printf("%d",f);
}
after compilation within my symbol table I'd have a symbol for foo, but within the elf file format there is not place that describes the arguments taken and the function signature, ( int(int,int) ), so basically if I write some other code such as this:
char foo(int a, int b, int c)
{
return (char) ( a + b + c );
}
compile that model it'll also have some symbol called foo, what if I link these models together, what's gonna happen? I have never thought of this, and how would a compiler overcome this weakness... I know that within g++ the compiler generates some prefix for every symbol regarding to it's namespace, but does it also take in mind the signature? If anyone has ever encountered this it would be great if he could shed some light upon this problem
The problem is solved with name mangling.
In compiler construction, name mangling (also called name decoration)
is a technique used to solve various problems caused by the need to
resolve unique names for programming entities in many modern
programming languages.
It provides a way of encoding additional information in the name of a
function, structure, class or another datatype in order to pass more
semantic information from the compilers to linkers.
The need arises where the language allows different entities to be
named with the same identifier as long as they occupy a different
namespace (where a namespace is typically defined by a module, class,
or explicit namespace directive) or have different signatures (such as
function overloading).
Note the simple example:
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart
from the name. If they were natively translated into C with no
changes, the result would be an error — C does not permit two
functions with the same name. The C++ compiler therefore will encode
the type information in the symbol name, the result being something
resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name
mangling applies to all symbols.
Wow, I've kept exploring and testing it on my own and I came up with a solution which quietly amazed my mind,
so I wrote the following code and compiled it on a gcc compiler
main.c
#include <stdio.h>
extern int foo(int a, char b);
int main()
{
int g = foo(5, 6);
printf("%d", g);
return 0;
}
foo.c
typedef struct{
int a;
int b;
char c;
char d;
} mystruct;
mystruct foo(int a, int b)
{
mystruct myl;
my.a = a;
my.b = a + 1;
my.c = (char) b;
my.d = (char b + 1;
return my1;
}
now I compiled foo.c to foo.o with gcc firstly and checked the symbol table using
readelf and I had some entry called foo
also after that I compiled main.c to main.o checked the symbol table and it also had some entry called foo, I linked those two together and surprisingly it worked, I ran main.o and obviously encountered some segmentation fault, which makes sense as the actual implementation of foo as implemented in foo.o probably expects three parameters (first one should be struct adders), a parameter which isn't passed in main.o under it's definition to foo then the actual implementation accesses some memory that doesn't belong to it from the stack frame of main, then tries accessing addresses that it thought it got, and ends up with segmentation fault, that's fine,
now I compiled both models again with g++ and not gcc and what came up was amazing.. I found out that the symbol entry under foo.o was _Z3fooii and under main.o it was _Z3fooic, now my guess is that the ii suffix means int int and ic suffix means int char which probably refers to the parameters that should be passed to function hence allowing the compiler to know some function deceleration gets the actual implementation. so I changed my foo declaration in main.c to
extern int foo(int a, int b);
re-compiled and this time got the symbol _Z3fooii, I linked both models again and amazingly this time it worked, I tried running it and again encountered segmentation fault, which again also makes sense as the compiler wont always even authorize correct return values.. anyways what was my original thought - that g++ includes function signature within symbol name and thus enforces the linker to give function implementation get correct parameters to correct function declaration

C++ shared library symbols versioning

I'm trying to create library with two versions of the same function using
__asm__(".symver ......
approach
library.h
#ifndef CTEST_H
#define CTEST_H
int first(int x);
int second(int x);
#endif
library.cpp
#include "simple.h"
#include <stdio.h>
__asm__(".symver first_1_0,first#LIBSIMPLE_1.0");
int first_1_0(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 1;
}
__asm__(".symver first_2_0,first##LIBSIMPLE_2.0");
int first_2_0(int x)
{
int y;
printf("lib: %d\n", y);
printf("lib: %s\n", __FUNCTION__);
return (x + 1) * 1000;
}
int second(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 2;
}
And here is the version scripf file
LIBSIMPLE_1.0{
global:
first; second;
local:
*;
};
LIBSIMPLE_2.0{
global:
first;
local:
*;
};
When build library using gcc, everything works well, and i am able to link to a library binary. Using nm tool i see that both first() and second() function symbols are exported.
Now, when i try to use g++, non of the symbols are exported.
So i tried to use extern "C" directive to wrap both declarations
extern "C" {
int first(int x);
int second(int x);
}
nm shows that second() function symbol is exported, but first() still remain unexported, and mangled.
What is here i am missing to make this to work? Or it is impossible with the c++ compiler to achieve this?
I don't know why, with 'extern "C"', 'first' was not exported - suspect there is something else interfering.
Otherwise C++ name mangling is certainly a pain here. The 'asm' directives (AFAIK) require the mangled names for C++ functions, not the simple 'C' name. So 'int first(int)' would need to be referenced as (e.g.) '_Z5firsti' instead of just 'first'. This is, of course, a real pain as far as portability goes...
The linker map file is more forgiving as its supported 'extern "C++" {...}' blocks to list C++ symbols in their as-written form - 'int first(int)'.
This whole process is a maintainance nightmare. What I'd really like would be a function attribute which could be used to specify the alias and version...
Just to add a reminder that C++11 now supports inline namespaces which can be used to provide symbol versioning in C++.

Overloading conflict with vector types __m128, __m256 in GCC

I've started playing around with AVX instructions on the new Intel's Sandy Bridge processor. I'm using GCC 4.5.2, TDM-GCC 64bit build of MinGW64.
I want to overload operator<< for ostream to be able to print out the vector types __m256, __m128 etc to the console. But I'm running into an overloading conflict. The 2nd function in the following code produces an error "conflicts with previous declaration void f(__vector(8) float)":
void f(__m128 v) {
cout << 4;
}
void f(__m256 v) {
cout << 8;
}
It seems that the compiler cannot distinguish between the two types and consideres them both f(float __vector).
Is there a way around this? I haven't been able to find anything online. Any help is greatly appreciated.
I accidentally stumbled upon the answer when having a similar problem with function templates. In this case, the GCC error message actually suggested a solution:
add -fabi-version=4 compiler option.
This solves my problem, and hopefully doesn't cause any issues when linking the standard libraries.
One can read more about ABI (Application Binary Interface) and GCC at ABI Policy and Guidelines and ABI specification. ABI specifies how the functions names are mangled when the code is compiled into object files. Apparently, ABI version 3 used by GCC by default cannot distinguish between the various vector types.
I was unsatisfied with the solution of changing compiler ABI flags to solve this, so I went looking for a different solution. It seems they encountered this issue in writing the Eigen library - see this source file for details http://eigen.tuxfamily.org/dox-devel/SSE_2PacketMath_8h_source.html
My solution to this is a slightly tweaked version of theirs:
template <typename T, unsigned RegisterSize>
struct Register
{
using ValueType = T;
enum { Size = RegisterSize };
inline operator T&() { return myValue; }
inline operator const T&() const { return myValue; }
inline Register() {}
inline Register(const T & v) : myValue(v) {} // Not explicit
inline Register & operator=(const T & v)
{
myValue = v;
return *this;
}
T myValue;
};
using Register4 = Register<__m128, 4u>;
using Register8 = Register<__m256, 8u>;
// Could provide more declarations for __m128d, __m128i, etc. if needed
Using the above, you can overload on Register4, Register8, etc. or produce template functions taking Registers without running into linking issues and without changing ABI settings.

Resources