How to understand IN DEPTH gcc compiler? - gcc

Background of this question: I am trying to understand how compilers work. I am learning many new things: scanner, parser, AST, IR, optimisation, frontend, backend, LL(1), ... I am making gradual progress and it is very interesting. Now I would like to do some practical work.
From a programmer's point of view, I know why typedef struct { int x; mytype* next; } mytype; does not compile, and I know the correct syntax typedef struct mystruct { int x; struct mystruct* next; } mytype; but I would like to know EXACTLY where the problem happens during compilation. I am using gcc, and I would like to know how to use the gcc developer options -fdump-... to answer this question.

The first stage of GCC is the parser. For C it lives in
c-parser.c
It parses your source code into a tree, which is then lowered to the GIMPLE representation and passed down the pipeline:
Parse -> Gimplify -> Tree SSA -> Optimize -> Generate RTL -> Optimize RTL -> Generate ASM
Errors like the one in the question are reported at the parsing stage. They show up in the terminal, or in the error output of an IDE, like this:
gcc yourcode.c
yourcode.c:2:25: error: unknown type name 'mytype'
 typedef struct { int x; mytype* next; } mytype;
                         ^~~~~~
You can also see how it works via this
link
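A minimal sketch of why this is a parse-time failure: when the parser reaches the member declaration, the identifier in front of `* next` must already name a type. One way to arrange that is a forward typedef. The helper `sum_list` is hypothetical and is only there to exercise the type.

```cpp
// Forward typedef: from here on 'mytype' is known to be a type name.
// The struct is still incomplete, which is enough to form a pointer to it.
typedef struct mystruct mytype;

struct mystruct {
    int x;
    mytype* next;   // accepted: 'mytype' was declared before this point
};

// Hypothetical helper: adds up the x fields of a singly linked list.
int sum_list(const mytype* head) {
    int total = 0;
    for (const mytype* p = head; p != nullptr; p = p->next) {
        total += p->x;
    }
    return total;
}
```

Since the diagnostic in the question comes from the front end, the `-fdump-tree-*` dumps, which describe later stages, are probably not where it will show up.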

Related

link functions with mismatching signature

I'm playing around with the gcc and g++ compilers, compiling some C code with each. My purpose is to see how the compiler / linker ensures that, when a module containing a function declaration is linked with a module containing that function's implementation, the correct functions are linked (in terms of parameters passed and values returned).
For example, let's take a look at this code:
#include <stdio.h>
extern int foo(int b, int c);
int main()
{
    int f = foo(5, 8);
    printf("%d",f);
}
After compilation, my symbol table has a symbol for foo, but nowhere in the ELF file format is there a place that describes the arguments taken and the function signature (int(int,int)). So if I write some other code such as this:
char foo(int a, int b, int c)
{
    return (char) ( a + b + c );
}
Compile that module and it will also have a symbol called foo. What happens if I link these modules together? I had never thought about this, or about how a compiler would overcome this weakness... I know that g++ generates a prefix for every symbol according to its namespace, but does it also take the signature into account? If anyone has encountered this, it would be great if they could shed some light on the problem.
The problem is solved with name mangling.
In compiler construction, name mangling (also called name decoration)
is a technique used to solve various problems caused by the need to
resolve unique names for programming entities in many modern
programming languages.
It provides a way of encoding additional information in the name of a
function, structure, class or another datatype in order to pass more
semantic information from the compilers to linkers.
The need arises where the language allows different entities to be
named with the same identifier as long as they occupy a different
namespace (where a namespace is typically defined by a module, class,
or explicit namespace directive) or have different signatures (such as
function overloading).
Note the simple example:
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart
from the name. If they were natively translated into C with no
changes, the result would be an error — C does not permit two
functions with the same name. The C++ compiler therefore will encode
the type information in the symbol name, the result being something
resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name
mangling applies to all symbols.
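The quoted example can be turned into a small sketch that compiles as is. The symbol names in the comments are what the Itanium C++ ABI (the one g++ uses on Linux) produces rather than the illustrative `__f_v` style above. `g` returns an int here, unlike the quoted version, only so the result can be checked, and `c_linkage_add` is a hypothetical name.

```cpp
// Two overloads and a caller, as in the quoted example.
int f(void) { return 1; }            // Itanium ABI symbol: _Z1fv
int f(int)  { return 0; }            // Itanium ABI symbol: _Z1fi
int g(void) { return f() + f(0); }   // Itanium ABI symbol: _Z1gv

// extern "C" turns mangling off for this function, so its symbol is the
// plain name 'c_linkage_add' and carries no parameter information.
extern "C" int c_linkage_add(int a, int b) { return a + b; }
```

Running nm or readelf on the resulting object file should show the three mangled names next to the unmangled `c_linkage_add`.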
Wow, I kept exploring and testing on my own and came up with an answer that quietly blew my mind.
I wrote the following code and compiled it with gcc:
main.c
#include <stdio.h>
extern int foo(int a, char b);
int main()
{
    int g = foo(5, 6);
    printf("%d", g);
    return 0;
}
foo.c
typedef struct{
    int a;
    int b;
    char c;
    char d;
} mystruct;
mystruct foo(int a, int b)
{
    mystruct my;
    my.a = a;
    my.b = a + 1;
    my.c = (char) b;
    my.d = (char) (b + 1);
    return my;
}
I first compiled foo.c to foo.o with gcc and checked the symbol table using readelf; it had an entry called foo.
After that I compiled main.c to main.o and checked its symbol table, which also had an entry called foo. I linked the two together and, surprisingly, it worked. I ran the result and predictably hit a segmentation fault. That makes sense: the actual implementation of foo in foo.o probably expects three parameters (the first being the address of the struct to return), and main.o, going by its own declaration of foo, does not pass that one. The implementation then reads memory that doesn't belong to it from main's stack frame, tries to access the address it thinks it was given, and ends up with a segmentation fault. That's fine.
Then I compiled both modules again with g++ instead of gcc, and what came up was amazing: the symbol entry in foo.o was _Z3fooii and in main.o it was _Z3fooic. My guess is that the ii suffix means int, int and the ic suffix means int, char, i.e. the parameters that should be passed to the function, which lets the linker match a function declaration with the actual implementation. So I changed my foo declaration in main.c to
extern int foo(int a, int b);
I re-compiled and this time got the symbol _Z3fooii. I linked both modules again and this time it worked. I tried running it and again hit a segmentation fault, which also makes sense, since the mangled name does not encode the return type and so the linker cannot check it. Anyway, this confirmed my original thought: g++ includes the function's parameter types in the symbol name, and thus forces the linker to match each function declaration with an implementation that takes the same parameters.
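The guess about the ii and ic suffixes can be checked by demangling the symbols programmatically. This is a minimal sketch that assumes a GCC or Clang toolchain, whose C++ runtime provides `abi::__cxa_demangle` in `<cxxabi.h>`; the wrapper name `demangle` is hypothetical.

```cpp
#include <cxxabi.h>   // abi::__cxa_demangle (GCC and Clang runtimes)
#include <cstdlib>    // std::free
#include <string>

// Hypothetical helper: returns the demangled form of an Itanium-ABI symbol,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* symbol) {
    int status = 0;
    char* text = abi::__cxa_demangle(symbol, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) {
        return symbol;
    }
    std::string result(text);
    std::free(text);   // the buffer was allocated with malloc by the runtime
    return result;
}
```

With this, `demangle("_Z3fooii")` gives `foo(int, int)` and `demangle("_Z3fooic")` gives `foo(int, char)`. Neither string mentions a return type, which is why `int foo(int, int)` and `mystruct foo(int, int)` end up sharing one symbol.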

Create std::pair or std::map with std::unique_ptr as value implicitly

This code works in Visual Studio:
#include <memory>
#include <string>
#include <utility>
typedef struct {
    int a;
} data_t;
using datap_t = std::unique_ptr<data_t>;
using MyPair = std::pair<std::string, datap_t>;
int main() {
    data_t * pd1 = new data_t();
    MyPair p("tst", pd1); // This does not compile in gcc or clang
    // MyPair p2("tst", datap_t(pd1)); // This compiles
    return 0;
}
But clang and gcc give this error:
error: no matching function for call to 'std::pair<const std::basic_string<char>, std::unique_ptr<data_t> >::pair(const char [3], data_t*&)
Here is the ideone to try.
The fact that I can call datap_t(pd1) and it compiles means the constructor is valid, so why does the template not find a suitable match?
I was looking to add a key-value pair to a map using emplace, which is why I wanted this implicit conversion in the first place. Note that, as in Visual Studio, implicit conversion works for most other types, such as std::string from a "raw string".
This answer looks relevant, but it talks about a fixed bug and is very old.
The std::unique_ptr constructor that takes a single raw pointer as input is marked as explicit to prevent implicit conversions.
pd1 is a raw pointer. MyPair p("tst", pd1); involves an implicit conversion to std::unique_ptr, which is why the compile fails in Clang and GCC, as it should. You have to use an explicit conversion instead:
MyPair p("tst", datap_t(pd1));
A better option is to not use the raw pointer at all:
MyPair p("tst", std::make_unique<data_t>());
Clang and GCC are doing the right thing, Visual Studio is not (despite its unique_ptr documentation showing the relevant constructor is explicit).
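Since the question was really about emplacing into a map, here is a minimal sketch of the same advice applied there. It assumes C++14 for `std::make_unique`; the names `MyMap` and `make_map` are hypothetical.

```cpp
#include <map>
#include <memory>
#include <string>

struct data_t {
    int a;
};

using datap_t = std::unique_ptr<data_t>;
using MyMap = std::map<std::string, datap_t>;

// Hypothetical helper: builds a map with one entry without ever holding a
// raw owning pointer.
MyMap make_map() {
    MyMap m;
    // make_unique already yields a unique_ptr, so no conversion from
    // data_t* (implicit or explicit) is involved.
    m.emplace("tst", std::make_unique<data_t>());
    m.at("tst")->a = 42;
    return m;
}
```

The map owns the object from the moment it is created, so nothing leaks if the insertion throws.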

struct keyword in gcc vs Borland C

I'm confused about the struct definitions below. Shouldn't both be correct? With Borland C both compile, but with gcc only the second one compiles. The error is "unknown type name _Node".
typedef struct _Node {
    int item;
    _Node* next;
} Node;

typedef struct _Node {
    int item;
    struct _Node* next;
} Node;
It depends on how the compiler handles forward references. The gcc compiler may do this by default since it is also a C++ compiler.
No, in C only the second (explicitly including the struct specifier) is correct. While C++ allows the omission of struct, C does not, so this is a non-portable Borland extension. If you compile with g++, I imagine it should accept the first syntax as well.
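For reference, a small sketch of the portable form in use. It keeps the `struct` keyword on the member, so the same declaration is accepted as C and as C++. `NodeTag` is a hypothetical replacement for `_Node`, because identifiers that begin with an underscore followed by an uppercase letter are reserved for the implementation; `list_length` is also hypothetical.

```cpp
#include <stddef.h>  // NULL

typedef struct NodeTag {
    int item;
    struct NodeTag* next;   // 'struct' spelled out: required in C, fine in C++
} Node;

// Hypothetical helper that counts the nodes of a list.
int list_length(const Node* head) {
    int n = 0;
    while (head != NULL) {
        ++n;
        head = head->next;
    }
    return n;
}
```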

Overloading conflict with vector types __m128, __m256 in GCC

I've started playing around with AVX instructions on Intel's new Sandy Bridge processor. I'm using GCC 4.5.2, the TDM-GCC 64-bit build of MinGW64.
I want to overload operator<< for ostream to be able to print out the vector types __m256, __m128 etc to the console. But I'm running into an overloading conflict. The 2nd function in the following code produces an error "conflicts with previous declaration void f(__vector(8) float)":
void f(__m128 v) {
    cout << 4;
}
void f(__m256 v) {
    cout << 8;
}
It seems that the compiler cannot distinguish between the two types and considers them both f(float __vector).
Is there a way around this? I haven't been able to find anything online. Any help is greatly appreciated.
I accidentally stumbled upon the answer when having a similar problem with function templates. In this case, the GCC error message actually suggested a solution:
add -fabi-version=4 compiler option.
This solves my problem, and hopefully doesn't cause any issues when linking the standard libraries.
One can read more about the ABI (Application Binary Interface) and GCC at ABI Policy and Guidelines and ABI specification. The ABI specifies how function names are mangled when the code is compiled into object files. Apparently, ABI version 3, used by GCC by default, cannot distinguish between the various vector types.
I was unsatisfied with the solution of changing compiler ABI flags to solve this, so I went looking for a different solution. It seems they encountered this issue in writing the Eigen library - see this source file for details http://eigen.tuxfamily.org/dox-devel/SSE_2PacketMath_8h_source.html
My solution to this is a slightly tweaked version of theirs:
template <typename T, unsigned RegisterSize>
struct Register
{
    using ValueType = T;
    enum { Size = RegisterSize };
    inline operator T&() { return myValue; }
    inline operator const T&() const { return myValue; }
    inline Register() {}
    inline Register(const T & v) : myValue(v) {} // Not explicit
    inline Register & operator=(const T & v)
    {
        myValue = v;
        return *this;
    }
    T myValue;
};
using Register4 = Register<__m128, 4u>;
using Register8 = Register<__m256, 8u>;
// Could provide more declarations for __m128d, __m128i, etc. if needed
Using the above, you can overload on Register4, Register8, etc. or produce template functions taking Registers without running into linking issues and without changing ABI settings.
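A minimal sketch of why the wrapper helps: the size parameter is part of the wrapper's type, so two wrappers are distinct types even when the wrapped `T` cannot be told apart. To stay portable it wraps plain `float` in both cases instead of `__m128` / `__m256`; the aliases `Lanes4` and `Lanes8` and the function `lane_count` are hypothetical stand-ins.

```cpp
// Trimmed version of the wrapper above; only what the overloads need.
template <typename T, unsigned RegisterSize>
struct Register {
    enum { Size = RegisterSize };
    Register() : myValue() {}
    Register(const T& v) : myValue(v) {}   // not explicit, as in the original
    operator const T&() const { return myValue; }
    T myValue;
};

// Both aliases wrap the same T on purpose: only the size differs.
using Lanes4 = Register<float, 4u>;
using Lanes8 = Register<float, 8u>;

// These overloads do not conflict, because Register<float, 4u> and
// Register<float, 8u> are different types with different mangled names.
int lane_count(const Lanes4&) { return Lanes4::Size; }
int lane_count(const Lanes8&) { return Lanes8::Size; }
```

Calling `lane_count(Lanes4())` selects the first overload and `lane_count(Lanes8(1.0f))` the second.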

Problem with Xcode C++: typedef struct toto toto

I am working on a C++ project on Mac OS X 10.6.2 with Xcode.
I tried to compile my code on Windows and did not have any problem. I guess it works on Linux too, but I don't have a Linux machine with me right now.
My problem is that Xcode does not accept this kind of declaration:
struct direction {
    double x;
    double y;
    double z;
    double t;
};
typedef struct direction direction;
Here is my error :
/Users/sbarbier/dev/xcode/Infographie/TP9-RayTracing/RayTracing-Direction.h:22:0 /Users/sbarbier/dev/xcode/Infographie/TP9-RayTracing/RayTracing-Direction.h:22: error: changes meaning of 'direction' from 'typedef struct direction direction'
I am using GCC 4.2 and haven't changed anything. This code works on every other platform; can anyone help me?
This isn't C. In C, to use a struct you had to use the keyword struct:
struct some_struct{ int i; };
struct some_struct myStruct;
This was commonly alleviated like this:
typedef struct { int i; } some_struct;
some_struct myStruct;
In C++ this is not required. direction already names a type, and then you're trying to make a new type of the same name, and that's bad. Take out your entire typedef; it isn't needed.
In C++, struct and class are used only when declaring or defining the struct or class. You might want the typedef in C, but in C++ it doesn't make any sense.
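A short sketch of the struct from the question used in C++ without the typedef. The function `norm3` is hypothetical and only shows the type name being used on its own.

```cpp
#include <cmath>

struct direction {
    double x;
    double y;
    double z;
    double t;
};

// No typedef and no 'struct' keyword are needed at the point of use in C++.
double norm3(const direction& d) {
    return std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
}
```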
