I'm playing around with gcc and g++ compiler and trying to compile some C code within those, my purpose is to see how the compiler / linker enforces that when linking a model with some function declaration to a model with that implementation of that function, the correct function are linked ( in terms of parameters passed and values returned )
for example let's take a look at this code
#include <stdio.h>
extern int foo(int b, int c);
int main()
{
int f = foo(5, 8);
printf("%d",f);
}
after compilation within my symbol table I'd have a symbol for foo, but within the elf file format there is not place that describes the arguments taken and the function signature, ( int(int,int) ), so basically if I write some other code such as this:
char foo(int a, int b, int c)
{
return (char) ( a + b + c );
}
compile that model it'll also have some symbol called foo, what if I link these models together, what's gonna happen? I have never thought of this, and how would a compiler overcome this weakness... I know that within g++ the compiler generates some prefix for every symbol regarding to it's namespace, but does it also take in mind the signature? If anyone has ever encountered this it would be great if he could shed some light upon this problem
The problem is solved with name mangling.
In compiler construction, name mangling (also called name decoration)
is a technique used to solve various problems caused by the need to
resolve unique names for programming entities in many modern
programming languages.
It provides a way of encoding additional information in the name of a
function, structure, class or another datatype in order to pass more
semantic information from the compilers to linkers.
The need arises where the language allows different entities to be
named with the same identifier as long as they occupy a different
namespace (where a namespace is typically defined by a module, class,
or explicit namespace directive) or have different signatures (such as
function overloading).
Note the simple example:
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart
from the name. If they were natively translated into C with no
changes, the result would be an error — C does not permit two
functions with the same name. The C++ compiler therefore will encode
the type information in the symbol name, the result being something
resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name
mangling applies to all symbols.
Wow, I've kept exploring and testing it on my own and I came up with a solution which quietly amazed my mind,
so I wrote the following code and compiled it on a gcc compiler
main.c
#include <stdio.h>
extern int foo(int a, char b);
int main()
{
int g = foo(5, 6);
printf("%d", g);
return 0;
}
foo.c
typedef struct{
int a;
int b;
char c;
char d;
} mystruct;
mystruct foo(int a, int b)
{
mystruct myl;
my.a = a;
my.b = a + 1;
my.c = (char) b;
my.d = (char b + 1;
return my1;
}
now I compiled foo.c to foo.o with gcc firstly and checked the symbol table using
readelf and I had some entry called foo
also after that I compiled main.c to main.o checked the symbol table and it also had some entry called foo, I linked those two together and surprisingly it worked, I ran main.o and obviously encountered some segmentation fault, which makes sense as the actual implementation of foo as implemented in foo.o probably expects three parameters (first one should be struct adders), a parameter which isn't passed in main.o under it's definition to foo then the actual implementation accesses some memory that doesn't belong to it from the stack frame of main, then tries accessing addresses that it thought it got, and ends up with segmentation fault, that's fine,
now I compiled both models again with g++ and not gcc and what came up was amazing.. I found out that the symbol entry under foo.o was _Z3fooii and under main.o it was _Z3fooic, now my guess is that the ii suffix means int int and ic suffix means int char which probably refers to the parameters that should be passed to function hence allowing the compiler to know some function deceleration gets the actual implementation. so I changed my foo declaration in main.c to
extern int foo(int a, int b);
re-compiled and this time got the symbol _Z3fooii, I linked both models again and amazingly this time it worked, I tried running it and again encountered segmentation fault, which again also makes sense as the compiler wont always even authorize correct return values.. anyways what was my original thought - that g++ includes function signature within symbol name and thus enforces the linker to give function implementation get correct parameters to correct function declaration
Related
I am getting the following error
rudimentary_calc.c: In function ‘main’:
rudimentary_calc.c:9:6: error: conflicting types for ‘getline’
9 | int getline(char line[], int max) ;
| ^~~~~~~
In file included from rudimentary_calc.c:1:
/usr/include/stdio.h:616:18: note: previous declaration of ‘getline’ was here
616 | extern __ssize_t getline (char **__restrict __lineptr,
| ^~~~~~~
when I ran the following code
#include <stdio.h>
#define maxline 100
int main()
{
double sum, atof(char[]);
char line[maxline];
int getline(char line[], int max) ;
sum = 0;
while (getline(line, maxline) > 0)
printf("\t %g \n", sum += atof(line));
return 0;
}
What am I doing wrong? I am very new to C, so I don't know what went wrong.
Generally, you should not have to declare "built-in" functions as long as you #include the appropriate header files (in this case stdio.h). The compiler is complaining that your declaration is not exactly the same as the one in stdio.h.
The venerable K&R book defines a function named getline. The GNU C library also defines a non-standard function named getline. It is not compatible with the function defined in K&R. It is declared in the standard <stdio.h> header. So there is a name conflict (something that every C programmer has do deal with).
You can instruct GCC to ignore non-standard names found in standard headers. You need to supply a compilation flag such as -std=c99 or -std=c11 or any other std=c<year> flag that yout compiler supports.
Live demo
Always use one of these flags, plus at least -Wall, to compile any C code, including code from K&R. You may encounter some compiler warnings or even errors. This is good. Thy will tell you that there are some code constructs that were good in the days of K&R, but are considered problematic now. You want to know about those. The book is rather old and the best practices and the C language itself have evolved since.
Consider the simple C file:
int main()
{
int x = 0;
return x;
}
I'd like to extract a few things:
Ideally, all named things and their type. So I should get something like: int, main, function and int x variable or something like that
Extra points if the variable initialization is there as well
I'm sure that gcc does this internally, since that's an important compilation step, but I'm not sure if I can extract this information.
I'm also not strictly tied to gcc, so if another compiler does this I can consider it, but gcc is preferred.
I'm trying to create library with two versions of the same function using
__asm__(".symver ......
approach
library.h
#ifndef CTEST_H
#define CTEST_H
int first(int x);
int second(int x);
#endif
library.cpp
#include "simple.h"
#include <stdio.h>
__asm__(".symver first_1_0,first#LIBSIMPLE_1.0");
int first_1_0(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 1;
}
__asm__(".symver first_2_0,first##LIBSIMPLE_2.0");
int first_2_0(int x)
{
int y;
printf("lib: %d\n", y);
printf("lib: %s\n", __FUNCTION__);
return (x + 1) * 1000;
}
int second(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 2;
}
And here is the version scripf file
LIBSIMPLE_1.0{
global:
first; second;
local:
*;
};
LIBSIMPLE_2.0{
global:
first;
local:
*;
};
When build library using gcc, everything works well, and i am able to link to a library binary. Using nm tool i see that both first() and second() function symbols are exported.
Now, when i try to use g++, non of the symbols are exported.
So i tried to use extern "C" directive to wrap both declarations
extern "C" {
int first(int x);
int second(int x);
}
nm shows that second() function symbol is exported, but first() still remain unexported, and mangled.
What is here i am missing to make this to work? Or it is impossible with the c++ compiler to achieve this?
I don't know why, with 'extern "C"', 'first' was not exported - suspect there is something else interfering.
Otherwise C++ name mangling is certainly a pain here. The 'asm' directives (AFAIK) require the mangled names for C++ functions, not the simple 'C' name. So 'int first(int)' would need to be referenced as (e.g.) '_Z5firsti' instead of just 'first'. This is, of course, a real pain as far as portability goes...
The linker map file is more forgiving as its supported 'extern "C++" {...}' blocks to list C++ symbols in their as-written form - 'int first(int)'.
This whole process is a maintainance nightmare. What I'd really like would be a function attribute which could be used to specify the alias and version...
Just to add a reminder that C++11 now supports inline namespaces which can be used to provide symbol versioning in C++.
I have mainly two kinds of compile warning:
1. implicit declaration of function
in a.c, it has char *foo(char *ptr1, char *ptr2), in b.c, some functions use this foo function without any declaration, and I found seems compiler will treat the function foo return value as integer, and even I can pass some variables less or more than foo function declaration
2. enumerated type mixed with another type
My target chip is ARM11, it seems that even I don't solve these two kinds of compile warning, my program can run without any issues, but I believe it must have some risk behind these. Can anyone give me some good example that these two kinds of compile warning can cause some unexpected issues?
Meanwhile, if these two warnings have potential risk, why c compiler allow these kinds warning happen but not set them to error directly? any story behind?
Implicit declaration. E.g. you have function: float foo(float a), which isn't declared when you call it. Implicit rules will create auto-declaration with following signature: int foo(double) (if passed argument is float). So value you pass will be converted to double, but foo expects float. The same with return - calling code expects int, but returned float. Values would be a complete mess.
enum mixed with other type. Enumerated type have list of values it could take. If you trying to assign numeric value to it, there is a chance that it isn't one of listed values; if later your code expects only specified range and presumes nothing else could be there - it could misbehave.
Simple example:
File: warn.c
#include <stdio.h>
double foo(double x)
{
return myid(x);
}
int
main (void)
{
double x = 1.0;
fprintf (stderr, "%lg == %lg\n", x, foo (x));
return 0;
}
File: foo.c
double
myid (double x)
{
return x;
}
Compile and run:
$ gcc warn.c foo.c -Wall
warn.c: In function ‘foo’:
warn.c:5: warning: implicit declaration of function ‘myfabs’
$ ./a.out
1 == 0
Old C standard (C90) had this strange "default int" rule and for compatibility it is supported even in latest compilers.
i am learning about linking..
i wrote the following code in c and made .o using gcc
int f()
{
static int x=0;
return x;
}
extern int z;
int g()
{
static int x=10;
return x;
}
static int y;
static int y=9;
int main()
{
return 0;
}
then i made this into .o by:
gcc begin.c -o begin.o
now when i checked the symtab using readelf there was no record of z....why?
also how does gcc allow two 'y'?
and in .data section how are the two 'x' differentiated?
I'm an a rank non-expert, but I can hazard a couple guesses:
Since you only declare z but never actually use it, there is no need to maintain a reference to it.
Try adding -Wall, which will tell you (at least) that y is declared but unused. The first y is actually just a declaration, which you can repeat indefinitely as long as you do not declare a conflicting type.
In my compile, they end up being named x.1281 and x.1287, so I guess it's something like the name-mangling that happens with C++. The 1281 and 1287 I would guess are the offsets of some relevance, although I am not able to see anything obvious.