I am learning about linking.
I wrote the following code in C and compiled it to a .o file using gcc:
int f()
{
static int x=0;
return x;
}
extern int z;
int g()
{
static int x=10;
return x;
}
static int y;
static int y=9;
int main()
{
return 0;
}
Then I compiled it to an object file with:
gcc -c begin.c -o begin.o
Now when I checked the symbol table using readelf, there was no record of z. Why?
Also, how does gcc allow two definitions of 'y'?
And in the .data section, how are the two 'x' variables differentiated?
I'm a rank non-expert, but I can hazard a couple of guesses:
Since you only declare z but never actually use it, there is no need to maintain a reference to it.
Try adding -Wall, which will tell you (at least) that y is defined but unused. The first y is actually just a tentative definition, which you can repeat indefinitely as long as you do not declare a conflicting type.
In my compile, they end up being named x.1281 and x.1287, so I guess it's something like the name mangling that happens with C++. I would guess that 1281 and 1287 are offsets of some relevance, although I am not able to see anything obvious.
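The three points above can be tried out in one small translation unit. This is only a sketch: check_statics is a hypothetical helper added so that y is referenced, and the exact local symbol names (such as x.1281) depend on the compiler version.

```c
#include <assert.h>

/* Declaration only: no storage is reserved here and z is never used,
   so the compiler has no reason to emit an undefined symbol for it. */
extern int z;

/* A tentative definition followed by the real definition of the same
   internal-linkage object. C treats them as one y, initialised to 9. */
static int y;
static int y = 9;

/* Each function-local static is a separate object, even though both
   are spelled x in the source. */
int f(void)
{
    static int x = 0;
    return x;
}

int g(void)
{
    static int x = 10;
    return x;
}

/* Hypothetical self-check: the two x objects are independent of each
   other, and there is exactly one y. */
void check_statics(void)
{
    assert(f() == 0);
    assert(g() == 10);
    assert(y == 9);
}
```

Compiling this with gcc -c and running readelf -s on the result should list two distinct local x symbols and no entry for z.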
Related
Consider the simple C file:
int main()
{
int x = 0;
return x;
}
I'd like to extract a few things:
Ideally, all named things and their type. So I should get something like: int, main, function and int x variable or something like that
Extra points if the variable initialization is there as well
I'm sure that gcc does this internally, since that's an important compilation step, but I'm not sure if I can extract this information.
I'm also not strictly tied to gcc, so if another compiler does this I can consider it, but gcc is preferred.
I have a strange segmentation fault that doesn't exist when everything is in 1 .c file, but does exist when I put part of the code in a dynamically linked library and link it to a test file. The complete code for the working 1 .c file code is at the bottom, the complete code for the error system with 2 .c and 1 .h file come first.
Here is the error system:
example.h:
#include <stdio.h>
#include <stdlib.h>
typedef struct MYARRAY {
int len;
void* items[];
} MYARRAY;
MYARRAY *collection;
void
mypush(void* p);
example.c:
#include "example.h"
void
mypush(void* p) {
printf("Here %zu\n", sizeof collection);
puts("FOO");
int len = collection->len++;
puts("BAR");
collection->items[len] = p;
}
example2.c:
This is essentially a test file:
#include "example.h"
void
test_print() {
puts("Here1");
mypush("foo");
puts("Here2");
}
int
main() {
collection = malloc(sizeof *collection + (sizeof collection->items[0] * 1000));
collection->len = 0;
puts("Start");
test_print();
puts("Done");
return 0;
}
Makefile:
I link example to example2 here, and run:
example:
	@clang -I . -dynamiclib \
		-undefined dynamic_lookup \
		-o example.dylib example.c
	@clang example2.c example.dylib -o example2.o
	@./example2.o
.PHONY: example
The output is:
$ make example
Start
Here1
Here 8
FOO
make: *** [example] Segmentation fault: 11
But it should show the full output of:
$ make example
Start
Here1
Here 8
FOO
BAR
Here2
Done
The weird thing is everything works if it is this system:
example.c:
#include <stdio.h>
#include <stdlib.h>
typedef struct MYARRAY {
int len;
void* items[];
} MYARRAY;
MYARRAY *collection;
void
mypush(void* p) {
printf("Here %zu\n", sizeof collection);
puts("FOO");
int len = collection->len++;
puts("BAR");
collection->items[len] = p;
}
void
test_print() {
puts("Here1");
mypush("foo");
puts("Here");
}
int
main() {
collection = malloc(sizeof *collection + (sizeof collection->items[0] * 1000));
collection->len = 0;
puts("ASF");
test_print();
return 0;
}
Makefile:
example:
	@clang -o example example.c
	@./example
.PHONY: example
Wondering why it's creating a segmentation fault when it is linked like this, and what I am doing wrong.
I have checked otool and with DYLD_PRINT_LIBRARIES=YES and it shows it is importing the dynamically linked libraries, but for some reason it's segmentation faulting when linked but works fine when it isn't linked.
Your problem is this, in example.h:
MYARRAY *collection;
Since both example2.c and example.c include this file, you end up defining collection twice, which results in undefined behavior. You need to make sure you define each object only once. The details are relatively unimportant since anything can happen with undefined behavior, but what's probably happening is that example2.c is allocating memory for one object, while the one example.c is using is still NULL.
As mentioned in the comments, since you define collection in example2.c, your linker is able to build the executable without needing to look for that symbol in the dynamic library, so you don't get a link-time warning about it being defined there too, and obviously there'd be no cause for a warning at the time you compile the library.
It works for you when you put everything in one file because then you're not defining anything twice anymore. The error itself has nothing to do with the fact that you're using a dynamic library, although that may have made it harder to detect.
It would be better to define this in example.c and provide a constructor function; there's no need for main() to be able to access it directly. But if you must do this, then define it in example.c and just declare an extern identifier in the header file to tell example2.c that the object is defined somewhere else.
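A minimal sketch of that layout, shown as a single file with comments marking where each part would go. The constructor name myinit and its capacity parameter are hypothetical and not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>

/* --- would live in example.h: declarations only, no storage --- */
typedef struct MYARRAY {
    int len;
    void *items[];              /* flexible array member */
} MYARRAY;

extern MYARRAY *collection;     /* declared here, defined once in example.c */
int  myinit(size_t capacity);   /* hypothetical constructor */
void mypush(void *p);

/* --- would live in example.c: the single definition --- */
MYARRAY *collection = NULL;

int myinit(size_t capacity)
{
    collection = malloc(sizeof *collection
                        + capacity * sizeof collection->items[0]);
    if (collection == NULL)
        return -1;
    collection->len = 0;
    return 0;
}

void mypush(void *p)
{
    /* Catches the "still NULL" situation described above. */
    assert(collection != NULL);
    collection->items[collection->len++] = p;
}
```

With this arrangement example2.c would call myinit(1000) instead of calling malloc itself, so only one object named collection exists in the whole program. The sketch does not store the capacity, so bounds checking is left out.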
I'm playing around with the gcc and g++ compilers and trying to compile some C code with them. My purpose is to see how the compiler/linker enforces that, when linking a module containing a function declaration to a module containing the implementation of that function, the correct functions are linked (in terms of parameters passed and values returned).
For example, let's take a look at this code:
#include <stdio.h>
extern int foo(int b, int c);
int main()
{
int f = foo(5, 8);
printf("%d",f);
}
After compilation, my symbol table has a symbol for foo, but the ELF file format has no place that describes the arguments taken or the function signature (int(int,int)). So if I write some other code such as this:
char foo(int a, int b, int c)
{
return (char) ( a + b + c );
}
and compile that module, it will also have a symbol called foo. What happens if I link these modules together? I had never thought of this, and I wonder how a compiler overcomes this weakness. I know that g++ generates a prefix for every symbol according to its namespace, but does it also take the signature into account? If anyone has encountered this, it would be great if they could shed some light on this problem.
The problem is solved with name mangling.
In compiler construction, name mangling (also called name decoration)
is a technique used to solve various problems caused by the need to
resolve unique names for programming entities in many modern
programming languages.
It provides a way of encoding additional information in the name of a
function, structure, class or another datatype in order to pass more
semantic information from the compilers to linkers.
The need arises where the language allows different entities to be
named with the same identifier as long as they occupy a different
namespace (where a namespace is typically defined by a module, class,
or explicit namespace directive) or have different signatures (such as
function overloading).
Note the simple example:
Consider the following two definitions of f() in a C++ program:
int f (void) { return 1; }
int f (int) { return 0; }
void g (void) { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart
from the name. If they were natively translated into C with no
changes, the result would be an error — C does not permit two
functions with the same name. The C++ compiler therefore will encode
the type information in the symbol name, the result being something
resembling:
int __f_v (void) { return 1; }
int __f_i (int) { return 0; }
void __g_v (void) { int i = __f_v(), j = __f_i(0); }
Notice that g() is mangled even though there is no conflict; name
mangling applies to all symbols.
Wow, I kept exploring and testing on my own and came up with a solution that quite amazed me.
I wrote the following code and compiled it with gcc:
main.c
#include <stdio.h>
extern int foo(int a, char b);
int main()
{
int g = foo(5, 6);
printf("%d", g);
return 0;
}
foo.c
typedef struct{
int a;
int b;
char c;
char d;
} mystruct;
mystruct foo(int a, int b)
{
mystruct my;
my.a = a;
my.b = a + 1;
my.c = (char) b;
my.d = (char) (b + 1);
return my;
}
First I compiled foo.c to foo.o with gcc and checked the symbol table using readelf; it had an entry called foo.
After that I compiled main.c to main.o and checked its symbol table, which also had an entry called foo. I linked the two together and, surprisingly, it worked. I ran the result and, as expected, got a segmentation fault. That makes sense: the actual implementation of foo in foo.o probably expects three parameters (the first being the address of the struct to return), a parameter that main.o does not pass under its declaration of foo. The implementation then reads memory from main's stack frame that doesn't belong to it, tries to access the address it thinks it received, and ends up with a segmentation fault. That's fine.
Then I compiled both modules again with g++ instead of gcc, and what came up was amazing. The symbol entry in foo.o was _Z3fooii and in main.o it was _Z3fooic. My guess is that the ii suffix means int, int and the ic suffix means int, char, which refers to the parameters that should be passed to the function and so lets the linker check that a function declaration gets the matching implementation. So I changed my foo declaration in main.c to
extern int foo(int a, int b);
recompiled, and this time got the symbol _Z3fooii. I linked both modules again and this time it worked. I tried running it and again got a segmentation fault, which also makes sense, since the mangled name does not encode the return type and so the linker cannot check it. Anyway, this confirmed my original thought: g++ includes the function's parameter types in the symbol name and so forces the linker to match each function declaration with an implementation that takes the same parameters.
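In plain C, where the symbol name carries no type information, the usual safeguard is to put the prototype in a header that both the caller and the implementation include, so that the compiler rejects a mismatched definition before the linker ever sees it. A minimal single-file sketch, assuming a hypothetical header foo.h and a hypothetical helper call_foo standing in for main:

```c
#include <assert.h>

/* --- would live in foo.h, included by both main.c and foo.c --- */
int foo(int a, int b);

/* --- would live in foo.c --- */
/* Because the prototype above is visible here, changing this definition
   to something like `char foo(int a, int b, int c)` becomes a compile-time
   "conflicting types" error instead of a silent mismatch at link time. */
int foo(int a, int b)
{
    return a + b;
}

/* --- would live in main.c --- */
int call_foo(void)
{
    int r = foo(5, 8);
    assert(r == 13);    /* sanity check on the shared declaration */
    return r;
}
```

This only helps when every file includes the same header. A hand-written extern declaration in main.c, as in the experiment above, bypasses the check.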
I'm trying to create a library with two versions of the same function using the __asm__(".symver ...... approach.
library.h
#ifndef CTEST_H
#define CTEST_H
int first(int x);
int second(int x);
#endif
library.cpp
#include "simple.h"
#include <stdio.h>
__asm__(".symver first_1_0,first@LIBSIMPLE_1.0");
int first_1_0(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 1;
}
__asm__(".symver first_2_0,first@@LIBSIMPLE_2.0");
int first_2_0(int x)
{
int y;
printf("lib: %d\n", y);
printf("lib: %s\n", __FUNCTION__);
return (x + 1) * 1000;
}
int second(int x)
{
printf("lib: %s\n", __FUNCTION__);
return x + 2;
}
And here is the version script file:
LIBSIMPLE_1.0{
global:
first; second;
local:
*;
};
LIBSIMPLE_2.0{
global:
first;
local:
*;
};
When I build the library using gcc, everything works well, and I am able to link to the library binary. Using the nm tool I see that both the first() and second() function symbols are exported.
Now, when I try to use g++, none of the symbols are exported.
So I tried to use the extern "C" directive to wrap both declarations:
extern "C" {
int first(int x);
int second(int x);
}
nm shows that the second() function symbol is exported, but first() still remains unexported and mangled.
What am I missing to make this work? Or is it impossible to achieve this with the C++ compiler?
I don't know why, with 'extern "C"', 'first' was not exported; I suspect there is something else interfering.
Otherwise C++ name mangling is certainly a pain here. The 'asm' directives (AFAIK) require the mangled names for C++ functions, not the simple 'C' name. So 'int first(int)' would need to be referenced as (e.g.) '_Z5firsti' instead of just 'first'. This is, of course, a real pain as far as portability goes...
The linker map file is more forgiving, as it supports 'extern "C++" {...}' blocks to list C++ symbols in their as-written form, e.g. 'int first(int)'.
This whole process is a maintenance nightmare. What I'd really like would be a function attribute which could be used to specify the alias and version...
Just to add a reminder that C++11 now supports inline namespaces which can be used to provide symbol versioning in C++.
As this is my first post to stackoverflow I want to thank you all for your valuable posts that helped me a lot in the past.
I use MinGW (gcc 4.4.0) on Windows-7(64) - more specifically I use Nokia Qt + MinGW but Qt is not involved in my Question.
I need to find the address and (more importantly) the length of specific functions of my application at runtime, in order to encode/decode these functions and implement a software protection system.
I already found a way to compute the length of a function, by assuming that static functions placed one after another in a source file are also placed sequentially in the compiled object file and subsequently in memory.
Unfortunately this is true only if the whole CPP file is compiled with option: "g++ -O0" (optimization level = 0).
If I compile it with "g++ -O2" (which is the default for my project) the compiler seems to relocate some of the functions and as a result the computed function length seems to be both incorrect and negative(!).
This is happening even if I put a "#pragma GCC optimize 0" line in the source file,
which is supposed to be the equivalent of a "g++ -O0" command line option.
I suppose that "g++ -O2" instructs the compiler to perform some global file-level optimization (some function relocation?) which is not avoided by using the #pragma directive.
Do you have any idea how to prevent this, without having to compile the whole file with -O0 option?
OR: Do you know of any other method to find the length of a function at runtime?
I prepared a small example for you, with the results for different compilation options, to highlight the case.
The Source:
// ===================================================================
// test.cpp
//
// Intention: To find the addr and length of a function at runtime
// Problem: The application output is correct when compiled with: "g++ -O0"
// but it's erroneous when compiled with "g++ -O2"
// (although a directive "#pragma GCC optimize 0" is present)
// ===================================================================
#include <stdio.h>
#include <math.h>
#pragma GCC optimize 0
static int test_01(int p1)
{
putchar('a');
putchar('\n');
return 1;
}
static int test_02(int p1)
{
putchar('b');
putchar('b');
putchar('\n');
return 2;
}
static int test_03(int p1)
{
putchar('c');
putchar('\n');
return 3;
}
static int test_04(int p1)
{
putchar('d');
putchar('\n');
return 4;
}
// Print a HexDump of a specific address and length
void HexDump(void *startAddr, long len)
{
unsigned char *buf = (unsigned char *)startAddr;
printf("addr:%ld, len:%ld\n", (long )startAddr, len);
len = (long )fabs(len);
while (len)
{
printf("%02x.", *buf);
buf++;
len--;
}
printf("\n");
}
int main(int argc, char *argv[])
{
printf("======================\n");
long fun_len = (long )test_02 - (long )test_01;
HexDump((void *)test_01, fun_len);
printf("======================\n");
fun_len = (long )test_03 - (long )test_02;
HexDump((void *)test_02, fun_len);
printf("======================\n");
fun_len = (long )test_04 - (long )test_03;
HexDump((void *)test_03, fun_len);
printf("Test End\n");
getchar();
// Just a trick to block optimizer from eliminating test_xx() functions as unused
if (argc > 1)
{
test_01(1);
test_02(2);
test_03(3);
test_04(4);
}
}
The (correct) Output when compiled with "g++ -O0":
[note the 'c3' byte (= assembly 'ret') at the end of all functions]
======================
addr:4199344, len:37
55.89.e5.83.ec.18.c7.04.24.61.00.00.00.e8.4e.62.00.00.c7.04.24.0a.00.00.00.e8.42
.62.00.00.b8.01.00.00.00.c9.c3.
======================
addr:4199381, len:49
55.89.e5.83.ec.18.c7.04.24.62.00.00.00.e8.29.62.00.00.c7.04.24.62.00.00.00.e8.1d
.62.00.00.c7.04.24.0a.00.00.00.e8.11.62.00.00.b8.02.00.00.00.c9.c3.
======================
addr:4199430, len:37
55.89.e5.83.ec.18.c7.04.24.63.00.00.00.e8.f8.61.00.00.c7.04.24.0a.00.00.00.e8.ec
.61.00.00.b8.03.00.00.00.c9.c3.
Test End
The erroneous Output when compiled with "g++ -O2":
(a) function test_01 addr & len seem correct
(b) functions test_02, test_03 have negative lengths,
and fun. test_02 length is also incorrect.
======================
addr:4199416, len:36
83.ec.1c.c7.04.24.61.00.00.00.e8.c5.61.00.00.c7.04.24.0a.00.00.00.e8.b9.61.00.00
.b8.01.00.00.00.83.c4.1c.c3.
======================
addr:4199452, len:-72
83.ec.1c.c7.04.24.62.00.00.00.e8.a1.61.00.00.c7.04.24.62.00.00.00.e8.95.61.00.00
.c7.04.24.0a.00.00.00.e8.89.61.00.00.b8.02.00.00.00.83.c4.1c.c3.57.56.53.83.ec.2
0.8b.5c.24.34.8b.7c.24.30.89.5c.24.08.89.7c.24.04.c7.04.
======================
addr:4199380, len:-36
83.ec.1c.c7.04.24.63.00.00.00.e8.e9.61.00.00.c7.04.24.0a.00.00.00.e8.dd.61.00.00
.b8.03.00.00.00.83.c4.1c.c3.
Test End
This is happening even if I put a "#pragma GCC optimize 0" line in the source file, which is supposed to be the equivalent of a "g++ -O0" command line option.
I don't believe this is true: it is supposed to be the equivalent of attaching __attribute__((optimize(0))) to subsequently defined functions, which causes those functions to be compiled with a different optimisation level. But this does not affect what goes on at the top level, whereas the command line option does.
If you really must do horrible things that rely on top level ordering, try the -fno-toplevel-reorder option. And I suspect that it would be a good idea to add __attribute__((noinline)) to the functions in question as well.
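The suggestion above could look roughly like this. It is only a sketch: it assumes GCC (the noinline attribute and -fno-toplevel-reorder are GCC features), the function bodies are simplified stand-ins for test_01 and test_02, and the helper names estimated_len_test_01 and call_both are hypothetical. Even with these settings the language does not guarantee that the address difference equals the function's length.

```c
#include <assert.h>
#include <stdint.h>

/* Build with, for example:  gcc -O2 -fno-toplevel-reorder file.c
   noinline keeps each function as a separate out-of-line body. */
__attribute__((noinline)) static int test_01(int p1)
{
    return p1 + 1;
}

__attribute__((noinline)) static int test_02(int p1)
{
    return p1 + 2;
}

/* Distance in bytes between the two entry points. Converting function
   pointers to integers is implementation-defined, so treat the result
   as an estimate rather than a guaranteed length. */
long estimated_len_test_01(void)
{
    return (long)((intptr_t)test_02 - (intptr_t)test_01);
}

/* Calls both functions so that neither is discarded as unused. */
int call_both(void)
{
    int sum = test_01(1) + test_02(2);
    assert(sum == 6);   /* sanity check that both bodies still run */
    return sum;
}
```

If the estimate comes out negative or implausibly large, the functions were still reordered or padded, and the symbol sizes reported by a tool such as nm or objdump would be a more reliable source.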