I am attempting to use objcopy to convert an xml file to an object file that is then linked into and used by another shared library on RHEL5. I convert the file with this command:
objcopy --input-format binary --output-target i386-pc-linux-gnu --binary-architecture i386 baselines.xml baselines.0
The object file is created and using readelf I get the following:
Symbol table '.symtab' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 SECTION LOCAL DEFAULT 1
2: 00000000 0 NOTYPE GLOBAL DEFAULT 1 _binary_baselines_xml_sta
3: 0000132b 0 NOTYPE GLOBAL DEFAULT 1 _binary_baselines_xml_end
4: 0000132b 0 NOTYPE GLOBAL DEFAULT ABS _binary_baselines_xml_siz
So it looks like the size is in there. I dumped the file and verified the xml is embedded as ascii at offset 34 (specified by the .data value) and that it's correct. The data is 0x132b bytes in size, as specified by the variable.
Then in the code, I declare a couple variables:
extern "C"
{
extern char _binary_baselines_xml_start;
extern char _binary_baselines_xml_size;
}
static const char* xml_start = &_binary_baselines_xml_start;
const uint32_t xml_size = reinterpret_cast<uint32_t>(&_binary_baselines_xml_size);
When I step into this, the xml pointer is correct and I can see the xml text in the debugger. However, the size symbol shows the value as 0x132b (which is what I want) but it also indicates that "Address 0x132b is out of bounds". When I use the variable it is a very large incorrect random number. I've tried all sorts of other syntax to declare the extern variable such as char*, char[], int, int*, etc. The result is always the same. The value is there but I can't seem to get to it.
Another point of interest is that this code works fine on a windows machine without the prepended underscore on the extern variables but all else the same.
I can't seem to find much online about using objcopy in this manner so any help is greatly appreciated.
I am not sure what your actual issue is. The *_size symbol is an absolute symbol whose value is the size. You are not supposed to be able to dereference that location (unless by accident); it is just a way of sneaking an integer value in through the linker without actually defining a data variable. The way you are using it is correct.
The best way to think about this would be if you had the following code:
char* psize = reinterpret_cast<char*>(0x1234);
int size = reinterpret_cast<int>(psize);
The only difference is the linker fills in the 0x1234 value for you via a symbol.
If this is the wrong place for this question I apologise and please redirect me to the suitable section.
I'm somewhat rusty on installing from the command line, especially on Windows. I decided to install the latest Perl version on my PC, running under Windows 10. I had previously installed it using the Strawberry Perl download, but as it was a few versions out of date I decided to remove it and refresh my skills (ha) by installing it manually. I downloaded the latest Perl release from https://www.perl.org/get.html#win32 and have been reading the README.win32 to make sure I install it correctly.
As I need a compiler, I decided to use Gcc and dmake. I installed and can run them successfully so went back to installing Perl. As per instructions I tried running dmake in the win32 subdirectory in the Perl download folder. Before this I edited makefile.mk, where these variables are uncommented from the Build configuration section:
INST_DRV/INST_TOP (left as is)
INST_VER *= \5.26.0
USE_MULTI *= define
USE_ITHREADS *= define
USE_IMP_SYS *= define
USE_LARGE_FILES *= define
USE_64_BIT_INT *= define
USE_LONG_DOUBLE *= define
DEFAULT_INC_EXCLUDES_DOT *= define
CCTYPE = GCC
GCCWRAPV *=define
CCHOME *= C:\MinGW
(nothing else changed after this)
When I run dmake in the directory, it quickly comes to this error:
gcc -c -I.\include -I. -I.. -DWIN32 -DPERLDLL -DPERL_CORE -s -O2 -D__USE_MINGW_ANSI_STDIO -fwrapv -fno-strict-aliasing -DPERL_EXTERNAL_GLOB -DPERL_IS_MINIPERL -omini\toke.o ..\toke.c
In file included from ..\perl.h:3220:0,
from ..\toke.c:40:
./win32.h:417:13: error: conflicting types for 'mkstemp'
extern int mkstemp(const char *path);
^~~~~~~
In file included from ..\perl.h:790:0,
from ..\toke.c:40:
c:\mingw\include\stdlib.h:809:30: note: previous definition of 'mkstemp' was here
__cdecl __MINGW_NOTHROW int mkstemp (char *__filename_template)
^~~~~~~
In file included from ..\toke.c:40:0:
..\toke.c: In function 'Perl_filter_add':
..\perl.h:1756:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
#define PTR2nat(p) (PTRV)(p) /* pointer to integer of PTRSIZE */
^
..\perl.h:1769:28: note: in expansion of macro 'PTR2nat'
#define FPTR2DPTR(t,p) ((t)PTR2nat(p)) /* function pointer to data pointer */
^~~~~~~
..\toke.c:4397:21: note: in expansion of macro 'FPTR2DPTR'
IoANY(datasv) = FPTR2DPTR(void *, funcp); /* stash funcp into spare field */
^~~~~~~~~
..\perl.h:1769:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
#define FPTR2DPTR(t,p) ((t)PTR2nat(p)) /* function pointer to data pointer */
^
..\toke.c:4397:21: note: in expansion of macro 'FPTR2DPTR'
IoANY(datasv) = FPTR2DPTR(void *, funcp); /* stash funcp into spare field */
^~~~~~~~~
..\toke.c: In function 'Perl_filter_del':
..\perl.h:1756:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
#define PTR2nat(p) (PTRV)(p) /* pointer to integer of PTRSIZE */
^
..\perl.h:1769:28: note: in expansion of macro 'PTR2nat'
#define FPTR2DPTR(t,p) ((t)PTR2nat(p)) /* function pointer to data pointer */
^~~~~~~
..\toke.c:4463:26: note: in expansion of macro 'FPTR2DPTR'
if (IoANY(datasv) == FPTR2DPTR(void *, funcp)) {
^~~~~~~~~
..\perl.h:1769:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
#define FPTR2DPTR(t,p) ((t)PTR2nat(p)) /* function pointer to data pointer */
^
..\toke.c:4463:26: note: in expansion of macro 'FPTR2DPTR'
if (IoANY(datasv) == FPTR2DPTR(void *, funcp)) {
^~~~~~~~~
..\toke.c: In function 'Perl_filter_read':
..\perl.h:1756:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
#define PTR2nat(p) (PTRV)(p) /* pointer to integer of PTRSIZE */
^
..\perl.h:1768:28: note: in expansion of macro 'PTR2nat'
#define DPTR2FPTR(t,p) ((t)PTR2nat(p)) /* data pointer to function pointer */
^~~~~~~
..\toke.c:4554:13: note: in expansion of macro 'DPTR2FPTR'
funcp = DPTR2FPTR(filter_t, IoANY(datasv));
^~~~~~~~~
..\perl.h:1768:25: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
#define DPTR2FPTR(t,p) ((t)PTR2nat(p)) /* data pointer to function pointer */
^
..\toke.c:4554:13: note: in expansion of macro 'DPTR2FPTR'
funcp = DPTR2FPTR(filter_t, IoANY(datasv));
^~~~~~~~~
In file included from ..\perl.h:5644:0,
from ..\toke.c:40:
..\toke.c: In function 'S_pending_ident':
..\perl.h:1734:26: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
# define INT2PTR(any,d) (any)(d)
^
..\embed.h:427:59: note: in definition of macro 'newUNOP_AUX'
#define newUNOP_AUX(a,b,c,d) Perl_newUNOP_AUX(aTHX_ a,b,c,d)
^
..\toke.c:8912:37: note: in expansion of macro 'INT2PTR'
INT2PTR(UNOP_AUX_item *,
^~~~~~~
I get a bunch of warnings beforehand about casting from pointers to integers of different sizes, but this is the bit where it stops and produces an error. Am I missing something obvious? I haven't done this for a while, so I'm hoping it is a silly user error on my part! Thanks.
Try obtaining the MinGW source packages, and installing using makepkg-mingw to build them.
Most if not all have patches applied to customize (or fix) them for the MSYS2/MinGW environment.
Stock source downloaded from its author may not compile directly in that environment the way it would on Linux, or OS X using "configure" and "make".
Instructions are available, and there may be other similar instructions out there associated with Arch Linux.
I have a question about the ELF dynamic symbol table. For symbols of type FUNC, I have noticed a value of 0 in some binaries, but in other binaries the value is non-zero. Both binaries were generated by gcc. Why is there this difference? Are there any compiler options to control it?
EDIT: This is the output of readelf --dyn-syms prog1
Symbol table '.dynsym' contains 5 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 000082f0 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.4 (2)
3: 00008314 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.4 (2)
4: 000082fc 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.4
Here the value of the printf symbol is 82f0, which happens to be the address of the PLT entry for printf.
Output of readelf --dyn-syms prog2
Symbol table '.dynsym' contains 6 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 00000000 0 NOTYPE LOCAL DEFAULT UND
1: 00000000 0 NOTYPE WEAK DEFAULT UND __gmon_start__
2: 00000000 0 FUNC GLOBAL DEFAULT UND puts@GLIBC_2.4 (2)
3: 00000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.4 (2)
4: 00000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.4 (2)
5: 00000000 0 FUNC GLOBAL DEFAULT UND __libc_start_main@GLIBC_2.4
Here the values for all the symbols are zero.
The x86_64 SV ABI mandates that (emphasis mine):
To allow comparisons of function addresses to work as expected,
if an executable file references a function defined in a shared object,
the link editor will place the address of the procedure linkage table
entry for that function in its associated symbol table entry.
This will result in symbol table entries with section index of
SHN_UNDEF but a type of STT_FUNC and a non-zero st_value.
A reference to the address of a function from within a shared
library will be satisfied
by such a definition in the executable.
With my GCC, this program:
#include <stdio.h>
int main()
{
printf("hello %i\n", 42);
return 0;
}
when compiled directly into an executable generates a null value:
1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
But this program with a comparison of the printf function:
#include <stdio.h>
int main()
{
printf("hello %i\n", 42);
if (printf == puts)
return 1;
return 0;
}
generates a non-null value:
3: 0000000000400410 0 FUNC GLOBAL DEFAULT UND printf@GLIBC_2.2.5 (2)
In the .o file, the first program generates:
000000000014 000a00000002 R_X86_64_PC32 0000000000000000 printf - 4
and the second:
000000000014 000a00000002 R_X86_64_PC32 0000000000000000 printf - 4
000000000019 000a0000000a R_X86_64_32 0000000000000000 printf + 0
The difference is caused by the extra R_X86_64_32 relocation for getting the address of the function.
Observations from running readelf on a binary:
All the FUNC symbols that are undefined have size zero.
These undefined functions are the ones that are called through shared libraries. In my small ELF binary, all references to glibc are undefined with size zero.
From http://docs.oracle.com/cd/E19457-01/801-6737/801-6737.pdf, page 21:
A symbol table can hold three kinds of symbols. Two of them, undefined and tentative symbols, have no storage assigned. In the latter case you can see entries in the readelf output that are not undefined (they have a section index) and yet have no storage.
For clarity: undefined symbols are those that are referenced but have no storage assigned (they have not been created yet), while tentative symbols are those that have been created but without assigned storage, e.g. uninitialized symbols.
Edit:
If you are talking about the .plt, symbol binding for shared libraries is lazy.
For how to control the binding, see http://www.linuxjournal.com/article/1060:
This feature is known as lazy symbol binding. The idea is that if you have lots of shared libraries, it could take the dynamic loader lots of time to look up all of the functions to initialize all of the .plt slots, so it would be preferable to defer binding addresses to the functions until we actually need them. This turns out to be a big win if you only end up using a small fraction of the functions in a shared library. It is possible to instruct the dynamic loader to bind addresses to all of the .plt slots before transferring control to the application—this is done by setting the environment variable LD_BIND_NOW=1 before running the program. This turns out to be useful in some cases when you are debugging a program, for example. Also, I should point out that the .plt is in read-only memory. Thus the addresses used for the target of the jump are actually stored in the .got section. The .got also contains a set of pointers for all of the global variables that are used within a program that come from a shared library.
I'm running into an odd issue on kernel module load that I suspect has to do with linking and loading. How do I programmatically figure out the address of each section after it is loaded in memory (from inside the module itself)? For example, where are .bss, .data, .text and so on?
From reading this article
https://lwn.net/Articles/90913/
it is sort of in the direction that I'm looking for.
You can see the sections begin addresses like this from userspace (need root permissions):
sudo cat /sys/module/<modulename>/sections/.text
I have browsed how sysfs retrieves these addresses, and I found the following:
There is a section attributes field in struct module:
/* Section attributes */
struct module_sect_attrs *sect_attrs;
This is a collection of attribute structs:
struct module_sect_attrs {
struct attribute_group grp;
unsigned int nsections;
struct module_sect_attr attrs[0];
};
where module_sect_attr is the thing you are looking for:
struct module_sect_attr {
struct module_attribute mattr;
char *name;
unsigned long address;
};
From the module's code, the THIS_MODULE macro is actually a pointer to the struct module object. Its module_init and module_core fields point to the memory regions where all module sections are loaded.
As far as I understand, the division into sections is not accessible from module code (struct load_info is dropped after the module is loaded into memory). But if you have the module's file, you can easily deduce the section addresses after load:
module_init:
- init sections with code (.init.text)
- init sections with readonly data
- init sections with writable data
module_core:
- sections with code (.text)
- sections with readonly data
- sections with writable data
If several sections fall into one category, they are placed in the same order as in the module's file.
Within the module's code you can also print the address of any of its symbols, and then calculate the start of the section that contains that symbol.
While this question is five years old, I thought I would contribute my two-cents. I was able to access the kernel's sections in a sort of hack-y way inspired by Alex Hoppus' answer. I don't advocate doing things this way, unless you are writing the kernel module to debug things or understand the kernel etc.
Anyway, I copy the following two structs into my module to help resolve incomplete types.
struct module_sect_attr {
struct module_attribute mattr;
char *name;
unsigned long address;
};
struct module_sect_attrs {
struct attribute_group grp;
unsigned int nsections;
struct module_sect_attr attrs[0];
};
Then, in my module initialization function, I do the following to get the section addresses.
unsigned long text = 0;
unsigned int nsections = 0;
unsigned int i;
struct module_sect_attr* sect_attr;
nsections = THIS_MODULE->sect_attrs->nsections;
sect_attr = THIS_MODULE->sect_attrs->attrs;
for (i = 0; i < nsections; i++) {
if (strcmp((sect_attr + i)->name, ".text") == 0)
text = (sect_attr + i)->address;
}
Finally, it should be noted that if you are looking for the address of .rodata, .bss, or .data you will need to define constant global variables, uninitialized global variables, or regular global variables, respectively, if you don't want those sections to be omitted.
Sorry if the questions are dumb, but they are really confusing me!
According to the ELF standard, a binary is divided into segments such as the text segment (containing code and read-only data) and the data segment (containing read-write data and BSS). These are loaded into memory when the program is executed and the process is created, and the segments provide the information needed to prepare the environment for process execution.
The question is: how is it decided how much stack to allocate to a process when I am not providing a stack size during process creation?
Also, using the data segment we can determine how much memory the process requires (for global variables), but once this memory is allocated, how are variables mapped to addresses inside it?
Lastly, is there any relation between this and scatter loading? I think there is not, since scatter loading is done when the image is loaded into memory, and once control is passed to the OS, the memory allocated to executables or applications is taken care of by the OS itself.
I know these are a lot of questions, but any help will be greatly appreciated.
If you can provide any reference books or links where I can study this in detail, that is also appreciated.
Thanks a tonne! :)
The question is, how it is decided that how much stack to allocate to process, when i am not providing stack size during process creation?
When a new process is created, the execve() system call loads the new program as the process image, replacing the image of the currently running process. In other words, execve() replaces the old .text and .data segments and the heap, and resets the stack. The ELF executable is then mapped into the address space, and the stack is initialized with the environment array and the argument array passed to main().
Inside do_execve_common(), the subroutine bprm_mm_init() handles tasks such as:
Allocating a new mm_struct instance to manage the process address space, via mm_alloc().
Initializing this instance with init_new_context().
bprm_mm_init() then initializes the stack.
search_binary_handler() then searches for a suitable binary format handler, i.e. load_binary or load_shlib, to load programs or dynamic libraries respectively. After that, memory is mapped into the virtual address space and the process is ready to run once the scheduler picks it.
The stack therefore ends up looking like the diagram below when main() starts executing. From then on, the frame for each function call, including its parameters and local variables, is pushed onto the stack dynamically as the calls happen.
-----------------
| | <--- Top of the Stack
| environmental |
| variables and |
| the other |
| parameters to |
| main() |
_________________ <--- Stack Pointer
| |
| Stack Space |
| |
Also, using the data segment we can determine how much memory the process requires (for global variables) but once this memory is allocated how mapping of variables is done with the address space inside this allocated memory?
Let's try to figure out how variables are mapped to the different memory segments by debugging a simple C program:
/* File Name: elf.c : Demonstrating Global variables */
#include <stdio.h>
int add_numbers(void);
int value1 = 10; // Global Initialized: .data section
int value2; // Global Uninitialized: .bss section
int add_numbers(void)
{
int result; // Local Uninitialized: Stack section
result = value1 + value2;
return result;
}
int main(void)
{
int final_result; // Local Uninitialized: Stack section
value2 = 20;
final_result = add_numbers();
printf("The sum of %d + %d is %d\n",
value1, value2, final_result);
}
Using readelf to display .data section header as below,
$readelf -a elf
...
Section Headers:
[26] .data PROGBITS 00000000006c2060 000c2060
00000000000016b0 0000000000000000 WA 0 0 32
[27] .bss NOBITS 00000000006c3720 000c3710
0000000000002bc8 0000000000000000 WA 0 0 32
...
$readelf -x 26 elf
Hex dump of section '.data':
0x006c2060 00000000 00000000 00000000 00000000 ................
0x006c2070 0a000000 00000000 00000000 00000000 ................
...
Let's use GDB to look at what these sections contain:
(gdb) disassemble 0x006c2060
Dump of assembler code for function `data_start`:
0x00000000006c2060 <+0>: add %al,(%rax)
0x00000000006c2062 <+2>: add %al,(%rax)
0x00000000006c2064 <+4>: add %al,(%rax)
0x00000000006c2066 <+6>: add %al,(%rax)
End of assembler dump.
The first address of the .data section corresponds to the data_start symbol.
(gdb) disassemble 0x006c2070
Dump of assembler code for function `value1`:
0x00000000006c2070 <+0>: or (%rax),%al
0x00000000006c2072 <+2>: add %al,(%rax)
End of assembler dump.
....
The disassembly above is at the address of the global variable value1, which is initialized to 10 (GDB is decoding the data bytes 0a 00 00 00 as if they were instructions). But we don't see the uninitialized global variable value2 at the following addresses.
Let's print the address of value2:
(gdb) p &value2
$1 = (int *) 0x6c5eb0
(gdb) info symbol 0x6c5eb0
value2 in section .bss
(gdb) disassemble 0x6c5eb0
Dump of assembler code for function `value2`:
0x00000000006c5eb0 <+0>: add %al,(%rax)
0x00000000006c5eb2 <+2>: add %al,(%rax)
End of assembler dump.
Tada! Looking up the address of value2 reveals that the variable is stored in the .bss section. This explains how uninitialized global variables are mapped into the process memory space.
Lastly, is there any relation of this with scatter loading?
No.
I am using MPLABx and the HI Tech PICC compiler. My target chip is a PIC16F876. By looking at the pic16f876.h include file, it appears that it should be possible to set the system registers of the chip by referring to them by name.
For example, within the CCP1CON register, bits 0 to 3 set how the CCP and PWM modules work. By looking at the pic16f876.h file, it looks like it should be possible to refer to these 4 bits alone, without changing the value of the rest of the CCP1CON register.
However, I have tried to refer to these 4 bits in a variety of ways with no success.
I have tried:
CCP1CON.CCP1M=0xC0; this results in "error: struct/union required"
CCP1CON:CCP1M=0xC0; this results in "error: undefined identifier "CCP1M""
Both have failed. I have read through the Hi Tech PICC compiler manual, but cannot see how to do this.
From the pic16f876.h file, it looks to me as though I should be able to refer to these subsets within the system registers by name, as they are defined in the .h file.
Does anyone know how to accomplish this?
Excerpt from pic16f876.h
// Register: CCP1CON
volatile unsigned char CCP1CON @ 0x017;
// bit and bitfield definitions
volatile bit CCP1Y @ ((unsigned)&CCP1CON*8)+4;
volatile bit CCP1X @ ((unsigned)&CCP1CON*8)+5;
volatile bit CCP1M0 @ ((unsigned)&CCP1CON*8)+0;
volatile bit CCP1M1 @ ((unsigned)&CCP1CON*8)+1;
volatile bit CCP1M2 @ ((unsigned)&CCP1CON*8)+2;
volatile bit CCP1M3 @ ((unsigned)&CCP1CON*8)+3;
#ifndef _LIB_BUILD
volatile union {
struct {
unsigned CCP1M : 4;
unsigned CCP1Y : 1;
unsigned CCP1X : 1;
};
struct {
unsigned CCP1M0 : 1;
unsigned CCP1M1 : 1;
unsigned CCP1M2 : 1;
unsigned CCP1M3 : 1;
};
} CCP1CONbits @ 0x017;
#endif
You need to access the bitfield members through an instance of the struct. In this case, that is CCP1CONbits. Because CCP1M is a four-bit field, you assign only those four bits in your code, not the full eight-bit register value.
So:
CCP1CONbits.CCP1M = 0x0c;
should be the equivalent of what you are trying to do. If you want to set all eight bits at once you can use CCP1CON = 0x0c. Since CCP1M occupies bits 0 to 3 of the register, that would set the CCP1M bits to 0x0c and all the other bits to zero.
The header you gave also has individual bit symbols, so you could do this too (CCP1M0 is the least significant bit, so 0x0c means bits 2 and 3 are set):
CCP1M0 = 0;
CCP1M1 = 0;
CCP1M2 = 1;
CCP1M3 = 1;
Although the bitfield approach is cleaner.