I'm building an embedded Linux, and I'm encountering an error caused by the LMA and VMA addresses of various sections not being equal:
> /opt/tc/uclibc-crosstools-gcc-4.6/usr/bin/mips-linux-uclibc-objdump -h vmlinux
...
9 __modver 00000470 802b6b90 802b6b90 002aab90 2**0
ALLOC
10 .data 002f5e20 802b8000 802b7b90 002abb90 2**14
CONTENTS, ALLOC, LOAD, DATA
11 .init.text 0001c020 805ae000 805adb90 005a1b90 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
...
The problem I'm having is that the auto generated linker script (arch/mips/kernel/vmlinux.lds) has the following line:
.init.data : AT(ADDR(.init.data) - 0) { ...}
To me, this indicates that the .init.text VMA should equal the .init.text LMA. I also tried manually adding an AT for .data, so that the script contained .data : AT(ADDR(.data)), but that doesn't shift .data back to the correct location either. One item of interest is that the LMAs and VMAs differ by 0x470 bytes, which is exactly the size of the __modver section. Can anyone shed any light on why I'm getting this behavior?
(I'm using buildroot 2011.11, uClibc 0.9.32.1, gcc 4.6, and linux 3.2 for a mips architecture.)
Thanks
John
I'm answering my own question in case someone else comes across the same issue; it might save them some time. It turns out there is a bug in the linker. The modver section was empty but contained an ALIGN directive. This appears to confuse the linker, which then throws off the LMAs of all subsequent sections. The solution was to force a single-byte variable to be included in modver (but not between start_modver and end_modver, otherwise you introduce new problems). This fixes the problem. The linker will eventually have to be fixed.
John
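The 0x470 offset mentioned in the question can be checked mechanically. The following is a minimal sketch, not part of the original posts, that parses `objdump -h` style lines and reports every section whose VMA and LMA differ. The function name `vma_lma_gaps` and the regular expression are illustrative assumptions; the sample text is copied from the listing in the question.

```python
import re

# Matches the first line of an `objdump -h` section entry:
# index, name, size, VMA, LMA, file offset, alignment.
SECTION_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)"
    r"\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+2\*\*(\d+)\s*$"
)

def vma_lma_gaps(objdump_text):
    """Return [(name, vma - lma)] for sections whose VMA and LMA differ."""
    gaps = []
    for line in objdump_text.splitlines():
        m = SECTION_RE.match(line)
        if not m:
            continue  # flag lines such as "CONTENTS, ALLOC, ..." are skipped
        name = m.group(2)
        vma = int(m.group(4), 16)
        lma = int(m.group(5), 16)
        if vma != lma:
            gaps.append((name, vma - lma))
    return gaps

# Section lines taken from the objdump output in the question.
SAMPLE = """\
9 __modver 00000470 802b6b90 802b6b90 002aab90 2**0
ALLOC
10 .data 002f5e20 802b8000 802b7b90 002abb90 2**14
CONTENTS, ALLOC, LOAD, DATA
11 .init.text 0001c020 805ae000 805adb90 005a1b90 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
"""

for name, gap in vma_lma_gaps(SAMPLE):
    print(f"{name}: VMA - LMA = {gap:#x}")
```

On the sample, both .data and .init.text show a gap equal to the size of __modver, which matches the observation in the question.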
The question is about loading portable executable images to a random address.
Let's take kernel32.dll as an example, loaded at 0x75A00000.
I can see that at offset 0x10e15 from the image, there is an assembler instruction, which depends on where the image is located.
address:
75A10E13
bytes:
8B 35 18 03 AE 75
command:
MOV ESI,DWORD PTR DS:[75AE0318]
It turns out that when launching the executable file, we must tell the system that this address needs to be relocated.
The system looks at the relocation table, which is in the executable file, and sees the following:
(Screenshot: base relocation table)
To get the absolute address of the first element to be moved, I do the following: add the virtual address to the address of the image, and then I add the first element of the block to the resulting number.
0x75A00000 + 0x10000 + 0x3E15 = 0x75A13E15
It's a good number, but it is always 0x3000 more than I expect. If I just subtract 0x3000, it works. Please help me find the answer: where does the 0x3000 on x86 come from?
Relocations in Portable Executables are resolved when the file is linked. The base relocation table, to which you are referring, has a different function: it is used by the Windows loader when the PE cannot be loaded at the preferred ImageBase address specified by the linker, usually 0x0040_0000.
Dynamically loaded libraries shipped with MS Windows are linked to ImageBase addresses that differ for each core DLL and are chosen not to collide with one another, so an executable that imports the usual combination of libraries doesn't have to relocate them.
You misinterpreted the format of base relocation section .reloc.
The 16-bit TypeOrOffset words that follow PageRVA and BlockSize have their Base Relocation Type encoded in the four most significant bits.
For instance, the first TypeOrOffset entry in your dump, 0x3E15, has type IMAGE_REL_BASED_HIGHLOW (3) and offset 0x0E15, which is the number to be added to PageRVA.
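The split described above can be sketched in a few lines. This is only an illustration using the numbers from the question; the helper name `decode_reloc_entry` and the constant names are made up for the example.

```python
def decode_reloc_entry(entry):
    """Split a 16-bit TypeOrOffset word into (type, offset)."""
    reloc_type = entry >> 12   # four most significant bits
    offset = entry & 0x0FFF    # remaining twelve bits
    return reloc_type, offset

IMAGE_BASE = 0x75A00000  # where kernel32.dll was loaded in the question
PAGE_RVA = 0x10000       # PageRVA of the relocation block
ENTRY = 0x3E15           # first TypeOrOffset word in the dump

reloc_type, offset = decode_reloc_entry(ENTRY)
target = IMAGE_BASE + PAGE_RVA + offset
print(reloc_type, hex(offset), hex(target))
```

With the type bits masked off, the target is 0x75A10E15, which is the address of the 32-bit operand inside the MOV instruction at 0x75A10E13. The extra 0x3000 in the question was the relocation type, not part of the offset.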
Brainhive,
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
My code is compiled to a static library and I don't want to hash the entire binary, only my library.
How do I? Will it help adding an ld script with reserved labels?
System is arm64 and I'm using GNU arm compiler (linaro implementation).
I'm looking for a way to make sure my code wasn't altered, initial thought was to find the start address of the .text section and the size of it, run md5sum (or other hash) and compare to a constant.
There are several reasons your request is likely misguided:
Anybody who is willing to modify your compiled code, will also be willing to modify the checksum that you are going to compare against at runtime. That is, you appear to want to do something like:
/* 0xabcd1234 is the precomputed checksum over the library. */
if (checksum_over_my_code() != 0xabcd1234) abort();
The attacker can easily replace this entire code with a sequence of NOP instructions, and proceed to use your modified library.
Your static library (usually) doesn't end up as a single contiguous sequence of bytes in the final binary. If you have foo.o and bar.o in your library, and the end user links your library with his own code in main.o and baz.o, then the .text section of the resulting executable could well be composed of .text from main.o, then .text from foo.o, then .text from baz.o, and finally .text from bar.o.
When the final executable is linked, the instructions in your library are updated (relocated). That is, suppose your original code has a CALL foo instruction. The actual bytes in your .text section will be something like 0xE8 0x00 0x00 0x00 0x00 (with a relocation record stating that the bytes following 0xE8 should be updated with whatever the final address of foo ends up being).
After the link is done, and assuming foo ends up at address 0x08010203, the bytes in .text of the executable will no longer be 0s. Instead they'll be 0xE8 0x03 0x02 0x01 0x08 (they actually wouldn't be that for reasons irrelevant here, but they certainly wouldn't be all 0s).
So computing the checksum over actual .text section of your archive library is completely pointless.
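To make the relocation point concrete, here is a small sketch, not taken from the answer, that patches the operand of an x86 CALL rel32 (opcode 0xE8) for two hypothetical link layouts and hashes the result. The helper `apply_rel32_call`, the section addresses, and the use of SHA-256 instead of md5sum are all assumptions made for illustration.

```python
import hashlib
import struct

def apply_rel32_call(code, operand_offset, section_addr, target_addr):
    """Patch a 4-byte PC-relative operand roughly the way a linker would."""
    # The displacement is relative to the address of the next instruction,
    # i.e. the byte right after the 4-byte operand.
    next_insn = section_addr + operand_offset + 4
    disp = (target_addr - next_insn) & 0xFFFFFFFF
    return code[:operand_offset] + struct.pack("<I", disp) + code[operand_offset + 4:]

# CALL foo as it sits in the archive: operand not yet filled in.
unlinked = bytes([0xE8, 0x00, 0x00, 0x00, 0x00])

# Two hypothetical layouts: same library code, different final addresses.
linked_a = apply_rel32_call(unlinked, 1, section_addr=0x08000000, target_addr=0x08010203)
linked_b = apply_rel32_call(unlinked, 1, section_addr=0x08002000, target_addr=0x08010203)

for label, blob in (("unlinked", unlinked), ("layout A", linked_a), ("layout B", linked_b)):
    print(label, blob.hex(), hashlib.sha256(blob).hexdigest()[:16])
```

All three byte sequences differ, so a checksum precomputed over the archive's .text cannot match what ends up in the executable, and it changes again whenever the end user's link layout changes.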
There are tools that let you dump an ELF section. elfcat makes it very easy (elfcat --section-name=test the_file.o), but it should also be doable with objdump. Once you've dumped the section, the problem is reduced to sizing and hashing a file.
According to the ld manual on Output Section Description:
section [address] [(type)] :
[AT(lma)]
[ALIGN(section_align) | ALIGN_WITH_INPUT]
[SUBALIGN(subsection_align)]
[constraint]
{
output-section-command
output-section-command
...
} [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp] [,]
The address or >region stand for the VMA, i.e. the Virtual Memory Address of the output section.
The AT() or AT>lma_region stand for the LMA, i.e. the Load Memory Address of the output section.
I decided to take a closer look, using readelf -e to dump the section headers and program headers of a helloworld ELF file. The result is below:
My questions are:
Why is there no LMA in the dumped headers? How is the LMA represented in an ELF file?
What does the Addr column in the red rectangle mean? VMA?
What does the PhysAddr in the green rectangle mean?
ADD 1
So far, it seems that PhysAddr is the LMA.
Why is there no LMA in the dumped headers? How is the LMA represented in an ELF file?
First, there is no LMA field in the section headers of an ELF file. It is actually quite simple: the sections in an ELF file are mapped into segments. If a section has a loadable type (PROGBITS, for example) and the segment it is mapped into is also of a loadable type (LOAD, for example), then every such section in that segment is loaded into memory. Where? Simply at the VMA it was given. So a section header carries no separate LMA; for a section that is to be loaded, as indicated by its type and flags, the address it records is the VMA.
What does the addr column in the red rectangle mean?
This is directly related to your previous question. Yes, it means VMA. To explain this properly, we need to understand that the ELF format was designed for architectures that support some form of memory protection or memory segmentation.
You might want to give some sections special permissions. Instead of giving every section its own memory protection, you map multiple sections into a segment and give that segment its own memory protection.
This creates the need to map sections into segments. How does the OS loader know how to map each section into a segment and give it the appropriate memory protection? By its address.
Each section is given an address, and by those addresses, offsets, and sizes the sections are mapped into a segment, which as a whole is allocated in memory and given memory protection rules that apply to all of its sections.
The only way the OS can know how to map these is by address, so yes, if the section is of a loadable type, its Addr means VMA
(at least on modern systems that use virtual memory and don't abuse the ELF file).
What does the PhysAddr mean?
As far as I know, PhysAddr is only relevant to old-fashioned architectures in which physical addressing matters to user-space programs. This field should hold the actual physical address at which the segment would sit, yet most modern systems simply ignore it.
I suggest you read http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf.
Personally, when I was learning this back in the day, it helped me a lot and taught me a great deal about ELF files.
Hopefully I've helped you somehow! :)
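For anyone who wants to look at the two address fields directly, here is a minimal sketch that reads the program headers of an ELF file and prints what readelf shows as VirtAddr and PhysAddr (the p_vaddr and p_paddr fields). It assumes a little-endian ELF64 image and handles nothing else; the function name `load_segments` and the synthetic test image are made up for the example. As far as I know, GNU ld stores an output section's LMA in p_paddr of the containing segment, but treat that as something to verify against your own toolchain.

```python
import struct

PT_LOAD = 1
EHDR64 = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header, little-endian
PHDR64 = struct.Struct("<IIQQQQQQ")          # ELF64 program header, little-endian

def load_segments(image):
    """Return [(p_vaddr, p_paddr)] for each PT_LOAD segment of an ELF64 LE image."""
    fields = EHDR64.unpack_from(image, 0)
    ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    segments = []
    for i in range(e_phnum):
        p_type, _flags, _offset, p_vaddr, p_paddr, *_rest = PHDR64.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_paddr))
    return segments

# Synthetic image with one PT_LOAD segment whose two addresses differ.
# For a real file, use: image = open("helloworld", "rb").read()
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
ehdr = EHDR64.pack(ident, 2, 8, 1, 0, EHDR64.size, 0, 0,
                   EHDR64.size, PHDR64.size, 1, 0, 0, 0)
phdr = PHDR64.pack(PT_LOAD, 6, 0, 0x802B8000, 0x802B7B90, 0, 0, 0x4000)

for vaddr, paddr in load_segments(ehdr + phdr):
    print(f"VirtAddr={vaddr:#x} PhysAddr={paddr:#x}")
```

On a typical user-space executable the two columns are equal; they usually only differ in images built with AT() or AT> in the linker script, such as kernels and firmware.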
I believe that MODULE_VERSION does not work if the driver is statically compiled into the kernel. The version number was nowhere to be seen in sysfs, and modinfo does not work because it's not a loaded module.
So what's the best way to either get the MODULE_VERSION of this driver or encode a version number in the driver? Is there a standard way of doing this, or should I simply use sysfs?
First of all, it doesn't make much sense to have a module version for in-tree modules. Otherwise, it is kept in a special section called __modver.
$ objdump -h ~/prj/TMP/out/mfld/vmlinux -j __modver
/home/andy/prj/TMP/out/mfld/vmlinux: file format elf32-i386
Sections:
Idx Name Size VMA LMA File off Algn
12 __modver 00000c40 c1a003c0 01a003c0 00a013c0 2**2
CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
It contains pointers to corresponding structures defined in include/linux/module.h in macro MODULE_VERSION.
I have attempted the following test to see if the .data section gets loaded into memory when the program is executed:
global _start
section .data
arr times 99999999 DB 0xAF
section .text
_start:
jmp _start ; prevent process from terminating
Assemble and link:
nasm -f win32 D:\file.asm
link D:\file.obj /OUT:D:\file.exe /ENTRY:start /SUBSYSTEM:CONSOLE
I have executed the program, and the result was the following:
As you can see the program only occupied 276 KB of memory while it has an array with a size of 99999999 bytes!
The paging model on most systems causes the pages that make up the sections of a binary, and that do not require some kind of dynamic linking, to be loaded only when they are accessed; Windows is no exception. So the .data section is memory-mapped from the binary file into your process's address space, but it is not actually paged in until you need it. By default, the process monitor reports only the memory that is actually resident, although you can configure the columns to show all of the memory in the image as well. There may also be compiler options you can use to change the paging behavior, and you can always remap the memory manually (perhaps locking it in) if you need to.
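The same behavior can be approximated from user code with an ordinary memory-mapped file. The sketch below is an analogy rather than a reproduction of what the Windows loader does: it maps a file filled with 0xAF bytes (16 MiB here instead of the question's 99999999 bytes, to keep it quick) and touches a single byte. The helper name `read_one_byte_lazily` is made up for the example.

```python
import mmap
import os
import tempfile

SIZE = 16 * 1024 * 1024  # 16 MiB stand-in for the question's large array

def read_one_byte_lazily(path, index):
    """Map the whole file but touch only the page that contains `index`."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            # Mapping reserves address space; the OS is expected to bring a
            # page in from the file only when it is first accessed.
            return len(view), view[index]

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\xAF" * SIZE)
    path = tmp.name
try:
    mapped_len, value = read_one_byte_lazily(path, SIZE // 2)
    print(mapped_len, hex(value))
finally:
    os.remove(path)
```

Watching the process in a memory monitor while only one byte is touched should show far less resident memory than the mapped size, although the exact numbers depend on the OS and its caching.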