Tool to "un-define" a symbol in a relocatable ELF symbol table - gcc

Is there any utility to patch arbitrary symbols in an ELF symbol table so that a defined symbol becomes undefined? For example, here is the readelf --syms output for a file that I'm going to process:
Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
   ...
     5: 0000000000000000    13 FUNC    WEAK   DEFAULT    3 my_message
     6: 0000000000000000    19 FUNC    GLOBAL DEFAULT    5 print_msg
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND puts
And here is the expected output for the same binary after my_message has been un-defined:
Symbol table '.symtab' contains 8 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
   ...
     5: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND my_message
     6: 0000000000000000    19 FUNC    GLOBAL DEFAULT    5 print_msg
     7: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND puts
The ELF file itself is relocatable. The modification should alter only the symbol table; the section that contains the original symbol definition should remain unchanged.
I've checked GNU Binutils, and objcopy might be what I'm looking for, but so far I haven't found any option (or combination of options) that gives me the behavior described above.
In fact, such a tool should be straightforward to implement (even without extra libraries like BFD), but I'm wondering whether there is an existing tool that I have missed.

You may look at the 'anonymizer' example of the ELFIO library. The example overrides a symbol's name. Overriding a symbol's type can be implemented similarly, but the '.symtab' section will have to be processed.
It is not exactly a tool, but a library with which such a tool can be implemented.
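As the question notes, the change itself is small. The following is a minimal sketch in Python, not an existing tool: it assumes a 64-bit little-endian object file and works on a single 24-byte Elf64_Sym entry that has already been read out of .symtab. The function name undefine_symbol is hypothetical, and locating .symtab and the entry index is left out. It mirrors the expected readelf output in the question: section index UND, zero value and size, NOTYPE type, GLOBAL binding.

```python
import struct

# Elf64_Sym layout (little-endian):
# st_name, st_info, st_other, st_shndx, st_value, st_size
ELF64_SYM = struct.Struct("<IBBHQQ")
SHN_UNDEF = 0
STB_GLOBAL = 1
STT_NOTYPE = 0


def undefine_symbol(entry: bytes) -> bytes:
    """Return a copy of one Elf64_Sym entry rewritten as an undefined global symbol."""
    st_name, _info, st_other, _shndx, _value, _size = ELF64_SYM.unpack(entry)
    # Binding lives in the high nibble of st_info, type in the low nibble.
    st_info = (STB_GLOBAL << 4) | STT_NOTYPE
    return ELF64_SYM.pack(st_name, st_info, st_other, SHN_UNDEF, 0, 0)
```

The name index and visibility are kept, and the defining section is not touched. This sketch only covers a symbol that is already WEAK or GLOBAL, as in the question; a LOCAL symbol would also need the table reordered, since local symbols must precede global ones in .symtab.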

Related

How should I apply add-symbol-file command during u-boot linux boot debug?

I'm following the Linux boot under U-Boot (using SPL falcon mode, where u-boot-spl launches Linux directly) on a QEMU virtual machine. The code has now jumped to the Linux kernel, and because I ran add-symbol-file vmlinux 0x80081000 I can step through the kernel code using gdb connected to the virtual machine. I actually loaded the kernel image at 0x80080000, but I had to give the address as 0x80081000 to make gdb show the source that matches the PC value (I don't know why this difference of 0x1000 is needed).
Later I found that the kernel sets up the page tables (identity mapping and swap table) and jumps to __primary_switched, and this is where a pure kernel virtual address is used for the PC for the first time. The jump is made at the end of the head.S file:
ldr x8, =__primary_switched
adrp x0, __PHYS_OFFSET
br x8
In the symbol file (vmlinux, an ELF file), the symbols before __primary_switched are all mapped at virtual addresses (high addresses starting with 0xffffffc0.....), but gdb could follow the source even while the PC held physical addresses. (The PC was initially loaded with the physical address of the kernel start, and PC-relative jumps were used until the jump to __primary_switched, with the MMU disabled or identity mapping in use.) So does this mean that, for add-symbol-file, only the offset of the symbols from the start of text matters?
Another question: I can follow the kernel source with gdb, but after __primary_switched I cannot see the source. The debugger doesn't show the correct source location for the PC value, which is now a kernel virtual address. Should I tell the debugger to use the correct offset by running add-symbol-file again? If so, how?
ADD (8:32 AM Wednesday, January 12, 2022, UTC)
I found from gdb manual,
"add-symbol-file filename [ -readnow | -readnever ] [ -o offset ] [
textaddress ] [ -s section address ... ] The add-symbol-file command
reads additional symbol table information from the file filename. You
would use this command when filename has been dynamically loaded (by
some other means) into the program that is running. The textaddress
parameter gives the memory address at which the file's text section
has been loaded. You can additionally specify the base address of
other sections using an arbitrary number of '-s section address'
pairs. If a section is omitted, gdb will use its default addresses as
found in filename. Any address or textaddress can be given as an
expression. ..."
I changed my program a little bit to fix a problem. readelf shows the .text section starting at ffffffc010080800.
So I adjusted the command to "add-symbol-file vmlinux 0x80000800", and gdb shows the kernel source correctly after the jump to Linux.
It still doesn't show me the source code after __primary_switched.
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .head.text PROGBITS ffffffc010080000 00010000
0000000000000040 0000000000000000 AX 0 0 4
[ 2] .text PROGBITS ffffffc010080800 00010800
0000000000304370 0000000000000000 AX 0 0 2048
[ 3] .rodata PROGBITS ffffffc010390000 00320000
.... (skip) ...
[12] .notes NOTE ffffffc01045be18 003ebe18
000000000000003c 0000000000000000 A 0 0 4
[13] .init.text PROGBITS ffffffc010470000 003f0000
0000000000027ec8 0000000000000000 AX 0 0 4
[14] .exit.text PROGBITS ffffffc010497ec8 00417ec8
000000000000046c 0000000000000000 AX 0 0 4
Since '__primary_switched' resides in section .init.text, I tried adding "-s .init.text 0xffffffc010470000" or "-s .init_text 0x803ef800" (physical address) to the add-symbol-file command, to no avail. Is my command wrong? Or could this be a page table (virtual -> physical) problem? I see a synchronous exception right after I enter __primary_switched: the PC value has become 0x200, and if the exception vectors are located at 0x0, this is the vector entry for a synchronous exception such as an undefined instruction. I should also check whether the vector base address has been set correctly.
I found that my kernel load address was wrong (__PHYS_OFFSET was below the start of the physical DDR address range).
After fixing it, the PC advances normally with kernel virtual addresses, and I just need to apply the add-symbol-file command using the virtual address.
These were the new section addresses.
Section Headers:
[Nr] Name Type Address Offset
Size EntSize Flags Link Info Align
[ 0] NULL 0000000000000000 00000000
0000000000000000 0000000000000000 0 0 0
[ 1] .head.text PROGBITS ffffffc010080000 00010000
0000000000000040 0000000000000000 AX 0 0 4
[ 2] .text PROGBITS ffffffc010080800 00010800
0000000000304370 0000000000000000 AX 0 0 2048
[ 3] .rodata PROGBITS ffffffc010390000 00320000
00000000000a6385 0000000000000000 WA 0 0 4096
[ 4] .modinfo PROGBITS ffffffc010436385 003c6385
00000000000018ff 0000000000000000 A 0 0 1
[ 5] .pci_fixup PROGBITS ffffffc010437c90 003c7c90
00000000000020f0 0000000000000000 A 0 0 16
[ 6] __ksymtab PROGBITS ffffffc010439d80 003c9d80
0000000000006d20 0000000000000000 A 0 0 4
[ 7] __ksymtab_gpl PROGBITS ffffffc010440aa0 003d0aa0
0000000000005808 0000000000000000 A 0 0 4
[ 8] __ksymtab_strings PROGBITS ffffffc0104462a8 003d62a8
00000000000134f2 0000000000000000 A 0 0 1
[ 9] __param PROGBITS ffffffc0104597a0 003e97a0
0000000000000b68 0000000000000000 A 0 0 8
[10] __modver PROGBITS ffffffc01045a308 003ea308
0000000000000cf8 0000000000000000 A 0 0 8
[11] __ex_table PROGBITS ffffffc01045b000 003eb000
0000000000000e18 0000000000000000 A 0 0 8
[12] .notes NOTE ffffffc01045be18 003ebe18
000000000000003c 0000000000000000 A 0 0 4
[13] .init.text PROGBITS ffffffc010470000 003f0000
0000000000027ec8 0000000000000000 AX 0 0 4
[14] .exit.text PROGBITS ffffffc010497ec8 00417ec8
The final kernel image is loaded at 0x80080000. Then __PHYS_OFFSET becomes 0x80000000. (TEXT_OFFSET is 0x80000 by default). Now I can debug the kernel source before __primary_switch using this command.
add-symbol-file images/vmlinux 0x80080800 -s .head.text 0x80080000 -s .init.text 0x803f7800
And after the kernel entered __primary_switched (now kernel virtual address is used), I added this command to see the source and I can follow code using qemu and gdb step-by-step.
add-symbol-file images/vmlinux 0xffffffc010080800 -s .head.text 0xffffffc010080000 -s .init.text 0xffffffc010470000
Hope this helps someone later.
But after some days, I think I could just use add-symbol-file images/vmlinux 0xffffffc010080800 (applying all the section info).
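The address arithmetic behind these commands can be sketched in Python. This is an illustration only, under the assumption that the image is loaded as one contiguous block, so that every section keeps its link-time offset from .head.text; the helper names to_physical and add_symbol_file_command are hypothetical. The section addresses and the load address are the ones quoted in the post.

```python
# Values taken from the post: link-time (virtual) section addresses and the load address.
VIRT_IMAGE_START = 0xFFFFFFC010080000  # address of .head.text in vmlinux
PHYS_LOAD_ADDR = 0x80080000            # where the kernel image was loaded

SECTIONS = {
    ".head.text": 0xFFFFFFC010080000,
    ".text": 0xFFFFFFC010080800,
    ".init.text": 0xFFFFFFC010470000,
}


def to_physical(vaddr: int) -> int:
    """Translate a link-time address to its location before the MMU is enabled."""
    return vaddr - VIRT_IMAGE_START + PHYS_LOAD_ADDR


def add_symbol_file_command(path: str, physical: bool) -> str:
    """Build the gdb command text for either the physical or the virtual phase."""
    convert = to_physical if physical else (lambda addr: addr)
    parts = [f"add-symbol-file {path} {convert(SECTIONS['.text']):#x}"]
    for name, vaddr in SECTIONS.items():
        if name != ".text":
            parts.append(f"-s {name} {convert(vaddr):#x}")
    return " ".join(parts)
```

With physical=False this reproduces the second command above. With physical=True it yields 0x80080800 for .text and 0x80080000 for .head.text, matching the first command, but 0x80470000 for .init.text rather than the 0x803f7800 used there, so the contiguous-layout assumption (or that value) is worth checking against the actual image.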

NASM Assembly Pe32 - What is the Optional Header Data Directory value

I'm trying to recode an existing EXE from scratch and having a problem figuring out what value the IMAGE_OPTIONAL_HEADER struct element "DataDirectory" has.
It's part of the Pe32 header.
I'm using NASM and the WIN32N.INC file.
I know that the IMAGE_OPTIONAL_HEADER struct element "DataDirectory" is declared with size DQ (RESQ). That's because the IMAGE_DATA_DIRECTORY struct has the elements "VirtualAddress" and "isize", which are both DD.
STRUC IMAGE_DATA_DIRECTORY
.VirtualAddress RESD 1
.isize RESD 1
ENDSTRUC
STRUC IMAGE_OPTIONAL_HEADER
.Magic RESW 1
.MajorLinkerVersion RESB 1
.MinorLinkerVersion RESB 1
.SizeOfCode RESD 1
.SizeOfInitializedData RESD 1
.SizeOfUninitializedData RESD 1
.AddressOfEntryPoint RESD 1
.BaseOfCode RESD 1
.BaseOfData RESD 1
.ImageBase RESD 1
.SectionAlignment RESD 1
.FileAlignment RESD 1
.MajorOperatingSystemVersion RESW 1
.MinorOperatingSystemVersion RESW 1
.MajorImageVersion RESW 1
.MinorImageVersion RESW 1
.MajorSubsystemVersion RESW 1
.MinorSubsystemVersion RESW 1
.Reserved1 RESD 1
.SizeOfImage RESD 1
.SizeOfHeaders RESD 1
.CheckSum RESD 1
.Subsystem RESW 1
.DllCharacteristics RESW 1
.SizeOfStackReserve RESD 1
.SizeOfStackCommit RESD 1
.SizeOfHeapReserve RESD 1
.SizeOfHeapCommit RESD 1
.LoaderFlags RESD 1
.NumberOfRvaAndSizes RESD 1
.DataDirectory RESQ 1
ENDSTRUC
So what exact values do the DataDirectory elements have? There are many more data directories than just one, such as the export directory RVA + size, the import directory RVA + size, etc.
Do I just put the offset of the first virtual address in "VirtualAddress" and its size in "isize"? That would be my guess, but I'm not sure about it.
It is an array of IMAGE_DATA_DIRECTORY structs. MSDN tells you what the struct looks like:
typedef struct _IMAGE_DATA_DIRECTORY {
DWORD VirtualAddress;
DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
The NumberOfRvaAndSizes field tells you how many there are. Usually 16 but there can be fewer.
Each directory entry tells you the offset and size of the thing it "points" to. The IMAGE_DIRECTORY_ENTRY_* defines tell you what they are. For example, IMAGE_DIRECTORY_ENTRY_DEBUG is 6 and tells you the location of IMAGE_DEBUG_DIRECTORY and the total size of it and its data.
For more information, see the PE/COFF format documentation and the Matt Pietrek "An In-Depth Look into the Win32 Portable Executable File Format" and "Peering Inside the PE: A Tour of the Win32 Portable Executable File Format" MSDN/MSJ articles.
Each entry contains an RVA and size.
The most important ones are:
the one at index 0 [export directory],
the one at index 1 [import directory],
the one at index 5 [relocation table].
Now, depending on what you try to achieve, this table may be completely useless to you.
It is, in fact, a kind of "shortcut" for the loader, allowing it to quickly look up particular portions of data without first having to iterate over the whole section header table. Thus, it is really only useful at execution time. If you just want to inspect the PE file without it being loaded into virtual memory, it will not provide any useful information.
As Anders already said, there are usually 16 of them, although in the PE file I'm currently researching I can find only 10 (as the field NumberOfRvaAndSizes tells me; it is essentially the last entry of the optional header. You called it .DataDirectory and seem to have found it to be a QUADWORD, but FYI it should really be a DOUBLEWORD, at least if I interpret your RESQ entry correctly).
EDIT: Turned out that there are indeed 16 entries, since the value "10" is in hexadecimal form...
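To make the layout described in these answers concrete, here is a minimal Python sketch that reads the NumberOfRvaAndSizes field and the IMAGE_DATA_DIRECTORY array that follows it. The function name parse_data_directories is hypothetical, and the input is assumed to be the raw bytes of the optional header starting at NumberOfRvaAndSizes; finding that position in a real file is not shown.

```python
import struct

# One IMAGE_DATA_DIRECTORY entry: VirtualAddress (an RVA) and Size, both 32-bit.
DATA_DIRECTORY = struct.Struct("<II")


def parse_data_directories(tail: bytes) -> list:
    """Parse NumberOfRvaAndSizes and the array of (rva, size) pairs that follows it."""
    (count,) = struct.unpack_from("<I", tail, 0)
    return [
        DATA_DIRECTORY.unpack_from(tail, 4 + index * DATA_DIRECTORY.size)
        for index in range(count)
    ]
```

A count stored as the bytes 10 00 00 00 is 0x10, i.e. 16 entries, which is the confusion mentioned in the EDIT above. Index 0 of the returned list is the export directory and index 1 the import directory.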

How do you build Openssl_1.0.0 version 4

I have a binary that I am trying to run that seems to specifically require OPENSSL_1.0.0 version 4:
Version needs section '.gnu.version_r' contains 2 entries:
Addr: 0x00000000080486ac Offset: 0x0006ac Link: 6 (.dynstr)
000000: Version: 1 File: libcrypto.so.1.0.0 Cnt: 1
0x0010: Name: OPENSSL_1.0.0 Flags: none Version: 4
I have checked the source code out of openssl git and built 1.0.0-stable, but can't figure out how to specifically build what is needed by the binary.
What release should I checkout from the openssl repo and how do I compile it so that it would be usable by this binary?
Here are other possibly relevant fields from its ELF header:
ELF Header:
Magic: 7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
Class: ELF32
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Intel 80386
Version: 0x1
Entry point address: 0x804a6e0
Start of program headers: 52 (bytes into file)
Start of section headers: 214412 (bytes into file)
Flags: 0x0
Size of this header: 52 (bytes)
Size of program headers: 32 (bytes)
Number of program headers: 9
Size of section headers: 40 (bytes)
Number of section headers: 37
Section header string table index: 34
Relocation section '.rel.plt' at offset 0x73c contains 37 entries:
Offset Info Type Sym.Value Sym. Name
0805b038 00000d07 R_386_JUMP_SLOT 00000000 MD5@OPENSSL_1.0.0
Dynamic section at offset 0x11f0c contains 25 entries:
Tag Type Name/Value
0x00000001 (NEEDED) Shared library: [libcrypto.so.1.0.0]
Symbol table '.dynsym' contains 43 entries:
Num: Value Size Type Bind Vis Ndx Name
13: 00000000 0 FUNC GLOBAL DEFAULT UND MD5@OPENSSL_1.0.0 (4)
Version symbols section '.gnu.version' contains 43 entries:
Addr: 0000000008048656 Offset: 0x000656 Link: 5 (.dynsym)
00c: 2 (GLIBC_2.0) 4 (OPENSSL_1.0.0) 2 (GLIBC_2.0) 0 (*local*)
The version you see there is not the version of the library but the (arbitrary) version of the symbol within the table. As jww already pointed out, that refers to OPENSSL_1.0.0 which is the symbol version string, and relates to the 1.0.x release branch.
See https://blog.flameeyes.eu/2011/06/gold-readiness-obstacle-2-base-versioning/ for one random reference to symbol versioning that I wrote about before.
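To illustrate where the "4" comes from: in each entry of .gnu.version_r, the number readelf prints as "Version" is the index that .gnu.version uses to refer to that version name (the "4 (OPENSSL_1.0.0)" in the dump above). A minimal Python sketch, assuming a 32-bit little-endian file as in the question; the function name version_index_to_name is hypothetical, and the bytes of the entry and of .dynstr are assumed to have been extracted already.

```python
import struct

# Elf32_Vernaux layout: vna_hash, vna_flags, vna_other, vna_name, vna_next
VERNAUX = struct.Struct("<IHHII")


def version_index_to_name(vernaux: bytes, dynstr: bytes) -> tuple:
    """Return (index, name) for one Vernaux entry.

    vna_other is the index that .gnu.version entries use; vna_name is an
    offset into .dynstr where the NUL-terminated version string starts.
    """
    _hash, _flags, vna_other, vna_name, _next = VERNAUX.unpack(vernaux)
    end = dynstr.index(b"\0", vna_name)
    return vna_other, dynstr[vna_name:end].decode("ascii")
```

The index is assigned by the linker when the binary is built, so it says nothing about which OpenSSL release is required; only the string OPENSSL_1.0.0 has to be provided by libcrypto.so.1.0.0.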

power8 assembly code with shared build issue with save and restore of TOC

I have the following assembly code
.machine power8
.abiversion 2
.section ".toc","aw"
.section .text
GLOBAL(myfunc)
myfunc:
stdu 1,-240(1)
mflr 0
std 0, 0*8(1)
mfcr 8
std 8, 1*8(1)
std 2, 2*8(1)
# Save all non-volatile registers R14-R31
std 14, 4*8(1)
...
# Save all the non-volatile FPRs
...
stwu 1, -48(1)
bl function_call
nop
addi 1, 1, 48
ld 0, 0*8(1)
mtlr 0
ld 8, 1*8(1)
ld 2, 2*8(1)
...
# epilogue, restore stack frame
This works fine with a static build, but a shared build gives a segmentation fault in
00000157.plt_call.__tls_get_addr_opt@@GLIBC_2.22. Should the shared build be handled differently on POWER8 with respect to the TOC?
The calling convention is the same between POWER8 and previous processors. However, there have been changes with regard to TOC pointer (r2) handling between ABIv1 and ABIv2.
In ABIv2, the caller does not establish the TOC pointer in r2; the called function should do this for global entry points (i.e., where the caller's TOC pointer may not be the same as the one the callee uses). To do this, ABIv2 functions have a prologue that sets r2:
0000000000000000 <foo>:
0: 00 00 4c 3c addis r2,r12,0
4: 00 00 42 38 addi r2,r2,0
- this depends on r12 containing the address of the function's global entry point (those 0 values will be replaced with actual offsets at final link time).
I don't see any code setting r12 appropriately in your example. Are you sure you're complying with the v2 ABI there?
The ABIv2 spec is available here: https://members.openpowerfoundation.org/document/dl/576 Section 2.3.2 will be the most relevant for this issue.

Reassigning non-absolute variables in OSX's assembler

The following assembler directives, when compiled with clang on OSX, produce an error:
.set link,0
test:
.int link
.set link,test
test2:
.int link
.set link,test2
The error:
$ clang test.s
test.s:7:13: error: invalid reassignment of non-absolute variable 'link'
.set link,test2
^
I want to use link in a macro as a variable that keeps track of the last defined word, to build a linked list (as in JONESFORTH).
As far as I know, you can't redefine normal symbols. The way I see it, you have two choices. Either you allocate a local label number to store your link address (as these can be redefined) or you use preprocessed assembly. For both cases, you probably want to use a macro to declare your nodes.
Example:
.macro declare_node list_id
.ifndef link_head_\list_id
link_head_\list_id : .int 0
.else
.int \list_id\()b-4
.endif
\list_id :
.endm
test:
declare_node 100
.int 42 # node data
test2:
declare_node 100
.int 314 # node data
test3:
declare_node 101
.int 173 # node data
test4:
declare_node 101
.int 141 # node data
Here, a numerical list id is used as the local label, so you can declare multiple lists.
I have the same problem (jonesforth). I have not found out why Apple's assembler doesn't allow redefining symbols, but it is what it is.
I worked around this by manually passing the last defined word as an argument to the defword macro. It's ugly as hell, and error-prone.
.macro defcode name, length, flags, label, link
.const_data
.balign 8
.globl name_\label
name_\label :
.quad \link // link
.byte \flags+\length // flags + length byte
.ascii \name // the name
.balign 8 // padding to next 8 byte boundary
.globl \label
\label :
.quad code_\label // codeword
.text
.balign 8
.globl code_\label
code_\label : // assembler code follows
.endmacro
Then call the macro like
defcode "BRANCH",6,0,BRANCH,name_TICK
...
NEXT
defcode "0BRANCH",7,0,ZBRANCH,name_BRANCH
...
NEXT
I'd be super excited to learn about better ways to handle it.
