How can I convert only one file or one function of an elf file to assembly?

How can I convert only one file or one function of an elf file to assembly? - linux-kernel

I have an elf file of a very big code base (kernel). I want to convert it to assembly code. I have base address of a function and offset of the instruction. Using this information, I want to get the specific instruction. I have used "objdump -b binary -m i386 -D file.elf" to get assembly code from elf file, but it is generating 4GB of data. I have also referred to this Can I give objdump an address and have it disassemble the containing function? but it is also not working for me.

You can limit objdump output with --start-address and --stop-address options.
For process code only for the single function, values for these options can be taken from readelf -s output, which contains start address of the function in the section and the function's size, and from readelf -S output, which contains address of the section with the function:
--start-address=<section_start + function_start>
--stop-address=<section_start + function_start + function_size>

I want to convert it to assembly code.
gdb -q ./elf_file
(gdb) set height 0 # prevent pagination
(gdb) set logging on # output will be mirrored in gdb.txt
(gdb) disassemble 0xffff000008081890 0xffff000008081bf5
(gdb) quit
Enjoy!

Related

Reading memory with GDB vmlinux /proc/kcore

I am trying to use gdb to read memory from vmlinux. The exact syntax is
sudo gdb vmlinux-4.18.0-rc1+ /proc/kcore
I use this file because vmlinux is a symlink to this file.
The result is the following
Reading symbols from vmlinux-4.18.0-rc1+...(no debugging symbols found)...done.
warning: core file may not match specified executable file.
[New process 1]
Core was generated by `root=/dev/mapper/rcs--power9--talos--vg-root ro console=hvc0 quiet'.
#0 0x0000000000000000 in ?? ()
(gdb) x/4xb 0xfffffff0
0xfffffff0: Cannot access memory at address 0xfffffff0
(gdb) print &sys_call_table
No symbol table is loaded. Use the "file" command.
(gdb)
The file vmlinux-4.18.0-rc1+ is in /boot. The file type is as follows:
root#rcs-power9-talos:/boot# file vmlinux-4.18.0-rc1+
vmlinux-4.18.0-rc1+: ELF 64-bit LSB executable, 64-bit PowerPC or cisco 7500, version 1 (SYSV), statically linked, BuildID[sha1]=a1c9f3fe22ff5cbf419787657c878c8a07e559b2, stripped
I modified the config-4.18.0-rc1+ file such that every CONFIG_DEBUG option is set to yes. I then rebooted the system. My questions are:
Do I need to do anything else for the changes I made to /boot/config-4.18.0-rc1+ to take effect?
Based on the file type of vmlinux-4.18.0-rc1+, does it seem that this file should work for debugging?
I did not build the kernel myself. It is a custom build from Raptor Computer Systems.

The config-* file you've modified is just for reference - all these options have already been compiled into the kernel, so changing them will not have any effect.
However, you can get any symbol you want in two steps:
consult /proc/kallsyms (e.g. grep sys_call_table /proc/kallsyms). Get the address. Note, that this might appear as 0x00000000 - which can be fixed by setting /proc/sys/kernel/kptr_restrict to 0
Then use above address as direct argument. You will still run into minor issues (e.g. "print" won't know what datatype it is, but x/20x for example will work) , but these can be resolved with a bit of gdb scripting, or providing an external dwarf file.

objcopy is removing a section unless I declare a static volatile variable in that section (using attribute)

I have a linker script in which I have defined a section for containing the checksum of a software image. Something like:
...
.my_checksum :
{
__checksum_is_here = .;
KEEP (*(.my_checksum))
. = ALIGN(4);
_sw_image_code_end = .;
} > IMAGE
...
The checksum is placed into that section by using objcopy --update-section.
I build an elf file by using the arm gcc compiler, and I can see this section and its value within it:
> arm-none-eabu-objdumph -h my_elf_file.elf
...
0 .text 0001496c 08010000 08010000 00010000 2**4
...
7 .my_checksum 00000004 080250c0 080250c0 000350c0 2**2
...
// Notice that 000350c0 is the file offset and 080250c0 is the LMA.
// The starting LMA is 08010000
And I can retrieve its value:
> xxd -s 0x000350c0 -l 4 my_elf_file.elf
000350c0: 015e 028e // I have checked this value and it is correct.
Now I generate a bin file by executing
> arm-none-eabi-objcopy -O binary --gap-fill 0xFF -S my_elf_file.elf my_elf_file.bin
Now, if I try to read the checksum value again, using the difference between the checksum LMA and the first section LMA (see above):
> xxd -s 0x150c0 -l 4 my_elf_file.bin
The result I obtain here is different from the one obtained in the elf file, that is, the checksum section has been removed by objcopy. (That's what I think at least).
Nevertheless, If I define this in my main.c file:
static volatile unsigned int __aux_checksum __attribute__((section(".my_checksum")));
...
int main() {
...
((void)__aux_checksum); // Avoid compiler/linker optimizations.
...
}
Now, if I replicate the same steps as above with the elf and bin files (using the proper offsets), I can retrieve the checksum from the bin file (elf and bin give the same result).
Questions
My first question is: I know that you can define a section using __attribute__((section)), but if you use a section already defined within the linker script, does this command changes its behaviour for placing the variable within the section, instead of creating a new one?
My second question is: Is this the only way for preventing objcopy of removing this particular section?

Lets answer your 2nd question first,
Is this the only way for preventing objcopy of removing this particular section?
You need a concept as documented in the gnu LD manual under SECTIONS.
4.6.8.1. Output Section Type
Each output section may have a type. The type is a keyword in parentheses. The following types are defined:
NOLOAD
The section should be marked as not loadable, so that it will not be loaded into memory when the program is run.
DSECT, COPY, INFO, OVERLAY
These type names are supported for backward compatibility, and are rarely used. They all have the same effect: the section should be marked as not allocatable, so that no memory is allocated for the section when the program is run.
The linker normally sets the attributes of an output section based on the input sections which map into it. You can override this by using the section type. For example, in the script sample below, the ROM section is addressed at memory location 0 and does not need to be loaded when the program is run. The contents of the ROM section will appear in the linker output file as usual.
SECTIONS {
ROM 0 (NOLOAD) : { … }
…
}
So what does that mean? Say you have debugging info in your objects. If you are burning a ROM image you probably don't want to place the debugging info in the object. As well, the BSS segment is all zero and there is no need to store it to ROM, but you need to clear our RAM (at the load address) to make way for it. The 'init value' for the .data section is initialized from ROM but resides in RAM. The concepts are 'loadable' and 'allocatable' and they have flags for them in an ELF file. By default your .my_checksum gets no flags. Ie, not allocated and not loadable like debug info.
I know that you can define a section using attribute((section)), but if you use a section already defined within the linker script, does this command changes its behaviour for placing the variable within the section, instead of creating a new one?
From the above,
The linker normally sets the attributes of an output section based on the input sections which map into it.
Your input sections flags get inherited by your output section. So you have put in at least allocatable as a flag.
I would suggest that you just put your checksum at the end of either .text or .data. For instance, input secttions .rodata (constant values) usually get put with the output .text. There is usually no need to invent another output sections unless you want some book keeping that wont get to the final image. Your __checksum_is_here label is sufficient to find it and you can look at this question on CRCs.

MinGW's ld cannot perform PE operations on non PE output file

I know there are some other similar questions about this out there, be it StackOverflow or not. I've researched a lot for this, and still didn't find a single solution.
I'm doing an operative system as a side project. I've been doing all in Assembly, but now I wanna join C code.
To test, I made this assembly code file (called test.asm):
[BITS 32]
GLOBAL _a
SECTION .text
_a:
jmp $
Then I made this C file (called main.c):
extern void a(void);
int main(void)
{
a();
}
To link, I used this file (called make.bat):
"C:\minGW\bin\gcc.exe" -ffreestanding -c -o c.o main.c
nasm -f coff -o asm.o test.asm
"C:\minGW\bin\ld.exe" -Ttext 0x100000 --oformat binary -o out.bin c.o asm.o
pause
I've been researching for ages, and I'm still struggling to find an answer. I hope this won't be flagged as duplicate. I acknowledge about the existence of similar questions, but all have different answers, and none work for me.
Question: What am I doing wrong?

Old MinGW versions had the problem that "ld" was not able to create non-PE files at all.
Maybe current versions have the same problem.
The work-around was creating a PE file with "ld" and then to transform the PE file to binary, HEX or S19 using "objcopy".
--- EDIT ---
Thinking about the question again I see two problems:
As I already said some versions of "ld" have problems creating "binary" output (instead of "PE", "ELF" or whatever format is used).
Instead of:
ld.exe --oformat binary -o file.bin c.o asm.o
You should use the following sequence to create the binary file:
ld.exe -o file.tmp c.o asm.o
objcopy -O binary file.tmp file.bin
This will create an ".exe" file named "binary.tmp"; then "objcopy" will create the raw data from the ".exe" file.
The second problem is the linking itself:
"ld" assumes a ".exe"-like file format - even if the output file is a binary file. This means that ...
... you cannot even be sure if the object code of "main.o" is really placed at the first address of the resulting object code. "ld" would also be allowed to put the code of "a()" before "main()" or even put "internal" code before "a()" and "main()".
... addressing works a bit differently which means that a lot of padding bytes will be created (maybe at the start of the file!) if you do something wrong.
The only possibility I see is to create a "linker script" (sometimes called "linker command file") and to create a special section in the assembler code (because I normally use another assembler than "nasm" I do not know if the syntax here is correct):
[BITS 32]
GLOBAL _a
SECTION .entry
jmp _main
SECTION .text
_a:
jmp $
In the linker script you can specify which sections appear in which order. Specify that ".entry" is the first section of the file so you can be sure it is the first instruction of the file.
In the linker script you may also say that multiple sections (e.g. ".entry", ".text" and ".data") should be combined into a single section. This is useful because sections are normally 0x1000-byte-aligned in PE files! If you do not combine multiple sections into one you'll get a lot of stub bytes between the sections!
Unfortunately I'm not the expert for linker scripts so I cannot help you too much with that.
Using "-Ttext" is also problematic:
In PE files the actual address of a section is calculated as "image base" + "relative address". The "-Ttext" argument will influence the "relative address" only. Because the "relative address" of the first section is typically fixed to 0x1000 in Windows a "-Ttext 0x2000" would do nothing but filling 0x1000 stub bytes at the start of the first section. However you do not influence the start address of ".text" at all - you only fill stub bytes at the start of the ".text" section so that the first useful byte is located at 0x2000. (Maybe some "ld" versions behave differently.)
If you wish that the first section of your file is located at address 0x100000 you should use the equivalent of "-Ttext 0x1000" in the linker script (-Ttext is not used if a linker script is used) and define the "image base" to 0xFF000:
ld.exe -T linkerScript.ld --image-base 0xFF000 -o binary.tmp a.o main.o
The memory address of the ".text" section will be 0xFF000 + 0x1000 = 0x100000.
(And the first byte of the binary file generated by "objcopy" will be the first byte of the first section - representing memory address 0x100000.)

Setting start address to execute raw binary file

Bootloader is seperated into 2 stages. First stage is written in assembly and only loads second stage, second stage is in C. Stage1 loads code in C to address 0x0500:0, and jumps there. Stage2 have to write "hello message" and halt.
I tried different ways to set starting address to raw binary made by: (but nothing worked)
cc -nostartfiles -nostdlib -c stage2.c
ld -s -T scrptfile.ld stage2.o /* I'm using ld just to set starting address of executable */
objcopy -O binary stage2 stage2.bin /* delete all unuseful data */
Linker script
SECTIONS
{
. = 0x0500;
.text : { *(.text)}
.data : { *(.data)}
.bss : { *(.bss)}
}
Maybe I delete with objcopy somethnig that shouldt be deleted.
How can I execute this stage2.bin then?
As I understand, written C code using 32-bits length instructions, when raw binary allows only 16?
P.S. Parameter -set-start (objcopy) returns an error: Invalid bfd target. It is because output file is binary?
Thank you for answers.

. = 0x0500 does not correspond to 0x0500:0. 0x0500:0 is physical address 0x5000, not 0x500.
Also, if you're trying to compile C code as 32-bit and run it in real mode (which is 16-bit), it won't work. You need to either compile code as 16-bit or switch the CPU into 32-bit protected mode. There aren't that many C compilers still compiling 16-bit code. Turbo C++ is one, Open Watcom is another. AFAIK, gcc can't do that.
Finally, I'm guessing you expect the entry point to be at 0x500:0 (0x5000 physical). You need to either tell this to the linker (I don't remember how, if at all possible) or deal with an arbitrary location of the entry point (i.e. extract it from the binary somehow).

Determine load address and entry point of stripped Linux Kernel image

I have a crosscompiling toolchain for an embedded system (mipsel) on my x86 Linux. I know how to build a custom kernel (let's call the image "vmlinux") for it and how to strip that image via
objcopy -S -O binary vmlinux vmlinux.bin
For further processing I also need the load address and entry point of the image. Before stripping it is no problem to determine them via scripts/mksysmap or, more explicitly, via
nm -n vmlinux | grep -v '\( [aNUw] \)\|\(__crc_\)\|\( \$[adt]\)' > System.map
Then I can determine the load address and entry point via
awk '/A _text/ { print "0x"$1; }' < _System.map
awk '/T kernel_entry/ { print "0x"$1; }' < System.map
Now the challenge is that sometimes I do not build the kernel by myself, but get a pre-built kernel after it has already been stripped of its symbols via objcopy. Can anybody tell me how to do this? I am not very proficient in kernel building and toolchain usage. Both nm and objdump do not like the stripped image, saying
vmlinux.bin: File format not recognized

From the objcopy manual page
objcopy can be used to generate a raw binary file by using an output target of binary (e.g., use -O binary). When objcopy generates a raw binary file, it will essentially produce a memory dump of the contents of the input object file. All symbols and relocation information will be discarded. The memory dump will start at the virtual address of the lowest section copied into the output file.
Here is an example that could be used on the PowerPC architecture:
original vmlinux
bash-3.2$ file vmlinux
vmlinux: ELF 32-bit MSB executable, PowerPC or cisco 4500, version 1 (SYSV), statically linked, not stripped
stripped vmlinux is considered a "data" file
bash-3.2$ file vmlinux.bin
vmlinux.bin: data
convert binary to ELF format for the PowerPC
bash-3.2$ powerpc-440fp-linux-objcopy -I binary vmlinux.bin -B powerpc -O elf32-powerpc vmlinux.bin.x
output of vmlinux is now considered an ELF file
bash-3.2$ file vmlinux.bin.x
vmlinux.bin.x: ELF 32-bit MSB relocatable, PowerPC or cisco 4500, version 1 (SYSV), not stripped
You must pass the -I, -B and -O parameter. You can get this parameters from your objcopy documentation.
But since your binary is stripped already trying to decompile it might not be worthwhile since the section information is not available. All of the data in the file will be dumped into the .data secion.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio