How do you go about knowing what is happening in a JIT'ed code? - debugging

I am working with Firefox on a research project. Firefox makes uses of lots of JIT'ed code during run time.
I instrumented Firefox using a custom PIN tool to find out locations(address) of some things I as looking for. The issue is that those location are in JIT'ed code. I want to know what is actually happening over there in the code.
To do this I dumped the corresponding memory region and used objdump to disassemble the dump.
I used objdump -D -b binary -mi386 file.dump to see the instructions that would have been executed. To my surprise the only section listed is .data section (a very big one).
Either i am incorrectly disassembling it or something else is wrong with my understanding. I expect to see more sections like .text where actual executable instructions should be present and .data section should not be executable.
Am I correct in my understanding here?
Also If some one can please advise me on how to properly know what is happening in Jit'ed code.
Machine
Linux 3.13.0-24-generic #47-Ubuntu SMP x86_64

or something else is wrong with my understanding
Yes: something else is wrong with your understanding.
Sections (such as .text and .data) only make sense at static link time (the static linker groups .text from multiple .o files together into a single .text in the final executable). They are not useful, and in fact could be completely stripped, at execution time. On ELF systems, all that you need at runtime are segments (PT_LOAD segments in particular), which you can see with readelf -l binary.
Sections in ELF file are "parts of the file". When you dump memory, sections don't make any sense to even talk about.
The .data that you see in objdump output is not really there either, it's just an artifact that objdump manufactures.

Related

what are the two numbers for an instruction location in objdump of a kernel module? [duplicate]

Consider the following Linux kernel dump stack trace; e.g., you can trigger a panic from the kernel source code by calling panic("debugging a Linux kernel panic");:
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?
It's just an ordinary backtrace, those functions are called in reverse order (first one called was called by the previous one and so on):
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
ocal_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
The bdi_register+0xec/0x150 is the symbol + the offset/length there's more information about that in Understanding a Kernel Oops and how you can debug a kernel oops. Also there's this excellent tutorial on Debugging the Kernel
Note: as suggested below by Eugene, you may want to try addr2line first, it still needs an image with debugging symbols though, for example
addr2line -e vmlinux_with_debug_info 0019594c(+offset)
Here are two alternatives for addr2line. Assuming you have the proper target's toolchain, you can do one of the following:
Use objdump:
locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file :
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s. with a text editor such as vim. Go to
unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace + the offset. Finally, you have located the problematic part in your source code.
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may checkout the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb
In unwind_backtrace+0x0/0xf8 what the +0x0/0xf8 stands for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred this might be enough to confirm your suspicion (you can tell (roughly) how far along in the function you were).
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.

instruction point value of dynamic linking and static linking

By using Intel's pin, I printed out the instruction pointer (ip) values for a program with dynamic linking and static linking.
And I've found that their ip values are quite different, even though they are the same program.
A program with static linking shows 0x400f50 for its very first ip value.
but a program with dynamic linking shows 0x7f94f0762090 for its first ip value
I am not sure why they have that quite a large gap.
It would be appreciated if anyone could help me find out the reason
I am not sure why they have that quite a large gap.
Because a dynamically linked program does not start executing in the binary: the first few thousands of instructions are executed in the dynamic linker (ld-linux), before control is transferred to _start in the main executable.
See also this answer.

Compiling and linking NASM and 64-bit C code together into a bootloader [duplicate]

This question already has an answer here:
Relocation error when compiling NASM code in 64-bit mode
(1 answer)
Closed 4 years ago.
I made a very simple 1 stage bootloader that does two main things: it switches from 16 bit real mode to 64 bit long mode, and it read the next few sectors from the hard disk that are for initiating the basic kernel.
For the basic kernel, I am trying to write code in C instead of assembly, and I have some questions regarding that:
How should I compile and link the nasm file and the C file?
When compiling the files, should I compile to 16 bit or 64 bit? since I am switching from 16 to 64 bits.
How would I add more files from either C or assembly to the project?
I rewrote the question to make my goal more clear, so if source code is needed tell me to add it.
Code: https://github.com/LatKid/BasicBootloaderNASMC
since I am also linking a nasm file with the C file, it spits an error from the nasm object file, which is relocation R_X86_64_16 against .text' can not be used when making a shared object; recompile with -fPIC
One of your issues is probably inside that nasm assembler file (which you don't show in the initial version of your question). It should contain only position-independent code (PIC) so cannot produce an object file with relocation R_X86_64_16 (In your edited question, mov sp, main is obviously not PIC, you should use instruction pointer relative data access of x86-64, and you cannot define main both in your nasm file and in a C file, and you cannot mix 16 bits mode with 64 bits mode when linking).
Study ELF, then the x86-64 ABI to understand what kind of relocations are permitted in a PIC file (and what constraints an assembler file should follow to produce a PIC object file).
Use objdump(1) & readelf(1) to inspect object files (and shared objects and executables).
Once your nasm code produces a PIC object file, link with gcc and use gcc -v to understand what happens under the hoods (you'll see that extra libraries and object files, including crt0 ones, -lgcc and -lc, are used).
Perhaps you need to understand better compilation and linking. Read Levine's book Linkers and Loaders, Drepper's paper How To Write Shared Libraries, and -about compilation- the Dragon book.
You might want to link with gcc but use your own linker script. See also this answer to a very related question (probably with motivations similar to yours); the references there are highly relevant for you.
PS. Your question lacks motivation and context (it has no MCVE but needs one) and might be some XY problem. I guess you are on Linux. I strongly recommend publishing your actual full code -even buggy- (perhaps on github or gitlab or elsewhere) as free software to get potential help. I strongly recommend using an existing bootloader (probably GRUB) and focus your efforts on your OS code (which should be published as free software, to get some feedback).

Where is _start symbol likely to be defined

I have some startup assembly for RISCV which defines the .text section as beginning at .globl _start.
I know what this is - as a disassembly shows me the address, but I cannot see where it is defined. It's not in the linker script and a grep in the build directories shows it is in various binary files, but I cannot find a definition.
I am guessing this appears in a file somewhere as a function of the architecture, but can anyone tell me where? (This is all being built using RISCV GNU cross compilers on Linux)
Unless you control it yourself there is usually at least in the gnu tools world a file called crt0.s. Or perhaps some other name. Should be one per architecture since it is in assembly language. It is the default bootstrap, zeros .bss copies .data as needed, etc.
I dont remember if it is part of the C library (glibc, newlib, etc), or if it is added on later by folks that build a toolchain targeting some specific platform.
Not required certainly but it is not uncommon to see _start be the label of the beginning of the binary, it is supposed to be the entry point certainly. So if you have an operating system/loader that uses a binary with labels present (elf, etc), then it can load the binary and instead of branching to the first address it branches to the entry point.
So the _start is merely defined as being at the start of the .text section, and the address of the .text section is defined in the linker script.

Flash size of a bare metal ARM program

How to know the flash size of a bare metal arm code. If I have the elf is it possible to know how much flash will be required to store the program? For example if I have the elf file that is supposed to go into a ARM based MCU, how can I determine how much of the MCU's flash will be consumed by the code?
The ELF headers should contain the information you need. You can use either the objdump (with -h) or readelf tool to read these. Those tools should be included with your toolchain.
Basically, you're looking to add up the size all the loadable sections, such as .text and .data. Look for the LOAD flag in the output from objdump, for example.
You can ignore non-loadable sections such as .comment, .debug and .bssĀ· Some of those are there for the benefit of the debugger, for example, and some are just placeholders for memory that will be used by the program at run time, but contains no pre-existing data.
When I say "add up the size", that's not strictly true; the linker will have already allocated each section to a specific address in flash (I'm assuming your program will run directly from ROM), so you need to find the end-address of the last section to determine how much is left.

Resources