which start_kernel() function is used? - linux-kernel

When I was going through the linux kernel code I found the entry point to kernel as i386_start_kernel() function which is doing all early setup and then start_kernel() is called.
Now when I searched for start_kernel() I found it is defined in several .c files as such:
Bootp.c, Main.c under /boot and again Main.c under /init.
As per my understanding it should be from Main.c under /init. But still I am not so cleared about it. It would be great if someone can confirm which start_kernel() is called or explain which start_kernel() is for what?

The start_kernel from the init/main.c is a entry point to the generic kernel code. All other start_kernel functions which you can find in the linux kernel soure code, do architecture-specific job.

Related

what are the two numbers for an instruction location in objdump of a kernel module? [duplicate]

Consider the following Linux kernel dump stack trace; e.g., you can trigger a panic from the kernel source code by calling panic("debugging a Linux kernel panic");:
[<001360ac>] (unwind_backtrace+0x0/0xf8) from [<00147b7c>] (warn_slowpath_common+0x50/0x60)
[<00147b7c>] (warn_slowpath_common+0x50/0x60) from [<00147c40>] (warn_slowpath_null+0x1c/0x24)
[<00147c40>] (warn_slowpath_null+0x1c/0x24) from [<0014de44>] (local_bh_enable_ip+0xa0/0xac)
[<0014de44>] (local_bh_enable_ip+0xa0/0xac) from [<0019594c>] (bdi_register+0xec/0x150)
In unwind_backtrace+0x0/0xf8 what does +0x0/0xf8 stand for?
How can I see the C code of unwind_backtrace+0x0/0xf8?
How to interpret the panic's content?
It's just an ordinary backtrace, those functions are called in reverse order (first one called was called by the previous one and so on):
unwind_backtrace+0x0/0xf8
warn_slowpath_common+0x50/0x60
warn_slowpath_null+0x1c/0x24
ocal_bh_enable_ip+0xa0/0xac
bdi_register+0xec/0x150
The bdi_register+0xec/0x150 is the symbol + the offset/length there's more information about that in Understanding a Kernel Oops and how you can debug a kernel oops. Also there's this excellent tutorial on Debugging the Kernel
Note: as suggested below by Eugene, you may want to try addr2line first, it still needs an image with debugging symbols though, for example
addr2line -e vmlinux_with_debug_info 0019594c(+offset)
Here are two alternatives for addr2line. Assuming you have the proper target's toolchain, you can do one of the following:
Use objdump:
locate your vmlinux or the .ko file under the kernel root directory, then disassemble the object file :
objdump -dS vmlinux > /tmp/kernel.s
Open the generated assembly file, /tmp/kernel.s. with a text editor such as vim. Go to
unwind_backtrace+0x0/0xf8, i.e. search for the address of unwind_backtrace + the offset. Finally, you have located the problematic part in your source code.
Use gdb:
IMO, an even more elegant option is to use the one and only gdb. Assuming you have the suitable toolchain on your host machine:
Run gdb <path-to-vmlinux>.
Execute in gdb's prompt: list *(unwind_backtrace+0x10).
For additional information, you may checkout the following resources:
Kernel Debugging Tricks.
Debugging The Linux Kernel Using Gdb
In unwind_backtrace+0x0/0xf8 what the +0x0/0xf8 stands for?
The first number (+0x0) is the offset from the beginning of the function (unwind_backtrace in this case). The second number (0xf8) is the total length of the function. Given these two pieces of information, if you already have a hunch about where the fault occurred this might be enough to confirm your suspicion (you can tell (roughly) how far along in the function you were).
To get the exact source line of the corresponding instruction (generally better than hunches), use addr2line or the other methods in other answers.

How to read instructions retired using the perf-interface inside a LKM?

How can I read from the PMU from inside Kernel space?
For a profiling task I need to read the retired instructions provided by the PMU from inside the kernel. The perf_event_open systemcall seems to offer this capability. In my source code I
#include <linux/syscalls.h>
set my parameters for the perf_event_attr struct and call the sys_perf_event_open(). The mentioned header contains the function declaration. When checking "/proc/kallsyms", it is confirmed that there is a systemcall with the name sys_perf_event_open. The symbol is globally available indicated by the T:
ffffffff8113fe70 T sys_perf_event_open
So everything should work as far as I can tell.
Still, when compiling or inserting the LKM I get a warning/error that sys_perf_event_open does not exist.
WARNING: "sys_perf_event_open" [/home/vagrant/mods/lkm_read_pmu/read_pmu.ko] undefined!
What do I need to do in order to get those retired instructions counter?
The /proc/kallsyms file shows all kernel symbols defined in the source. Right, the capital T indicates a global symbol in the text section of the kernel binary, but the meaning of "global" here is according to the C language. That is, it can be used in other files of the kernel itself. You can't call a kernel function from a kernel module just because it's global.
Kernel modules can only use kernel symbols that are exported with EXPORT_SYMBOL in the kernel source code. Since kernel 2.6.0, none of the system calls are exported, so you can't call any of them from a kernel module, including sys_perf_event_open. System calls are really designed to be called from user space. What this all means is that you can't use the perf_event subsystem from within a kernel module.
That said, I think you can modify the kernel to add EXPORT_SYMBOL to sys_perf_event_open. That will make it an exported symbol, which means it can be used from a kernel module.

How mingw32-g++ compiler know where to inject system calls in the WIN32 machine executable?

Today, only for the testing purposes, I came with the following idea, to create and compile a naive source code in CodeBlocks, using Release target to remove the unnecessary debugging code, a main function with three nop operations only to find faster where the entry point for the main function is.
CodeBlocks sample naive program:
Using IDA disassembler, I have seen something strange, OS actually can add aditional machine code calls in the main function (added implicitly), a call to system function which reside in kernel32.dll what is used for OS thread handling.
IDA program view:
In the machine code only for test reason the three "nop" (90) was replaced by "and esp, 0FFFFFFF0h", program was re-pached again, this is why "no operation" opcodes are not disponible in the view.
Observed behaviour:
It is logic to create a new thread for each process is opened, as we can explore it in the TaskManager, a process run in it's own thread, that is a reason why compiler add this code (the implicit default thread).
My questions:
How compiler know where to "inject" this call code automatically?
Why this call is not made before in the upper function (sub_401B8C) which will route to main function entry point?
To quote the gcc manual:
If no init section is available, when GCC compiles any function called
main (or more accurately, any function designated as a program entry
point by the language front end calling expand_main_function), it
inserts a procedure call to __main as the first executable code after
the function prologue. The __main function is defined in libgcc2.c and
runs the global constructors.

How do you go about knowing what is happening in a JIT'ed code?

I am working with Firefox on a research project. Firefox makes uses of lots of JIT'ed code during run time.
I instrumented Firefox using a custom PIN tool to find out locations(address) of some things I as looking for. The issue is that those location are in JIT'ed code. I want to know what is actually happening over there in the code.
To do this I dumped the corresponding memory region and used objdump to disassemble the dump.
I used objdump -D -b binary -mi386 file.dump to see the instructions that would have been executed. To my surprise the only section listed is .data section (a very big one).
Either i am incorrectly disassembling it or something else is wrong with my understanding. I expect to see more sections like .text where actual executable instructions should be present and .data section should not be executable.
Am I correct in my understanding here?
Also If some one can please advise me on how to properly know what is happening in Jit'ed code.
Machine
Linux 3.13.0-24-generic #47-Ubuntu SMP x86_64
or something else is wrong with my understanding
Yes: something else is wrong with your understanding.
Sections (such as .text and .data) only make sense at static link time (the static linker groups .text from multiple .o files together into a single .text in the final executable). They are not useful, and in fact could be completely stripped, at execution time. On ELF systems, all that you need at runtime are segments (PT_LOAD segments in particular), which you can see with readelf -l binary.
Sections in ELF file are "parts of the file". When you dump memory, sections don't make any sense to even talk about.
The .data that you see in objdump output is not really there either, it's just an artifact that objdump manufactures.

How does the kernel stop you using malloc?

I know that you can't use malloc inside a kernel module because all functions used in the kernel must be defined in the kernel, but how exactly does the kernel achieve this lock-down?
It's not so much that it's locked down. It's just that your kernel module has no idea where malloc() is. The malloc() function is part of the C standard library, which is loaded alongside programs in userspace. When a userland program is executed, the linker will load the shared libraries needed by the program and figure out where the needed functions are. SO it will load libc at an address, and malloc() will be at some offset of that. So when your program goes to call malloc() it actually calls into libc.
Your kernel module isn't linked against libc or any other userspace components. It's linked against the kernel, which doesn't include malloc. Your kernel driver can't depend on the address of anything in userspace, because it may have to run in the context of any userspace program or even in no context, like in an interrupt. So the code for malloc() may not even be in memory anywhere when your module runs. Now if you knew that you were running in the context of a process that had libc loaded, and knew the address that malloc() was located at, you could potentially call that address by storing it in a function pointer. Bad things would probably happen though, possibly including a kernel panic. You don't want to cross userspace and kernelspace boundaries except through sane, well defined interfaces.
When you write a module for the kernel you just don't have this functions in your header files. And you don't have it in the files you are linking with.
Also the malloc's implementation is a procedure calling system calls. System calls moves you to hypervisor and calls the kernel's code. There is no point doing it while in hypervisor mode.
You can see it here in more details.

Resources