Is there any way that we can write a function to print symbol table like entries i.e.,c03441a0 t dmi_broken during kernel boot time for debugging purposes for ARM architecture. This is required during early bootup.
Related
I am working on a library that gives information on what happens during the execution of a program.
In the Unix world, I use BFD to resolve the address of the symbols used in the program. For instance, if I am interested in multi-threaded programs, I have wrappers for pthread functions that start counters on pthread_create and collect values on pthread_join. I know which function is executed by a given thread because the function pointer is passed to pthread_create, and at the end of the exeuction (while the program is still running) I resolve it to get the function name etc using libbfd.
My question is: how can I do the same thing in the Windows world, more specifically for programs executed in WSL2?
How can I read from the PMU from inside Kernel space?
For a profiling task I need to read the retired instructions provided by the PMU from inside the kernel. The perf_event_open systemcall seems to offer this capability. In my source code I
#include <linux/syscalls.h>
set my parameters for the perf_event_attr struct and call the sys_perf_event_open(). The mentioned header contains the function declaration. When checking "/proc/kallsyms", it is confirmed that there is a systemcall with the name sys_perf_event_open. The symbol is globally available indicated by the T:
ffffffff8113fe70 T sys_perf_event_open
So everything should work as far as I can tell.
Still, when compiling or inserting the LKM I get a warning/error that sys_perf_event_open does not exist.
WARNING: "sys_perf_event_open" [/home/vagrant/mods/lkm_read_pmu/read_pmu.ko] undefined!
What do I need to do in order to get those retired instructions counter?
The /proc/kallsyms file shows all kernel symbols defined in the source. Right, the capital T indicates a global symbol in the text section of the kernel binary, but the meaning of "global" here is according to the C language. That is, it can be used in other files of the kernel itself. You can't call a kernel function from a kernel module just because it's global.
Kernel modules can only use kernel symbols that are exported with EXPORT_SYMBOL in the kernel source code. Since kernel 2.6.0, none of the system calls are exported, so you can't call any of them from a kernel module, including sys_perf_event_open. System calls are really designed to be called from user space. What this all means is that you can't use the perf_event subsystem from within a kernel module.
That said, I think you can modify the kernel to add EXPORT_SYMBOL to sys_perf_event_open. That will make it an exported symbol, which means it can be used from a kernel module.
I would like to establish a milestone roadmap for Linux initialization for me to easily understand. (For an embedded system) Here is what I got:
Bootloader loads kernel to RAM and starts it
Linux kernel enters head.o, starts start_kernel()
CPU architecture is found, MMU is started.
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
Then /sbin/init (or alike) is run.
I don't know when exactly devicetree is processed here. Is it when do_initcall() functions are beings processed or is it something prior to that?
In general when devicetree is parsed, and when tree nodes are processed?
Thank you very much in advance.
Any correction to my thoughts are highly appreciated.
It's a good question.
Firstly, I think you already know that the kernel will use data in the DT to identify the specific machine, in case of general use across different platform or hardware, we need it to establish in the early boot so that it has the opportunity to run machine-specific fixups.
Here is some information I digest from linux kernel documents.
In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine’s core CPU or SoC. On ARM for example, setup_arch() in arch/arm/kernel/setup.c will call setup_machine_fdt() in arch/arm/kernel/devtree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the ‘compatible’ property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc (which is defined in arch/arm/include/asm/mach/arch.h if you’re curious).
As for the Linux Initialization, I think there are something we can add in the list.
Put on START button, reset signal trigger
CS:IP fix to the BIOS 0XFFFF0 address
Jump to the start of BIOS
Self-check, start of hardware device like keyboard, real mode IDT & GDT
Load Bootloader like grub2 or syslinux.
Bootloader loads kernel to RAM and starts it (boot.img->core.img).
A20 Open, call setup.s, switch into protected mode
Linux kernel enters head.o, IDT & GDT refresh, decompress_kernel(), starts start_kernel()
INIT_TASK(init_task) create
trap_init()
CPU architecture is found, MMU is started (mmu_init()).
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
rest_init() will create process 1 & 2, in other word, /sbin/init (or alike) and kthreadd is run.
I've read this tutorial
I could follow the guide and run the code. but I have questions.
1) Why do we need both load-address and run-time address. As I understand it is because we have put .data at flash too; so why we don't run app there, but need start-up code to copy it into RAM?
http://www.bravegnu.org/gnu-eprog/c-startup.html
2) Why we need linker script and start-up code here. Can I not just build C source as below and run it with qemu?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system your .data section, basically non-zero globals, are loaded from the "binary" into the right offset in memory for you, so that when your program starts those memory locations that represent your variables have those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it yes? Well, if are booting from flash, bare metal, you dont have an operating system to copy that value into ram for you, somebody has to do it. Further all of the .data stuff has to be in flash, that number 5 has to be somewhere in flash so that it can be copied to ram. So you need a flash address for it and a ram address for it. Two addresses for the same thing.
And that begins to answer your second question, for every line of C code you write you assume things like for example that any function can call any other function. You would like to be able to call functions yes? And you would like to be able to have local variables, and you would like the variable x above to be 5 and you might assume that y will be zero, although, thankfully, compilers are starting to warn about that. The startup code at a minimum for generic C sets up the stack pointer, which allows you to call other functions and have local variables and have functions more than one or two lines of code long, it zeros the .bss so that the y variable above is zero and it copies the value 5 over to ram so that x is ready to go when the code your entry point C function is run.
If you dont have an operating system then you have to have code to do this, and yes, there are many many many sandboxes and toolchains that are setup for various platforms that already have the startup and linker script so that you can just
gcc -O myprog.elf myprog.c
Now that doesnt mean you can make system calls without a...system...printf, fopen, etc. But if you download one of these toolchains it does mean that you dont actually have to write the linker script nor the bootstrap.
But it is still valuable information, note that the startup code and linker script are required for operating system based programs too, it is just that native compilers for your operating system assume you are going to mostly write programs for that operating system, and as a result they provide a linker script and startup code in that toolchain.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not been located into the microcontroller's memory map. This is the job of the linker and that is why you need a linker script. The linker script is input to the linker and provides some instructions on the location and extent of the system's memory.
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
1) Why do we need both load-address and run-time address.
While it is in most cases possible to run code from memory mapped ROM, often code will execute faster from RAM. In some cases also there may be a much larger RAM that ROM and application code may compressed in ROM, so the executable code may not simply be copied from ROM also decompressed - allowing a much larger application than the available ROM.
In situations where the code is stored on non-memory mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why we need linker script and start-up code here. Can I not just build C source as below and run it with qemu?
The linker script defines the memory layout of you target and application. Since this tutorial is for bare-metal programming, there is no OS to handle that for you. Similarly the start-up code is required to at least set an initial stack-pointer, initialise static data, and jump to main. On an embedded system it is also necessary to initialise various hardware such as the PLL, memory controllers etc.
ELF executables have a fixed load address (0x804800 on 32-bit x86 Linux binaries, and 0x40000 on 64-bit x86_64 binaries).
I read the SO answers (e.g., this one) about the historical reasons for those specific addresses. What I still don't understand is why to use a fixed load address and not a randomized one (given some range to be randomized within)?
why to use a fixed load address and not a randomized one
Traditionally that's how executables worked. If you want a randomized load address, build a PIE binary (which is really a special case of shared library that has startup code in it) with -fPIE and link with -pie flags.
Building with -fPIE introduces runtime overhead, in some cases as bad as 10% performance degradation, which may not be tolerable if you have a large cluster or you need every last bit of performance.
not sure if I understood your question correct, but saying I did, that's sort-off a "legacy" / historical issue, ELF is the file format used by UNIX derived operating systems, both POSIX (IOS) and Unix-like (Linux).
and the elf format simply states that there must be some resolved and absolute virtual address that the code is loaded into and begins running from...
and simply that's how the file format is, and during to historical reasons that cant be changed... you couldn't just "throw" the executable in any memory address and have it run successfully, back in the 90's when the ELF format was introduced problems such as calling functions with virtual tables we're raised and it was decided that the elf format would have absolute addresses within it.
Also think about it, take a look at the elf format -https://en.wikipedia.org/wiki/Executable_and_Linkable_Format
how would you design an OS executable-loader that would be able to handle an executable load it to ANY desired virtual address and have the code run successfully without actually having to change the binary itself... if you would like to do something like that you'd either need to vastly change the output compilers generate or the format itself, which again isn't possible
As time passed the requirement of position independent executing (PIE/PIC) has raised and shared objects we're introduced in order to allow that and ASLR
(Address Space Layout Randomization) - which means that the code could be thrown in any memory address and still be able to execute, that is simply implemented by making sure that all calls within the code itself are relative to the current address of the executed instruction, AND that when the shared object is loaded the OS loader would have to change some data within the binary given that the data changed is not executable instructions (R E) but actual data (RW, e.g .data segment), which also is implemented by calling functions from some "Jump tables" ( which would be changed at load time ) for example PLT / GOT.... those shared objects allow absolute randomization of the addresses the code is loaded to and if you want to execute some more "secure" code you'd have to compile it as a shared object and and dynamically link it and load time or run time..
( hope I've cleared some things out :) )