loader inside the kernel - linux-kernel

Assuming I don't care about security, the goal is to write a new system call that given a binary (ELF) can execute it inside the kernel.
Let's say I have a statically compiled binary A whose location in memory is ptr_A, the goal is to instrument the kernel with a new system call
sys_new_loader(ptr_A, ptr_result)
that
1) executes A inside the kernel, that is, it is not possible for a user space program to peek into A
2) returns the result at the location specified by ptr_result.
How could I go about implementing this?
(I understand that the exec system call family does a lot of book-keeping before transferring control to user space. Do I need all of that book-keeping, or can I simply jump to the specific location in ptr_A?)

Related

who creates map in BPF

After reading man bpf and a few other sources of documentation, I was under the impression that a map can only be created by a user process. However, the following small program seems to magically create a BPF map:
struct bpf_map_def SEC("maps") my_map = {
    .type = BPF_MAP_TYPE_ARRAY,
    .key_size = sizeof(u32),
    .value_size = sizeof(long),
    .max_entries = 10,
};

SEC("sockops")
int my_prog(struct bpf_sock_ops *skops)
{
    u32 key = 1;
    long *value;
    ...
    value = bpf_map_lookup_elem(&my_map, &key);
    ...
    return 1;
}
So I load the program with the kernel's tools/bpf/bpftool and also verify that program is loaded:
$ bpftool prog show
1: sock_ops name my_prog tag f3a3583cdd82ae8d
loaded_at Jan 02/18:46 uid 0
xlated 728B not jited memlock 4096B
$ bpftool map show
1: array name my_map flags 0x0
key 4B value 8B max_entries 10 memlock 4096B
Of course the map is empty. However, removing bpf_map_lookup_elem from the program results in no map being created.
UPDATE
I debugged it with strace and found that in both cases, i.e. with bpf_map_lookup_elem and without it, bpftool does invoke bpf(BPF_MAP_CREATE, ...) and it apparently succeeds. Then, with bpf_map_lookup_elem left out, I ran strace on bpftool map show, and bpf(BPF_MAP_GET_NEXT_ID, ..) immediately returns ENOENT, so it never gets to dump a map. So apparently something is preventing the map creation from completing.
So I wonder if this is expected behavior?
Thanks.
As explained by antiduh, and confirmed with your strace checks, bpftool is the user space program creating the maps in this case. It calls function bpf_prog_load() from libbpf (under tools/lib/bpf/), which in turn ends up performing the syscall. Then the program is pinned at the desired location (under a bpf virtual file system mount point), so that it is not unloaded when bpftool returns. Maps are not pinned.
Regarding map creation, the magic bits also take place in libbpf. When bpf_prog_load() is called, libbpf receives the name of the object file as an argument. bpftool does not ask to load this specific program or that specific map; instead, it provides the object file and libbpf has to deal with it. So the functions in libbpf parse this ELF object file, and eventually find a number of sections corresponding to maps and programs. Then it tries to load the first program.
Loading this program includes the following steps:
CHECK_ERR(bpf_object__create_maps(obj), err, out);
CHECK_ERR(bpf_object__relocate(obj), err, out);
CHECK_ERR(bpf_object__load_progs(obj), err, out);
In other words: start by creating all maps we found in the object file. Then perform map relocation (i.e. associate map index to eBPF instructions), and at last load program instructions.
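To make the relocation step a bit more concrete, here is a minimal sketch in C. It is not libbpf's code: toy_insn is a simplified stand-in for an eBPF instruction, and toy_reloc, relocate_map_refs and TOY_PSEUDO_MAP_FD are hypothetical names made up for the illustration. The assumption is that each map reference found in the object file is recorded as a pair (instruction index, map index), and that relocation rewrites that instruction so it carries the file descriptor obtained when the map was created.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for an eBPF instruction (not the real layout). */
struct toy_insn {
    unsigned char opcode;
    unsigned char src_reg;
    int imm;
};

/* Hypothetical relocation record: instruction insn_idx refers to map map_idx. */
struct toy_reloc {
    size_t insn_idx;
    size_t map_idx;
};

/* Marker meaning "imm holds a map fd" (assumed value, for illustration). */
#define TOY_PSEUDO_MAP_FD 1

/* Patch every referenced instruction with the fd of the map it uses.
 * Returns the number of instructions patched, or -1 on a bad record. */
int relocate_map_refs(struct toy_insn *insns, size_t n_insns,
                      const struct toy_reloc *relocs, size_t n_relocs,
                      const int *map_fds, size_t n_maps)
{
    assert(insns != NULL && relocs != NULL && map_fds != NULL);
    int patched = 0;
    for (size_t i = 0; i < n_relocs; i++) {
        if (relocs[i].insn_idx >= n_insns || relocs[i].map_idx >= n_maps)
            return -1;
        insns[relocs[i].insn_idx].src_reg = TOY_PSEUDO_MAP_FD;
        insns[relocs[i].insn_idx].imm = map_fds[relocs[i].map_idx];
        patched++;
    }
    return patched;
}
```

Instructions that have no relocation record are left untouched, which is the case for a program that never calls bpf_map_lookup_elem().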
So regarding your question: in both cases, with and without bpf_map_lookup_elem(), maps are created with a bpf(BPF_MAP_CREATE, ...) syscall. After that, relocation happens, and program instructions are adapted to point, if needed, to the newly created maps. Then once all steps are finished and the program is loaded, bpftool exits. The eBPF program should be pinned, and still loaded in the kernel. As far as I understand, if it does use the maps (if bpf_map_lookup_elem() was used), then maps are still referenced by a loaded program, and are kept in the kernel. On the other hand, if the program does not use the maps, then there is nothing more to hold them back, so the maps are destroyed when the file descriptors held by bpftool are closed, when bpftool returns.
So in the end, when bpftool has completed, you have a map loaded in the kernel if the program uses it, but no map if no program would rely on it. Sounds like expected behaviour in my opinion; but please do ping one way or another if you experience strange things with bpftool, I'm one of the guys working on the utility. One last generic observation: maps can also be pinned and remain in the kernel even if no program uses them, should one need to keep them around.
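The lifetime rule described above can be modelled with a plain reference count. The following is a toy model in C, not kernel code: toy_map and the toy_map_* helpers are hypothetical names, and the model assumes exactly one file descriptor held by the loading tool and at most one program referencing the map.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a kernel object kept alive by a reference count. */
struct toy_map {
    int refcnt;   /* holders: open fds, loaded programs, pins */
    bool alive;
};

/* Map creation analogue: the creating process holds one reference (its fd). */
void toy_map_create(struct toy_map *m)
{
    m->refcnt = 1;
    m->alive = true;
}

/* Taken when a loaded program references the map, or when it is pinned. */
void toy_map_get(struct toy_map *m)
{
    assert(m->alive);
    m->refcnt++;
}

/* Dropped on close(fd), program unload, or unpin; the map dies at zero. */
void toy_map_put(struct toy_map *m)
{
    assert(m->refcnt > 0);
    if (--m->refcnt == 0)
        m->alive = false;
}

/* Replays the sequence from the answer and reports whether the map is
 * still alive once the loading tool has exited. */
bool map_survives_loader_exit(bool prog_uses_map)
{
    struct toy_map m;
    toy_map_create(&m);      /* loader creates the map and gets an fd */
    if (prog_uses_map)
        toy_map_get(&m);     /* loading the program takes its own reference */
    toy_map_put(&m);         /* loader exits, its fd is closed */
    return m.alive;
}
```

In this model, pinning a map would simply be one more toy_map_get() that is not tied to any program, which is why a pinned map can outlive both the tool and the programs.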
I was under impression that a map can be only created by user process.
You're completely right - user programs are the ones that invoke the bpf system call in order to load eBPF programs and create eBPF maps.
And you did just that:
So I load the program with tools/bpf/bpftool and ...
Your bpftool program is the user process that is invoking the bpf syscall, and thus is the user process that is creating the eBPF map.
BPF programs don't have to be unloaded when the user program that created them quits - bpftool likely uses this mechanism.
Some relevant bits from the man page to connect the dots:
A user process can create multiple maps ... and access them via file descriptors.
Generally, eBPF programs are loaded by the user process and automatically unloaded when the process exits. In some cases ... the program will continue to stay alive inside the kernel even after the process that loaded the program exits.
Each eBPF program is a set of instructions that is safe to run until its completion. ... During verification, the kernel increments reference counts for each of the maps that the eBPF program uses, so that the attached maps can't be removed until the program is unloaded.

why need linker script and startup code?

I've read this tutorial: http://www.bravegnu.org/gnu-eprog/c-startup.html
I could follow the guide and run the code, but I have questions.
1) Why do we need both a load address and a run-time address? As I understand it, this is because .data is also placed in flash. So why don't we run the app from there, instead of needing start-up code to copy it into RAM?
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with qemu?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system your .data section, basically non-zero globals, are loaded from the "binary" into the right offset in memory for you, so that when your program starts those memory locations that represent your variables have those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it, yes? Well, if you are booting from flash, bare metal, you don't have an operating system to copy that value into RAM for you; somebody has to do it. Further, all of the .data stuff has to be in flash: that number 5 has to be somewhere in flash so that it can be copied to RAM. So you need a flash address for it and a RAM address for it. Two addresses for the same thing.
And that begins to answer your second question. For every line of C code you write you assume things, for example that any function can call any other function. You would like to be able to call functions, yes? And you would like to be able to have local variables, and you would like the variable x above to be 5, and you might assume that y will be zero, although, thankfully, compilers are starting to warn about that. The startup code, at a minimum for generic C, sets up the stack pointer, which allows you to call other functions, have local variables, and have functions more than one or two lines of code long; it zeros the .bss so that the y variable above is zero; and it copies the value 5 over to RAM so that x is ready to go when your entry point C function is run.
If you don't have an operating system then you have to have code to do this, and yes, there are many, many sandboxes and toolchains that are set up for various platforms and already have the startup code and linker script, so that you can just
gcc -o myprog.elf myprog.c
Now that doesn't mean you can make system calls without a...system...printf, fopen, etc. But if you download one of these toolchains it does mean that you don't actually have to write the linker script or the bootstrap.
But it is still valuable information. Note that the startup code and linker script are required for operating-system-based programs too; it is just that native compilers for your operating system assume you are going to mostly write programs for that operating system, and as a result they provide a linker script and startup code in that toolchain.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not been located into the microcontroller's memory map. This is the job of the linker and that is why you need a linker script. The linker script is input to the linker and provides some instructions on the location and extent of the system's memory.
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
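As a rough illustration of the two jobs described above (copying .data from its load address to its run-time address, and zeroing .bss), here is a minimal sketch in C that can be tried on a host machine. The function name toy_startup and its parameters are hypothetical; on a real target the five pointers would come from symbols defined in the linker script, and the routine would usually run before the stack-dependent parts of C are relied upon, often written in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy the initial values of .data from flash (load address) to RAM
 * (run-time address), then clear .bss.
 * data_end and bss_end point one past the last word of each section. */
void toy_startup(const uint32_t *data_load,
                 uint32_t *data_start, uint32_t *data_end,
                 uint32_t *bss_start, uint32_t *bss_end)
{
    assert(data_load != NULL);
    assert(data_start <= data_end && bss_start <= bss_end);

    const uint32_t *src = data_load;
    for (uint32_t *dst = data_start; dst < data_end; )
        *dst++ = *src++;          /* x = 5 ends up in RAM here */

    for (uint32_t *dst = bss_start; dst < bss_end; )
        *dst++ = 0;               /* y becomes 0 here */
}
```

On the host, the "flash" and "RAM" regions can simply be two arrays; after the call the RAM array holds the initial values followed by zeros, and only then would main be entered.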
1) Why do we need both load-address and run-time address.
While it is in most cases possible to run code from memory-mapped ROM, code will often execute faster from RAM. In some cases there may also be much more RAM than ROM, and the application code may be compressed in ROM, so the executable code is not simply copied from ROM but also decompressed, allowing a much larger application than the available ROM could otherwise hold.
In situations where the code is stored on non-memory mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why we need linker script and start-up code here. Can I not just build C source as below and run it with qemu?
The linker script defines the memory layout of your target and application. Since this tutorial is for bare-metal programming, there is no OS to handle that for you. Similarly, the start-up code is required to at least set an initial stack pointer, initialise static data, and jump to main. On an embedded system it is also necessary to initialise various hardware such as the PLL, memory controllers, etc.

What is the difference between Virtual File System and System Call?

As I understand it, the kernel provides two main interfaces for user space to do something in the kernel: system calls and the Virtual File System (procfs, sysfs, etc.).
I read in a book that internally the VFS also uses system calls.
So I want to know how exactly these two are connected, and in which situations we should use the VFS over a system call and vice versa.
A system call is the generic facility for any user space process to switch from user space mode to kernel mode.
It is like a call to a function that resides in the kernel and is invoked from user space with a variable number of parameters, the most important of which is the syscall number.
The kernel will always maintain an architecture-specific array of supported system calls (=kernel functions) and will basically dispatch any syscall coming from user space to the correct function based on the system call number passed from user space.
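The dispatch idea can be sketched in a few lines of C. This is a toy model and not kernel code: the table, the two "system calls" and the name toy_dispatch are all hypothetical, and the error value 38 is only meant to mirror the usual "function not implemented" code on Linux.

```c
#include <assert.h>
#include <stddef.h>

/* Every toy "system call" takes two arguments and returns a long. */
typedef long (*toy_syscall_fn)(long a, long b);

static long toy_sys_add(long a, long b) { return a + b; }
static long toy_sys_max(long a, long b) { return a > b ? a : b; }

/* The table is indexed by syscall number. */
static const toy_syscall_fn toy_syscall_table[] = {
    toy_sys_add,   /* number 0 */
    toy_sys_max,   /* number 1 */
};

#define TOY_ENOSYS 38   /* assumed error code for "no such system call" */

/* Look up the handler for nr and call it; reject unknown numbers. */
long toy_dispatch(long nr, long a, long b)
{
    size_t n = sizeof(toy_syscall_table) / sizeof(toy_syscall_table[0]);
    if (nr < 0 || (size_t)nr >= n)
        return -TOY_ENOSYS;
    assert(toy_syscall_table[nr] != NULL);
    return toy_syscall_table[nr](a, b);
}
```

The bounds check matters: the number comes from user space, so it must be validated before it is used as an array index.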
The Virtual File System is just an abstraction of a file system that provides you with standard functions to deal with anything that can be considered a file. So for example you can call "open", "close", "read", etc. on any file without being concerned about which filesystem the file is stored on.
The relation here between VFS and syscalls is that VFS is basically code that resides in the kernel and the only way to get to the kernel is through syscalls ( "open" is a syscall, so is "close", etc )
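One common way to build such an abstraction is a table of function pointers per filesystem, with a single generic entry point that forwards to whichever table the file carries. The sketch below is a toy model in C under that assumption; toy_file, toy_file_ops and the two example "filesystems" are hypothetical names and are much simpler than anything in a real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_file;

/* Operations every "filesystem" must provide. */
struct toy_file_ops {
    size_t (*read)(struct toy_file *f, char *buf, size_t len);
};

struct toy_file {
    const struct toy_file_ops *ops;  /* which filesystem handles this file */
    const char *backing;             /* contents, used by the RAM-backed one */
};

/* "Filesystem" A: returns the stored bytes. */
static size_t toy_ramfs_read(struct toy_file *f, char *buf, size_t len)
{
    size_t n = strlen(f->backing);
    if (n > len)
        n = len;
    memcpy(buf, f->backing, n);
    return n;
}

/* "Filesystem" B: produces zero bytes on every read. */
static size_t toy_zero_read(struct toy_file *f, char *buf, size_t len)
{
    (void)f;
    memset(buf, 0, len);
    return len;
}

const struct toy_file_ops toy_ramfs_ops = { .read = toy_ramfs_read };
const struct toy_file_ops toy_zero_ops  = { .read = toy_zero_read };

/* Generic entry point: the caller does not know which filesystem it hits. */
size_t toy_vfs_read(struct toy_file *f, char *buf, size_t len)
{
    assert(f != NULL && f->ops != NULL && f->ops->read != NULL);
    return f->ops->read(f, buf, len);
}
```

In this picture, the "read" system call would be the thin layer that gets you from user space to toy_vfs_read, and the filesystem-specific function is chosen behind it.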

Jump to App from custom bootloader in TMS320 digital media processor

I am working on a boot loader for the TMS320DM6437. The idea is to create two independent firmware images, one of which updates the other. In firmware1 I will download the firmware2 file and write it to NOR flash at a specified address. Both firmware images are stored in NOR flash in AIS format. So I now have two applications in flash: one is my custom boot loader and the second is my main project. I want to know how I can jump from the first program to the second program located at a specified address. I would also appreciate pointers to documents that may help me create a custom bootloader.
Any recommendations?
You can jump to the entry point. I'm using this approach on TMS320 2802x and 2803x, but it should be the same.
The symbol of the entry point is c_int00.
To get to know the address of c_int00 in the second application, you have to fix the Run-Time Support (RTS) library at a specific address, by modifying the linker command file.
Otherwise you can leave the RTS unconstrained, and create a C variable (at a fixed address) that is initialized with the value of c_int00. Using this method your memory map is more flexible and you can add, together with the C variable, a comprehensive data structure with other information for your bootloader, e.g. CRC, version number, etc.
Be careful with the (re)initialization of the peripherals in the second application, since you are not starting from a hardware reset, and you may need to explicitly reset some more registers, or clear interrupt requests.
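The "C variable at a fixed address" approach can be sketched as a small header structure that the bootloader checks before jumping. This is a host-side illustration in C, not TMS320 code: toy_app_header, TOY_APP_MAGIC, toy_boot_jump and toy_app_main are hypothetical names, the magic value is arbitrary, and on the real target the header would be read from the agreed flash address and the entry point would not return.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*toy_entry_fn)(void);

#define TOY_APP_MAGIC 0xB007AB1Eu   /* arbitrary marker chosen for the sketch */

/* Placed by the application at an address both images agree on. */
struct toy_app_header {
    uint32_t magic;       /* lets the bootloader detect an empty slot */
    uint32_t version;     /* room for CRC, version number, etc. */
    toy_entry_fn entry;   /* initialized with the application's entry point */
};

/* Validate the header, then transfer control to the application.
 * Returns -1 if the header does not look valid. On real hardware the
 * call through hdr->entry would never come back. */
int toy_boot_jump(const struct toy_app_header *hdr)
{
    assert(hdr != NULL);
    if (hdr->magic != TOY_APP_MAGIC || hdr->entry == NULL)
        return -1;
    return hdr->entry();
}

/* Stand-in for the second application's entry point. */
int toy_app_main(void)
{
    return 42;
}
```

Checking a marker (and, in practice, a CRC) before the jump keeps the bootloader from branching into erased or half-written flash after an interrupted update.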

Memory mapping of binary to VAS

When a new process is created, its address space is created using fork(), i.e. new page table entries are created for the new process which are exactly the same as those of the parent process.
After fork(), exec() is called. What happens during the exec() system call?
I read in the book "Operating System Concepts" that when a new program is executed, the process is given a new, empty VAS. Does that mean that the page table entries created during fork() get deleted or modified? What is the meaning of an empty VAS?
How is the memory mapping of the binary to the VAS performed? How does the loader know which addresses of the VAS should be mapped to which parts of the binary file?
I am really confused here.
When you call exec, the kernel loads the binary and sets up a whole new set of page tables (replacing the old ones).
The loader gets the address to load the binary at from the binary itself (basically it does read() to get the headers and other non-code parts, then mmap() to actually load the code and data from the binary).
So it looks at the binary and figures out how it should be loaded, then does mmap(), passing in an address to map at, for each part of the binary that needs to be in a different place (i.e. the code and data sections are probably two different calls to mmap(), and the .bss section would be mapped from /dev/zero).
Note that depending on the OS and the binary being loaded, some of this may be handled by the kernel directly or by a userspace loader (on UNIX-like systems ld.so would be the loader; it handles shared object loading).
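Since mmap() works on whole pages, the loader has to turn the address, file offset and size recorded in the binary into page-aligned values before mapping. The arithmetic can be sketched as follows in C. The struct and function names are hypothetical, and the sketch assumes the segment's virtual address and file offset are congruent modulo the page size, which is what makes a direct file mapping possible.

```c
#include <assert.h>
#include <stdint.h>

/* What the binary's header says about one loadable part. */
struct toy_segment {
    uint64_t vaddr;    /* where it wants to live in the address space */
    uint64_t offset;   /* where it starts in the file */
    uint64_t filesz;   /* how many bytes come from the file */
};

/* What would be passed to mmap(): address, file offset and length. */
struct toy_mapping {
    uint64_t addr;
    uint64_t offset;
    uint64_t length;
};

/* Round the segment down to a page boundary; page_size must be a power
 * of two. The bytes before vaddr within the first page are mapped too. */
struct toy_mapping toy_segment_to_mapping(struct toy_segment s,
                                          uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    uint64_t slack = s.vaddr & (page_size - 1);
    assert((s.offset & (page_size - 1)) == slack);

    struct toy_mapping m;
    m.addr = s.vaddr - slack;
    m.offset = s.offset - slack;
    m.length = s.filesz + slack;
    return m;
}
```

For example, a segment with vaddr 0x401234, file offset 0x1234 and file size 0x500 on 4 KiB pages becomes a mapping at address 0x401000 from file offset 0x1000 with length 0x734.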
