linux system call implementation - linux-kernel

Where can I find the source code of some of the system calls? For example, I am looking for the implementation of fstat as described here.

A system call is mostly implemented inside the Linux kernel, with a tiny bit of glue code in the C standard library. But see also vdso(7).
From the user-land point of view, a system call (they are listed in syscalls(2)...) is a single machine instruction (SYSCALL on x86-64, SYSENTER or int 0x80 on 32-bit x86) with some calling conventions (e.g. defining which machine register holds the syscall number - e.g. __NR_stat from /usr/include/asm/unistd_64.h....-, and which other registers contain the arguments to the system call).
Use strace(1) to understand which system calls are done by a given program or process.
The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).
For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.
Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).
See also this & that answers

Related

What is the difference between the following two eBPF program types BPF_PROG_TYPE_SYSCALL and BPF_PROG_TYPE_KPROBE?

So I am assuming that BPF_PROG_TYPE_SYSCALL programs are triggered whenever a particular syscall is executed inside the kernel. Can't BPF_PROG_TYPE_KPROBE ebpf programs be used for that purpose? kprobes can hook into any kernel function and syscalls are also kernel functions.
So what is the difference between the two types of programs and when to use which?
You would think that but actually BPF_PROG_TYPE_SYSCALL is a program type which can execute syscalls itself. https://lwn.net/Articles/854228/ It was introduced as an attempt to make one BPF program load another so the first program can be signed with a certificate. But it hasn't caught on very well yet as of writing this.
Indeed if you want to trigger on syscall execution, kprobes are the way to go.

How to read instructions retired using the perf-interface inside a LKM?

How can I read from the PMU from inside Kernel space?
For a profiling task I need to read the retired instructions provided by the PMU from inside the kernel. The perf_event_open system call seems to offer this capability. In my source code I
#include <linux/syscalls.h>
set my parameters for the perf_event_attr struct and call sys_perf_event_open(). The mentioned header contains the function declaration. Checking "/proc/kallsyms" confirms that there is a system call with the name sys_perf_event_open. The symbol is globally available, as indicated by the T:
ffffffff8113fe70 T sys_perf_event_open
So everything should work as far as I can tell.
Still, when compiling or inserting the LKM I get a warning/error that sys_perf_event_open does not exist.
WARNING: "sys_perf_event_open" [/home/vagrant/mods/lkm_read_pmu/read_pmu.ko] undefined!
What do I need to do in order to get those retired instructions counter?
The /proc/kallsyms file shows all kernel symbols defined in the source. Right, the capital T indicates a global symbol in the text section of the kernel binary, but the meaning of "global" here is according to the C language. That is, it can be used in other files of the kernel itself. You can't call a kernel function from a kernel module just because it's global.
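Kernel modules can only use kernel symbols that are exported with EXPORT_SYMBOL (or EXPORT_SYMBOL_GPL) in the kernel source code. Since kernel 2.6.0, none of the system calls are exported, so you can't call any of them from a kernel module, including sys_perf_event_open. System calls are really designed to be called from user space. What this means is that you can't reach the perf_event subsystem through its system call from within a kernel module. The subsystem does, however, export an in-kernel interface (perf_event_create_kernel_counter(), perf_event_read_value() and perf_event_release_kernel(), declared in <linux/perf_event.h>) that GPL-licensed modules can use to create and read a counter.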
That said, I think you can modify the kernel to add EXPORT_SYMBOL to sys_perf_event_open. That will make it an exported symbol, which means it can be used from a kernel module.

How does Go make system calls?

As far as I know, in CPython, open() and read() - the API to read a file is written in C code. The C code probably calls some C library which knows how to make system call.
What about a language such as Go? Isn't Go itself now written in Go? Does Go call C libraries behind the scenes?
The short answer is "it depends".
Go compiles for multiple combinations of H/W and OS, and they all have different approaches to how syscalls are to be made when working with them.
For instance, Solaris does not provide a stable supported set of syscalls, so on Solaris Go goes through the system's libc, just as required by the vendor.
Windows does support a rather stable set of syscalls but it is defined as a C API provided by a set of standard DLLs.
The functions exposed by those DLLs are mostly shims which use a single "make a syscall by number" function, but these numbers are not documented and are different between the kernel flavours and releases (perhaps, intentionally).
Linux does provide a stable and documented set of numbered syscalls and hence there Go just calls the kernel directly.
Now keep in mind that for Go to "call the kernel directly" means following the so-called ABI of the H/W and OS combo. For instance, on modern Linux on amd64 making a syscall requires filling a set of CPU registers with certain values, doing some other arrangements and then issuing the SYSCALL CPU instruction.
On Windows, you have to use its native calling convention (which is stdcall, not cdecl).
Yes, Go is now written in Go. But you don't need C to make syscalls.
An important thing to call out is that syscalls aren't "written in C." You can make syscalls from C on Unix because of <unistd.h>. In particular, how Linux defines this header is a little convoluted, but you can see from this file the general idea. Syscalls are defined with a name and a number. When you call read for example, what really happens behind the scenes is that the parameters are set up in the proper registers/memory (on x86-64, Linux expects the syscall number in rax) and then the syscall instruction is executed, which switches the CPU into kernel mode. (Legacy 32-bit x86 code puts the number in eax and raises software interrupt 0x80 instead.) The OS has already set up the entry point that receives control, and it goes about doing whatever is needed for that syscall. So, you don't need something written in C (or a standard library for that matter) to make syscalls. You just need to understand the call ABI and know the syscall numbers.
However, as @retgits points out, golang's approach is to piggyback off the fact that libc already has all of the logic for handling syscalls. mksyscall.go is a CLI script that parses these libc files to extract the necessary information.
You can actually trace the life of a syscall if you compile a go script like:
package main
import (
	"syscall"
)
func main() {
	var buf []byte
	syscall.Read(9, buf)
}
Run objdump -D on the resulting binary. The go runtime is rather large, so your best bet is to find the main function, see where it calls syscall.Read and then search for the offsets from there: syscall.Read calls syscall.syscall, syscall.syscall calls runtime.libcCall (which switches from the go ABI to C ABI compatibility so that arguments are located where the OS expects--you can see this in runtime, for darwin for example), runtime.libcCall calls runtime.asmcgocall, etc.
For extra fun, run that binary with gdb and continue stepping in until you hit the syscall.
The sys package takes care of the syscalls to the underlying OS. Depending on the OS you're using, different packages are used to generate the appropriate calls. Here is a link to the README for Go running on Unix systems: https://github.com/golang/sys/blob/master/unix/README.md The parts on mksyscall.go, on the hand-written Go files (which implement system calls that need special handling), and on the types files should walk you through how it works.
The Go compiler (which translates the Go code to target CPU code) is written in Go but that is different to the run time support code which is what you are talking about. The standard library is mainly written in Go and probably knows how to directly make system calls with no C code involved. However, there may be a bit of C support code, depending on the target platform.

Why do we need a linker script and startup code?

I've read this tutorial
I could follow the guide and run the code, but I have questions.
1) Why do we need both load-address and run-time address. As I understand it, it is because we have put .data in flash too; so why don't we run the app from there, instead of needing start-up code to copy it into RAM?
http://www.bravegnu.org/gnu-eprog/c-startup.html
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with qemu?
arm-none-eabi-gcc -nostdlib -o sum_array.elf sum_array.c
Many thanks
Your first question was answered in the guide.
When you load a program on an operating system your .data section, basically non-zero globals, are loaded from the "binary" into the right offset in memory for you, so that when your program starts those memory locations that represent your variables have those values.
unsigned int x=5;
unsigned int y;
As a C programmer you write the above code and you expect x to be 5 when you first start using it, yes? Well, if you are booting from flash, bare metal, you don't have an operating system to copy that value into RAM for you; somebody has to do it. Further, all of the .data stuff has to be in flash: that number 5 has to be somewhere in flash so that it can be copied to RAM. So you need a flash address for it and a RAM address for it. Two addresses for the same thing.
And that begins to answer your second question. For every line of C code you write, you assume things, for example that any function can call any other function. You would like to be able to call functions, yes? And you would like to be able to have local variables, and you would like the variable x above to be 5, and you might assume that y will be zero (although, thankfully, compilers are starting to warn about that). The startup code, at a minimum for generic C, sets up the stack pointer, which allows you to call other functions, have local variables, and have functions more than one or two lines of code long; it zeros the .bss so that the y variable above is zero; and it copies the value 5 over to RAM so that x is ready to go when your entry-point C function runs.
If you don't have an operating system then you have to have code to do this, and yes, there are many, many sandboxes and toolchains that are set up for various platforms and already have the startup code and linker script, so that you can just
gcc -o myprog.elf myprog.c
Now that doesn't mean you can make system calls without a...system...printf, fopen, etc. But if you download one of these toolchains it does mean that you don't actually have to write the linker script or the bootstrap.
But it is still valuable information. Note that the startup code and linker script are required for operating-system-based programs too; it is just that native compilers for your operating system assume you are mostly going to write programs for that operating system, and as a result they provide a linker script and startup code in that toolchain.
1) The .data section contains variables. Variables are, well, variable -- they change at run time. The variables need to be in RAM so that they can be easily changed at run time. Flash, unlike RAM, is not easily changed at run time. The flash contains the initial values of the variables in the .data section. The startup code copies the .data section from flash to RAM to initialize the run-time variables in RAM.
2) Linker-script: The object code created by your compiler has not been located into the microcontroller's memory map. This is the job of the linker and that is why you need a linker script. The linker script is input to the linker and provides some instructions on the location and extent of the system's memory.
Startup code: Your C program that begins at main does not run in a vacuum but makes some assumptions about the environment. For example, it assumes that the initialized variables are already initialized before main executes. The startup code is necessary to put in place all the things that are assumed to be in place when main executes (i.e., the "run-time environment"). The stack pointer is another example of something that gets initialized in the startup code, before main executes. And if you are using C++ then the constructors of static objects are called from the startup code, before main executes.
1) Why do we need both load-address and run-time address.
While it is in most cases possible to run code from memory-mapped ROM, code will often execute faster from RAM. In some cases there may also be much more RAM than ROM, and the application code may be stored compressed in ROM, so the executable code is not simply copied from ROM but also decompressed, allowing a much larger application than the available ROM could otherwise hold.
In situations where the code is stored on non-memory mapped mass-storage media such as NAND flash, it cannot be executed directly in any case and must be loaded into RAM by some sort of bootloader.
2) Why do we need a linker script and start-up code here? Can I not just build the C source as below and run it with qemu?
The linker script defines the memory layout of your target and application. Since this tutorial is for bare-metal programming, there is no OS to handle that for you. Similarly the start-up code is required to at least set an initial stack-pointer, initialise static data, and jump to main. On an embedded system it is also necessary to initialise various hardware such as the PLL, memory controllers etc.

Implementation of putc in Versatile ARM LATEST Kernel-4.6

I am trying to understand how Linux prints the
"Uncompressing Linux....... done, booting the kernel"
message even before it has uncompressed itself on the ARM Versatile board.
From this file, the function decompress_kernel writes the message through the putstr() function, which in turn calls putc, which writes to the UART hardware register.
putc is implemented in this file. It writes directly to the AMBA_UART_DR register, and these registers differ across architectures and across chips too.
But in the latest kernel, 4.6, this was deprecated.
When I checked the putc implementation for the ARM Versatile board in the latest kernel, it had been deprecated, so
how is it implemented in kernel 4.6, where the rest of the machine-specific code still exists?
How does the kernel print the banner in the latest kernel?
Versatile board support code was converted to the multi-platform kernel model (ARCH_MULTIPLATFORM). Just like every other board support code of the same kind, it now takes the putc() prototype from arch/arm/include/debug/uncompress.h.
The actual implementation of putc() is now a generic assembly function in arch/arm/boot/compressed/debug.S.
Being generic, debug.S refers to a few macros (addruart, waituart, senduart, busyuart) to get information about the actual UART hardware. These macros are defined in an include file selected by CONFIG_DEBUG_LL_INCLUDE (search arch/arm/Kconfig.debug for it). In the case of the Versatile board, CONFIG_DEBUG_LL_INCLUDE is defined as arch/arm/include/debug/pl01x.S, where you will in fact find those macros.
