I am trying to understand how Linux prints the
"Uncompressing Linux....... done, booting the kernel"
message even before it has uncompressed itself on the ARM Versatile board.
From this file, the function decompress_kernel writes the message through the putstr() function, which in turn calls putc(), which writes to the UART hardware register.
putc is implemented in this file. It writes directly to the AMBA_UART_DR register; such registers differ across architectures and also across different chips.
But in the latest kernel (4.6) this was deprecated.
When I checked the putc implementation for the ARM Versatile board in the latest kernel, it had been deprecated, so
how is it implemented in kernel 4.6, while the rest of the machine-specific code still exists?
How does the kernel print the banner in the latest kernel?
The Versatile board support code was converted to the multi-platform kernel model (ARCH_MULTIPLATFORM). Just like every other board support code of the same kind, it now takes the putc() prototype from arch/arm/include/debug/uncompress.h.
The actual implementation of putc(), however, is a generic assembly function in arch/arm/boot/compressed/debug.S.
Being generic, debug.S refers to a few macros (addruart, waituart, senduart, busyuart) to get information about the actual UART hardware. These macros are defined in an include file selected by CONFIG_DEBUG_LL_INCLUDE (search arch/arm/Kconfig.debug for it). In the case of the Versatile board, CONFIG_DEBUG_LL_INCLUDE is defined as arch/arm/include/debug/pl01x.S, which is where you will find those macros.
How can I read from the PMU from inside Kernel space?
For a profiling task I need to read the retired-instructions counter provided by the PMU from inside the kernel. The perf_event_open system call seems to offer this capability. In my source code I
#include <linux/syscalls.h>
set my parameters for the perf_event_attr struct and call sys_perf_event_open(). The mentioned header contains the function declaration. Checking "/proc/kallsyms" confirms that there is a system call with the name sys_perf_event_open. The symbol is globally available, as indicated by the T:
ffffffff8113fe70 T sys_perf_event_open
So everything should work as far as I can tell.
Still, when compiling or inserting the LKM I get a warning/error that sys_perf_event_open does not exist.
WARNING: "sys_perf_event_open" [/home/vagrant/mods/lkm_read_pmu/read_pmu.ko] undefined!
What do I need to do in order to get those retired instructions counter?
The /proc/kallsyms file shows all kernel symbols defined in the source. Right, the capital T indicates a global symbol in the text section of the kernel binary, but the meaning of "global" here is according to the C language. That is, it can be used in other files of the kernel itself. You can't call a kernel function from a kernel module just because it's global.
Kernel modules can only use kernel symbols that are exported with EXPORT_SYMBOL in the kernel source code. Since kernel 2.6.0, none of the system calls are exported, so you can't call any of them from a kernel module, including sys_perf_event_open. System calls are really designed to be called from user space. What this all means is that you can't reach the perf_event subsystem through its system call from within a kernel module. The subsystem does have an in-kernel interface, perf_event_create_kernel_counter(), which is exported to GPL-licensed modules and may be worth looking at instead.
That said, I think you can modify the kernel to add EXPORT_SYMBOL to sys_perf_event_open. That will make it an exported symbol, which means it can be used from a kernel module.
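To make the point about the T flag concrete, here is a minimal Go sketch that parses a /proc/kallsyms line like the one quoted in the question. The Symbol type and parseKallsymsLine helper are hypothetical names made up for this illustration. Note what the parsed line can and cannot tell you: the type letter says whether the symbol is global, but nothing on the line says whether it was exported to modules.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// Symbol is one parsed line of /proc/kallsyms (hypothetical helper type).
type Symbol struct {
	Addr   string // address exactly as printed
	Type   byte   // nm-style type letter, e.g. 'T' or 't'
	Name   string
	Module string // empty for symbols built into the kernel image
}

// parseKallsymsLine splits a line such as
// "ffffffff8113fe70 T sys_perf_event_open" into its fields.
func parseKallsymsLine(line string) (Symbol, error) {
	f := strings.Fields(line)
	if len(f) < 3 || len(f[1]) != 1 {
		return Symbol{}, fmt.Errorf("malformed kallsyms line: %q", line)
	}
	s := Symbol{Addr: f[0], Type: f[1][0], Name: f[2]}
	if len(f) > 3 {
		// Module symbols carry a trailing "[module_name]" column.
		s.Module = strings.Trim(f[3], "[]")
	}
	return s, nil
}

// IsGlobal reports whether the type letter is upper case, i.e. the symbol
// has external linkage. It says nothing about EXPORT_SYMBOL.
func (s Symbol) IsGlobal() bool { return unicode.IsUpper(rune(s.Type)) }

func main() {
	s, err := parseKallsymsLine("ffffffff8113fe70 T sys_perf_event_open")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s type=%c global=%v\n", s.Name, s.Type, s.IsGlobal())
}
```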
As far as I know, in CPython, open() and read() (the API for reading a file) are written in C. The C code probably calls some C library which knows how to make the system call.
What about a language such as Go? Isn't Go itself now written in Go? Does Go call C libraries behind the scenes?
The short answer is "it depends".
Go compiles for multiple combinations of H/W and OS, and they all have different approaches to how syscalls are to be made when working with them.
For instance, Solaris does not provide a stable supported set of syscalls, so Go programs there go through the system's libc, just as required by the vendor.
Windows does support a rather stable set of syscalls but it is defined as a C API provided by a set of standard DLLs.
The functions exposed by those DLLs are mostly shims which use a single "make a syscall by number" function, but these numbers are not documented and are different between the kernel flavours and releases (perhaps, intentionally).
Linux does provide a stable and documented set of numbered syscalls and hence there Go just calls the kernel directly.
Now keep in mind that for Go to "call the kernel directly" means following the so-called ABI of the H/W and OS combo. For instance, on modern Linux on amd64 making a syscall requires filling a set of CPU registers with certain values, doing some other arrangements and then issuing the SYSCALL CPU instruction.
On Windows, you have to use its native calling convention (which is stdcall, not cdecl).
Yes, Go is now written in Go. But you don't need C to make syscalls.
An important thing to call out is that syscalls aren't "written in C." You can make syscalls from C on Unix because of <unistd.h>. In particular, how Linux defines this header is a little convoluted, but you can see from this file the general idea. Syscalls are defined with a name and a number. When you call read, for example, what really happens behind the scenes is that the parameters are set up in the proper registers/memory (on x86-64, Linux expects the syscall number in rax), followed by the syscall instruction, which traps into the kernel. (32-bit x86 code historically put the number in eax and fired interrupt 0x80 instead.) The OS has already set up the proper handlers that will receive this trap, and the OS goes about doing whatever is needed for that syscall. So you don't need something written in C (or a standard library, for that matter) to make syscalls. You just need to understand the call ABI and know the syscall numbers.
However, as @retgits points out, Go's approach is to piggyback off the fact that libc already has all of the logic for handling syscalls. mksyscall.go is a CLI script that parses these libc files to extract the necessary information.
You can actually trace the life of a syscall if you compile a Go program like:
package main

import (
	"syscall"
)

func main() {
	var buf []byte
	// The fd and the empty buffer are arbitrary; the call only has to show up in the disassembly.
	syscall.Read(9, buf)
}
Run objdump -D on the resulting binary. The go runtime is rather large, so your best bet is to find the main function, see where it calls syscall.Read and then search for the offsets from there: syscall.Read calls syscall.syscall, syscall.syscall calls runtime.libcCall (which switches from the go ABI to C ABI compatibility so that arguments are located where the OS expects--you can see this in runtime, for darwin for example), runtime.libcCall calls runtime.asmcgocall, etc.
For extra fun, run that binary with gdb and continue stepping in until you hit the syscall.
The sys package takes care of the syscalls to the underlying OS. Depending on the OS you're using, different packages are used to generate the appropriate calls. Here is a link to the README for Go running on Unix systems: https://github.com/golang/sys/blob/master/unix/README.md. The parts on mksyscall.go, on the hand-written Go files which implement system calls that need special handling, and on the types files should walk you through how it works.
The Go compiler (which translates the Go code to target CPU code) is written in Go but that is different to the run time support code which is what you are talking about. The standard library is mainly written in Go and probably knows how to directly make system calls with no C code involved. However, there may be a bit of C support code, depending on the target platform.
Basically, this is the same question that was asked here.
When performing kernel debugging of a machine running Windows 7 or older, with WinDbg version 6.2 and up, the debugger doesn't show anything in the registers window. Pressing the Customize... button results in a message box that reads Registers are not yet known.
At the same time, issuing the r command results in perfectly valid register values being printed out.
What is the reason for this behaviour, and can it be fixed?
TL;DR: I wrote an extension DLL that fixes the bug. Available here.
The Problem
To understand the problem, we first need to understand that WinDbg is basically just a frontend to Microsoft's Windows Symbolic Debugger Engine, implemented inside dbgeng.dll. Other frontends include the command-line kd.exe (kernel debugger) and cdb.exe (user-mode debugger).
The engine implements everything we expect from a debugger: working with symbol files, reading and writing memory and registers, setting breakpoints, etc. The engine then exposes all of this functionality through COM-like interfaces (they implement IUnknown but are not registered components). This allows us, for instance, to write our own debugger (like this person did).
Armed with this knowledge, we can now make an educated guess as to how WinDbg obtains the values of the registers on the target machine.
The engine exposes the IDebugRegisters interface for manipulating registers. This interface declares the GetValues method for retrieving the values of multiple registers in one go. But how does WinDbg know how many registers there are? That's why we have the GetNumberRegisters method.
So, to retrieve the values of all registers on the target, we'll have to do something like this:
1. Call IDebugRegisters::GetNumberRegisters to get the total number of registers.
2. Call IDebugRegisters::GetValues with the Count parameter set to the total number of registers, the Indices parameter set to NULL, and the Start parameter set to 0.
One tiny problem, though: the second call fails with E_INVALIDARG.
Ehm, excuse me? How can it fail? Especially puzzling is the documentation for this return value:
The value of the index of one of the registers is greater than the number of registers on the target machine.
But I just asked you how many registers there are, so how can that value be out of range? Okay, let's continue reading the docs anyway, maybe something will become clear:
If the return value is not S_OK, some of the registers still might have been read. If the target was not accessible, the return type is E_UNEXPECTED and Values is unchanged; otherwise, Values will contain partial results and the registers that could not be read will have type DEBUG_VALUE_INVALID.
(Emphasis mine.)
Aha! So maybe the engine just couldn't read one of the registers! But which one? Turns out that the engine chokes on the xcr0 register. From the Intel 64 and IA-32 Architectures Software Developer’s Manual:
Extended control register XCR0 contains a state-component bitmap that specifies the user state components that software has enabled the XSAVE feature set to manage. If the bit corresponding to a state component is clear in XCR0, instructions in the XSAVE feature set will not operate on that state component, regardless of the value of the instruction mask.
Okay, so the register controls the operation of the XSAVE instruction, which saves the state of the CPU's extended features (like XMM and AVX). According to the last comment on this page, this instruction requires some support from the operating system. Although the comment states that Windows 7 (that's what the VM I was testing on was running) does support this instruction, it seems that the issue at hand is related to the OS anyway, as when the target is Windows 8 everything works fine.
Really, it's unclear whether the bug is within the debugger engine, which reports more registers than it can retrieve values for, or within WinDbg, which refuses to show any values at all if the engine fails to produce all of them.
The Solution
We could, of course, bite the bullet and just use an older version of WinDbg for debugging older Windows versions. But where's the challenge in that?
Instead, I present to you a debugger extension that solves this problem. It does so by hooking (with the help of this library) the relevant debugger engine methods and returning S_OK if the only register that failed was xcr0. Otherwise, it propagates the failure. The extension supports runtime unload, so if you experience problems you can always disable the hooks.
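The decision the hook makes can be sketched in a few lines. This is not the extension's code: it is a plain Go model of the rule stated above, and regResult and shouldMaskFailure are hypothetical names invented for the illustration. Valid == false stands in for a register whose value came back as DEBUG_VALUE_INVALID.

```go
package main

import "fmt"

// regResult models one entry returned by a bulk register read.
// Hypothetical type: Valid == false stands for DEBUG_VALUE_INVALID.
type regResult struct {
	Name  string
	Valid bool
}

// shouldMaskFailure reports whether a failed bulk read may be reported as
// success: only when xcr0 is the sole register that could not be read.
func shouldMaskFailure(regs []regResult) bool {
	sawInvalid := false
	for _, r := range regs {
		if r.Valid {
			continue
		}
		if r.Name != "xcr0" {
			return false // some other register failed: propagate the error
		}
		sawInvalid = true
	}
	return sawInvalid
}

func main() {
	onlyXcr0 := []regResult{{"rax", true}, {"rip", true}, {"xcr0", false}}
	other := []regResult{{"rax", false}, {"xcr0", false}}
	fmt.Println(shouldMaskFailure(onlyXcr0)) // true
	fmt.Println(shouldMaskFailure(other))    // false
}
```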
That's it, have fun!
Where can I find the source code of some of the system calls? For example, I am looking for the implementation of fstat as described here.
A system call is mostly implemented inside the Linux kernel, with a little glue code in the C standard library. But see also vdso(7).
From the user-land point of view, a system call (they are listed in syscalls(2)...) is a single machine instruction (SYSCALL on x86-64; 32-bit x86 often uses SYSENTER) with some calling conventions (e.g. defining which machine register holds the syscall number - e.g. __NR_stat from /usr/include/asm/unistd_64.h....-, and which other registers contain the arguments to the system call).
Use strace(1) to understand which system calls are done by a given program or process.
The C standard library has a tiny wrapper function (which invokes the kernel, following the ABI, and deals with error reporting & errno).
For stat(2), the C wrapping function is e.g. in stat/stat.c for musl-libc.
Inside the kernel code, most of the work happens in fs/stat.c (e.g. after line 207).
See also this & that answers
Is it possible to pin a softirq, or any other bottom half, to a processor? I doubt that this could be done from within softirq code.
But then, inside a driver, is it possible to pin a particular IRQ to a core?
From user mode, you can easily do this by writing to /proc/irq/N/smp_affinity to control which processor(s) an interrupt is directed to. The symbols for the code implementing this are not exported though, so it's difficult to do from the kernel (at least for a loadable module which is how most drivers are structured).
The fact that the implementing function symbols aren't exported is a sign that the kernel developers don't want to encourage this. Presumably that's because it takes control away from the user, and because it embeds assumptions about the number of processors and so forth into the driver.
So, to answer your question, yes, it's possible, but it's discouraged, and you would need to do one of several "ugly" things to implement it ((a) change kernel exports, (b) link your driver statically into main kernel, or (c) open/write to the proc file from kernel mode).
The usual way to achieve this is by writing a user-mode program (can even be a shell script) that programs core numbers/masks into the appropriate proc file. See Documentation/IRQ-affinity.txt in the kernel source directory for details.
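As a rough illustration of such a user-mode program, here is a sketch in Go. affinityMask and setIRQAffinity are hypothetical helper names; the sketch only handles CPUs 0 to 31 so that the mask fits in a single hexadecimal word, and writing to the proc file needs root. The IRQ number you pass would come from /proc/interrupts.

```go
package main

import (
	"fmt"
	"os"
)

// affinityMask builds the hexadecimal CPU bitmask expected by
// /proc/irq/N/smp_affinity. This sketch only supports CPUs 0-31.
func affinityMask(cpus []int) (string, error) {
	var mask uint32
	for _, c := range cpus {
		if c < 0 || c > 31 {
			return "", fmt.Errorf("cpu %d out of range for this sketch (0-31)", c)
		}
		mask |= 1 << uint(c)
	}
	if mask == 0 {
		return "", fmt.Errorf("empty CPU list")
	}
	return fmt.Sprintf("%x", mask), nil
}

// setIRQAffinity writes the mask for the given CPUs to the proc file of
// the given IRQ. It must run as root to succeed.
func setIRQAffinity(irq int, cpus []int) error {
	mask, err := affinityMask(cpus)
	if err != nil {
		return err
	}
	path := fmt.Sprintf("/proc/irq/%d/smp_affinity", irq)
	return os.WriteFile(path, []byte(mask+"\n"), 0o644)
}

func main() {
	// Only compute and print the mask here; call setIRQAffinity(irq, cpus)
	// with a real IRQ number to apply it.
	mask, err := affinityMask([]int{0, 2})
	if err != nil {
		panic(err)
	}
	fmt.Println(mask) // 5
}
```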