code segment of kernel thread - linux-kernel

Is there any way to get address and size of code segment of linux kernel thread (like task_struct->mm->mmap->vm_start and vm_end for active task with task_struct->mm != 0)?

I would recommend going through the taskstats interface of the Linux kernel, which can provide information on all Linux threads, including VM stats.
Have a look at the documentation, as well as at the header for the interface.
There is no easy way to hack into the kernel to enumerate all the available task_struct instances.

Related

Linux kernel initialization - When are devicetree blobs parsed and tree nodes are loaded?

I would like to establish a milestone roadmap of Linux initialization that is easy for me to understand (for an embedded system). Here is what I have so far:
Bootloader loads kernel to RAM and starts it
Linux kernel enters head.o, starts start_kernel()
CPU architecture is found, MMU is started.
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
Then /sbin/init (or alike) is run.
I don't know exactly when the devicetree is processed here. Is it while the do_initcalls() functions are being processed, or is it something prior to that?
In general, when is the devicetree parsed, and when are the tree nodes processed?
Thank you very much in advance.
Any corrections to my thoughts are highly appreciated.
It's a good question.
First, as you probably already know, the kernel uses data in the DT to identify the specific machine. Because one kernel is meant to run across different platforms and hardware, the machine has to be identified early in boot so that the kernel has the opportunity to run machine-specific fixups.
Here is some information I digested from the Linux kernel documentation.
In the majority of cases, the machine identity is irrelevant, and the kernel will instead select setup code based on the machine’s core CPU or SoC. On ARM for example, setup_arch() in arch/arm/kernel/setup.c will call setup_machine_fdt() in arch/arm/kernel/devtree.c which searches through the machine_desc table and selects the machine_desc which best matches the device tree data. It determines the best match by looking at the ‘compatible’ property in the root device tree node, and comparing it with the dt_compat list in struct machine_desc (which is defined in arch/arm/include/asm/mach/arch.h if you’re curious).
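The matching described above can be sketched in plain user-space C. This is only an illustration of the idea, not the kernel's code: struct machine_desc_sketch, match_score() and select_machine() are hypothetical names, and the scoring rule (the earlier a string appears in the root 'compatible' list, the more specific it is) is a simplifying assumption.

```c
#include <stddef.h>
#include <string.h>

/* Simplified, hypothetical stand-in for the kernel's struct machine_desc. */
struct machine_desc_sketch {
    const char *name;
    const char *const *dt_compat;   /* NULL-terminated list of compatible strings */
};

/*
 * Score one descriptor against the root node's 'compatible' list.
 * The root list is assumed to be ordered from most specific to most
 * generic, so the position of the first matching entry is the score
 * (1 = best, 0 = no match).
 */
static unsigned int match_score(const char *const *root_compat,
                                const char *const *dt_compat)
{
    for (size_t i = 0; root_compat[i] != NULL; i++)
        for (size_t j = 0; dt_compat[j] != NULL; j++)
            if (strcmp(root_compat[i], dt_compat[j]) == 0)
                return (unsigned int)(i + 1);
    return 0;
}

/* Return the descriptor with the lowest non-zero score, or NULL if none match. */
static const struct machine_desc_sketch *
select_machine(const struct machine_desc_sketch *table, size_t count,
               const char *const *root_compat)
{
    const struct machine_desc_sketch *best = NULL;
    unsigned int best_score = 0;

    for (size_t k = 0; k < count; k++) {
        unsigned int score = match_score(root_compat, table[k].dt_compat);

        if (score != 0 && (best == NULL || score < best_score)) {
            best = &table[k];
            best_score = score;
        }
    }
    return best;
}
```

With a root list such as "ti,omap3-beagleboard", "ti,omap3450", "ti,omap3", a descriptor that lists the board string wins over one that only lists the generic "ti,omap3" SoC string.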
As for the Linux initialization, I think there are some things we can add to the list.
Press the power button; the reset signal is triggered.
CS:IP is set to the BIOS entry point at address 0xFFFF0.
Jump to the start of the BIOS.
Self-check and start of hardware devices such as the keyboard; real-mode IDT & GDT.
Load a bootloader such as grub2 or syslinux.
Bootloader loads kernel to RAM and starts it (boot.img->core.img).
The A20 line is opened, setup.s is called, switch into protected mode.
Linux kernel enters head.o, IDT & GDT refresh, decompress_kernel(), starts start_kernel()
init_task is created (INIT_TASK(init_task)).
trap_init()
CPU architecture is found, MMU is started (mmu_init()).
setup_arch() is called, setting CPU up.
Kernel subsystems are loaded.
do_initcalls() is called and modules with *_initcall() and module_init() functions are started.
rest_init() creates processes 1 and 2; in other words, /sbin/init (or similar) and kthreadd are run.

What is allowed and not allowed in a Linux device driver?

I have a general question about Linux device drivers. I often get confused about which actions are allowed and which are not allowed in a Linux device driver.
Are there any rules or some kind of lookup list to follow?
For instance, which of the following examples are not allowed?
msleep(1000);
val = kmalloc(sizeof(val));
printk(KERN_ALERT "failed to print\n");
ret = adc_get_val()*0.001;
In Linux device driver programming, it depends on which context you are in. There are two contexts that need to be distinguished:
process context
IRQ context.
Sleeping is only allowed in process context; from IRQ context you have to schedule the work for later execution (there are several mechanisms available to do that). This is a complex topic that cannot be covered in a paragraph.
Allocating memory can sleep; whether it does depends on the flags passed to kmalloc (GFP_KERNEL may sleep, GFP_ATOMIC never does).
printk can always be called (once the kernel is up and running); before that, use early_printk.
I don't know what the function adc_get_val does. It is not part of the Linux kernel. And as has already been commented, floating-point values cannot easily be used in the kernel.
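A common way around the floating-point restriction is to stay in integers and keep the scale factor implicit. Here is a minimal sketch of that idea in plain C, assuming the raw reading is an integer count of thousandths; struct scaled_value and scale_by_thousandth() are hypothetical names, not kernel API.

```c
#include <stdint.h>

/* Result of scaling a raw reading by 1/1000 without floating point. */
struct scaled_value {
    int32_t whole;   /* integer part of raw * 0.001 */
    int32_t milli;   /* remaining thousandths, 0..999 for non-negative input */
};

/*
 * Integer replacement for "raw * 0.001".
 * C division truncates toward zero, so for negative input both fields
 * come out negative or zero (for example -1234 gives -1 and -234).
 */
static struct scaled_value scale_by_thousandth(int32_t raw)
{
    struct scaled_value v;

    v.whole = raw / 1000;
    v.milli = raw % 1000;
    return v;
}
```

For a non-negative reading, a driver could then print the result with a format such as "%d.%03d" instead of "%f".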

From Kernel Space to User Space: Inner-workings of Interrupts

I have been trying to understand how h/w interrupts end up in some user-space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from the CPU
2- It signals the CPU by raising an interrupt (h/w trace to the CPU or bus)
3- The CPU asserts, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector)
4- The CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How did the kernel store the ISR address in the interrupt vector table? Is it probably done by sending the CPU some piece of assembly described in the CPU's user manual? The more detail on this subject the better, please.
Question #2: In user space, how can a programmer write a piece of code that listens to a h/w device's notifications?
This is what I understand so far.
5- The kernel driver for that specific device now has the message from the device and is now executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens, how does the driver give the requested data back to user space?
I might be completely off track here; any guidance would be appreciated.
I am not looking for specific, detailed answers; I am only trying to understand the general picture.
Question #1: How did the kernel store ISR address in interrupt vector table?
The driver calls the request_irq kernel function (declared in include/linux/interrupt.h, implemented in kernel/irq/manage.c), and the Linux kernel registers the handler in the right way according to the rules of the current CPU/architecture.
It might probably be done by sending the CPU some piece of assembly described in the CPUs user manual?
On x86 the Linux kernel stores ISR addresses in the Interrupt Descriptor Table (IDT); its format is described by the vendor (Intel, volume 3) and also in many resources such as http://en.wikipedia.org/wiki/Interrupt_descriptor_table and http://wiki.osdev.org/IDT and http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
A pointer to the IDT is held in a special CPU register (IDTR), which is accessed with special assembler instructions: LIDT loads it and SIDT stores it.
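To make the table format a bit more concrete, here is a sketch of a single 64-bit IDT gate descriptor and of how a handler address is split across its fields. It follows the x86-64 layout from the vendor manual, but the names (struct idt_gate64, make_gate(), gate_handler()) are made up for illustration, and it only builds the entry in memory; actually installing a table with LIDT needs ring 0.

```c
#include <stdint.h>

/* Present, DPL 0, 64-bit interrupt gate (type 0xE). */
#define GATE_INTERRUPT_DPL0 0x8E

/* Layout of one 16-byte gate descriptor in the x86-64 IDT. */
struct idt_gate64 {
    uint16_t offset_low;    /* handler address bits 0..15 */
    uint16_t selector;      /* code segment selector used for the handler */
    uint8_t  ist;           /* bits 0..2: interrupt stack table index */
    uint8_t  type_attr;     /* gate type, DPL and present bit */
    uint16_t offset_mid;    /* handler address bits 16..31 */
    uint32_t offset_high;   /* handler address bits 32..63 */
    uint32_t reserved;      /* must be zero */
} __attribute__((packed));

/* Split a handler address across the three offset fields of a gate. */
static struct idt_gate64 make_gate(uint64_t handler, uint16_t selector)
{
    struct idt_gate64 g = {
        .offset_low  = (uint16_t)(handler & 0xFFFF),
        .selector    = selector,
        .ist         = 0,
        .type_attr   = GATE_INTERRUPT_DPL0,
        .offset_mid  = (uint16_t)((handler >> 16) & 0xFFFF),
        .offset_high = (uint32_t)(handler >> 32),
        .reserved    = 0,
    };
    return g;
}

/* Reassemble the handler address the CPU would jump to for this gate. */
static uint64_t gate_handler(const struct idt_gate64 *g)
{
    return (uint64_t)g->offset_low |
           ((uint64_t)g->offset_mid << 16) |
           ((uint64_t)g->offset_high << 32);
}
```

The kernel fills an array of such entries, one per vector, and points IDTR at that array; request_irq itself does not touch the IDT format directly.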
If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific systemcall so that it can execute the request from the user? And then what happens, how does the driver gives back the requested data to user space?
A driver usually registers a device special file in /dev; pointers to several driver functions are registered for this file as "File Operations". The user-space program opens this file (syscall open), and the kernel calls the device's special code for open; then the program calls the poll or read syscall on this fd, and the kernel calls the *poll or *read function of the driver's file operations (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver may put the caller to sleep (wait_event*), and the IRQ handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2 ).
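The user-space half of that sequence can be sketched as follows. poll() and read() are the standard syscalls; wait_and_read() is just a hypothetical helper name, and the fd could come from any file that supports poll (a device file of your own driver, or a pipe when trying it out).

```c
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Block until fd becomes readable (or timeout_ms expires), then read from it.
 * Returns the number of bytes read, 0 on timeout or end of file, -1 on error.
 */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;          /* poll() failed, errno is set */
    if (ready == 0)
        return 0;           /* timed out, nothing to read */
    return read(fd, buf, len);
}
```

With a real driver the fd would come from something like open("/dev/mydevice", O_RDONLY), where the device name is only a placeholder. While the process sits in poll(), it is asleep on the driver's wait queue; the interrupt handler's wake_up* is what makes poll() return.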
You can read more about Linux driver creation in the book Linux Device Drivers (2005) by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf

what do these trace events stand for?

I just learned to use ftrace and perf, and there is something they have in common that I don't understand: trace events. I guess they are kernel-internal functions, and ftrace records their names when they are called, if they are enabled. Is that right? All the events are sorted into the groups listed below. Could someone tell me what they stand for, or where I can get detailed information about them? Thanks.
block btrfs compaction drm ext3 ext4 fs ftrace gpio header_event header_page irq jbd jbd2 kmem mce module napi net power raw_syscalls rcu regmap regulator rpm sched scsi signal skb sock syscalls timer udp vfs vmscan vsyscall workqueue writeback xen xfs
Each of those is the name of the part of the Linux kernel whose code emits the event. For example, rcu is the Read-Copy-Update code, a synchronization mechanism that lets readers proceed without taking locks. The names roughly match the names of files or directories in the kernel source. Look in the Documentation directory of the kernel source for more information.

What options do we have for communication between a user program and a Linux Kernel Module?

I am a newcomer to Linux kernel module programming. From the material I have read so far, I have found that there are 3 ways for a user program to request services from, or communicate with, a Linux kernel module:
 1. a device file in /dev
 2. a file in the /proc file system
 3. the ioctl() call
Question: What other options do we have for communication between user program and linux kernel module?
Your option 3) is really a sub-option of option 1) - ioctl() is one way of interacting with a device file (read() and write() being the usual ways).
Two other ways worth considering are:
The sysfs filesystem;
Netlink sockets.
Basically, many standard IPC mechanisms (cf. http://en.wikipedia.org/wiki/Inter-process_communication) can be used:
 1. File and memory-mapped file: a device file (as above) or a similarly special file in /dev, procfs, sysfs, debugfs, or a filesystem of your own, each combined with read/write, ioctl, or mmap
 2. Possibly signals (for use with a kthread)
 3. Sockets: using a protocol of choice: TCP, UDP (cf. knfsd, but likely not too easy), PF_LOCAL, or Netlink (many subinterfaces: base netlink, genetlink, Connector, ...)
Furthermore,
 4. System calls (not really usable from modules though)
 5. Network interfaces (akin to tun).
Working examples of Netlink (just to name a few) can be found, for example, in
git://git.netfilter.org/libmnl (userspace side)
net/core/rtnetlink.c (base netlink)
net/netfilter/nf_conntrack_netlink.c (nfnetlink)
fs/quota/netlink.c (genetlink)
This includes all types with examples :)
http://people.ee.ethz.ch/~arkeller/linux/kernel_user_space_howto.html
Runnable examples of everything
Too much talk is making me bored!
file operations:
 file types that implement file operations:
  procfs. See also: proc_create() example for kernel module
  debugfs
  character devices. See also: https://unix.stackexchange.com/questions/37829/how-do-character-device-or-character-special-files-work/371758#371758
  sysfs. See also: How to attach file operations to sysfs attribute in platform driver?
 file operation syscalls themselves:
  open, read, write, close, lseek
  poll. See also: How to add poll function to the kernel module code?
  ioctl. See also: How do I use ioctl() to manipulate my kernel module?
  mmap. See also: How to mmap a Linux kernel buffer to user space?
  anonymous inodes. See also: What is an anonymous inode in Linux?
netlink sockets. See also: How to use netlink socket to communicate with a kernel module?
This Linux document gives some of the ways in which the kernel and user space can interact (communicate). They are the following.
Procfs, sysfs, and similar mechanisms. This includes /dev entries as well, and all the methods in which kernel space exposes a file in user space (/proc, /dev, etc. entries are basically files exposed from the kernel space).
Socket based mechanisms. Netlink is a type of socket, which is meant specially for communication between user space and kernel space.
System calls.
Upcalls. The kernel executes code in user space, for example by spawning a new process.
mmap - Memory mapping a region of kernel memory to user space. This allows both the kernel, and the user space to read/write to the same memory area.
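As a rough illustration of the mmap route from the user-space side: the sketch below opens a path and maps it MAP_SHARED, so that stores through the returned pointer are visible to whoever backs the file. For a driver that implements the mmap file operation the path would be its device file; map_shared() and unmap_shared() are hypothetical helper names, and the kernel side (the driver's mmap handler) is not shown.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Open path and map its first len bytes as shared read/write memory.
 * On success returns the mapping and stores the open fd in *fd_out;
 * on failure returns NULL.
 */
static void *map_shared(const char *path, size_t len, int *fd_out)
{
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return NULL;

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (mem == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return mem;
}

/* Drop the mapping and close the underlying file descriptor. */
static void unmap_shared(void *mem, size_t len, int fd)
{
    munmap(mem, len);
    close(fd);
}
```

After the mapping is set up, no further system calls are needed to exchange data; both sides simply read and write the same pages, so any synchronization has to be arranged separately.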
Other than these, the following list adds some other mechanisms I know.
Interrupts. User space can raise interrupts to talk to kernel space. For example, some CPUs use int 0x80 to make system calls (while others may use a different mechanism, such as the syscall instruction). The kernel has to define the corresponding interrupt handler in advance.
vDSO/vsyscall - These are mechanisms in the Linux kernel that optimize the execution of some system calls. The idea is to have a shared memory region; when a process makes such a call, the user-space library gets the data from this region instead of actually invoking the corresponding system call. This saves the overhead of switching into the kernel.
