I am trying to debug a kernel panic.
The kernel log says:
[63859.139142] Unable to handle kernel paging request at virtual address c0a0da06
[63859.139236] pgd = ec040000
[63859.139289] [c0a0da06] *pgd=00a1941e(bad)
I am interested in knowing what pgd is.
Thank you.
pgd is short for "page global directory", the kernel's name for the top level of a page table.
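If it helps, this is roughly how generic kernel code walks from the pgd down to an individual pte for a given user address (a sketch for recent kernels using the five-level page-table API from <linux/pgtable.h>; locking and error handling are abbreviated):

/* Sketch: walk the page tables for user address `addr` in `mm`. */
static pte_t *walk_to_pte(struct mm_struct *mm, unsigned long addr)
{
    pgd_t *pgd = pgd_offset(mm, addr);   /* top level: page global directory */
    p4d_t *p4d;
    pud_t *pud;
    pmd_t *pmd;

    if (pgd_none(*pgd) || pgd_bad(*pgd))
        return NULL;                      /* nothing mapped below this pgd entry */
    p4d = p4d_offset(pgd, addr);
    if (p4d_none(*p4d) || p4d_bad(*p4d))
        return NULL;
    pud = pud_offset(p4d, addr);
    if (pud_none(*pud) || pud_bad(*pud))
        return NULL;
    pmd = pmd_offset(pud, addr);
    if (pmd_none(*pmd) || pmd_bad(*pmd))
        return NULL;
    return pte_offset_map(pmd, addr);     /* caller must pte_unmap() */
}

The "(bad)" in your log means the walk stopped at the very first level: the pgd entry for that address is itself invalid.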
I am trying to implement a PCI device driver for a virtual PCI device on QEMU. The device defines a BAR region as RAM, and the driver can do ioremap() this region and access it without any issues. The next step is to assign this region (or a fraction of it) to a user application.
To do this, I have also implemented an .mmap function as part of my driver's file operations. This mmap simply uses remap_pfn_range(), passing the pfn derived from the memory pointer returned by the earlier ioremap().
However, upon running the user space application, the mmap is successful, but when the app tries to access the memory, it gets killed and I get the following dmesg errors.
"
a.out: Corrupted page table at address 7f66248b8000
..Some page table info..
Bad pagetable: 000f [#2] SMP NOPTI
..and the core dump..
"
Does anyone know what I have done wrong? Did I miss a step? Or could it be an error specific to QEMU?
I am running x86_softmmu as my QEMU configuration and my kernel is 4.14.
I've solved this issue and managed to map PCI memory to user space via the driver. As @IanAbbott implied, I changed the pfn argument of the remap_pfn_range() call I was using in my custom ->mmap().
The original was:
io_remap_pfn_range(vma, vma->vm_start, pfn, vma->vm_end - vma->vm_start, vma->vm_page_prot);
where pfn was derived from the buffer pointer returned by ioremap(). I changed the pfn to:
pfn = pci_resource_start(pdev, BAR) >> PAGE_SHIFT;
That points to the actual physical start address of the BAR. My working io_remap_pfn_range() call is now:
io_remap_pfn_range(vma, vma->vm_start, pci_resource_start(pdev, BAR) >> PAGE_SHIFT, vma->vm_end - vma->vm_start, vma->vm_page_prot);
I confirmed that it works by doing some dummy writes to the buffer pointer in my driver, then picking up the reads and doing some writes in my user space application.
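For completeness, here is roughly what the whole ->mmap handler looks like with that fix (a sketch; BAR and the way pdev is obtained are placeholders for your own driver, and error handling is trimmed):

/* Sketch of a driver ->mmap that exposes a PCI BAR to user space. */
static int my_mmap(struct file *filp, struct vm_area_struct *vma)
{
    struct pci_dev *pdev = filp->private_data;   /* however your driver stores it */
    unsigned long len = vma->vm_end - vma->vm_start;

    if (len > pci_resource_len(pdev, BAR))
        return -EINVAL;                          /* don't map past the BAR */

    vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);

    /* Map the BAR's physical address, not the ioremap()ed kernel virtual one. */
    return io_remap_pfn_range(vma, vma->vm_start,
                              pci_resource_start(pdev, BAR) >> PAGE_SHIFT,
                              len, vma->vm_page_prot);
}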
I have a process which does mmap into a kernel driver to map DMA packet memory into user space. When the process crashes, I want this memory to be part of the coredump for post-crash analysis.
Currently Linux sets the VM_IO flag on VMAs created in the driver's mmap function. In elf_core_dump(), vma_dump_size() is called, which checks for VM_IO and returns 0 if it is set, so these VMAs are never added to the coredump sections.
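The check I am referring to looks roughly like this (paraphrased from vma_dump_size() in fs/binfmt_elf.c; the exact code varies with kernel version):

/* Paraphrased: I/O mappings are never written into the core dump. */
if (vma->vm_flags & VM_IO)
    return 0;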
Judging from always_dump_vma(), there seems to be a workaround: set vma->vm_ops with a name function pointer that returns non-NULL when invoked.
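Roughly what I mean by that workaround (illustrative only; the names are mine):

/* always_dump_vma() dumps a VMA whose vm_ops->name() returns non-NULL,
 * so provide such a callback from the driver's ->mmap. */
static const char *my_vma_name(struct vm_area_struct *vma)
{
    return "my-dma-packets";    /* any non-NULL string */
}

static const struct vm_operations_struct my_vm_ops = {
    .name = my_vma_name,
};

/* in the driver's ->mmap, after remap_pfn_range() succeeds: */
/* vma->vm_ops = &my_vm_ops; */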
When extending this workaround to this driver's mmap function, I run into a second problem: the VMA that is added gets filled with zeroes by the kernel's elf_core_dump() instead of the real data. In elf_core_dump(), get_dump_page() returns NULL for the DMA VMA's pages because the VMA has VM_IO set, so dump_skip() is called instead of dump_emit(), which causes the problem.
How can I work around this second problem and get non-zero contents into the process coredump for this packet-memory VMA? I tried searching various forums but did not find a precise solution.
I have been trying to understand how h/w interrupts end up in some user space code, through the kernel.
My research led me to understand that:
1- An external device needs attention from CPU
2- It signals the CPU by raising an interrupt (a h/w signal to the CPU or bus)
3- The CPU acknowledges it, saves the current context, and looks up the address of the ISR in the interrupt descriptor table (vector)
4- CPU switches to kernel (privileged) mode and executes the ISR.
Question #1: How does the kernel store the ISR address in the interrupt vector table? Is it done by having the CPU execute some piece of assembly described in the CPU's manual? The more detail on this subject, the better, please.
Question #2: In user space, how can a programmer write a piece of code that listens for h/w device notifications?
This is what I understand so far.
5- The kernel driver for that specific device now has the message from the device and is executing the ISR.
Question #3: If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens? How does the driver give the requested data back to user space?
I might be completely off track here, any guidance would be appreciated.
I am not looking for specific details answers, I am only trying to understand the general picture.
Question #1: How does the kernel store the ISR address in the interrupt vector table?
The driver calls the request_irq() kernel function (declared in include/linux/interrupt.h, implemented in kernel/irq/manage.c), and the Linux kernel registers it in the right way according to the current CPU/arch rules.
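For example, a driver typically does something like this (a sketch; the IRQ number, the handler body and the names are placeholders):

/* Sketch: registering an interrupt handler. The kernel takes care of
 * wiring the low-level IDT/vector entry; the driver only supplies a handler. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
    /* acknowledge the device, read its status, schedule further work, ... */
    return IRQ_HANDLED;
}

/* in probe(): */
/* err = request_irq(irq, my_irq_handler, IRQF_SHARED, "mydev", my_dev); */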
Is it done by having the CPU execute some piece of assembly described in the CPU's manual?
On x86, the Linux kernel stores ISR entries in the Interrupt Descriptor Table (IDT). Its format is described by the vendor (Intel SDM, volume 3) and also in many resources, such as http://en.wikipedia.org/wiki/Interrupt_descriptor_table and http://wiki.osdev.org/IDT and http://phrack.org/issues/59/4.html and http://en.wikibooks.org/wiki/X86_Assembly/Advanced_Interrupts.
The pointer to the IDT is loaded into a special CPU register (IDTR) with dedicated assembly instructions: LIDT (load) and SIDT (store).
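You can even peek at the current IDTR from kernel code with SIDT (an x86-specific sketch):

/* Sketch: read the IDTR (base address and limit of the IDT) on x86. */
struct idtr_desc {
    unsigned short limit;
    unsigned long  base;
} __attribute__((packed));

static void read_idtr(struct idtr_desc *d)
{
    asm volatile("sidt %0" : "=m"(*d));
}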
If the programmer in user space wanted to poll the device, I would assume this would be done through a system call (or at least this is what I understood so far). How is this done? How can a driver tell the kernel to be called upon a specific system call so that it can execute the request from the user? And then what happens? How does the driver give the requested data back to user space?
The driver usually registers a device special file in /dev; pointers to several driver functions are registered for this file as its file operations. A user-space program opens this file (the open syscall), and the kernel calls the device's open handler; then the program calls the poll or read syscall on that fd, and the kernel calls the driver's .poll or .read file operation (http://www.makelinux.net/ldd3/chp-3-sect-7.shtml). The driver may put the caller to sleep (wait_event*) and the irq handler will wake it up (wake_up* - http://www.makelinux.net/ldd3/chp-6-sect-2 ).
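A rough sketch of that pattern - a read() that sleeps until the interrupt handler signals data (all names here are illustrative, not from any particular driver):

static DECLARE_WAIT_QUEUE_HEAD(my_wq);
static int data_ready;

static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
    data_ready = 1;
    wake_up_interruptible(&my_wq);       /* wake the sleeping reader */
    return IRQ_HANDLED;
}

static ssize_t my_read(struct file *f, char __user *buf, size_t len, loff_t *off)
{
    /* sleep until the interrupt handler reports data */
    if (wait_event_interruptible(my_wq, data_ready))
        return -ERESTARTSYS;
    data_ready = 0;
    /* copy_to_user(buf, ..., ...) would hand the data back here */
    return 0;
}

static const struct file_operations my_fops = {
    .owner = THIS_MODULE,
    .read  = my_read,
    /* .open, .poll, .mmap, ... */
};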
You can read more about Linux driver creation in the book Linux Device Drivers, 3rd edition (2005), by Jonathan Corbet, Alessandro Rubini, and Greg Kroah-Hartman: https://lwn.net/Kernel/LDD3/
Chapter 3: Char Drivers https://lwn.net/images/pdf/LDD3/ch03.pdf
Chapter 10: Interrupt Handling https://lwn.net/images/pdf/LDD3/ch10.pdf
I was going through the do_page_fault() (x86 arch) routine. Suppose a process tries to write to a shared page that is swapped out. As per the execution flow in do_page_fault(), if the access is valid and it is a normal page (not a huge page), execution eventually reaches do_swap_page() (assuming no errors), which runs and then returns.
1) But will there be a fault again in case the swap-in itself was not handled for some reason?
2) In general, I would like to know more detail about the MMU - does it check PTE flags or VM area flags to raise a fault on an address? Can anyone point me to sources where I can understand how the MMU does the checks for a memory access?
1) But will there be a fault again in case the swap-in itself was not handled for some reason?
Yes. The fault will be generated again and again (the ISR completes successfully each time) until the page is in place. The MMU doesn't track whether a previous access to this page generated an interrupt or not.
However, if the CPU hits another fault while it is trying to deliver the first one, a double fault is raised.
2) In general, I would like to know more detail about the MMU - does it check PTE flags or VM area flags to raise a fault on an address? Can anyone point me to sources where I can understand how the MMU does the checks for a memory access?
Yes, it checks the PTE flags. The VM area flags are a purely software construct maintained by the kernel; the MMU only sees the page tables.
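Conceptually, the checks an x86 MMU performs on each access look something like this (pseudocode written in C; real hardware does this in its page-walk/TLB logic, and there are more bits, e.g. NX):

/* Conceptual sketch of x86 PTE permission checks. */
#define PTE_PRESENT (1UL << 0)
#define PTE_WRITE   (1UL << 1)
#define PTE_USER    (1UL << 2)

static int mmu_allows(unsigned long pte, int is_write, int is_user)
{
    if (!(pte & PTE_PRESENT))
        return 0;                 /* page fault: not present */
    if (is_write && !(pte & PTE_WRITE))
        return 0;                 /* page fault: protection violation */
    if (is_user && !(pte & PTE_USER))
        return 0;                 /* page fault: protection violation */
    return 1;                     /* access proceeds */
}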
You may check OSDev for more info.
I'm writing a kernel driver which needs to access memory-mapped IO.
My call to request_mem_region is failing, indicating that another module (either loaded or built-in) has requested the memory in question.
How can I determine which driver has done this?
Seeing as a string identifier is passed to the request_mem_region function, I assume this is possible.
/proc/iomem is a file that shows the current map of the system's I/O memory: each registered region is listed together with the string its owner passed to request_mem_region(), so you can see which driver has claimed the range.
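The string passed to request_mem_region() is what appears there as the region's owner, so a well-chosen name makes conflicts easy to diagnose. A sketch of how a driver claims a region under its own name (phys_base, region_size and the name are placeholders):

/* Sketch: claim an MMIO range under a descriptive name visible in /proc/iomem. */
static void __iomem *my_map_regs(resource_size_t phys_base, size_t region_size)
{
    if (!request_mem_region(phys_base, region_size, "my-driver"))
        return NULL;              /* already claimed: check /proc/iomem for the owner */
    return ioremap(phys_base, region_size);
}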