What do these Linux Kernel Oops fields mean?

What do these Linux Kernel Oops fields mean? - linux-kernel

I have already encountered some Oops in my developer's life and whereas I am familiar with some information that I can retrieve from these Oops, there are still pieces of information I can't understand and therefore, can't use to solve problems.
Below you will find an Oops example and I will describe what I can deduce from it. Then, I will ask what the remaining info can teach me about the problem.
[ 716.485951] BUG: unable to handle kernel paging request at fc132158
[ 716.485973] IP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[ 716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000
[ 716.485997] Oops: 0002 [#1] SMP
[ 716.486004] Modules linked in: ubi(O) mtdchar nandsim nand mtd nand_ids nand_bch bch nand_ecc bnep rfcomm bluetooth parport_pc ppdev lp parport nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc binfmt_misc dm_crypt snd_hda_codec_hdmi snd_hda_codec_analog kvm_intel snd_hda_intel snd_hda_codec snd_hwdep kvm snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event hid_generic snd_seq cdc_acm snd_timer snd_seq_device mei tpm_tis snd mac_hid serio_raw soundcore lpc_ich snd_page_alloc microcode coretemp usbhid hid nouveau usb_storage ttm drm_kms_helper drm floppy e1000e i2c_algo_bit mxm_wmi video wmi
[ 716.486128] Pid: 3994, comm: ubimkvol Tainted: G O 3.8.0-rc3+ #3 LENOVO 6239AS8/LENOVO
[ 716.486136] EIP: 0060:[<fc1936e7>] EFLAGS: 00010246 CPU: 0
[ 716.486144] EIP is at ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[ 716.486151] EAX: 000000ac EBX: eb5ea000 ECX: 0000002b EDX: 00000000
[ 716.486157] ESI: eb4d1d74 EDI: fc132158 EBP: eb4d1d40 ESP: eb4d1d20
[ 716.486164] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 716.486170] CR0: 8005003b CR2: fc132158 CR3: 27542000 CR4: 000407f0
[ 716.486176] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 716.486183] DR6: ffff0ff0 DR7: 00000400
[ 716.486188] Process ubimkvol (pid: 3994, ti=eb4d0000 task=ec01d9b0 task.ti=eb4d0000)
[ 716.486195] Stack:
[ 716.486199] e755f000 eb4d1d2c c11cad11 eb4d1d34 eb543c00 eb5ea000 00000000 eb4d1e20
[ 716.486215] eb4d1e30 fc195412 e755f000 fc1adf01 eb5ea26c 00000002 0000009e eb5ea480
[ 716.486232] 00000002 e755f22c e755f2ac e755f000 eb4d1d74 2a000000 01000000 00000000
[ 716.486248] Call Trace:
[ 716.486257] [<c11cad11>] ? sysfs_create_file+0x21/0x30
[ 716.486266] [<fc195412>] ubi_create_volume+0x4b2/0x790 [ubi]
[ 716.486277] [<fc19967a>] ubi_cdev_ioctl+0x5da/0xac0 [ubi]
[ 716.486285] [<c117202a>] ? link_path_walk+0x5a/0x7d0
[ 716.486294] [<fc1990a0>] ? vol_cdev_ioctl+0x440/0x440 [ubi]
[ 716.486842] [<c1177e12>] do_vfs_ioctl+0x82/0x5b0
[ 716.487703] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.488564] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.489422] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.489891] [<c1171eb4>] ? putname+0x24/0x40
[ 716.489891] [<c1167239>] ? do_sys_open+0x169/0x1d0
[ 716.489891] [<c11783b0>] sys_ioctl+0x70/0x80
[ 716.489891] [<c16205cd>] sysenter_do_call+0x12/0x38
[ 716.489891] Code: ac 00 00 00 03 bb c8 04 00 00 f7 c7 01 00 00 00 0f 85 ee 00 00 00 f7 c7 02 00 00 00 0f 85 ca 00 00 00 89 c1 31 d2 c1 e9 02 a8 02 <f3> a5 74 0b 0f b7 16 66 89 17 ba 02 00 00 00 a8 01 74 07 0f b6
[ 716.489891] EIP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi] SS:ESP 0068:eb4d1d20
[ 716.489891] CR2: 00000000fc132158
[ 716.516453] ---[ end trace 473b15a7780e19ea ]---
It seems that the kernel wanted to access a wrong page. Now,
The Oops code 0002 tells me that it occurred while trying to read something in user-mode.
The Instruction Pointer is at ubi_change_vtbl_record, which means the offending instruction is located in this function.
I can deduce the path that lead to the faulting function from the
call trace (an ioctl launched from process ubimkvol)
From there, Is the "stack" a dump of the raw stack of the task ? I can see that some values mentioned are also function addresses found in the call trace. Then, I got fancy looking values like EAX, EBX ... DR7. I think they are CPU registers but still, I don't know what they really are.
Finally, the following line gets me lost :
[ 716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000
What are pdpt, pde and pte ? I feel they are information about the page fault but I could not retrieve further information after some googling around.

Yes, EAX, etc. are 32-bit x86 processor registers. pdpt (page directory pointer table), pde (page directory entry), and pte (page table entry) are all paging structures.
IP (also EIP for 32-bit or RIP for 64-bit processors) is the instruction pointer at the time of the Oops.
The stack is the raw stack for this processor. Each processor will have its own stack. Note that on this architecture the stack grows down (addresses start with 0xfxxxxxx).

Correct me if I am wrong but,
OOPS 0002 means no page found when writing in kernel mode:
bit 0 == 0 means no page found, 1 means a protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode

Related

On rhel8 os When I run a program the kernel crash, how can I determine which line of code reports the error ？

I don't know about kernel, I don't know how to troubleshoot.
When the problem happened, my system kernel crashed, the following is the vmcore-dmesg.txt log
##dmesg logs
[ 378.442884] SPDMD-LUN:[ERROR]lun_del:2218 device nvfile-mgmtd-0 is busy :2
[ 424.511211] XFS (nvfile-storage-0): Mounting V5 Filesystem
[ 424.513045] XFS (nvfile-storage-2): Mounting V5 Filesystem
[ 424.538953] XFS (nvfile-storage-2): Starting recovery (logdev: internal)
[ 424.546536] XFS (nvfile-storage-2): Ending recovery (logdev: internal)
[ 425.512217] XFS (nvfile-storage-0): Starting recovery (logdev: internal)
[ 425.518929] XFS (nvfile-storage-1): Mounting V5 Filesystem
[ 425.520813] XFS (nvfile-storage-3): Mounting V5 Filesystem
[ 426.288987] XFS (nvfile-storage-3): Starting recovery (logdev: internal)
[ 426.475465] XFS (nvfile-storage-1): Starting recovery (logdev: internal)
[ 427.496551] XFS (nvfile-storage-0): Ending recovery (logdev: internal)
[ 428.320932] XFS (nvfile-storage-3): Ending recovery (logdev: internal)
[ 428.977479] XFS (nvfile-storage-1): Ending recovery (logdev: internal)
[ 445.927118] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 445.927685] PGD 0 P4D 0
[ 445.928216] Oops: 0000 [#1] SMP NOPTI
[ 445.928723] CPU: 4 PID: 591 Comm: kworker/u193:1 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.el8.x86_64 #1
[ 445.929735] Hardware name: Lenovo ThinkSystem SR860 V2/7Z60CTO1WW, BIOS M5E118K-1.52 08/06/2021
[ 445.930305] Workqueue: xfs-cil/nvfile-storage- xlog_cil_push_work [xfs]
[ 445.930910] RIP: 0010:blk_queue_split+0x1c6/0x660
[ 445.933048] Code: c0 45 31 f6 89 44 24 44 89 7c 24 40 31 ff 85 db 0f 84 56 04 00 00 8b 44 24 44 48 89 c1 48 89 44 24 30 48 c1 e1 04 49 03 4d 78 <8b> 41 08 8b 71 0c 48 8b 11 44 29 f8 39 d8 48 89 54 24 48 0f 47 c3
[ 445.936323] RSP: 0018:ffffbcbf9a4bfaf8 EFLAGS: 00010246
[ 445.936711] RAX: 0000000000000000 RBX: 0000000000003000 RCX: 0000000000000000
[ 445.937028] RDX: 0000000000000018 RSI: 0000000000000018 RDI: 0000000000000000
[ 445.937343] RBP: ffffbcbf9a4bfb90 R08: 0000000000000000 R09: ffff940c1965d700
[ 445.937659] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94ca6f590000
[ 445.937979] R13: ffff940bda0ddc80 R14: 0000000000000000 R15: 0000000000000000
[ 445.938304] FS: 0000000000000000(0000) GS:ffff940effd00000(0000) knlGS:0000000000000000

How to use decode_stacktrace.sh?

Q1: admin-guide/bug-hunting.html says that:
If the kernel is compiled with CONFIG_DEBUG_INFO, you can enhance
the quality of the stack trace by using
file:scripts/decode_stacktrace.sh.
Is CONFIG_DEBUG_INFO a prerequisite for running the script decode_stacktrace.sh?"
Q2: this patch says:
./decode_stacktrace.sh vmlinux /home/sasha/linux/ < input.log > output.log
Where can I find the input.log? I know it's sort of stack info. Will it only be available when CONFIG_DEBUG_INFO is y or when there is a kernel panic or oops?

Yes, you will need CONFIG_DEBUG_INFO=y in order for decode_stacktrace.sh to extract useful debugging information (such as file names and line numbers) from the kernel image (vmlinux). In theory, the script will run fine even without debug info, but it will give you less information (no file names and line numbers).
NOTE: in recent kernels (>= v5.12) a multiple choice option for the DWARF version was added (CONFIG_DEBUG_INFO_DWARF{4,5,_TOOLCHAIN_DEFAULT}). Then, in v5.18 CONFIG_DEBUG_INFO was completely removed, and the multiple choice option is now the one responsible for enabling debug info. Any choice except CONFIG_DEBUG_INFO_NONE is fine as long as your toolchain supports it.
Where can I find the input.log?
The "log" they are referring to is simply the kernel log. This will be available regardless of CONFIG_DEBUG_INFO. Kernel panics and other OOPSes will usually write stack traces to the kernel log. You can take this directly from the console (if console logging is enabled), or using the dmesg command. See also this doc page for more info. Once you have the log, simply copy-paste it into a text file (input.log), and pass that file to the script's standard input with < input.log.
For an actual panic on the same system you are working on (and not simply inside QEMU or a VM), chances are that the logging can only be done to console, so you might want to enable that:
$ dmesg --console-level 7
$ dmesg --console-on
Take a look at Where are kernel panic logs? on Ask Ubuntu for more ways to capture the log in case of a panic.
As an example, here's the log for a crash generated running echo c > /proc/sysrq-trigger as root on a system running inside QEMU on my machine. I copy-pasted the kernel log into input.log, which contains the following:
[ 7.952685] sysrq: Trigger a crash
[ 7.952850] Kernel panic - not syncing: sysrq triggered crash
[ 7.953098] CPU: 0 PID: 71 Comm: linuxrc Not tainted 5.19.0-rc2 #1
[ 7.953259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 7.953655] Call Trace:
[ 7.954133] <TASK>
[ 7.954332] dump_stack_lvl+0x34/0x44
[ 7.954651] panic+0x102/0x27b
[ 7.954756] ? _printk+0x53/0x6a
[ 7.954847] sysrq_handle_crash+0x11/0x20
[ 7.954953] __handle_sysrq.cold+0x43/0x11b
[ 7.955065] write_sysrq_trigger+0x1f/0x30
[ 7.955167] proc_reg_write+0x4c/0x90
[ 7.955267] vfs_write+0xb4/0x290
[ 7.955362] ksys_write+0x5a/0xd0
[ 7.955453] do_syscall_64+0x3b/0x90
[ 7.955553] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
[ 7.956427] RSP: 002b:00007ffde8168508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7.956625] RAX: ffffffffffffffda RBX: 000000000101a8a0 RCX: 00000000004a8531
[ 7.956787] RDX: 0000000000000002 RSI: 00000000010201e0 RDI: 0000000000000001
[ 7.956949] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefefefeff62
[ 7.957113] R10: 00000000000001b6 R11: 0000000000000246 R12: 00000000010201e0
[ 7.957275] R13: 0000000000000002 R14: 00007ffde8168701 R15: 00007ffde8168578
[ 7.957467] </TASK>
[ 7.957806] Kernel Offset: 0x34a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7.958215] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
The only section that you really need to extract from the above is the following:
[ 7.954332] dump_stack_lvl+0x34/0x44
[ 7.954651] panic+0x102/0x27b
[ 7.954756] ? _printk+0x53/0x6a
[ 7.954847] sysrq_handle_crash+0x11/0x20
[ 7.954953] __handle_sysrq.cold+0x43/0x11b
[ 7.955065] write_sysrq_trigger+0x1f/0x30
[ 7.955167] proc_reg_write+0x4c/0x90
[ 7.955267] vfs_write+0xb4/0x290
[ 7.955362] ksys_write+0x5a/0xd0
[ 7.955453] do_syscall_64+0x3b/0x90
[ 7.955553] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
Running decode_stacktrace.sh poiting it to my vmlinux and the kernel source directory yields more info:
$ ./scripts/decode_stacktrace.sh /path/to/vmlinux /path/to/kernel-source-dir < input.log
[ 7.952685] sysrq: Trigger a crash
[ 7.952850] Kernel panic - not syncing: sysrq triggered crash
[ 7.953098] CPU: 0 PID: 71 Comm: linuxrc Not tainted 5.19.0-rc2 #1
[ 7.953259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 7.953655] Call Trace:
[ 7.954133] <TASK>
[ 7.954332] dump_stack_lvl (/path/to/kernel-source-dir/lib/dump_stack.c:107 (discriminator 1))
[ 7.954651] panic (/path/to/kernel-source-dir/kernel/panic.c:292)
[ 7.954756] ? _printk (/path/to/kernel-source-dir/kernel/printk/printk.c:2426)
[ 7.954847] sysrq_handle_crash (/path/to/kernel-source-dir/drivers/tty/sysrq.c:155)
[ 7.954953] __handle_sysrq.cold (/path/to/kernel-source-dir/drivers/tty/sysrq.c:626)
[ 7.955065] write_sysrq_trigger (/path/to/kernel-source-dir/drivers/tty/sysrq.c:1168)
[ 7.955167] proc_reg_write (/path/to/kernel-source-dir/fs/proc/inode.c:335 /path/to/kernel-source-dir/fs/proc/inode.c:347)
[ 7.955267] vfs_write (/path/to/kernel-source-dir/fs/read_write.c:589)
[ 7.955362] ksys_write (/path/to/kernel-source-dir/fs/read_write.c:644)
[ 7.955453] do_syscall_64 (/path/to/kernel-source-dir/arch/x86/entry/common.c:50 /path/to/kernel-source-dir/arch/x86/entry/common.c:80)
[ 7.955553] entry_SYSCALL_64_after_hwframe (/path/to/kernel-source-dir/arch/x86/entry/entry_64.S:115)
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
All code
========
0: e0 ff loopne 0x1
2: ff (bad)
3: ff f7 push %rdi
5: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
9: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
10: eb b3 jmp 0xffffffffffffffc5
12: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
19: 8b 05 d2 26 1e 00 mov 0x1e26d2(%rip),%eax # 0x1e26f1
1f: 85 c0 test %eax,%eax
21: 75 16 jne 0x39
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 57 ja 0x89
32: c3 retq
33: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
39: 08 .byte 0x8
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 57 ja 0x5f
8: c3 retq
9: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
f: 08 .byte 0x8
[ 7.956427] RSP: 002b:00007ffde8168508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7.956625] RAX: ffffffffffffffda RBX: 000000000101a8a0 RCX: 00000000004a8531
[ 7.956787] RDX: 0000000000000002 RSI: 00000000010201e0 RDI: 0000000000000001
[ 7.956949] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefefefeff62
[ 7.957113] R10: 00000000000001b6 R11: 0000000000000246 R12: 00000000010201e0
[ 7.957275] R13: 0000000000000002 R14: 00007ffde8168701 R15: 00007ffde8168578
[ 7.957467] </TASK>
[ 7.957806] Kernel Offset: 0x34a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7.958215] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---

Regarding Q2, I setup a serial console where I can get all the kernel messages, including the input.log.
Below I list the steps to setup a serial console.
Components: 1x Linux machine, 1 x PC with Win10 and Putty, 1x USB to Serial Converter
Your Win10 should automatically recognize the USB2Serial converter and install the driver. The converter appears in Device Manager, in my case COM6.
Setup Putty, Serial, baud-rate 115200, Databits 8, Stopbits 1, Parity none, Flowcontrol XON/XOFF.
Make sure that serial port is enabled in Linux machine.
$ dmesg | grep tty
[ 0.108687] printk: console [tty0] enabled
[ 2.752397] 00:03:ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 2.773666] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
It shows that both ttyS0(COM1) and ttyS1(COM2) work fine. In my case I choose ttyS1.
Use getty to manage ttyS1.
sudo /sbin/getty -L 115200 ttyS1 vt102
Then the Putty would prompt login, just input your username and password to get into the command line interface.
But still at this stage, the ttyS1 is another pseudo-terminal, but not a console. You need to configure it as a console.
Make sure that you've already compiled in the serail port in your Linux kernel image. make menuconfig : Device driver ‣ Character devices ‣ Serial drivers ‣ 8250/16550 and compatible serial support ‣ Console on 8250/16550 and compatible serial port: choose built-in. If it's alreay that, leave it. Otherwise, change the configuration, rebuilt the kernel, and reboot.
Check the current console.
$cat /sys/devices/virtual/tty/console/active
tty0
It means that current console is tty0. We need to add ttyS1 as a console.
Add console=ttyS1 in Linux kernel boot argument.
$sudo vim /etc/default/grub
Change the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="quiet console=ttyS1,115200 splash"
Then
$sudo update-grub
$sudo reboot
Check the current console.
$ cat /sys/devices/virtual/tty/console/active
ttyS1
That's it.
Call getty again.
sudo /sbin/getty -L 115200 ttyS1 vt102
Then Putty would get the login prompt and after you login, it becomes a real console.

Booting Debian Wheezy on a BeagleCore ? (Kernel panic - not syncing: Attempted to kill init!)

I'm trying to boot a Debian Wheezy Image, Ker 3.8 on my BeagleCore (a smaller version of BeagleBone) with TI AM335x Cortex-A8 processor.
I took the Debian Image from beagleboard site.
When I try to boot, on a serial interface for debug, I get this messages:
U-Boot SPL 2016.01-00001-g4eb802e (Jan 13 2016 - 11:14:31)
Trying to boot from MMC
bad magic
U-Boot 2016.01-00001-g4eb802e (Jan 13 2016 - 11:14:31 -0600), Build: jenkins-github_Bootloader-Builder-313
Watchdog enabled
I2C: ready
DRAM: 512 MiB
Reset Source: Power-on reset has occurred.
MMC: OMAP SD/MMC: 0, OMAP SD/MMC: 1
Using default environment
Net: <ethaddr> not set. Validating first E-fuse MAC
Could not get PHY for cpsw: addr 0
cpsw, usb_ether
Press SPACE to abort autoboot in 2 seconds
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
gpio: pin 56 (gpio 56) value is 0
gpio: pin 55 (gpio 55) value is 0
gpio: pin 54 (gpio 54) value is 0
gpio: pin 53 (gpio 53) value is 1
switch to partitions #0, OK
mmc0 is current device
gpio: pin 54 (gpio 54) value is 1
Checking for: /uEnv.txt ...
Checking for: /boot.scr ...
Checking for: /boot/boot.scr ...
Checking for: /boot/uEnv.txt ...
gpio: pin 55 (gpio 55) value is 1
2181 bytes read in 16 ms (132.8 KiB/s)
Loaded environment from /boot/uEnv.txt
Checking if uname_r is set in /boot/uEnv.txt...
gpio: pin 56 (gpio 56) value is 1
Running uname_boot ...
loading /boot/vmlinuz-3.8.13-bone79 ...
5644336 bytes read in 333 ms (16.2 MiB/s)
loading /boot/dtbs/3.8.13-bone79/am335x-boneblack.dtb ...
26118 bytes read in 24 ms (1 MiB/s)
loading /boot/initrd.img-3.8.13-bone79 ...
2905600 bytes read in 179 ms (15.5 MiB/s)
debug: [console=ttyO0,115200n8 capemgr.enable_partno=BB-UART1,BB-UART2,BB-UART4,BB-UART5 capemgr.disable_partno=BB-BONELT-HDMI,BB-BONELT-HDMIN root=UUID=4d8c9d4c-a16d-47ac-a32c-43d0155df072 ro rootfstype=ext4 rootwait coherent_pool=1M quiet init=/lib/systemd/systemd cape_universal=enable] ...
debug: [bootz 0x82000000 0x88080000:2c5600 0x88000000] ...
Kernel image # 0x82000000 [ 0x000000 - 0x562030 ]
## Flattened Device Tree blob at 88000000
Booting using the fdt blob at 0x88000000
Loading Ramdisk to 8fd3a000, end 8ffff600 ... OK
Loading Device Tree to 8fd30000, end 8fd39605 ... OK
Starting kernel ...
Uncompressing Linux... done, booting the kernel.
[ 0.384810] omap2_mbox_probe: platform not supported
[ 0.540541] tps65217-bl tps65217-bl: no platform data provided
[ 0.604330] bone-capemgr bone_capemgr.9: slot #0: No cape found
[ 0.641437] bone-capemgr bone_capemgr.9: slot #1: No cape found
[ 0.678546] bone-capemgr bone_capemgr.9: slot #2: No cape found
[ 0.715656] bone-capemgr bone_capemgr.9: slot #3: No cape found
[ 0.741854] omap_hsmmc mmc.5: of_parse_phandle_with_args of 'reset' failed
[ 0.803809] pinctrl-single 44e10800.pinmux: pin 44e10854 already requested by 44e10800.pinmux; cannot claim for gpio-leds.8
[ 0.815463] pinctrl-single 44e10800.pinmux: pin-21 (gpio-leds.8) status -22
[ 0.822748] pinctrl-single 44e10800.pinmux: could not request pin 21 on device pinctrl-single
[ 0.893233] Unhandled fault: external abort on non-linefetch (0x1008) at 0xe0858c20
[ 0.901225] Internal error: : 1008 [#1] SMP THUMB2
[ 0.906217] Modules linked in:
[ 0.909405] CPU: 0 Not tainted (3.8.13-bone79 #1)
[ 0.914691] PC is at cpts_fifo_read.constprop.1+0x18/0xc4
[ 0.920317] LR is at cpts_systim_read+0x11/0x7c
[ 0.925040] pc : [<c0326468>] lr : [<c0326761>] psr: 000001b3
[ 0.925040] sp : df071db8 ip : 00000000 fp : de231664
[ 0.936993] r10: de231000 r9 : de231758 r8 : c084e0c0
[ 0.942440] r7 : 00000001 r6 : ffffffff r5 : 00000010 r4 : de231670
[ 0.949241] r3 : e0858c00 r2 : 00000001 r1 : de2316d0 r0 : de231670
[ 0.956039] Flags: nzcv IRQs off FIQs on Mode SVC_32 ISA Thumb Segment kernel
[ 0.963925] Control: 50c5387d Table: 80004019 DAC: 00000015
[ 0.969907] Process swapper/0 (pid: 1, stack limit = 0xdf070240)
[ 0.976163] Stack: (0xdf071db8 to 0xdf072000)
[ 0.980699] 1da0: e0858c00 de2316d0
[ 0.989219] 1dc0: de2316bc 35318bf5 00000000 0000001d c052e7a8 c0326761 de2316e8 de2316bc
[ 0.997740] 1de0: 35318bf5 c00611f1 de231670 20000113 de2316e8 c0326927 35318bf5 00000000
[ 1.006259] 1e00: 00000000 00000004 df0d5410 de231000 df0d5400 c0325bab df0d8ac0 de231540
[ 1.014775] 1e20: c0893bb8 0000002b de231540 df0d5400 df0d5410 00000005 00000000 df0d5410
[ 1.023298] 1e40: e0858800 e0858a00 e0858a20 e0858a40 e0858a60 e08588c0 e08588e0 00000008
[ 1.031813] 1e60: 00000001 0000003c 4a102000 4a102000 00002000 00000010 00000001 de231298
[ 1.040338] 1e80: e0858d00 0000000a 00000400 00000002 00000020 00000008 df0d5410 c094362c
[ 1.048868] 1ea0: df0d5410 c08b2c40 00000000 c0829039 00000102 c0846d70 00000000 c02c82b1
[ 1.057381] 1ec0: c02c82a1 c02c7753 00000000 df0d5410 c08b2c40 df0d5444 00000000 c02c78b3
[ 1.065896] 1ee0: c08b2c40 c02c7869 00000000 c02c6887 df049478 df0c6180 c08b2c40 c08a8090
[ 1.074421] 1f00: de23d140 c02c7247 c0753554 c08b2c40 c08b2c40 df070000 c08d4180 00000000
[ 1.082937] 1f20: c0829039 c02c7bb5 00000000 c0833968 df070000 c08d4180 00000000 c0829039
[ 1.091461] 1f40: 00000102 c000867f 00000007 00000007 c088bc98 c0833964 c0833968 00000007
[ 1.099978] 1f60: c0833948 c08d4180 c080d1c9 c0846d70 00000000 c080d6a3 00000007 00000007
[ 1.108503] 1f80: c080d1c9 c0d60fc0 00000000 c04ccfb1 00000000 00000000 00000000 00000000
[ 1.117013] 1fa0: 00000000 c04ccfb7 00000000 c000c8fd 00000000 00000000 00000000 00000000
[ 1.125537] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 1.134055] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000
[ 1.142587] [<c0326468>] (cpts_fifo_read.constprop.1+0x18/0xc4) from [<c0326761>] (cpts_systim_read+0x11/0x7c)
[ 1.153018] [<c0326761>] (cpts_systim_read+0x11/0x7c) from [<c00611f1>] (timecounter_init+0x11/0x1c)
[ 1.162545] [<c00611f1>] (timecounter_init+0x11/0x1c) from [<c0326927>] (cpts_register+0xf3/0x1b8)
[ 1.171894] [<c0326927>] (cpts_register+0xf3/0x1b8) from [<c0325bab>] (cpsw_probe+0x823/0x960)
[ 1.180877] [<c0325bab>] (cpsw_probe+0x823/0x960) from [<c02c82b1>] (platform_drv_probe+0x11/0x14)
[ 1.190222] [<c02c82b1>] (platform_drv_probe+0x11/0x14) from [<c02c7753>] (driver_probe_device+0x53/0x168)
[ 1.200282] [<c02c7753>] (driver_probe_device+0x53/0x168) from [<c02c78b3>] (__driver_attach+0x4b/0x4c)
[ 1.210093] [<c02c78b3>] (__driver_attach+0x4b/0x4c) from [<c02c6887>] (bus_for_each_dev+0x27/0x48)
[ 1.219521] [<c02c6887>] (bus_for_each_dev+0x27/0x48) from [<c02c7247>] (bus_add_driver+0xe3/0x168)
[ 1.228949] [<c02c7247>] (bus_add_driver+0xe3/0x168) from [<c02c7bb5>] (driver_register+0x3d/0xc4)
[ 1.238289] [<c02c7bb5>] (driver_register+0x3d/0xc4) from [<c000867f>] (do_one_initcall+0x1f/0xf4)
[ 1.247630] [<c000867f>] (do_one_initcall+0x1f/0xf4) from [<c080d6a3>] (kernel_init_freeable+0xc3/0x158)
[ 1.257516] [<c080d6a3>] (kernel_init_freeable+0xc3/0x158) from [<c04ccfb7>] (kernel_init+0x7/0x98)
[ 1.266951] [<c04ccfb7>] (kernel_init+0x7/0x98) from [<c000c8fd>] (ret_from_fork+0x11/0x34)
[ 1.275659] Code: 2701 f100 09e8 6823 (6a1a) 07d3
[ 1.280655] ---[ end trace b2036333b4d03ad2 ]---
[ 1.285687] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
With a Debian Jessie image, Ker 4.4, board is booting normally.
Any idea how to solve this, is kindly appreciated.
Thank you.

As per my knowledge there was some issue with CTPS driver in kernel V3.13 (forget the exact kernel version). So it was an open issue. May be with newer kernel version they have fixed it. If you have the source code then try by disabling the CTPS driver CONFIG_TI_CPTS=n.

"unable to handle kernel null pointer derefernce at null" after trying to modprode driver

I have a script that initializes a driver on startup, which worked beautifully before I enabled kernel tracing and recompiled the kernel to try and debug an issue with a piece of software. If I try to initialize the driver in any way (modprobe, insmod, etc) this output prints to the screen:
[ 26.263308] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 26.263322] IP: [<c108664d>] trace_module_notify+0x16b/0x20a
[ 26.263325] *pde = 00000000
[ 26.263329] Oops: 0000 [#1] PREEMPT SMP
[ 26.263335] Modules linked in: phddrv(O+)
[ 26.263343] Pid: 704, comm: insmod Tainted: G O 3.6.3-rt9 #21 Advanced Digital Logic, Inc CB4053/ADLS15PC
[ 26.263346] EIP: 0060:[<c108664d>] EFLAGS: 00010213 CPU: 0
[ 26.263350] EIP is at trace_module_notify+0x16b/0x20a
[ 26.263353] EAX: ee6e9274 EBX: f082550c ECX: ee6e920c EDX: f082550c
[ 26.263356] ESI: 00000000 EDI: ee6e92dc EBP: ee6ebf4c ESP: ee6ebf24
[ 26.263359] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 26.263362] CR0: 8005003b CR2: 00000000 CR3: 2f2ea000 CR4: 000007d0
[ 26.263365] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 26.263367] DR6: ffff0ff0 DR7: 00000400
[ 26.263371] Process insmod (pid: 704, ti=ee6ea000 task=ef218000 task.ti=ee6ea000)
[ 26.263372] Stack:
[ 26.263381] ee6e9274 ee6e9344 ee6e92dc ee6e920c ee6e9274 ee6e9344 c2086424 c15a5d58
[ 26.263388] 00000000 00000001 ee6ebf68 c1046d33 f082550c c15a51bc c15a3778 00000000
[ 26.263396] c15a3790 ee6ebf8c c1046fa9 fffffffd 00000000 f082550c 00000001 f082550c
[ 26.263397] Call Trace:
[ 26.263407] [<c1046d33>] notifier_call_chain+0x2b/0x4d
[ 26.263413] [<c1046fa9>] __blocking_notifier_call_chain+0x3c/0x51
[ 26.263419] [<c1046fcf>] blocking_notifier_call_chain+0x11/0x13
[ 26.263426] [<c10671b7>] sys_init_module+0x57/0x190
[ 26.263434] [<c13a3d10>] sysenter_do_call+0x12/0x26
[ 26.263489] Code: 00 c7 42 04 64 5d 5a c1 89 15 64 5d 5a c1 89 45 ec 8d 42 74 83 c2 0c 89 45 e8 89 55 e4 eb 19 57 8b 4d e4 89 da ff 75 ec ff 75 e8 <8b> 06 83 c6 04 e8 c2 fb ff ff 83 c4 0c 3b 75 f0 72 e2 eb 77 b8
[ 26.263495] EIP: [<c108664d>] trace_module_notify+0x16b/0x20a SS:ESP 0068:ee6ebf24
[ 26.263497] CR2: 0000000000000000
[ 26.267381] ---[ end trace 0000000000000002 ]---
Any hint as to what is going on would be greatly appreciated!

I got similar issue as yours (almost the same stack trace of panic).
The root cause on my side is that after I changed the kernel config (enable trace point) I only rebuilt the kernel bzImage but forgot to rebuilt the ko modules! That may cause some execution mismatch between the new kernel and old ko modules.
After rebuild and update both kernel image and ko modules, the issue is gone.

Somewhere in the driver there is a NULL pointer. A pointer variabile has value NULL and the driver is trying to use it.
myPtr->value; /* if myPtr is NULL, this will raise the kernel oops */
You have to debug the driver to find where and why there is a NULL pointer

Unable to Debug following kernel crash triggered via SysRq

I am getting following Oops message while testing on a device which is running on Linux Kernel 3.4.5 and ARM processor.I am unable to trace the issue.
If you look at call stack , you would see it starting from ret_fast_syscall() , now actually sysrq for Crash has been triggered inside the code but i am not getting from where.How can i find that. I have Lauterbach installed but no idea from where to find which part of kernel code has actually triggered this SysRq.
[50728.239318] C0 [ sh] SysRq : Trigger a crash
[50728.239501] C0 [ sh] **************** READ GIC status
[50728.239654] C0 [ sh] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[50728.239776] C0 [ sh] pgd = ebc9c000
[50728.239929] C0 [ sh] [00000000] *pgd=00000000
[50728.240081] C0 [ sh] Internal error: Oops: 805 [#1] PREEMPT SMP ARM
[50728.240234] C0 [ sh] kona_fb: die notifier invoked
[50728.240356] C0 [ sh] Modules linked in: bcmdhd dm_crypt(O) moc_crypto(PO) moc_platform_mod(O) texfat(PO)
[50728.241088] C0 [ sh] CPU: 0 Tainted: P W O (3.4.5-g4192471-dirty #116)
[50728.241271] C0 [ sh] PC is at sysrq_handle_crash+0x14/0x20
[50728.241424] C0 [ sh] LR is at __handle_sysrq+0xa0/0x14c
[50728.241516] C0 [ sh] pc : [<c026ad48>] lr : [<c026b210>] psr: 60000093
[50728.241516] C0 [ sh] sp : ebd0ff20 ip : 0000000c fp : 4003a404
[50728.241821] C0 [ sh] r10: 4070002c r9 : edbc2f0c r8 : 00000000
[50728.241912] C0 [ sh] r7 : 60000013 r6 : 00000063 r5 : 00000007 r4 : c0a26b90
[50728.242065] C0 [ sh] r3 : 00000001 r2 : 00000000 r1 : c07ab3a2 r0 : 00000063
[50728.242248] C0 [ sh] Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
[50728.242401] C0 [ sh] Control: 10c53c7d Table: adc9c06a DAC: 00000015
[50728.242492] C0 [ sh]
.................. (Register Contents).............
[50728.284851] C0 [ sh] Stack: (0xebd0ff20 to 0xebd10000)
[50728.284942] C0 [ sh] ff20: 00000002 c026b2bc ec2a4ef0 ebd0ff88 00000002 c026b2e0 edbc2ec0 c014c840
[50728.285125] C0 [ sh] ff40: 00000002 ec2a4ef0 4070002c ebd0ff88 00000002 ebd0e000 00000000 c0109d24
[50728.285308] C0 [ sh] ff60: ec2a4ef0 4070002c ec2a4ef0 4070002c 00000000 00000000 00000002 c0109f5c
[50728.285491] C0 [ sh] ff80: 00000001 00000000 00000000 00000000 00000003 00000002 00000001 00000004
[50728.285675] C0 [ sh] ffa0: c000e304 c000e140 00000003 00000002 00000001 4070002c 00000002 ffffffff
[50728.285858] C0 [ sh] ffc0: 00000003 00000002 00000001 00000004 4070002c 00000000 406ffc84 4003a404
[50728.286010] C0 [ sh] ffe0: 40035f38 bea5e710 40020257 40150e44 20000010 00000001 00000000 00000000
[50728.286224] C0 [ sh] [<c026ad48>] (sysrq_handle_crash+0x14/0x20) from [<c026b210>] (__handle_sysrq+0xa0/0x14c)
***[50728.286376] C0 [ sh] [<c026b210>] (__handle_sysrq+0xa0/0x14c) from [<c026b2e0>] (write_sysrq_trigger+0x24/0x34)
[50728.286560] C0 [ sh] [<c026b2e0>] (write_sysrq_trigger+0x24/0x34) from [<c014c840>] (proc_reg_write+0x80/0x94)
[50728.286682] C0 [ sh] [<c014c840>] (proc_reg_write+0x80/0x94) from [<c0109d24>] (vfs_write+0xb0/0x128)
[50728.286865] C0 [ sh] [<c0109d24>] (vfs_write+0xb0/0x128) from [<c0109f5c>] (sys_write+0x38/0x64)
[50728.287048] C0 [ sh] [<c0109f5c>] (sys_write+0x38/0x64) from [<c000e140>] (ret_fast_syscall+0x0/0x48)***
[50728.287200] C0 [ sh] Code: e3a03001 e5823000 f57ff04f e3a02000 (e5c23000)
[50728.287384] C0 [ sh] ---[ end trace 1b75b31a2719ed7e ]---
[50728.287475] C0 [ sh] Kernel panic - not syncing: Fatal exception

[50728.239318] C0 [ sh] SysRq : Trigger a crash
Someone at userland triggered a crash as you would do via echo c > /proc/sysrq-trigger.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

What do these Linux Kernel Oops fields mean? - linux-kernel

Correct me if I am wrong but, OOPS 0002 means no page found when writing in kernel mode: bit 0 == 0 means no page found, 1 means a protection fault bit 1 == 0 means read, 1 means write bit 2 == 0 means kernel, 1 means user-mode

Related

On rhel8 os When I run a program the kernel crash, how can I determine which line of code reports the error ？

How to use decode_stacktrace.sh?

Booting Debian Wheezy on a BeagleCore ? (Kernel panic - not syncing: Attempted to kill init!)

"unable to handle kernel null pointer derefernce at null" after trying to modprode driver

Unable to Debug following kernel crash triggered via SysRq

Categories

Resources