On RHEL 8, the kernel crashes when I run a program. How can I determine which line of code reports the error?

I am not familiar with the kernel and do not know how to troubleshoot this.
When the problem happened, the kernel crashed. The following is the vmcore-dmesg.txt log:
##dmesg logs
[ 378.442884] SPDMD-LUN:[ERROR]lun_del:2218 device nvfile-mgmtd-0 is busy :2
[ 424.511211] XFS (nvfile-storage-0): Mounting V5 Filesystem
[ 424.513045] XFS (nvfile-storage-2): Mounting V5 Filesystem
[ 424.538953] XFS (nvfile-storage-2): Starting recovery (logdev: internal)
[ 424.546536] XFS (nvfile-storage-2): Ending recovery (logdev: internal)
[ 425.512217] XFS (nvfile-storage-0): Starting recovery (logdev: internal)
[ 425.518929] XFS (nvfile-storage-1): Mounting V5 Filesystem
[ 425.520813] XFS (nvfile-storage-3): Mounting V5 Filesystem
[ 426.288987] XFS (nvfile-storage-3): Starting recovery (logdev: internal)
[ 426.475465] XFS (nvfile-storage-1): Starting recovery (logdev: internal)
[ 427.496551] XFS (nvfile-storage-0): Ending recovery (logdev: internal)
[ 428.320932] XFS (nvfile-storage-3): Ending recovery (logdev: internal)
[ 428.977479] XFS (nvfile-storage-1): Ending recovery (logdev: internal)
[ 445.927118] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 445.927685] PGD 0 P4D 0
[ 445.928216] Oops: 0000 [#1] SMP NOPTI
[ 445.928723] CPU: 4 PID: 591 Comm: kworker/u193:1 Kdump: loaded Tainted: G OE --------- - - 4.18.0-305.el8.x86_64 #1
[ 445.929735] Hardware name: Lenovo ThinkSystem SR860 V2/7Z60CTO1WW, BIOS M5E118K-1.52 08/06/2021
[ 445.930305] Workqueue: xfs-cil/nvfile-storage- xlog_cil_push_work [xfs]
[ 445.930910] RIP: 0010:blk_queue_split+0x1c6/0x660
[ 445.933048] Code: c0 45 31 f6 89 44 24 44 89 7c 24 40 31 ff 85 db 0f 84 56 04 00 00 8b 44 24 44 48 89 c1 48 89 44 24 30 48 c1 e1 04 49 03 4d 78 <8b> 41 08 8b 71 0c 48 8b 11 44 29 f8 39 d8 48 89 54 24 48 0f 47 c3
[ 445.936323] RSP: 0018:ffffbcbf9a4bfaf8 EFLAGS: 00010246
[ 445.936711] RAX: 0000000000000000 RBX: 0000000000003000 RCX: 0000000000000000
[ 445.937028] RDX: 0000000000000018 RSI: 0000000000000018 RDI: 0000000000000000
[ 445.937343] RBP: ffffbcbf9a4bfb90 R08: 0000000000000000 R09: ffff940c1965d700
[ 445.937659] R10: 0000000000000000 R11: 0000000000000000 R12: ffff94ca6f590000
[ 445.937979] R13: ffff940bda0ddc80 R14: 0000000000000000 R15: 0000000000000000
[ 445.938304] FS: 0000000000000000(0000) GS:ffff940effd00000(0000) knlGS:0000000000000000

Related

How to use decode_stacktrace.sh?

Q1: admin-guide/bug-hunting.html says that:
If the kernel is compiled with CONFIG_DEBUG_INFO, you can enhance
the quality of the stack trace by using
file:scripts/decode_stacktrace.sh.
Is CONFIG_DEBUG_INFO a prerequisite for running the script decode_stacktrace.sh?
Q2: this patch says:
./decode_stacktrace.sh vmlinux /home/sasha/linux/ < input.log > output.log
Where can I find the input.log? I know it is some kind of stack information. Is it only available when CONFIG_DEBUG_INFO=y, or only when there is a kernel panic or oops?

Yes, you will need CONFIG_DEBUG_INFO=y in order for decode_stacktrace.sh to extract useful debugging information (such as file names and line numbers) from the kernel image (vmlinux). In theory, the script will run fine even without debug info, but it will give you less information (no file names and line numbers).
NOTE: in recent kernels (>= v5.12) a multiple choice option for the DWARF version was added (CONFIG_DEBUG_INFO_DWARF{4,5,_TOOLCHAIN_DEFAULT}). Then, in v5.18 CONFIG_DEBUG_INFO was completely removed, and the multiple choice option is now the one responsible for enabling debug info. Any choice except CONFIG_DEBUG_INFO_NONE is fine as long as your toolchain supports it.
Where can I find the input.log?
The "log" they are referring to is simply the kernel log. This will be available regardless of CONFIG_DEBUG_INFO. Kernel panics and other OOPSes will usually write stack traces to the kernel log. You can take this directly from the console (if console logging is enabled), or using the dmesg command. See also this doc page for more info. Once you have the log, simply copy-paste it into a text file (input.log), and pass that file to the script's standard input with < input.log.
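To illustrate what the script looks for in the log, here is a minimal Python sketch that pulls the symbol+offset/size entries out of kernel log text. It is not part of decode_stacktrace.sh; the name parse_frames and the regular expression are my own assumptions about the usual trace format. A leading "? " marks an entry that the kernel considers unreliable.

```python
import re

# Matches entries such as "panic+0x102/0x27b" or "some_func+0x1c/0x40 [xfs]".
FRAME_RE = re.compile(
    r"(?P<unreliable>\? )?"
    r"(?P<symbol>[A-Za-z_][\w.]*)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>[\w-]+)\])?"
)


def parse_frames(log_text):
    """Return one dict per symbol+offset/size entry found in the log text."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match is None:
            continue  # timestamps, register dumps, etc. are skipped
        frames.append({
            "symbol": match.group("symbol"),
            "offset": int(match.group("offset"), 16),
            "size": int(match.group("size"), 16),
            "module": match.group("module"),  # None for built-in code
            "unreliable": match.group("unreliable") is not None,
        })
    return frames


# Three lines in the format of a kernel call trace.
sample = """\
[ 7.954651] panic+0x102/0x27b
[ 7.954756] ? _printk+0x53/0x6a
[ 7.955773] RIP: 0033:0x4a8531
"""
frames = parse_frames(sample)
for frame in frames:
    print(frame["symbol"], hex(frame["offset"]), frame["unreliable"])
```

In practice you would pass it the text of input.log; the offsets are what the script resolves to file names and line numbers when debug info is available.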
For an actual panic on the same system you are working on (and not simply inside QEMU or a VM), chances are that the logging can only be done to console, so you might want to enable that:
$ dmesg --console-level 7
$ dmesg --console-on
Take a look at Where are kernel panic logs? on Ask Ubuntu for more ways to capture the log in case of a panic.
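To check what will actually reach the console, you can read /proc/sys/kernel/printk, whose first field is the current console log level; a message is printed on the console only if its level is numerically lower than that value. A minimal Python sketch, assuming the usual four-field layout (the helper names are hypothetical, and the sample string stands in for the real file contents):

```python
# Field names for the four values in /proc/sys/kernel/printk, in order.
FIELD_NAMES = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk_levels(text):
    """Map the four whitespace-separated integers to their field names."""
    values = [int(value) for value in text.split()]
    return dict(zip(FIELD_NAMES, values))


def reaches_console(message_level, levels):
    """A message is shown if its level is lower than console_loglevel."""
    return message_level < levels["console_loglevel"]


# Sample contents (an assumption); on a live system use:
#   open("/proc/sys/kernel/printk").read()
levels = parse_printk_levels("7\t4\t1\t7\n")
print(levels["console_loglevel"])
print(reaches_console(6, levels))  # KERN_INFO is level 6
```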
As an example, here's the log for a crash generated running echo c > /proc/sysrq-trigger as root on a system running inside QEMU on my machine. I copy-pasted the kernel log into input.log, which contains the following:
[ 7.952685] sysrq: Trigger a crash
[ 7.952850] Kernel panic - not syncing: sysrq triggered crash
[ 7.953098] CPU: 0 PID: 71 Comm: linuxrc Not tainted 5.19.0-rc2 #1
[ 7.953259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 7.953655] Call Trace:
[ 7.954133] <TASK>
[ 7.954332] dump_stack_lvl+0x34/0x44
[ 7.954651] panic+0x102/0x27b
[ 7.954756] ? _printk+0x53/0x6a
[ 7.954847] sysrq_handle_crash+0x11/0x20
[ 7.954953] __handle_sysrq.cold+0x43/0x11b
[ 7.955065] write_sysrq_trigger+0x1f/0x30
[ 7.955167] proc_reg_write+0x4c/0x90
[ 7.955267] vfs_write+0xb4/0x290
[ 7.955362] ksys_write+0x5a/0xd0
[ 7.955453] do_syscall_64+0x3b/0x90
[ 7.955553] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
[ 7.956427] RSP: 002b:00007ffde8168508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7.956625] RAX: ffffffffffffffda RBX: 000000000101a8a0 RCX: 00000000004a8531
[ 7.956787] RDX: 0000000000000002 RSI: 00000000010201e0 RDI: 0000000000000001
[ 7.956949] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefefefeff62
[ 7.957113] R10: 00000000000001b6 R11: 0000000000000246 R12: 00000000010201e0
[ 7.957275] R13: 0000000000000002 R14: 00007ffde8168701 R15: 00007ffde8168578
[ 7.957467] </TASK>
[ 7.957806] Kernel Offset: 0x34a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7.958215] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
The only section that you really need to extract from the above is the following:
[ 7.954332] dump_stack_lvl+0x34/0x44
[ 7.954651] panic+0x102/0x27b
[ 7.954756] ? _printk+0x53/0x6a
[ 7.954847] sysrq_handle_crash+0x11/0x20
[ 7.954953] __handle_sysrq.cold+0x43/0x11b
[ 7.955065] write_sysrq_trigger+0x1f/0x30
[ 7.955167] proc_reg_write+0x4c/0x90
[ 7.955267] vfs_write+0xb4/0x290
[ 7.955362] ksys_write+0x5a/0xd0
[ 7.955453] do_syscall_64+0x3b/0x90
[ 7.955553] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
Running decode_stacktrace.sh, pointing it to my vmlinux and the kernel source directory, yields more info:
$ ./scripts/decode_stacktrace.sh /path/to/vmlinux /path/to/kernel-source-dir < input.log
[ 7.952685] sysrq: Trigger a crash
[ 7.952850] Kernel panic - not syncing: sysrq triggered crash
[ 7.953098] CPU: 0 PID: 71 Comm: linuxrc Not tainted 5.19.0-rc2 #1
[ 7.953259] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 7.953655] Call Trace:
[ 7.954133] <TASK>
[ 7.954332] dump_stack_lvl (/path/to/kernel-source-dir/lib/dump_stack.c:107 (discriminator 1))
[ 7.954651] panic (/path/to/kernel-source-dir/kernel/panic.c:292)
[ 7.954756] ? _printk (/path/to/kernel-source-dir/kernel/printk/printk.c:2426)
[ 7.954847] sysrq_handle_crash (/path/to/kernel-source-dir/drivers/tty/sysrq.c:155)
[ 7.954953] __handle_sysrq.cold (/path/to/kernel-source-dir/drivers/tty/sysrq.c:626)
[ 7.955065] write_sysrq_trigger (/path/to/kernel-source-dir/drivers/tty/sysrq.c:1168)
[ 7.955167] proc_reg_write (/path/to/kernel-source-dir/fs/proc/inode.c:335 /path/to/kernel-source-dir/fs/proc/inode.c:347)
[ 7.955267] vfs_write (/path/to/kernel-source-dir/fs/read_write.c:589)
[ 7.955362] ksys_write (/path/to/kernel-source-dir/fs/read_write.c:644)
[ 7.955453] do_syscall_64 (/path/to/kernel-source-dir/arch/x86/entry/common.c:50 /path/to/kernel-source-dir/arch/x86/entry/common.c:80)
[ 7.955553] entry_SYSCALL_64_after_hwframe (/path/to/kernel-source-dir/arch/x86/entry/entry_64.S:115)
[ 7.955773] RIP: 0033:0x4a8531
[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8
All code
========
0: e0 ff loopne 0x1
2: ff (bad)
3: ff f7 push %rdi
5: d8 64 89 02 fsubs 0x2(%rcx,%rcx,4)
9: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
10: eb b3 jmp 0xffffffffffffffc5
12: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
19: 8b 05 d2 26 1e 00 mov 0x1e26d2(%rip),%eax # 0x1e26f1
1f: 85 c0 test %eax,%eax
21: 75 16 jne 0x39
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 57 ja 0x89
32: c3 retq
33: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
39: 08 .byte 0x8
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 57 ja 0x5f
8: c3 retq
9: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
f: 08 .byte 0x8
[ 7.956427] RSP: 002b:00007ffde8168508 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7.956625] RAX: ffffffffffffffda RBX: 000000000101a8a0 RCX: 00000000004a8531
[ 7.956787] RDX: 0000000000000002 RSI: 00000000010201e0 RDI: 0000000000000001
[ 7.956949] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefefefeff62
[ 7.957113] R10: 00000000000001b6 R11: 0000000000000246 R12: 00000000010201e0
[ 7.957275] R13: 0000000000000002 R14: 00007ffde8168701 R15: 00007ffde8168578
[ 7.957467] </TASK>
[ 7.957806] Kernel Offset: 0x34a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[ 7.958215] ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
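The "2a:*" in the decoded output is not magic: the kernel marks the byte at the instruction pointer with <..> in the Code: line, so its position in the byte list is the offset of the trapping instruction. A small Python sketch of that calculation (trapping_offset is a hypothetical helper name, not something the script provides):

```python
def trapping_offset(code_line):
    """Return the index of the <xx>-marked byte in an oops 'Code:' line."""
    # Keep only the part after "Code:" and split it into byte tokens.
    tokens = code_line.split("Code:", 1)[1].split()
    for index, token in enumerate(tokens):
        if token.startswith("<") and token.endswith(">"):
            return index
    return None  # no marked byte found


# The Code: line from the sysrq-triggered crash log above.
code_line = (
    "[ 7.955999] Code: e0 ff ff ff f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b3 "
    "0f 1f 80 00 00 00 00 8b 05 d2 26 1e 00 85 c0 75 16 b8 01 00 00 00 0f 05 "
    "<48> 3d 00 f0 ff ff 77 57 c3 66 0f 1f 44 00 00 8"
)
offset = trapping_offset(code_line)
print(hex(offset))  # prints 0x2a
```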

Regarding Q2, I set up a serial console, where I can get all the kernel messages, including the content for input.log.
Below I list the steps to set up a serial console.
Components: 1x Linux machine, 1x PC with Windows 10 and PuTTY, 1x USB-to-serial converter.
Windows 10 should automatically recognize the USB-to-serial converter and install the driver. The converter appears in Device Manager, in my case as COM6.
Set up PuTTY: Serial, baud rate 115200, data bits 8, stop bits 1, parity none, flow control XON/XOFF.
Make sure that the serial port is enabled on the Linux machine.
$ dmesg | grep tty
[ 0.108687] printk: console [tty0] enabled
[ 2.752397] 00:03:ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[ 2.773666] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
This shows that both ttyS0 (COM1) and ttyS1 (COM2) work fine. In my case I chose ttyS1.
Use getty to manage ttyS1.
$ sudo /sbin/getty -L 115200 ttyS1 vt102
PuTTY then shows a login prompt; enter your username and password to get to the command line.
At this stage, however, ttyS1 is just another terminal, not a console. You need to configure it as a console.
Make sure that serial console support is compiled into your Linux kernel image. In make menuconfig: Device Drivers ‣ Character devices ‣ Serial drivers ‣ 8250/16550 and compatible serial support ‣ Console on 8250/16550 and compatible serial port: choose built-in. If it is already set, leave it. Otherwise, change the configuration, rebuild the kernel, and reboot.
Check the current console.
$ cat /sys/devices/virtual/tty/console/active
tty0
This means that the current console is tty0. We need to add ttyS1 as a console.
Add console=ttyS1 to the Linux kernel boot arguments.
$ sudo vim /etc/default/grub
Change the line GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to GRUB_CMDLINE_LINUX_DEFAULT="quiet console=ttyS1,115200 splash"
Then
$ sudo update-grub
$ sudo reboot
Check the current console.
$ cat /sys/devices/virtual/tty/console/active
ttyS1
That's it.
Call getty again.
$ sudo /sbin/getty -L 115200 ttyS1 vt102
PuTTY then gets the login prompt, and after you log in, it is a real console.
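To double-check the boot arguments after the reboot, the console= entries can be pulled out of the kernel command line. A minimal Python sketch (console_args is a hypothetical helper; on a live system you would pass it the contents of /proc/cmdline instead of the sample string):

```python
def console_args(cmdline):
    """Return (device, options) pairs for every console= argument, in order."""
    consoles = []
    for arg in cmdline.split():
        if arg.startswith("console="):
            # "ttyS1,115200" -> ("ttyS1", "115200"); "tty0" -> ("tty0", "")
            device, _, options = arg[len("console="):].partition(",")
            consoles.append((device, options))
    return consoles


# Sample value taken from the GRUB line above.
cmdline = "quiet console=ttyS1,115200 splash"
consoles = console_args(cmdline)
print(consoles)  # prints [('ttyS1', '115200')]
```

If several console= arguments are present, kernel messages go to all of them and the last one becomes /dev/console.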

Where can I load a GPIO module at the earliest?

I wrote a kernel module which works as expected. But I want that to be loaded at the beginning of the boot process. So I moved this code to
OpenWRT/build_dir/target-i386_geode_eglibc-2.19/linux-x86_alix2/linux-3.10.49/arch/x86/platform
and my code is here:
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/gpio.h>
#include <linux/delay.h>
#include <linux/cs5535.h>
MODULE_AUTHOR("Ramana");
MODULE_DESCRIPTION("POWER LED DRIVER");
#ifdef MODULE_LICENSE
MODULE_LICENSE("Dual BSD/GPL");
#endif
#define HW_VERSION_GPIO 15
#define LATCH_GPIO 6
#define DATA_GPIO 25
#define CLOCK_GPIO 27
#define HIGH 1
#define LOW 0
static void set_power_led(void)
{
    uint8_t i = 0;
    /*
     * Configure Pins Q8 Q7 Q6....Q0 in shift register
     * Set 0 to glow LED
     *
     * shift_reg:
     * indices 0, 1 and 2 are for LED3
     * 0 1 1 -> Blue_ON, GREEN_OFF, RED_OFF
     *
     * indices 3, 4 and 5 are for LED1
     *
     * indices 6, 7 and 8 are for LED2
     * 0 1 1 -> Blue_ON, GREEN_OFF, RED_OFF
     *
     * The pins Q9 Q10 and Q11 are don't care, so we are not using here
     */
    uint8_t shift_reg[9] = {0, 1, 1, 1, 1, 1, 1, 1, 1};
    /*
     * Clear register before set
     */
    for (i = 0; i < 9; i++) {
        gpio_set_value(CLOCK_GPIO, LOW);
        if (shift_reg[i] == 0) {
            gpio_set_value(DATA_GPIO, HIGH);
        } else {
            gpio_set_value(DATA_GPIO, LOW);
        }
        gpio_set_value(CLOCK_GPIO, HIGH);
    }
    gpio_set_value(LATCH_GPIO, HIGH);
    msleep(1);
    gpio_set_value(LATCH_GPIO, LOW);
}
static int __init power_led_init(void)
{
    /*
     * If GPIO 15 is high, it is old hardware
     */
    printk(KERN_INFO "LED INIT\n");
    if (!gpio_is_valid(LATCH_GPIO)) {
        printk(KERN_INFO "LEDs: Latch gpio is not valid\n");
        return -ENODEV;
    }
    if (!gpio_is_valid(DATA_GPIO)) {
        printk(KERN_INFO "LEDs: Data gpio is not valid\n");
        return -ENODEV;
    }
    if (!gpio_is_valid(CLOCK_GPIO)) {
        printk(KERN_INFO "LEDs: Clock gpio is not valid\n");
        return -ENODEV;
    }
    gpio_request(LATCH_GPIO, "sysfs");
    gpio_request(DATA_GPIO, "sysfs");
    gpio_request(CLOCK_GPIO, "sysfs");
    gpio_direction_output(LATCH_GPIO, LOW);
    gpio_direction_output(DATA_GPIO, LOW);
    gpio_direction_output(CLOCK_GPIO, LOW);
    set_power_led();
    printk(KERN_INFO "Power LED: registered\n");
    return 0;
}
static void __exit power_led_exit(void)
{
    uint8_t i;
    for (i = 0; i < 9; i++) {
        gpio_set_value(CLOCK_GPIO, LOW);
        gpio_set_value(DATA_GPIO, LOW);
        gpio_set_value(CLOCK_GPIO, HIGH);
    }
    gpio_set_value(LATCH_GPIO, HIGH);
    msleep(1);
    gpio_set_value(LATCH_GPIO, LOW);
}
module_init(power_led_init);
module_exit(power_led_exit);
With this there is a kernel panic:
[ 0.104709] LED INIT
[ 0.105306] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 0.106284] IP: [<c115acc2>] __gpio_set_value+0x12/0x80
[ 0.106284] *pde = 00000000
[ 0.106284] Oops: 0000 [#1]
[ 0.106284] Modules linked in:
[ 0.106284] CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.49 #33
[ 0.106284] task: cf834000 ti: cf840000 task.ti: cf840000
[ 0.106284] EIP: 0060:[<c115acc2>] EFLAGS: 00010286 CPU: 0
[ 0.106284] EIP is at __gpio_set_value+0x12/0x80
[ 0.106284] EAX: c13a1624 EBX: c13a1624 ECX: ffffffea EDX: 00000000
[ 0.106284] ESI: 00000000 EDI: 00000000 EBP: cf841f80 ESP: cf841f24
[ 0.106284] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 0.106284] CR0: 8005003b CR2: 0000004c CR3: 01371000 CR4: 00000090
[ 0.106284] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.106284] DR6: ffff0ff0 DR7: 00000400
[ 0.106284] Stack:
[ 0.106284] cf841f3b cf841f44 0000003e c133b05e c12b0d03 006e6967 01010101 01010101
[ 0.106284] 00000000 c133afbb c1000172 cfdff401 00060006 c13019c0 c12ced19 cfdff460
[ 0.106284] 00000000 cfdff460 00000200 c114956a c136c480 00000006 0000003e cf840000
[ 0.106284] Call Trace:
[ 0.106284] [<c133b05e>] ? power_led_init+0xa3/0x112
[ 0.106284] [<c133afbb>] ? alix_init+0xf6/0xf6
[ 0.106284] [<c1000172>] ? do_one_initcall+0xb2/0x150
[ 0.106284] [<c114956a>] ? strcpy+0xa/0x20
[ 0.106284] [<c132da22>] ? kernel_init_freeable+0xd1/0x173
[ 0.106284] [<c132d4aa>] ? do_early_param+0x77/0x77
[ 0.106284] [<c124ad68>] ? kernel_init+0x8/0x170
[ 0.106284] [<c1251322>] ? ret_from_kernel_thread+0x6/0x28
[ 0.106284] [<c1251337>] ? ret_from_kernel_thread+0x1b/0x28
[ 0.106284] [<c124ad60>] ? rest_init+0x60/0x60
[ 0.106284] Code: d6 ab aa aa aa ff d1 5b 5e 5f c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 57 f
[ 0.106284] EIP: [<c115acc2>] __gpio_set_value+0x12/0x80 SS:ESP 0068:cf841f24
[ 0.106284] CR2: 000000000000004c
[ 0.106284] ---[ end trace 23021a4cac17faa2 ]---
[ 0.107751] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 0.107751]
Why is it crashing here, and where can I put this code so that the module is loaded as early as possible?
Full minicom log
PC Engines ALIX.3 v0.99h
640 KB Base Memory
261120 KB Extended Memory
01F0 Master 045A InnoDisk Corp. - iCF4000 8GB
Phys C/H/S 16000/16/63 Log C/H/S 1003/255/63 LBA
GRUB loading....
Booting `OpenWrt'
[ 0.000000] Linux version 3.10.49 (savari#Ramana) (gcc version 4.8.3 (OpenWrt/Linaro GCC 4.8-2014.04 unknown) ) #40 Tue Nov 8 13:11:49 IST 2016
[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009ffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000000f0000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000000fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
[ 0.000000] Notice: NX (Execute Disable) protection missing in CPU!
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820: last_pfn = 0x10000 max_arch_pfn = 0x100000
[ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
[ 0.000000] init_memory_mapping: [mem 0x0fc00000-0x0fffffff]
[ 0.000000] init_memory_mapping: [mem 0x08000000-0x0fbfffff]
[ 0.000000] init_memory_mapping: [mem 0x00100000-0x07ffffff]
[ 0.000000] 256MB LOWMEM available.
[ 0.000000] mapped low ram: 0 - 10000000
[ 0.000000] low ram: 0 - 10000000
[ 0.000000] Zone ranges:
[ 0.000000] DMA [mem 0x00001000-0x00ffffff]
[ 0.000000] Normal [mem 0x01000000-0x0fffffff]
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x00001000-0x0009ffff]
[ 0.000000] node 0: [mem 0x00100000-0x0fffffff]
[ 0.000000] Using APIC driver default
[ 0.000000] No local APIC present or hardware disabled
[ 0.000000] APIC: disable apic facility
[ 0.000000] APIC: switched to apic NOOP
[ 0.000000] e820: [mem 0x10000000-0xffefffff] available for PCI devices
[ 0.000000] Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64927
[ 0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz block2mtd.block2mtd=/dev/hda2,131072,rootfs,5 root=/dev/mtdblock0 rootfstype=jffs2 rootwait console=tty0 console=ttyS0,38400n8 noinitrd
[ 0.000000] PID hash table entries: 1024 (order: 0, 4096 bytes)
[ 0.000000] Dentry cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.000000] Inode-cache hash table entries: 16384 (order: 4, 65536 bytes)
[ 0.000000] Initializing CPU#0
[ 0.000000] Memory: 255600k/262144k available (2377k kernel code, 6156k reserved, 873k data, 260k init, 0k highmem)
[ 0.000000] virtual kernel memory layout:
[ 0.000000] fixmap : 0xfffa3000 - 0xfffff000 ( 368 kB)
[ 0.000000] vmalloc : 0xd0800000 - 0xfffa1000 ( 759 MB)
[ 0.000000] lowmem : 0xc0000000 - 0xd0000000 ( 256 MB)
[ 0.000000] .init : 0xc132d000 - 0xc136e000 ( 260 kB)
[ 0.000000] .data : 0xc1252630 - 0xc132cd00 ( 873 kB)
[ 0.000000] .text : 0xc1000000 - 0xc1252630 (2377 kB)
[ 0.000000] Checking if this processor honours the WP bit even in supervisor mode...Ok.
[ 0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[ 0.000000] NR_IRQS:2304 nr_irqs:256 16
[ 0.000000] console [ttyS0] enabled
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 498.062 MHz processor
[ 0.003005] Calibrating delay loop (skipped), value calculated using timer frequency.. 996.12 BogoMIPS (lpj=498062)
[ 0.005013] pid_max: default: 32768 minimum: 301
[ 0.007653] Mount-cache hash table entries: 512
[ 0.011056] Last level iTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.011056] Last level dTLB entries: 4KB 0, 2MB 0, 4MB 0
[ 0.011056] tlb_flushall_shift: -1
[ 0.012009] CPU: Geode(TM) Integrated Processor by AMD PCS (fam: 05, model: 0a, stepping: 02)
[ 0.018327] Performance Events: no PMU driver, software events only.
[ 0.026472] NET: Registered protocol family 16
[ 0.030862] PCI: PCI BIOS revision 2.10 entry at 0xfced9, last bus=0
[ 0.031012] PCI: Using configuration type 1 for base access
[ 0.051103] bio: create slab <bio-0> at 0
[ 0.057403] SCSI subsystem initialized
[ 0.060534] PCI: Probing PCI hardware
[ 0.062281] PCI host bridge to bus 0000:00
[ 0.063030] pci_bus 0000:00: root bus resource [io 0x0000-0xffff]
[ 0.064033] pci_bus 0000:00: root bus resource [mem 0x00000000-0xffffffff]
[ 0.065020] pci_bus 0000:00: No busn resource found for root bus, will use [bus 00-ff]
[ 0.078410] Switching to clocksource pit
[ 0.086307] NET: Registered protocol family 2
[ 0.088599] TCP established hash table entries: 2048 (order: 2, 16384 bytes)
[ 0.089847] TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
[ 0.090949] TCP: Hash tables configured (established 2048 bind 2048)
[ 0.092136] TCP: reno registered
[ 0.093867] UDP hash table entries: 256 (order: 0, 4096 bytes)
[ 0.095398] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[ 0.096873] NET: Registered protocol family 1
[ 0.100630] platform rtc_cmos: registered platform RTC device (no PNP device found)
[ 0.104665] alix: system is recognized as "PC Engines ALIX.3 v0.99h"
[ 0.106619] LED INIT
[ 0.107229] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 0.108207] IP: [<c115acc2>] __gpio_set_value+0x12/0x80
[ 0.108207] *pde = 00000000
[ 0.108207] Oops: 0000 [#1]
[ 0.108207] Modules linked in:
[ 0.108207] CPU: 0 PID: 1 Comm: swapper Not tainted 3.10.49 #40
[ 0.108207] task: cf834000 ti: cf840000 task.ti: cf840000
[ 0.108207] EIP: 0060:[<c115acc2>] EFLAGS: 00010286 CPU: 0
[ 0.108207] EIP is at __gpio_set_value+0x12/0x80
[ 0.108207] EAX: c13a1624 EBX: c13a1624 ECX: ffffffea EDX: 00000000
[ 0.108207] ESI: 00000000 EDI: 00000000 EBP: cf841f80 ESP: cf841f24
[ 0.108207] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068
[ 0.108207] CR0: 8005003b CR2: 0000004c CR3: 01371000 CR4: 00000090
[ 0.108207] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 0.108207] DR6: ffff0ff0 DR7: 00000400
[ 0.108207] Stack:
[ 0.108207] cf841f3b cf841f44 0000003e c133b031 c12b0deb 006e6967 01010101 01010101
[ 0.108207] 00000000 c133afbb c1000172 00000001 00060006 c1301bf0 c12ceeb1 cfdff460
[ 0.108207] 00000000 cfdff460 00000200 00000000 c136c480 00000006 0000003e cf840000
[ 0.108207] Call Trace:
[ 0.108207] [<c133b031>] ? power_led_init+0x76/0xe5
[ 0.108207] [<c133afbb>] ? alix_init+0xf6/0xf6
[ 0.108207] [<c1000172>] ? do_one_initcall+0xb2/0x150
[ 0.108207] [<c132da22>] ? kernel_init_freeable+0xd1/0x173
[ 0.108207] [<c132d4aa>] ? do_early_param+0x77/0x77
[ 0.108207] [<c124b058>] ? kernel_init+0x8/0x170
[ 0.108207] [<c1251622>] ? ret_from_kernel_thread+0x6/0x28
[ 0.108207] [<c1251637>] ? ret_from_kernel_thread+0x1b/0x28
[ 0.108207] [<c124b050>] ? rest_init+0x60/0x60
[ 0.108207] Code: d6 ab aa aa aa ff d1 5b 5e 5f c3 8d b4 26 00 00 00 00 8d bc 27 00 00 00 00 57 89 d7 56 53 e8 46 f4 ff ff 85 c0 89 c3 74 5d 8b 30 <f6> 46 4c 01 74 10 ba 8a 07 00 00 b8 5c da 2c c1 e8 b9 bb ec ff
[ 0.108207] EIP: [<c115acc2>] __gpio_set_value+0x12/0x80 SS:ESP 0068:cf841f24
[ 0.108207] CR2: 000000000000004c
[ 0.108207] ---[ end trace 7b3836317c1bee78 ]---
[ 0.108879] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[ 0.108879]

I think you have put this in the kernel source directory. You should place it in drivers/gpio/ in the kernel source and add the corresponding entries to the Kconfig and Makefile in that directory. In Kconfig, you can specify its dependencies. After selecting it with make menuconfig, the driver is compiled as part of the kernel and initialized at boot time. You can make it run even earlier by using early_initcall() instead of module_init(), but only if the GPIO controller driver has already registered its chip by then: the oops above (ESI is 0 and the faulting address is 0x4c inside __gpio_set_value) suggests that no chip was registered yet for the requested GPIO, and the return values of gpio_request() are not checked.
If you are not putting this in the kernel source (i.e. not building the driver as part of the kernel), then you should load it with "insmod my_module.ko" from a shell script placed in init.d, and run that script at your desired runlevel.

Hadoop causes system crashes with "soft lockup" and "hard LOCKUP"

I am running Hadoop 2.2 on Red Hat 6.3-6.5, and all of my machines crashed after a while. /var/log/messages repeatedly shows:
Aug 11 06:30:42 jn4_73_128 kernel: BUG: soft lockup - CPU#1 stuck for 67s! [jsvc:11508]
Aug 11 06:30:42 jn4_73_128 kernel: Modules linked in: bridge stp llc iptable_filter ip_tables mptctl mptbase xfs exportfs power_meter microcode dcdbas serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif wmi mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Aug 11 06:30:42 jn4_73_128 kernel: CPU 1
Aug 11 06:30:42 jn4_73_128 kernel: Modules linked in: bridge stp llc iptable_filter ip_tables mptctl mptbase xfs exportfs power_meter microcode dcdbas serio_raw iTCO_wdt iTCO_vendor_support i7core_edac edac_core sg bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif wmi mpt2sas scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
Aug 11 06:30:42 jn4_73_128 kernel:
Aug 11 06:30:42 jn4_73_128 kernel: Pid: 11508, comm: jsvc Tainted: G W --------------- 2.6.32-279.el6.x86_64 #1 Dell Inc. PowerEdge R510/084YMW
Aug 11 06:30:42 jn4_73_128 kernel: RIP: 0010:[<ffffffff8104d088>] [<ffffffff8104d088>] wait_for_rqlock+0x28/0x40
Aug 11 06:30:42 jn4_73_128 kernel: RSP: 0018:ffff8807786c3ee8 EFLAGS: 00000202
Aug 11 06:30:42 jn4_73_128 kernel: RAX: 00000000f6e9f6e1 RBX: ffff8807786c3ee8 RCX: ffff880028216680
Aug 11 06:30:42 jn4_73_128 kernel: RDX: 00000000fffff6e9 RSI: ffff88061cd29370 RDI: 0000000000000286
Aug 11 06:30:42 jn4_73_128 kernel: RBP: ffffffff8100bc0e R08: 0000000000000001 R09: 0000000000000001
Aug 11 06:30:42 jn4_73_128 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000286
Aug 11 06:30:42 jn4_73_128 kernel: R13: ffff8807786c3eb8 R14: ffffffff810e0f6e R15: ffff8807786c3e48
Aug 11 06:30:42 jn4_73_128 kernel: FS: 0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
Aug 11 06:30:42 jn4_73_128 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 11 06:30:42 jn4_73_128 kernel: CR2: 0000000000e5bd70 CR3: 0000000001a85000 CR4: 00000000000006e0
Aug 11 06:30:42 jn4_73_128 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Aug 11 06:30:42 jn4_73_128 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Aug 11 06:30:42 jn4_73_128 kernel: Process jsvc (pid: 11508, threadinfo ffff8807786c2000, task ffff880c1def3500)
Aug 11 06:30:42 jn4_73_128 kernel: Stack:
Aug 11 06:30:42 jn4_73_128 kernel: ffff8807786c3f68 ffffffff8107091b 0000000000000000 ffff8807786c3f28
Aug 11 06:30:42 jn4_73_128 kernel: <d> ffff880701735260 ffff880c1def39c8 ffff880c1def39c8 0000000000000000
Aug 11 06:30:42 jn4_73_128 kernel: <d> ffff8807786c3f28 ffff8807786c3f28 ffff8807786c3f78 00007f092d0ad700
Aug 11 06:30:42 jn4_73_128 kernel: Call Trace:
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff8107091b>] ? do_exit+0x5ab/0x870
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff81070ce7>] ? sys_exit+0x17/0x20
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
Aug 11 06:30:42 jn4_73_128 kernel: Code: ff ff 90 55 48 89 e5 0f 1f 44 00 00 48 c7 c0 80 66 01 00 65 48 8b 0c 25 b0 e0 00 00 0f ae f0 48 01 c1 eb 09 0f 1f 80 00 00 00 00 <f3> 90 8b 01 89 c2 c1 fa 10 66 39 c2 75 f2 c9 c3 0f 1f 84 00 00
Aug 11 06:30:42 jn4_73_128 kernel: Call Trace:
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff8107091b>] ? do_exit+0x5ab/0x870
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff81070ce7>] ? sys_exit+0x17/0x20
Aug 11 06:30:42 jn4_73_128 kernel: [<ffffffff8100b0f2>] ? system_call_fastpath+0x16/0x1b
and the system finally crashed:
crash /usr/lib/debug/lib/modules/2.6.32-431.5.1.el6.x86_64/vmlinux /opt/crash/127.0.0.1-2014-08-10-09\:47\:38/vmcore
crash 6.1.0-5.el6
Copyright (C) 2002-2012 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.3.1
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
please wait... (determining panic task)
WARNING: active task ffff881071850040 on cpu 12 not found in PID hash
KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.5.1.el6.x86_64/vmlinux
DUMPFILE: /opt/crash/127.0.0.1-2014-08-10-09:47:38/vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Sun Aug 10 09:47:32 2014
UPTIME: 7 days, 16:00:19
LOAD AVERAGE: 11.01, 3.11, 1.08
TASKS: 724
NODENAME: master1.otocyon.com
RELEASE: 2.6.32-431.5.1.el6.x86_64
VERSION: #1 SMP Fri Jan 10 14:46:43 EST 2014
MACHINE: x86_64 (1895 Mhz)
MEMORY: 64 GB
PANIC: "Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 0"
PID: 23976
COMMAND: "sh"
TASK: ffff881071850aa0 [THREAD_INFO: ffff880a05c80000]
CPU: 0
STATE: TASK_INTERRUPTIBLE (PANIC)
crash> bt
PID: 23976 TASK: ffff881071850aa0 CPU: 0 COMMAND: "sh"
#0 [ffff880028207b50] machine_kexec at ffffffff81038f3b
#1 [ffff880028207bb0] crash_kexec at ffffffff810c5d82
#2 [ffff880028207c80] panic at ffffffff8152751a
#3 [ffff880028207d00] watchdog_overflow_callback at ffffffff810e696d
#4 [ffff880028207d20] __perf_event_overflow at ffffffff8111c847
#5 [ffff880028207da0] perf_event_overflow at ffffffff8111ce14
#6 [ffff880028207db0] intel_pmu_handle_irq at ffffffff81022d87
#7 [ffff880028207e90] perf_event_nmi_handler at ffffffff8152bd69
#8 [ffff880028207ea0] notifier_call_chain at ffffffff8152d825
#9 [ffff880028207ee0] atomic_notifier_call_chain at ffffffff8152d88a
#10 [ffff880028207ef0] notify_die at ffffffff810a153e
#11 [ffff880028207f20] do_nmi at ffffffff8152b4eb
#12 [ffff880028207f50] nmi at ffffffff8152adb0
[exception RIP: task_rq_unlock_wait+44]
RIP: ffffffff810534fc RSP: ffff880a05c81dc8 RFLAGS: 00000016
RAX: 000000000ec70ebe RBX: ffff881071850040 RCX: ffff8800282d6840
RDX: 0000000000000ec7 RSI: 0000000000000000 RDI: ffff881071850040
RBP: ffff880a05c81dc8 R8: dead000000200200 R9: dead000000200200
R10: ffff8810734a42d0 R11: 0000000000000246 R12: 00000000000114b8
R13: ffff8810734a4180 R14: ffff881071fd3440 R15: ffff881071fd3c48
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#13 [ffff880a05c81dc8] task_rq_unlock_wait at ffffffff810534fc
#14 [ffff880a05c81dd0] release_task at ffffffff81075454
#15 [ffff880a05c81e10] wait_consider_task at ffffffff81075fb6
#16 [ffff880a05c81e80] do_wait at ffffffff810763e6
#17 [ffff880a05c81ee0] sys_wait4 at ffffffff810765d3
#18 [ffff880a05c81f80] system_call_fastpath at ffffffff8100b072
RIP: 0000003e1a2ac8be RSP: 00007fffa58c6330 RFLAGS: 00010207
RAX: 000000000000003d RBX: ffffffff8100b072 RCX: 0000003e1a232be0
RDX: 0000000000000000 RSI: 00007fffa58c62ec RDI: ffffffffffffffff
RBP: 00000000ffffffff R8: 000000000203b8d0 R9: 000000000203d590
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000005d00
ORIG_RAX: 000000000000003d CS: 0033 SS: 002b
It happened on machines from different vendors, and I have already tried updating to the latest kernel from Red Hat.
Can anyone with the same experience help?

"unable to handle kernel NULL pointer dereference at (null)" after trying to modprobe driver

I have a script that initializes a driver on startup, which worked beautifully before I enabled kernel tracing and recompiled the kernel to try and debug an issue with a piece of software. If I try to initialize the driver in any way (modprobe, insmod, etc) this output prints to the screen:
[ 26.263308] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 26.263322] IP: [<c108664d>] trace_module_notify+0x16b/0x20a
[ 26.263325] *pde = 00000000
[ 26.263329] Oops: 0000 [#1] PREEMPT SMP
[ 26.263335] Modules linked in: phddrv(O+)
[ 26.263343] Pid: 704, comm: insmod Tainted: G O 3.6.3-rt9 #21 Advanced Digital Logic, Inc CB4053/ADLS15PC
[ 26.263346] EIP: 0060:[<c108664d>] EFLAGS: 00010213 CPU: 0
[ 26.263350] EIP is at trace_module_notify+0x16b/0x20a
[ 26.263353] EAX: ee6e9274 EBX: f082550c ECX: ee6e920c EDX: f082550c
[ 26.263356] ESI: 00000000 EDI: ee6e92dc EBP: ee6ebf4c ESP: ee6ebf24
[ 26.263359] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 26.263362] CR0: 8005003b CR2: 00000000 CR3: 2f2ea000 CR4: 000007d0
[ 26.263365] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 26.263367] DR6: ffff0ff0 DR7: 00000400
[ 26.263371] Process insmod (pid: 704, ti=ee6ea000 task=ef218000 task.ti=ee6ea000)
[ 26.263372] Stack:
[ 26.263381] ee6e9274 ee6e9344 ee6e92dc ee6e920c ee6e9274 ee6e9344 c2086424 c15a5d58
[ 26.263388] 00000000 00000001 ee6ebf68 c1046d33 f082550c c15a51bc c15a3778 00000000
[ 26.263396] c15a3790 ee6ebf8c c1046fa9 fffffffd 00000000 f082550c 00000001 f082550c
[ 26.263397] Call Trace:
[ 26.263407] [<c1046d33>] notifier_call_chain+0x2b/0x4d
[ 26.263413] [<c1046fa9>] __blocking_notifier_call_chain+0x3c/0x51
[ 26.263419] [<c1046fcf>] blocking_notifier_call_chain+0x11/0x13
[ 26.263426] [<c10671b7>] sys_init_module+0x57/0x190
[ 26.263434] [<c13a3d10>] sysenter_do_call+0x12/0x26
[ 26.263489] Code: 00 c7 42 04 64 5d 5a c1 89 15 64 5d 5a c1 89 45 ec 8d 42 74 83 c2 0c 89 45 e8 89 55 e4 eb 19 57 8b 4d e4 89 da ff 75 ec ff 75 e8 <8b> 06 83 c6 04 e8 c2 fb ff ff 83 c4 0c 3b 75 f0 72 e2 eb 77 b8
[ 26.263495] EIP: [<c108664d>] trace_module_notify+0x16b/0x20a SS:ESP 0068:ee6ebf24
[ 26.263497] CR2: 0000000000000000
[ 26.267381] ---[ end trace 0000000000000002 ]---
Any hint as to what is going on would be greatly appreciated!

I had a similar issue (almost the same panic stack trace).
The root cause on my side was that after I changed the kernel config (enabling tracepoints), I rebuilt only the kernel bzImage and forgot to rebuild the .ko modules. That can cause a mismatch between the new kernel and the old modules.
After rebuilding and updating both the kernel image and the modules, the issue was gone.

Somewhere in the driver there is a NULL pointer. A pointer variable has the value NULL and the driver is trying to dereference it.
myPtr->value; /* if myPtr is NULL, this will raise the kernel oops */
You have to debug the driver to find where and why the pointer is NULL.
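The fault address printed in the Oops can help narrow this down: dereferencing a member through a NULL pointer faults at the member's offset from address zero. Here is a small sketch using Python's ctypes to compute such an offset. Note that `ExampleDev`, its fields, and `fault_address` are hypothetical names made up for illustration; they are not taken from the phddrv driver.

```python
import ctypes


class ExampleDev(ctypes.Structure):
    # Hypothetical layout for illustration only, not from any real driver.
    _fields_ = [
        ("id", ctypes.c_uint64),     # occupies bytes 0..7
        ("value", ctypes.c_uint32),  # starts at byte 8
    ]


def fault_address(base: int, member: str) -> int:
    """Return the address the CPU would touch when evaluating base->member."""
    return base + getattr(ExampleDev, member).offset


# With myPtr == NULL, myPtr->value touches address 0x8.
print(hex(fault_address(0, "value")))
```

A fault address of exactly zero, as in the trace above, would correspond to a member at offset 0 or to reading through a plain NULL pointer.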

What do these Linux Kernel Oops fields mean?

I have encountered a few Oopses in my life as a developer, and while I am familiar with some of the information I can retrieve from them, there are still pieces I don't understand and therefore can't use to solve problems.
Below you will find an Oops example and I will describe what I can deduce from it. Then, I will ask what the remaining info can teach me about the problem.
[ 716.485951] BUG: unable to handle kernel paging request at fc132158
[ 716.485973] IP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[ 716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000
[ 716.485997] Oops: 0002 [#1] SMP
[ 716.486004] Modules linked in: ubi(O) mtdchar nandsim nand mtd nand_ids nand_bch bch nand_ecc bnep rfcomm bluetooth parport_pc ppdev lp parport nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc binfmt_misc dm_crypt snd_hda_codec_hdmi snd_hda_codec_analog kvm_intel snd_hda_intel snd_hda_codec snd_hwdep kvm snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event hid_generic snd_seq cdc_acm snd_timer snd_seq_device mei tpm_tis snd mac_hid serio_raw soundcore lpc_ich snd_page_alloc microcode coretemp usbhid hid nouveau usb_storage ttm drm_kms_helper drm floppy e1000e i2c_algo_bit mxm_wmi video wmi
[ 716.486128] Pid: 3994, comm: ubimkvol Tainted: G O 3.8.0-rc3+ #3 LENOVO 6239AS8/LENOVO
[ 716.486136] EIP: 0060:[<fc1936e7>] EFLAGS: 00010246 CPU: 0
[ 716.486144] EIP is at ubi_change_vtbl_record+0x87/0x1c0 [ubi]
[ 716.486151] EAX: 000000ac EBX: eb5ea000 ECX: 0000002b EDX: 00000000
[ 716.486157] ESI: eb4d1d74 EDI: fc132158 EBP: eb4d1d40 ESP: eb4d1d20
[ 716.486164] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 716.486170] CR0: 8005003b CR2: fc132158 CR3: 27542000 CR4: 000407f0
[ 716.486176] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 716.486183] DR6: ffff0ff0 DR7: 00000400
[ 716.486188] Process ubimkvol (pid: 3994, ti=eb4d0000 task=ec01d9b0 task.ti=eb4d0000)
[ 716.486195] Stack:
[ 716.486199] e755f000 eb4d1d2c c11cad11 eb4d1d34 eb543c00 eb5ea000 00000000 eb4d1e20
[ 716.486215] eb4d1e30 fc195412 e755f000 fc1adf01 eb5ea26c 00000002 0000009e eb5ea480
[ 716.486232] 00000002 e755f22c e755f2ac e755f000 eb4d1d74 2a000000 01000000 00000000
[ 716.486248] Call Trace:
[ 716.486257] [<c11cad11>] ? sysfs_create_file+0x21/0x30
[ 716.486266] [<fc195412>] ubi_create_volume+0x4b2/0x790 [ubi]
[ 716.486277] [<fc19967a>] ubi_cdev_ioctl+0x5da/0xac0 [ubi]
[ 716.486285] [<c117202a>] ? link_path_walk+0x5a/0x7d0
[ 716.486294] [<fc1990a0>] ? vol_cdev_ioctl+0x440/0x440 [ubi]
[ 716.486842] [<c1177e12>] do_vfs_ioctl+0x82/0x5b0
[ 716.487703] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.488564] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.489422] [<c1171ced>] ? final_putname+0x1d/0x40
[ 716.489891] [<c1171eb4>] ? putname+0x24/0x40
[ 716.489891] [<c1167239>] ? do_sys_open+0x169/0x1d0
[ 716.489891] [<c11783b0>] sys_ioctl+0x70/0x80
[ 716.489891] [<c16205cd>] sysenter_do_call+0x12/0x38
[ 716.489891] Code: ac 00 00 00 03 bb c8 04 00 00 f7 c7 01 00 00 00 0f 85 ee 00 00 00 f7 c7 02 00 00 00 0f 85 ca 00 00 00 89 c1 31 d2 c1 e9 02 a8 02 <f3> a5 74 0b 0f b7 16 66 89 17 ba 02 00 00 00 a8 01 74 07 0f b6
[ 716.489891] EIP: [<fc1936e7>] ubi_change_vtbl_record+0x87/0x1c0 [ubi] SS:ESP 0068:eb4d1d20
[ 716.489891] CR2: 00000000fc132158
[ 716.516453] ---[ end trace 473b15a7780e19ea ]---
It seems that the kernel tried to access an invalid page. Here is what I can tell so far:
The Oops code 0002 tells me that it occurred while trying to read something in user-mode.
The Instruction Pointer is at ubi_change_vtbl_record, which means the offending instruction is located in this function.
I can deduce the path that led to the faulting function from the call trace (an ioctl launched from process ubimkvol).
From there, is the "stack" a dump of the raw stack of the task? I can see that some of the values are also function addresses found in the call trace. Then there are fancy-looking values like EAX, EBX ... DR7. I think they are CPU registers, but I don't know what they really are.
Finally, the following line gets me lost:
[ 716.485986] *pdpt = 00000000019e6001 *pde = 000000002c558067 *pte = 0000000000000000
What are pdpt, pde and pte? I suspect they are information about the page fault, but I could not find anything further after some googling.

Yes, EAX, etc. are 32-bit x86 processor registers. pdpt (page directory pointer table), pde (page directory entry), and pte (page table entry) are all paging structures.
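To make the paging line more concrete, here is a rough sketch of how the faulting address fc132158 maps onto those three structures. It assumes 32-bit PAE paging with 4 KiB pages (a 2/9/9/12 bit split), which I am inferring from the presence of a *pdpt entry in the Oops; the function names are my own.

```python
def split_pae_address(vaddr: int) -> dict:
    """Split a 32-bit virtual address into PAE paging indices (4 KiB pages assumed)."""
    return {
        "pdpt_index": (vaddr >> 30) & 0x3,    # top 2 bits select the PDPT entry
        "pde_index": (vaddr >> 21) & 0x1FF,   # next 9 bits select the page directory entry
        "pte_index": (vaddr >> 12) & 0x1FF,   # next 9 bits select the page table entry
        "page_offset": vaddr & 0xFFF,         # low 12 bits are the offset within the page
    }


def is_present(entry: int) -> bool:
    """Bit 0 of an x86 paging entry is the Present flag."""
    return bool(entry & 0x1)


print(split_pae_address(0xFC132158))
for name, entry in [("pdpt", 0x19E6001), ("pde", 0x2C558067), ("pte", 0x0)]:
    print(name, "present" if is_present(entry) else "not present")
```

With the values from the Oops, the pdpt and pde entries are present but the pte is zero, so no page is mapped at that address.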
IP (also EIP for 32-bit or RIP for 64-bit processors) is the instruction pointer at the time of the Oops.
The stack is a raw dump of the kernel stack in use at the time of the Oops, starting at the stack pointer (ESP, eb4d1d20 here). Note that on x86 the stack grows down, toward lower addresses.
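As for the IP line, the `symbol+0xoffset/0xsize [module]` notation gives the byte offset of the faulting instruction from the start of the function, followed by the function's total size. Below is a minimal sketch of pulling those fields out of an Oops line; the regular expression and the `parse_location` helper are my own, not part of any kernel tool.

```python
import re

# Matches e.g. "ubi_change_vtbl_record+0x87/0x1c0 [ubi]"; the module part is optional.
LOCATION = re.compile(
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_location(line: str):
    """Return symbol, offset, size and module from an Oops line, or None if absent."""
    match = LOCATION.search(line)
    if match is None:
        return None
    return {
        "symbol": match["symbol"],
        "offset": int(match["offset"], 16),
        "size": int(match["size"], 16),
        "module": match["module"],  # None for code built into the kernel image
    }


print(parse_location("EIP is at ubi_change_vtbl_record+0x87/0x1c0 [ubi]"))
```

Once you have the symbol and offset, a debugger with matching debug symbols can map them to a source line, for example with gdb's `list *(function+0xoffset)`.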

Correct me if I am wrong, but
Oops 0002 means no page was found while writing in kernel mode:
bit 0 == 0 means no page found, 1 means a protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode
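The three bits listed above can be decoded mechanically. This is just a small sketch of that table; the function name and the wording of the returned strings are my own.

```python
def decode_oops_error_code(code: int) -> dict:
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "cause": "protection fault" if code & 0x1 else "no page found",  # bit 0
        "access": "write" if code & 0x2 else "read",                     # bit 1
        "mode": "user" if code & 0x4 else "kernel",                      # bit 2
    }


print(decode_oops_error_code(0x0002))
print(decode_oops_error_code(0x0000))
```

For `Oops: 0002` this gives a write in kernel mode to an unmapped page, and for `Oops: 0000` (as in the NULL pointer traces) a read in kernel mode from an unmapped page.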
