I am confused: the Normal zone shows a large inactive file page cache (734812kB) and 800088kB of dirty pages, both of which look reclaimable, so why was the oom-killer invoked?
vm.swappiness is set to 0. According to the Linux documentation, 0 does not disable swapping completely, and on another server with the same OS and settings I have seen swap usage reach 300MB with swappiness=0.
OS info:
CentOS 6.4 kernel: 2.6.32-358.23.2.el6.x86_64
The oom-killer log follows:
Aug 26 14:34:48 withivan.me kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Aug 26 14:34:48 withivan.me kernel: java cpuset=/ mems_allowed=0
Aug 26 14:34:48 withivan.me kernel: Pid: 28505, comm: java Not tainted 2.6.32-358.23.2.el6.x86_64 #1
Aug 26 14:34:48 withivan.me kernel: Call Trace:
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810cb641>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111ce40>] ? dump_header+0x90/0x1b0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8121d4ec>] ? security_real_capable_noaudit+0x3c/0x70
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d2c2>] ? oom_kill_process+0x82/0x2a0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d201>] ? select_bad_process+0xe1/0x120
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d700>] ? out_of_memory+0x220/0x3c0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8112c3dc>] ? __alloc_pages_nodemask+0x8ac/0x8d0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81160c6a>] ? alloc_pages_current+0xaa/0x110
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8148d667>] ? tcp_sendmsg+0x677/0xa20
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81435f33>] ? sock_sendmsg+0x123/0x150
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810aa43e>] ? futex_wake+0x10e/0x120
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810ac3a0>] ? do_futex+0x100/0xb60
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8119cfdf>] ? destroy_inode+0x2f/0x60
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81436249>] ? sys_sendto+0x139/0x190
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8103b8cc>] ? kvm_clock_read+0x1c/0x20
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8103b8d9>] ? kvm_clock_get_cycles+0x9/0x10
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810a1507>] ? getnstimeofday+0x57/0xe0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810a15fa>] ? do_gettimeofday+0x1a/0x50
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Aug 26 14:34:48 withivan.me kernel: Mem-Info:
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32 per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 186, btch: 31 usd: 32
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 186, btch: 31 usd: 1
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 186, btch: 31 usd: 4
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 186, btch: 31 usd: 38
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 186, btch: 31 usd: 57
Aug 26 14:34:48 withivan.me kernel: active_anon:1697553 inactive_anon:373583 isolated_anon:0
Aug 26 14:34:48 withivan.me kernel: active_file:174263 inactive_file:199171 isolated_file:0
Aug 26 14:34:48 withivan.me kernel: unevictable:0 dirty:216860 writeback:1 unstable:0
Aug 26 14:34:48 withivan.me kernel: free:35470 slab_reclaimable:14993 slab_unreclaimable:6945
Aug 26 14:34:48 withivan.me kernel: mapped:3423 shmem:45 pagetables:5263 bounce:0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA free:15740kB min:148kB low:184kB high:220kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15344kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 3512 10077 10077
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32 free:61200kB min:34800kB low:43500kB high:52200kB active_anon:2479864kB inactive_anon:621800kB active_file:39752kB inactive_file:61872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596500kB mlocked:0kB dirty:67352kB writeback:0kB mapped:20kB shmem:0kB slab_reclaimable:21672kB slab_unreclaimable:1952kB kernel_stack:3792kB pagetables:4344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:94464 all_unreclaimable? no
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 0 6565 6565
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal free:64940kB min:65048kB low:81308kB high:97572kB active_anon:4310348kB inactive_anon:872532kB active_file:657300kB inactive_file:734812kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6722560kB mlocked:0kB dirty:800088kB writeback:4kB mapped:13672kB shmem:180kB slab_reclaimable:38300kB slab_unreclaimable:25828kB kernel_stack:4568kB pagetables:16708kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:605088 all_unreclaimable? no
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 0 0 0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA: 3*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15740kB
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32: 1247*4kB 1189*8kB 969*16kB 379*32kB 92*64kB 12*128kB 1*256kB 8*512kB 5*1024kB 1*2048kB 0*4096kB = 61076kB
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal: 2047*4kB 1672*8kB 781*16kB 309*32kB 46*64kB 3*128kB 1*256kB 20*512kB 7*1024kB 0*2048kB 0*4096kB = 64940kB
Aug 26 14:34:48 withivan.me kernel: 379124 total pagecache pages
Aug 26 14:34:48 withivan.me kernel: 4685 pages in swap cache
Aug 26 14:34:48 withivan.me kernel: Swap cache stats: add 167082, delete 162397, find 114795/130707
Aug 26 14:34:48 withivan.me kernel: Free swap = 4166416kB
Aug 26 14:34:48 withivan.me kernel: Total swap = 4194296kB
Aug 26 14:34:48 withivan.me kernel: 2621439 pages RAM
Aug 26 14:34:48 withivan.me kernel: 89408 pages reserved
Aug 26 14:34:48 withivan.me kernel: 384993 pages shared
Aug 26 14:34:48 withivan.me kernel: 2116876 pages non-shared
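One way to read the per-zone lines in the log above is to compare each zone's `free` value with its `min` watermark. The sketch below does only that, on two lines abbreviated from the log; the function names are hypothetical, and the 4kB page size is an assumption for x86_64. For this log it flags the Normal zone (free 64940kB against min 65048kB), which is at least consistent with the allocator reaching the OOM path, although it does not by itself explain why the file pages were not reclaimed first.

```python
import re

# Two zone lines abbreviated from the log above (only the fields used here).
SAMPLE = """\
Node 0 DMA32 free:61200kB min:34800kB low:43500kB high:52200kB inactive_file:61872kB dirty:67352kB
Node 0 Normal free:64940kB min:65048kB low:81308kB high:97572kB inactive_file:734812kB dirty:800088kB
"""

# Matches the start of a per-zone summary line, e.g. "Node 0 Normal free:...kB min:...kB".
ZONE_RE = re.compile(r"Node (\d+) (\w+) free:(\d+)kB min:(\d+)kB")


def zones_below_min(log_text):
    """Return (node, zone, free_kb, min_kb) for zones whose free memory is below 'min'."""
    hits = []
    for m in ZONE_RE.finditer(log_text):
        node, zone, free_kb, min_kb = int(m[1]), m[2], int(m[3]), int(m[4])
        if free_kb < min_kb:
            hits.append((node, zone, free_kb, min_kb))
    return hits


def pages_to_kb(pages, page_kb=4):
    """Convert a page count from the global counters to kB (assumes 4kB pages)."""
    return pages * page_kb


if __name__ == "__main__":
    for node, zone, free_kb, min_kb in zones_below_min(SAMPLE):
        print(f"Node {node} {zone}: free {free_kb}kB < min {min_kb}kB")
    # Global counters from the log: inactive_file:199171 and dirty:216860 pages.
    print("inactive_file:", pages_to_kb(199171), "kB")
    print("dirty:", pages_to_kb(216860), "kB")
```

The page-count conversion is a sanity check: the global `inactive_file` and `dirty` counters, converted to kB, equal the sums of the DMA32 and Normal zone values in the log.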
Related
I am using cgroups with libvirt to limit the memory that a group of qemu-kvm guests can use on a custom Linux kernel 4.7.8. After a couple of tests I started seeing kernel panics after the oom-killer is invoked when that libvirt cgroup runs out of memory. This happens even when I set the cgroup memory limit well below total RAM and the system is idle apart from running the VMs (so plenty of memory is left for tasks outside the cgroup). For the record, my system has 32GB and I was using 20GB for the guests' cgroup.
Here is part of the crash log (it is very long, but I can link to the full log later):
May 11 15:53:57 kernel: [ 6380.727959] qemu-system-x86 invoked oom-killer: gfp_mask=0x24000c0(GFP_KERNEL), order=0, oom_score_adj=0
May 11 15:53:57 kernel: [ 6380.727966] CPU: 1 PID: 15433 Comm: qemu-system-x86 Tainted: G O 4.7.8 #25
May 11 15:53:57 kernel: [ 6380.727968] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
May 11 15:53:57 kernel: [ 6380.727971] ffff880802cbb700 ffff8803259d7a08 ffffffff81336719 ffff88081948fc00
May 11 15:53:57 kernel: [ 6380.727975] 0000000000000296 ffff8801f1638dc0 ffffffff811b50f6 ffff88083dff9800
May 11 15:53:57 kernel: [ 6380.727979] ffff8803259d7cf8 ffff8803259d7b38 ffffffff8115fe5f 024200ca00000003
May 11 15:53:57 kernel: [ 6380.727982] Call Trace:
May 11 15:53:57 kernel: [ 6380.727989] [<ffffffff81336719>] dump_stack+0x65/0x8c
May 11 15:53:57 kernel: [ 6380.727994] [<ffffffff811b50f6>] ? mem_cgroup_select_victim_node+0x17d/0x1ac
May 11 15:53:57 kernel: [ 6380.727999] [<ffffffff8115fe5f>] dump_header+0x5e/0x286
May 11 15:53:57 kernel: [ 6380.728003] [<ffffffff81171d2c>] ? try_to_free_mem_cgroup_pages+0x10d/0x16a
May 11 15:53:57 kernel: [ 6380.728007] [<ffffffff811608ce>] oom_kill_process+0xc2/0x487
May 11 15:53:57 kernel: [ 6380.728010] [<ffffffff811b53f5>] ? task_in_mem_cgroup+0xc9/0xd6
May 11 15:53:57 kernel: [ 6380.728014] [<ffffffff812e22c2>] ? selinux_capable+0x1f/0x21
May 11 15:53:57 kernel: [ 6380.728018] [<ffffffff812d95ac>] ? security_capable_noaudit+0x2b/0x46
May 11 15:53:57 kernel: [ 6380.728023] [<ffffffff8111469e>] ? css_next_descendant_pre+0x32/0x53
May 11 15:53:57 kernel: [ 6380.728025] [<ffffffff811afd1c>] ? css_put+0x18/0x1a
May 11 15:53:57 kernel: [ 6380.728028] [<ffffffff811b235b>] ? mem_cgroup_iter+0x250/0x265
May 11 15:53:57 kernel: [ 6380.728031] [<ffffffff811607c1>] ? oom_badness+0x10f/0x15a
May 11 15:53:57 kernel: [ 6380.728036] [<ffffffff811af46d>] ? get_mem_cgroup_from_mm+0x52/0x71
May 11 15:53:57 kernel: [ 6380.728039] [<ffffffff811b4039>] mem_cgroup_out_of_memory+0x2c7/0x311
May 11 15:53:57 kernel: [ 6380.728044] [<ffffffff810da147>] ? finish_wait+0x65/0x70
May 11 15:53:57 kernel: [ 6380.728047] [<ffffffff811b43bb>] mem_cgroup_oom_synchronize+0x1ed/0x27b
May 11 15:53:57 kernel: [ 6380.728051] [<ffffffff811b14d2>] ? mem_cgroup_count_precharge_pte_range+0xe8/0xe8
May 11 15:53:57 kernel: [ 6380.728054] [<ffffffff81160ffd>] pagefault_out_of_memory+0x1f/0x76
May 11 15:53:57 kernel: [ 6380.728058] [<ffffffff8118e399>] ? vma_adjust+0x4b5/0x58b
May 11 15:53:57 kernel: [ 6380.728061] [<ffffffff810999bf>] mm_fault_error+0x66/0x103
May 11 15:53:57 kernel: [ 6380.728064] [<ffffffff81099e4c>] __do_page_fault+0x3f0/0x4d8
May 11 15:53:57 kernel: [ 6380.728068] [<ffffffff81191a7d>] ? SyS_mremap+0x46c/0x4cf
May 11 15:53:57 kernel: [ 6380.728071] [<ffffffff8109a043>] do_page_fault+0x26/0x2f
May 11 15:53:57 kernel: [ 6380.728074] [<ffffffff81a3dca8>] page_fault+0x28/0x30
May 11 15:53:57 kernel: [ 6380.728077] Task in /machine/ubc3.libvirt-qemu killed as a result of limit of /machine
May 11 15:53:57 kernel: [ 6380.728082] memory: usage 31457280kB, limit 31457280kB, failcnt 0
May 11 15:53:57 kernel: [ 6380.728084] memory+swap: usage 31457280kB, limit 31457280kB, failcnt 129072
May 11 15:53:57 kernel: [ 6380.728086] kmem: usage 2200kB, limit 9007199254740988kB, failcnt 0
May 11 15:53:57 kernel: [ 6380.728088] Memory cgroup stats for /machine: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728102] Memory cgroup stats for /machine/ubc4.libvirt-qemu: cache:12KB rss:0KB rss_huge:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:8KB active_file:4KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728114] Memory cgroup stats for /machine/ubc1.libvirt-qemu: cache:32KB rss:6312484KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:6312488KB inactive_file:12KB active_file:0KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728126] Memory cgroup stats for /machine/ubc2.libvirt-qemu: cache:60KB rss:8408808KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8408804KB inactive_file:20KB active_file:20KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728137] Memory cgroup stats for /machine/ubc3.libvirt-qemu: cache:60KB rss:8410240KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8410244KB inactive_file:24KB active_file:16KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728149] Memory cgroup stats for /machine/ubc4.libvirt-qemu: cache:88KB rss:8323296KB rss_huge:0KB mapped_file:20KB dirty:0KB writeback:0KB swap:0KB inactive_anon:16KB active_anon:8323260KB inactive_file:32KB active_file:36KB unevictable:0KB
May 11 15:53:57 kernel: [ 6380.728161] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
May 11 15:53:57 kernel: [ 6380.728189] [ 5493] 0 5493 1626653 1580390 3170 9 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728193] [15427] 0 15427 2151137 2104431 4195 11 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728197] [17683] 0 17683 2151136 2104812 4195 11 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728202] [18273] 0 18273 2152955 2083131 4156 12 0 0 qemu-system-x86
May 11 15:53:57 kernel: [ 6380.728205] Memory cgroup out of memory: Kill process 17683 (qemu-system-x86) score 260 or sacrifice child
May 11 15:53:57 kernel: [ 6380.728217] Killed process 17683 (qemu-system-x86) total-vm:8604544kB, anon-rss:8409956kB, file-rss:9272kB, shmem-rss:20kB
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379203] CPU: 4 PID: 121 Comm: kworker/4:1 Tainted: G O 4.7.8 #25
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379206] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379222] 0000000000000296 000000000000039a 0000000000000000 ffffffff81f4b6bb
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379227] 0000000000000000 ffff8808038f7748 ffffffff810a938a 000000091d19fce0
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379232] Call Trace:
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379244] [<ffffffff810a938a>] __warn+0xdc/0xf7
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379239] [<ffffffff81336719>] dump_stack+0x65/0x8c
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379254] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379272] [<ffffffff810353e7>] drop_spte+0x15/0xa4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379267] [<ffffffff810d3d7c>] ? update_group_capacity+0x25/0x1d0
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379282] [<ffffffff81071dbe>] ? sched_clock+0x9/0xd
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379287] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379293] [<ffffffff81069828>] ? __switch_to+0x458/0x4ea
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379298] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379304] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379316] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379321] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379327] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0xb
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379346] [<ffffffff81349bab>] ? percpu_ref_kill_and_confirm+0x60/0x65
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379337] [<ffffffff811a600f>] __mmu_notifier_release+0x4d/0xe3
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379341] [<ffffffff811ab39c>] ? kfree+0x167/0x178
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379357] [<ffffffff811f4e9b>] ? exit_aio+0xc6/0xd5
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379367] [<ffffffff810a6f30>] __mmput+0x19/0xbc
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379371] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379376] [<ffffffff810a6fe3>] mmput_async_fn+0x10/0x12
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379387] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379407] [<ffffffff810cf9d4>] ? default_wake_function+0xd/0xf
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379392] [<ffffffff810c0ace>] worker_thread+0x36d/0x43c
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379397] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379413] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379417] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379422] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379426] [<ffffffff810c49e8>] kthread+0xc8/0xd2
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379436] [<ffffffff81a3bf3f>] ret_from_fork+0x1f/0x40
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379431] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379441] [<ffffffff810c4920>] ? kthread_freezable_should_stop+0x61/0x61
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379457] Modules linked in: bridge stp llc ipip ip_gre vfio_iommu_type1 vfio_pci vfio vfio_virqfd qcserial qmi_wwan usbnet cdc_wdm clear_stats(O) fusion(O) gpio_pca953x i2c_i801 i2c_acpi_sbus(O) gpio_exar e1000e
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379481] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379489] ffffffff81f4b6bb ffff8808038f7708 ffffffff81336719 ffff880800294f00
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379486] Workqueue: events mmput_async_fn
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379494] 0000000000000296 dead000000000100 0000000000000000 ffffffff81f4b6bb
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379499] 0000000000000000 ffff8808038f7748 ffffffff810a938a 0000000900000100
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379504] Call Trace:
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379523] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379259] [<ffffffff810d258f>] ? check_preempt_wakeup+0x115/0x1b4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379263] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379527] [<ffffffff810d258f>] ? check_preempt_wakeup+0x115/0x1b4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379351] [<ffffffff8118cfb3>] exit_mmap+0x22/0x102
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379331] [<ffffffff8101a7f5>] kvm_mmu_notifier_release+0x2e/0x41
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379536] [<ffffffff810d3d7c>] ? update_group_capacity+0x25/0x1d0
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379541] [<ffffffff810353e7>] drop_spte+0x15/0xa4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379550] [<ffffffff81071dbe>] ? sched_clock+0x9/0xd
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379555] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379249] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379532] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379565] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379382] [<ffffffff810c0620>] process_one_work+0x212/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379570] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379580] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379585] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379590] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0xb
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379362] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379595] [<ffffffff8101a7f5>] kvm_mmu_notifier_release+0x2e/0x41
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379518] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379609] [<ffffffff81349bab>] ? percpu_ref_kill_and_confirm+0x60/0x65
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379604] [<ffffffff811ab39c>] ? kfree+0x167/0x178
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379619] [<ffffffff811f4e9b>] ? exit_aio+0xc6/0xd5
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379614] [<ffffffff8118cfb3>] exit_mmap+0x22/0x102
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379624] [<ffffffff810d04da>] ? sched_clock_cpu+0x21/0xb4
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379637] [<ffffffff810a6fe3>] mmput_async_fn+0x10/0x12
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379632] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379642] [<ffffffff810c0620>] process_one_work+0x212/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379652] [<ffffffff810c0ace>] worker_thread+0x36d/0x43c
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379647] [<ffffffff81a396c6>] ? schedule+0x98/0xa6
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379277] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379600] [<ffffffff811a600f>] __mmu_notifier_release+0x4d/0xe3
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379667] [<ffffffff810cf9d4>] ? default_wake_function+0xd/0xf
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379672] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379661] [<ffffffff810cf9ab>] ? try_to_wake_up+0x240/0x25c
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379681] [<ffffffff810c0761>] ? process_one_work+0x353/0x353
Message from syslogd# at Fri May 11 15:53:58 2018 ...
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379574] [<ffffffff814585e7>] ? extract_buf+0xf7/0x106
kernel: [ 6381.379696] [<ffffffff81a3bf3f>] ret_from_fork+0x1f/0x40
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379686] [<ffffffff810c49e8>] kthread+0xc8/0xd2
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379715] Modules linked in: bridge stp llc ipip ip_gre vfio_iommu_type1 vfio_pci vfio vfio_virqfd qcserial qmi_wwan usbnet cdc_wdm clear_stats(O) fusion(O) gpio_pca953x i2c_i801 i2c_acpi_sbus(O) gpio_exar e1000e
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379508] [<ffffffff81336719>] dump_stack+0x65/0x8c
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379701] [<ffffffff810c4920>] ? kthread_freezable_should_stop+0x61/0x61
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379545] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379736] CPU: 4 PID: 121 Comm: kworker/4:1 Tainted: G W O 4.7.8 #25
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379656] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379739] Hardware name: ADLINK TECHNOLOGY Inc. Express-SL/, BIOS 1.22.10.KA08 05/03/2017
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379743] Workqueue: events mmput_async_fn
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379760] Call Trace:
Message from syslogd# at Fri May 11 15:53:58 2018 ...
kernel: [ 6381.379750] 0000000000000296 dead000000000100 0000000000000000 ffffffff81f4b6bb
.
.
.
May 11 15:54:13 kernel: [ 6396.890938] Workqueue: events mmput_async_fn
May 11 15:54:13 kernel: [ 6396.890940] ffffffff81f4b6bb ffff8808038f7708 ffffffff81336719 0000000000294f00
May 11 15:54:13 kernel: [ 6396.890945] 0000000000000296 ffffea0002d58220 0000000000000000 ffffffff81f4b6bb
May 11 15:54:13 kernel: [ 6396.890950] 0000000000000000 ffff8808038f7748 ffffffff810a938a 0000000902d58220
May 11 15:54:13 kernel: [ 6396.890954] Call Trace:
May 11 15:54:13 kernel: [ 6396.890958] [<ffffffff81336719>] dump_stack+0x65/0x8c
May 11 15:54:13 kernel: [ 6396.890963] [<ffffffff810a938a>] __warn+0xdc/0xf7
May 11 15:54:13 kernel: [ 6396.890968] [<ffffffff810a93bd>] warn_slowpath_null+0x18/0x1a
May 11 15:54:13 kernel: [ 6396.890972] [<ffffffff81034a38>] mmu_spte_clear_track_bits+0xe6/0x147
May 11 15:54:13 kernel: [ 6396.890977] [<ffffffff81032058>] ? gfn_to_rmap+0x27/0x5a
May 11 15:54:13 kernel: [ 6396.890981] [<ffffffff810353e7>] drop_spte+0x15/0xa4
May 11 15:54:13 kernel: [ 6396.890986] [<ffffffff81035513>] mmu_page_zap_pte+0x48/0xc1
May 11 15:54:13 kernel: [ 6396.890990] [<ffffffff8112b110>] ? kprobe_flush_task+0x8d/0xe8
May 11 15:54:13 kernel: [ 6396.890995] [<ffffffff81035703>] kvm_mmu_prepare_zap_page+0x177/0x2ef
May 11 15:54:13 kernel: [ 6396.891000] [<ffffffff810cee3a>] ? finish_task_switch+0x19f/0x1d5
May 11 15:54:13 kernel: [ 6396.891005] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
May 11 15:54:13 kernel: [ 6396.891009] [<ffffffff81a39517>] ? __schedule+0x56f/0x594
May 11 15:54:13 kernel: [ 6396.891014] [<ffffffff81a39562>] ? preempt_schedule_common+0x26/0x31
May 11 15:54:13 kernel: [ 6396.891020] [<ffffffff810363a7>] kvm_mmu_invalidate_zap_all_pages+0xcc/0x104
May 11 15:54:13 kernel: [ 6396.891024] [<ffffffff813498c2>] ? percpu_ref_put+0x2e/0x2e
May 11 15:54:13 kernel: [ 6396.891029] [<ffffffff8102800e>] kvm_arch_flush_shadow_all+0x9/0x
I decided to disable the oom-killer on the cgroup and exhaust the cgroup's memory to see what happens. I expected the guests to hang until I manually killed one of them (as described in the cgroup documentation). Surprisingly, one of the guests (seemingly at random) gets killed every time I repeat the test. I am quite confused: if the oom-killer is disabled, what is killing the processes?
Here are the messages I get from the kernel in that case, and also when the oom-killer is enabled but the system does not crash:
kernel: [ 1143.934857] cache: task_struct(10:ubc2.libvirt-qemu), object size: 3520, buffer size: 3520, default order: 3, min order: 0
kernel: [ 1143.934860] node 0: slabs: 3, objs: 27, free: 0
kernel: [ 1143.944535] SLUB: Unable to allocate memory on node -1, gfp=0x24000c0(GFP_KERNEL)
kernel: [ 1143.944541] cache: cred_jar(10:ubc2.libvirt-qemu), object size: 168, buffer size: 192, default order: 0, min order: 0
kernel: [ 1143.944545] node 0: slabs: 2, objs: 42, free: 0
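When testing with the oom-killer disabled, it can help to confirm what the kernel actually reports for the cgroup. In cgroup v1 the `memory.oom_control` file lists `oom_kill_disable` and `under_oom` as key/value lines. The sketch below parses that format; the function names are hypothetical, and the mount point `/sys/fs/cgroup/memory/machine` is an assumption about this system rather than something shown in the question.

```python
from pathlib import Path


def parse_oom_control(text):
    """Parse 'key value' lines from a cgroup v1 memory.oom_control file into a dict."""
    result = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "<name> <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            result[parts[0]] = int(parts[1])
    return result


def read_oom_control(cgroup_dir):
    """Read and parse memory.oom_control under cgroup_dir (path is an assumption)."""
    return parse_oom_control((Path(cgroup_dir) / "memory.oom_control").read_text())


if __name__ == "__main__":
    # Sample content in the documented cgroup v1 format, not taken from this system.
    sample = "oom_kill_disable 1\nunder_oom 0\n"
    print(parse_oom_control(sample))
    # On the real host one might call, for example:
    # read_oom_control("/sys/fs/cgroup/memory/machine")
```

Reading the file for `/machine` and for each guest cgroup below it would show whether the setting is in effect at every level where the limit applies.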
From my observations, something seems to kill the processes before the oom-killer is invoked (when it is enabled), in which case the system recovers fine. When the oom-killer does get invoked, however, the system crashes and the machine needs a reboot.
So my questions are:
What can be causing the oom-killer to crash the machine?
What is killing the guests when the oom-killer is disabled?
It would be great if anyone has any clues on this matter!
Thanks!
Note: I am using kernel v4.7.8 built with buildroot and compiled with uClibc on an x86 platform. There is no swap on this system.
I was trying to run the DPDK KNI application with DPDK version 16.07.2.
For that I first unbound the ports from ixgbe and bound them to the igb_uio module with the following commands:
echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
I compiled the KNI application for the target machine, which runs Linux version 4.4.20 (sushila#dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017,
and when I ran the application it hung with the following message:
Feb 28 10:09:37 (none) user.alert kernel: [ 87.029554] BUG: unable to handle kernel paging request at 0000077e1d012900
Feb 28 10:09:37 (none) user.alert kernel: [ 87.029695] IP: [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029801] PGD 0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029889] Oops: 0000 [#1] SMP
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030010] Modules linked in: rte_kni(O) igb_uio(O)
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030167] CPU: 7 PID: 709 Comm: kni_single Tainted: G IO 4.4.20 #1
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030242] Hardware name: /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030320] task: ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030395] RIP: 0010:[<ffffffffa0033722>] [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030517] RSP: 0018:ffff8805a7ae3d30 EFLAGS: 00010286
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030576] RAX: 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030639] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030701] RBP: ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030766] R10: 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030829] R13: ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030893] FS: 0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030971] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031031] CR2: 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031094] Stack:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031148] ffff88062fcf5940 ffff8805a8ad8560 0000000000000000 ffff88060000054e
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031367] 0000077e1d012900 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031587] 00000000b8c10c40 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031811] Call Trace:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031871] [<ffffffffa00343af>] kni_net_rx+0xf/0x20 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031937] [<ffffffffa0032f05>] kni_thread_single+0x45/0xb0 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032004] [<ffffffffa0032ec0>] ? kni_init_net+0x50/0x50 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032067] [<ffffffff8107b7cb>] kthread+0xdb/0x100
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032125] [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032186] [<ffffffff81834c2f>] ret_from_fork+0x3f/0x70
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032246] [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032306] Code: 48 89 85 d0 fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6 01 e8 e4 e8 11 e1 e9 5e fe ff ff
Feb 28 10:09:37 (none) user.alert kernel: [ 87.034742] RIP [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034844] RSP <ffff8805a7ae3d30>
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034900] CR2: 0000077e1d012900
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034956] ---[ end trace 5b31765eb0372d51 ]---
In the trace I saw that it was failing somewhere in the kni_net_rx_normal() function in kni_net.c.
I narrowed the failure down to line 169, where the memcpy happens.
Next I printed some addresses in that function, which gave:
kva data addresses: data_kva 0000077e1d012900 kva->buff_add 00007f7e1d012880 kva->data_off 128 kni->mbuf_va (null) and kni->mbuf_kva ffff880000000000
Next I tried to print the data at the data_kva address, and that failed too, so it looks like the failure occurs when I access data_kva at 0000077e1d012900. I guess the address is wrong, but I don't know why. Can you give me some ideas on this, or some things to try in order to debug the problem?
I want to send an sk_buff with dev_queue_xmit, but after I send just 2 packets the network card may hang.
I want to know how to debug this condition.
The output in /var/log/messages is:
[root#10g-host2 test]# tail -f /var/log/messages
Sep 29 10:38:22 10g-host2 acpid: waiting for events: event logging is off
Sep 29 10:38:23 10g-host2 acpid: client connected from 2018[68:68]
Sep 29 10:38:23 10g-host2 acpid: 1 client rule loaded
Sep 29 10:38:24 10g-host2 automount[2210]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Sep 29 10:38:24 10g-host2 mcelog: failed to prefill DIMM database from DMI data
Sep 29 10:38:24 10g-host2 xinetd[2246]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Sep 29 10:38:24 10g-host2 xinetd[2246]: Started working: 0 available services
Sep 29 10:38:25 10g-host2 abrtd: Init complete, entering main loop
Sep 29 10:39:41 10g-host2 kernel: vmalloc mmap_buf=ffffc90016e29000 mmap_size=4096
Sep 29 10:39:41 10g-host2 kernel: insmod module wsmmap successfully!
Sep 29 10:39:49 10g-host2 kernel: mmap_buf + 1024 is ffffc90016e29400
Sep 29 10:39:49 10g-host2 kernel: data ffffc90016e2942a, len is 42
Sep 29 10:39:49 10g-host2 kernel: udp data ffffc90016e29422
Sep 29 10:39:49 10g-host2 kernel: ip data ffffc90016e2940e
Sep 29 10:39:49 10g-host2 kernel: eth data ffffc90016e29400
Sep 29 10:39:49 10g-host2 kernel: h_source is ffffc90016e29406, dev_addr is ffff880c235c4750, len is 6result is 0
Sep 29 10:39:50 10g-host2 kernel: mmap_buf + 1024 is ffffc90016e29400
Sep 29 10:39:50 10g-host2 kernel: data ffffc90016e2942a, len is 42
Sep 29 10:39:50 10g-host2 kernel: udp data ffffc90016e29422
Sep 29 10:39:50 10g-host2 kernel: ip data ffffc90016e2940e
Sep 29 10:39:50 10g-host2 kernel: eth data ffffc90016e29400
Sep 29 10:39:50 10g-host2 kernel: h_source is ffffc90016e29406, dev_addr is ffff880c235c4750, len is 6result is 0
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 29 10:39:52 10g-host2 kernel: Tx Queue <11>
Sep 29 10:39:52 10g-host2 kernel: TDH, TDT <0>, <5>
Sep 29 10:39:52 10g-host2 kernel: next_to_use <5>
Sep 29 10:39:52 10g-host2 kernel: next_to_clean <0>
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: tx_buffer_info[next_to_clean]
Sep 29 10:39:52 10g-host2 kernel: time_stamp <fffd3dd8>
Sep 29 10:39:52 10g-host2 kernel: jiffies <fffd497f>
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: tx hang 1 detected on queue 11, resetting adapter
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Reset adapter
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: master disable timed out
Sep 29 10:39:53 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: detected SFP+: 5
Sep 29 10:39:54 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Some information about my computer:
ethtool -i eth2
driver: ixgbe
version: 3.21.2
firmware-version: 0x1bab0001
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.5 (Final)
Release: 6.5
Codename: Final
kernel version is: 2.6.32-431.el6.x86_64
Thank you for your help.
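I was using vmalloc() to allocate the memory for skb->data, and that is what brought the NIC down (vmalloc memory is only virtually contiguous, so it is not suitable for the DMA mapping the driver performs on skb->data). I fixed it by using kmalloc() instead.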
I am running Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-32-generic x86_64) using Vagrant [vagrant:amd64 1:1.6.3] (VirtualBox). My host system is the same OS version.
After several days of flawless (or so it seems) operation, the vagrant box will stop responding... or more specifically:
My Supervisor managed services no longer respond (webserver etc...).
I can vagrant ssh into the box and navigate around most directories
Anything interacting with the shared /vagrant directory will not respond (including sudo supervisorctl).
Running vagrant halt from the host machine will fail to halt peacefully and will eventually forcefully be halted.
Re-upping the box afterwards will then give several more happy days.
The only thing that I can see (that might be of relevance) in /var/log/syslog is the following:
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.711678] BUG: unable to handle kernel paging request at 0000006c0000003f
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714065] IP: [<ffffffffa00a10f6>] vbglPhysHeapExcludeBlock+0x16/0x60 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] PGD 3c7c7067 PUD 0
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Oops: 0002 [#1] SMP
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Modules linked in: rpcsec_gss_krb5 nfsv4 vboxsf(OF) nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache dm_crypt ppdev serio_raw parport_pc vboxguest(OF) parport ahci psmouse libahci e1000
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] CPU: 0 PID: 1632 Comm: vminfo Tainted: GF O 3.13.0-32-generic #57-Ubuntu
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] task: ffff88003c202fe0 ti: ffff88003cd20000 task.ti: ffff88003cd20000
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RIP: 0010:[<ffffffffa00a10f6>] [<ffffffffa00a10f6>] vbglPhysHeapExcludeBlock+0x16/0x60 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RSP: 0018:ffff88003cd21d78 EFLAGS: 00010206
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RAX: 0000006c00000027 RBX: ffff88003ce5014c RCX: ffff88003ce60000
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RDX: ffff88003ce50124 RSI: ffff88003ce50124 RDI: ffff88003ce5016c
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RBP: ffff88003cd21d78 R08: 0000000000000292 R09: ffff88003ce5014c
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] R10: ffff88003c6fcc10 R11: 0000000000000246 R12: ffff88003ce50124
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] R13: ffff88003ce5014c R14: 0000000000000020 R15: 0000000000000000
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] FS: 00007fc8b5475700(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] CR2: 0000006c0000003f CR3: 000000003ccf0000 CR4: 00000000000006f0
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Stack:
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] ffff88003cd21d98 ffffffffa00a1609 ffff88003ce5014c ffff88003cd21e70
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] ffff88003cd21db0 ffffffffa009faae ffff88003cd21e78 ffff88003cd21e38
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] ffffffffa009de5e ffff880000000000 0000000000000000 0000000000000050
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Call Trace:
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffffa00a1609>] VbglPhysHeapFree+0xc9/0xe0 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffffa009faae>] VbglGRFree+0x1e/0x30 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffffa009de5e>] VBoxGuestCommonIOCtl+0x54e/0x1b90 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff811a103b>] ? kfree+0xab/0x140
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffffa009b7ce>] vboxguestLinuxIOCtl+0x9e/0x200 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff8101b7e9>] ? sched_clock+0x9/0x10
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff8109d1ad>] ? sched_clock_local+0x1d/0x80
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff811cfd10>] do_vfs_ioctl+0x2e0/0x4c0
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff8109ddf4>] ? vtime_account_user+0x54/0x60
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff811cff71>] SyS_ioctl+0x81/0xa0
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] [<ffffffff8172c87f>] tracesys+0xe1/0xe6
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] Code: 05 00 00 c7 05 c8 05 02 00 00 00 00 00 5d c3 66 0f 1f 44 00 00 66 66 66 66 90 48 8b 47 10 55 48 89 e5 48 85 c0 74 0c 48 8b 57 18 <48> 89 50 18 488 8b 47 10 48 8b 57 18 48 85 d2 74 19 48 89 42 10
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RIP [<ffffffffa00a10f6>] vbglPhysHeapExcludeBlock+0x16/0x60 [vboxguest]
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] RSP <ffff88003cd21d78>
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.714663] CR2: 0000006c0000003f
Aug 20 22:45:31 vagrant-ubuntu-trusty-64 kernel: [226490.828096] ---[ end trace 1eefe230ded2b9f8 ]---
Happy to supply any more information that people request that I've missed out.
If you look at http://www.vagrantbox.es/ you will notice that VirtualBox Guest Additions isn't included in the VM, which is the cause of some of the problems you are having.
You might be able to install VirtualBox Guest Additions using Puppet or whatever provisioning option you choose. I imagine you will have to set up the /vagrant shared folder manually.
This is for a class, but we are stumped. We are writing a USB driver for a Logitech camera that uses USBCore. When we load the module and then connect the USB camera, the kernel crashes and gives us a kernel trace (below). After a bit of debugging, we are pretty sure it crashes in usb_register_dev within the probe function, but we can't figure out why. We were hoping someone might have helpful suggestions or could point us down the right path. We're not asking for answers, just guidance.
We have looked at all of our variable initializers and, based on our notes and the skull examples, they look all right. Below are code snippets of the important functions and the call trace.
Kernel (Custom school, but based on 3.2.34):
Linux ETSELE 3.2.34etsele #1 SMP PREEMPT Tue Jan 22 18:22:05 EST 2013 i686 i686 i386 GNU/Linux
Init:
static int __init
usb_cam_init(void) {
    int result = 0;
    if ((result = usb_register(&cam_driver)))
        printk("usb_register failed. Error number %d", result);
    return result;
}
Probe:
static int
usb_cam_probe(struct usb_interface * intf, const struct usb_device_id * devid) {
    int retval = 0;
    struct usb_host_interface *interface;
    struct usb_endpoint_descriptor *endpoint;
    struct usb_device *dev = interface_to_usbdev(intf);
    struct usb_cam *usbdev = NULL;
    int n, m, altSetNum, activeInterface = -1;
    printk("kmalloc\n");
    usbdev = kmalloc(sizeof(struct usb_ele_cam), GFP_KERNEL); ///////////////
    printk("usb_get_dev\n");
    usbdev->usb_dev = usb_get_dev(dev);
    usbdev->class = (struct usb_class_driver *) kmalloc(sizeof(struct usb_class_driver), GFP_KERNEL);
    usbdev->class->name = "cam";
    usbdev->class->fops = &cam_fops;
    usbdev->class->minor_base = 0;
    // usbdev->class->mode = O_RDWR;
    printk("for\n");
    for (n = 0; n < intf->num_altsetting; n++) {
        interface = &intf->altsetting[n];
        altSetNum = interface->desc.bAlternateSetting;
        for (m = 0; m < interface->desc.bNumEndpoints; m++) {
            endpoint = &interface->endpoint[m].desc;
            if (!usbdev->bulk_in_endpointAddr && (endpoint->bEndpointAddress & USB_DIR_IN)
                && ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_BULK)) {
                usbdev->bulk_in_size = endpoint->wMaxPacketSize;
                usbdev->bulk_in_endpointAddr = endpoint->bEndpointAddress;
                usbdev->bulk_in_buffer = kmalloc(usbdev->bulk_in_size, GFP_KERNEL);
                activeInterface = altSetNum;
                break;
            }
        }
        if (activeInterface != -1)
            break;
    }
    printk("usb_set_intfdata\n");
    usb_set_intfdata(intf, usbdev);
    printk("usb_register_dev\n");
    usb_register_dev(intf, usbdev->class);
    //printk("Not able to get a minor for this device");
    printk("usb_set_interface\n");
    usb_set_interface(dev, interface->desc.bInterfaceNumber, activeInterface);
    return retval;
}
Structures and global variables:
struct usb_cam {
    struct usb_device *usb_dev;
    struct usb_interface *usb_inf;
    struct usb_class_driver *class;
    struct semaphore sem;
    unsigned char *bulk_in_buffer;
    size_t bulk_in_size;
    __u8 bulk_in_endpointAddr;
    __u8 bulk_out_endpointAddr;
    int errors;
    int open_count;
    struct kref kref;
};
Logs from kern.log:
Nov 26 11:25:15 ETSELE kernel: [ 123.845972] usbcore: deregistering interface driver uvcvideo
Nov 26 11:25:32 ETSELE kernel: [ 140.234188] kmalloc
Nov 26 11:25:32 ETSELE kernel: [ 140.234192] usb_get_dev
Nov 26 11:25:32 ETSELE kernel: [ 140.234194] for
Nov 26 11:25:32 ETSELE kernel: [ 140.234196] usb_set_intfdata
Nov 26 11:25:32 ETSELE kernel: [ 140.234198] usb_register_dev
Nov 26 11:25:32 ETSELE kernel: [ 140.234450] BUG: unable to handle kernel paging request at 6d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.234506] IP: [<6d742e65>] 0x6d742e64
Nov 26 11:25:32 ETSELE kernel: [ 140.234539] *pdpt = 000000002bf84001 *pde = 0000000000000000
Nov 26 11:25:32 ETSELE kernel: [ 140.234585] Oops: 0010 [#1] PREEMPT SMP
Nov 26 11:25:32 ETSELE kernel: [ 140.234619] Modules linked in: usb_cam(O+) snd_usb_audio snd_usbmidi_lib videodev vtsspp(O) sep3_10(O) pax(O) autofs4 apwr3_1(O) bnep rfcomm bluetooth parport_pc ppdev tpm_infineon binfmt_misc snd_hda_codec_realtek nfsd nfs snd_hda_intel lockd snd_hda_codec fscache auth_rpcgss snd_hwdep nfs_acl snd_pcm sunrpc snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device hp_wmi sparse_keymap snd dm_multipath psmouse serio_raw tpm_tis mac_hid soundcore snd_page_alloc mei(C) lp parport dm_raid45 xor dm_mirror dm_region_hash dm_log btrfs zlib_deflate libcrc32c usbhid hid e1000e i915 drm_kms_helper drm i2c_algo_bit video wmi zram(C) [last unloaded: uvcvideo]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Pid: 3153, comm: insmod Tainted: G C O 3.2.34etsele #1 Hewlett-Packard HP Compaq 6000 Pro MT PC/3048h
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP: 0060:[<6d742e65>] EFLAGS: 00210206 CPU: 0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP is at 0x6d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EAX: ea9b8800 EBX: ea9b8800 ECX: 6d742e65 EDX: eb18dc90
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] ESI: eb18dc90 EDI: eb18dc90 EBP: eb18dc30 ESP: eb18dc24
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Process insmod (pid: 3153, ti=eb18c000 task=eb30d400 task.ti=eb18c000)
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Stack:
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] c144a1fd ea9b8800 eb18dc4c eb18dc44 c13b05df ea9b8800 00000000 ea9b8808
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] eb18dca0 c13b6be1 00000000 eb3bc0d0 eb18dc94 c11c5560 00000000 00000000
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] 0000000a eb3bffff 00000001 14e7232a eb18dcd5 ffffffff eb18dc80 f7022208
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Call Trace:
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c144a1fd>] ? usb_devnode+0x2d/0x40
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b05df>] device_get_devnode+0x5f/0xd0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b6be1>] devtmpfs_create_node+0x41/0x100
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c11c5560>] ? sysfs_do_create_link+0xb0/0x1e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13aff3f>] device_add+0x1ff/0x620
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b9ae0>] ? device_pm_init+0x60/0x80
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b0377>] device_register+0x17/0x20
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b0431>] device_create_vargs+0xb1/0xe0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b048d>] device_create+0x2d/0x30
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c144a093>] usb_register_dev+0x133/0x270
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15e8afd>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f98d81dc>] ele784_probe+0x17c/0x1bc [usb_cam]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c14482ae>] usb_probe_interface+0xce/0x210
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2815>] ? driver_sysfs_add+0x75/0xa0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2a0f>] driver_probe_device+0x8f/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15e7392>] ? mutex_lock_nested+0x42/0x50
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2cf9>] __driver_attach+0x99/0xa0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2c60>] ? driver_probe_device+0x2e0/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b1979>] bus_for_each_dev+0x49/0x70
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2661>] driver_attach+0x21/0x30
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2c60>] ? driver_probe_device+0x2e0/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b22b7>] bus_add_driver+0x1c7/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b31d6>] driver_register+0x66/0x110
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c12e5912>] ? __raw_spin_lock_init+0x32/0x60
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c1447229>] usb_register_driver+0x79/0x140
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f90bc01b>] ele784_init+0x1b/0x1000 [usb_cam]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c103b3ef>] ? set_memory_nx+0x5f/0x70
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c1003035>] do_one_initcall+0x35/0x170
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f90bc000>] ? 0xf90bbfff
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c10a3aeb>] sys_init_module+0x2db/0x1d60
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15ef79f>] sysenter_do_call+0x12/0x38
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Code: Bad EIP value.
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP: [<6d742e65>] 0x6d742e65 SS:ESP 0068:eb18dc24
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] CR2: 000000006d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.361304] ---[ end trace 3f64a15c3c778575 ]---
Your usb_class_driver structure must be correctly initialized.
You could use kzalloc instead of kmalloc, but having multiple classes for multiple cameras would be wrong, so you should make the camera class a static variable (like in every other driver that uses usb_register_dev).