echo "minor number" > misc char driver - gets truncated - linux-kernel

I have written a misc char driver. When I echo the driver's minor number to the device, the write handler does not receive the first digit.
The write function looks like this:
static ssize_t hellowld_write(struct file *file, const char *buf,size_t len,
loff_t *ppos)
{
pr_debug("In hellowld_write len %d buf %s \n",len,buf);
minor[0] = '\0';
snprintf(minor, 20,"%d", hellowld_device.minor);
strncpy(inputminor, buf,len);
pr_debug("In hellowld_write -- minor %s -- recvNum %s end\n",minor, inputminor);
pr_debug("In hellowld_write -- strlen minor %d -- strlen inputminor %d end\n", strlen(minor), strlen(inputminor));
if( strncmp(minor, inputminor, strlen(minor)) == 0)
{
pr_debug("In hellowld_write -- strcmp passed \n");
return strlen(minor);
}
else
{
pr_debug("In hellowld_write -- strcmp failed \n");
return -EFAULT;
}
}
When I do,
echo "54" >/dev/hellowld
Dec 31 11:36:40 pavan-linux kernel: [16981.382424] In hellowld_write len 1 buf
Dec 31 11:36:40 pavan-linux kernel: [16981.382424]
Dec 31 11:36:40 pavan-linux kernel: [16981.382424] o
Dec 31 11:36:40 pavan-linux kernel: [16981.382437] In hellowld_write -- minor 54 -- recvNum
Dec 31 11:36:40 pavan-linux kernel: [16981.382437] 4
Dec 31 11:36:40 pavan-linux kernel: [16981.382437] end
Dec 31 11:36:40 pavan-linux kernel: [16981.382444] In hellowld_write -- strlen minor 2 -- strlen inputminor 3 end
Dec 31 11:36:40 pavan-linux kernel: [16981.382448] In hellowld_write -- strcmp failed
echo "22" >/dev/hellowld
Dec 31 11:36:52 pavan-linux kernel: [16993.356993] In hellowld_write len 3 buf 22
Dec 31 11:36:52 pavan-linux kernel: [16993.356993]
Dec 31 11:36:52 pavan-linux kernel: [16993.356993] o
Dec 31 11:36:52 pavan-linux kernel: [16993.357011] In hellowld_write -- minor 54 -- recvNum 22
Dec 31 11:36:52 pavan-linux kernel: [16993.357011] end
Dec 31 11:36:52 pavan-linux kernel: [16993.357017] In hellowld_write -- strlen minor 2 -- strlen inputminor 3 end
Dec 31 11:36:52 pavan-linux kernel: [16993.357021] In hellowld_write -- strcmp failed
Why is the data received in hellowld_write truncated when I echo the minor number?
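One possible reading of the first log, offered as a guess rather than a confirmed diagnosis: `echo "54"` sends three bytes (`54\n`), the handler returns `strlen(minor)` (2), and the writer retries with the one remaining byte. Because `strncpy` copies exactly `len` bytes and appends no terminator, that second call overwrites only the first byte of the stale `inputminor`. The user-space sketch below mimics that sequence. `fake_write` and `write_all` are hypothetical stand-ins, not kernel APIs, and a real driver would also need `copy_from_user` instead of dereferencing `buf` directly.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char minor_str[20];
static char inputminor[20];   /* persists between calls, like the driver's buffer */
static int calls;             /* how many times the handler was entered */

/* Hypothetical stand-in for hellowld_write: same copy and return logic. */
static long fake_write(const char *buf, size_t len)
{
    assert(len < sizeof(inputminor));
    calls++;
    snprintf(minor_str, sizeof(minor_str), "%d", 54);
    strncpy(inputminor, buf, len);          /* no '\0' is appended here */
    if (strncmp(minor_str, inputminor, strlen(minor_str)) == 0)
        return (long)strlen(minor_str);     /* 2, even when len is 3 */
    return -1;                              /* stands in for -EFAULT */
}

/* Hypothetical writer that retries short writes, as a shell's echo typically does. */
static long write_all(const char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        long n = fake_write(buf + done, len - done);
        if (n < 0)
            return n;
        done += (size_t)n;
    }
    return (long)done;
}
```

Calling `write_all("54\n", 3)` enters the handler twice: the first call matches and returns 2, and the second call sees `len` 1 and leaves `inputminor` holding a newline, `4`, and a newline, which has length 3 as in the log. If this reading is right, returning `len` on success and terminating `inputminor` after the copy would avoid the second call.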


Time passed until WaitForSingleObject returned

Is there a way, using WinDbg, to know how much time passed while my call was blocked in WaitForSingleObject?
Source for the demo, and compiling it:
f:\src\wait>dir /b
wait.cpp
f:\src\wait>type wait.cpp
#include <windows.h>
#include <stdio.h>
int main(void)
{
HANDLE handl = GetStdHandle(STD_INPUT_HANDLE);
DWORD res = 0;
INPUT_RECORD record;
DWORD numRead;
while (res != WAIT_FAILED)
{
res = WaitForSingleObject(handl, 0x3000);
printf("%x\n", res);
if (res == WAIT_OBJECT_0)
{
ReadConsoleInput(handl, &record, 1, &numRead);
if (record.EventType == KEY_EVENT)
{
if (record.Event.KeyEvent.bKeyDown)
printf("key pressed\n");
}
}
}
}
f:\src\wait>cl /nologo wait.cpp
wait.cpp
f:\src\wait>dir /b
wait.cpp
wait.exe
wait.obj
Loading the exe in WinDbg (cdb) and timestamping the waits:
f:\src\wait>cdb wait.exe
Microsoft (R) Windows Debugger Version 10.0.17763.132 AMD64
0:000> bp KERNELBASE!WaitForSingleObjectEx ".echotime ; gc"
0:000> $$ lets find the first ret instruction and set a bp there
0:000> # ret KERNELBASE!WaitForSingleObjectEx
KERNELBASE!WaitForSingleObjectEx+0x12f:
00007ffa`839f846f c3 ret
0:000> bp 00007ffa`839f846f ".echo on ret;.echotime;gc"
0:000> $$ listing the bps
0:000> bl
0 e 00007ffa`839f8340 0001 (0001) 0:**** KERNELBASE!WaitForSingleObjectEx ".echotime ; gc"
1 e 00007ffa`839f846f 0001 (0001) 0:**** KERNELBASE!WaitForSingleObjectEx+0x12f ".echo on ret;.echotime;gc"
0:000> $$ executing the exe; note the interval between WAIT_TIMEOUT and WAIT_OBJECT_0, and the processing time of about 4 milliseconds
0:000> g
Debugger (not debuggee) time: Thu Aug 29 12:04:54.261 2019 < first hit
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:06.551 2019 <first wait_timeout
102 <--- console output of res WAIT_TIMEOUT
Debugger (not debuggee) time: Thu Aug 29 12:05:06.555 2019 <loop and hit wait
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:18.844 2019 <second timeout
102
Debugger (not debuggee) time: Thu Aug 29 12:05:18.848 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:25.205 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:25.209 2019 << 4 milliseconds
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:25.333 2019 << 124 milliseconds
0
Debugger (not debuggee) time: Thu Aug 29 12:05:25.337 2019 <hits wait again
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:25.668 2019 < wait returns WAIT_OBJECT_0
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:25.673 2019 < 5 milliseconds
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:25.797 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:25.802 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.196 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:26.200 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.324 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:26.329 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.716 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:26.721 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.741 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:26.745 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.804 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:26.807 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:26.845 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:26.849 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:27.300 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:27.306 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:27.445 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:27.449 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:27.797 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:27.801 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:27.956 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:27.961 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:28.364 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:28.370 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:28.484 2019
0
Debugger (not debuggee) time: Thu Aug 29 12:05:28.489 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:40.778 2019
102
Debugger (not debuggee) time: Thu Aug 29 12:05:40.780 2019
on ret
Debugger (not debuggee) time: Thu Aug 29 12:05:42.292 2019
0
key pressed
Debugger (not debuggee) time: Thu Aug 29 12:05:42.298 2019
(1044.2984): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ffa`86ad3150 cc int 3
0:002>
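To turn a pair of `.echotime` stamps from the log above into a duration, a small helper along these lines could be used. It is an illustrative sketch only: `stamp_to_ms` and `elapsed_ms` are hypothetical names, both stamps are assumed to fall on the same day, and the result is debugger-side wall-clock time, so it includes breakpoint overhead.

```c
#include <assert.h>
#include <stdio.h>

/* Convert an "hh:mm:ss.mmm" stamp to milliseconds since midnight; -1 on parse failure. */
static long stamp_to_ms(const char *stamp)
{
    int h, m, s, ms;
    if (sscanf(stamp, "%d:%d:%d.%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

/* Milliseconds between a wait-entry stamp and the matching "on ret" stamp. */
static long elapsed_ms(const char *entry, const char *ret)
{
    long a = stamp_to_ms(entry);
    long b = stamp_to_ms(ret);
    assert(a >= 0 && b >= 0);   /* both stamps must parse */
    return b - a;
}
```

For the first pair in the log, `elapsed_ms("12:04:54.261", "12:05:06.551")` gives 12290 ms, close to the 0x3000 (12288) ms timeout passed to WaitForSingleObject.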

BUG: unable to handle kernel paging request, DPDK

I was trying to run the DPDK KNI application with DPDK version 16.07.2.
To do that, I first unbound the ports from ixgbe and bound them to the igb_uio module with the following commands:
echo 0000:05:00.1 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0000:05:00.0 > /sys/bus/pci/drivers/ixgbe/unbind
echo 0x8086 0x1528 > /sys/bus/pci/drivers/igb_uio/new_id
I compiled the KNI application for the target machine, which runs Linux version 4.4.20 (sushila@dev03) (gcc version 4.9.2 (crosstool-NG 1.20.0) ) #1 SMP Fri Feb 24 14:32:28 CST 2017.
When I ran the application, it hung with the following messages:
Feb 28 10:09:37 (none) user.alert kernel: [ 87.029554] BUG: unable to handle kernel paging request at 0000077e1d012900
Feb 28 10:09:37 (none) user.alert kernel: [ 87.029695] IP: [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029801] PGD 0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.029889] Oops: 0000 [#1] SMP
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030010] Modules linked in: rte_kni(O) igb_uio(O)
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030167] CPU: 7 PID: 709 Comm: kni_single Tainted: G IO 4.4.20 #1
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030242] Hardware name: /DX58SO2, BIOS SOX5820J.86A.0603.2010.1117.1506 11/17/2010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030320] task: ffff8805a8ad8000 ti: ffff8805a7ae0000 task.ti: ffff8805a7ae0000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030395] RIP: 0010:[<ffffffffa0033722>] [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030517] RSP: 0018:ffff8805a7ae3d30 EFLAGS: 00010286
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030576] RAX: 0000077e1d012900 RBX: 0000000000000020 RCX: 0000000000000010
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030639] RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffffffa00388a3
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030701] RBP: ffff8805a7ae3e80 R08: 000000000000000a R09: 00000000fffffffe
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030766] R10: 00000000ffff2fea R11: 0000000000000006 R12: ffff8805a8a75000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030829] R13: ffff8800b8c12800 R14: 0000000000000000 R15: ffff8805a8a75800
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030893] FS: 0000000000000000(0000) GS:ffff88062fce0000(0000) knlGS:0000000000000000
Feb 28 10:09:37 (none) user.warn kernel: [ 87.030971] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031031] CR2: 0000077e1d012900 CR3: 0000000001e0a000 CR4: 00000000000006e0
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031094] Stack:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031148] ffff88062fcf5940 ffff8805a8ad8560 0000000000000000 ffff88060000054e
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031367] 0000077e1d012900 00000000b8c12800 00000000b8c11ec0 00000000b8c11580
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031587] 00000000b8c10c40 00000000b8c10300 00000000b8c0f9c0 00000000b8c0f080
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031811] Call Trace:
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031871] [<ffffffffa00343af>] kni_net_rx+0xf/0x20 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.031937] [<ffffffffa0032f05>] kni_thread_single+0x45/0xb0 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032004] [<ffffffffa0032ec0>] ? kni_init_net+0x50/0x50 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032067] [<ffffffff8107b7cb>] kthread+0xdb/0x100
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032125] [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032186] [<ffffffff81834c2f>] ret_from_fork+0x3f/0x70
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032246] [<ffffffff8107b6f0>] ? kthread_park+0x60/0x60
Feb 28 10:09:37 (none) user.warn kernel: [ 87.032306] Code: 48 89 85 d0 fe ff ff eb 80 41 f6 c6 0f 75 0e 48 c7 c7 9f 88 03 a0 31 c0 e8 02 e9 11 e1 48 8b 85 d0 fe ff ff 48 c7 c7 a3 88 03 a0 <42> 0f b6 34 30 31 c0 49 83 c6 01 e8 e4 e8 11 e1 e9 5e fe ff ff
Feb 28 10:09:37 (none) user.alert kernel: [ 87.034742] RIP [<ffffffffa0033722>] kni_net_rx_normal+0x2e2/0x440 [rte_kni]
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034844] RSP <ffff8805a7ae3d30>
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034900] CR2: 0000077e1d012900
Feb 28 10:09:37 (none) user.warn kernel: [ 87.034956] ---[ end trace 5b31765eb0372d51 ]---
From this I saw that it was failing somewhere in the kni_net_rx_normal() function in kni_net.c.
I narrowed the failure down to line 169, where the memcpy happens.
Next I printed some addresses in that function, which gave:
kva data addresses: data_kva 0000077e1d012900 kva->buff_add 00007f7e1d012880 kva->data_off 128 kni->mbuf_va (null) and kni->mbuf_kva ffff880000000000
Next I tried to print the data at the data_kva address, and that failed too, so the failure occurs when I access data_kva @ 0000077e1d012900. I guess the address is wrong, but I don't know why. Can you give me some ideas on this, or some things to try in order to debug the problem?
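The printed values are consistent with a wrapped address computation. The sketch below assumes (this is not confirmed from any source shown here) that the driver translates the mbuf data pointer as `buf_addr + data_off - mbuf_va + mbuf_kva`; `translate` is a hypothetical helper. With `mbuf_va` still NULL nothing is subtracted, so adding the kernel base to a user-space address overflows 64 bits and lands on the faulting address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the assumed user-VA to kernel-VA translation. */
static uint64_t translate(uint64_t buf_addr, uint64_t data_off,
                          uint64_t mbuf_va, uint64_t mbuf_kva)
{
    assert(data_off <= UINT16_MAX);   /* the offset is a small 16-bit quantity */
    /* Unsigned arithmetic wraps modulo 2^64, like pointer-sized math on x86-64. */
    return buf_addr + data_off - mbuf_va + mbuf_kva;
}
```

Plugging in the logged values, `translate(0x00007f7e1d012880, 128, 0, 0xffff880000000000)` yields 0x0000077e1d012900, the address in CR2. That points at `kni->mbuf_va` never being set; one thing worth checking (a guess, not a diagnosis) is whether the user-space library and the rte_kni module were built from the same DPDK tree and configuration.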

why oom-killer with large inactive cache and enough free swap space?

I am confused: there was a large inactive file page cache (734812kB) and dirty cache (800088kB) that seemingly could have been reclaimed, so why was the oom-killer invoked?
vm.swappiness was set to 0. According to the Linux documentation, 0 does not mean that swap is avoided completely, and on another server with the same settings and OS I have seen swap usage reach 300MB with swappiness=0.
OS info:
CentOS 6.4 kernel: 2.6.32-358.23.2.el6.x86_64
The oom-killer log is as follows:
Aug 26 14:34:48 withivan.me kernel: java invoked oom-killer: gfp_mask=0xd0, order=0, oom_adj=0, oom_score_adj=0
Aug 26 14:34:48 withivan.me kernel: java cpuset=/ mems_allowed=0
Aug 26 14:34:48 withivan.me kernel: Pid: 28505, comm: java Not tainted 2.6.32-358.23.2.el6.x86_64 #1
Aug 26 14:34:48 withivan.me kernel: Call Trace:
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810cb641>] ? cpuset_print_task_mems_allowed+0x91/0xb0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111ce40>] ? dump_header+0x90/0x1b0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8121d4ec>] ? security_real_capable_noaudit+0x3c/0x70
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d2c2>] ? oom_kill_process+0x82/0x2a0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d201>] ? select_bad_process+0xe1/0x120
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8111d700>] ? out_of_memory+0x220/0x3c0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8112c3dc>] ? __alloc_pages_nodemask+0x8ac/0x8d0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81160c6a>] ? alloc_pages_current+0xaa/0x110
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8148d667>] ? tcp_sendmsg+0x677/0xa20
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81435f33>] ? sock_sendmsg+0x123/0x150
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81096da0>] ? autoremove_wake_function+0x0/0x40
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810aa43e>] ? futex_wake+0x10e/0x120
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810ac3a0>] ? do_futex+0x100/0xb60
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8119cfdf>] ? destroy_inode+0x2f/0x60
Aug 26 14:34:48 withivan.me kernel: [<ffffffff81436249>] ? sys_sendto+0x139/0x190
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8103b8cc>] ? kvm_clock_read+0x1c/0x20
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8103b8d9>] ? kvm_clock_get_cycles+0x9/0x10
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810a1507>] ? getnstimeofday+0x57/0xe0
Aug 26 14:34:48 withivan.me kernel: [<ffffffff810a15fa>] ? do_gettimeofday+0x1a/0x50
Aug 26 14:34:48 withivan.me kernel: [<ffffffff8100b072>] ? system_call_fastpath+0x16/0x1b
Aug 26 14:34:48 withivan.me kernel: Mem-Info:
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 0, btch: 1 usd: 0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32 per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 186, btch: 31 usd: 32
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 186, btch: 31 usd: 1
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal per-cpu:
Aug 26 14:34:48 withivan.me kernel: CPU 0: hi: 186, btch: 31 usd: 4
Aug 26 14:34:48 withivan.me kernel: CPU 1: hi: 186, btch: 31 usd: 38
Aug 26 14:34:48 withivan.me kernel: CPU 2: hi: 186, btch: 31 usd: 0
Aug 26 14:34:48 withivan.me kernel: CPU 3: hi: 186, btch: 31 usd: 57
Aug 26 14:34:48 withivan.me kernel: active_anon:1697553 inactive_anon:373583 isolated_anon:0
Aug 26 14:34:48 withivan.me kernel: active_file:174263 inactive_file:199171 isolated_file:0
Aug 26 14:34:48 withivan.me kernel: unevictable:0 dirty:216860 writeback:1 unstable:0
Aug 26 14:34:48 withivan.me kernel: free:35470 slab_reclaimable:14993 slab_unreclaimable:6945
Aug 26 14:34:48 withivan.me kernel: mapped:3423 shmem:45 pagetables:5263 bounce:0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA free:15740kB min:148kB low:184kB high:220kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15344kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 3512 10077 10077
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32 free:61200kB min:34800kB low:43500kB high:52200kB active_anon:2479864kB inactive_anon:621800kB active_file:39752kB inactive_file:61872kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596500kB mlocked:0kB dirty:67352kB writeback:0kB mapped:20kB shmem:0kB slab_reclaimable:21672kB slab_unreclaimable:1952kB kernel_stack:3792kB pagetables:4344kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:94464 all_unreclaimable? no
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 0 6565 6565
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal free:64940kB min:65048kB low:81308kB high:97572kB active_anon:4310348kB inactive_anon:872532kB active_file:657300kB inactive_file:734812kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:6722560kB mlocked:0kB dirty:800088kB writeback:4kB mapped:13672kB shmem:180kB slab_reclaimable:38300kB slab_unreclaimable:25828kB kernel_stack:4568kB pagetables:16708kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:605088 all_unreclaimable? no
Aug 26 14:34:48 withivan.me kernel: lowmem_reserve[]: 0 0 0 0
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA: 3*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15740kB
Aug 26 14:34:48 withivan.me kernel: Node 0 DMA32: 1247*4kB 1189*8kB 969*16kB 379*32kB 92*64kB 12*128kB 1*256kB 8*512kB 5*1024kB 1*2048kB 0*4096kB = 61076kB
Aug 26 14:34:48 withivan.me kernel: Node 0 Normal: 2047*4kB 1672*8kB 781*16kB 309*32kB 46*64kB 3*128kB 1*256kB 20*512kB 7*1024kB 0*2048kB 0*4096kB = 64940kB
Aug 26 14:34:48 withivan.me kernel: 379124 total pagecache pages
Aug 26 14:34:48 withivan.me kernel: 4685 pages in swap cache
Aug 26 14:34:48 withivan.me kernel: Swap cache stats: add 167082, delete 162397, find 114795/130707
Aug 26 14:34:48 withivan.me kernel: Free swap = 4166416kB
Aug 26 14:34:48 withivan.me kernel: Total swap = 4194296kB
Aug 26 14:34:48 withivan.me kernel: 2621439 pages RAM
Aug 26 14:34:48 withivan.me kernel: 89408 pages reserved
Aug 26 14:34:48 withivan.me kernel: 384993 pages shared
Aug 26 14:34:48 withivan.me kernel: 2116876 pages non-shared
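As a reading aid for the log above (not an explanation of the root cause), the sketch below cross-checks the page counts in the summary against the per-zone kB figures and compares a zone's free memory with its `min` watermark. The helper names are hypothetical, and a 4 kB page size is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_KB 4L   /* assumption: 4 kB pages, the usual x86-64 size */

/* Convert a page count from the Mem-Info summary to kB. */
static long pages_to_kb(long pages)
{
    assert(pages >= 0);
    return pages * PAGE_KB;
}

/* True when a zone's free memory has dropped below its min watermark. */
static bool below_min(long free_kb, long min_kb)
{
    return free_kb < min_kb;
}
```

With the logged numbers, `pages_to_kb(199171)` equals the sum of the DMA32 and Normal `inactive_file` values (61872kB + 734812kB), and `below_min(64940, 65048)` is true for the Normal zone while `below_min(61200, 34800)` is false for DMA32. So the Normal zone was just under its `min` watermark when the allocation failed.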

How to debug this condition of "eth2: tx hang 1 detected on queue 11, resetting adapter"?

I want to send an sk_buff with dev_queue_xmit, but after sending just 2 packets the network card may hang.
I want to know how to debug this condition.
The /var/log/messages output is:
[root@10g-host2 test]# tail -f /var/log/messages
Sep 29 10:38:22 10g-host2 acpid: waiting for events: event logging is off
Sep 29 10:38:23 10g-host2 acpid: client connected from 2018[68:68]
Sep 29 10:38:23 10g-host2 acpid: 1 client rule loaded
Sep 29 10:38:24 10g-host2 automount[2210]: lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
Sep 29 10:38:24 10g-host2 mcelog: failed to prefill DIMM database from DMI data
Sep 29 10:38:24 10g-host2 xinetd[2246]: xinetd Version 2.3.14 started with libwrap loadavg labeled-networking options compiled in.
Sep 29 10:38:24 10g-host2 xinetd[2246]: Started working: 0 available services
Sep 29 10:38:25 10g-host2 abrtd: Init complete, entering main loop
Sep 29 10:39:41 10g-host2 kernel: vmalloc mmap_buf=ffffc90016e29000 mmap_size=4096
Sep 29 10:39:41 10g-host2 kernel: insmod module wsmmap successfully!
Sep 29 10:39:49 10g-host2 kernel: mmap_buf + 1024 is ffffc90016e29400
Sep 29 10:39:49 10g-host2 kernel: data ffffc90016e2942a, len is 42
Sep 29 10:39:49 10g-host2 kernel: udp data ffffc90016e29422
Sep 29 10:39:49 10g-host2 kernel: ip data ffffc90016e2940e
Sep 29 10:39:49 10g-host2 kernel: eth data ffffc90016e29400
Sep 29 10:39:49 10g-host2 kernel: h_source is ffffc90016e29406, dev_addr is ffff880c235c4750, len is 6result is 0
Sep 29 10:39:50 10g-host2 kernel: mmap_buf + 1024 is ffffc90016e29400
Sep 29 10:39:50 10g-host2 kernel: data ffffc90016e2942a, len is 42
Sep 29 10:39:50 10g-host2 kernel: udp data ffffc90016e29422
Sep 29 10:39:50 10g-host2 kernel: ip data ffffc90016e2940e
Sep 29 10:39:50 10g-host2 kernel: eth data ffffc90016e29400
Sep 29 10:39:50 10g-host2 kernel: h_source is ffffc90016e29406, dev_addr is ffff880c235c4750, len is 6result is 0
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Detected Tx Unit Hang
Sep 29 10:39:52 10g-host2 kernel: Tx Queue <11>
Sep 29 10:39:52 10g-host2 kernel: TDH, TDT <0>, <5>
Sep 29 10:39:52 10g-host2 kernel: next_to_use <5>
Sep 29 10:39:52 10g-host2 kernel: next_to_clean <0>
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: tx_buffer_info[next_to_clean]
Sep 29 10:39:52 10g-host2 kernel: time_stamp <fffd3dd8>
Sep 29 10:39:52 10g-host2 kernel: jiffies <fffd497f>
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: tx hang 1 detected on queue 11, resetting adapter
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: Reset adapter
Sep 29 10:39:52 10g-host2 kernel: ixgbe 0000:03:00.0: master disable timed out
Sep 29 10:39:53 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: detected SFP+: 5
Sep 29 10:39:54 10g-host2 kernel: ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
Some information about my machine:
ethtool -i eth2
driver: ixgbe
version: 3.21.2
firmware-version: 0x1bab0001
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: no
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.5 (Final)
Release: 6.5
Codename: Final
kernel version is: 2.6.32-431.el6.x86_64
Thank you for your help.
I used vmalloc() to allocate the memory for skb->data, and that is what brought the NIC down. I fixed it by using kmalloc() instead.
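When reading a hang report like the one above, the `time_stamp` and `jiffies` fields tell how long the oldest descriptor had been pending. A minimal sketch of that arithmetic follows; the function names are hypothetical, the values are treated as 32-bit because that is how the log prints them, and the tick rate (HZ) is an assumption that has to be checked against the kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

/* Wrap-safe difference between two 32-bit jiffies values as printed in the log. */
static uint32_t jiffies_delta(uint32_t now, uint32_t then)
{
    return now - then;   /* unsigned subtraction stays correct across a wrap */
}

/* Convert a jiffies count to milliseconds for a given tick rate. */
static uint32_t jiffies_to_ms(uint32_t delta, uint32_t hz)
{
    assert(hz != 0);
    return (uint32_t)((uint64_t)delta * 1000u / hz);
}
```

For the logged values, `jiffies_delta(0xfffd497f, 0xfffd3dd8)` is 2983 ticks. If HZ is 1000 that is about 3 seconds, which lines up with the packets queued at 10:39:49 and 10:39:50 and the hang reported at 10:39:52.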

usb_register_dev crashing linux kernel

This is for a class, but we are stumped. We are writing a USB driver for a Logitech camera that uses USBCore. We load the module, and when we then connect the USB camera, the kernel crashes and gives us a kernel trace (below). After a bit of debugging, we are fairly sure it crashes in usb_register_dev within the probe function, but we can't figure out why. We were hoping that someone would have helpful suggestions or could point us in the right direction. We're not asking for answers, just guidance.
We have looked at all of our variable initializers, and based on our notes and the skull examples they look all right. Below are code snippets of the important functions and the call trace.
Kernel (Custom school, but based on 3.2.34):
Linux ETSELE 3.2.34etsele #1 SMP PREEMPT Tue Jan 22 18:22:05 EST 2013 i686 i686 i386 GNU/Linux
Init:
static int __init
usb_cam_init(void) {
int result = 0;
if ((result = usb_register(&cam_driver)))
printk("usb_register failed. Error number %d", result);
return result;
}
Probe:
static int
usb_cam_probe(struct usb_interface * intf, const struct usb_device_id * devid) {
int retval = 0;
struct usb_host_interface *interface;
struct usb_endpoint_descriptor *endpoint;
struct usb_device *dev = interface_to_usbdev(intf);
struct usb_cam *usbdev = NULL;
int n, m, altSetNum, activeInterface = -1;
printk("kmalloc\n");
usbdev = kmalloc(sizeof(struct usb_ele_cam), GFP_KERNEL); ///////////////
printk("usb_get_dev\n");
usbdev->usb_dev = usb_get_dev(dev);
usbdev->class = (struct usb_class_driver *) kmalloc(sizeof(struct usb_class_driver), GFP_KERNEL);
usbdev->class->name = "cam";
usbdev->class->fops = &cam_fops;
usbdev->class->minor_base = 0;
// usbdev->class->mode = O_RDWR;
printk("for\n");
for (n = 0; n < intf->num_altsetting; n++) {
interface = &intf->altsetting[n];
altSetNum = interface->desc.bAlternateSetting;
for (m = 0; m < interface->desc.bNumEndpoints; m++) {
endpoint = &interface->endpoint[m].desc;
if (!usbdev->bulk_in_endpointAddr && (endpoint->bEndpointAddress & USB_DIR_IN)
&& ((endpoint->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK) == USB_ENDPOINT_XFER_BULK)) {
usbdev->bulk_in_size = endpoint->wMaxPacketSize;
usbdev->bulk_in_endpointAddr = endpoint->bEndpointAddress;
usbdev->bulk_in_buffer = kmalloc(usbdev->bulk_in_size, GFP_KERNEL);
activeInterface = altSetNum;
break;
}
}
if (activeInterface != -1)
break;
}
printk("usb_set_intfdata\n");
usb_set_intfdata(intf, usbdev);
printk("usb_register_dev\n");
usb_register_dev(intf, usbdev->class);
//printk("Not able to get a minor for this device");
printk("usb_set_interface\n");
usb_set_interface(dev, interface->desc.bInterfaceNumber, activeInterface);
return retval;
}
Structures and global variables:
struct usb_cam {
struct usb_device *usb_dev;
struct usb_interface *usb_inf;
struct usb_class_driver *class;
struct semaphore sem;
unsigned char *bulk_in_buffer;
size_t bulk_in_size;
__u8 bulk_in_endpointAddr;
__u8 bulk_out_endpointAddr;
int errors;
int open_count;
struct kref kref;
};
Logs from kern.log:
Nov 26 11:25:15 ETSELE kernel: [ 123.845972] usbcore: deregistering interface driver uvcvideo
Nov 26 11:25:32 ETSELE kernel: [ 140.234188] kmalloc
Nov 26 11:25:32 ETSELE kernel: [ 140.234192] usb_get_dev
Nov 26 11:25:32 ETSELE kernel: [ 140.234194] for
Nov 26 11:25:32 ETSELE kernel: [ 140.234196] usb_set_intfdata
Nov 26 11:25:32 ETSELE kernel: [ 140.234198] usb_register_dev
Nov 26 11:25:32 ETSELE kernel: [ 140.234450] BUG: unable to handle kernel paging request at 6d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.234506] IP: [<6d742e65>] 0x6d742e64
Nov 26 11:25:32 ETSELE kernel: [ 140.234539] *pdpt = 000000002bf84001 *pde = 0000000000000000
Nov 26 11:25:32 ETSELE kernel: [ 140.234585] Oops: 0010 [#1] PREEMPT SMP
Nov 26 11:25:32 ETSELE kernel: [ 140.234619] Modules linked in: usb_cam(O+) snd_usb_audio snd_usbmidi_lib videodev vtsspp(O) sep3_10(O) pax(O) autofs4 apwr3_1(O) bnep rfcomm bluetooth parport_pc ppdev tpm_infineon binfmt_misc snd_hda_codec_realtek nfsd nfs snd_hda_intel lockd snd_hda_codec fscache auth_rpcgss snd_hwdep nfs_acl snd_pcm sunrpc snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device hp_wmi sparse_keymap snd dm_multipath psmouse serio_raw tpm_tis mac_hid soundcore snd_page_alloc mei(C) lp parport dm_raid45 xor dm_mirror dm_region_hash dm_log btrfs zlib_deflate libcrc32c usbhid hid e1000e i915 drm_kms_helper drm i2c_algo_bit video wmi zram(C) [last unloaded: uvcvideo]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Pid: 3153, comm: insmod Tainted: G C O 3.2.34etsele #1 Hewlett-Packard HP Compaq 6000 Pro MT PC/3048h
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP: 0060:[<6d742e65>] EFLAGS: 00210206 CPU: 0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP is at 0x6d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EAX: ea9b8800 EBX: ea9b8800 ECX: 6d742e65 EDX: eb18dc90
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] ESI: eb18dc90 EDI: eb18dc90 EBP: eb18dc30 ESP: eb18dc24
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Process insmod (pid: 3153, ti=eb18c000 task=eb30d400 task.ti=eb18c000)
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Stack:
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] c144a1fd ea9b8800 eb18dc4c eb18dc44 c13b05df ea9b8800 00000000 ea9b8808
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] eb18dca0 c13b6be1 00000000 eb3bc0d0 eb18dc94 c11c5560 00000000 00000000
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] 0000000a eb3bffff 00000001 14e7232a eb18dcd5 ffffffff eb18dc80 f7022208
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Call Trace:
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c144a1fd>] ? usb_devnode+0x2d/0x40
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b05df>] device_get_devnode+0x5f/0xd0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b6be1>] devtmpfs_create_node+0x41/0x100
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c11c5560>] ? sysfs_do_create_link+0xb0/0x1e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13aff3f>] device_add+0x1ff/0x620
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b9ae0>] ? device_pm_init+0x60/0x80
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b0377>] device_register+0x17/0x20
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b0431>] device_create_vargs+0xb1/0xe0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b048d>] device_create+0x2d/0x30
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c144a093>] usb_register_dev+0x133/0x270
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15e8afd>] ? _raw_spin_unlock_irqrestore+0x5d/0x80
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f98d81dc>] ele784_probe+0x17c/0x1bc [usb_cam]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c14482ae>] usb_probe_interface+0xce/0x210
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2815>] ? driver_sysfs_add+0x75/0xa0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2a0f>] driver_probe_device+0x8f/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15e7392>] ? mutex_lock_nested+0x42/0x50
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2cf9>] __driver_attach+0x99/0xa0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2c60>] ? driver_probe_device+0x2e0/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b1979>] bus_for_each_dev+0x49/0x70
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2661>] driver_attach+0x21/0x30
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b2c60>] ? driver_probe_device+0x2e0/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b22b7>] bus_add_driver+0x1c7/0x2e0
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c13b31d6>] driver_register+0x66/0x110
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c12e5912>] ? __raw_spin_lock_init+0x32/0x60
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c1447229>] usb_register_driver+0x79/0x140
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f90bc01b>] ele784_init+0x1b/0x1000 [usb_cam]
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c103b3ef>] ? set_memory_nx+0x5f/0x70
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c1003035>] do_one_initcall+0x35/0x170
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<f90bc000>] ? 0xf90bbfff
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c10a3aeb>] sys_init_module+0x2db/0x1d60
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] [<c15ef79f>] sysenter_do_call+0x12/0x38
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] Code: Bad EIP value.
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] EIP: [<6d742e65>] 0x6d742e65 SS:ESP 0068:eb18dc24
Nov 26 11:25:32 ETSELE kernel: [ 140.235146] CR2: 000000006d742e65
Nov 26 11:25:32 ETSELE kernel: [ 140.361304] ---[ end trace 3f64a15c3c778575 ]---
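Your usb_class_driver structure must be correctly initialized. kmalloc does not zero the memory, so any field you do not set, most likely the devnode callback that usb_devnode reaches in your trace, holds garbage.
You could use kzalloc instead of kmalloc, but having a separate class for each camera would be wrong, so you should make the camera class a static variable (as every other driver that uses usb_register_dev does).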
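One quick sanity check on an oops like this is to read the faulting address as text: if a bogus instruction pointer decodes to printable characters, it was probably stale data left in uninitialized memory rather than a real function address. The sketch below does that for a 32-bit value; `addr_to_ascii` is a hypothetical helper, and the interpretation of the result is a heuristic, not proof.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the four bytes of a 32-bit value, lowest byte first (x86 memory order),
 * into out as a NUL-terminated string; non-printable bytes become '?'. */
static void addr_to_ascii(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char c = (unsigned char)(value >> (8 * i));
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '?';
    }
    out[4] = '\0';
}
```

Applied to the EIP in the trace, `addr_to_ascii(0x6d742e65, s)` produces `e.tm`, four printable characters. That fits the explanation that a function pointer in the kmalloc'ed usb_class_driver was never initialized.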
