spinlock lockup suspected reasons - linux-kernel

What could be the reasons for the following message:
BUG: spinlock lockup suspected on CPU#0, sh/11786
lock: kmap_lock+0x0/0x40, .magic: dead4ead, .owner: sh/11787, .owner_cpu: 1

BUG: spinlock lockup suspected on CPU#0, sh/11786
This indicates that CPU#0 is locked up, and the thread/process involved is sh (or was started by sh, I am not sure). You should have a look at the stack trace info dumped by the kernel. For example:
127|uid=0 gid=1007#nutshell:/var # [ 172.285647] BUG: spinlock lockup on CPU#0, swapper/0, 983482f0
[ 172.291523] [<8003cb44>] (unwind_backtrace+0x0/0xf8) from [<801853e4>] (do_raw_spin_lock+0x100/0x164)
[ 172.300768] [<801853e4>] (do_raw_spin_lock+0x100/0x164) from [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60)
[ 172.310618] [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60) from [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092])
[ 172.321636] [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092]) from [<800abee0>] (handle_irq_event_percpu+0x50/0x184)
[ 172.332781] [<800abee0>] (handle_irq_event_percpu+0x50/0x184) from [<800ac050>] (handle_irq_event+0x3c/0x5c)
[ 172.342622] [<800ac050>] (handle_irq_event+0x3c/0x5c) from [<800ae00c>] (handle_level_irq+0xac/0xfc)
[ 172.351767] [<800ae00c>] (handle_level_irq+0xac/0xfc) from [<800ab82c>] (generic_handle_irq+0x2c/0x40)
[ 172.361090] [<800ab82c>] (generic_handle_irq+0x2c/0x40) from [<800552e8>] (mx3_gpio_irq_handler+0x78/0x140)
[ 172.370843] [<800552e8>] (mx3_gpio_irq_handler+0x78/0x140) from [<800ab82c>] (generic_handle_irq+0x2c/0x40)
[ 172.380595] [<800ab82c>] (generic_handle_irq+0x2c/0x40) from [<80036904>] (handle_IRQ+0x4c/0xac)
[ 172.389402] [<80036904>] (handle_IRQ+0x4c/0xac) from [<80035ad0>] (__irq_svc+0x50/0xd0)
[ 172.397416] [<80035ad0>] (__irq_svc+0x50/0xd0) from [<80036bb4>] (default_idle+0x28/0x2c)
[ 172.405603] [<80036bb4>] (default_idle+0x28/0x2c) from [<80036e9c>] (cpu_idle+0x9c/0x108)
[ 172.413793] [<80036e9c>] (cpu_idle+0x9c/0x108) from [<800088b4>] (start_kernel+0x294/0x2e4)
[ 172.422181] [<800088b4>] (start_kernel+0x294/0x2e4) from [<10008040>] (0x10008040)
[1] This tells you the function call relationships. Notice this line:
[ 172.310618] [<80350508>] (_raw_spin_lock_irqsave+0x54/0x60) from [<7f3cf4a0>] (mlb_os81092_interrupt+0x18/0x68 [os81092])
This shows that the mlb_os81092_interrupt function tries to take a spinlock with spin_lock_irqsave. So we can find out what this spinlock protects, and analyse the code and the logs to detect who is holding the lock, then find a way to avoid it.
[2] Also, because CPU#0 is locked up and this can be an MP system, you should check whether there is an IRQ that uses the same critical resource. If the IRQ handler is assigned to another CPU (like CPU1), it's OK, but if CPU0 handles the IRQ, this causes a deadlock when you use spin_lock instead of spin_lock_irqsave, so check that.
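To make point [2] concrete, here is a minimal sketch (not taken from the os81092 driver; the handler and variable names are made up) of the pattern that deadlocks when the IRQ fires on the CPU that already holds the lock, and of the spin_lock_irqsave() variant that avoids it:

#include <linux/spinlock.h>
#include <linux/interrupt.h>

static DEFINE_SPINLOCK(my_lock);
static int shared_counter;

/* IRQ handler: takes the same lock as process context. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
        spin_lock(&my_lock);            /* spins forever if this CPU already holds my_lock */
        shared_counter++;
        spin_unlock(&my_lock);
        return IRQ_HANDLED;
}

/* Process context, WRONG on the CPU that also services the IRQ: if the
 * interrupt arrives while my_lock is held, the handler spins on my_lock
 * and we never run again to release it -> spinlock lockup. */
static void update_counter_buggy(void)
{
        spin_lock(&my_lock);
        shared_counter++;
        spin_unlock(&my_lock);
}

/* Process context, correct: local interrupts are disabled while the lock
 * is held, so the handler cannot interrupt us on this CPU. */
static void update_counter_safe(void)
{
        unsigned long flags;

        spin_lock_irqsave(&my_lock, flags);
        shared_counter++;
        spin_unlock_irqrestore(&my_lock, flags);
}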

Related

raw_spin_lock(): unexpected null pointer exception

So I adapted some of the /kernel/sched/rt.c code to write my own simple CPU scheduler, and I'm getting a null pointer dereference exception when I try to acquire a lock. This is despite me printk()'ing all of the relevant pointers, and seeing that they're not NULL.
//Snippet from my adaptation of update_curr_rt()
//wrr_rq is a struct wrr_rq*
printk("Before loop, wrr_rq pointer is %p\n",wrr_rq);
printk("Before loop, &wrr_rq->wrr_runtime_lock is %p\n",&wrr_rq->wrr_runtime_lock);
for_each_sched_wrr_entity(wrr_se) {
printk("1\n");
wrr_rq = wrr_rq_of_se(wrr_se);
printk("2\n");
raw_spin_lock(&wrr_rq->wrr_runtime_lock);
printk("3\n");
[ 263.595176] Before loop, wrr_rq is 00000000aebb4d6d
[ 263.596283] Before loop, &wrr_rq->wrr_runtime_lock is 0000000015dee87f
[ 263.597764] 1
[ 263.598141] wrr_rq_of_se: called
[ 263.598888] 2
[ 263.599268] BUG: kernel NULL pointer dereference, address: 0000000000000068
[ 263.600836] #PF: supervisor write access in kernel mode
[ 263.602027] #PF: error_code(0x0002) - not-present page
...
[ 263.656134] RIP: 0010:_raw_spin_lock+0x7/0x20
I've printed all the relevant pointers and seen they're not NULL (and have values quite a bit above 0), but I still get this exception. I tried using the elixir browser to see what is happening with the raw_spin_lock() macro, and it doesn't seem like anything crazy is happening...
In addition, the runqueue lock is already held when this code is called (the runqueue lock is acquired by task_sched_runtime()).
Any thoughts appreciated.
Thanks.
Credit to @0andriy: it turns out that kernel NULL pointers printed with %p get hashed to some other unique value that may not look NULL, so when I printed them with %px I saw that they were in fact NULL.
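A minimal sketch of the difference (illustrative only, not the original scheduler code): %p hashes the pointer value before printing it, so depending on the kernel version even NULL can come out looking like a plausible address, while %px prints the raw value.

#include <linux/printk.h>

static void show_pointer_hashing(void)
{
        void *p = NULL;

        /* %p hashes the pointer for security, so NULL may be printed as an
         * innocuous-looking non-zero value. */
        pr_info("with %%p:  %p\n", p);

        /* %px prints the raw pointer, revealing that it really is NULL. */
        pr_info("with %%px: %px\n", p);
}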

How to debug/find cma allocation failure reason?

Are there any open-source debug methods/patches available for debugging CMA failures?
How can I find the reason for a CMA allocation failure?
1. When the CMA allocation fails, the kernel dumps a backtrace of the failure,
e.g.:
[ 35.360001] page:bef55be8 count:58 mapcount:56 mapping:bc4001dc index:0x3
[ 35.366855] flags: 0x8019040c(referenced|uptodate|arch_1|mappedtodisk|unevictable|mlocked)
[ 35.375173] raw: 8019040c bc4001dc 00000003 00000037 0000003a b9eb1a98 b9eb1a98 00000000
[ 35.383299] raw: be008c00
[ 35.385916] page dumped because: VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page))
[ 35.393995] page->mem_cgroup:be008c00
[ 35.397668] ------------[ cut here ]------------
[ 35.402281] kernel BUG at mm/vmscan.c:1350!
[ 35.406458] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 37.778079] Backtrace:
[ 37.780531] [<80360610>] (shrink_page_list) from [<803617c8>]
(reclaim_clean_pages_from_list+0x14c/0x1a8)
[ 37.790093] r10:b9c6fb88 r9:b9c6fb9c r8:b9c6fb0c r7:8141e100 r6:81216588 r5:b9c6fb9c
[ 37.797914] r4:bf05ffb8
[ 37.800444] [<8036167c>] (reclaim_clean_pages_from_list) from [<80352b2c>] (alloc_contig_range+0x17c/0x4e0)
[ 37.810178] r10:00000000 r9:8121e384 r8:814790c4 r7:b9c6e000 r6:0006a000 r5:00081a00
[ 37.817999] r4:b9c6fb9c
[ 37.820529] [<803529b0>] (alloc_contig_range) from [<803bd8c8>] (cma_alloc+0x154/0x5dc)
[ 37.828527] r10:00040000 r9:00017c00 r8:fffffff4 r7:00017c00 r6:8147bf24 r5:00009e00
[ 37.836347] r4:00069e00
[ 37.838878] [<803bd774>] (cma_alloc) from [<80694188>] (dma_alloc_from_contiguous+0x40/0x44)
[ 37.847310] r10:00000000 r9:80607f30 r8:b9c6fd64 r7:00017c00 r6:17c00000 r5:81216588
[ 37.855131] r4:00000001
[ 37.857661] [<80694148>] (dma_alloc_from_contiguous) from [<80218720>] (__alloc_from_contiguous+0x54/0x144)
[ 37.867396] [<802186cc>] (__alloc_from_contiguous) from [<80218854>] (cma_allocator_alloc+0x44/0x4c)
[ 37.876523] r10:00000000 r9:b9c6fe08 r8:81216588 r7:00c00000 r6:b94d0140 r5:80607f30
[ 37.884343] r4:00000001
[ 37.886870] [<80218810>] (cma_allocator_alloc) from [<80217e28>] (__dma_alloc+0x19c/0x2e4)
[ 37.895125] r5:bd2da400 r4:014000c0
[ 37.898695] [<80217c8c>] (__dma_alloc) from [<80218000>] (arm_dma_alloc+0x4c/0x54)
[ 37.906258] r10:00000080 r9:17c00000 r8:80c01778 r7:bd2da400 r6:8148ff6c r5:00c00000
[ 37.914079] r4:00000707
[ 37.916608] [<80217fb4>] (arm_dma_alloc) from [<80607f30>]
[ 37.924690] r5:81490278 r4:81216588
You can debug the CMA allocation failure using this backtrace.
Also check /proc/pagetypeinfo before and after the allocation; it will give you a hint whether the pages have returned to their initial state.
To get information about the pages, refer to this link:
link
This is a stable kernel bug; here is the patch link.
According to the link:
The CMA mechanism has the following weaknesses.
Allocation failure
CMA can fail to allocate contiguous memory for the following reasons.
1-1. Direct pinning
Any kernel thread can pin any movable page for a long time. If a movable page which needs to be migrated for a contiguous memory allocation is already pinned by someone, the migration cannot be completed. In consequence, the contiguous allocation can fail if the page is not unpinned for a long time.
1-2. Indirect pin
If a movable page has a dependency on an object, the object increases the reference count of the movable page to assert that it is safe to use the page. If a movable page that needs to be migrated for a contiguous memory allocation is in this situation, the page cannot be freed for use by the contiguous allocation.
In consequence, the contiguous allocation can fail.
In short, CMA guarantees neither success nor low latency for contiguous memory allocation, and the core cause is that the second-class clients CMA chose (movable pages) are not nice enough (they are hard to migrate or discard).
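Because of this, callers are usually written to tolerate CMA failure. Below is a minimal sketch, not from any particular driver (the function name, sizes and fallback strategy are made up), of a contiguous DMA allocation that falls back to a smaller buffer when the CMA-backed allocation fails:

#include <linux/dma-mapping.h>

static void *alloc_dma_buffer(struct device *dev, size_t want, size_t *got,
                              dma_addr_t *dma_handle)
{
        void *buf;

        buf = dma_alloc_coherent(dev, want, dma_handle, GFP_KERNEL);
        if (buf) {
                *got = want;
                return buf;
        }

        /* CMA could not migrate/reclaim enough pages; retry with half the size. */
        buf = dma_alloc_coherent(dev, want / 2, dma_handle, GFP_KERNEL);
        if (buf)
                *got = want / 2;
        return buf;
}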

USB crash results in "nobody cared" warning

I am working on a USB crash-related issue on my board, which has a USB 2.0 based HCI. The issue looks like this:
[ 1.691533] irq 36: nobody cared (try booting with the "irqpoll" option)
[ 1.698242] CPU: 0 PID: 87 Comm: kworker/0:1 Not tainted 4.9.88 #24
[ 1.704509] Hardware name: Freescale i.MX8QXP MEK (DT)
[ 1.709659] Workqueue: pm pm_runtime_work
[ 1.713675] Call trace:
[ 1.716123] [<ffff0000080897d0>] dump_backtrace+0x0/0x1b0
[ 1.721523] [<ffff000008089994>] show_stack+0x14/0x20
[ 1.726582] [<ffff0000083daff0>] dump_stack+0x94/0xb4
[ 1.731638] [<ffff00000810f064>] __report_bad_irq+0x34/0xf0
[ 1.737212] [<ffff00000810f4ec>] note_interrupt+0x2e4/0x330
[ 1.742790] [<ffff00000810c594>] handle_irq_event_percpu+0x44/0x58
[ 1.748974] [<ffff00000810c5f0>] handle_irq_event+0x48/0x78
[ 1.754553] [<ffff0000081100a8>] handle_fasteoi_irq+0xc0/0x1b0
[ 1.760390] [<ffff00000810b584>] generic_handle_irq+0x24/0x38
[ 1.766141] [<ffff00000810bbe4>] __handle_domain_irq+0x5c/0xb8
[ 1.771979] [<ffff000008081798>] gic_handle_irq+0x70/0x15c
[ 1.807416] 7a40: 00000000000002ba ffff80002645bf00 00000000fa83b2da 0000000001fe116e
[ 1.815252] 7a60: ffff000088bf7c47 ffffffffffffffff 00000000000003f8 ffff0000085c47b8
[ 1.823088] 7a80: 0000000000000010 ffff800026484600 0000000000000001 ffff8000266e9718
[ 1.830925] 7aa0: ffff00000b8b0008 ffff800026784280 ffff00000b8b000c ffff00000b8d8018
[ 1.838760] 7ac0: 0000000000000001 ffff000008b76000 0000000000000000 ffff800026497b20
[ 1.846596] 7ae0: ffff00000810bd24 ffff800026497b20 ffff000008851d18 0000000000000145
[ 1.854433] 7b00: ffff000008b8d6c0 ffff0000081102d8 ffffffffffffffff ffff00000810dda8
[ 1.862268] [<ffff000008082eec>] el1_irq+0xac/0x120
[ 1.867155] [<ffff000008851d18>] _raw_spin_unlock_irqrestore+0x18/0x48
[ 1.873684] [<ffff00000810bd24>] __irq_put_desc_unlock+0x1c/0x48
[ 1.879695] [<ffff00000810de10>] enable_irq+0x48/0x70
[ 1.884756] [<ffff0000085ba8f8>] cdns3_enter_suspend+0x1f0/0x440
[ 1.890764] [<ffff0000085baca0>] cdns3_runtime_suspend+0x48/0x88
[ 1.896776] [<ffff0000084cf398>] pm_generic_runtime_suspend+0x28/0x40
[ 1.903223] [<ffff0000084dc3e8>] genpd_runtime_suspend+0x88/0x1d8
[ 1.909320] [<ffff0000084d0e08>] __rpm_callback+0x70/0x98
[ 1.914724] [<ffff0000084d0e50>] rpm_callback+0x20/0x88
[ 1.919954] [<ffff0000084d1b2c>] rpm_suspend+0xf4/0x4c8
[ 1.925184] [<ffff0000084d20fc>] rpm_idle+0x124/0x168
[ 1.930240] [<ffff0000084d26c0>] pm_runtime_work+0xa0/0xb8
[ 1.935732] [<ffff0000080dc1dc>] process_one_work+0x1dc/0x380
[ 1.941481] [<ffff0000080dc3c8>] worker_thread+0x48/0x4d0
[ 1.946885] [<ffff0000080e2408>] kthread+0xf8/0x100
[ 1.957080] handlers:
[ 1.959350] [<ffff0000085ba668>] cdns3_irq
[ 1.963449] Disabling IRQ #36
After a small study of this kind of crash, I learned that the kernel disables the IRQ line because the interrupt went unhandled nearly 100000 times.
I have a Linux BSP code base stuffed with many unwanted components, and that code does not show the above crash. Once I removed all the unwanted components, this crash started showing up during boot. The tricky part is that the crash does not happen every time: the IRQ handler returns IRQ_HANDLED about 7 times out of 10 boots, and in those 7 cases no crash happens.
I added a print in the IRQ handler and, possibly because of the delay the print introduces, the crash did not occur even after 15 boot attempts.
Does anyone have any idea what is actually happening?
First, AFAIK it is not a crash, just a warning. Second, you have already answered your own question :) This is the situation where none of the registered IRQ handlers cared about the interrupt. As you said, when the handler returns IRQ_HANDLED (7 out of 10) the message does not appear, which means that 3 times out of 10 the interrupt handler returns something other than IRQ_HANDLED or IRQ_WAKE_THREAD. Just check in the sources when that happens, i.e. when it returns anything other than IRQ_HANDLED or IRQ_WAKE_THREAD.
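A handler that returns the right value typically reads its interrupt status register and claims the interrupt only when its device actually raised it. A minimal sketch (the structure, register offsets and names are hypothetical; this is not the cdns3 driver):

#include <linux/interrupt.h>
#include <linux/io.h>

/* Hypothetical device state and register offsets, for illustration only. */
struct my_usb_ctrl {
        void __iomem *regs;
};
#define MY_IRQ_STATUS   0x00
#define MY_IRQ_ACK      0x04

static irqreturn_t my_usb_irq(int irq, void *dev_id)
{
        struct my_usb_ctrl *ctrl = dev_id;
        u32 status = readl(ctrl->regs + MY_IRQ_STATUS);

        if (!status)
                return IRQ_NONE;        /* not ours: the core counts it as unhandled */

        writel(status, ctrl->regs + MY_IRQ_ACK);        /* acknowledge the source */
        /* ... process the event ... */
        return IRQ_HANDLED;             /* claimed: no "nobody cared" accounting */
}

If the status register reads as zero while the line stays asserted, the handler keeps returning IRQ_NONE, and after the interrupt has gone unhandled nearly 100000 times the core prints this warning and disables the IRQ.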

What are the reasons for DMA timeouts?

I see a timeout happening when trying to allocate a DMA memory region. I can remove this bug by using GFP_ATOMIC instead of GFP_KERNEL as the GFP flags, so that the DMA allocation becomes non-interruptible, but I wonder what the reasons for such a timeout are. Is the requested memory region not known to the system bus? Is the bus saturated?
[ 87.400000] [<c0138eec>] (schedule_bug) from [<c05a0774>] (schedule+0x3c/0x528)
[ 87.410000] [<c05a0774>] (__schedule) from [<c05a0d0c>] (schedule+0xac/0xcc)
[ 87.410000] [<c05a0d0c>] (schedule) from [<c05a3e20>] (schedule_timeout+0x20/0x2b8)
[ 87.420000] [<c05a3e20>] (schedule_timeout) from [<c05a1804>] (wait_for_common+0xf8/0x1a8)
[ 87.430000] [<c05a1804>] (wait_for_common) from [<c012c540>] (flush_work+0x174/0x1ac)
[ 87.450000] [<c012c540>] (flush_work) from [<c01a0648>] (drain_all_pages+0x108/0x130)
[ 87.460000] [<c01a0648>] (drain_all_pages) from [<c01d6d34>] (start_isolate_page_range+0xbc/0x284)
[ 87.470000] [<c01d6d34>] (start_isolate_page_range) from [<c01a3310>] (alloc_contig_range+0xdc/0x330)
[ 87.480000] [<c01a3310>] (alloc_contig_range) from [<c01d7658>] (cma_alloc+0x170/0x308)
[ 87.490000] [<c01d7658>] (cma_alloc) from [<c011142c>] (__alloc_from_contiguous+0x40/0xd8)
[ 87.500000] [<c011142c>] (__alloc_from_contiguous) from [<c0111500>] (cma_allocator_alloc+0x3c/0x44)
[ 87.510000] [<c0111500>] (cma_allocator_alloc) from [<c010f86c>] (__dma_alloc+0x1d4/0x2fc)
[ 87.520000] [<c010f86c>] (__dma_alloc) from [<c010fa0c>] (arm_dma_alloc+0x3c/0x48)
[ 87.530000] [<c010fa0c>] (arm_dma_alloc) from [<c03ae018>] (tsg_ioctl+0x3e4/0x954)
Most likely, the reason is that you are calling a function that can sleep (arm_dma_alloc) from atomic context, where sleeping is forbidden. You are probably making this allocation under a spin_lock.
spin_lock calls preempt_disable, which means no rescheduling is possible; that is how spin_lock works. If you call a sleeping primitive between spin_lock and spin_unlock and a reschedule happens, the kernel scheduler subsystem will warn you that you rescheduled in atomic context (which is what you are seeing in this message).
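A minimal sketch of the two patterns (hypothetical driver code; the lock, function names and the 1 MiB size are made up): either do the potentially sleeping allocation before taking the spinlock, or, if the allocation really must happen in atomic context, use GFP_ATOMIC and accept that it can fail more easily.

#include <linux/dma-mapping.h>
#include <linux/spinlock.h>
#include <linux/sizes.h>

static DEFINE_SPINLOCK(dev_lock);

/* WRONG: dma_alloc_coherent() with GFP_KERNEL may sleep, but spin_lock()
 * disabled preemption, so the scheduler warns about sleeping in atomic context. */
static void *alloc_buggy(struct device *dev, dma_addr_t *handle)
{
        void *buf;

        spin_lock(&dev_lock);
        buf = dma_alloc_coherent(dev, SZ_1M, handle, GFP_KERNEL);
        spin_unlock(&dev_lock);
        return buf;
}

/* Better: allocate outside the critical section, then take the lock only
 * to publish the buffer into the shared driver state. */
static void *alloc_ok(struct device *dev, dma_addr_t *handle)
{
        void *buf = dma_alloc_coherent(dev, SZ_1M, handle, GFP_KERNEL);

        spin_lock(&dev_lock);
        /* ... attach buf to the driver state ... */
        spin_unlock(&dev_lock);
        return buf;
}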

blocked for more than 120 seconds

I am trying to write a block device driver that reads/writes blocks from/to a network socket. At some point, when reading multiple blocks, the application that uses this driver seems to hang (but still accepts input, even though it does nothing with it), while the system in general seems responsive. dmesg shows the following message, and overall I cannot use the driver for anything, even if I start another application that uses it.
I am using linux kernel v3.9.
Anyone can help fix this?
[ 489.779458] INFO: task xxd:2939 blocked for more than 120 seconds.
[ 489.779466] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 489.779469] xxd D 0000000000000000 0 2939 2237 0x00000006
[ 489.779475] ffff8801912a9998 0000000000000046 02fc000000000008 ffff8801bfff7000
[ 489.779479] ffff8801b2ef45f0 ffff8801912a9fd8 ffff8801912a9fd8 ffff8801912a9fd8
[ 489.779482] ffff8801b61e9750 ffff8801b2ef45f0 ffff8801912a9998 ffff8801b8e34af8
[ 489.779485] Call Trace:
[ 489.779497] [<ffffffff81131ad0>] ? __lock_page+0x70/0x70
[ 489.779505] [<ffffffff816e86a9>] schedule+0x29/0x70
[ 489.779510] [<ffffffff816e877f>] io_schedule+0x8f/0xd0
[ 489.779514] [<ffffffff81131ade>] sleep_on_page+0xe/0x20
[ 489.779518] [<ffffffff816e654a>] __wait_on_bit_lock+0x5a/0xc0
[ 489.779522] [<ffffffff811348aa>] ? find_get_pages+0xca/0x150
[ 489.779526] [<ffffffff81131ac7>] __lock_page+0x67/0x70
[ 489.779531] [<ffffffff8107fa50>] ? autoremove_wake_function+0x40/0x40
[ 489.779536] [<ffffffff81140bd2>] truncate_inode_pages_range+0x4b2/0x4c0
[ 489.779540] [<ffffffff81140c65>] truncate_inode_pages+0x15/0x20
[ 489.779545] [<ffffffff811d331c>] kill_bdev+0x2c/0x40
[ 489.779548] [<ffffffff811d3931>] __blkdev_put+0x71/0x1c0
[ 489.779552] [<ffffffff811aeb48>] ? __d_free+0x48/0x70
[ 489.779556] [<ffffffff811d3adb>] blkdev_put+0x5b/0x160
[ 489.779559] [<ffffffff811d3c05>] blkdev_close+0x25/0x30
[ 489.779564] [<ffffffff8119b16a>] __fput+0xba/0x240
[ 489.779568] [<ffffffff8119b2fe>] ____fput+0xe/0x10
[ 489.779572] [<ffffffff8107ba18>] task_work_run+0xc8/0xf0
[ 489.779577] [<ffffffff8105f797>] do_exit+0x2c7/0xa70
[ 489.779581] [<ffffffff8106f32e>] ? send_sig_info+0x1e/0x20
[ 489.779585] [<ffffffff8106f34c>] ? send_sig+0x1c/0x20
[ 489.779588] [<ffffffff8105ffd4>] do_group_exit+0x44/0xa0
[ 489.779592] [<ffffffff8106fe00>] get_signal_to_deliver+0x230/0x600
[ 489.779600] [<ffffffff81014398>] do_signal+0x58/0x8e0
[ 489.779605] [<ffffffff81014ca0>] do_notify_resume+0x80/0xc0
[ 489.779608] [<ffffffff816f241a>] int_signal+0x12/0x17
I had the synchronization around the socket done wrong. This caused race conditions that left some requests unserved, and those unserved requests caused the process to hang.
Adding some mutexes (not semaphores) fixed this.
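For illustration, here is a minimal sketch (hypothetical names, not the original driver) of serializing the socket I/O with a mutex so that concurrent requests cannot interleave their sends and receives:

#include <linux/mutex.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/uio.h>

/* Hypothetical per-device state for a network-backed block driver. */
struct netblk_dev {
        struct socket *sock;
        struct mutex   sock_lock;       /* serializes one request/response exchange */
};

static int netblk_do_request(struct netblk_dev *dev,
                             struct kvec *out, struct kvec *in, size_t len)
{
        struct msghdr msg = { };
        int ret;

        /* mutex_lock() may sleep, which is fine here: we are in process
         * context and kernel_sendmsg()/kernel_recvmsg() sleep anyway. */
        mutex_lock(&dev->sock_lock);
        ret = kernel_sendmsg(dev->sock, &msg, out, 1, len);
        if (ret >= 0)
                ret = kernel_recvmsg(dev->sock, &msg, in, 1, len, MSG_WAITALL);
        mutex_unlock(&dev->sock_lock);

        return ret < 0 ? ret : 0;
}

A mutex (rather than a spinlock) fits here because the socket calls can block; a spinlock would reintroduce the sleeping-in-atomic-context problem discussed above.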
