Windbg: break on timer / scheduler interrupt and print EIP - windows

Is there a way to set up a breakpoint on the Windows interrupt service routine that is responsible for triggering thread scheduling, and to print the EIP of the thread that was interrupted?
I tried hal!HalpClockInterrupt, but it seems it's not the right place. nt!KeUpdateRunTime seems better:
Breakpoint 3 hit
nt!KeUpdateRunTime:
805410dc a11cf0dfff mov eax,dword ptr ds:[FFDFF01Ch]
kd> !thread
THREAD 82c23bf0 Cid 0320.0474 Teb: 7ffa2000 Win32Thread: 00000000 RUNNING on processor 0
Impersonation token: e1c1f990 (Level Impersonation)
Owning Process 0 Image: <Unknown>
Attached Process 82c2dca0 Image: svchost.exe
Wait Start TickCount 6298 Ticks: 14 (0:00:00:00.218)
Context Switch Count 64 IdealProcessor: 0
UserTime 00:00:00.453
KernelTime 00:00:04.312
Win32 Start Address 0x7730a5f7
Start Address 0x7c8106f9
Stack Init f4dc1000 Current f4dc0d34 Base f4dc1000 Limit f4dbe000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 0
ChildEBP RetAddr Args to Child
f4dc0d54 805410ae 00000000 000000d1 0197fb94 nt!KeUpdateRunTime (FPO: [1,1,0])
f4dc0d54 806d2c9e 00000000 000000d1 0197fb94 nt!KeUpdateSystemTime+0x13e (FPO: [0,2] TrapFrame @ f4dc0cdc)
f4dc0d54 805410ae 00000000 000000d1 0197fb94 hal!HalEndSystemInterrupt+0x4e (FPO: [2,2,0])
f4dc0d54 77306f5f 00000000 000000d1 0197fb94 nt!KeUpdateSystemTime+0x13e (FPO: [0,2] TrapFrame @ f4dc0d64)
WARNING: Frame IP not in any known module. Following frames may be wrong.
0197fb94 77308dc1 0197fbdc 025c1ec0 03478e70 0x77306f5f
0197fbbc 77309b4a 0197fbdc 00000000 00000001 0x77308dc1
0197ff18 7730a711 02560008 00000000 00000000 0x77309b4a
0197ffb4 7c80b729 00000000 00000000 00000000 0x7730a711
0197ffec 00000000 7730a5f7 00000000 00000000 0x7c80b729
The question of how to get EIP is still open. WinDbg seems to know how to do it, but I would like to understand how. It seems a _KTRAP_FRAME is at _KTHREAD->KernelStack - 4.

You are very close. Because the currently running thread was interrupted, the KTRAP_FRAME (the saved registers of the interrupted thread) is already on the stack at that point (when nt!KeUpdateSystemTime() is called).
(Note: live kernel debugging on Windows XP SP3 x86.)
Reload the hal symbols, list the breakpoints, and go:
0: kd>.reload /f hal
0: kd> bl
0 e 805450d0 0001 (0001) nt!KeUpdateSystemTime
1 e 806e5e54 0001 (0001) hal!HalpClockInterrupt
0: kd> g
OK, the breakpoint is hit at nt!KeUpdateSystemTime:
Breakpoint 0 hit
nt!KeUpdateSystemTime:
805450d0 b90000dfff mov ecx,0FFDF0000h
Let's look at the stack, including FPO and trap frame information:
0: kd> kv
ChildEBP RetAddr Args to Child
afb47d64 004482ef badb0d00 01bbb9c4 00000000 nt!KeUpdateSystemTime (FPO: [0,2] TrapFrame @ afb47d64)
WARNING: Stack unwind information not available. Following frames may be wrong.
01f9d814 004483f1 01bb0020 01bbb9c4 000006a2 gfsvc32+0x482ef
01f9d828 004488ef 02c108c0 00081000 000003e8 gfsvc32+0x483f1
01f9d890 0044dc92 000102ee 01f9fd8c 02c108c0 gfsvc32+0x488ef
01f9feac 00437c59 000102ee 00000c90 00000000 gfsvc32+0x4dc92
01f9ffb4 7c80b729 00c9cb40 01e9fffc 00000020 gfsvc32+0x37c59
01f9ffe0 7c80b72f 00000000 00000000 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])
01f9ffe4 00000000 00000000 00000000 004a6727 kernel32!BaseThreadStart+0x3d (FPO: [Non-Fpo])
A userland thread was interrupted, and the trap frame is at 0xafb47d64. Let's see the thread:
0: kd> !thread
THREAD 8a3702e8 Cid 0c90.0cf8 Teb: 7ffd5000 Win32Thread: e198a360 RUNNING on processor 0
Not impersonating
DeviceMap e1f236f0
Owning Process 0 Image: <Unknown>
Attached Process 89e7fda0 Image: testk.exe
Wait Start TickCount 21252 Ticks: 2 (0:00:00:00.031)
Context Switch Count 45160 IdealProcessor: 0 LargeStack
UserTime 00:00:18.281
KernelTime 00:00:20.125
Win32 Start Address 0x004a6727
Start Address kernel32!BaseThreadStartThunk (0x7c810729)
Stack Init afb48000 Current afb479c4 Base afb48000 Limit afb44000 Call 0
Priority 13 BasePriority 13 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr Args to Child
afb47d64 004482ef badb0d00 01bbb9c4 00000000 nt!KeUpdateSystemTime (FPO: [0,2] TrapFrame @ afb47d64)
WARNING: Stack unwind information not available. Following frames may be wrong.
01f9d814 004483f1 01bb0020 01bbb9c4 000006a2 gfsvc32+0x482ef
01f9d828 004488ef 02c108c0 00081000 000003e8 gfsvc32+0x483f1
01f9d890 0044dc92 000102ee 01f9fd8c 02c108c0 gfsvc32+0x488ef
01f9feac 00437c59 000102ee 00000c90 00000000 gfsvc32+0x4dc92
01f9ffb4 7c80b729 00c9cb40 01e9fffc 00000020 gfsvc32+0x37c59
01f9ffe0 7c80b72f 00000000 00000000 00000000 kernel32!BaseThreadStart+0x37 (FPO: [Non-Fpo])
01f9ffe4 00000000 00000000 00000000 004a6727 kernel32!BaseThreadStart+0x3d (FPO: [Non-Fpo])
So when the thread is interrupted, hal!HalpClockInterrupt() gets called (see !idt -a for the ISRs) and a trap frame is built. The trap frame pointer is currently in the ebp register:
0: kd> r @ebp
ebp=afb47d64
So, EBP = pointer to KTRAP_FRAME = 0xafb47d64
A trap frame is like a "context" structure, as it holds all the registers of the interrupted thread. Let's see what the offset of Eip is:
0: kd> dt nt!_ktrap_frame eip
+0x068 Eip : Uint4B
EIP is at offset 0x68 in the KTRAP_FRAME structure. Just apply the offset:
0: kd> dd @ebp+0x68 L1
afb47dcc 004482ef
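The same address arithmetic can be sketched outside the debugger. This is only an illustration: the trap frame address and the member offsets are copied from the dt and register output of this XP SP3 x86 session, the names KTRAP_FRAME_OFFSETS and member_address are made up for the example, and the offsets are likely to differ on other Windows builds.

```python
# Offsets as reported by `dt nt!_ktrap_frame` in this answer (XP SP3 x86 only).
KTRAP_FRAME_OFFSETS = {"Edx": 0x3C, "Ecx": 0x40, "Eax": 0x44, "Eip": 0x68}


def member_address(trap_frame: int, member: str) -> int:
    """Return the 32-bit address of a saved register inside a trap frame."""
    return (trap_frame + KTRAP_FRAME_OFFSETS[member]) & 0xFFFFFFFF


trap_frame = 0xAFB47D64  # value of EBP at the breakpoint
print(f"Eip is saved at {member_address(trap_frame, 'Eip'):08x}")
```

The computed address is the one the dd command reads from; the value stored there is the interrupted EIP.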
The user-land thread was interrupted while at EIP = 0x4482ef. Let's confirm this using the '.trap' command (could have been '.trap afb47d64' rather than using @ebp):
0: kd> .trap @ebp
ErrCode = 00000000
eax=00002ba2 ebx=00c9cb40 ecx=01bb0020 edx=01bbb9c4 esi=00c9cb40 edi=01e9fffc
eip=004482ef esp=01f9d814 ebp=01f9d814 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=0030 gs=0000 efl=00000202
gfsvc32+0x482ef:
001b:004482ef eb07 jmp gfsvc32+0x482f8 (004482f8)
BTW, you can easily see how the trap frame is constructed in hal!HalpClockInterrupt() by disassembling it:
0: kd> u hal!HalpClockInterrupt L0n10
hal!HalpClockInterrupt:
806e5e54 54 push esp
806e5e55 55 push ebp
806e5e56 53 push ebx
806e5e57 56 push esi
806e5e58 57 push edi
806e5e59 83ec54 sub esp,54h
806e5e5c 8bec mov ebp,esp
806e5e5e 89442444 mov dword ptr [esp+44h],eax
806e5e62 894c2440 mov dword ptr [esp+40h],ecx
806e5e66 8954243c mov dword ptr [esp+3Ch],edx
See how the offsets above correspond to the KTRAP_FRAME member offsets:
0: kd> dt nt!_ktrap_frame eax
+0x044 Eax : Uint4B
0: kd> dt nt!_ktrap_frame ecx
+0x040 Ecx : Uint4B
0: kd> dt nt!_ktrap_frame edx
+0x03c Edx : Uint4B
Hope it answers your question.
-- edit --
As my example was on Win XP SP3, you might see different function names on other Windows systems.
Here is an example on Win8.1 (x86). If you can't find the clock interrupt function name, I'd check the IDT first:
0: kd> !idt -a
[...snip...]
6b2ac55a000000d1: 81a237c8 hal!HalpTimerClockInterrupt
6b2ac55a000000d2: 81a23aa4 hal!HalpTimerClockIpiRoutine
[...snip...]
Only two of the 256 vectors have "clock" in their names (notice that one is for IPI [Inter Processor Interrupt], the other one is the usual clock interrupt).
I'd go for hal!HalpTimerClockInterrupt, step into this function, and see which functions are called later.
It turns out you can break on the nt!KiUpdateTime or nt!KiUpdateRunTime functions:
0: kd> !thread
THREAD 9d0af680 Cid 0bec.0bf0 Teb: 7f8ae000 Win32Thread: 9ce51470 RUNNING on processor 0
Not impersonating
DeviceMap a0971118
Owning Process 9d161c40 Image: calc.exe
Attached Process N/A Image: N/A
Wait Start TickCount 63249 Ticks: 3 (0:00:00:00.046)
Context Switch Count 66956 IdealProcessor: 0
UserTime 00:01:12.609
KernelTime 00:00:01.281
Win32 Start Address calc!WinMainCRTStartup (0x003db8d4)
Stack Init ac49bfe0 Current ac49be04 Base ac49c000 Limit ac499000 Call 0
Priority 10 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
ChildEBP RetAddr Args to Child
ac49bcf4 81ad2ef6 81c63c50 00000002 00000000 nt!KiUpdateRunTime (FPO: [Non-Fpo])
ac49bd40 81bdf7a7 ac49be38 ffd0fc98 00000002 nt!KiUpdateTime+0x23c (FPO: [Non-Fpo])
ac49bd90 81a134ae 81a10858 ffffffff ac49beb8 nt!KeClockInterruptNotify+0x67 (FPO: [0,15,4])
ac49bda0 81a23993 00000002 000000d1 00000000 hal!HalpTimerClockInterruptCommon+0x3e (FPO: [0,0,4])
ac49bda0 81a10858 00000002 000000d1 00000000 hal!HalpTimerClockInterrupt+0x1cb (FPO: [0,2] TrapFrame @ ac49be38)
ac49beb8 81a239f3 00000000 ac49bf54 00200006 hal!HalEndSystemInterrupt+0xe8 (FPO: [Non-Fpo])
ac49beb8 0041be09 00000000 ac49bf54 00200006 hal!HalpTimerClockInterrupt+0x22b (FPO: [0,2] TrapFrame @ ac49bf54)
0094c978 003c55f2 00000000 00000031 00ad55bc calc!WindowsCodecs_NULL_THUNK_DATA_DLB+0x79
0094c994 003c586b 00aded98 0094c9b8 003c599d calc!CUIController::displayEvent+0x76 (FPO: [1,1,4])
0094c9a0 003c599d 00ad5574 00adeea0 00aded98 calc!CDisplayEvent::deliver+0x1a (FPO: [Non-Fpo])
0094c9b8 003d5177 00aded98 5b5012f1 00000000 calc!CEventRegistry::fire+0x28 (FPO: [Non-Fpo])
0094c9e4 003d575a 00aded98 00adeea0 03bf38ec calc!CCalculatorState::SetBinaryDigitDisplay+0x75 (FPO: [Non-Fpo])
(Side note: don't pay too much attention to the two trap frames above. It seems the first routine was interrupted as soon as it re-enabled interrupts using the STI instruction, so there are two trap frames rather than just one.)

Related

rcu_sched kthread timer wakeup didn't happen for x jiffies Allwinner sun8i

I'm working on my own distribution for the OrangePI R1 with an Allwinner sun8i SoC. I stripped down kernel_defconfig to fit my custom Linux into 16M SPI NOR. After leaving the board up for a few days, I see messages like these on my serial console.
admin@orange-pi-r1:~# [65779.614485] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[65779.620458] rcu: 3-...!: (4 GPs behind) idle=200/0/0x0 softirq=17870/17870 fqs=0 (false positive?)
[65779.629630] (detected by 2, t=2103 jiffies, g=68925, q=83)
[65779.635224] Sending NMI from CPU 2 to CPUs 3:
[65779.639605] NMI backtrace for cpu 3
[65779.639619] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 5.15.35 #1
[65779.639636] Hardware name: Allwinner sun8i Family
[65779.639644] PC is at 0xc0106330
[65779.639651] LR is at 0xc0106340
[65779.639657] pc : [<c0106330>] lr : [<c0106340>] psr: 60000013
[65779.639669] sp : c0cadfa8 ip : 00000000 fp : c0805f90
[65779.639679] r10: c0cadfb8 r9 : 410fc075 r8 : c0805f4c
[65779.639689] r7 : c0cac000 r6 : c0cac000 r5 : 00000000 r4 : 00000000
[65779.639701] r3 : c0113ca0 r2 : 12e10204 r1 : 00000000 r0 : 12e10204
[65779.639714] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[65779.639729] Control: 10c5387d Table: 42a7c06a DAC: 00000051
[65779.639738] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G O 5.15.35 #1
[65779.639754] Hardware name: Allwinner sun8i Family
[65779.639767] Function entered at [<c010c0b8>] from [<c0108bf4>]
[65779.639779] Function entered at [<c0108bf4>] from [<c0540794>]
[65779.639791] Function entered at [<c0540794>] from [<c037e2e0>]
[65779.639803] Function entered at [<c037e2e0>] from [<c010ad00>]
[65779.639814] Function entered at [<c010ad00>] from [<c010ad50>]
[65779.639825] Function entered at [<c010ad50>] from [<c016d288>]
[65779.639837] Function entered at [<c016d288>] from [<c0167b94>]
[65779.639848] Function entered at [<c0167b94>] from [<c0168218>]
[65779.639860] Function entered at [<c0168218>] from [<c038dff0>]
[65779.639871] Function entered at [<c038dff0>] from [<c0100b7c>]
[65779.639881] Exception stack(0xc0cadf58 to 0xc0cadfa0)
[65779.639896] df40: 12e10204 00000000
[65779.639915] df60: 12e10204 c0113ca0 00000000 00000000 c0cac000 c0cac000 c0805f4c 410fc075
[65779.639934] df80: c0cadfb8 c0805f90 00000000 c0cadfa8 c0106340 c0106330 60000013 ffffffff
[65779.639946] Function entered at [<c0100b7c>] from [<c0106330>]
[65779.639957] Function entered at [<c0106330>] from [<c0546b94>]
[65779.639969] Function entered at [<c0546b94>] from [<c0148244>]
[65779.639980] Function entered at [<c0148244>] from [<c0148680>]
[65779.639991] Function entered at [<c0148680>] from [<401014d0>]
[65779.640602] rcu: rcu_sched kthread timer wakeup didn't happen for 2103 jiffies! g68925 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402
[65779.845052] rcu: Possible timer handling issue on cpu=1 timer-softirq=33908
[65779.852117] rcu: rcu_sched kthread starved for 2125 jiffies! g68925 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=1
[65779.862406] rcu: Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
[65779.871383] rcu: RCU grace-period kthread stack dump:
[65779.876447] task:rcu_sched state:I stack: 0 pid: 12 ppid: 2 flags:0x00000000
[65779.884836] Function entered at [<c0543560>] from [<c0543780>]
[65779.890688] Function entered at [<c0543780>] from [<c05462f4>]
[65779.896539] Function entered at [<c05462f4>] from [<c0176720>]
[65779.902391] Function entered at [<c0176720>] from [<c01791e4>]
[65779.908241] Function entered at [<c01791e4>] from [<c013cc18>]
[65779.914093] Function entered at [<c013cc18>] from [<c0100130>]
[65779.919942] Exception stack(0xc0c71fb0 to 0xc0c71ff8)
[65779.925008] 1fa0: 00000000 00000000 00000000 00000000
[65779.933212] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[65779.941415] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
What might be causing such output? How can I debug this?

Cannot connect UART to RB6 on PIC24FJ128GA204

Using the RPOR registers, I can successfully connect RB3 or RB15 or other pins to a UART (1-4) ... but not RB6. I don't see anything in the documentation or errata that say RB6 (RP6) is uniquely unavailable. Any guesses?
Here are my RPOR registers when I have RB3, RB6, and RC3 all connected to UART0. RB3 and RC3 operate correctly, but RB6 only operates as a digital output.
03D6 RPOR0 0x0000 0 00000000 00000000 '..'
03D8 RPOR1 0x0300 768 00000011 00000000 '..'
03DA RPOR2 0x0000 0 00000000 00000000 '..'
03DC RPOR3 0x0003 3 00000000 00000011 '..'
03DE RPOR4 0x0000 0 00000000 00000000 '..'
03E0 RPOR5 0x0000 0 00000000 00000000 '..'
03E2 RPOR6 0x0000 0 00000000 00000000 '..'
03E4 RPOR7 0x0000 0 00000000 00000000 '..'
03E6 RPOR8 0x0000 0 00000000 00000000 '..'
03E8 RPOR9 0x0300 768 00000011 00000000 '..'
03EA RPOR10 0x0000 0 00000000 00000000 '..'
03EC RPOR11 0x0700 1792 00000111 00000000 '..'
03EE RPOR12 0x0008 8 00000000 00001000 '..'
Here is how PORTB is set up:
018A TRISB 0x22A2 8866 00100010 10100010 '"¢'
018C PORTB 0x00C8 200 00000000 11001000 '.È'
018E LATB 0x0040 64 00000000 01000000 '.@'
0190 ODCB 0x0000 0 00000000 00000000 '..'
0192 ANSB 0x2000 8192 00100000 00000000 '..'
... and here are the CONFIG bits:
_CONFIG1(JTAGEN_OFF & GCP_OFF & GWRP_OFF & ICS_PGx1 & FWDTEN_ON & WINDIS_OFF & FWPSA_PR128 & WDTPS_PS1024);
_CONFIG2(IESO_ON & WDTCMX_LPRC & FNOSC_FRC & FCKSM_CSDCMD & OSCIOFCN_ON & POSCMD_NONE)
_CONFIG3(SOSCSEL_ON)
_CONFIG4(IOL1WAY_OFF & PLLDIV_DISABLED & DSWDTPS_DSWDTPS15)
I am trying to get on the Microchip fora to ask this, but their registration process is apparently down. Hoping the good folks of StackOverflow can help. Thanks!
Microchip, in its infinite and godlike wisdom, decided to put analog input functionality on the RB6 pin, but suppressed almost all documentation of this and left any mention of it out of the PIC24FJ128GA204 errata.
The data sheet has vague hints about this in two places (the data sheet excerpts are not preserved here).
To get what you need, clear ANSB bit 6 to zero.
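To rule out the PPS mapping itself, the RPOR dump from the question can be decoded. The sketch below is only a sanity check written in Python, and it rests on an assumption that should be verified against the data sheet: that each RPORn register holds the output function for RP(2n) in bits 5:0 and for RP(2n+1) in bits 13:8. The names RPOR_DUMP and decode_rpor are made up for this example.

```python
# RPOR values copied from the register dump in the question.
RPOR_DUMP = {0: 0x0000, 1: 0x0300, 2: 0x0000, 3: 0x0003, 4: 0x0000,
             5: 0x0000, 6: 0x0000, 7: 0x0000, 8: 0x0000, 9: 0x0300,
             10: 0x0000, 11: 0x0700, 12: 0x0008}


def decode_rpor(dump):
    """Map each RPn pin to its output function number, skipping unmapped pins."""
    mapping = {}
    for index, value in dump.items():
        low = value & 0x3F          # assumed field for RP(2 * index)
        high = (value >> 8) & 0x3F  # assumed field for RP(2 * index + 1)
        if low:
            mapping[2 * index] = low
        if high:
            mapping[2 * index + 1] = high
    return mapping


for pin, function in sorted(decode_rpor(RPOR_DUMP).items()):
    print(f"RP{pin}: output function {function}")
```

Under that assumption, three pins (RP6 among them) carry the same function number, which matches the three pins the question says are routed to the UART. That is consistent with the mapping being fine and the pin's analog setting being the part that needs changing.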

Debugging kernel panic error

I have an ARM board on which I am running Yocto with kernel 4.1.15. While my Python program is running, I frequently but randomly get the following kernel error:
Unable to handle kernel paging request at virtual address 7f101f7c
pgd = 80004000
[7f101f7c] *pgd=8c6c4811, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
Modules linked in: wilc3000(O) at_pwr_dev(O) pn5xx_i2c [last unloaded: at_pwr_dev]
CPU: 0 PID: 1336 Comm: DebugThread Tainted: G O 4.1.15-1.2.0+g77f6154
Hardware name: Freescale i.MX6 Ultralite (Device Tree)
task: 8c73b900 ti: 8c8d6000 task.ti: 8c8d6000
PC is at 0x7f101f7c
LR is at _raw_spin_unlock_irqrestore+0x28/0x54
pc : [<7f101f7c>] lr : [<807e1238>] psr: 600f0013
sp : 8c8d7f30 ip : 00000000 fp : 00000000
r10: 7f107d30 r9 : 7f107d20 r8 : 7f107f48
r7 : 00000000 r6 : 8c57b000 r5 : 7f107f48 r4 : 8c54aa00
r3 : 00000000 r2 : 00000000 r1 : 20000013 r0 : ffffffc2
Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment kernel
Control: 10c53c7d Table: 8c52c06a DAC: 00000015
Process DebugThread (pid: 1336, stack limit = 0x8c8d6210)
Stack: (0x8c8d7f30 to 0x8c8d8000) 7f20: 8c8063a0 00000000 8c8d6000 00000000
7f40: 00000000 00000000 00000000 8c975c40 8c54aa00 7f101f28 00000000 00000000
7f60: 00000000 8004d070 00000000 00000000 7ee95a5c 8c54aa00 00000000 00000000
7f80: 8c8d7f80 8c8d7f80 00000000 00000000 8c8d7f90 8c8d7f90 8c8d7fac 8c975c40
7fa0: 8004cf94 00000000 00000000 8000f528 00000000 00000000 00000000 00000000
7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 7a9ce301 72611f00
[<807e1238>] (_raw_spin_unlock_irqrestore) from [<00000000>] ( (null))
Code: bad PC value
How can I debug this error, given that I don't have access to JTAG on this board? What does "Code: bad PC value" mean? Is there any way to learn anything about the problem from this log?
pc : [<7f101f7c>] lr : [<807e1238>] psr: 600f0013
To translate the address into a source code line:
arm-none-linux-gnueabi-addr2line -f -e vmlinux 7f101f7c
Substitute the addr2line command from your own cross toolchain.
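When there are many oops logs to go through, the lookup can be scripted. The following Python sketch only builds the command lines, using the pc/lr line from the log above. The tool name and the vmlinux path are placeholders to replace with your own, addr2line_commands is a name made up for the example, and the vmlinux file is assumed to be the unstripped image of the kernel that actually crashed. An address that is not part of vmlinux (for example, one inside a loadable module) will simply come back as "??".

```python
import re

OOPS_LINE = "pc : [<7f101f7c>] lr : [<807e1238>] psr: 600f0013"


def addr2line_commands(oops_line, vmlinux="vmlinux",
                       tool="arm-none-linux-gnueabi-addr2line"):
    """Build one addr2line command (as an argument list) per register found."""
    registers = re.findall(r"(pc|lr)\s*:\s*\[<([0-9a-f]+)>\]", oops_line)
    return {name: [tool, "-f", "-e", vmlinux, address]
            for name, address in registers}


for name, command in addr2line_commands(OOPS_LINE).items():
    print(name, " ".join(command))
```

Each argument list can be passed to subprocess.run() on the build machine. Looking up lr as well as pc is useful here because lr already resolved to a kernel symbol in the log.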

MIPS32 router: module_init not called for kernel module

I'm developing a kernel module that I want to run on my router. The router model is DGN2200v2 by Netgear. It's running Linux 2.6.30 on MIPS. My problem is that when I load my module it seems that my module_init isn't getting called. I tried to narrow it down by modifying my module_init to return -3 (which indicates an error?) and insmod still reports success. I can see my module in the output of lsmod, but I don't see my printk output using dmesg.
For starters, I wanted to create the simplest possible module:
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
/* Returns an error on purpose: if insmod still succeeds, my_init() never ran. */
static int my_init(void)
{
	printk(KERN_EMERG "init_module() called\n");
	return -3;
}
static void my_cleanup(void)
{
	printk(KERN_EMERG "cleanup_module() called\n");
}
/* Register the entry and exit points with the kernel. */
module_init(my_init);
module_exit(my_cleanup);
This is the Makefile I'm using:
TOOLCHAIN=/home/user/buildroot-2016.08/output/host/usr/bin/mips-buildroot-linux-uclibc-
ARCH=mips
CC = $(TOOLCHAIN)gcc
KBUILD_CFLAGS:=.
EXTRA_CFLAGS := -I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/include\
-I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/arch/mips/include/asm/mach-mipssim\
-I/home/user/buildroot-2016.08/output/build/linux-headers-2.6.30/arch/mips/include/asm/mach-generic\
-fno-pic -mno-abicalls -O2
obj-m := module.o
KDIR := /home/user/buildroot-2016.08/output/build/linux-headers-2.6.30
PWD := $(shell pwd)
default:
	$(MAKE) -C $(KDIR) SUBDIRS=$(PWD) modules
I'm running make like so:
make ARCH=mips CROSS_COMPILE=/home/user/buildroot-2016.08/output/host/usr/bin/mips-buildroot-linux-uclibc-
which passes successfully.
As you can see, I'm using Buildroot which I (hopefully) configured correctly. I can paste my .config if needed.
I ran objdump on my module and didn't find a problem. In particular, the module_init symbol seems to point to the same place as my my_init function, and it seems to have the code I expect it to:
module.ko: file format elf32-tradbigmips
module.ko
architecture: mips:isa32, flags 0x00000011:
HAS_RELOC, HAS_SYMS
start address 0x00000000
private flags = 50001001: [abi=O32] [mips32] [not 32bitmode] [noreorder]
MIPS ABI Flags Version: 0
ISA: MIPS32
GPR size: 32
CPR1 size: 0
CPR2 size: 0
FP ABI: Soft float
ISA Extension: None
ASEs:
None
FLAGS 1: 00000001
FLAGS 2: 00000000
Sections:
Idx Name Size VMA LMA File off Algn
0 .MIPS.abiflags 00000018 00000000 00000000 00000038 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_SAME_SIZE
1 .reginfo 00000018 00000000 00000000 00000050 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA, LINK_ONCE_SAME_SIZE
2 .note.gnu.build-id 00000024 00000018 00000018 00000068 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .text 00000040 00000000 00000000 00000090 2**4
CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
4 .rodata.str1.4 00000038 00000000 00000000 000000d0 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .modinfo 0000005c 00000000 00000000 00000108 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .data 00000000 00000000 00000000 00000170 2**4
CONTENTS, ALLOC, LOAD, DATA
7 .gnu.linkonce.this_module 0000014c 00000000 00000000 00000170 2**2
CONTENTS, ALLOC, LOAD, RELOC, DATA, LINK_ONCE_DISCARD
8 .bss 00000000 00000000 00000000 000002c0 2**4
ALLOC
9 .comment 00000040 00000000 00000000 000002c0 2**0
CONTENTS, READONLY
10 .pdr 00000040 00000000 00000000 00000300 2**2
CONTENTS, RELOC, READONLY
11 .gnu.attributes 00000010 00000000 00000000 00000340 2**0
CONTENTS, READONLY
12 .mdebug.abi32 00000000 00000000 00000000 00000350 2**0
CONTENTS, READONLY
SYMBOL TABLE:
00000000 l d .MIPS.abiflags 00000000 .MIPS.abiflags
00000000 l d .reginfo 00000000 .reginfo
00000018 l d .note.gnu.build-id 00000000 .note.gnu.build-id
00000000 l d .text 00000000 .text
00000000 l d .rodata.str1.4 00000000 .rodata.str1.4
00000000 l d .modinfo 00000000 .modinfo
00000000 l d .data 00000000 .data
00000000 l d .gnu.linkonce.this_module 00000000 .gnu.linkonce.this_module
00000000 l d .bss 00000000 .bss
00000000 l d .comment 00000000 .comment
00000000 l d .pdr 00000000 .pdr
00000000 l d .gnu.attributes 00000000 .gnu.attributes
00000000 l d .mdebug.abi32 00000000 .mdebug.abi32
00000000 l df *ABS* 00000000 module.c
00000000 l F .text 0000002c my_init
0000002c l F .text 00000014 my_cleanup
00000000 l .rodata.str1.4 00000000 $LC0
0000001c l .rodata.str1.4 00000000 $LC1
00000000 l df *ABS* 00000000 module.mod.c
00000000 l O .modinfo 00000023 __mod_srcversion23
00000024 l O .modinfo 00000009 __module_depends
00000030 l O .modinfo 0000002c __mod_vermagic5
00000000 g O .gnu.linkonce.this_module 0000014c __this_module
0000002c g F .text 00000014 cleanup_module
00000000 g F .text 0000002c init_module
00000000 *UND* 00000000 printk
Disassembly of section .MIPS.abiflags:
00000000 <.MIPS.abiflags>:
0: 00002001 movf a0,zero,$fcc0
4: 01000003 0x1000003
...
10: 00000001 movf zero,zero,$fcc0
14: 00000000 nop
Disassembly of section .reginfo:
00000000 <.reginfo>:
0: a2000014 sb zero,20(s0)
...
14: 00007fef 0x7fef
Disassembly of section .note.gnu.build-id:
00000018 <.note.gnu.build-id>:
18: 00000004 sllv zero,zero,zero
1c: 00000014 0x14
20: 00000003 sra zero,zero,0x0
24: 474e5500 c1 0x14e5500
28: c8e5d654 lwc2 $5,-10668(a3)
2c: cb477d3d lwc2 $7,32061(k0)
30: dfa48d71 ldc3 $4,-29327(sp)
34: c2ea16da ll t2,5850(s7)
38: f6bcae7d sdc1 $f28,-20867(s5)
Disassembly of section .text:
00000000 <init_module>:
0: 27bdffe8 addiu sp,sp,-24
4: 3c040000 lui a0,0x0
4: R_MIPS_HI16 $LC0
8: 3c020000 lui v0,0x0
8: R_MIPS_HI16 printk
c: afbf0014 sw ra,20(sp)
10: 24420000 addiu v0,v0,0
10: R_MIPS_LO16 printk
14: 0040f809 jalr v0
18: 24840000 addiu a0,a0,0
18: R_MIPS_LO16 $LC0
1c: 8fbf0014 lw ra,20(sp)
20: 2402fffd li v0,-3
24: 03e00008 jr ra
28: 27bd0018 addiu sp,sp,24
modinfo output also matches what I expect (same modinfo output as for another .ko that's found on the router, except for the srcversion which my module has but the other module on the router doesn't):
filename: /home/user/module/module.ko
srcversion: B0BADBA395A121CF49B74DC
depends:
vermagic: 2.6.30 mod_unload MIPS32_R1 32BIT
It's entirely possible that I messed something up in my Buildroot configuration, or something doesn't quite match the CPU type of the router, but my init code is so minimal that I'm out of ideas as to what could be wrong.
It turns out that the problem was related to a different kernel configuration between my development environment and the router. Specifically, my kernel was using CONFIG_UNUSED_SYMBOLS whereas the router's was not.
The reason this caused a problem even in a trivial module is that when the kernel loads a module, it doesn't just look up the module_init symbol in the module's symbol table. Rather, it reads the module struct from the module (from the .gnu.linkonce.this_module section), and then calls the init function through a pointer in that struct.
The offset of the init function pointer inside the module struct depends on the kernel configuration, which explains why the kernel can't find the init function if the configuration is different.
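The effect can be illustrated with a toy structure. The Python sketch below uses ctypes to build two layouts that differ only in a couple of optional members. It is not the real struct module: the member list is heavily simplified and the names ToyModule and make_module_struct are made up. It only shows how optional members placed before the init pointer shift that pointer's offset.

```python
import ctypes


def make_module_struct(unused_symbols_enabled):
    """Build a toy 'struct module' whose layout depends on one config switch."""
    fields = [
        ("name", ctypes.c_char * 56),
        ("syms", ctypes.c_void_p),
        ("num_syms", ctypes.c_uint),
    ]
    if unused_symbols_enabled:
        # Members that exist only when the (toy) config option is enabled.
        fields += [
            ("unused_syms", ctypes.c_void_p),
            ("num_unused_syms", ctypes.c_uint),
        ]
    fields.append(("init", ctypes.c_void_p))  # pointer to the init function

    class ToyModule(ctypes.Structure):
        _fields_ = fields

    return ToyModule


build_host = make_module_struct(True)   # layout the module was compiled against
router = make_module_struct(False)      # layout the running kernel expects
print("init offset in the module's layout:", build_host.init.offset)
print("init offset the kernel reads from: ", router.init.offset)
```

The two offsets differ, so a kernel using the second layout would read some other member where the module stored its init pointer.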
Thanks to Sam Protsenko for investing a lot of time in helping me crack this!

Debugging page allocation failure on Coldfire uCLinux

I'm sometimes getting this crash output below on my Coldfire uCLinux system. How do I work out what's causing the problem?
Apr 4 10:44:33 (none) user.debug syslog: starting NTP
sh: page allocation failure. order:8, mode:0xd0
Stack from 41da5dcc:
4005b0f2 400553b6 40207431 406131f8 00000008 000000d0 00000008 00000000
000000a2 000a2000 000a2000 0000000c 40544a14 00000000 405434fc 00000077
41da5eac 00000000 00000010 00000000 41da5008 41da5000 00000000 00000100
00000000 41da5000 00000000 000200d0 4024eecc 00000080 00000000 00000000
4005de52 000000d0 00000008 4024eec8 00000000 00000001 00004d09 00079100
00000004 00003f20 00013424 41cd7000 41da5fcc 41da5f2a 00015790 00000000
Call Trace with CONFIG_FRAME_POINTER disabled:
[4005b0f2] [400553b6] [40207431] [4005de52] [40067d64]
[40093892] [4004b15e] [400390d8] [40020e70] [400677d8]
[40020e70] [401f0c92] [40068468] [4006aa4e] [40020ea0]
[4002386c]
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:4484 dirty:0 writeback:0 unstable:0
free:8806 slab:565 mapped:0 pagetables:0 bounce:0
DMA free:35216kB min:1016kB low:1268kB high:1524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:17936kB present:65024kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB 4*32kB 6*64kB 3*128kB 46*256kB 44*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 35216kB
4484 total pagecache pages
0 pages RAM
0 pages reserved
0 pages shared
0 pages non-shared
Allocation of length 663552 from process 476 (sh) failed
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Active_anon:0 active_file:0 inactive_anon:0
inactive_file:4484 dirty:0 writeback:0 unstable:0
free:8804 slab:567 mapped:0 pagetables:0 bounce:0
DMA free:35216kB min:1016kB low:1268kB high:1524kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:17936kB present:65024kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 0*4kB 0*8kB 1*16kB 4*32kB 6*64kB 3*128kB 46*256kB 44*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 35216kB
4484 total pagecache pages
Unable to allocate RAM for process text/data, errno 12
sh: page allocation failure. order:8, mode:0xd0
Stack from 41ea6dcc:
4005b0f2 400553b6 40207431 40645848 00000008 000000d0 00000008 00000000
000000a2 000a2000 000a2000 0000000c 40544a6c 00000000 405434fc 00000077
41ea6eac 00000000 00000010 00000000 41ea6008 41ea6000 00000000 00000100
00000000 41ea6000 00000000 000200d0 4024eecc 00000080 00000000 00000000
4005de52 000000d0 00000008 4024eec8 00000000 00000001 00004d09 00079100
00000004 00003f20 00013424 410ae600 41ea6fcc 41ea6f2a 00015790 00000000
Call Trace with CONFIG_FRAME_POINTER disabled:
[4005b0f2] [400553b6] [40207431] [4005de52] [40067d64]
[40093892] [4004b15e] [400390d8] [40020e70] [400677d8]
[40020e70] [401f0c92] [40068468] [4006aa4e] [40020ea0]
[400239c2] [4002386c]
Mem-Info:
Your system has run out of free 1 MB blocks. With the power-of-two allocator, you need a free contiguous block of 1 MB to allocate 663552 bytes, and the Mem-Info output shows none at 1024kB or above. This is caused by memory fragmentation. Normally, an MMU would map the scattered free space so that it appears contiguous for new allocations.
You can only take care of the problem through prevention. If the 663552 bytes are the sh binary, you will have to prevent it from being continuously re-loaded into memory. This might be done by putting it into an XIP file system.
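The arithmetic behind the 1 MB figure can be checked with a few lines of Python. This is a sketch under one assumption: a 4 kB page size, which is what the smallest entry in the free-list line of the log suggests. The name buddy_order is made up for the example.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages, matching the "0*4kB" entry in the log


def buddy_order(length, page_size=PAGE_SIZE):
    """Smallest order such that page_size * 2**order is at least length."""
    pages = -(-length // page_size)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


length = 663552  # from "Allocation of length 663552 ... failed"
order = buddy_order(length)
print(f"order {order}, block size {PAGE_SIZE << order} bytes")
```

The request needs 162 pages, which rounds up to 256 pages, so the result agrees with the "order:8" reported in the failure message.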
It might be a heap allocation done by the shell. In this case, you will have to change whatever processing is causing such a large malloc.
At the system level, you will also have to see which programs are large or cause large mallocs and change their behavior so that they don't cause more fragmentation.

Resources