Here's the nm dump of my program.
00000000 T __ctors_end
00000000 T __ctors_start
00000000 T __dtors_end
00000000 T __dtors_start
00000000 a __tmp_reg__
00000000 T __trampolines_end
00000000 T __trampolines_start
00000000 T setup
00000001 a __zero_reg__
0000003d a __SP_L__
0000003e a __SP_H__
0000003f a __SREG__
00000072 T __vector_15
00000086 T main
000000a8 A __data_load_end
000000a8 A __data_load_start
000000a8 T _etext
00800100 D _edata
00800100 T _end
00810000 T __eeprom_end
The architecture is AVR, and I need to get main() back up to 0x00000000 in order for the chip that I'm running this code on to execute properly. It should be as simple as a linker script, shouldn't it?
It doesn't matter where main() is in memory. Simply put a jump instruction to its address at the reset vector, which is 0x0000 in application memory.
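As a rough sketch of what that jump looks like at the machine-code level, the C helper below encodes an AVR RJMP from the reset vector to main. It takes main's byte address 0x86 from the nm dump above and assumes the target is within RJMP range; the function name encode_rjmp is made up for this example. The chip is not named in the question, and on parts with larger flash the toolchain normally emits the two-word JMP in the vector table instead, so treat this as an illustration of the idea rather than what your linker will produce.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an AVR RJMP located at byte address `from` that lands on byte
 * address `to`. Opcode layout: 1100 kkkk kkkk kkkk, where k is a signed
 * 12-bit word offset counted from the instruction after the RJMP.
 * (encode_rjmp is a hypothetical helper, not part of any toolchain.) */
static uint16_t encode_rjmp(uint32_t from, uint32_t to)
{
    /* nm prints byte addresses; AVR program memory is word addressed. */
    int32_t k = (int32_t)(to / 2) - (int32_t)(from / 2) - 1;

    assert(k >= -2048 && k <= 2047);   /* RJMP reaches +/- 2K words */
    return (uint16_t)(0xC000u | ((uint32_t)k & 0x0FFFu));
}
```

With main at 0x86, encode_rjmp(0x0000, 0x0086) gives the word that would have to sit at address 0x0000.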
I used to program AVRs, and as far as I know the only way to change where execution starts is through the fuse bits, and those only let you move the reset vector to the bootloader section at the end of flash. Where main() ends up depends on the chip; I'm not sure of the exact value, but on AVR it is usually somewhere between 0x20 and 0x100.
This is because the reset vector and the interrupt vectors sit at the beginning of flash.
This layout is very helpful: I once had a project where I couldn't use the watchdog, so the only way to trigger a reset was an overflow.
Also, I've read your comment. You don't need to fill that area with 256 bytes of 0x00. It is reserved for the reset and interrupt vectors, so if you use, say, a timer or the UART with interrupts and your code starts at 0x00, the vector entries and your code would collide.
It is designed to work this way, and I think rearranging it would break that. If you really want to, you can try adding the -Ttext=0x0000 flag. That may lay out the code the way you want, but I do not recommend it.
So basically I decompiled a simple unoptimized program and saw that it runs through gcrt1.S, so I dived into the assembly and tried to understand exactly what it does. Here is the code, followed by my assumptions about what it does:
00000034 CLR R1 Clear Register
00000035 OUT 0x3F,R1 Out to I/O location
00000036 SER R28 Set Register
00000037 LDI R29,0x08 Load immediate
00000038 OUT 0x3E,R29 Out to I/O location
00000039 OUT 0x3D,R28 Out to I/O location
0000003A CALL 0x00000040 Call subroutine
0000003C JMP 0x00000050 Jump
0000003E JMP 0x00000000 Jump
Clear R1
Clear status register
Set R28 1111 1111
Here is where my questions start:
Load R29 from 0x08 (PORTC ?)
OUT to SPH <-R29
OUT to SPL <-R28
Call Main
What confuses me is why it loads a byte from the PORTC register, since the default would be 0x00 anyway.
The microcontroller is an ATmega328P (link to a datasheet).
Load R29 from 0x08 (PORTC ?)
The instruction is LDI R29,0x08, which loads the constant 8 into R29. LDI is "load immediate to register"; it does not read from memory (see section "31. Instruction Set Summary" in the ATmega328 manual you are using). The code is initializing the frame pointer Y (R29:R28) from the symbol __stack and then copying it into the stack pointer through the two OUT instructions to SPH and SPL; see the startup code in gcrt1.S.
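To make the arithmetic concrete, here is a small C sketch of how a 16-bit stack address splits into the two bytes written to SPH and SPL. It assumes __stack resolves to 0x08FF, which is simply what SER R28 (0xFF) and LDI R29,0x08 combine to in the listing; the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* The two bytes the startup code writes with OUT 0x3E (SPH) and OUT 0x3D (SPL). */
struct sp_bytes {
    uint8_t sph;   /* high byte, loaded into R29 by LDI */
    uint8_t spl;   /* low byte, loaded into R28 by SER  */
};

/* Split a 16-bit stack address the way hi8()/lo8() do in the startup code. */
static struct sp_bytes split_stack_address(uint16_t stack_top)
{
    struct sp_bytes sp;

    sp.sph = (uint8_t)(stack_top >> 8);
    sp.spl = (uint8_t)(stack_top & 0xFFu);
    return sp;
}

/* Worked instance of the listing, assuming __stack == 0x08FF. */
void check_listing_values(void)
{
    struct sp_bytes sp = split_stack_address(0x08FFu);

    assert(sp.sph == 0x08 && sp.spl == 0xFF);
}
```

So the 0x08 is just the high byte of the initial stack address and has nothing to do with the I/O register that happens to live at address 0x08.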
I am at my wit's end trying to debug a hard fault on an EFR32BG12 processor. I've been following the instructions in the Silicon Labs knowledge base here:
https://www.silabs.com/community/mcu/32-bit/knowledge-base.entry.html/2014/05/26/debug_a_hardfault-78gc
I've also been using the Keil app note here to fill in some details:
http://www.keil.com/appnotes/files/apnt209.pdf
I've managed to get the hard fault to occur quite consistently in one place. When the hard fault occurs, the code from the knowledge base article gives me the following values (pushed onto the stack by the processor before calling the hard fault handler):
Name Type Value Location
~~~~ ~~~~ ~~~~~ ~~~~~~~~
cfsr uint32_t 0x20000 (Hex) 0x2000078c
hfsr uint32_t 0x40000000 (Hex) 0x20000788
mmfar uint32_t 0xe000ed34 (Hex) 0x20000784
bfar uint32_t 0xe000ed38 (Hex) 0x20000780
r0 uint32_t 0x0 (Hex) 0x2000077c
r1 uint32_t 0x8 (Hex) 0x20000778
r2 uint32_t 0x0 (Hex) 0x20000774
r3 uint32_t 0x0 (Hex) 0x20000770
r12 uint32_t 0x1 (Hex) 0x2000076c
lr uint32_t 0xab61 (Hex) 0x20000768
pc uint32_t 0x38dc8 (Hex) 0x20000764
psr uint32_t 0x0 (Hex) 0x20000760
Looking at the Keil app note, I believe a CFSR value of 0x20000 indicates a Usage Fault with the INVSTATE bit set, i.e.:
INVSTATE: Invalid state: 0 = no invalid state 1 = the processor has
attempted to execute an instruction that makes illegal use of the
Execution Program Status Register (EPSR). When this bit is set, the PC
value stacked for the exception return points to the instruction that
attempted the illegal use of the EPSR. Potential reasons: a) Loading a
branch target address to PC with LSB=0. b) Stacked PSR corrupted
during exception or interrupt handling. c) Vector table contains a
vector address with LSB=0.
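For reference, this is roughly how I read those status values, written out as a C sketch. The bit positions follow my understanding of the ARMv7-M register layout (UFSR occupies the upper half of CFSR, FORCED is bit 30 of HFSR); the macro and function names are my own and not from the knowledge base article.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UFSR is the upper 16 bits of CFSR; bits numbered within UFSR. */
#define UFSR_UNDEFINSTR (1u << 0)
#define UFSR_INVSTATE   (1u << 1)
#define UFSR_INVPC      (1u << 2)
#define UFSR_NOCP       (1u << 3)
#define UFSR_UNALIGNED  (1u << 8)
#define UFSR_DIVBYZERO  (1u << 9)

#define HFSR_FORCED     (1u << 30)  /* configurable fault escalated to hard fault */

/* Extract the usage fault bits from a raw CFSR value. */
static uint16_t cfsr_usage_bits(uint32_t cfsr)
{
    return (uint16_t)(cfsr >> 16);
}

/* A branch target or stacked LR must have bit 0 set to stay in Thumb state. */
static bool is_thumb_target(uint32_t addr)
{
    return (addr & 1u) != 0;
}

/* The values reported by the fault handler above. */
void check_reported_values(void)
{
    assert(cfsr_usage_bits(0x20000u) == UFSR_INVSTATE);
    assert((0x40000000u & HFSR_FORCED) != 0);
    assert(is_thumb_target(0xab61u));
}
```

Read this way, the dump says a usage fault with only INVSTATE set was escalated to a hard fault, and the stacked LR 0xab61 is an ordinary Thumb return address for the instruction at 0xab60.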
The PC value pushed onto the stack by the exception (provided by the code from the knowledge base article) seems to be 0x38dc8. If I go to this address in the Simplicity Studio "Disassembly" window, I see the following:
00038db8: str r5,[r5,#0x14]
00038dba: str r0,[r7,r1]
00038dbc: str r4,[r5,#0x14]
00038dbe: ldr r4,[pc,#0x1e4] ; 0x38fa0
00038dc0: strb r1,[r4,#0x11]
00038dc2: ldr r5,[r4,#0x64]
00038dc4: ldrb r3,[r4,#0x5]
00038dc6: movs r3,r6
00038dc8: strb r1,[r4,#0x15]
00038dca: ldr r4,[r4,#0x14]
00038dcc: cmp r7,#0x6f
00038dce: cmp r6,#0x30
00038dd0: str r7,[r6,#0x14]
00038dd2: lsls r6,r6,#1
00038dd4: movs r5,r0
00038dd6: movs r0,r0
The address appears to be well past the end of my code. If I look at the same address in the "Memory" window, this is what I see:
0x00038DC8 69647561 2E302F6F 00766177 00000005 audio/0.wav.....
0x00038DD8 00000000 000F4240 00000105 00000000 ....@B..........
0x00038DE8 00000000 00000000 00000005 00000000 ................
0x00038DF8 0001C200 00000500 00001000 00000000 .Â..............
0x00038E08 00000000 F00000F0 02F00001 0003F000 ....ð..ð..ð..ð..
0x00038E18 F00004F0 06010005 01020101 01011201 ð..ð............
0x00038E28 35010121 01010D01 6C363025 2E6E6775 !..5....%06lugn.
0x00038E38 00746164 00000001 000008D0 00038400 dat.....Ð.......
Curiously, "audio/0.wav" is a static string which is part of the firmware. If I understand correctly, what I've learned here is that PC somehow gets set to this point in memory, which of course is not a valid instruction and causes the hard fault.
To debug the issue, I need to know how PC came to be set to this incorrect value. I believe the LR register should give me an idea. The LR register pushed onto the stack by the exception seems to be 0xab61. If I look at this location, I see the following in the Disassembly window:
1270 dp->sect = clst2sect(fs, clst);
0000ab58: ldr r0,[r7,#0x10]
0000ab5a: ldr r1,[r7,#0x14]
0000ab5c: bl 0x00009904
0000ab60: mov r2,r0
0000ab62: ldr r3,[r7,#0x4]
0000ab64: str r2,[r3,#0x18]
It looks to me like the problem occurs during this call specifically:
0000ab5c: bl 0x00009904
This makes me think that the problem occurs as a result of a corrupt stack, which causes clst2sect to return to an invalid part of memory rather than to 0xab60. The code for clst2sect is pretty innocuous:
/*-----------------------------------------------------------------------*/
/* Get physical sector number from cluster number */
/*-----------------------------------------------------------------------*/
DWORD clst2sect ( /* !=0:Sector number, 0:Failed (invalid cluster#) */
FATFS* fs, /* Filesystem object */
DWORD clst /* Cluster# to be converted */
)
{
clst -= 2; /* Cluster number is origin from 2 */
if (clst >= fs->n_fatent - 2) return 0; /* Is it invalid cluster number? */
return fs->database + fs->csize * clst; /* Start sector number of the cluster */
}
Does this analysis sound about right?
I suppose the problem I've run into is that I have no idea what might cause this kind of behaviour... I've tried putting breakpoints in all of my interrupt handlers, to see if one of them might be corrupting the stack, but there doesn't seem to be any pattern--sometimes, no interrupt handler is called but the problem still occurs.
In that case, though, it's hard for me to see how a program might try to execute code at a location well past the actual end of the code... I feel like a function pointer might be a likely candidate, but in that case I would expect to see the problem show up, e.g., where a function pointer is used. However, I don't see any function pointers used near where the error is occurring.
Perhaps there is more information I can extract from the debug information I've given above? The problem is quite reproducible, so if there's something I have not tried, but which you think might give some insight, I would love to hear it.
Thanks for any help you can offer!
After about a month of chasing this one, I managed to identify the cause of the problem. I hope I can give enough information here that this will be useful to someone else.
In the end, the problem was caused by passing a pointer to a non-static local variable to a state machine which changed the value at that memory location later on. Because the local variable was no longer in scope, that memory location was a random point in the stack, and changing the value there corrupted the stack.
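A stripped-down C sketch of the pattern, in case it helps someone recognise it. All names here are invented for illustration and this is not my actual firmware; the buggy caller is shown only as a comment because running it is undefined behaviour, and the version that compiles uses static storage, which is one possible fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state machine that remembers where to write its result. */
struct state_machine {
    int *result_slot;
};

static void sm_start(struct state_machine *sm, int *result_slot)
{
    assert(result_slot != NULL);
    sm->result_slot = result_slot;   /* the pointer is kept for later */
}

/* Runs later, long after the function that called sm_start() has returned. */
static void sm_finish(struct state_machine *sm, int value)
{
    *sm->result_slot = value;        /* writes wherever the pointer points */
}

/* Buggy pattern (comment only):
 *
 *     void begin_transfer(struct state_machine *sm)
 *     {
 *         int status;              // lives in this stack frame only
 *         sm_start(sm, &status);   // pointer outlives the frame
 *     }
 *
 * By the time sm_finish() runs, that stack slot belongs to some other
 * function: another local, or a saved return address. */

/* One possible fix: give the variable static storage duration. */
static int transfer_status;

static void begin_transfer(struct state_machine *sm)
{
    sm_start(sm, &transfer_status);
}
```

With the static variable, begin_transfer() followed later by sm_finish() writes to memory that is still valid, whatever the call stack looks like at that point.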
The problem was difficult to track down for two reasons:
Depending on how the code compiled, the changed memory location could be something non-critical like another local variable, which would cause a much more subtle error. Only when I got lucky would the change affect the PC register and cause a hard fault.
Even when I found a version of the code that consistently generated a hard fault, the actual hard fault typically occurred somewhere up the call stack, when a function returned and popped the stack value into PC. This made it difficult to identify the cause of the problem--all I knew was that something was corrupting the stack before that function return.
A few tools were really helpful in identifying the cause of the problem:
Early on, I had identified a block of code where the hard fault usually occurred using GPIO pins. I would toggle a pin high before entering the block and low when exiting the block. Then I performed many tests, checking if the pin was high or low when the hard fault occurred, and used a sort of binary search to determine the smallest block of code which consistently contained all the hard faults.
The hard fault pushes a number of important registers onto the stack. These helped me confirm where the PC register was becoming corrupt, and also helped me understand that it was becoming corrupt as a result of a stack corruption.
Starting somewhere before that block of code and stepping forward while keeping an eye on local variables, I was able to identify a function call that was corrupting the stack. I could confirm this using Simplicity Studio's memory view.
Finally, stepping through the offending function in detail, I realized that the problem was occurring when I dereferenced a stored pointer and wrote to that memory location. Looking back at where that pointer value was set, I realized it had been set to point to a non-static local variable that was now out of scope.
Thanks to @SeanHoulihane and @cooperised, who helped me eliminate a few possible causes and gave me a little more confidence with the debugging tools.
On most x86-based Unix systems you can construct a "static" executable that does not load any system-provided DLL(-equivalent)s, and runs a bare minimum of instructions before terminating itself normally. For instance, this works on x86/Linux (32-bit). Technically I might not even need the second mov instruction, as IIRC the ABI guarantees all registers are cleared to zero at the program entrypoint.
$ cat > test.s
.text
.globl start
start:
movl $1,%eax # _exit
movl $0,%ebx
int $0x80
$ as -32 test.s -o test.o
$ ld -m elf_i386 -e start test.o -o test
My question is how close you can get on Windows to this bare minimum of instructions executed in user space between process creation and termination. I have heard rumors that the kernelside process creation logic will load ntdll.dll and possibly also kernel32.dll into every process whether or not the PE file references them, and that both of these have nontrivial startup code that may be unavoidable. I have also heard rumors that system call numbers are not part of the stable ABI, so you have to call through ntdll for cross-version compatibility, even if you're bypassing Win32. I would like to know to what extent these rumors are true, and to what extent their implications can be worked around.
This is an exercise in what is possible in an experiment, rather than what is a good idea in a product shipped to end-users. A concrete motivation for asking this question is that if it were possible to cut the "mandatory" system DLLs completely out of the loop then it would be straightforward to measure what proportion of process startup time is due to their self-initialization.
I'm not very experienced with low-level Windows programming, so if you can give a step-by-step recipe like the above for constructing the "minimal" executable you propose as your answer, that would be appreciated.
I might be able to answer part of your question, but I don't know whether you can bypass them (and I doubt it).
I have also heard rumors that system call numbers are not part of the
stable ABI, so you have to call through ntdll for cross-version
compatibility, even if you're bypassing Win32
This is true: each major kernel version comes with new system call numbers.
The reason the syscall numbers are not permanent is that the syscall table is generated by name (not by number). Each time a new syscall is inserted, the older ones get "pushed" farther along (and the other way around if a syscall is removed, although this is quite rare).
The syscall table name (kernel side) is KiServiceTable (part of KeServiceDescriptorTable and KeServiceDescriptorTableShadow).
kd> dps nt!KeServiceDescriptorTable L4
fffff800`1236ba80 fffff800`1215f700 nt!KiServiceTable
fffff800`1236ba88 00000000`00000000
fffff800`1236ba90 00000000`000001b1
fffff800`1236ba98 fffff800`1216048c nt!KiArgumentTable
There are 0x1B1 system calls (Windows 8.1), and the system call pointers are located in KiServiceTable.
A userland syscall stub looks like this (Windows 10):
0:004> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007fff`1d913ac0 4c8bd1 mov r10,rcx ; args
00007fff`1d913ac3 b855000000 mov eax,55h ; syscall number
00007fff`1d913ac8 0f05 syscall ; x64 instruction, perform ring3 -> ring0 transition
00007fff`1d913aca c3 ret
00007fff`1d913acb 0f1f440000 nop dword ptr [rax+rax]
The same one from Windows 8.1 x64:
0:003> u ntdll!ntcreatefile
ntdll!NtCreateFile:
00007ff8`62071720 4c8bd1 mov r10,rcx
00007ff8`62071723 b854000000 mov eax,54h
00007ff8`62071728 0f05 syscall
00007ff8`6207172a c3 ret
00007ff8`6207172b 0f1f440000 nop dword ptr [rax+rax]
As you can see, the same function leads to different syscall numbers (0x55 for Windows 10 and 0x54 for Windows 8.1).
Pointers in the syscall table (inside the kernel) are now "encoded" in a simple way (they were plain pointers before). Let's take a look at index 0x54:
kd> ? nt!KiServiceTable+(dwo(nt!KiServiceTable + 0x54 * 4) >> 4)
Evaluate expression: -8795786429460 = fffff800`12463bec
What symbol is at this address?
kd> ln fffff800`12463bec
Browse module
Set bu breakpoint
(fffff800`12463bec) nt!NtCreateFile | (fffff800`12463c70) nt!IopCreateFile
Exact matches:
nt!NtCreateFile (<no parameter info>)
So ntdll!ntcreatefile leads to kernel function nt!NtCreateFile (not a big surprise :)
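The debugger expression above can be restated as a few lines of C, as a sketch only. The table base and the resulting address come from the output shown; the raw entry value 0x03044EC0 is back-computed from those two addresses with the low four bits assumed to be zero, so it is not a value read from a real system. The function name is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Decode one 32-bit KiServiceTable entry the way the kd expression does:
 * target = table_base + (entry >> 4). The low four bits are discarded by
 * the shift and are not interpreted here. */
static uint64_t decode_service_entry(uint64_t table_base, int32_t entry)
{
    /* Right-shifting a negative signed value is implementation-defined in
     * C; mainstream compilers shift arithmetically, which is assumed here
     * so that negative entries yield negative offsets. */
    int64_t offset = (int64_t)entry >> 4;

    assert(table_base != 0);
    return table_base + (uint64_t)offset;
}
```

Calling decode_service_entry(0xfffff8001215f700, 0x03044EC0) reproduces the fffff800`12463bec address that ln resolves to nt!NtCreateFile.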
You can find a syscall table for major Windows systems at this URL.
Actually, the leaked source from the windows XP kernel (in fact the WRK) shows how the service table is generated (in an assembly file).
I have heard rumors that the kernelside process creation logic will
load ntdll.dll and possibly also kernel32.dll into every process
whether or not the PE file references them, and that both of these
have nontrivial startup code that may be unavoidable
That's true. I won't go through the whole process, which is very complicated and discussed at great length in the Windows Internals books.
ntdll is loaded because a big part of the user-land Windows loader is located there (if you have symbol information, look at all the functions starting with Ldr).
The kernel32.dll is also loaded inside process address space because part of the main thread initialization is located there. It is also needed because a part of exception handling is done there.
I could have used an executable that executes just a single instruction (namely RET on x86 / x64), but the result is the same with notepad.
Put a breakpoint at entry point:
0:000> bp $exentry
0:000> bl
0 e 00007ff6`275c4030 0001 (0001) 0:**** notepad!WinMainCRTStartup
0:000> g
Breakpoint 0 hit
notepad!WinMainCRTStartup:
00007ff6`275c4030 4883ec28 sub rsp,28h
Stack trace at entry:
0:000> kb
# RetAddr : Args to Child : Call Site
00 00007fff`1ce62d92 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : notepad!WinMainCRTStartup
01 00007fff`1d889f64 : 00007fff`1ce62d70 00000000`00000000 00000000`00000000 00000000`00000000 : KERNEL32!BaseThreadInitThunk+0x22
02 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : ntdll!RtlUserThreadStart+0x34
So we have ntdll!RtlUserThreadStart, which calls KERNEL32!BaseThreadInitThunk, which calls the entry point of the executable.
0:000> u KERNEL32!BaseThreadInitThunk L 10
KERNEL32!BaseThreadInitThunk:
00007fff`1ce62d70 48895c2408 mov qword ptr [rsp+8],rbx
00007fff`1ce62d75 57 push rdi
00007fff`1ce62d76 4883ec20 sub rsp,20h
00007fff`1ce62d7a 498bf8 mov rdi,r8
00007fff`1ce62d7d 488bda mov rbx,rdx
00007fff`1ce62d80 85c9 test ecx,ecx
00007fff`1ce62d82 7517 jne KERNEL32!BaseThreadInitThunk+0x2b (00007fff`1ce62d9b)
00007fff`1ce62d84 488bca mov rcx,rdx
00007fff`1ce62d87 ff15d3390600 call qword ptr [KERNEL32!_guard_check_icall_fptr (00007fff`1cec6760)]
00007fff`1ce62d8d 488bcf mov rcx,rdi
00007fff`1ce62d90 ffd3 call rbx ; call entry point
00007fff`1ce62d92 8bc8 mov ecx,eax
00007fff`1ce62d94 ff15be2f0600 call qword ptr [KERNEL32!_imp_RtlExitUserThread (00007fff`1cec5d58)]
00007fff`1ce62d9a cc int 3
As you can see, returning from the entry point calls KERNEL32!_imp_RtlExitUserThread (which calls ExitProcess() for the main thread).
As far as I'm aware, the closest you can get to the initialization itself is with TLS callbacks (here is some explanation of how they work). TLS callbacks are executed before the entry point of the application, and they do have some limitations (which can be worked around with some effort).
As for measuring startup time, you should avoid trying to do it inside your own application; a separate process would be best for that (a debugger could do the trick much more reliably).
Regarding a minimal executable, you can build an executable containing only RET (as mentioned by @Neitsa); Windows will load the program into memory but will not execute anything of substance. It basically only maps things into memory, and that's all.
With FASM you can build an exe that does literally nothing, like the following:
include '%fasm%\win32ax.inc'
section 'a' code readable executable
start:
retn
.end start
stwu r1, -32(r1) // 32 bytes of space for this function
mflr r0
stw r0, 36(r1) //stores link register
stw r30, 24(r1) // ??
stw r31, 28(r1) // Probably makes space for r31?
mr r31, r1 // r31 = stack pointer
This is the beginning of the function. In the code above it stores r30 somewhere in memory, and every function begins this way. But neither r30 nor r31 holds any meaningful value at that point. What is the point of storing them?
In the PowerPC ELF ABI, registers r14-r31 are defined as non-volatile: they must be preserved across a function call. So, if a function can overwrite the contents of any of these registers, it must save their values in the function prologue and restore them before returning to the caller.
So, even though your disassembled function hasn't used r30 and r31 yet, it needs to save them on the stack, so it doesn't corrupt the calling-function's nonvolatile state. You'll probably see usage of r30 and r31 later in the function, and the restore (from those same locations on the stack) before the function returns.
I'm assuming that your program conforms to the Power ELF ABI, as that's what defines how your registers are used.
For more information, the Power ELF ABI is at http://openpowerfoundation.org/technical/technical-resources/technical-specifications/ , or https://www.power.org/technology-introduction/standards-specifications/ for the 32-bit versions.
I'm writing a framebuffer for an SPI LCD display on ARM. Before I complete that, I've written a memory only driver and trialled it under Ubuntu (Intel, Virtualbox). The driver works fine - I've allocated a block of memory using kmalloc, page aligned it (it's page aligned anyway actually), and used the framebuffer system to create a /dev/fb1. I have my own mmap function if that's relevant (deferred_io ignores it and uses its own by the look of it).
I have set:
info->screen_base = (u8 __iomem *)kmemptr;
info->fix.smem_len = kmem_size;
When I open /dev/fb1 with a test program and mmap it, it works correctly. I can see what is happening by using x11vnc to "share" fb1 out:
x11vnc -rawfb map:/dev/fb1@320x240x16
And view with a vnc viewer:
gvncviewer strontium:0
I've made sure I've no overflows by writing to the entire mmapped buffer and that seems to be fine.
The problem arises when I add in deferred_io. As a test of it, I have a delay of 1 second and the called deferred_io function does nothing except a pr_devel() print. I followed the docs.
Now, the test program opens /dev/fb1 fine, mmap returns ok but as soon as I write to that pointer, I get a kernel panic. The following dump is from the ARM machine actually but it panics on the Ubuntu VM as well:
root@duovero:~/testdrv# ./fbtest1 /dev/fb1
Device opened: /dev/fb3
Screen is: 320 x 240, 16 bpp
Screen size = 153600 bytes
mmap on device succeeded
Unable to handle kernel paging request at virtual address bf81e020
pgd = edbec000
[bf81e020] *pgd=00000000
Internal error: Oops: 5 [#1] SMP ARM
Modules linked in: hhlcd28a(O) sysimgblt sysfillrect syscopyarea fb_sys_fops bnep ipv6 mwifiex_sdio mwifiex btmrvl_sdio firmware_class btmrvl cfg80211 bluetooth rfkill
CPU: 0 Tainted: G O (3.6.0-hh04 #1)
PC is at fb_deferred_io_fault+0x34/0xb0
LR is at fb_deferred_io_fault+0x2c/0xb0
pc : [<c0271b7c>] lr : [<c0271b74>] psr: a0000113
sp : edbdfdb8 ip : 00000000 fp : edbeedb8
r10: edbeedb8 r9 : 00000029 r8 : edbeedb8
r7 : 00000029 r6 : bf81e020 r5 : eda99128 r4 : edbdfdd8
r3 : c081e000 r2 : f0000000 r1 : 00001000 r0 : bf81e020
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user
Control: 10c5387d Table: adbec04a DAC: 00000015
Process fbtest1 (pid: 485, stack limit = 0xedbde2f8)
Stack: (0xedbdfdb8 to 0xedbe0000)
[snipped out hexdump]
[<c0271b7c>] (fb_deferred_io_fault+0x34/0xb0) from [<c00db0c4>] (__do_fault+0xbc/0x470)
[<c00db0c4>] (__do_fault+0xbc/0x470) from [<c00dde0c>] (handle_pte_fault+0x2c4/0x790)
[<c00dde0c>] (handle_pte_fault+0x2c4/0x790) from [<c00de398>] (handle_mm_fault+0xc0/0xd4)
[<c00de398>] (handle_mm_fault+0xc0/0xd4) from [<c049a038>] (do_page_fault+0x140/0x37c)
[<c049a038>] (do_page_fault+0x140/0x37c) from [<c0008348>] (do_DataAbort+0x34/0x98)
[<c0008348>] (do_DataAbort+0x34/0x98) from [<c0498af4>] (__dabt_usr+0x34/0x40)
Exception stack(0xedbdffb0 to 0xedbdfff8)
ffa0: 00000280 0000ffff b6f5c900 00000000
ffc0: 00000003 00000000 00025800 b6f5c900 bea6dc1c 00011048 00000032 b6f5b000
ffe0: 00006450 bea6db70 00000000 000085d6 40000030 ffffffff
Code: 28bd8070 ebffff37 e2506000 0a00001b (e5963000)
---[ end trace 7e5ca57bebd433f5 ]---
Segmentation fault
root@duovero:~/testdrv#
I'm totally stumped - other drivers look more or less the same as mine but I assume they work. Most use vmalloc actually - is there a difference between kmalloc and vmalloc for this purpose?
Confirmed the fix so I'll answer my own question:
deferred_io replaces the info mmap with its own, which sets up fault handlers for writes to the video memory pages. In the fault handler it does two things: first, it checks bounds against info->fix.smem_len, so you must set that; second, it gets the page that was written to.
For the latter case, it treats vmalloc differently from kmalloc (by checking info->screen_base to see if it's vmalloced). If you have vmalloced, it uses screen_base as the virtual address. If you have not used vmalloc, it assumes that the address of interest is the physical address in info->fix.smem_start.
So, to use deferred_io correctly
set screen_base (char __iomem *) and point that to the virtual address.
set info->fix.smem_len to the video buffer size
if you are not using vmalloc, you must set info->fix.smem_start to the video buffer's physical address by using virt_to_phys(vid_buffer);
Confirmed on Ubuntu as fixing the issue.
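To illustrate the decision described above, here is a simplified user-space model in C. It is not kernel code: the struct, the enum, the 4 KiB page size and all names are assumptions made for the sketch, and it only mirrors the logic as I have described it (bounds check against smem_len, then a virtual lookup for vmalloc memory or a physical lookup through smem_start otherwise).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12u   /* assumes 4 KiB pages */

/* Stand-in for the fields of fb_info that matter here (hypothetical). */
struct model_fb_info {
    uintptr_t screen_base;             /* virtual address of the buffer     */
    bool      screen_base_is_vmalloc;  /* what the kernel would test itself */
    uint64_t  smem_start;              /* physical address of the buffer    */
    uint32_t  smem_len;                /* buffer size in bytes              */
};

enum page_source { PAGE_OUT_OF_BOUNDS, PAGE_FROM_VIRTUAL, PAGE_FROM_PHYSICAL };

struct page_lookup {
    enum page_source source;
    uint64_t key;   /* virtual address, or page frame number */
};

/* Model of how the fault handler picks the page for a written offset. */
static struct page_lookup model_page_lookup(const struct model_fb_info *info,
                                            uint32_t offset)
{
    struct page_lookup r = { PAGE_OUT_OF_BOUNDS, 0 };

    assert(info != NULL);
    if (offset >= info->smem_len)
        return r;                      /* smem_len unset: every write fails */

    if (info->screen_base_is_vmalloc) {
        r.source = PAGE_FROM_VIRTUAL;
        r.key = (uint64_t)info->screen_base + offset;
    } else {
        r.source = PAGE_FROM_PHYSICAL; /* kmalloc case: smem_start is used  */
        r.key = (info->smem_start + offset) >> MODEL_PAGE_SHIFT;
    }
    return r;
}
```

In the model, a kmalloc'ed buffer with smem_start left at zero resolves to page frames near zero, which have nothing to do with the video buffer. That matches the kind of bogus lookup behind the oops above.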
Really interesting. I'm currently implementing an SPI-based display FB driver too (for a Sharp Memory LCD display and my VFDHack32 host driver), and I'm facing a similar problem where it crashes in deferred_io. Can you share your source code? Mine is in my GitHub repo. P.S. The Memory LCD is monochrome, so I just pretend it is a color display and check whether each pixel byte is empty (dot off) or not empty (dot on).