Calling printk in a simple IRQ handler crashes the kernel - linux-kernel

I'm new to kernel programming and I couldn't find enough information to know why this happens. Basically I'm trying to replace the page fault handler in the kernel's IDT with something simple that calls the original handler in the end. I just want this function to print a notification that it is called, and calling printk() inside it always results in a kernel panic. It runs fine otherwise.
#include <asm/desc.h>
#include <linux/mm.h>
#include <asm/traps.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <asm/uaccess.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/desc_defs.h>
#include <linux/moduleparam.h>
#define PAGEFAULT_INDEX 14
// Old and new IDT registers
static struct desc_ptr old_idt_reg, new_idt_reg;
static __attribute__((__used__)) unsigned long old_pagefault_pointer, new_page;
// The function that replaces the original handler
asmlinkage void isr_pagefault(void);
asm(" .text");
asm(" .type isr_pagefault,#function");
asm("isr_pagefault:");
asm(" callq print_something");
asm(" jmp *old_pagefault_pointer");
void print_something(void) {
// This printk causes the kernel to crash!
printk(KERN_ALERT "Page fault handler called\n");
return;
}
void my_idt_load(void *ptr) {
printk(KERN_ALERT "Loading on a new processor...\n");
load_idt((struct desc_ptr*)ptr);
return;
}
int module_begin(void) {
gate_desc *old_idt_addr, *new_idt_addr;
unsigned long idt_length;
store_idt(&old_idt_reg);
old_idt_addr = (gate_desc*)old_idt_reg.address;
idt_length = old_idt_reg.size;
// Get the pagefault handler pointer from the IDT's pagefault entry
old_pagefault_pointer = 0
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_high) << 32 )
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_middle) << 16 )
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_low) );
printk(KERN_ALERT "Saved pointer to old pagefault handler: %p\n", (void*)old_pagefault_pointer);
// Allocate a new page for the new IDT
new_page = __get_free_page(GFP_KERNEL);
if (!new_page)
return -1;
// Copy the original IDT to the new page
memcpy((void*)new_page, old_idt_addr, idt_length);
// Set up the new IDT
new_idt_reg.address = new_idt_addr = new_page;
new_idt_reg.size = idt_length;
pack_gate(
&new_idt_addr[PAGEFAULT_INDEX],
GATE_INTERRUPT,
(unsigned long)isr_pagefault, // The interrupt written in assembly at the start of the code
0, 0, __KERNEL_CS
);
// Load the new table
load_idt(&new_idt_reg);
smp_call_function(my_idt_load, (void*)&new_idt_reg, 1); // Call load_idt on the rest of the cores
printk(KERN_ALERT "New IDT loaded\n\n");
return 0;
}
void module_end(void) {
printk( KERN_ALERT "Exit handler called now. Reverting changes and exiting...\n\n");
load_idt(&old_idt_reg);
smp_call_function(my_idt_load, (void*)&old_idt_reg, 1);
if (new_page)
free_page(new_page);
}
module_init(module_begin);
module_exit(module_end);
Many thanks to anyone who can tell me what I'm doing wrong here.

Sorry for resurrecting a dead post, but just for posterity:
I've run into similar issues when hooking IDT entries; one possibility is insufficient stack space. In 64-bit mode, when a trap or fault handler is called, the CPU determines a new stack pointer based on both the "interrupt stack table" (IST) field – bits 32 to 34 – of the corresponding interrupt descriptor, and the processor core's Task State Segment (TSS). From Volume 3A, section 6.14.5 of the Intel Software Developer's Manual:
In IA-32e mode, a new interrupt stack table (IST) mechanism is available as an alternative to the modified legacy stack-switching mechanism described above. This mechanism unconditionally switches stacks when it is enabled. It can be enabled on an individual interrupt-vector basis using a field in the IDT entry. This means that some interrupt vectors can use the modified legacy mechanism and others can use the IST mechanism.
The IST mechanism is only available in IA-32e mode. It is part of the 64-bit mode TSS. The motivation for the IST mechanism is to provide a method for specific interrupts (such as NMI, double-fault, and machine-check) to always execute on a known good stack. In legacy mode, interrupts can use the task-switch mechanism to set up a known good stack by accessing the interrupt service routine through a task gate located in the IDT. However, the legacy task-switch mechanism is not supported in IA-32e mode.
The IST mechanism provides up to seven IST pointers in the TSS. The pointers are referenced by an interrupt-gate descriptor in the interrupt-descriptor table (IDT); see Figure 6-8. The gate descriptor contains a 3-bit IST index field that provides an offset into the IST section of the TSS. Using the IST mechanism, the processor loads the value pointed by an IST pointer into the RSP.
... If the IST index is zero, the modified legacy stack-switching mechanism described above is used.
The "modified legacy stack-switching" mechanism is described in section 6.14.2 of the same chapter, and most importantly just loads the RSP0 entry of the TSS as the new stack pointer. Here is the figure that describes the TSS:
So, to summarize, if the IST field of the interrupt descriptor is 0, then the RSP0 entry of the TSS will be loaded as the new stack pointer, and if the IST field is non-zero, then the entry of the TSS indicated by it will be loaded as the new stack pointer. In x64 linux the IST field is 0 for page faults, so rsp is switched to the RSP0 entry of the TSS whenever a page fault occurs. Unfortunately, the stack space allocated here is rather small; playing around with a kernel debugger revealed that linux allocates only 512 bytes for this stack, and my suspicion is that printk perhaps requires greater stack space.
One possible solution to this is to, in the beginning of your page fault hook, manually switch the stack pointer to the RSP1 entry of the TSS, which should contain the current kernel stack and hence have ample room for printk. This is a very hacky and inelegant solution, but in my experience it does the trick. (To find the address of the TSS you should use str to get the Task Register (tr) and then get the base address from the corresponding entry of the GDT, which is called the "TSS Descriptor". See section 7.2.3 of Volume 3A for details.)
DISCLAIMER: there is however a major caveat today not relevant at the time you asked this question; the new Kernel Page Table Isolation mitigations introduced in response to Meltdown will cause a different fatal problem in this kind of hooking. In particular, your new Interrupt Descriptor Table will not be accessible from a user-mode value for cr3, so any fault in user-land will actually cause a triple fault once you've loaded in the new IDT (first the original fault, then a page fault because the IDT address is not present in the user-mode page tables, and then a triple fault because the double fault entry of the IDT will not be accessible either). Short of manually changing all of the user-mode page tables this renders your IDT hooking approach impossible.
The only solution is to manually overwrite an area of memory that you know will be present in user-mode page tables; for example, the original IRQ handler referenced in the IDT will point to a small segment of code that is always present in user-mode page tables and whose role is to change cr3 to the kernel-mode variant. Linux does this by clearing bits 11 and 12 of cr3, so you could overwrite this area of code with a small assembly stub that clears those bits and then jumps to your hook. As a proof of concept see here.

As far as I know, the printk() requires much resource and complexity(console/file system/storage) than ftrace.
If the crash only happens in case you have used printk(), why don't you use ftrace instead of printk()?
Many of Linux Kernel experts love ftrace.

Related

How can I check whether a kernel address belongs to the Linux kernel executable, and not just the core kernel text?

Coming from the Windows world, I assume that Vmlinuz is equivalent to ntoskrnl.exe, and this is the kernel executable that gets mapped in Kernel memory.
Now I want to figure out whether an address inside kernel belongs to the kernel executable or not. Is using core_kernel_text the correct way of finding this out?
Because core_kernel_text doesn't return true for some of the addresses that clearly should belong to Linux kernel executable.
For example the core_kernel_text doesn't return true when i give it the syscall entry handler address which can be found with the following code:
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
return (void *)system_call_entry;
And when I use this code snippet, the address of the syscall entry handler doesn't belong to the core kernel text or to any kernel module (using get_module_from_addr).
So how can an address for a handler that clearly belongs to Linux kernel executable such as syscall entry, don't belong to neither the core kernel or any kernel module? Then what does it belong to?
Which API do I need to use for these type of addresses that clearly belong to Linux kernel executable to assure me that the address indeed belongs to kernel?
I need such an API because I need to write a detection for malicious kernel modules that patch such handlers, and for now I need to make sure the address belongs to kernel, and not some third party kernel module or random kernel address. (Please do not discuss methods that can be used to bypass my detection, obviously it can be bypassed but that's another story)
The target kernel version is 4.15.0-112-generic, and is Ubuntu 16.04 as a VMware guest.
Reproducible code as requested:
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
unsigned long system_call_entry;
rdmsrl(MSR_LSTAR, system_call_entry);
int isInsideCoreKernel = core_kernel_text_((unsigned long)system_call_entry);
printk("%d , 0x%pK ", isInsideCoreKernel, system_call_entry);
EDIT1: So in the MSR_LSTAR example that I gave above, it turns out that It's related to Kernel Page Table Isolation and CONFIG_RETPOLINE=y in config:
system_call value is different each time when I use rdmsrl(MSR_LSTAR, system_call)
And that's why I am getting the address 0xfffffe0000006000 aka SYSCALL64_entry_trampoline, the same as the question above.
So now the question remains, why this SYSCALL64_entry_trampoline address doesn't belong to anything? It doesn't belong to any kernel module, and it doesn't belong to the core kernel, so which executable this address belongs to and how can I check that with an API similar to core_kernel_text? It seems like it belongs to cpu_entry_area, but what is that and how can I check if an address belongs to that?
You are seeing this "weird" address in MSR_LSTAR (IA32_LSTAR) because of Kernel Page-Table Isolation (KPTI), which mitigates Meltdown. As other existing answers(1) you already found point out, the address you see is the one of a small trampoline (entry_SYSCALL_64_trampoline) that is dynamically remapped at boot time by the kernel for each CPU, and thus does not have an address within the kernel text.
(1)By the way, the answer linked above wrongly states that the corresponding config option for KPTI is CONFIG_RETPOLINE=y. This is wrong, the "retpoline" is a mitigation for Spectre, not Meltdown. The config to enable KPTI is CONFIG_PAGE_TABLE_ISOLATION=y.
You don't have many options. Either:
Tell VMWare to emulate a recent CPU that is not vulnerable to Meltdown.
Detect and implement support for the KPTI trampoline.
You can implement support for this by detecting whether the kernel supports KPTI (CONFIG_PAGE_TABLE_ISOLATION), and if so check whether current CPU has KPTI enabled. The code at kernel/cpu/bugs.c that provides information for /sys/devices/system/cpu/vulnerabilities/meltdown shows how this can be detected:
ssize_t cpu_show_meltdown(struct device *dev,
struct device_attribute *attr, char *buf)
{
if (!boot_cpu_has_bug(X86_BUG_CPU_MELTDOWN))
return sprintf(buf, "Not affected\n");
if (boot_cpu_has(X86_FEATURE_PTI))
return sprintf(buf, "Mitigation: PTI\n");
return sprintf(buf, "Vulnerable\n");
}
The actual trampoline is set up at boot and its address is stored in each CPU's "entry area" for later use (e.g. here when setting up IA32_LSTAR). This answer on Unix & Linux SE explains the purpose of the cpu entry area and its relation to KPTI.
In your module you can do the following detection:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
#include <asm/msr-index.h>
#include <asm/msr.h>
#include <asm/cpufeature.h>
#include <asm/cpu_entry_area.h>
// ...
typedef int(*core_kernel_text_t)(unsigned long addr);
core_kernel_text_t core_kernel_text_;
bool syscall_entry_64_ok(void)
{
unsigned long entry;
rdmsrl(MSR_LSTAR, entry);
if (core_kernel_text_(entry))
return true;
#ifdef CONFIG_PAGE_TABLE_ISOLATION
if (this_cpu_has(X86_FEATURE_PTI)) {
int cpu = smp_processor_id();
unsigned long trampoline = (unsigned long)get_cpu_entry_area(cpu)->entry_trampoline;
if ((entry & PAGE_MASK) == trampoline)
return true;
}
#endif
return false;
}
static int __init modinit(void)
{
core_kernel_text_ = (core_kernel_text_t)kallsyms_lookup_name("core_kernel_text");
if (!core_kernel_text_)
return -EOPNOTSUPP;
pr_info("syscall_entry_64_ok() -> %d\n", syscall_entry_64_ok());
return 0;
}

How does fork() process mark parent's PTE's as read only?

I've searched through a lot of resources, but found nothing concrete on the matter:
I know that with some linux systems, a fork() syscall works with copy-on-write; that is, the parent and the child share the same address space, but PTE is now marked read-only, to be used later of COW. when either tries to access a page, a PAGE_FAULT occur and the page is copied to another place, where it can be modified.
However, I cannot understand how the OS reaches the shared PTEs to mark them as "read". I have hypothesized that when a fork() syscall occurs, the OS preforms a "page walk" on the parent's page table and marks them as read-only - but I find no confirmation for this, or any information regarding the process.
Does anyone know how the pages come to be marked as read only? Will appreciate any help. Thanks!
Linux OS implements syscall fork with iterating over all memory ranges (mmaps, stack and heap) of parent process. Copying of that ranges (VMA - Virtual memory areas is in function copy_page_range (mn/memory.c) which has loop over page table entries:
copy_page_range will iterate over pgd and call
copy_pud_range to iterate over pud and call
copy_pmd_range to iterate over pmd and call
copy_pte_range to iterate over pte and call
copy_one_pte which does memory usage accounting (RSS) and has several code segments to handle COW case:
/*
* If it's a COW mapping, write protect it both
* in the parent and the child
*/
if (is_cow_mapping(vm_flags)) {
ptep_set_wrprotect(src_mm, addr, src_pte);
pte = pte_wrprotect(pte);
}
where is_cow_mapping will be true for private and potentially writable pages (bitfield flags is checked for shared and maywrite bits and should have only maywrite bit set)
#define VM_SHARED 0x00000008
#define VM_MAYWRITE 0x00000020
static inline bool is_cow_mapping(vm_flags_t flags)
{
return (flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
}
PUD, PMD, and PTE are described in books like https://www.kernel.org/doc/gorman/html/understand/understand006.html and in articles like LWN 2005: "Four-level page tables merged".
How fork implementation calls copy_page_range:
fork syscall implementation (sys_fork? or syscall_define0(fork)) is do_fork (kernel/fork.c) which will call
copy_process which will call many copy_* functions, including
copy_mm which calls
dup_mm to allocate and fill new mm struct, where most work is done by
dup_mmap (still kernel/fork.c) which will check what was mmaped and how. (Here I was unable to get exact path to COW implementation so I used the Internet Search Machine with something like "fork+COW+dup_mm" to get hints like [1] or [2] or [3]). After checking mmap types there is retval = copy_page_range(mm, oldmm, mpnt); line to do real work.

User space CR3 value when PTI is enabled

While executing in the kernel mode, is there any way to get the userspace CR3 value when Page Table Isolation(PTI) is enabled?
In current Linux, see arch/x86/entry/calling.h for asm .macro SWITCH_TO_USER_CR3_NOSTACK and other stuff to see how Linux flips between kernel vs. user CR3. And the earlier comment on the constants it uses:
/*
* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
* halves:
*/
#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK (1 << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT X86_CR3_PTI_PCID_USER_BIT
#define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT)
#define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
It looks like the kernel CR3 is always the lower one, so setting bit 12 in the current CR3 always makes it point to the user-space page directory. (If the current task has a user-space, and if PTI is enabled. These asm macros are only used in code-paths that are about to return to user-space.)
.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
...
mov %cr3, \scratch_reg
...
.Lwrcr3_\#:
/* Flip the PGD to the user version */
orq $(PTI_USER_PGTABLE_MASK), \scratch_reg
mov \scratch_reg, %cr3
These macros are used in entry_64.S, entry_64_compat.S, and entry_32.S in paths that returns to user-space.
There's presumably a cleaner way to access user-space page tables from C.
Your best bet might be to look at the page-fault handler to find out how it accesses the process's page table. (Or mmap's implementation of MAP_POPULATE).

Replace HW interrupt in flat memory mode with DOS32/A

I have a question about how to replace HW interrupt in flat memory mode...
about my application...
created by combining Watcom C and DOS32/A.
written for running on DOS mode( not on OS mode )
with DOS32/A now I can access >1M memory and allocate large memory to use...(running in flat memory mode !!!)
current issue...
I want to write an ISR(interrupt service routine) for one PCI card. Thus I need to "replace" the HW interrupt.
Ex. the PCI card's interrupt line = 0xE in DOS. That means this device will issue interrupt via 8259's IRQ 14.
But I did not how to achieve my goal to replace this interrupt in flat mode ?
# resource I found...
- in watcom C's library, there is one sample using _dos_getvect, _dos_setvect, and _chain_intr to hook INT 0x1C...
I tested this code and found OK. But when I apply it to my case: INT76 ( where IRQ 14 is "INT 0x76" <- (14-8) + 0x70 ) then nothing happened...
I checked HW interrupt is generated but my own ISR did not invoked...
Do I lose something ? or are there any functions I can use to achieve my goal ?
===============================================================
[20120809]
I tried to use DPMI calls 0x204 and 0x205 and found MyISR() is still not invoked. I described what I did as below and maybe you all can give me some suggestions !
1) Use inline assembly to implement DPMI calls 0x204 and 0x205 and test OK...
Ex. Use DPMI 0x204 to show the interrupt vectors of 16 IRQs and I get(selector:offset) following results: 8:1540(INT8),8:1544(INT9),.....,8:1560(INT70),8:1564(INT71),...,8:157C(INT77)
Ex. Use DPMI 0x205 to set the interrupt vector for IRQ14(INT76) and returned CF=0, indicating successful
2) Create my own ISR MyISR() as follows:
volatile int tick=0; // global and volatile...
void MyISR(void)
{
tick = 5; // simple code to change the value of tick...
}
3) Set new interrupt vector by DPMI call 0x205:
selector = FP_SEG(MyISR); // selector = 0x838 here
offset = FP_OFF(MyISR); // offset = 0x30100963 here
sts = DPMI_SetIntVector(0x76, selector, offset, &out_ax);
Then sts = 0(CF=0) indicating successful !
One strange thing here is:my app runs in flat memory model and I think the selector should be 0 for MyISR()... But if selector = 0 for DPMI call 0x205 then I got CF=1 and AX = 0x8022, indicating "invalid selector" !
4) Let HW interrupt be generated and the evidences are:
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
PCI device config register 0x3C/0x3D (Interrupt line) = 0xE/0x2
In DOS the interrupt mode is PIC mode(8259 mode) and Pin-based(MSIE=0)
5) Display the value of tick and found it is still "0"...
Thus I think MyISR() is not invoked correctly...
Try using DPMI Function 0204h and 0205h instead of '_dos_getvect' and '_dos_setvect', respectively.
The runtime environment of your program is DOS32A or a DPMI Server/host. So use the api they provided instead of using DOS int21h facilities. But DOS32A does intercepts int21h interrupts, so your code should work fine, as far as real mode is concerned.
Actually what you did is you install only real mode interrupt handler for IRQ14 by using '_dos_getvect' and '_dos_setvect' functions.
By using the DPMI functions instead, you install protected mode interrupt handler for IRQ14, and DOS32a will autopassup IRQ14 interrupt to this protected mode handler.
Recall: A dos extender/DPMI server can be in protected mode or real mode while an IRQ is asserted.
This is becoz your application uses some DOS or BIOS API, so extender needs to switch to real mode to execute them and the return back to protected mode to transfer control to you protected mode application.
DOS32a does this by allocating a real-mode callback (at least for hardware interrupts) which calls your protected mode handler if IRQ14 is asserted while the Extender is in real-mode.
If the extender is in protected mode, while IRQ14 is asserted, it will automatically transfer control to your IRQ14 handler.
But if you didn't install protected mode handler for your IRQ, then DOS32a, will not allocate any real-mode callback, and your real-mode irq handler may not get control.
But it should recieve control AFAIK.
Anyway give the above two functions a try. And do chain to the previous int76h interrupt handler as Sean said.
In short:
In case of DOS32a, you need not use '_dos_getvect' and '_dos_setvect' functions. Instead use the DPMI functions 0204h and 0205h for installing your protected mode IRQ handler.
An advise : In your interrupt handler the first step should be to check whether your device actually generated interrupt or it is some other device sharing this irq(IRQ14 in your case). You can do this by checking a 'interrupt pending bit' in your device, if it is set, service your device and chain to next handler. If it is not set to 1, simply chain to next handler.
EDITED:
Use the latest version of DOS32a, instead of one that comes with OW.
Update on 2012-08-14:
Yes, you can use FP_SEG and FP_OFF macros for obtaining selector and offset respectively, just like you would use these macros in real modes to get segment and offset.
You can also use MK_FP macro to create far pointers from selector and offset. eg.
MK_FP(selector, offset).
You should declare your interrupt handler with ' __interrupt ', keyword when writing handlers in C.
Here is a snippet:
#include <i86.h> /* for FP_OFF, FP_SEG, and MK_FP in OW */
/* C Prototype for your IRQ handler */
void __interrupt __far irqHandler(void);
.
.
.
irq_selector = (unsigned short)FP_SEG( &irqHandler );
irq_offset = (unsigned long)FP_OFF( &irqHandler );
__dpmi_SetVect( intNum, irq_selector, irq_offset );
.
.
.
or, try this:
extern void sendEOItoMaster(void);
# pragma aux sendEOItoMaster = \
"mov al, 0x20" \
"out 0x20, al" \
modify [eax] ;
extern void sendEOItoSlave(void);
# pragma aux sendEOItoSlave = \
"mov al, 0x20" \
"out 0xA0, al" \
modify [eax] ;
unsigned int old76_selector, new76_selector;
unsigned long old76_offset, new76_offset;
volatile int chain = 1; /* Chain to the old handler */
volatile int tick=0; // global and volatile...
void (__interrupt __far *old76Handler)(void) = NULL; // function pointer declaration
void __interrupt __far new76Handler(void) {
tick = 5; // simple code to change the value of tick...
.
.
.
if( chain ){
// disable irqs if enabled above.
_chain_intr( old76Handler ); // 'jumping' to the old handler
// ( *old76Handler )(); // 'calling' the old handler
}else{
sendEOItoMaster();
sendEOItoSlave();
}
}
__dpmi_GetVect( 0x76, &old76_selector, &old76_offset );
old76Handler = ( void (__interrupt __far *)(void) ) MK_FP (old76_selector, old76_offset)
new76_selector = (unsigned int)FP_SEG( &new76Handler );
new76_offset = (unsigned long)FP_OFF( &new76Handler );
__dpmi_SetVect( 0x76, new76_selector, new76_offset );
.
.
NOTE:
You should first double check that the IRQ# you are hooking is really assigned/mapped to the interrupt pin of your concerned PCI device. IOWs, first read 'Interrupt Line register' (NOT Interrupt Pin register) from PCI configuration space, and hook only that irq#. The valid values for this register, in your case are: 0x00 through 0x0F inclusive, with 0x00 means IRQ0 and 0x01 means IRQ1 and so on.
POST/BIOS code writes a value in 'Interrupt Line register', while booting, and you MUST NOT modify this register at any cost.(of course, unless you are dealing with interrupt routing issues which an OS writer will deal with)
You should also get and save the selector and offset of the old handler by using DPMI call 0204h, in case you are chaining to old handler. If not, don't forget to send EOI(End-of-interrupt) to BOTH master and slave PICs in case you hooked an IRQ belonging to slave PIC(ie INT 70h through 77h, including INT 0Ah), and ONLY to master PIC in case you hooked an IRQ belonging to master PIC.
In flat model, the BASE address is 0 and Limit is 0xFFFFF, with G bit(ie Granularity bit) = 1.
The base and limit(along with attribute bits(e.g G bit) of a segment) reside in the descriptor corresponding to a particular segment. The descriptor itself, sits in the descriptor table.
Descriptor tables are an array with each entry being 8bytes.
The selector is merely a pointer(or an index) to the 8-byte descriptor entry, in the Descriptor table(either GDT or LDT). So a selector CAN'T be 0.
Note that lowest 3 bits of 16-bit selector have special meaning, and only the upper 13-bits are used to index a descriptor entry from a descriptor table.
GDT = Global Descriptor Table
LDT = Local Descriptor Table
A system can have only one GDT, but many LDTs.
As entry number 0 in GDT, is reserved and can't be used. AFAIK, DOS32A, does not create any LDT for its applications, instead it simply allocate and initalize descriptor entries corresponding to the application, in GDT itself.
Selector MUST not be 0, as x86 architecture regards 0 selector as invalid, when you try to access memory with that selector; though you can successfully place 0 in any segment register, it is only when you try to access(read/write/execute) that segment, the cpu generates an exception.
In case of interrupt handlers, the base address need not be 0, even in case of flat mode.
The DPMI environment must have valid reasons for doing this so.
After all, you still need to tackle segmentation at some level in x86 architecture.
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
I think, you mean Bus master command and status registers respectively. They actually reside in either I/O space or memory space, but NOT in PCI configuration space.
So you can read/write them directly via IN/OUT or MOV, instructions.
For reading/writing, PCI configuration registers you must use configuration red/write methods or PCI BIOS routines.
NOTE:
Many PCI disk controllers, have a bit called 'Interrupt enable/disable' bit. The register
that contains this bit is usually in the PCI configuration space, and can be found from the datasheet.
Actually, this setting is for "forwarding" the interrupt generated by the device attached to the PCI controller, to the PCI bus.
If, interrupts are disabled via this bit, then even if your device(attached to PCI controller) is generating the interrupt, the interrupt will NOT be forwarded to the PCI bus(and hence cpu will never know if interrupt occurred), but the interrupt bit(This bit is different from 'Interrupt enable/disable' bit) in PCI controller is still set to notify that the device(attached to PCI controller, eg a hard disk) generated an interrupt, so that the program can read this bit and take appropriate actions. It is similar to polling, from programming perspective.
This usually apply only for non-bus master transfers.
But, it seems that you are using bus master transfers(ie DMA), so it should not apply in your case.
But anyway, I would suggest you do read the datasheet of the PCI controller carefully, especially looking for bits/registers related to interrupt handling
EDITED:
Well, as far as application level programming is concerned, you need not encounter/use _far pointers, as your program will not access anything outside to your code.
But this is not completely true, when you go to system-level programming, you need to access memory mapped device registers, external ROM, or implementing interrupt handlers, etc.
The story changes here. The creation of a segment ie allocating descriptor and getting its associated selector, ensures that even if there is a bug in code, it will not annoyingly change anything external to that particular segment from which current code is executing. If it tries to do so, cpu will generate a fault. So when accessing external devices(especially memory mapped device's registers), or accessing some rom data, eg BIOS etc., it is a good idea to have allocate a descriptor and set the base and segment limits according to the area you need to execute/read/write and proceed. But you are not bound to do so.
Some external code residing for eg in rom, assume that they will be invoked with a far call.
As I said earlier, in x86 architecture, at some level(the farther below you go) you need to deal with segmentation as there is no way to disable it completely.
But in flat model, segmentation is present as an aid to programmer, as I said above, when accessing external(wrt to your program) things. But you need not use if you don't desire to do so.
When an interrupt handler is invoked, it doesn't know the base and limits of program that was interrupted. It doesn't know the segment attributes, limits etc. of the interrupted program, we say except CS and EIP all registers are in undefined state wrt interrupt handler. So it is needed to be declared as far function to indicate that it resides somewhere external to currently executing program.
it's been a while since I fiddled with interrupts, but the table is a pointer to set where the processor should go to to process an interrupt. I can give you the process, but not code, as I only ever used 8086 code.
Pseudo code:
Initialize:
Get current vector - store value
Set vector to point to the entry point of your routine
next:
Process Interrupt:
Your code decides what to do with data
If it's your data:
process it, and return
If not:
jump to the stored vector that we got during initialize,
and let the chain of interrupts continue as they normally would
finally:
Program End:
check to see if interrupt still points to your code
if yes, set vector back to the saved value
if no, set beginning of your code to long jump to vector address you saved,
or set a flag that lets your program not process anything

Win32 DDK: Is calling API from driver interrupt wrong?

Note: This is not a problem i'm experiencing, but it is something i'd
like to understand (just because i
want to be a better person, and to
further the horizon of human
understanding).
In the bonus chapter of Raymond Chen's book,
Raymond gives the example of a bug in a sound card driver:
The original function, called at
hardware interrupt time, looks like
this in the DDK:
void FAR PASCAL midiCallback(NPPORTALLOC pPortAlloc, WORD msg,
DWORD dwParam1, DWORD dwParm2) {
if (pPostAlloc->dwCallback)
DriverCallBack(pPortalloc->dwCallback, HIWORD(pPortalloc->dwFlags),
pPortalloc->hMidi, msg, dwParam1, dwParam2);
}
Their version of the function looked
like this:
void FAR PASCAL midiCallback(NPPORTALLOC pPortAlloc, WORD msg,
DWORD dwParam1, DWORD dwParm2) {
char szBuf[80];
if (pPostAlloc->dwCallback) {
wsprintf(szBuf, " Dc(hMidi=%X,wMsg=%X)", pPortalloc->hMidi, msg);
#ifdef DEBUG
OutputDebugString(szBuf);
#endif
DriverCallBack(pPortalloc->dwCallback, HIWORD(pPortalloc->dwFlags),
pPortalloc->hMidi, msg, dwParam1, dwParam2);
}
}
Not only is there leftover debug stuff in retail code, but it is
calling a noninterrupt- safe function
at hardware interrupt time. If the
wsprintf function ever gets
discarded, the system will take a
segment-not-present fault inside a
hardware interrupt, which leads to a
pretty quick death.
Now if i'm looking at that code i wouldn't have guessed that a call to the library function wsprintf would be a problem. What happens if my driver code needs to make use of the Win32 API?
What is a segment fault? i understand the concept of a page-fault: the code i need is sitting on a page that has been swapped out to the hard-drive, and will need to get back from the hard drive before code execution can continue. What is a segment fault when we're inside a device-driver's interrupt?
Is page-faults the protected mode equivalent of a segment-fault? How does one avoid segment faults? Does Windows ever swap out device driver code? How would i stop "wsprintf from being discarded"? What would cause wsprintf to be "discarded"? What is "discarded"? What is the virtue of discarding? When it something undiscarded
Why is calling an API call from inside a driver bad, and how would one work around it?
A segmentation fault normally refers to an invalid memory access. In most modern operating systems the mechanism which generates seg-faults is also used to provide the demand paging mechanism. What they tend to do is "swap" pages of memory out to disc and mark them as invalid, the next time an instruction accesses that bit of memory the kernel recognises that it isn't really an error and will page in memory.
Windows cannot handle page-faults in certain contexts, one of them being in an interrupt. That is just the way it is designed. For example imagine you get a page fault in the code which reads memory pages data from the disk drive, how could it possible handle such an occurrance? So they define certain limitations on what modes of operation are allowed to page and what are not. If you cause a page fault in an interrupt the kernel will force a BSOD.
What you are supposed to do in an interrupt context if you need to do something which might need paging is to queue what is called a Deferred Procedure Call (DPC) in the interrupt handler. The DPC is then executed at DPC level (something you will see mentioned if you read some of the descriptions of DDK functions). DPC level can page and so you can use any function you need.
As for driver stuff, you can mark some of your code as non-pageable and you can allocate non-paged-pool which is memory you can access without causing page-faults. wsprintf could be paged out because no-one has used it and the kernel reclaims the memory.

Resources