User space CR3 value when PTI is enabled - linux-kernel

While executing in the kernel mode, is there any way to get the userspace CR3 value when Page Table Isolation(PTI) is enabled?

In current Linux, see arch/x86/entry/calling.h for asm .macro SWITCH_TO_USER_CR3_NOSTACK and other stuff to see how Linux flips between kernel vs. user CR3. And the earlier comment on the constants it uses:
/*
* PAGE_TABLE_ISOLATION PGDs are 8k. Flip bit 12 to switch between the two
* halves:
*/
#define PTI_USER_PGTABLE_BIT PAGE_SHIFT
#define PTI_USER_PGTABLE_MASK (1 << PTI_USER_PGTABLE_BIT)
#define PTI_USER_PCID_BIT X86_CR3_PTI_PCID_USER_BIT
#define PTI_USER_PCID_MASK (1 << PTI_USER_PCID_BIT)
#define PTI_USER_PGTABLE_AND_PCID_MASK (PTI_USER_PCID_MASK | PTI_USER_PGTABLE_MASK)
It looks like the kernel CR3 is always the lower one, so setting bit 12 in the current CR3 always makes it point to the user-space page directory. (If the current task has a user-space, and if PTI is enabled. These asm macros are only used in code-paths that are about to return to user-space.)
.macro SWITCH_TO_USER_CR3_NOSTACK scratch_reg:req scratch_reg2:req
...
mov %cr3, \scratch_reg
...
.Lwrcr3_\#:
/* Flip the PGD to the user version */
orq $(PTI_USER_PGTABLE_MASK), \scratch_reg
mov \scratch_reg, %cr3
These macros are used in entry_64.S, entry_64_compat.S, and entry_32.S in paths that returns to user-space.
There's presumably a cleaner way to access user-space page tables from C.
Your best bet might be to look at the page-fault handler to find out how it accesses the process's page table. (Or mmap's implementation of MAP_POPULATE).

Related

How does Windows 10 task manager detect a virtual machine?

The Windows 10 task manager (taskmgr.exe) knows if it is running on a physical or virtual machine.
If you look in the Performance tab you'll notice that the number of processors label either reads Logical processors: or Virtual processors:.
In addition, if running inside a virtual machine, there is also the label Virtual machine: Yes.
See the following two screen shots:
My question is if there is a documented API call taskmgr is using to make this kind of detection?
I had a very short look at the disassembly and it seems that the detection code is somehow related to GetLogicalProcessorInformationEx and/or IsProcessorFeaturePresent and/or NtQuerySystemInformation.
However, I don't see how (at least not without spending some more hours of analyzing the assembly code).
And: This question is IMO not related to other existing questions like How can I detect if my program is running inside a virtual machine? since I did not see any code trying to compare smbios table strings or cpu vendor strings with existing known strings typical for hypervisors ("qemu", "virtualbox", "vmware"). I'm not ruling out that a lower level API implementation does that but I don't see this kind of code in taskmgr.exe.
Update: I can also rule out that taskmgr.exe is using the CPUID instruction (with EAX=1 and checking the hypervisor bit 31 in ECX) to detect a matrix.
Update: A closer look at the disassembly showed that there is indeed a check for bit 31, just not done that obviously.
I'll answer this question myself below.
I've analyzed the x64 taskmgr.exe from Windows 10 1803 (OS Build 17134.165) by tracing back the writes to the memory location that is consulted at the point where the Virtual machine: Yes label is set.
Responsible for that variable's value is the return code of the function WdcMemoryMonitor::CheckVirtualStatus
Here is the disassembly of the first use of the cpuid instruction in this function:
lea eax, [rdi+1] // results in eax set to 1
cpuid
mov dword ptr [rbp+var_2C], ebx // save CPUID feature bits for later use
test ecx, ecx
jns short loc_7FF61E3892DA // negative value check equals check for bit 31
...
return 1
loc_7FF61E3892DA:
// different feature detection code if hypervisor bit is not set
So taskmgr is not using any hardware strings, mac addresses or some other sophisticated technologies but simply checks if the hypervisor bit (CPUID leaf 0x01 ECX bit 31)) is set.
The result is bogus of course since e.g. adding -hypervisor to qemu's cpu parameter disables the hypervisor cpuid flag which results in task manager not showing Virtual machine: yes anymore.
And finally here is some example code (tested on Windows and Linux) that perfectly mimics Windows task manager's test:
#include <stdio.h>
#ifdef _WIN32
#include <intrin.h>
#else
#include <cpuid.h>
#endif
int isHypervisor(void)
{
#ifdef _WIN32
int cpuinfo[4];
__cpuid(cpuinfo, 1);
if (cpuinfo[2] >> 31 & 1)
return 1;
#else
unsigned int eax, ebx, ecx, edx;
__get_cpuid (1, &eax, &ebx, &ecx, &edx);
if (ecx >> 31 & 1)
return 1;
#endif
return 0;
}
int main(int argc, char **argv)
{
if (isHypervisor())
printf("Virtual machine: yes\n");
else
printf("Virtual machine: no\n"); /* actually "maybe */
return 0;
}

Calling printk in a simple IRQ handler crashes the kernel

I'm new to kernel programming and I couldn't find enough information to know why this happens. Basically I'm trying to replace the page fault handler in the kernel's IDT with something simple that calls the original handler in the end. I just want this function to print a notification that it is called, and calling printk() inside it always results in a kernel panic. It runs fine otherwise.
#include <asm/desc.h>
#include <linux/mm.h>
#include <asm/traps.h>
#include <linux/types.h>
#include <linux/errno.h>
#include <linux/sched.h>
#include <asm/uaccess.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <asm/desc_defs.h>
#include <linux/moduleparam.h>
#define PAGEFAULT_INDEX 14
// Old and new IDT registers
static struct desc_ptr old_idt_reg, new_idt_reg;
static __attribute__((__used__)) unsigned long old_pagefault_pointer, new_page;
// The function that replaces the original handler
asmlinkage void isr_pagefault(void);
asm(" .text");
asm(" .type isr_pagefault,#function");
asm("isr_pagefault:");
asm(" callq print_something");
asm(" jmp *old_pagefault_pointer");
void print_something(void) {
// This printk causes the kernel to crash!
printk(KERN_ALERT "Page fault handler called\n");
return;
}
void my_idt_load(void *ptr) {
printk(KERN_ALERT "Loading on a new processor...\n");
load_idt((struct desc_ptr*)ptr);
return;
}
int module_begin(void) {
gate_desc *old_idt_addr, *new_idt_addr;
unsigned long idt_length;
store_idt(&old_idt_reg);
old_idt_addr = (gate_desc*)old_idt_reg.address;
idt_length = old_idt_reg.size;
// Get the pagefault handler pointer from the IDT's pagefault entry
old_pagefault_pointer = 0
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_high) << 32 )
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_middle) << 16 )
| ((unsigned long)(old_idt_addr[PAGEFAULT_INDEX].offset_low) );
printk(KERN_ALERT "Saved pointer to old pagefault handler: %p\n", (void*)old_pagefault_pointer);
// Allocate a new page for the new IDT
new_page = __get_free_page(GFP_KERNEL);
if (!new_page)
return -1;
// Copy the original IDT to the new page
memcpy((void*)new_page, old_idt_addr, idt_length);
// Set up the new IDT
new_idt_reg.address = new_idt_addr = new_page;
new_idt_reg.size = idt_length;
pack_gate(
&new_idt_addr[PAGEFAULT_INDEX],
GATE_INTERRUPT,
(unsigned long)isr_pagefault, // The interrupt written in assembly at the start of the code
0, 0, __KERNEL_CS
);
// Load the new table
load_idt(&new_idt_reg);
smp_call_function(my_idt_load, (void*)&new_idt_reg, 1); // Call load_idt on the rest of the cores
printk(KERN_ALERT "New IDT loaded\n\n");
return 0;
}
void module_end(void) {
printk( KERN_ALERT "Exit handler called now. Reverting changes and exiting...\n\n");
load_idt(&old_idt_reg);
smp_call_function(my_idt_load, (void*)&old_idt_reg, 1);
if (new_page)
free_page(new_page);
}
module_init(module_begin);
module_exit(module_end);
Many thanks to anyone who can tell me what I'm doing wrong here.
Sorry for resurrecting a dead post, but just for posterity:
I've run into similar issues when hooking IDT entries; one possibility is insufficient stack space. In 64-bit mode, when a trap or fault handler is called, the CPU determines a new stack pointer based on both the "interrupt stack table" (IST) field – bits 32 to 34 – of the corresponding interrupt descriptor, and the processor core's Task State Segment (TSS). From Volume 3A, section 6.14.5 of the Intel Software Developer's Manual:
In IA-32e mode, a new interrupt stack table (IST) mechanism is available as an alternative to the modified legacy stack-switching mechanism described above. This mechanism unconditionally switches stacks when it is enabled. It can be enabled on an individual interrupt-vector basis using a field in the IDT entry. This means that some interrupt vectors can use the modified legacy mechanism and others can use the IST mechanism.
The IST mechanism is only available in IA-32e mode. It is part of the 64-bit mode TSS. The motivation for the IST mechanism is to provide a method for specific interrupts (such as NMI, double-fault, and machine-check) to always execute on a known good stack. In legacy mode, interrupts can use the task-switch mechanism to set up a known good stack by accessing the interrupt service routine through a task gate located in the IDT. However, the legacy task-switch mechanism is not supported in IA-32e mode.
The IST mechanism provides up to seven IST pointers in the TSS. The pointers are referenced by an interrupt-gate descriptor in the interrupt-descriptor table (IDT); see Figure 6-8. The gate descriptor contains a 3-bit IST index field that provides an offset into the IST section of the TSS. Using the IST mechanism, the processor loads the value pointed by an IST pointer into the RSP.
... If the IST index is zero, the modified legacy stack-switching mechanism described above is used.
The "modified legacy stack-switching" mechanism is described in section 6.14.2 of the same chapter, and most importantly just loads the RSP0 entry of the TSS as the new stack pointer. Here is the figure that describes the TSS:
So, to summarize, if the IST field of the interrupt descriptor is 0, then the RSP0 entry of the TSS will be loaded as the new stack pointer, and if the IST field is non-zero, then the entry of the TSS indicated by it will be loaded as the new stack pointer. In x64 linux the IST field is 0 for page faults, so rsp is switched to the RSP0 entry of the TSS whenever a page fault occurs. Unfortunately, the stack space allocated here is rather small; playing around with a kernel debugger revealed that linux allocates only 512 bytes for this stack, and my suspicion is that printk perhaps requires greater stack space.
One possible solution to this is to, in the beginning of your page fault hook, manually switch the stack pointer to the RSP1 entry of the TSS, which should contain the current kernel stack and hence have ample room for printk. This is a very hacky and inelegant solution, but in my experience it does the trick. (To find the address of the TSS you should use str to get the Task Register (tr) and then get the base address from the corresponding entry of the GDT, which is called the "TSS Descriptor". See section 7.2.3 of Volume 3A for details.)
DISCLAIMER: there is however a major caveat today not relevant at the time you asked this question; the new Kernel Page Table Isolation mitigations introduced in response to Meltdown will cause a different fatal problem in this kind of hooking. In particular, your new Interrupt Descriptor Table will not be accessible from a user-mode value for cr3, so any fault in user-land will actually cause a triple fault once you've loaded in the new IDT (first the original fault, then a page fault because the IDT address is not present in the user-mode page tables, and then a triple fault because the double fault entry of the IDT will not be accessible either). Short of manually changing all of the user-mode page tables this renders your IDT hooking approach impossible.
The only solution is to manually overwrite an area of memory that you know will be present in user-mode page tables; for example, the original IRQ handler referenced in the IDT will point to a small segment of code that is always present in user-mode page tables and whose role is to change cr3 to the kernel-mode variant. Linux does this by clearing bits 11 and 12 of cr3, so you could overwrite this area of code with a small assembly stub that clears those bits and then jumps to your hook. As a proof of concept see here.
As far as I know, the printk() requires much resource and complexity(console/file system/storage) than ftrace.
If the crash only happens in case you have used printk(), why don't you use ftrace instead of printk()?
Many of Linux Kernel experts love ftrace.

Interrupt vector and irq mapping in do_IRQ

I'm working on a x86 system with Linux 3.6.0. For some experiments, I need to know how the IRQ is mapped to the vector. I learn from many book saying that for vector 0x0 to 0x20 is for traps and exceptions, and from vector 0x20 afterward is for the external device interrupts. And this also defined in the source code Linux/arch/x86/include/asm/irq_vectors.h
However, what I'm puzzled is that when I check the do_IRQ function,
http://lxr.linux.no/linux+v3.6/arch/x86/kernel/irq.c#L181
I found the IRQ is fetched by looking up the "vector_irq" array:
unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
/* high bit used in ret_from_ code */
unsigned vector = ~regs->orig_ax;
unsigned irq;
...
irq = __this_cpu_read(vector_irq[vector]); // get the IRQ from the vector_irq
// print out the vector_irq
prink("CPU-ID:%d, vector: 0x%x - irq: %d", smp_processor_id(), vector, irq);
}
By instrumenting the code with printk, the vector-irq mapping I got is like below and I don't have any clue why this is the mapping. I though the mapping should be (irq + 0x20 = vector), but it seems not the case.
from: Linux/arch/x86/include/asm/irq_vector.h
* Vectors 0 ... 31 : system traps and exceptions - hardcoded events
* Vectors 32 ... 127 : device interrupts = 0x20 – 0x7F
But my output is:
CPU-ID=0.Vector=0x56 (irq=58)
CPU-ID=0.Vector=0x66 (irq=59)
CPU-ID=0.Vector=0x76 (irq=60)
CPU-ID=0.Vector=0x86 (irq=61)
CPU-ID=0.Vector=0x96 (irq=62)
CPU-ID=0.Vector=0xa6 (irq=63)
CPU-ID=0.Vector=0xb6 (irq=64)
BTW, these irqs are my 10GB ethernet cards with MSIX enabled. Could anyone give me some ideas about why this is the mapping? and what's the rules for making this mapping?
Thanks.
William
The irq number (which is what you use in software) is not the same as the vector number (which is what the interrupt controller actually uses).
The x86 I/OAPIC interrupt controller assigns interrupt priorities in groups of 16, so the vector numbers are spaced out to prevent them from interfering with each other
(see the function __assign_irq_vector in arch/x86/kernel/apic/io_apic.c).
I guess my question is how the vectors are assigned for a particular
IRQ number and what's are the rules behind.
The IOAPIC supports a register called IOREDTBL for each IRQ input. Software assigns the desired vector number for the IRQ input using bit 7-0 of this register. It is this vector number that serves as an index into the processors Interrupt Descriptor Table. Quoting the IOAPIC manual (82093AA)
7:0 Interrupt Vector (INTVEC)—R/W: The vector field is an 8 bit field
containing the interrupt vector for this interrupt. Vector values
range from 10h to FEh.
Note that these registers are not directly accessible to software. To access IOAPIC registers (not to be confused with Local APIC registers) software must use the IOREGSEL and IOWIN registers to indirectly interact with the IOAPIC. These registers are also described in the IOAPIC manual.
The source information for the IOAPIC can be a little tricky to dig up. Here's a link to the example I used:
IOAPIC data sheet link

Replace HW interrupt in flat memory mode with DOS32/A

I have a question about how to replace HW interrupt in flat memory mode...
about my application...
created by combining Watcom C and DOS32/A.
written for running on DOS mode( not on OS mode )
with DOS32/A now I can access >1M memory and allocate large memory to use...(running in flat memory mode !!!)
current issue...
I want to write an ISR(interrupt service routine) for one PCI card. Thus I need to "replace" the HW interrupt.
Ex. the PCI card's interrupt line = 0xE in DOS. That means this device will issue interrupt via 8259's IRQ 14.
But I did not how to achieve my goal to replace this interrupt in flat mode ?
# resource I found...
- in watcom C's library, there is one sample using _dos_getvect, _dos_setvect, and _chain_intr to hook INT 0x1C...
I tested this code and found OK. But when I apply it to my case: INT76 ( where IRQ 14 is "INT 0x76" <- (14-8) + 0x70 ) then nothing happened...
I checked HW interrupt is generated but my own ISR did not invoked...
Do I lose something ? or are there any functions I can use to achieve my goal ?
===============================================================
[20120809]
I tried to use DPMI calls 0x204 and 0x205 and found MyISR() is still not invoked. I described what I did as below and maybe you all can give me some suggestions !
1) Use inline assembly to implement DPMI calls 0x204 and 0x205 and test OK...
Ex. Use DPMI 0x204 to show the interrupt vectors of 16 IRQs and I get(selector:offset) following results: 8:1540(INT8),8:1544(INT9),.....,8:1560(INT70),8:1564(INT71),...,8:157C(INT77)
Ex. Use DPMI 0x205 to set the interrupt vector for IRQ14(INT76) and returned CF=0, indicating successful
2) Create my own ISR MyISR() as follows:
volatile int tick=0; // global and volatile...
void MyISR(void)
{
tick = 5; // simple code to change the value of tick...
}
3) Set new interrupt vector by DPMI call 0x205:
selector = FP_SEG(MyISR); // selector = 0x838 here
offset = FP_OFF(MyISR); // offset = 0x30100963 here
sts = DPMI_SetIntVector(0x76, selector, offset, &out_ax);
Then sts = 0(CF=0) indicating successful !
One strange thing here is:my app runs in flat memory model and I think the selector should be 0 for MyISR()... But if selector = 0 for DPMI call 0x205 then I got CF=1 and AX = 0x8022, indicating "invalid selector" !
4) Let HW interrupt be generated and the evidences are:
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
PCI device config register 0x3C/0x3D (Interrupt line) = 0xE/0x2
In DOS the interrupt mode is PIC mode(8259 mode) and Pin-based(MSIE=0)
5) Display the value of tick and found it is still "0"...
Thus I think MyISR() is not invoked correctly...
Try using DPMI Function 0204h and 0205h instead of '_dos_getvect' and '_dos_setvect', respectively.
The runtime environment of your program is DOS32A or a DPMI Server/host. So use the api they provided instead of using DOS int21h facilities. But DOS32A does intercepts int21h interrupts, so your code should work fine, as far as real mode is concerned.
Actually what you did is you install only real mode interrupt handler for IRQ14 by using '_dos_getvect' and '_dos_setvect' functions.
By using the DPMI functions instead, you install protected mode interrupt handler for IRQ14, and DOS32a will autopassup IRQ14 interrupt to this protected mode handler.
Recall: A dos extender/DPMI server can be in protected mode or real mode while an IRQ is asserted.
This is becoz your application uses some DOS or BIOS API, so extender needs to switch to real mode to execute them and the return back to protected mode to transfer control to you protected mode application.
DOS32a does this by allocating a real-mode callback (at least for hardware interrupts) which calls your protected mode handler if IRQ14 is asserted while the Extender is in real-mode.
If the extender is in protected mode, while IRQ14 is asserted, it will automatically transfer control to your IRQ14 handler.
But if you didn't install protected mode handler for your IRQ, then DOS32a, will not allocate any real-mode callback, and your real-mode irq handler may not get control.
But it should recieve control AFAIK.
Anyway give the above two functions a try. And do chain to the previous int76h interrupt handler as Sean said.
In short:
In case of DOS32a, you need not use '_dos_getvect' and '_dos_setvect' functions. Instead use the DPMI functions 0204h and 0205h for installing your protected mode IRQ handler.
An advise : In your interrupt handler the first step should be to check whether your device actually generated interrupt or it is some other device sharing this irq(IRQ14 in your case). You can do this by checking a 'interrupt pending bit' in your device, if it is set, service your device and chain to next handler. If it is not set to 1, simply chain to next handler.
EDITED:
Use the latest version of DOS32a, instead of one that comes with OW.
Update on 2012-08-14:
Yes, you can use FP_SEG and FP_OFF macros for obtaining selector and offset respectively, just like you would use these macros in real modes to get segment and offset.
You can also use MK_FP macro to create far pointers from selector and offset. eg.
MK_FP(selector, offset).
You should declare your interrupt handler with ' __interrupt ', keyword when writing handlers in C.
Here is a snippet:
#include <i86.h> /* for FP_OFF, FP_SEG, and MK_FP in OW */
/* C Prototype for your IRQ handler */
void __interrupt __far irqHandler(void);
.
.
.
irq_selector = (unsigned short)FP_SEG( &irqHandler );
irq_offset = (unsigned long)FP_OFF( &irqHandler );
__dpmi_SetVect( intNum, irq_selector, irq_offset );
.
.
.
or, try this:
extern void sendEOItoMaster(void);
# pragma aux sendEOItoMaster = \
"mov al, 0x20" \
"out 0x20, al" \
modify [eax] ;
extern void sendEOItoSlave(void);
# pragma aux sendEOItoSlave = \
"mov al, 0x20" \
"out 0xA0, al" \
modify [eax] ;
unsigned int old76_selector, new76_selector;
unsigned long old76_offset, new76_offset;
volatile int chain = 1; /* Chain to the old handler */
volatile int tick=0; // global and volatile...
void (__interrupt __far *old76Handler)(void) = NULL; // function pointer declaration
void __interrupt __far new76Handler(void) {
tick = 5; // simple code to change the value of tick...
.
.
.
if( chain ){
// disable irqs if enabled above.
_chain_intr( old76Handler ); // 'jumping' to the old handler
// ( *old76Handler )(); // 'calling' the old handler
}else{
sendEOItoMaster();
sendEOItoSlave();
}
}
__dpmi_GetVect( 0x76, &old76_selector, &old76_offset );
old76Handler = ( void (__interrupt __far *)(void) ) MK_FP (old76_selector, old76_offset)
new76_selector = (unsigned int)FP_SEG( &new76Handler );
new76_offset = (unsigned long)FP_OFF( &new76Handler );
__dpmi_SetVect( 0x76, new76_selector, new76_offset );
.
.
NOTE:
You should first double check that the IRQ# you are hooking is really assigned/mapped to the interrupt pin of your concerned PCI device. IOWs, first read 'Interrupt Line register' (NOT Interrupt Pin register) from PCI configuration space, and hook only that irq#. The valid values for this register, in your case are: 0x00 through 0x0F inclusive, with 0x00 means IRQ0 and 0x01 means IRQ1 and so on.
POST/BIOS code writes a value in 'Interrupt Line register', while booting, and you MUST NOT modify this register at any cost.(of course, unless you are dealing with interrupt routing issues which an OS writer will deal with)
You should also get and save the selector and offset of the old handler by using DPMI call 0204h, in case you are chaining to old handler. If not, don't forget to send EOI(End-of-interrupt) to BOTH master and slave PICs in case you hooked an IRQ belonging to slave PIC(ie INT 70h through 77h, including INT 0Ah), and ONLY to master PIC in case you hooked an IRQ belonging to master PIC.
In flat model, the BASE address is 0 and Limit is 0xFFFFF, with G bit(ie Granularity bit) = 1.
The base and limit(along with attribute bits(e.g G bit) of a segment) reside in the descriptor corresponding to a particular segment. The descriptor itself, sits in the descriptor table.
Descriptor tables are an array with each entry being 8bytes.
The selector is merely a pointer(or an index) to the 8-byte descriptor entry, in the Descriptor table(either GDT or LDT). So a selector CAN'T be 0.
Note that lowest 3 bits of 16-bit selector have special meaning, and only the upper 13-bits are used to index a descriptor entry from a descriptor table.
GDT = Global Descriptor Table
LDT = Local Descriptor Table
A system can have only one GDT, but many LDTs.
As entry number 0 in GDT, is reserved and can't be used. AFAIK, DOS32A, does not create any LDT for its applications, instead it simply allocate and initalize descriptor entries corresponding to the application, in GDT itself.
Selector MUST not be 0, as x86 architecture regards 0 selector as invalid, when you try to access memory with that selector; though you can successfully place 0 in any segment register, it is only when you try to access(read/write/execute) that segment, the cpu generates an exception.
In case of interrupt handlers, the base address need not be 0, even in case of flat mode.
The DPMI environment must have valid reasons for doing this so.
After all, you still need to tackle segmentation at some level in x86 architecture.
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
I think, you mean Bus master command and status registers respectively. They actually reside in either I/O space or memory space, but NOT in PCI configuration space.
So you can read/write them directly via IN/OUT or MOV, instructions.
For reading/writing, PCI configuration registers you must use configuration red/write methods or PCI BIOS routines.
NOTE:
Many PCI disk controllers, have a bit called 'Interrupt enable/disable' bit. The register
that contains this bit is usually in the PCI configuration space, and can be found from the datasheet.
Actually, this setting is for "forwarding" the interrupt generated by the device attached to the PCI controller, to the PCI bus.
If, interrupts are disabled via this bit, then even if your device(attached to PCI controller) is generating the interrupt, the interrupt will NOT be forwarded to the PCI bus(and hence cpu will never know if interrupt occurred), but the interrupt bit(This bit is different from 'Interrupt enable/disable' bit) in PCI controller is still set to notify that the device(attached to PCI controller, eg a hard disk) generated an interrupt, so that the program can read this bit and take appropriate actions. It is similar to polling, from programming perspective.
This usually apply only for non-bus master transfers.
But, it seems that you are using bus master transfers(ie DMA), so it should not apply in your case.
But anyway, I would suggest you do read the datasheet of the PCI controller carefully, especially looking for bits/registers related to interrupt handling
EDITED:
Well, as far as application level programming is concerned, you need not encounter/use _far pointers, as your program will not access anything outside to your code.
But this is not completely true, when you go to system-level programming, you need to access memory mapped device registers, external ROM, or implementing interrupt handlers, etc.
The story changes here. The creation of a segment ie allocating descriptor and getting its associated selector, ensures that even if there is a bug in code, it will not annoyingly change anything external to that particular segment from which current code is executing. If it tries to do so, cpu will generate a fault. So when accessing external devices(especially memory mapped device's registers), or accessing some rom data, eg BIOS etc., it is a good idea to have allocate a descriptor and set the base and segment limits according to the area you need to execute/read/write and proceed. But you are not bound to do so.
Some external code residing for eg in rom, assume that they will be invoked with a far call.
As I said earlier, in x86 architecture, at some level(the farther below you go) you need to deal with segmentation as there is no way to disable it completely.
But in flat model, segmentation is present as an aid to programmer, as I said above, when accessing external(wrt to your program) things. But you need not use if you don't desire to do so.
When an interrupt handler is invoked, it doesn't know the base and limits of program that was interrupted. It doesn't know the segment attributes, limits etc. of the interrupted program, we say except CS and EIP all registers are in undefined state wrt interrupt handler. So it is needed to be declared as far function to indicate that it resides somewhere external to currently executing program.
it's been a while since I fiddled with interrupts, but the table is a pointer to set where the processor should go to to process an interrupt. I can give you the process, but not code, as I only ever used 8086 code.
Pseudo code:
Initialize:
Get current vector - store value
Set vector to point to the entry point of your routine
next:
Process Interrupt:
Your code decides what to do with data
If it's your data:
process it, and return
If not:
jump to the stored vector that we got during initialize,
and let the chain of interrupts continue as they normally would
finally:
Program End:
check to see if interrupt still points to your code
if yes, set vector back to the saved value
if no, set beginning of your code to long jump to vector address you saved,
or set a flag that lets your program not process anything

How to send nmi on same system

I need to send an nmi on the system I am working on. I want to test few things which I have implemented. Is there any windows driver routine which allows us to do that? I think I can write to a port using __outword. Is there any other way to do it?
I have one more question. Are there any specific scenarios which causes an NMI? (However, I dont want system to BSOD or triple fault.)
Thanks
From Intel's Software Development Manual: System Programming Guide:
The nonmaskable interrupt (NMI) can be generated in either of two ways:
External hardware asserts the NMI pin.
The processor receives a message on the system bus (Pentium 4, Intel Core Duo, Intel Core 2, Intel Atom, and Intel Xeon processors) or the APIC serial bus (P6 family and Pentium processors) with a delivery mode NMI.
and
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector 2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI interrupt. A true NMI interrupt that activates the processors NMI-handling hardware can only be delivered through one of the mechanisms listed above.
So, if all you want to do is trigger the NMI handler, you can simply use int $2 (int 02h in Intel syntax). But, if you need to ensure that it is not masked, you will either need external hardware to trigger it, or to use the APIC.
If you choose to use the APIC to send an NMI, the easiest way to do it is to send an inter-processor interrupt. To do this, you will need access to the local APIC's registers, which are mapped into physical memory, by default at the address 0xFEE00000, although that can be changed. You will need to find the physical page containing the APIC's registers and map it into virtual memory so that you can access them.
In order to send an IPI, you need to write into the interrupt configuration register. The ICR's low 32 bits are located at 0x300 within the APIC's page, and the upper 32 bits are at 0x310. To send the NMI, you need to:
Get the APIC ID of the processor you want to send the NMI to. If you want to send it to the processor you are running on, this is simple since you can read it from the APIC at 0x20 in bits 24-31.
Write the APIC ID into the destination field, bits 24-31 of the high ICR register.
Write the value 0x4400 into the low ICR register. Bits 8-10 of this value indicate that you are sending an NMI, and bit 14 indicates that you are using the assert trigger mode.
When writing to an APIC register, you must write a full 32 bit value. Also, bits 13, 16-17, and 20-55 in the ICR are reserved, so you should not change their values. You also must write to the high bits of the ICR before the low bits, since the IPI is triggered by the write to the low bits.
Here is an example of sending an NMI to the current processor in C.
#define APIC_ID_OFFSET 0x20
#define ICR_LOW_OFFSET 0x300
#define ICR_HIGH_OFFSET 0x310
// Convenience macro used to access APIC registers
#define APIC_REG(offset) (*(unsigned int*)(apicAddress + offset))
void *apicAddress; // This should contain the virtual address that the APIC registers are mapped to
// Get the current APIC ID. Leave it in the high 8 bits since that is where it needs to be written anyway
unsigned int apicID = APIC_REG(APIC_ID_OFFSET) & 0xFF000000;
unsigned int high = APIC_REG(ICR_HIGH_OFFSET) & 0x00FFFFFF;
high |= apicID;
unsigned int low = APIC_REG(ICR_LOW_OFFSET) & 0xFFF32000;
low |= 0x4400;
APIC_REG(ICR_HIGH_OFFSET) = high;
APIC_REG(ICR_LOW_OFFSET) = low;

Resources