Replace HW interrupt in flat memory mode with DOS32/A - dos

I have a question about how to replace HW interrupt in flat memory mode...
about my application...
created by combining Watcom C and DOS32/A.
written for running on DOS mode( not on OS mode )
with DOS32/A now I can access >1M memory and allocate large memory to use...(running in flat memory mode !!!)
current issue...
I want to write an ISR(interrupt service routine) for one PCI card. Thus I need to "replace" the HW interrupt.
Ex. the PCI card's interrupt line = 0xE in DOS. That means this device will issue interrupt via 8259's IRQ 14.
But I did not how to achieve my goal to replace this interrupt in flat mode ?
# resource I found...
- in watcom C's library, there is one sample using _dos_getvect, _dos_setvect, and _chain_intr to hook INT 0x1C...
I tested this code and found OK. But when I apply it to my case: INT76 ( where IRQ 14 is "INT 0x76" <- (14-8) + 0x70 ) then nothing happened...
I checked HW interrupt is generated but my own ISR did not invoked...
Do I lose something ? or are there any functions I can use to achieve my goal ?
===============================================================
[20120809]
I tried to use DPMI calls 0x204 and 0x205 and found MyISR() is still not invoked. I described what I did as below and maybe you all can give me some suggestions !
1) Use inline assembly to implement DPMI calls 0x204 and 0x205 and test OK...
Ex. Use DPMI 0x204 to show the interrupt vectors of 16 IRQs and I get(selector:offset) following results: 8:1540(INT8),8:1544(INT9),.....,8:1560(INT70),8:1564(INT71),...,8:157C(INT77)
Ex. Use DPMI 0x205 to set the interrupt vector for IRQ14(INT76) and returned CF=0, indicating successful
2) Create my own ISR MyISR() as follows:
volatile int tick=0; // global and volatile...
void MyISR(void)
{
tick = 5; // simple code to change the value of tick...
}
3) Set new interrupt vector by DPMI call 0x205:
selector = FP_SEG(MyISR); // selector = 0x838 here
offset = FP_OFF(MyISR); // offset = 0x30100963 here
sts = DPMI_SetIntVector(0x76, selector, offset, &out_ax);
Then sts = 0(CF=0) indicating successful !
One strange thing here is:my app runs in flat memory model and I think the selector should be 0 for MyISR()... But if selector = 0 for DPMI call 0x205 then I got CF=1 and AX = 0x8022, indicating "invalid selector" !
4) Let HW interrupt be generated and the evidences are:
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
PCI device config register 0x3C/0x3D (Interrupt line) = 0xE/0x2
In DOS the interrupt mode is PIC mode(8259 mode) and Pin-based(MSIE=0)
5) Display the value of tick and found it is still "0"...
Thus I think MyISR() is not invoked correctly...

Try using DPMI Function 0204h and 0205h instead of '_dos_getvect' and '_dos_setvect', respectively.
The runtime environment of your program is DOS32A or a DPMI Server/host. So use the api they provided instead of using DOS int21h facilities. But DOS32A does intercepts int21h interrupts, so your code should work fine, as far as real mode is concerned.
Actually what you did is you install only real mode interrupt handler for IRQ14 by using '_dos_getvect' and '_dos_setvect' functions.
By using the DPMI functions instead, you install protected mode interrupt handler for IRQ14, and DOS32a will autopassup IRQ14 interrupt to this protected mode handler.
Recall: A dos extender/DPMI server can be in protected mode or real mode while an IRQ is asserted.
This is becoz your application uses some DOS or BIOS API, so extender needs to switch to real mode to execute them and the return back to protected mode to transfer control to you protected mode application.
DOS32a does this by allocating a real-mode callback (at least for hardware interrupts) which calls your protected mode handler if IRQ14 is asserted while the Extender is in real-mode.
If the extender is in protected mode, while IRQ14 is asserted, it will automatically transfer control to your IRQ14 handler.
But if you didn't install protected mode handler for your IRQ, then DOS32a, will not allocate any real-mode callback, and your real-mode irq handler may not get control.
But it should recieve control AFAIK.
Anyway give the above two functions a try. And do chain to the previous int76h interrupt handler as Sean said.
In short:
In case of DOS32a, you need not use '_dos_getvect' and '_dos_setvect' functions. Instead use the DPMI functions 0204h and 0205h for installing your protected mode IRQ handler.
An advise : In your interrupt handler the first step should be to check whether your device actually generated interrupt or it is some other device sharing this irq(IRQ14 in your case). You can do this by checking a 'interrupt pending bit' in your device, if it is set, service your device and chain to next handler. If it is not set to 1, simply chain to next handler.
EDITED:
Use the latest version of DOS32a, instead of one that comes with OW.
Update on 2012-08-14:
Yes, you can use FP_SEG and FP_OFF macros for obtaining selector and offset respectively, just like you would use these macros in real modes to get segment and offset.
You can also use MK_FP macro to create far pointers from selector and offset. eg.
MK_FP(selector, offset).
You should declare your interrupt handler with ' __interrupt ', keyword when writing handlers in C.
Here is a snippet:
#include <i86.h> /* for FP_OFF, FP_SEG, and MK_FP in OW */
/* C Prototype for your IRQ handler */
void __interrupt __far irqHandler(void);
.
.
.
irq_selector = (unsigned short)FP_SEG( &irqHandler );
irq_offset = (unsigned long)FP_OFF( &irqHandler );
__dpmi_SetVect( intNum, irq_selector, irq_offset );
.
.
.
or, try this:
extern void sendEOItoMaster(void);
# pragma aux sendEOItoMaster = \
"mov al, 0x20" \
"out 0x20, al" \
modify [eax] ;
extern void sendEOItoSlave(void);
# pragma aux sendEOItoSlave = \
"mov al, 0x20" \
"out 0xA0, al" \
modify [eax] ;
unsigned int old76_selector, new76_selector;
unsigned long old76_offset, new76_offset;
volatile int chain = 1; /* Chain to the old handler */
volatile int tick=0; // global and volatile...
void (__interrupt __far *old76Handler)(void) = NULL; // function pointer declaration
void __interrupt __far new76Handler(void) {
tick = 5; // simple code to change the value of tick...
.
.
.
if( chain ){
// disable irqs if enabled above.
_chain_intr( old76Handler ); // 'jumping' to the old handler
// ( *old76Handler )(); // 'calling' the old handler
}else{
sendEOItoMaster();
sendEOItoSlave();
}
}
__dpmi_GetVect( 0x76, &old76_selector, &old76_offset );
old76Handler = ( void (__interrupt __far *)(void) ) MK_FP (old76_selector, old76_offset)
new76_selector = (unsigned int)FP_SEG( &new76Handler );
new76_offset = (unsigned long)FP_OFF( &new76Handler );
__dpmi_SetVect( 0x76, new76_selector, new76_offset );
.
.
NOTE:
You should first double check that the IRQ# you are hooking is really assigned/mapped to the interrupt pin of your concerned PCI device. IOWs, first read 'Interrupt Line register' (NOT Interrupt Pin register) from PCI configuration space, and hook only that irq#. The valid values for this register, in your case are: 0x00 through 0x0F inclusive, with 0x00 means IRQ0 and 0x01 means IRQ1 and so on.
POST/BIOS code writes a value in 'Interrupt Line register', while booting, and you MUST NOT modify this register at any cost.(of course, unless you are dealing with interrupt routing issues which an OS writer will deal with)
You should also get and save the selector and offset of the old handler by using DPMI call 0204h, in case you are chaining to old handler. If not, don't forget to send EOI(End-of-interrupt) to BOTH master and slave PICs in case you hooked an IRQ belonging to slave PIC(ie INT 70h through 77h, including INT 0Ah), and ONLY to master PIC in case you hooked an IRQ belonging to master PIC.
In flat model, the BASE address is 0 and Limit is 0xFFFFF, with G bit(ie Granularity bit) = 1.
The base and limit(along with attribute bits(e.g G bit) of a segment) reside in the descriptor corresponding to a particular segment. The descriptor itself, sits in the descriptor table.
Descriptor tables are an array with each entry being 8bytes.
The selector is merely a pointer(or an index) to the 8-byte descriptor entry, in the Descriptor table(either GDT or LDT). So a selector CAN'T be 0.
Note that lowest 3 bits of 16-bit selector have special meaning, and only the upper 13-bits are used to index a descriptor entry from a descriptor table.
GDT = Global Descriptor Table
LDT = Local Descriptor Table
A system can have only one GDT, but many LDTs.
As entry number 0 in GDT, is reserved and can't be used. AFAIK, DOS32A, does not create any LDT for its applications, instead it simply allocate and initalize descriptor entries corresponding to the application, in GDT itself.
Selector MUST not be 0, as x86 architecture regards 0 selector as invalid, when you try to access memory with that selector; though you can successfully place 0 in any segment register, it is only when you try to access(read/write/execute) that segment, the cpu generates an exception.
In case of interrupt handlers, the base address need not be 0, even in case of flat mode.
The DPMI environment must have valid reasons for doing this so.
After all, you still need to tackle segmentation at some level in x86 architecture.
PCI device config register 0x5 bit2(Interrupt Disabled) = 0
PCI device config register 0x6 bit3(Interrupt status) = 1
I think, you mean Bus master command and status registers respectively. They actually reside in either I/O space or memory space, but NOT in PCI configuration space.
So you can read/write them directly via IN/OUT or MOV, instructions.
For reading/writing, PCI configuration registers you must use configuration red/write methods or PCI BIOS routines.
NOTE:
Many PCI disk controllers, have a bit called 'Interrupt enable/disable' bit. The register
that contains this bit is usually in the PCI configuration space, and can be found from the datasheet.
Actually, this setting is for "forwarding" the interrupt generated by the device attached to the PCI controller, to the PCI bus.
If, interrupts are disabled via this bit, then even if your device(attached to PCI controller) is generating the interrupt, the interrupt will NOT be forwarded to the PCI bus(and hence cpu will never know if interrupt occurred), but the interrupt bit(This bit is different from 'Interrupt enable/disable' bit) in PCI controller is still set to notify that the device(attached to PCI controller, eg a hard disk) generated an interrupt, so that the program can read this bit and take appropriate actions. It is similar to polling, from programming perspective.
This usually apply only for non-bus master transfers.
But, it seems that you are using bus master transfers(ie DMA), so it should not apply in your case.
But anyway, I would suggest you do read the datasheet of the PCI controller carefully, especially looking for bits/registers related to interrupt handling
EDITED:
Well, as far as application level programming is concerned, you need not encounter/use _far pointers, as your program will not access anything outside to your code.
But this is not completely true, when you go to system-level programming, you need to access memory mapped device registers, external ROM, or implementing interrupt handlers, etc.
The story changes here. The creation of a segment ie allocating descriptor and getting its associated selector, ensures that even if there is a bug in code, it will not annoyingly change anything external to that particular segment from which current code is executing. If it tries to do so, cpu will generate a fault. So when accessing external devices(especially memory mapped device's registers), or accessing some rom data, eg BIOS etc., it is a good idea to have allocate a descriptor and set the base and segment limits according to the area you need to execute/read/write and proceed. But you are not bound to do so.
Some external code residing for eg in rom, assume that they will be invoked with a far call.
As I said earlier, in x86 architecture, at some level(the farther below you go) you need to deal with segmentation as there is no way to disable it completely.
But in flat model, segmentation is present as an aid to programmer, as I said above, when accessing external(wrt to your program) things. But you need not use if you don't desire to do so.
When an interrupt handler is invoked, it doesn't know the base and limits of program that was interrupted. It doesn't know the segment attributes, limits etc. of the interrupted program, we say except CS and EIP all registers are in undefined state wrt interrupt handler. So it is needed to be declared as far function to indicate that it resides somewhere external to currently executing program.

it's been a while since I fiddled with interrupts, but the table is a pointer to set where the processor should go to to process an interrupt. I can give you the process, but not code, as I only ever used 8086 code.
Pseudo code:
Initialize:
Get current vector - store value
Set vector to point to the entry point of your routine
next:
Process Interrupt:
Your code decides what to do with data
If it's your data:
process it, and return
If not:
jump to the stored vector that we got during initialize,
and let the chain of interrupts continue as they normally would
finally:
Program End:
check to see if interrupt still points to your code
if yes, set vector back to the saved value
if no, set beginning of your code to long jump to vector address you saved,
or set a flag that lets your program not process anything

Related

Registering interrupt with irq from pci_irq_vector(9) function results in "No irq handler for this function"?

I am writing a device driver that services the interrupts from the device. The device has only one MSI interrupt vector, so I poll the irq with pci_irq_vector(dev, 0), receive the irq, and register the interrupt. This is shown in the following code snippet (equivalent to what I have minus error handling):
retval = pci_alloc_irq_vectors(dev, 1, 1, PCI_IRQ_MSI);
irq = pci_irq_vector(dev, 0);
retval = request_irq(irq, irq_fnc, 0, "name", dev);
This all completes successfully and without warning (at least with dmesg). Yet when the interrupt comes in, I get the error.
kernel:do_IRQ: 0.xxx No irq handler for this vector (irq -1)
The xxx appears to be an arbitrary number that changes every time the driver is loaded, but does not match the irq number. Instead, it matches the last two hex digits of the message data sent with the MSI interrupt as read from the MSI capability structure. Trying to request an irq of this number returns EINVAL which I think means that it's not associated with any PCI device. What does this number mean anyway?
Something that may be important to note, I am actually manually triggering this interrupt from the host side due to limitations with the device. I am reading the interrupt address and data from the capability structure then instructing the device to write the data to that address.
How would I go about further debugging this? Does anything from my description stand out as suspicious? Any help would be appreciated.
Does this particular irq show when you type cat /proc/interrupts? Maybe you can get the correct irq number from there, as well as other info like where it is attached and what driver is associated with this interrupt line!
So the problem ended up being in the order of things. To manually create the interrupt, I had read the config space for the interrupt address and data before allocating interrupts. While obvious in retrospect, allocating the irq vectors for the device writes the appropriate data to the config space. Hence, using the preexisting value in the message data field would point to an irq vector that does not exist.

How do I retrieve high and low-order parts of a value from two registers in inline assembly?

I'm currently working on a little game that can run from the boot sector of a hard drive, just for something fun to do. This means my program runs in 16-bit real mode, and I have my compiler flags set up to emit pure i386 code. I'm writing the game in C++, but I do need a lot of inline assembly to talk to the BIOS via interrupt calls. Some of these calls return a 32-bit integer, but stored in two 16-bit registers. Currently I'm doing the following to get my number out of the assembly:
auto getTicks = [](){
uint16_t ticksL{ 0 }, ticksH{ 0 };
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL) : "a"(0x0));
return static_cast<uint32_t>( (ticksH << 16) | ticksL );
};
This is a lambda function I use to call this interrupt function which returns a tick count. I'm aware that there are better methods to get time data, and that I haven't implemented a check for AL to see if midnight has passed, but that's another topic.
As you can see, I have to use two 16-bit values, get the register values separately, then combine them into a 32-bit number the way you see at the return statement.
Is there any way I could retrieve that data into a single 32-bit number in my code right away avoid the shift and bitwise-or? I know that those 16-bit registers I'm accessing are really just the higher and lower 16-bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.
I know that those 16-bit registers I'm accessing are really just the higher and lower 16-bits of a 32-bit register in reality, but I have no idea how to access the original 32-bit register as a whole.
As Jester has already pointed out, these are in fact 2 separate registers, so there is no way to retrieve "the original 32-bit register."
One other point: That interrupt modifies the ax register (returning the 'past midnight' flag), however your asm doesn't inform gcc that you are changing ax. Might I suggest something like this:
asm volatile("int $0x1a" : "=c"(ticksH), "=d"(ticksL), "=a"(midnight) : "a"(0x0));
Note that midnight is also a uint16_t.
As other answers suggest you can't load DX and CX directly into a 32-bit register. You'd have to combine them as you suggest.
In this case there is an alternative. Rather than using INT 1Ah/AH=0h you can read the BIOS Data Area (BDA) in low memory for the 32-bit DWORD value and load it into a 32-bit register. This is allowed in real mode on i386 processors. Two memory addresses of interest:
40:6C dword Daily timer counter, equal to zero at midnight;
incremented by INT 8; read/set by INT 1A
40:70 byte Clock rollover flag, set when 40:6C exceeds 24hrs
These two memory addresses are in segment:offset format, but would be equivalent to physical address 0x0046C and 0x00470.
All you'd have to do is temporarily set the DS register to 0 (saving the previous value), turn off interrupts with CLI retrieve the values from lower memory using C/C++ pointers, re-enable interrupts with STI and restore DS to the previously saved value. This of course is added overhead in the boot sector compared to using INT 1Ah/AH=0h but would allow you direct access to the memory addresses the BIOS is reading/writing on your behalf.
Note: If DS is set to zero already no need to save/set/restore it. Since we don't see the code that sets up the environment before calling into the C++ code I don't know what your default segment values are. If you don't need to retrieve both the roll over and timer values and only wish to get them individually you can eliminate the CLI/STI.
You're looking for the 'A' constraint, which refers to the dx:ax register pair as a double-wide value. You can see the full set of defined constraints for x86 in the gcc documentation. Unfortunately there are no constraints for any other register pairs, so you have to get them as two values and reassemble them with shift and or, like you describe.

Intel 8259 PIC - Acknowledge interrupt

Assume we have a system with CPU which is fully compatible with Intel 8259 Programmable Interrupt Controller. So, this CPU use vectored interrupts, of course.
When one of eight interrupts occurs, PIC just asserts INTR wire that is connected to the CPU. Now PIC waits for CPU until INTA will be asserted. When so, PIC selects interrupt with the highest priority (depends on pin number), and then send its interrupt vector to data bus. I omitted some timing, but it doesn't matter for now, I think.
Here are questions:
How whole device, that causes interrupt, knows that his interrupt
request was accepted and it can pull off interrupt request? I read about 8259, but I didn't find it.
Is acknowledge device, whose interrupt was accepted, performed in ISR?
Sorry for my English.
The best reference is the original intel doc and is available here: https://pdos.csail.mit.edu/6.828/2012/readings/hardware/8259A.pdf It has full details of these modes, how the device operates, and how to program the device.
Caveat: I'm a bit rusty as I haven't programmed the 8259 in many years, but I'll take a shot at explaining things, per your request.
After an interrupting device, connected to an IRR ["interrupt request register"] pin, has asserted an interrupt request, the 8259 will convey this to the CPU by assserting INTR and then placing the vector on the bus during the three INTA cycles generated by the CPU.
After a given device has asserted IRR, the 8259's IS ["in-service"] register is or'ed with a mask of the IRR pin number. The IS is a priority select. While the IS bit is set, other interrupting devices of lower priority [or the original one] will not cause an INTR/INTA cycle to the CPU. The IS bit must be cleared first. These interrupts remain "pending".
The IS can be cleared by an EOI (end-of-interrupt) operation. There are multiple EOI modes that can be programmed. The EOI can be generated by the 8259 in AEOI mode. In other modes, the EOI is generated manually by the ISR by sending a command to the 8259.
The EOI action is all about allowing other devices to cause interrupts while the ISR is processing the current one. The EOI does not clear the interrupting device.
Clearing the interrupting device must be done by the ISR using whatever device specific register the device has for that purpose. Usually, this a "pending interrupt" register [can be 1 bit wide]. Most H/W uses two interrupt related registers and the other one is an "interrupt enable" register.
With level triggered interrupts, if the ISR does not clear the device, when the ISR does issue the EOI command to the 8259, the 8259 will [try to] reinterrupt the CPU using the vector for the same device for the same condition. The CPU will probably be reinterrupted as soon as it issues an sti or iret instruction. Thus, an ISR routine must take care to process things in proper sequence.
Consider an example. We have a video controller that has four sources for interrupts:
HSTART -- start of horizontal line
HEND -- end of horizontal line [start of horizontal blanking interval]
VSTART -- start of new video field/frame
VEND -- end of video field/frame [start of vertical blanking interval]
The controller presents these as a bit mask in its own special interrupt source register, which we'll call vidintr_pend. We'll call the interrupt enable register vidintr_enable.
The video controller will use only one 8259 IRR pin. It is the responsibility of the CPU's video ISR to interrogate the vidpend register and decide what to do.
The video controller will assert its IRR pin as long as vidpend is non-zero. Since we're level triggered, the CPU may be re-interrupted.
Here is a sample ISR routine to go with this:
// video_init -- initialize controller
void
video_init(void)
{
write_port(...);
write_port(...);
write_port(...);
...
// we only care about the vertical interrupts, not the horizontal ones
write_port(vidintr_enable,VSTART | VEND);
}
// video_stop -- stop controller
void
video_stop(void)
{
// stop all interrupt sources
write_port(vidintr_enable,0);
write_port(...);
write_port(...);
write_port(...);
...
}
// vidisr_process -- process video interrupts
void
vidisr_process(void)
{
u32 pendmsk;
// NOTE: we loop because controller may assert a new, different interrupt
// while we're processing a given one -- we don't want to exit if we _know_
// we'll be [almost] immediately re-entered
while (1) {
pendmsk = port_read(vidintr_pend);
if (pendmsk == 0)
break;
// the normal way to clear on most H/W is a writeback
// writing a 1 to a given bit clears the interrupt source
// writing a 0 does nothing
// NOTE: with this method, we can _never_ have a race condition where
// we lose an interrupt
port_write(vidintr_pend,pendmsk);
if (pendmsk & HSTART)
...
if (pendmsk & HEND)
...
if (pendmsk & VSTART)
...
if (pendmsk & VEND)
...
}
}
// vidisr_simple -- simple video ISR routine
void
vidisr_simple(void)
{
// NOTE: interrupt state has been pre-saved for us ...
// process our interrupt sources
vidisr_process();
// allow other devices to cause interrupts
port_write(8259,SEND_NON_SPECIFIC_EOI)
// return from interrupt by popping interrupt state
iret();
}
// vidisr_nested -- video ISR routine that allows nested interrupts
void
vidisr_nested(void)
{
// NOTE: interrupt state has been pre-saved for us ...
// allow other devices to cause interrupts
port_write(8259,SEND_NON_SPECIFIC_EOI)
// allow us to receive them
sti();
// process our interrupt sources
// this can be interrupted by another source or another device
vidisr_process();
// return from interrupt by popping interrupt state
iret();
}
UPDATE:
Your followup questions:
Why do you use interrupt disable on video controller register instead of mask 8259's interrupt enable bit?
When you execute vidisr_nested(void) function, it will enable nesting the same interrupt. Is it true? And is that what you want?
To answer (1), we should do both but not necessarily in the same place. They seem similar, but work in slightly different ways.
We change the video controller registers in the video controller driver [as it's the only place that "understands" the video controller's registers].
The video controller actually asserts the 8259's IRR pin from: IRR = ((vidintr_enable & vidintr_pend) != 0). If we never set vidintr_enable (i.e. it's all zeroes), then we can operate the device in a "polled" [non-interrupt] mode.
The 8259 interrupt enable register works similarly, but it masks against which IRRs [asserted or not] may interrupt the CPU. The device vidintr_enable controls whether it will assert IRR or not.
In the example video driver, the init routine enables the vertical interrupts, but not the horizontal. Only the vertical interrupts will generate a call to the ISR, but the ISR can/will also process the horizontal ones [as polled bits].
Changing the 8259 interrupt enable mask should be done in a place that understands the interrupt topology of the entire system. This is usually done by the containing OS. That's because the OS knows about the other devices and can make the best choice.
Herein, "containing OS" could be a full OS like Linux [of which I'm most familiar]. Or, it could just be an R/T executive [or boot rom--I've written a few] that has some common device handling framework with "helper" functions for the device drivers.
For example, although it's usual that all devices get their own IRR pin. But, it is possible, with level triggering, for two different devices to share an IRR. (e.g.) IRR[0] = devA_IRROUT | devB_IRROUT. Either through an OR gate [or wired OR(?)].
It's also possible that the device is attached to a "nested" or "cascaded" interrupt controller. IIRC [consult document], it is possible to have a "master" 8259 and [up to] 8 "slave" 8259s. Each slave 8259 connects to an IRR pin of the master. Then, connect devices to the slave IRR pins. For a fully loaded system, you can have 256 interrupting devices. And, the master can have slave 8259s on some IRR pins and real devices on others [a "hybrid" topology].
Usually, only the OS knows enough to deal with this. In a real system, a device driver probably wouldn't touch the 8259 at all. The non-specific EOI would probably have been sent to the 8259 before entering the device's ISR. And, the OS would handle the full "save state" and "restore state" and the driver just handles device specific actions.
Also, under an OS, the OS will call the "init" and "stop" routines. The general OS routines for this will handle the 8259 and call the device specific ones.
For example, under Linux [or almost any other OS or R/T executive], the interrupt sequence goes something like this:
- CPU hardware actions [atomic]:
- push %esp and flags register [has CPU interrupt enable flag] to stack
- clear CPU interrupt enable flag (e.g. implied cli)
- jump within interrupt vector table
- OS general ISR (preset within IVT):
- push all remaining registers to stack
- send non-specific EOI to 8259(s)
- call device-specific ISR (NOTE: CPU interrupt flag still clear)
- pop regs
- iret
To answer (2), yes, you are correct. It would probably interrupt immediately, and might nest (infinitely :-).
The simple ISR version is more efficient and preferable if the actions taken in the ISR are short, quick, and simple (e.g. just output to a few data ports).
If the required actions take a relatively long time (e.g. do intensive calculations, or write to a large number of ports or memory locations), the nested version is preferred to prevent other devices from having entry to their ISRs delayed excessively.
However, some time critical devices [like a video controller] need to use the simple model, preventing interruption by other devices, to guaranteed that they can complete in a finite, deterministic time.
For example, the video ISR handling of VEND might program the device for the next/upcoming field/frame and must complete this within the vertical blanking interval. They, have to do this, even if it means "excessive" delay of other ISRs.
Note that the ISR was "racing" to complete before the end of the blanking interval. Not the best design. I've had to program such a controller/device. For rev 2, we changed the design so the device registers were double-buffered.
That meant that we could set up the registers for frame 1 anytime during the [much longer] frame 0 display period. At VSTART for frame 1, the video hardware would instantly clock-in/save the double-buffered values, and the CPU could then setup for frame 2 anytime during the display of frame 1. And so on ...
With the modified design, the video driver removed the device setup from the ISR entirely. It was now handled from OS task level
In the driver example, I've adjusted the sequencing a bit to prevent infinite stacking, and added some additional information based upon my question (1) answer. That is, it shows [crudely] what to do with or without an OS.
// video controller driver
//
// for illustration purposes, STANDALONE means a very simple software system
//
// if it's _not_ defined, we assume the ISR is called from an OS general ISR
// that handles 8259 interactions
//
// if it's _defined_, we're showing [crudely] what needs to be done
//
// NOTE: although this is largely C code, it's also pseudo-code in places
// video_init -- initialize controller
void
video_init(void)
{
write_port(...);
write_port(...);
write_port(...);
...
#ifdef STANDALONE
write_port(8259_interrupt_enable |= VIDEO_IRR_PIN);
#endif
// we only care about the vertical interrupts, not the horizontal ones
write_port(vidintr_enable,VSTART | VEND);
}
// video_stop -- stop controller
void
video_stop(void)
{
// stop all interrupt sources
write_port(vidintr_enable,0);
#ifdef STANDALONE
write_port(8259_interrupt_enable &= ~VIDEO_IRR_PIN);
#endif
write_port(...);
write_port(...);
write_port(...);
...
}
// vidisr_pendmsk -- get video controller pending mask (and clear it)
u32
vidisr_pendmsk(void)
{
u32 pendmsk;
pendmsk = port_read(vidintr_pend);
// the normal way to clear on most H/W is a writeback
// writing a 1 to a given bit clears the interrupt source
// writing a 0 does nothing
// NOTE: with this method, we can _never_ have a race condition where
// we lose an interrupt
port_write(vidintr_pend,pendmsk);
return pendmsk;
}
// vidisr_process -- process video interrupts
void
vidisr_process(u32 pendmsk)
{
// NOTE: we loop because controller may assert a new, different interrupt
// while we're processing a given one -- we don't want to exit if we _know_
// we'll be [almost] immediately re-entered
while (1) {
if (pendmsk == 0)
break;
if (pendmsk & HSTART)
...
if (pendmsk & HEND)
...
if (pendmsk & VSTART)
...
if (pendmsk & VEND)
...
pendmsk = port_read(vidintr_pend);
}
}
// vidisr_simple -- simple video ISR routine
void
vidisr_simple(void)
{
u32 pendmsk;
// NOTE: interrupt state has been pre-saved for us ...
pendmsk = vidisr_pendmsk();
// process our interrupt sources
vidisr_process(pendmsk);
// allow other devices to cause interrupts
#ifdef STANDALONE
port_write(8259,SEND_NON_SPECIFIC_EOI)
#endif
// return from interrupt by popping interrupt state
#ifdef STANDALONE
pop_regs();
iret();
#endif
}
// vidisr_nested -- video ISR routine that allows nested interrupts
void
vidisr_nested(void)
{
u32 pendmsk;
// NOTE: interrupt state has been pre-saved for us ...
// get device pending mask -- do this _before_ [optional] EOI and the sti
// to prevent immediate stacked interrupts
pendmsk = vidisr_pendmsk();
// allow other devices to cause interrupts
#ifdef STANDALONE
port_write(8259,SEND_NON_SPECIFIC_EOI)
#endif
// allow us to receive them
// NOTE: with or without OS, we can't stack until _after_ this
sti();
// process our interrupt sources
// this can be interrupted by another source or another device
vidisr_process(pendmsk);
// return from interrupt by popping interrupt state
#ifdef STANDALONE
pop_regs();
iret();
#endif
}
BTW, I'm the author of the linux irqtune program
I wrote it back in the mid 90's. It's of lesser use now, and probably doesn't work on modern systems, but the FAQ I wrote has a great deal of information about interrupt device priorities. The program itself did a simple 8259 manipulation.
An online copy is available here: http://archive.debian.org/debian/dists/Debian-1.1/main/disks-i386/SpecialKernels/irqtune/README.html There's probably source code somewhere in this archive.
That's the version 0.2 doc. I haven't found an online copy of version 0.6 which has better explanation, so I've put up a text version here: http://pastebin.com/Ut6nCgL6
Side note: The "where to get" information in the FAQ [and email address] are no longer valid. And, I didn't understand the full impact of "spam" until I posted the FAQ and starting getting [tons of] it ;-)
And, irqtune even drew Linus' ire. Not because it didn't work but because it did: https://lkml.org/lkml/1996/8/23/19 IMO, if he had read the FAQ, he would have understood why [as what irqtune did is standard stuff to R/T guys].
UPDATE #2
Your new questions:
I think that you are missing a destination address in write_port(8259_interrupt_enable &= ~VIDEO_IRR_PIN). Isn't it so?
IRR register is read-only or r/w? If the second case, what is the purpose of writing into it?
Interrupt vectors are stored as logical addresses or physical address?
To answer question (3): No, not really [even if it seemed so]. The code snippet was "pseudo code" [not pure C code], as I mentioned in a code comment at the top, so technically speaking, I'm covered. However, to make it more clear, here is what the [closer to] real C code would look like:
// the system must know _which_ IRR H/W pin the video controller is connected to
// so we _hardwire_ it here
#define VIDEO_IRR_PIN_NUMBER 3 // just an example
#define VIDEO_IMR_MASK (1 << VIDEO_IRR_PIN_NUMBER)
// video_enable -- enable/disable video controller in 8259
void
video_enable(int enable)
{
u32 val;
// NOTE: we're reading/writing the _enable_ register, not the IRR [which
// software can _not_ modify or read]
val = read_port(8259_interrupt_enable);
if (enable)
val |= VIDEO_IMR_MASK;
else
val &= ~VIDEO_IMR_MASK;
write_port(8259_interrupt_enable,val);
}
Now, in video_init, replace the code inside STANDALONE with video_enable(1), and, in video_stop with video_enable(0)
As to question (4): We weren't really writing to the IRR, even though the symbol had _IRR_ in it. As mentioned in the code comments above, we were writing to the 8259 interrupt enable register which is really the "interrupt mask register" or IMR in the documentation. The IMR can be read from and written to by using OCW1 (see doc).
There is no way for software to access the IRR at all. (i.e.) There is no port in the 8259 to read or write the IRR value. The IRR is completely internal to the 8259.
There is a one-to-one correspondence between IRR pin numbers [0-7] and IMR bit numbers (e.g. to enable for IRR(0), set IMR bit 0), but the software has to know which bit to set.
Because the video controller is physically connected to a given IRR pin, it is always the same for a given PC board. The software [on older non-PnP systems] can't probe for this. Even on newer systems, the 8259 knows nothing of PnP, so it's still hardwired. The video controller driver programmer must just "know" what IRR pin is being used [by consulting the "spec sheet" or controller "architecture reference manual"].
To answer question (5): First consider what the 8259 does.
When the 8259 is intialized, the ICW2 ("initialization command word 2") gets set by the OS driver. This defines a portion of interrupt vector number the 8259 will present during the INTR/INTA cycle. In ICW2, the most significant 5 bits are marked T7-T3.
When an interrupt occurs, these bits are combined with the IRR pin number of the interrupting device [which is 3 bits wide] to form an 8 bit interrupt vector number: T7,T6,T5,T4,T3|I2,I1,I0
For example, if we put 0xD0 into ICW2, with our video controller using IRR pin 3, we'd have 1,1,0,1,0|0,1,1 or 0xD3 as the interrupt vector number that the 8259 will send to the CPU.
This is just a vector number [0x00-0xFF] as the 8259 knows nothing of memory addresses. It is the CPU that takes this vector number and, using the CPU's "interrupt vector table" [IVT], uses the vector number as an index into the IVT to properly vector the interrupt to an ISR routine.
On 80386 and later architectures, the IVT is actually called an IDT ("interrupt descriptor table"). For details, see the "System Programming Guide", chapter 6: http://download.intel.com/design/processor/manuals/253668.pdf
As, to whether the resulting ISR address from the IVT/IDT is physical or logical depends on the processor mode (e.g. real mode, protected mode, protected with virtual addressing enabled).
In a sense, all such addresses are always logical. And, all logical addresses undergo a translation to physical on each CPU instruction. Whether the translation is one-to-one [MMU not enabled or page tables have one-to-one mapping] is a question for "How has the OS set things up?"
Strictly speaking, there is no such thing
as "acknowledge an interrupt to device".
The thing that an ISR should do, is to handle
the interrupt condition. For example, if
the UART requested an interrupt because it
has an incoming data, then you should read
that incoming data. After that read operation,
UART no longer has the incoming data, so naturally
it stops asserting the IRQ line. Alternatively,
if your program no longer needs to read the
data and wants to stop the communication, it
would just mask the receiver interrupt via
the UART registers, and, once again, UART
will stop asserting the IRQ line. If the device
just wanted to signal you some state change,
then you should read the new state, and the
device will know that you have an up-to-date
state and will release an IRQ line.
So, in short: there is usually no any device-specific
acknowledge procedure. All you need to do is
to service an interrupt condition, after which,
that condition will disappear, voiding the
interrupt request.

Interrupt handling in Device Driver

I have written a simple character driver and requested IRQ on a gpio pin and wrtten a handler for it.
err = request_irq( irq, irq_handler,IRQF_SHARED | IRQF_TRIGGER_RISING, INTERRUPT_DEVICE_NAME, raspi_gpio_devp);
static irqreturn_t irq_handler(int irq, void *arg);
now from theory i know that Upon interrupt the interrupt Controller with tell the processor to call do_IRQ() which will check the IDT and call my interrupt handler for this line.
how does the kernel know that the interrupt handler was for this particular device file
Also I know that Interrupt handlers do not run in any process context. But let say I am accessing any variable declared out side scope of handler, a static global flag = 0, In the handler I make flag = 1 indicating that an interrupt has occurred. That variable is in process context. So I am confused how this handler not in any process context modify a variable in process context.
Thanks
The kernel does not know that this particular interrupt is for a particular device.
The only thing it knows is that it must call irq_handler with raspi_gpio_devp as a parameter. (like this: irq_handler(irq, raspi_gpio_devp)).
If your irq line is shared, you should check if your device generated an IRQ or not. Code:
int irq_handler(int irq, void* dev_id) {
struct raspi_gpio_dev *raspi_gpio_devp = (struct raspi_gpio_dev *) dev_id;
if (!my_gpio_irq_occured(raspi_gpio_devp))
return IRQ_NONE;
/* do stuff here */
return IRQ_HANDLED;
}
The interrupt handler runs in interrupt context. But you can access static variables declared outside the scope of the interrupt.
Usually, what an interrupt handler does is:
check interrupt status
retrieve information from the hardware and store it somewhere (a buffer/fifo for example)
wake_up() a kernel process waiting for that information
If you want to be really confident with the do and don't of interrupt handling, the best thing to read about is what a process is for the kernel.
An excellent book dealing with this is Linux Kernel Developpement by Robert Love.
The kernel doesn't know which device the interrupt pertains to. It is possible for a single interrupt to be shared among multiple devices. Previously this was quite common. It is becoming less so due to improved interrupt support in interrupt controllers and introduction of message-signaled interrupts. Your driver must determine whether the interrupt was from your device (i.e. whether your device needs "service").
You can provide context to your interrupt handler via the "void *arg" provided. This should never be process-specific context, because a process might exit leaving pointers dangling (i.e. referencing memory which has been freed and/or possibly reallocated for other purposes).
A global variable is not "in process context". It is in every context -- or no context if you prefer. When you hear "not in process context", that means a few things: (1) you cannot block/sleep (because what process would you be putting to sleep?), (2) you cannot make any references to user-space virtual addresses (because what would those references be pointing to?), (3) you cannot make references to "current task" (since there isn't one or it's unknown).
Typically, a driver's interrupt handler pushes or pulls data into "driver global" data areas from which/to which the process context end of the driver can transfer data.
This is to reply your question :-
how does the kernel know that the interrupt handler was for this particular >device file?
Each System-On-Chip documents will mention interrupt numbers for different devices connected to different interrupt lines.
The Same Interrupt number has to be mentioned in the Device Tree entry for instantiation of device driver.
The Device driver's usual probe function parses the Device tree data structure and reads the IRQ number and registers the handler using the register_irq function.
If there are multiple devices to a single IRQ number/line, then the IRQ status register(for different devices if mapped under the same VM space) can be used inside the IRQ handler to differentiate.
Please read more in my blog

Interrupt vector and irq mapping in do_IRQ

I'm working on a x86 system with Linux 3.6.0. For some experiments, I need to know how the IRQ is mapped to the vector. I learn from many book saying that for vector 0x0 to 0x20 is for traps and exceptions, and from vector 0x20 afterward is for the external device interrupts. And this also defined in the source code Linux/arch/x86/include/asm/irq_vectors.h
However, what I'm puzzled is that when I check the do_IRQ function,
http://lxr.linux.no/linux+v3.6/arch/x86/kernel/irq.c#L181
I found the IRQ is fetched by looking up the "vector_irq" array:
unsigned int __irq_entry do_IRQ(struct pt_regs *regs)
{
struct pt_regs *old_regs = set_irq_regs(regs);
/* high bit used in ret_from_ code */
unsigned vector = ~regs->orig_ax;
unsigned irq;
...
irq = __this_cpu_read(vector_irq[vector]); // get the IRQ from the vector_irq
// print out the vector_irq
prink("CPU-ID:%d, vector: 0x%x - irq: %d", smp_processor_id(), vector, irq);
}
By instrumenting the code with printk, the vector-irq mapping I got is like below and I don't have any clue why this is the mapping. I though the mapping should be (irq + 0x20 = vector), but it seems not the case.
from: Linux/arch/x86/include/asm/irq_vector.h
* Vectors 0 ... 31 : system traps and exceptions - hardcoded events
* Vectors 32 ... 127 : device interrupts = 0x20 – 0x7F
But my output is:
CPU-ID=0.Vector=0x56 (irq=58)
CPU-ID=0.Vector=0x66 (irq=59)
CPU-ID=0.Vector=0x76 (irq=60)
CPU-ID=0.Vector=0x86 (irq=61)
CPU-ID=0.Vector=0x96 (irq=62)
CPU-ID=0.Vector=0xa6 (irq=63)
CPU-ID=0.Vector=0xb6 (irq=64)
BTW, these irqs are my 10GB ethernet cards with MSIX enabled. Could anyone give me some ideas about why this is the mapping? and what's the rules for making this mapping?
Thanks.
William
The irq number (which is what you use in software) is not the same as the vector number (which is what the interrupt controller actually uses).
The x86 I/OAPIC interrupt controller assigns interrupt priorities in groups of 16, so the vector numbers are spaced out to prevent them from interfering with each other
(see the function __assign_irq_vector in arch/x86/kernel/apic/io_apic.c).
I guess my question is how the vectors are assigned for a particular
IRQ number and what's are the rules behind.
The IOAPIC supports a register called IOREDTBL for each IRQ input. Software assigns the desired vector number for the IRQ input using bit 7-0 of this register. It is this vector number that serves as an index into the processors Interrupt Descriptor Table. Quoting the IOAPIC manual (82093AA)
7:0 Interrupt Vector (INTVEC)—R/W: The vector field is an 8 bit field
containing the interrupt vector for this interrupt. Vector values
range from 10h to FEh.
Note that these registers are not directly accessible to software. To access IOAPIC registers (not to be confused with Local APIC registers) software must use the IOREGSEL and IOWIN registers to indirectly interact with the IOAPIC. These registers are also described in the IOAPIC manual.
The source information for the IOAPIC can be a little tricky to dig up. Here's a link to the example I used:
IOAPIC data sheet link

Resources