How do I increase Windows interrupt latency to stress test a driver?

I have a driver & device that seem to misbehave when the user does any number of complex things (opening large Word documents, opening lots of files at once, etc.) -- but that do not reliably go wrong when any one thing is repeated. I believe it's because the driver does not handle high interrupt latency situations gracefully.
Is there a reliable way to increase interrupt latency on Windows XP to test this theory?
I'd prefer to write my test program in Python, but C++ & WinAPI is also fine...

My apologies for not having a concrete answer, but an idea to explore would be to use either C++ or Cython to hook into the timer interrupt (the clock-tick one) and waste time in there. This would effectively increase latency.

I don't know if there's an existing solution, but you can create your own.
On Windows all interrupts are prioritized, so if driver code is running at a high IRQL, your driver won't be able to service its own interrupt if that interrupt's level is lower. At least it won't be able to run on the same processor.
I'd do the following:
Configure your driver to run on a single processor (I don't remember how to do this, but such an option definitely exists).
Add an I/O control code to your driver.
In your driver's Dispatch routine, do a busy wait at a high IRQL (more about this below).
Call your driver (via DeviceIoControl) to simulate the stress.
The busy wait may look something like this:
KIRQL oldIrql;
LARGE_INTEGER freq, t1, t2;

// Raise to the highest IRQL so nothing on this processor can preempt us.
KeRaiseIrql(HIGH_LEVEL, &oldIrql);

// KeQuerySystemTime would not advance here, because at HIGH_LEVEL the
// clock interrupt is masked; KeQueryPerformanceCounter reads the
// hardware counter directly and may be called at any IRQL.
t1 = KeQueryPerformanceCounter(&freq);  // freq = counter ticks per second
while (1)
{
    t2 = KeQueryPerformanceCounter(NULL);
    if (t2.QuadPart - t1.QuadPart > /* put the needed interval, in freq ticks */)
        break;
}
KeLowerIrql(oldIrql);
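From user mode, the DeviceIoControl call in step 4 might look like the following sketch. The device name and IOCTL code here are hypothetical; use whatever your driver actually exposes:

#include <windows.h>
#include <winioctl.h>

// Hypothetical control code; it must match the one your driver handles.
#define IOCTL_STRESS_BUSY_WAIT \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

int main(void)
{
    // Hypothetical device name exposed by the test driver.
    HANDLE dev = CreateFileA("\\\\.\\MyTestDriver",
                             GENERIC_READ | GENERIC_WRITE, 0, NULL,
                             OPEN_EXISTING, 0, NULL);
    DWORD bytes;

    // Each call makes the driver busy-wait at high IRQL, starving
    // lower-priority interrupt service on that processor for the duration.
    DeviceIoControl(dev, IOCTL_STRESS_BUSY_WAIT, NULL, 0, NULL, 0, &bytes, NULL);

    CloseHandle(dev);
    return 0;
}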


What is the difference within the compiler between debugging and running the code? (STM32)

Somehow, when I am running my code, it seems like one GPIO port isn't being initialized, but when I am debugging, it is.
I am initializing two sensors:
struct MAX31856_t max31856_temperature_sensor_heater_1 = MAX31856_TPL( SPI_DEV_TPL( IO_PIN_TPL(
TEMP_SENSOR_0_CS_GPIO_Port, TEMP_SENSOR_0_CS_Pin), &spi1));
struct MAX31856_t max31856_temperature_sensor_heater_2 = MAX31856_TPL( SPI_DEV_TPL( IO_PIN_TPL(
TEMP_SENSOR_1_CS_GPIO_Port, TEMP_SENSOR_1_CS_Pin), &spi1));
Sensor heater 1 is not getting any information, while sensor heater 2 is. Now if I swap the names of the heaters:
struct MAX31856_t max31856_temperature_sensor_heater_2 = MAX31856_TPL( SPI_DEV_TPL( IO_PIN_TPL(
TEMP_SENSOR_0_CS_GPIO_Port, TEMP_SENSOR_0_CS_Pin), &spi1));
struct MAX31856_t max31856_temperature_sensor_heater_1 = MAX31856_TPL( SPI_DEV_TPL( IO_PIN_TPL(
TEMP_SENSOR_1_CS_GPIO_Port, TEMP_SENSOR_1_CS_Pin), &spi1));
and run the code in the debugger, both sensor heaters 1 and 2 get information.
How can this happen? I was thinking about a timing problem, but since it is working in the debugger, I don't really know what to do.
This assumes you are debugging and running the same binary. Debugging is mostly the same as running, except when you halt the processor (e.g., at breakpoints).
In that case:
some peripherals could continue to run or be halted together with the CPU; in some cases the behaviour can be configured (timers, watchdog, ...)
some interrupts can be lost.
some hardware buffers can overflow and data can be lost (if you don't use any flow control in your I/O)
How do you run the code in debug mode? Do you have breakpoints somewhere?
You (OP) are right that it is most likely a timing problem, probably related to the physical SPI transmission. Your line of code to send/receive something over SPI has already executed on the MCU, but physically the bits and bytes are still being transmitted on the line while the MCU is already calling the next SPI function, so one of the transmissions will fail. Try adding some delay after the SPI transmission code. If things work after that, then it's the timing of the SPI peripheral, and you need to add a check that no SPI transmission is already in progress before you call a function to send/receive something.
You can do while(transmission) (pseudocode; replace with an actual check of whether an SPI transmission is going on) to wait until the previous transmission ends before starting the next one.
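With the STM32 HAL, assuming the &spi1 in the question is a HAL SPI handle, such a check could look like this sketch:

// Wait until the SPI peripheral has finished the previous transfer
// before pulling the next chip select low and transmitting again.
while (HAL_SPI_GetState(&spi1) != HAL_SPI_STATE_READY)
{
    // spin until the previous transmission completes
}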

TI MSP430 Interrupt Problems After UART Code Port

I am using the MSP430F2013 processor for an application, but it doesn't have a UART. I need one, so I used TI's sample code "msp430x20x3_ta_uart2400.c" to emulate a UART using the Timer module. This all worked fine (compiled with IAR Embedded Workbench); I tested it using PuTTY to transmit characters to a development board and a loopback to echo them back to the terminal.
That was a de-risking exercise, and now I've come to port that code into my application's state machine. Having done this, I'm having issues surrounding the timer interrupts and low power sleep modes. Here's the snippet of my code around the entry into the low power (sleep) mode:
// Prepare the UART to receive one byte.
prepare_receiver();
// Enter low power mode 1.
__bis_SR_register(LPM1_bits + GIE);
// Check whether the full message has been received.
if(true == get_message_complete())
{
process_event(e_euart_message_received, NULL);
}
What I'm seeing in the debugger (C-Spy) is that sometimes it will execute the __bis_SR_register() line on first entry and then go straight to the if statement, i.e., ignoring the fact that I've asked it to go to sleep. On other occasions, when it does go to sleep when it should, the ISR triggers correctly and eventually brings me back to the if statement to continue program execution (as I'm expecting). However, if I try to step to the next statement, the application freezes on that first line, i.e., I can't advance.
I can't think of anything functionally different from TI's example that I'm doing, so I figure my problem must be something to do with how I've ported it. For example, my Timer ISR and the code I've posted here are in different compilation units - would this sort of decision have any bearing on things? I'm aware my question might be a little vague but unfortunately I can't post all of my code, so instead I'm looking for someone with MSP experience who might be able to suggest some things to look at or some potential pitfalls that I may have fallen into.
Debugging interrupts with C-Spy in low power mode is going to be tricky. According to Section A.3 Debugging (C-Spy) of the IAR User's Guide:
5) C-SPY can debug applications that utilize interrupts and low power modes
But there are some "gotchas" that you should be aware of that may be causing your headaches.
In particular:
14) When C-SPY has control of the device, the CPU is ON (that is, it is not in low-power mode) regardless of the settings of the low-power
mode bits in the status register. Any low-power mode conditions are
restored prior to Step or Go. Consequently, do not measure the power
consumed by the device while C-SPY has control of the device. Instead,
run your application using Go with JTAG released
19) C-SPY utilizes the system clock to control the device during
debugging. Therefore, device counters, etc., that are clocked by the
Main System Clock (MCLK) are affected when C-SPY has control of the
device. Special precautions are taken to minimize the effect upon the
Watchdog Timer. The CPU core registers are preserved. All other clock
sources (SMCLK, ACLK) and peripherals continue to operate normally
during emulation. In other words, the Flash Emulation Tool is a
partially intrusive tool.
Devices that support clock control (Emulator
→ Advanced → Clock Control) can further minimize these
effects by selecting to stop the clock(s) during debugging
24) Peripheral bits that are cleared when read during normal program
execution (that is, interrupt flags) are cleared when read while being
debugged (that is, memory dump, peripheral registers).
When using certain MSP430 devices (such as MSP430F15x, MSP430F16x,
MSP430F43x, and MSP430F44x devices), bits do not behave this way
(that is, the bits are not cleared by C-SPY read operations).
26) While single stepping with active and enabled interrupts, it can
appear that only the interrupt service routine (ISR) is active (that
is, the non-ISR code never appears to execute, and the single step
operation always stops on the first line of the ISR). However, this
behavior is correct because the device always processes an active and
enabled interrupt before processing non-ISR (that is, mainline) code.
A workaround for this behavior is, while within the ISR, to disable
the GIE bit on the stack so that interrupts are disabled after exiting
the ISR. This permits the non-ISR code to be debugged (but without
interrupts). Interrupts can later be reenabled by setting GIE in the
status register in the Register window.
On devices with the clock control emulation feature, it may be possible
to suspend a clock between single steps and delay an interrupt request
(Emulator → Advanced → Clock Control).
One thing to try is commenting out all the low power code and seeing if your UART code works like that. Then go back and try re-enabling the low power mode.
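While doing that, note that for execution to continue past __bis_SR_register(LPM1_bits + GIE), the receive ISR has to clear the low-power bits in the status register image saved on the stack. A minimal sketch of that wake-up pattern in IAR syntax (the vector choice and the ISR body are illustrative, not the OP's code):

// Timer_A capture/compare ISR of the software UART (illustrative).
#pragma vector = TIMERA0_VECTOR
__interrupt void timer_a0_isr(void)
{
    // ... sample/shift one software-UART bit here ...

    if (get_message_complete())
    {
        // Clear LPM1 in the stacked SR so, on return from the ISR, the
        // main code resumes at the if statement after __bis_SR_register().
        __bic_SR_register_on_exit(LPM1_bits);
    }
}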
The answer to this question lies in the debugging setup, and more specifically in what types of breakpoints are being used. I had quite a complex series of macros that ran on program upload and set various hooks into memory for testing purposes. These hooks relied on software breakpoints, which would then call functions outside of the application. I have seen no problem using these breakpoints in normal use; however, their existence means that the debugging session doesn't run in real time (i.e., the device is under the control of the host PC). This, for a reason not yet completely known to me, caused problems when trying to debug interrupts and low power modes. (I suspect that if I were to look a bit deeper, I would see the need to use clock control whilst debugging, but I'll save that for another day.)
So, to solve this problem and allow me to debug my interrupt and low power mode heavy code, which I'd ported into my larger application state machine, I had to do the following:
Disable software breakpoints within IAR. They're not actually enabled by default, but if you've been doing clever things with macros like I had, you probably would've needed to enable them, since there just aren't enough hardware breakpoints available in most MSP430s (for instance, I have only two in the MSP430F2013, and C-SPY more often than not hogs one of those!). The obvious downside is that debugging becomes a bit more laborious, but at least it's reliable.
Remove links to .mac macro files. In other words, if you're using macros, don't. In my case, this meant I had to hack some state machine logic in order to force myself down a certain route (one that the macro had previously been taking for me). This clearly isn't ideal, but it does allow you to debug the interrupt/low power mode code. The macros can then be re-enabled afterwards.
So it turned out that there wasn't a problem with my port after all. I'm not particularly happy with this hacky solution, but at least it's a step forward. If I have the time, I'll investigate to see if I can work out a way of using software breakpoints and add to this answer.

How can I get a pulse in win32 Assembler (specifically nasm)?

I'm planning on making a clock. An actual clock, not something for Windows. However, I would like to be able to write most of the code now. I'll be using a PIC16F628A to drive the clock; it has a timer I can access (actually, it has three, in addition to its built-in clock). Windows, however, does not appear to have this function, which makes making a clock a bit hard, since I need to know how much time has passed so I can update the current time. So I need to know how I can get a pulse (1 Hz, 1 kHz, doesn't really matter as long as I know how fast it is) in Windows.
There are many timer objects available in Windows. Probably the easiest to use for your purposes would be the Multimedia Timer, but that's been deprecated. It would still work, but Microsoft recommends using one of the new timer types.
I'd recommend using a threadpool timer if you know your application will be running under Windows Vista, Server 2008, or later. If you have to support Windows XP, use a Timer Queue timer.
There's a lot to those APIs, but general use is pretty simple. I showed how to use them (in C#) in my article Using the Windows Timer Queue API. The code is mostly API calls, so I figure you won't have trouble understanding and converting it.
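For reference, a minimal Timer Queue sketch in plain C (the linked article uses C#; the WinAPI calls are the same):

#include <windows.h>
#include <stdio.h>

// Runs on a thread-pool thread each time the timer fires.
VOID CALLBACK tick(PVOID param, BOOLEAN timerFired)
{
    printf("tick\n");  // update your clock here
}

int main(void)
{
    HANDLE timer;

    // First fire after 1000 ms, then every 1000 ms: a 1 Hz pulse.
    CreateTimerQueueTimer(&timer, NULL, tick, NULL, 1000, 1000,
                          WT_EXECUTEDEFAULT);

    Sleep(10000);  // let it tick for ten seconds

    // Wait for any in-flight callback before tearing the timer down.
    DeleteTimerQueueTimer(NULL, timer, INVALID_HANDLE_VALUE);
    return 0;
}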
The LARGE_INTEGER is just an 8-byte block of memory that's split into a high part and a low part. In assembly, you can define it as:
MyLargeInt equ $
MyLargeIntLow dd 0
MyLargeIntHigh dd 0
If you're looking to learn ASM, just do a Google search for [x86 assembly language tutorial]. That'll get you a whole lot of good information.
You could use a waitable timer object. Since Windows is not a real-time OS, you'll need to make sure you set the period long enough that you won't miss pulses. A tenth of a second should be safe most of the time.
Additional:
The const LARGE_INTEGER you need to pass to SetWaitableTimer is easy to implement in NASM; it's just an eight-byte constant. Note that the due time is given in 100-nanosecond units, and a negative value means a relative (rather than absolute) time:
period: dq -1000000 ; 1,000,000 x 100 ns = 100 ms, ten times a second
Pass the address of period as the second argument (lpDueTime) to SetWaitableTimer; to make the timer periodic, also pass the period in milliseconds (here 100) as the third argument (lPeriod).
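The same call sequence in C, as a sketch you can translate to NASM push/call style (a 1 Hz pulse here):

#include <windows.h>
#include <stdio.h>

int main(void)
{
    // Auto-reset waitable timer.
    HANDLE timer = CreateWaitableTimer(NULL, FALSE, NULL);
    LARGE_INTEGER due;
    due.QuadPart = -10000000LL;  // first fire in 1 s (100 ns units; negative = relative)

    SetWaitableTimer(timer, &due, 1000, NULL, NULL, FALSE);  // then every 1000 ms

    for (;;)
    {
        WaitForSingleObject(timer, INFINITE);  // blocks until the next tick
        printf("tick\n");                      // update the clock here
    }
}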

How to wait for one second on an 8051 microcontroller?

I'm supposed to write a program that will send some values to registers, then wait one second, then change the values. The thing is, I'm unable to find the instruction that will halt operations for one second.
How about setting up a timer interrupt?
Some useful hints and code snippets in this Keil 8051 application note.
There is no such 'instruction'. There is, however, no doubt at least one hardware timer peripheral (the exact peripheral set depends on the exact part you are using). Get out the datasheet/user manual for the part you are using and figure out how to program the timer; you can then poll it or use interrupts. Typically you'd configure the timer to generate a periodic interrupt that then increments a counter variable.
Two things you must know about timer interrupts: firstly, if your counter variable is wider than 8 bits, access to it will not be atomic, so outside of the interrupt context you must either temporarily disable interrupts to read it, or read it twice in succession with the same value to validate it. Secondly, the timer counter variable must be declared volatile to prevent the compiler optimising out accesses to it; this is true of all variables shared between interrupts and threads. A sketch of this pattern follows below.
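A minimal sketch of that pattern in C (SDCC syntax; the timer setup/reload is omitted, and the vector number is the standard Timer 0 overflow):

#include <8051.h>  // defines EA and the other SFR bits

// Incremented every 10 ms by the timer ISR. It is 16 bits wide on an
// 8-bit CPU, so reads outside the ISR must be protected.
volatile unsigned int ticks;

void timer0_isr(void) __interrupt(1)  // Timer 0 overflow vector
{
    /* reload the timer registers here */
    ticks++;
}

unsigned int get_ticks(void)
{
    unsigned int t;
    EA = 0;    // globally disable interrupts (IE.7)
    t = ticks;
    EA = 1;
    return t;
}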
Another alternative is to use a low power 'sleep' mode if supported; you set up a timer to wake the processor after the desired period and issue the necessary sleep instruction (this may be provided as an 'intrinsic' by your compiler, or it may be controlled via a peripheral register). This is general advice, not 8051 specific; I don't know if your part even supports a sleep mode.
Either way you need to wade through the part specific documentation. If you could tell us the exact part, you may get help with that.
A third solution is to use an 8051 specific RTOS kernel which will provide exactly the periodic delay function you are looking for, as well as multi-threading and IPC.
I would set up a timer so that it interrupts every 10 ms. In that interrupt, increment a variable.
You will also need to write a function that disables interrupts and reads that variable.
In your main program, read the timer variable and then wait until it is 100 more than it was when you started.
Don't forget to watch out for the timer variable rolling over.
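A sketch of that wait, reusing the get_ticks() helper from the sketch in the earlier answer; the unsigned subtraction also handles the rollover case:

// 100 ticks of 10 ms each = 1 second.
void wait_one_second(void)
{
    unsigned int start = get_ticks();
    while ((unsigned int)(get_ticks() - start) < 100)
        ;  // busy-wait; real code might sleep or do useful work here
}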

How to determine which task is dead?

I have an embedded system that has multiple (>20) tasks running at different priorities. I also have a watchdog task that runs to check that all the other tasks are not stuck. My watchdog is working, because every once in a blue moon it will reboot the system because a task did not check in.
How do I determine which task died?
I can't simply blame the task that has gone longest without kicking the watchdog, because it might have been held off by a higher-priority task that is not yielding.
Any suggestions?
A per-task watchdog requires that the higher priority tasks yield for an adequate time so that all may kick the watchdog. To determine which task is at fault, you'll have to find the one that's starving the others. You'll need to measure task execution times between watchdog checks to locate the actual culprit.
Is this pre-emptive? I gather so since otherwise a watchdog task would not run if one of the others had gotten stuck.
You make no mention of the OS but, if a watchdog task can check if a single task has not checked in, there must be separate channels of communication between each task and the watchdog.
You'll probably have to modify the watchdog to somehow dump the task number of the one that hasn't checked in and dump the task control blocks and memory so you can do a post-mortem.
Depending on the OS, this could be easy or hard.
I was also working on a watchdog reset problem over the last few weeks. Fortunately for me, the ramdump files (in an ARM development environment) contain an interrupt handler trace buffer holding the PC and LR at each interrupt, so from the trace buffer I could find out exactly which part of the code was running before the watchdog reset.
If you have the same kind of mechanism for storing the PC and LR at each interrupt, you can precisely identify the culprit task.
Depending on your system and OS, there may be different approaches. One very low-level approach I have used is to turn on an LED while each of the tasks is running. You may need to put a scope on the LEDs to see very fast task switching.
For an interrupt-driven watchdog, you'd just make the task switcher update the currently running task number each time it is changed, allowing you to identify which one didn't yield.
However, you suggest you wrote the watchdog as a task yourself, so before rebooting, surely the watchdog can identify the starved task? You can store this in memory that persists beyond a warm reboot, or send it over a debug interface. The problem with this is that the starved task is probably not the problematic one: you'll probably want to know the last few task switches (and times) in order to identify the cause.
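A sketch of that kind of switch trace, kept in RAM that a warm reboot does not clear (the .noinit section placement, the GCC attribute syntax, and the tick helper are toolchain-specific and illustrative):

#include <stdint.h>

#define TRACE_LEN 16

struct switch_record {
    uint8_t  task_id;
    uint32_t timestamp;
};

extern uint32_t get_tick_count(void);  // hypothetical tick source

// Placed in a section the startup code does not zero, so the trace
// survives a watchdog-induced warm reboot.
__attribute__((section(".noinit"))) static struct switch_record switch_trace[TRACE_LEN];
__attribute__((section(".noinit"))) static uint8_t switch_idx;

// Called by the scheduler on every context switch.
void on_context_switch(uint8_t next_task)
{
    switch_trace[switch_idx].task_id   = next_task;
    switch_trace[switch_idx].timestamp = get_tick_count();
    switch_idx = (uint8_t)((switch_idx + 1) % TRACE_LEN);
}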
A simplistic, back of the napkin approach would be something like this:
int8_t wd_tickle[NUM_TASKS];

void taskA_main()
{
    ...
    // main loop
    while (1) {
        ...
        wd_tickle[TASKA_NUM]++;  // prove task A is still making progress
    }
}

// ... tasks B, C, D... follow a similar pattern

void watchdog_task()
{
    for (int i = 0; i < NUM_TASKS; i++) {
        if (0 == wd_tickle[i]) {
            // Egads! The task didn't kick us! Reset and record the task number
        }
        wd_tickle[i] = 0;  // re-arm the check for the next watchdog period
    }
}
How is your system working exactly? I always use a combination of software and hardware watchdogs. Let me explain...
My example assumes you're working with a preemptive real-time kernel and that you have watchdog support in your CPU/microcontroller. This watchdog will perform a reset if it was not kicked within a certain period of time. You want to check two things:
1) The periodic system timer ("RTOS clock") is running (if not, functions like "sleep" would no longer work and your system is unusable).
2) All threads can run within a reasonable period of time.
My RTOS (www.lieron.be/micror2k) provides the possibility to run code in the RTOS clock interrupt handler. This is the only place where you refresh the hardware watchdog, so you're sure the clock is running all the time (if not, the watchdog will reset your system).
In the idle thread (always running at the lowest priority), a "software watchdog" is refreshed. This simply means setting a variable to a certain value (e.g. 1000). In the RTOS clock interrupt (where you kick the hardware watchdog), you decrement and check this value. If it reaches 0, it means the idle thread has not run for 1000 clock ticks, and you reboot the system (which can be done by looping indefinitely inside the interrupt handler and letting the hardware watchdog reboot).
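A sketch of that two-level scheme (the hook and kick functions are illustrative, not from any particular RTOS):

#define IDLE_WD_RELOAD 1000

extern void kick_hardware_watchdog(void);  // hypothetical

static volatile int idle_wd = IDLE_WD_RELOAD;

// Lowest-priority thread: if it gets to run, every other thread has
// had a chance to run too.
void idle_thread(void)
{
    for (;;) {
        idle_wd = IDLE_WD_RELOAD;
    }
}

// Called from the RTOS clock tick interrupt handler.
void clock_tick_hook(void)
{
    kick_hardware_watchdog();  // only kicked here, so a stopped clock
                               // reboots the system

    if (--idle_wd <= 0) {
        // The idle thread has been starved for 1000 ticks: gather
        // diagnostics here, then spin so the hardware watchdog reboots us.
        for (;;)
            ;
    }
}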
Now for your original question. I assume the system clock keeps running, so it's the software watchdog that resets the system. In the RTOS clock interrupt handler, you can do some statistics gathering when the software watchdog situation occurs: instead of resetting the system immediately, record which thread is running at each subsequent clock tick and try to find out what's going on. It's not ideal, but it will help.
Another option is to add several software watchdogs at different priorities. Have the idle thread set variable A to 1000 and have a (dedicated) medium-priority thread set variable B. In the RTOS clock interrupt handler, check both variables. With this information you know whether the looping thread has a priority higher than "medium" or lower than "medium". If you wish, you can add a 3rd or 4th or however many software watchdogs you like. Worst case, add a software watchdog for each priority that's used (though that will cost you as many extra threads).
