Can breakpoints be used in ISRs? - debugging

Can breakpoints be used in interrupt service routines (ISRs)?

Yes - in an emulator.
Otherwise, no. It's difficult to pull off, and a bad idea in any case. ISRs are (usually) supposed to work with the hardware, and hardware can easily behave very differently when you leave a gap of half a second between each instruction.
Set up some sort of logging system instead.
ISRs also ungracefully "steal" the CPU from other processes, so many operating systems recommend keeping your ISRs extremely short and doing only what is strictly necessary (such as dealing with any urgent hardware stuff, and scheduling a task that will deal with the event properly). So in theory, ISRs should be so simple that they don't need to be debugged.
If it's hardware behaviour that's the problem, use some sort of logging instead, as I've suggested. If the hardware doesn't really mind long gaps of time between instructions, then you could just write most of the driver in user space - and you can use a debugger on that!

Depending on your platform, you can do this by accessing the debug port of your processor, typically using the JTAG interface. Keep in mind that you're drastically changing everything that has to do with timing with that method, so your debug session may be useless. But then again, many bugs can be caught this way. Also mind MMU based memory mappings, as JTAG debuggers often don't take them into account.

In Windows, with a kernel debugger attached, you can indeed place breakpoints in interrupt handlers.

Related

Do you need a realtime operating system in order to ensure your program is never taken off the CPU?

If I were to write a program and I wanted to be guaranteed that the program never sees an instance where, after it is running, it gets kicked off of the cpu until program termination, would I need an RTOS or is there a way to have such an experience guranteed on a regular linux os.
Example:
Lets say we a running a headless Linux machine and running a program as user or root (eg reading SPI data from a sensor, listening for http requests) and there is reason to believe there is almost almost no other interaction with the machine aside from the single standalone script running.
If I wanted to ensure that my process running never gets taken off my cpu even for a moment such that I never miss valuable sensor information or incoming http requests, does this warrant a real-time operating system to keep this guarantee?
are process priorities of programs ran by the user / root enough of a priority to not get kicked off?
is a realtime os needed to guarantee our program never witnesses a moment when it is kicked off of the cpu?
I know that Real Time OS are needed for guarantees on hard limits and hard deadlines of events. I also know that on a regular operating system it is up to the OS to decide priority and scheduling.
if this is in the wrong stack let me know.
Do you need to act on sensor readings in a constant time frame? How complicated this action should be? If all you need is to never miss a reading and you're ok with buffering them - just add a microcontroller or an FPGA in between your non-realtime device and a sensor.
Also, you can ensure some soft real time constraints even with an unpatched Linux. You can pin a process to a CPU and avoid using any syscalls in it - spin and poll instead, at 100% CPU utilisation, and then it's likely kernel will never touch it. Make sure the process binary and all the dynamic libraries (if any) are on a RAM disk (to avoid paging) and disable swap.

Executing simple code on secondary processors, Pre-boot or in DOS

On AMD-64 (family 15h) in barebone legacy mode - such as pre-booting, or in DOS -
I wish to momentarily 'awake' each application processor in turn, have it run a (very short) sequence of instructions, and put it back to its previous waiting or sleeping state.
For specifics, I want each processor to 'WrMSR' some microcode update blob, newer than what's currently carved in BIOS. But the question can be, more generally : how to awake processor number N, set its state (CS:rIP) so it executes a prepared thread of instructions from DRAM and finally goes back to 'sleep' quietly?
This should be done with the lightest possible of machineries, in real mode as far as possible, using no "ACPI" tables, helpers, and stuff ! Also, for the kind of very basic and short tasks envisioned, the app processors would not need to serve hardware interrupts, so, I guess, no special APIC setup is needed.
I'd appreciate a sketch of the essential steps, with assembler or pseudo-code, and or pointers to relevant documentation or example code.

Atomic operations in ARM strex and ldrex - can they work on I/O registers?

Suppose I'm modifying a few bits in a memory-mapped I/O register, and it's possible that another process or and ISR could be modifying other bits in the same register.
Can ldrex and strex be used to protect against this? I mean, they can in principle because you can ldrex, and then change the bit(s), and strex it back, and if the strex fails it means another operation may have changed the reg and you have to start again. But can the strex/ldrex mechanism be used on a non-cacheable area?
I have tried this on raspberry pi, with an I/O register mapped into userspace, and the ldrex operation gives me a bus error. If I change the ldrex/strex to a simple ldr/str it works fine (but is not atomic any more...) Also, the ldrex/strex routines work fine on ordinary RAM. Pointer is 32-bit aligned.
So is this a limitation of the strex/ldrex mechanism? or a problem with the BCM2708 implementation, or the way the kernel has set it up? (or somethinge else- maybe I've mapped it wrong)?
Thanks for mentioning me...
You do not use ldrex/strex pairs on the resource itself. Like swp or test and set or whatever your instruction set supports (for arm it is swp and more recently strex/ldrex). You use these instructions on ram, some ram location agreed to by all the parties involved. The processes sharing the resource use the ram location to fight over control of the resource, whoever wins, gets to then actually address the resource. You would never use swp or ldrex/strex on a peripheral itself, that makes no sense. and I could see the memory system not giving you an exclusive okay response (EXOKAY) which is what you need to get out of the ldrex/strex infinite loop.
You have two basic methods for sharing a resource (well maybe more, but here are two). One is you use this shared memory location and each user of the shared resource, fights to win control over the memory location. When you win you then talk to the resource directly. When finished give up control over the shared memory location.
The other method is you have only one piece of software allowed to talk to the peripheral, nobody else is allowed to ever talk to the peripheral. Anyone wishing to have something done on the peripheral asks the one resource to do it for them. It is like everyone being able to share the soft drink fountain, vs the soft drink fountain is behind the counter and only the soft drink fountain employee is allowed to use the soft drink fountain. Then you need a scheme either have folks stand in line or have folks take a number and be called to have their drink filled. Along with the single resource talking to the peripheral you have to come up with a scheme, fifo for example, to essentially make the requests serial in nature.
These are both on the honor system. You expect nobody else to talk to the peripheral who is not supposed to talk to the peripheral, or who has not won the right to talk to the peripheral. If you are looking for hardware solutions to prevent folks from talking to it, well, use the mmu but now you need to manage the who won the lock and how do they get the mmu unblocked (without using the honor system) and re-blocked in a way that
Situations where you might have an interrupt handler and a foreground task sharing a resource, you have one or the other be the one that can touch the resource, and the other asks for requests. for example the resource might be interrupt driven (a serial port for example) and you have the interrupt handlers talk to the serial port hardware directly, if the application/forground task wants to have something done it fills out a request (puts something in a fifo/buffer) the interrupt then looks to see if there is anything in the request queue, and if so operates on it.
Of course there is the, disable interrupts and re-enable critical sections, but those are scary if you want your interrupts to have some notion of timing/latency...Understand what you are doing and they can be used to solve this app+isr two user problem.
ldrex/strex on non-cached memory space:
My extest perhaps has more text on the when you can and cant use ldrex/strex, unfortunately the arm docs are not that good in this area. They tell you to stop using swp, which implies you should use strex/ldrex. But then switch to the hardware manual which says you dont have to support exclusive operations on a uniprocessor system. Which says two things, ldrex/strex are meant for multiprocessor systems and meant for sharing resources between processors on a multiprocessor system. Also this means that ldrex/strex is not necessarily supported on uniprocessor systems. Then it gets worse. ARM logic generally stops either at the edge of the processor core, the L1 cache is contained within this boundary it is not on the axi/amba bus. Or if you purchased/use the L2 cache then the ARM logic stops at the edge of that layer. Then you get into the chip vendor specific logic. That is the logic that you read the hardware manual for where it says you dont NEED to support exclusive accesses on uniprocessor systems. So the problem is vendor specific. And it gets worse, ARM's L1 and L2 cache so far as I have found do support ldrex/strex, so if you have the caches on then ldrex/strex will work on a system whose vendor code does not support them. If you dont have the cache on that is when you get into trouble on those systems (that is the extest thing I wrote).
The processors that have ldrex/strex are new enough to have a big bank of config registers accessed through copressor reads. buried in there is a "swp instruction supported" bit to determine if you have a swap. didnt the cortex-m3 folks run into the situation of no swap and no ldrex/strex?
The bug in the linux kernel (there are many others as well for other misunderstandings of arm hardware and documentation) is that on a processor that supports ldrex/strex the ldrex/strex solution is chosen without determining if it is multiprocessor, so you can (and I know of two instances) get into an infinite ldrex/strex loop. If you modify the linux code so that it uses the swp solution (there is code there for either solution) they linux will work. why only two people have talked about this on the internet that I know of, is because you have to turn off the caches to have it happen (so far as I know), and who would turn off both caches and try to run linux? It actually takes a fair amount of work to succesfully turn off the caches, modifications to linux are required to get it to work without crashing.
No, I cant tell you the systems, and no I do not now nor ever have worked for ARM. This stuff is all in the arm documentation if you know where to look and how to interpret it.
Generally, the ldrex and strex need support from the memory systems. You may wish to refer to some answers by dwelch as well as his extext application. I would believe that you can not do this for memory mapped I/O. ldrex and strex are intended more for Lock Free algorithms, in normal memory.
Generally only one driver should be in charge of a bank of I/O registers. Software will make requests to that driver via semaphores, etc which can be implement with ldrex and strex in normal SDRAM. So, you can inter-lock these I/O registers, but not in the direct sense.
Often, the I/O registers will support atomic access through write one to clear, multiplexed access and other schemes.
Write one to clear - typically use with hardware events. If code handles the event, then it writes only that bit. In this way, multiple routines can handle different bits in the same register.
Multiplexed access - often an interrupt enable/disable will have a register bitmap. However, there are also alternate register that you can write the interrupt number to which enable or disable a particular register. For instance, intmask maybe two 32 bit registers. To enable int3, you could mask 1<<3 to the intmask or write only 3 to an intenable register. They intmask and intenable are hooked to the same bits via hardware.
So, you can emulate an inter-lock with a driver or the hardware itself may support atomic operations through normal register writes. These schemes have served systems well for quiet some time before people even started to talk about lock free and wait free algorithms.
Like previous answers state, ldrex/strex are not intended for accessing the resource itself, but rather for implementing the synchronization primitives required to protect it.
However, I feel the need to expand a bit on the architectural bits:
ldrex/strex (pronounced load-exclusive/store-exclusive) are supported by all ARM architecture version 6 and later processors, minus the M0/M1 microcontrollers (ARMv6-M).
It is not architecturally guaranteed that load-exclusive/store-exclusive will work on memory types other than "Normal" - so any clever usage of them on peripherals would not be portable.
The SWP instruction isn't being recommended against simply because its very nature is counterproductive in a multi-core system - it was deprecated in ARMv6 and is "optional" to implement in certain ARMv7-A revisions, and most ARMv7-A processors already require it to be explicitly enabled in the cp15 SCTLR. Linux by default does not, and instead emulates the operation through the undef handler using ... load-exclusive and store-exclusive (what #dwelch refers to above). So please don't recommend SWP as a valid alternative if you are expecting code to be portable across ARMv7-A platforms.
Synchronization with bus masters not in the inner-shareable domain (your cache-coherency island, as it were) requires additional external hardware - referred to as a global monitor - in order to track which masters have requested exclusive access to which regions.
The "not required on uniprocessor systems" bit sounds like the ARM terminology getting in the way. A quad-core Cortex-A15 is considered one processor... So testing for "uniprocessor" in Linux would not make one iota of a difference - the architecture and the interconnect specifications remain the same regardless, and SWP is still optional and may not be present at all.
Cortex-M3 supports ldrex/strex, but its interconnect (AHB-lite) does not support propagating it, so it cannot use it to synchronize with external masters. It does not support SWP, never introduced in the Thumb instruction set, which its interconnect would also not be able to propagate.
If the chip in question has a toggle register (which is essentially XORed with the output latch when written to) there is a work around.
load port latch
mask off unrelated bits
xor with desired output
write to toggle register
as long as two processes do not modify the same pins (as opposed to "the same port") there is no race condition.
In the case of the bcm2708 you could choose an output pin whose neighbors are either unused or are never changed and write to GPFSELn in byte mode. This will however only ensure that you will not corrupt others. If others are writing in 32 bit mode and you interrupt them they will still corrupt you. So its kind of a hack.
Hope this helps

Linux Preemptive kernel implications?

What are the implications of a linux kernel being preemptive, particularly for creating device drivers. I'm guessing you need to be more diligent about resource locking, but is there anything more to this?
There are a lot more opportunities for race conditions as you had mentioned, so yes, you have to be very diligent with locks. You also have to be careful about timing, such as when you enable/disable interrupts or other hardware resources, etc. You don't always have to use locks for these situations, but you may have to reorder you code. Finally, it also effects scheduling, allowing high priority tasks to be much more responsive, which in turn may have a negative effect on the lower priority tasks.
If not on SMP, make sure this lock patch need to be applied: "Gaurantee spinlocks implicit barrier for !PREEMPT_COUNT", that was made in Apr 2013.
Be aware that each time the code runs "spin_unlock_" or "preemption_enable", a preemption could kick in. It's the same whenever an exception returns or interrupt returns. Beyond those cases and the likes, there should be no other concerns. The kernel design guarantees exceptions and interrupts are handled in strictly embedded manner, though with SMP multiple instances may run in parallel.

What simple method can I use to debug an embedded processor without serial port or video?

We have a small embedded system without any video or serial ports (i.e. we can't output text via printf).
We would like to track the progress of our code through the initialization sequence.
Is there some simple things we can do to help with this.
It is not running any OS, and the hardware platform is somewhat customizable.
The simplest most scalable solution are state LEDs. Toggle LEDs based on actions, either in binary form or when certain actions occur if you can narrow your focus.
The most powerful will be a hardware JTAG device. You don't even need to set breakpoints - simply being able to stop the application and inspect the state of memory may be enough. Note that some hardware platforms do not support "fancy" options such as memory watches or hardware breakpoints. The former is usually worked around with constantly stopping the processor and reading memory (turns your 10MHz system into a 1kHz system), while the latter is sometimes performed using code replacement (replace the targeted instruction with a different jump), which sometimes masks other problems. Be aware of these issues and which embedded processors they apply to.
There are a few strategies you can employ to help with debugging:
If you have Output Pins available, you can hook them up to LEDs (or an oscilloscope) and toggle the output pins high/low to indicate that certain points have been reached in the code.
For example, 1 blink might be program loaded, 2 blink is foozbar initialized, 3 blink is accepting inputs...
If you have multiple output lines available, you can use a 7 segment LED to convey more information (numbers/letters instead of blinks).
If you have the capabilities to read memory and have some RAM available, you can use the sprint function to do printf-like debugging, but instead of going to a screen/serial port, it is written in memory.
It depends on the type of debugging that you're trying to do - in particular if you're after a temporary method of tracing or if you are trying to provide a tool that can be used as an indication of status during the life of the project (or product).
For one off, in depth source tracing and debugging an in-circuit debugger (eg. jtag) can be very helpful. However, they are most helpful where your debugging requires setting breakpoints and investigating memory and registers - which makes it of little benefit where you are dealing time critical problems.
Where you need to determine program state without having a significant impact on the execution of your system the use of LEDs connected to spare I/O pins will be helpful. These can also be used as the input to a digital storage oscilloscope (DSO) or logic analyzer.
This technique can be made more powerful by selecting unique patterns of pulses that will be identifiable on the DSO.
For a more versatile debugging tool, though, a serial port is a good solution. To save cost and PCB real-estate you may find it useful to use an plug-in module that contains the RS232 converters.
If you are trying to provide a longer term indication of status as part of the normal operation of your product, LEDs are again a cheap an simple method. However in this situation it is best to choose patterns of pulses that are slow enough to be easily identified by visual inspection. This will all you over time you will learn a particular pattern that represents "normal" behavior.
You can easily emulate serial communications (UARTs) using bit-banging from the IO pins of the system. Hook it to one of the card's pins and attach to a RS232 converter there (TTL to RS232 converters are easy to either buy or build), which goes to your PC's serial port.
A JTAG debugger is also an option, though cumbersome to set up.
If you don't have JTAG, the LEDs suggested by the others are a great idea - although you do tend to end up in a test/rebuild cycle to try to track down the issue.
If you've got more time, and spare hardware pins, and memory to spare, you could always bit-bash a low speed serial interface. I've found that pretty useful in the past.
Others have suggested some pretty good ideas using output pins, so I won't suggest that, although it can be a very good solution, and is very cost effective. If your budget and target processor support it, a hardware trace system, (either an old fashioned emulator, or a fancy BDM with bus snooping trace support) can be great for this type of thing. It's very expensive though.
The idea of using a bit-banged software UART is nice, but there's some effort required in writing one and also you need some free timers and interrupts. If your hardware has any other unused serial interface (SPI, I2C, ..), using them would be easier. With a small microcontroller you could convert the interface to RS-232.
If you have to go for the bit-banging, making a synchronous serial might be a simpler alternative as it wouldn't be critical to timing.

Resources