Proper way to manipulate registers (PUT32 vs GPIO->ODR) - performance

I'm learning how to use microcontrollers without a bunch of abstractions. I've read somewhere that it's better to use PUT32() and GET32() instead of volatile pointers and stuff. Why is that?
With a basic pin wiggle "benchmark," the performance of GPIO->ODR=0xFFFFFFFF seems to be about four times faster than PUT32(GPIO_ODR, 0xFFFFFFFF), as shown by the scope:
(The one with lower frequency is PUT32)
This is my code using PUT32
PUT32(0x40021034, 0x00000002); // RCC IOPENR B
PUT32(0x50000400, 0x00555555); // PB MODER
while (1) {
    PUT32(0x50000414, 0x0000FFFF); // PB ODR
    PUT32(0x50000414, 0x00000000);
}
This is my code using the arrow thing
*(volatile uint32_t *)0x40021034 = 0x00000002; // RCC IOPENR B
GPIOB->MODER = 0x00555555; // PB MODER
while (1) {
    GPIOB->ODR = 0x00000000; // PB ODR
    GPIOB->ODR = 0x0000FFFF;
}
I shamelessly adapted the assembly for PUT32 from somewhere
PUT32   PROC
        EXPORT PUT32
        STR  R1,[R0]
        BX   LR
        ENDP
My questions are:
Why is one method slower when it looks like they're doing the same thing?
What's the proper or best way to interact with GPIO? (Or rather what are the pros and cons of different methods?)
Additional information:
Chip is STM32G031G8Ux, using Keil uVision IDE.
I didn't configure the clock to go as fast as it can, but it should be consistent for the two tests.
Here's my hardware setup: (Scope probe connected to the LEDs. The extra wires should have no effect here)
Thank you for your time, sorry for any misunderstandings

PUT32 is a totally non-standard method that the poster in that other question made up. They have done this to avoid the complication and possible mistakes in defining the register access methods.
When you use the standard CMSIS header files and assign to the registers in the standard way, then all the complication has already been taken care of for you by someone who has specific knowledge of the target that you are using. They have designed it in a way that makes it hard for you to make the mistakes that the PUT32 is trying to avoid, and in a way that makes the final syntax look cleaner.
Writing to the registers directly is quicker because the store itself can take as little as a single cycle of the processor clock, whereas branching to a function, performing the store, and returning took about four times as long in the context of your experiment.
By using this generic access method you also risk introducing bugs that are not possible if you used the manufacturer provided header files: for example using a 32 bit access when the register is 16 or 8 bits.
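For illustration, here is a minimal sketch of the CMSIS pattern, with a simplified register block (the real stm32g0xx.h header defines the full peripheral layout and many more symbols):
#include <stdint.h>

/* Simplified GPIO register block; offsets follow the STM32G0
   reference manual (MODER at 0x00, ..., ODR at 0x14). */
typedef struct {
    volatile uint32_t MODER;   /* 0x00: mode register     */
    volatile uint32_t OTYPER;  /* 0x04: output type       */
    volatile uint32_t OSPEEDR; /* 0x08: output speed      */
    volatile uint32_t PUPDR;   /* 0x0C: pull-up/pull-down */
    volatile uint32_t IDR;     /* 0x10: input data        */
    volatile uint32_t ODR;     /* 0x14: output data       */
} GPIO_TypeDef;

#define GPIOB ((GPIO_TypeDef *) 0x50000400UL) /* base address from your listing */

/* GPIOB->ODR = 0x0000FFFF; compiles down to a single str once the
   base address is in a register, with no call/return overhead, and
   the typed members make a wrong-width access hard to write by accident. */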

Related

gcc-avr ATmega16/32 Programming

I have just started AVR MCU programming using gcc-avr, but when I look at sample programs I am not able to make out much from the code:
DDRD |= (1 << PD7);
TCCR2 = (1 << WGM21) | (0 << WGM20);
TCCR2 |= (1 << COM20);
TCCR2 |= (6 << CS20);
Also, I do not see any declarations of the variables DDRD, PD7, TCCR2, WGM21, WGM20, COM20, CS20, yet they are used directly. How can I find out where these pre-defined identifiers come from and what they do? It is very difficult to understand the code without knowing this.
Thanks in advance.
That kind of code is very common when programming embedded systems, although you will need to look at the header files and the AVR documentation to learn what those specific identifiers mean. Be aware that it can be very complex if you're new to this, and you will need to understand how to work with raw binary and C-style bit shifts/operators. (There are lots of tutorials online if you need to learn more about that.)
I'll try to explain the basic principle though.
All of the identifiers you saw will be preprocessor constants (i.e. #define ...), rather than variables. DDRD and TCCR2 will specify memory locations. These locations will be mapped onto certain functionality, so that setting or clearing certain bits at those locations will change the behaviour of the device (e.g. enable a clock divider, or set a GPIO pin high or low, etc.).
PD7, WGM21, WGM20, COM20, and CS20 will all be fairly small numbers. They specify how far you need to offset certain bit patterns to achieve certain results. Bit-wise operators (such as | and &) and bit-shift operators (typically <<) are used to create the patterns which are written to the memory locations. The documentation will tell you what patterns to use.
I'll use a simple fictional example to illustrate this. Let's say there is a register which controls the value of some output pins. We'll call the register OUTPUT1. Typically, each bit will correspond to the value of a specific pin. Turning on pin 4 (but leaving the other pins alone) might look like this:
OUTPUT1 |= (1 << PIN4);
This bitwise OR's the existing register with the pattern to turn pin 4 on. Turning that pin off again might look like this:
OUTPUT1 &= ~(1 << PIN4);
This bitwise AND's the existing register with everything except the pattern to turn pin 4 on (which results in clearing the bit). That's an entirely fictional example though, so don't actually try it!
The principle is basically the same for many different systems, so once you've learned it on AVR, you will hopefully be able to adapt to other devices as well.

Why doesn't gcc handle volatile register?

I'm working on a timing loop for the AVR platform where I'm counting down a single byte inside an ISR. Since this task is a primary function of my program, I'd like to permanently reserve a processor register so that the ISR doesn't have to hit a memory barrier when its usual code path is decrement, compare to zero, and reti.
The avr-libc docs show how to bind a variable to a register, and I got that working without a problem. However, since this variable is shared between the main program (for starting the timer countdown) and the ISR (for actually counting and signaling completion), it should also be volatile to ensure that the compiler doesn't do anything too clever in optimizing it.
In this context (reserving a register across an entire monolithic build), the combination volatile register makes sense to me semantically, as "permanently store this variable in register rX, but don't optimize away checks because the register might be modified externally". GCC doesn't like this, however, and emits a warning that it might go ahead and optimize away the variable access anyway.
The bug history of this combination in GCC suggests that the compiler team is simply unwilling to consider the type of scenario I'm describing and thinks it's pointless to provide for it. Am I missing some fundamental reason why the volatile register approach is in itself a Bad Idea, or is this a case that makes semantic sense but that the compiler team just isn't interested in handling?
The semantics of volatile are not exactly as you describe "don't optimize away checks because the register might be modified externally" but are actually more narrow: Try to think of it as "don't cache the variable's value from RAM in a register".
Seen this way, it does not make any sense to declare a register as volatile because the register itself cannot be 'cached' and therefore cannot possibly be inconsistent with the variable's 'actual' value.
The fact that read accesses to volatile variables are usually not optimized away is merely a side effect of the above semantics, but it's not guaranteed.
I think GCC should assume by default that a value in a register is 'like volatile' but I have not verified that it actually does so.
Edit:
I just did a small test and found:
avr-gcc 4.6.2 does not treat global register variables like volatiles with respect to read accesses, and
the Naggy extension for Atmel Studio detects an error in my code: "global register variables are not supported".
Assuming that global register variables are actually considered "unsupported" I am not surprised that gcc treats them just like local variables, with the known implications.
My test code looks like this:
#include <stdint.h>

uint8_t var;
volatile uint8_t volVar;
register uint8_t regVar asm("r13");

#define NOP asm volatile ("nop\r\n":::)

int main(void)
{
    var = 1;             // <-- kept
    if ( var == 0 ) {
        NOP;             // <-- optimized away, var is not volatile
    }
    volVar = 1;          // <-- kept
    if ( volVar == 0 ) {
        NOP;             // <-- kept, volVar *is* volatile
    }
    regVar = 1;          // <-- optimized away, regVar is treated like a local variable
    if ( regVar == 0 ) {
        NOP;             // <-- optimized away consequently
    }
    for(;;){}
}
The reason you would use the volatile keyword on AVR variables is to, as you said, avoid the compiler optimizing access to the variable. The question now is, how does this happen though?
A variable has two places it can reside. Either in the general purpose register file or in some location in RAM. Consider the case where the variable resides in RAM. To access the latest value of the variable, the compiler loads the variable from RAM, using some form of the ld instruction, say lds r16, 0x000f. In this case, the variable was stored in RAM location 0x000f and the program made a copy of this variable in r16. Now, here is where things get interesting if interrupts are enabled. Say that after loading the variable, the following occurs inc r16, then an interrupt triggers and its corresponding ISR is run. Within the ISR, the variable is also used. There is a problem, however. The variable exists in two different versions, one in RAM and one in r16. Ideally, the compiler should use the version in r16, but this one is not guaranteed to exist, so it loads it from RAM instead, and now, the code does not operate as needed. Enter then the volatile keyword. The variable is still stored in RAM, however, the compiler must ensure that the variable is updated in RAM before anything else happens, thus the following assembly may be generated:
cli
lds r16, 0x000f
inc r16
sei
sts 0x000f, r16
First, interrupts are disabled. Then the variable is loaded into r16, increased, interrupts are enabled, and the variable is stored. It may appear confusing for the global interrupt flag to be enabled before the variable is stored back to RAM, but from the instruction set manual:
The instruction following SEI will be executed before any pending interrupts.
This means that the sts instruction will be executed before any interrupts trigger again, and that the interrupts are disabled for the minimum amount of time possible.
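(As an aside: if you want that cli/sei protection from C rather than relying on whatever the compiler happens to emit, avr-libc provides it directly. A minimal sketch, with made-up names, assuming <util/atomic.h> is available:)
#include <stdint.h>
#include <util/atomic.h>

volatile uint8_t counter; /* shared with an ISR */

void bump(void)
{
    /* ATOMIC_RESTORESTATE saves SREG, disables interrupts around the
       read-modify-write, then restores the previous interrupt state. */
    ATOMIC_BLOCK(ATOMIC_RESTORESTATE) {
        counter++;
    }
}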
Consider now the case where the variable is bound to a register. Any operations done on the variable are done directly on the register. These operations, unlike operations done to a variable in RAM, can be considered atomic, as there is no read -> modify -> write cycle to speak of. If an interrupt triggers after the variable is updated, it will get the new value of the variable, since it will read the variable from the register it was bound to.
Also, since the variable is bound to a register, any test instructions will utilize the register itself and will not be optimized away on the grounds the compiler may have a "hunch" it is a static value, given that registers by their very nature are volatile.
Now, from experience, when using interrupts on AVR, I have sometimes noticed that global volatile variables never hit RAM: the compiler kept them in registers the whole time, bypassing the read -> modify -> write cycle altogether. That was due to compiler optimizations, however, and it should not be relied on. Different compilers are free to generate different assembly for the same piece of code. You can generate a disassembly of your final file, or of any particular object file, using the avr-objdump utility.
Cheers.
Reserving a register for one variable for a complete compilation unit is probably too restrictive for a compiler's code generator. That is, every C routine would have to NOT use that register.
How do you guarantee that other called routines do NOT use that register once your code goes out of scope? Even stuff like serial i/o routines would have to NOT use that reserved register. Compilers do NOT recompile their run-time libraries based on a data definition in a user program.
Is your application really so time sensitive that the extra delay for bringing memory up from L2 or L3 can be detected? If so, then your ISR might be running so frequently that the required memory location is always available (i.e. it doesn't get paged back down through the cache) and thus does NOT hit a memory barrier (I assume by memory barrier you are referring to how memory in a CPU really operates, through caching, etc.). But for this to really be true the CPU would have to have a fairly large L1 cache and the ISR would have to run at a very high frequency.
Finally, sometimes an application's requirements make it necessary to code it in ASM in which case you can do exactly what you are requesting!

Is it possible to save some data permanently in an AVR microcontroller?

Well, the question says it all.
What I would like to do is that, every time I power up the micro-controller, it should take some data from the saved data and use it. It should not use any external flash chip.
If possible, please give a code snippet that I can use in AVR Studio 4. For example, if I save 8 uint16_t values, it should load them into an array of uint16_t.
You have to burn the data into the program memory of the chip if you don't need to update it programmatically; if you want read-write support, use the built-in EEPROM (a sketch of that follows the PROGMEM example below).
Pgmem example:
#include <avr/pgmspace.h>
#include <stdint.h>

const uint16_t data[] PROGMEM = { 0, 1, 2, 3 }; // newer avr-gcc requires const with PROGMEM

int main(void)
{
    uint16_t x = pgm_read_word_near(data + 1); // read 2nd element from flash
    (void)x;
}
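For the read-write case, avr-libc's <avr/eeprom.h> does the heavy lifting. A minimal sketch (the saved array is just for illustration, and eeprom_update_block assumes a reasonably recent avr-libc; older versions can use eeprom_write_block instead):
#include <avr/eeprom.h>
#include <stdint.h>

EEMEM uint16_t saved[8]; /* EEMEM places the array in the .eeprom section */

int main(void)
{
    uint16_t data[8];

    /* Load the stored values into RAM at power-up. */
    eeprom_read_block(data, saved, sizeof(data));

    /* ... use and modify data ... */

    /* Write back; eeprom_update_block only rewrites cells whose
       value changed, which reduces EEPROM wear. */
    eeprom_update_block(data, saved, sizeof(data));
}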
You need to get the datasheet for the part you are using. Microcontrollers like these typically contain at least one flash, and sometimes multiple banks of flash to allow for different bootloaders while making it easy to erase one whole bank without affecting another. Likewise, some have EEPROM. This is all internal, not external. Especially since you say you need to save programmatically, this should work (remember how easy it is to wear out a flash, so don't save unless you need to). Either EEPROM or flash will meet the requirement of having that information there when you power up (non-volatile), as well as being writable programmatically. Googling will find a number of examples of how to do this, in addition to the datasheet (which it sounds like you have not read yet) and the app notes that also contain this information. If you are looking for some sort of one-time-programmable fuse-blowing thing, there may be OTP versions of the AVR; the datasheets, programmer's references, and app notes describe how to program that memory, and should tell you whether OTP parts can be written programmatically or are treated differently.
Reading the data back is covered by the memory map in the datasheet: write code to read those addresses. Writing is described in the datasheet (programmer's reference manual, user's guide, whatever Atmel calls it) as well, and there are many examples on the net.

Protecting memory from changing

Is there a way to protect an area of the memory?
I have this struct:
#define BUFFER 4
struct
{
    char s[BUFFER-1];
    const char zc;
} str = {'\0'};
printf("'%s', zc=%d\n", str.s, str.zc);
It is supposed to hold strings of length BUFFER-1 and guarantee that they end in '\0'.
But the compiler gives an error only for:
str.zc='e'; /*error */
Not if:
str.s[3]='e'; /*no error */
If it can be done by compiling with gcc and some flag, that is good as well.
Thanks,
Beco
To detect errors at runtime, take a look at the -fstack-protector-all option in gcc. It may be of limited use when attempting to detect very small overflows like the one you described.
Unfortunately you aren't going to find a lot of info on detecting buffer overflow scenarios like the one you described at compile-time. From a C language perspective the syntax is totally correct, and the language gives you just enough rope to hang yourself with. If you really want to protect your buffers from yourself you can write a front-end to array accesses that validates the index before it allows access to the memory you want.
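A minimal sketch of such a front-end (protected_str and str_set are made-up names for illustration):
#include <assert.h>
#include <stddef.h>

#define BUFFER 4

struct protected_str {
    char s[BUFFER];
};

/* All writes go through this accessor, which rejects any index that
   would clobber the byte reserved for the terminator. */
static void str_set(struct protected_str *p, size_t i, char c)
{
    assert(i < BUFFER - 1);   /* last byte is reserved for '\0' */
    p->s[i] = c;
    p->s[BUFFER - 1] = '\0';  /* re-assert the guarantee */
}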

Can I assume sizeof(GUID)==16 at all times?

The definition of GUID in the Windows headers is like this:
typedef struct _GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[ 8 ];
} GUID;
However, no packing is defined. Since the alignment of structure members is dependent on the compiler implementation, one could think this structure could be longer than 16 bytes.
If I can assume it is always 16 bytes, my code using GUIDs is more efficient and simpler.
However, it would be completely unsafe if a compiler added some padding between the members for some reason.
My question: do potential reasons exist? Or is the probability of the scenario that sizeof(GUID) != 16 actually zero?
It's not official documentation, but perhaps this article can ease some of your fears. I think there was another one on a similar topic, but I cannot find it now.
What I want to say is that Windows structures do have a packing specifier, but it's a global setting which is somewhere inside the header files. It's a #pragma or something. And it is mandatory, because otherwise programs compiled by different compilers couldn't interact with each other - or even with Windows itself.
It's not zero; it depends on your system. If the alignment is word (4-byte) based, you could have padding around the shorts, and the size would be more than 16.
If you want to be sure that it's 16, manually disable the padding; otherwise use sizeof, and don't assume the value.
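For example, both MSVC and GCC accept #pragma pack; a sketch using a hypothetical MY_GUID so as not to collide with the Windows definition:
#pragma pack(push, 1)  /* no padding between members */
typedef struct _MY_GUID {
    unsigned long  Data1;
    unsigned short Data2;
    unsigned short Data3;
    unsigned char  Data4[8];
} MY_GUID;             /* sizeof(MY_GUID) == 16 on any compiler that honors the pragma */
#pragma pack(pop)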
If I feel I need to make an assumption like this, I'll put a 'compile time assertion' in the code. That way, the compiler will let me know if and when I'm wrong.
If you have or are willing to use Boost, there's a BOOST_STATIC_ASSERT macro that does this.
For my own purposes, I've cobbled together my own (that works in C or C++ with MSVC, GCC and an embedded compiler or two) that uses techniques similar to those described in this article:
http://www.pixelbeat.org/programming/gcc/static_assert.html
The real trick to getting the compile-time assertion to work cleanly is dealing with the fact that some compilers don't like declarations mixed with code (MSVC in C mode), and that the techniques often generate warnings that you'd rather not have clogging up an otherwise working build. Coming up with techniques that avoid the warnings is sometimes a challenge.
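A sketch of one classic formulation (C11 and later can simply use _Static_assert):
/* The typedef is ill-formed when cond is false, because the array
   size becomes -1; requires the GUID definition to be in scope. */
#define STATIC_ASSERT(cond, name) \
    typedef char static_assert_##name[(cond) ? 1 : -1]

STATIC_ASSERT(sizeof(GUID) == 16, guid_is_16_bytes);

/* C11 equivalent: */
_Static_assert(sizeof(GUID) == 16, "GUID must be 16 bytes");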
Yes, on any Windows compiler. Otherwise IsEqualGUID would not work: it compares only the first 16 bytes. Similarly, any other WinAPI function that takes a GUID* just checks the first 16 bytes.
Note that you must not assume generic C or C++ rules for windows.h. For instance, a byte is always 8 bits on Windows, even though ISO C only requires CHAR_BIT to be at least 8.
Anytime you write code dependent on the size of someone else's structure, warning bells should go off.
Could you give an example of some of the simplified code you want to use?
Most people would just use sizeof(GUID) if the size of the structure was needed.
With that said -- I can't see the size of GUID ever changing.
#include <stdio.h>
#include <rpc.h>

int main() {
    GUID myGUID;
    printf("size of GUID is %u\n", (unsigned)sizeof(myGUID)); /* cast: %d with a size_t is not portable */
    return 0;
}
Got 16. This is useful to know if you need to manually allocate on the heap.
