lifetime of constant variable? - runtime

What is the lifetime of constant variables on a microcontroller?
Are const variables allocated before runtime or during runtime?
void main()
{
    const int x = 5;
    while(1)
    { }
}

It depends on the compiler.
Usually constants are placed in program memory. Once compilation is done, the constants are built into the HEX file.
For example, the Microchip C18 and XC32 compilers handle this differently. C18 has const rom, while with C32 you have to use the -membedded-data flag to specify how and where in ROM constants will be placed.
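In C terms, the lifetime follows from the storage duration rather than from const itself. Here is a minimal sketch of the two cases. The names scale_table and scaled_offset are made up for illustration, and whether the table really ends up in flash depends on the toolchain (some Harvard-architecture targets need an extra qualifier or flag, as noted above).

```c
#include <stdint.h>

/* Static storage duration: the object exists for the whole program run.
   Its value is fixed at build time, so many embedded toolchains can place
   it in program memory (toolchain dependent, not guaranteed by C). */
static const uint8_t scale_table[4] = { 1, 2, 4, 8 };

/* Hypothetical helper, used only to illustrate the two kinds of const. */
uint8_t scaled_offset(unsigned index)
{
    /* Automatic storage duration: conceptually created each time the block
       is entered. An optimizing compiler will usually fold it into the
       instruction stream instead of giving it a stack slot. */
    const uint8_t offset = 5;

    return (uint8_t)(scale_table[index & 3u] + offset);
}
```

The const x in the question is of the second kind: it is a local in main(), so the language treats it as an automatic object even though the compiler may never allocate storage for it.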

Related

Redefine some functions of gcc-arm-none-eabi's stdlibc

STM32 chips (and many others) have a hardware random number generator (RNG); it is faster and more reliable than the software RNG provided by libc. The compiler knows nothing about the hardware.
Is there a way to redefine the implementation of rand()?
There are other hardware modules, e.g. the real-time clock (RTC), which can provide data for time().
You simply override them by defining functions with identical signatures. If they are defined WEAK in the standard library they will be overridden; otherwise they are overridden on a first-resolution basis, so as long as your implementation is passed to the linker before libc is searched, it will override. Moreover, .o / .obj files specifically are used in symbol resolution before .a / .lib files, so if your implementation is included in your project source, it will always override.
You should be careful to get the semantics of your implementation correct. For example, rand() returns a signed integer from 0 to RAND_MAX, which is likely not the same range as the RNG hardware produces. Since RAND_MAX is a macro, changing it would require changing the standard header, so your implementation needs to enforce the existing RAND_MAX.
Example using STM32 Standard Peripheral Library:
#include <stdlib.h>
#include <stm32xxx.h> // Your processor header here
#if defined __cplusplus
extern "C"
{
#endif
static int rng_running = 0 ;
int rand( void )
{
    if( rng_running == 0 )
    {
        RCC_AHB2PeriphClockCmd(RCC_AHB2Periph_RNG, ENABLE);
        RNG_Cmd(ENABLE);
        rng_running = 1 ;
    }
    // Wait until the RNG reports that a new value is ready
    while(RNG_GetFlagStatus(RNG_FLAG_DRDY)== RESET) { }
    // Assumes RAND_MAX is an "all ones" integer value (check)
    return (int)(RNG_GetRandomNumber() & (unsigned)RAND_MAX) ;
}
// The hardware RNG cannot be seeded, so the seed is ignored
void srand( unsigned seed ) { (void)seed ; }
#if defined __cplusplus
}
#endif
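The "all ones" assumption in the comment above can be checked rather than assumed. Below is a minimal sketch of such a check. The helper name raw_to_rand is hypothetical, and it takes the raw 32-bit value as a parameter so it does not depend on any particular peripheral library. It assumes RAND_MAX fits in 31 bits, which holds for common 16-bit and 32-bit int targets.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: reduce a raw 32-bit hardware value to 0..RAND_MAX. */
static int raw_to_rand(uint32_t raw)
{
    const uint32_t max = (uint32_t)RAND_MAX;

    if ((max & (max + 1u)) == 0u) {
        /* RAND_MAX + 1 is a power of two: masking keeps the result uniform. */
        return (int)(raw & max);
    }

    /* Otherwise fall back to a modulo, which stays in range but introduces
       a small bias towards the lower values. */
    return (int)(raw % (max + 1u));
}
```

With this in place, the return statement in rand() could pass the value read from the RNG through raw_to_rand() instead of masking it directly.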
The same applies to time(); there is an example at "Problem with time() function in embedded application with C".

How to instruct avr-gcc to optimize volatile variables?

Code for an interrupt service handler:
volatile unsigned char x = 0;
void interruptHandler() __attribute__ ((signal));
void interruptHandler() {
f();
g();
}
Calls:
void f() { x ++; } // could be more complex, could also be in a different file
void g() { x ++; } // as `f()`, this is just a very simple example
Because x is a volatile variable, it is read and written every time it is used. The body of the interrupt handler compiles to (avr-gcc -g -c -Wa,-alh -mmcu=atmega328p -Ofast file.c):
lds r24,x
subi r24,lo8(-(1))
sts x,r24
lds r24,x
subi r24,lo8(-(1))
sts x,r24
Now I can manually inline the functions and employ a temporary variable:
unsigned char y = x;
y ++;
y ++;
x = y;
Or I can just write:
x += 2;
Both examples compile to the much more efficient:
lds r24,x
subi r24,lo8(-(2))
sts x,r24
Is it possible to tell avr-gcc to optimize access to volatile variables inside of interruptHandler, i.e. to do my manual optimization automatically?
After all, while interruptHandler is running, global interrupts are disabled, and it is impossible for x to change. I prefer not having to hand optimize code, thereby possibly creating duplicate code (if f() and g() are needed elsewhere) and introducing errors.
"Is it possible to tell avr-gcc to optimize access to volatile variables inside of interruptHandler, i.e. to do my manual optimization automatically?"
No, that is not possible in the C language.
"After all, while interruptHandler is running, global interrupts are disabled"
The compiler does not know this, and you could simply put an sei into the handler to turn them back on.
Also note that hardware registers are declared volatile, too. Some of these - like the UART data register - have side effects even when read. The compiler must not remove any reads or writes for these.
If you declare a variable to be volatile, then all accesses to it are volatile - the compiler will read and write it exactly as many times as the source code says, without combining them or doing similar optimisations.
So if you want combining optimisations, declare the variable without the "volatile" - then you will get what you need inside the interrupt code.
And then from outside the interrupt code, you can force volatile accesses using something like this macro:
#define volatileAccess(v) *((volatile typeof((v)) *) &(v))
Use "volatileAccess(x)" rather than "x" outside the interrupt code.
Just don't forget that "volatile" does not mean "atomic"!
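Put together, that approach might look like the sketch below. It is written as plain C so it can be compiled anywhere: the names VOLATILE_ACCESS, interrupt_handler_body and read_x_from_main are made up for illustration, the AVR signal attribute is left out, and __typeof__ is a GNU extension, so this assumes gcc or a compatible compiler.

```c
/* Force a single volatile access to an object that is not declared volatile.
   Relies on the GNU __typeof__ extension. */
#define VOLATILE_ACCESS(v) (*(volatile __typeof__(v) *)&(v))

/* Not volatile: inside the handler the compiler may combine the accesses. */
static unsigned char x = 0;

static void f(void) { x++; }
static void g(void) { x++; }

/* Body of the interrupt handler. With optimization enabled, the two
   increments can be merged into one load, one add and one store. */
void interrupt_handler_body(void)
{
    f();
    g();
}

/* Code outside the handler goes through the macro, so each read really
   reaches memory instead of reusing a cached register value. */
unsigned char read_x_from_main(void)
{
    return VOLATILE_ACCESS(x);
}
```

A single unsigned char is read in one instruction on AVR, so the read here is safe. For wider variables, interrupts would still need to be disabled around the access.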

How do memory operands work in avr-gcc inline assembly?

I'm trying to write a custom memory-copy function for AVR as inline assembly, because avr-gcc will always use a loop for memcpy and struct assignment, which is inefficient in terms of time. I want to use memory operands to avoid having to add a "memory" clobber. I currently have this:
void copy_2_bytes (char *restrict dst, char *restrict src)
{
struct S {
char x[2];
};
__asm__(
" ld __tmp_reg__,%[src]+\n"
" st %[dst]+,__tmp_reg__\n"
" ld __tmp_reg__,%[src]+\n"
" st %[dst]+,__tmp_reg__\n"
: [dst] "=m" ( *(struct S *)dst )
: [src] "m" ( *(struct S *)src )
);
}
This compiles, but it's incorrect in general because it modifies the pointer register pairs corresponding to the memory operands. It's easy to see that gcc assumes that the registers stay unchanged, for example by adding "*dst = 0;" after the assembly.
On the other hand, the Y and Z registers support the "ldd" and "std" instructions, which also take an immediate offset, so they can be used to access multiple bytes without being modified. But then there doesn't seem to be a way to force gcc to not select the X register, which doesn't support that.
UPDATE
Actually, if gcc determines that the address of the memory operand is constant, it will pass the constant address into the assembly, instead of a register pair. So now, I have absolutely no idea how to deal with this. Are there some magic instructions or assembly macros which can deal with both pointer registers and constant addresses at the same time?

How to use arrays in program (global) scope in OpenCL

AMD OpenCL Programming Guide, Section 6.3 Constant Memory Optimization:
Globally scoped constant arrays. These arrays are initialized,
globally scoped, and in the constant address space (as specified in
section 6.5.3 of the OpenCL specification). If the size of an array is
below 64 kB, it is placed in hardware constant buffers; otherwise, it
uses global memory. An example of this is a lookup table for math
functions.
I want to use this "globally scoped constant array". I have such code in pure C
#define SIZE 101
int *reciprocal_table;
int reciprocal(int number){
return reciprocal_table[number];
}
void kernel(int *output)
{
for(int i=0; i < SIZE; i++)
output[i] = reciprocal(i);
}
I want to port it into OpenCL
__kernel void kernel(__global int *output){
int gid = get_global_id(0);
output[gid] = reciprocal(gid);
}
int reciprocal(int number){
return reciprocal_table[number];
}
What should I do with global variable reciprocal_table? If I try to add __global or __constant to it I get an error:
global variable must be declared in addrSpace constant
I don't want to pass __constant int *reciprocal_table from kernel to reciprocal. Is it possible to initialize global variable somehow? I know that I can write it down into code, but does other way exist?
P.S. I'm using AMD OpenCL
UPD Above code is just an example. I have real much more complex code with a lot of functions. So I want to make array in program scope to use it in all functions.
UPD2 Changed example code and added citation from Programming Guide
#define SIZE 2
int constant array[SIZE] = {0, 1};
kernel void
foo (global int* input,
global int* output)
{
const uint id = get_global_id (0);
output[id] = input[id] + array[id];
}
I can get the above to compile with Intel as well as AMD. It also works without the initialization of the array but then you would not know what's in the array and since it's in the constant address space, you could not assign any values.
Program global variables have to be in the __constant address space, as stated by section 6.5.3 in the standard.
UPDATE Now, that I fully understood the question:
One thing that worked for me is to define the array in the constant space and then overwrite it by passing a kernel parameter constant int* array with the same name.
That produced correct results only on the GPU device. The AMD CPU device and the Intel CPU device did not overwrite the array's address. It is also probably not compliant with the standard.
Here's how it looks:
#define SIZE 2
int constant foo[SIZE] = {100, 100};
int
baz (int i)
{
return foo[i];
}
kernel void
bar (global int* input,
global int* output,
constant int* foo)
{
const uint id = get_global_id (0);
output[id] = input[id] + baz (id);
}
For input = {2, 3} and foo = {0, 1} this produces {2, 4} on my HD 7850 device (Ubuntu 12.10, Catalyst 9.0.2). But on the CPU I get {102, 103} with either OpenCL implementation (AMD, Intel). I cannot stress enough that I personally would NOT do this, because it is only a matter of time before it breaks.
Another way to achieve this would be to generate .h files on the host at runtime containing the definition of the array (or predefine them) and pass them to the kernel at compilation via a compiler option. This, of course, requires recompilation of the clProgram/clKernel for every different LUT.
I struggled to get this to work in my own program some time ago.
I did not find any way to initialize a constant or global scope array from the host via clEnqueueWriteBuffer or similar. The only way is to write it explicitly in your .cl source file.
So my trick to initialize it from the host is to use the fact that you are actually compiling your source from the host, which also means you can alter your src.cl file before compiling it.
First my src.cl file reads:
__constant double lookup[SIZE] = { LOOKUP }; // precomputed table (in constant memory).
double func(int idx) {
    return(lookup[idx]);
}
__kernel void ker1(__global double *in, __global double *out)
{
    ... do something ...
    double t = func(i);
    ...
}
Notice that the lookup table is initialized with LOOKUP.
Then, in the host program, before compiling your OpenCL code:
Compute the values of the lookup table in host_values[].
On your host, run something like:
char *buf = (char*) malloc( 10000 );
int count = sprintf(buf, "#define LOOKUP "); // actual source generation!
for (int i=0;i<SIZE;i++) count += sprintf(buf+count, "%g, ",host_values[i]);
count += sprintf(buf+count,"\n");
Then read the content of your source file src.cl and place it right at buf+count.
You now have a source file with an explicitly defined lookup table that you just computed on the host.
Compile your buffer with something like clCreateProgramWithSource(context, 1, (const char **) &buf, &src_sz, err);
Voilà!
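Two things to watch in the generation step above: sprintf into a fixed 10000-byte buffer can overflow for a large SIZE, and %g keeps only six significant digits of each double. Here is a bounded variant as a minimal sketch. The helper name build_lookup_define is hypothetical, and %.17g is used so that doubles survive the round trip through text.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "#define LOOKUP v0, v1, ...\n" into buf.
   Returns the number of characters written (excluding the terminator),
   or -1 if the buffer is too small. */
static int build_lookup_define(char *buf, size_t cap,
                               const double *values, size_t n)
{
    size_t used;
    int w = snprintf(buf, cap, "#define LOOKUP ");

    if (w < 0 || (size_t)w >= cap)
        return -1;
    used = (size_t)w;

    for (size_t i = 0; i < n; i++) {
        /* Separate entries with ", " and print enough digits for a double. */
        w = snprintf(buf + used, cap - used, "%s%.17g",
                     i ? ", " : "", values[i]);
        if (w < 0 || (size_t)w >= cap - used)
            return -1;
        used += (size_t)w;
    }

    w = snprintf(buf + used, cap - used, "\n");
    if (w < 0 || (size_t)w >= cap - used)
        return -1;

    return (int)(used + (size_t)w);
}
```

The return value plays the role of count in the steps above: the contents of src.cl would be copied to buf plus that offset, provided the buffer was sized to hold both parts.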
It looks like "array" is a look-up table of sorts. You'll need to clCreateBuffer and clEnqueueWriteBuffer so the GPU has a copy of it to use.

Overloading conflict with vector types __m128, __m256 in GCC

I've started playing around with AVX instructions on Intel's new Sandy Bridge processor. I'm using GCC 4.5.2, the TDM-GCC 64-bit build of MinGW64.
I want to overload operator<< for ostream to be able to print out the vector types __m256, __m128 etc to the console. But I'm running into an overloading conflict. The 2nd function in the following code produces an error "conflicts with previous declaration void f(__vector(8) float)":
void f(__m128 v) {
cout << 4;
}
void f(__m256 v) {
cout << 8;
}
It seems that the compiler cannot distinguish between the two types and considers them both f(float __vector).
Is there a way around this? I haven't been able to find anything online. Any help is greatly appreciated.
I accidentally stumbled upon the answer when having a similar problem with function templates. In this case, the GCC error message actually suggested a solution:
add -fabi-version=4 compiler option.
This solves my problem, and hopefully doesn't cause any issues when linking the standard libraries.
One can read more about ABI (Application Binary Interface) and GCC at ABI Policy and Guidelines and ABI specification. ABI specifies how the functions names are mangled when the code is compiled into object files. Apparently, ABI version 3 used by GCC by default cannot distinguish between the various vector types.
I was unsatisfied with the solution of changing compiler ABI flags to solve this, so I went looking for a different solution. It seems they encountered this issue in writing the Eigen library - see this source file for details http://eigen.tuxfamily.org/dox-devel/SSE_2PacketMath_8h_source.html
My solution to this is a slightly tweaked version of theirs:
template <typename T, unsigned RegisterSize>
struct Register
{
using ValueType = T;
enum { Size = RegisterSize };
inline operator T&() { return myValue; }
inline operator const T&() const { return myValue; }
inline Register() {}
inline Register(const T & v) : myValue(v) {} // Not explicit
inline Register & operator=(const T & v)
{
myValue = v;
return *this;
}
T myValue;
};
using Register4 = Register<__m128, 4u>;
using Register8 = Register<__m256, 8u>;
// Could provide more declarations for __m128d, __m128i, etc. if needed
Using the above, you can overload on Register4, Register8, etc. or produce template functions taking Registers without running into linking issues and without changing ABI settings.

Resources