Redefine some functions of gcc-arm-none-eabi's stdlibc - gcc

STM32 chips (and many others) have hardware random number generator (RNG), it is faster and more reliable than software RNG provided by libc. Compiler knows nothing about hardware.
Is there a way to redefine implementation of rand()?
There are other hardware modules, i.e real time clock (RTC) which can provide data for time().

You simply override them by defining functions with identical signature. If they are defined WEAK in the standard library they will be overridden, otherwise they are overridden on a first resolution basis so so long as your implementation is passed to the linker before libc is searched, it will override. Moreover .o / .obj files specifically are used in symbol resolution before .a / .lib files, so if your implementation is included in your project source, it will always override.
You should be careful to get the semantics of your implementation correct. For example rand() returns a signed integer 0 to RAND_MAX, which is likley not teh same as the RNG hardware. Since RAND_MAX is a macro, changing it would require changing the standard header, so your implementation needs to enforce the existing RAND_MAX.
Example using STM32 Standard Peripheral Library:
#include <stdlib.h>
#include <stm32xxx.h> // Your processor header here
#if defined __cplusplus
extern "C"
{
#endif
static int rng_running = 0 ;
int rand( void )
{
if( rng_running == 0 )
{
RCC_AHB2PeriphClockCmd(RCC_AHB2Periph_RNG, ENABLE);
RNG_Cmd(ENABLE);
rng_running = 1 ;
}
while(RNG_GetFlagStatus(RNG_FLAG_DRDY)== RESET) { }
// Assumes RAND_MAX is an "all ones" integer value (check)
return (int)(RNG_GetRandomNumber() & (unsigned)RAND_MAX) ;
}
void srand( unsigned ) { }
#if defined __cplusplus
}
#endif
For time() similar applies and there is an example at Problem with time() function in embedded application with C

Related

What is causing this error: SSE register return with SSE disabled?

I'm new to kernel development, and I need to write a Linux kernel module that performs several matrix multiplications (I'm working on an x64_64 platform). I'm trying to use fixed-point values for these operations, however during compilation, the compiler encounters this error:
error: SSE register return with SSE disabled
I don't know that much about SSE or this issue in particular, but from what i've found and according to most answers to questions about this problem, it is related to the usage of Floating-Point (FP) arithmetic in kernel space, which seems to be rarely a good idea (hence the utilization of Fixed-Point arithmetics). This error seems weird to me because I'm pretty sure I'm not using any FP values or operations, however it keeps popping up and in some ways that seem weird to me. For instance, I have this block of code:
#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
const int scale = 16;
#define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
#define FIXED_TO_DOUBLE(x) ((x) / (1 << scale))
#define MULT(x, y) ((((x) >> 8) * ((y) >> 8)) >> 0)
#define DIV(x, y) (((x) << 8) / (y) << 8)
#define OUTPUT_ROWS 6
#define OUTPUT_COLUMNS 2
struct matrix {
int rows;
int cols;
double *data;
};
double outputlayer_weights[OUTPUT_ROWS * OUTPUT_COLUMNS] =
{
0.7977986, -0.77172316,
-0.43078753, 0.67738613,
-1.04312621, 1.0552227 ,
-0.32619684, 0.14119884,
-0.72325027, 0.64673559,
0.58467862, -0.06229197
};
...
void matmul (struct matrix *A, struct matrix *B, struct matrix *C) {
int i, j, k, a, b, sum, fixed_prod;
if (A->cols != B->rows) {
return;
}
for (i = 0; i < A->rows; i++) {
for (j = 0; j < B->cols; j++) {
sum = 0;
for (k = 0; k < A->cols; k++) {
a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
b = DOUBLE_TO_FIXED(B->data[k * B->rows + j]);
fixed_prod = MULT(a, b);
sum += fixed_prod;
}
/* Commented the following line, causes error */
//C->data[i * C->rows + j] = sum;
}
}
}
...
static int __init insert_matmul_init (void)
{
printk(KERN_INFO "INSERTING MATMUL");
return 0;
}
static void __exit insert_matmul_exit (void)
{
printk(KERN_INFO "REMOVING MATMUL");
}
module_init (insert_matmul_init);
module_exit (insert_matmul_exit);
which compiles with no errors (I left out code that I found irrelevant to the problem). I have made sure to comment any error-prone lines to get to a point where the program can be compiled with no errors, and I am trying to solve each of them one by one. However, when uncommenting this line:
C->data[i * C->rows + j] = sum;
I get this error message in a previous (unmodified) line of code:
error: SSE register return with SSE disabled
sum += fixed_prod;
~~~~^~~~~~~~~~~~~
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error. Maybe my fixed-point implementation is flawed (I'm no expert in that matter either), or maybe I'm missing something obvious. Just in case, I have tested the same logic in a user-space program (using Floating-Point values) and it seems to work fine. In either case, any help in solving this issue would be appreciated. Thanks in advance!
Edit: I have included the definition of matrix and an example matrix. I have been using the default kbuild command for building external modules, here is what my Makefile looks like:
obj-m = matrix_mult.o
KVERSION = $(shell uname -r)
all:
make -C /lib/modules/$(KVERSION)/build M=$(PWD) modules
Linux compiles kernel code with -mgeneral-regs-only on x86, which produces this error in functions that do anything with FP or SIMD. (Except via inline asm, because then the compiler doesn't see the FP instructions, only the assembler does.)
From what I understand, there are no FP operations taking place, at least in this section, so I need help figuring out what might be causing this error.
GCC optimizes whole functions when optimization is enabled, and you are using FP inside that function. You're doing FP multiply and truncating conversion to integer with your macro and assigning the result to an int, since the MCVE you eventually provided shows struct matrix containing double *data.
If you stop the compiler from using FP instructions (like Linux does by building with -mgeneral-regs-only), it refuses to compile your file instead of doing software floating-point.
The only odd thing is that it pins down the error to an integer += instead of one of the statements that compiles to a mulsd and cvttsd2si
If you disable optimization (-O0 -mgeneral-regs-only) you get a more obvious location for the same error (https://godbolt.org/z/Tv5nG6nd4):
<source>: In function 'void matmul(matrix*, matrix*, matrix*)':
<source>:9:33: error: SSE register return with SSE disabled
9 | #define DOUBLE_TO_FIXED(x) ((x) * (1 << scale))
| ~~~~~^~~~~~~~~~~~~~~
<source>:46:21: note: in expansion of macro 'DOUBLE_TO_FIXED'
46 | a = DOUBLE_TO_FIXED(A->data[i * A->rows + k]);
| ^~~~~~~~~~~~~~~
If you really want to know what's going on with the GCC internals, you could dig into it with -fdump-tree-... options, e.g. on the Godbolt compiler explorer there's a dropdown for GCC Tree / RTL output that would let you look at the GIMPLE or RTL internal representation of your function's logic after various analyzer passes.
But if you just want to know whether there's a way to make this function work, no obviously not, unless you compile a file without -mgeneral-registers-only. All functions in a file compiled that way must only be called by callers that have used kernel_fpu_begin() before the call. (and kernel_fpu_end after).
You can't safely use kernel_fpu_begin inside a function compiled to allow it to use SSE / x87 registers; it might already have corrupted user-space FPU state before calling the function, after optimization. The symptom of getting this wrong is not a fault, it's corrupting user-space state, so don't assume that happens to work = correct. Also, depending on how GCC optimizes, the code-gen might be fine with your version, but might be broken with earlier or later GCC or clang versions. I somewhat expect that kernel_fpu_begin() at the top of this function would get called before the compiler did anything with FP instructions, but that doesn't mean it would be safe and correct.
See also Generate and optimize FP / SIMD code in the Linux Kernel on files which contains kernel_fpu_begin()?
Apparently -msse2 overrides -mgeneral-regs-only, so that's probably just an alias for -mno-mmx -mno-sse and whatever options disables x87. So you might be able to use __attribute__((target("sse2"))) on a function without changing build options for it, but that would be x86-specific. Of course, so is -mgeneral-regs-only. And there isn't a -mno-general-regs-only option to override the kernel's normal CFLAGS.
I don't have a specific suggestion for the best way to set up a build option if you really do think it's worth using kernel_fpu_begin at all, here (rather than using fixed-point the whole way through).
Obviously if you do save/restore the FPU state, you might as well use it for the loop instead of using FP to convert to fixed-point and back.

HLS: How to separate AXI4 signals

I am trying to write a module that uses the AXI4 streaming protocol to communicate with the previous and next modules. The modules use the following communication signals:
TDATA, which is 16 bits,
TKEEP, which is 2 bits,
TUSER, which is 1 bit,
TVALID, which is 1 bit,
TREADY, which is 1 bit and goes towards the previous module, and
TLAST, which is 1 bit.
These all need to be separate signals. I tried to implement it using the following code:
#include "core.h"
void core_module(hls::stream<ap_axis_str> &input_stream, hls::stream<ap_axis_str> &output_stream){
#pragma HLS INTERFACE axis port=input_stream
#pragma HLS INTERFACE axis port=output_stream
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
ap_axis_str strm_val_in;
ap_axis_str strm_val_out;
for (int i = 0; i<NDATA; i++){
strm_val_in = input_stream.read();
strm_val_out.data = strm_val_in.data * 2;
strm_val_out.keep = 3;
strm_val_out.valid = 1;
strm_val_in.ready = 1;
strm_val_out.user = ((i%2)==0);
strm_val_out.last = (i == NDATA-1) ? 1:0;
output_stream.write(strm_val_out);
}
}
where the header file is
#ifndef core_h
#define core_h
#include <ap_int.h>
#include <ap_axi_sdata.h>
#include <hls_stream.h>
typedef ap_uint<16> word;
#define NDATA 10
struct ap_axis_str {
word data;
ap_uint<2> keep;
bool user;
bool last;
bool ready;
bool valid;
};
void core_module(hls::stream<ap_axis_str> &input_stream, hls::stream<ap_axis_str> &output_stream);
#endif
The problem is that this doesn't separate the signals. When I synthesise it and run it in the co-simulation (giving it values from 0 to 9), even if the result is what I expect it to be, the waveform produced looks like this:
We can see that TREADY, TVALID, and TDATA are there, but not the other 3. Furthermore, looking at the contents of TDATA (which for some reason are 64 bits) we notice that they contain all the signals. They are the following:
0001000001030000,
0001000000030002,
0001000001030004,
0001000000030006,
...
000100000003000c, (they are in base 16)
0001000001030010,
0001000100030012.
From which we can see that the 3 in position 12 is probably what was intended to be TKEEP, the 1 in position 8 which only appears in the last case is probably what was intended to be TUSER, the last 4 digits are what was supposed to be TDATA, etc. Additionally, the program drops TREADY when it isn't ready to receive data, which is what is intended of TREADY, but I didn't program it to work this way, which means that it's automatically generated and probably has nothing to do with the TREADY I told it to have.
So my question is: How do I make a module that gives out the correct 6 separate signals for the version of the AXI4 protocol that we are using?
Well, according to the Xilinx Documentation,
If you specify an hls::stream object with a data type other than ap_axis or ap_axiu, the tool will infer an AXI4-Stream interface without the TLAST signal, or any of the side-channel signals. This implementation of the AXI4-Stream interface consumes fewer device resources, but offers no visibility into when the stream is ending.
Now I had already imported the needed module with#include <ap_axi_sdata.h>, all I needed to do was actually use it by removing
struct ap_axis_str {
word data;
ap_uint<2> keep;
bool user;
bool last;
bool ready;
bool valid;
};
and replacing it with
typedef ap_axiu<16, 1, 0, 0> ap_axis_str;
Additionally, I needed to remove my manual attempt to control TREADY and TVALID, as those are done automatically.

Use field of union in intialiser of another variable

I am building records for unit testing a software module. Record data is serialised before sending it to the UUT.
The records contain bitfields, so I would like to build serialised records using these same bitfields at compile-time (to prevent having to account for little- and big-endian issues and where the bits in bitfield go) and use a union to access the (serialised) data. I have to calculate a checksum over the record, so I need the bitfields as bytes to do so.
My attempt so far is:
/* defines for 64 bit valid record */
#define REC3_ID EEID_ARRAY_FIRST
#define REC3_SIZE 1
#define REC3_INDEX 248
#define REC3_SI0 MAKE_SIZE_INDEX0(REC3_SIZE,REC3_INDEX)
#define REC3_SI1 MAKE_SIZE_INDEX1(REC3_SIZE,REC3_INDEX)
#define REC3_VALUE0 0xf2
#define REC3_VALUE1 0x4f
#define REC3_VALUE2 0xb8
#define REC3_VALUE3 0xa0
#define REC3_DATA \
MAKE_CHKSUM7(REC3_ID,REC3_SI0,REC3_SI1,REC3_VALUE0,REC3_VALUE1,REC3_VALUE2,REC3_VALUE3),\
REC3_ID,REC3_SI0,REC3_SI1,REC3_VALUE0,REC3_VALUE1,REC3_VALUE2,REC3_VALUE3
#define CHKSUM_SEED (0x2a)
#define MAKE_CHKSUM7(v0,v1,v2,v3,v4,v5,v6) (0x100-(((v0)+(v1)+(v2)+(v3)+(v4)+(v5)+(v6)+CHKSUM_SEED)%0x100))
typedef union
{
uint8_t si[2];
struct
{
uint16_t s: 6;
uint16_t i: 10;
} b;
} si_t;
MAKE_SIZE_INDEX0(size,index) ((si_t){.b.s=size,.b.i=index}).si[0]
MAKE_SIZE_INDEX1(size,index) ((si_t){.b.s=size,.b.i=index}).si[1]
static uint8_t rec3[] = {REC3_DATA};
The problem lies with the macros MAKE_SIZE_INDEX0 and MAKE_SIZE_INDEX1. I can't get them to compile (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11))
The problem can be simplified to:
uint8_t rec[] = {0x12, (si_t){.b.s=4,.b.i=8}.si[0], (si_t){.b.s=4,.b.i=8}.si[1], 0x34};
But that results in error:
error: initializer element is not constant
I know I can create my records at run-time, but I wondered whether it is possible to let the preprocessor handle it.
My alternative is something like:
#if defined (TGT_ARCHITECTURE_x86_64)
#define MAKE_SIZE_INDEX0(size,index) (((size)&0x3f)+(((index)<<6)&0xc0))
#define MAKE_SIZE_INDEX1(size,index) ((index>>2)&0xff)
#endif
But this depends on whether the target is little-endian or big-endian and how it stores bitfields.
static uint8_t rec3[] = { (si_t){.b.s=4,.b.i=8}.si[0] };
Variables with static storage duration must be initialized only using a static initializer - it has to be a constant expression. There is a list of what is allowed in a constant expression - using array subscript operator on a array embedded in a compound literal is not allowed in a constant expression. It's basically the same as you can't do static int a[] = {1, 2}; static int b = a[1];
On a side note, the standard says that implementations are allowed to accept custom forms of constant expression. So the code may happen to work with a different compiler and even a different gcc version (as with newest gcc versions you may initialize variable with static storage duration with const qualived variable, which is an extension).
The compiler errors with "initializer element is not constant", as the element used to initialize variable with static storage duration is not a constant expression.
Using bit-fields to extract a bit-mask of a variable is compiler dependent, compiler options dependent (gcc storage layout) and shouldn't be used in portable code. Compiler is free to reorder the bitfields in your struct and is free to add padding between bit fields members. As advertised on stackoverflow many, many times, use bitmasks - they work every time.

duplicate symbol of a function defined in a header file

Suppose I have a header file file_ops.hpp that looks something like this
#pragma once
bool systemIsLittleEndian() {
uint16_t x = 0x0011;
uint8_t *half_x = (uint8_t *) &x;
if (*half_x == 0x11)
return true;
else
return false;
}
I initially thought it had something to do with the implementation, but as it turns out, I'll get duplicate symbols with just
#pragma once
bool systemIsLittleEndian() { return true; }
If I make it inline, the linker errors go away. That's not something I want to rely on, since inline is a request not a guarantee.
What causes this behavior? I'm not dealing with a scenario where I'm returning some kind of singleton.
There are other methods that are marked as
bool MY_LIB_EXPORT someFunc();// implemented in `file_ops.cpp`
are these related somehow (mixed exported functions and "plain old functions")? Clearly I can just move the implementation to file_ops.cpp, I'm rather intrigued as to why this happens.
If I make it inline, the linker errors go away. That's not something I want to rely on, since inline is a request not a guarantee.
It's OK to inline the function.
Even if the object code is not inlined, the language guarantees that is will not cause linker errors or undefined behavior as long as the function is somehow not altered in different translation units.
If you #include the .hpp in hundreds of .cpp files, you may notice a bit of code bloat but the program is still correct.
What causes this behavior? I'm not dealing with a scenario where I'm returning some kind of singleton.
The #include mechanism is a convenience for reducing the amount of code you have to manually create in multiple files with the exact content. In the end, all translation units that #include other files get the lines of code from the files they #include.
If you #include file_ops.hpp in, let's say, file1.cpp and file2.cpp, it's as if you have:
file1.cpp:
bool systemIsLittleEndian() {
uint16_t x = 0x0011;
uint8_t *half_x = (uint8_t *) &x;
if (*half_x == 0x11)
return true;
else
return false;
}
file2.cpp:
bool systemIsLittleEndian() {
uint16_t x = 0x0011;
uint8_t *half_x = (uint8_t *) &x;
if (*half_x == 0x11)
return true;
else
return false;
}
When you compile those two .cpp files and link them together to create an executable, the linker notices that there are two definitions of the function named systemIsLittleEndian. That's the source of the linker error.
One solution without using inline
One solution to your problem, without using inline, is:
Declare the function in the .hpp file.
Define it in the appropriate .cpp file..
file_ops.hpp:
bool systemIsLittleEndian(); // Just the declaration.
file_ops.cpp:
#include "file_ops.hpp"
// The definition.
bool systemIsLittleEndian() {
uint16_t x = 0x0011;
uint8_t *half_x = (uint8_t *) &x;
if (*half_x == 0x11)
return true;
else
return false;
}
Update
Regarding
bool MY_LIB_EXPORT someFunc();// implemented in `file_ops.cpp`
There is lots of information on the web regarding. This is a Microsoft/Windows issue. Here are couple of starting points to learn about it.
Exporting from a DLL Using __declspec(dllexport)
Importing into an Application Using __declspec(dllimport)

Does MSP430 GCC support newer C++ standards? (like 11, 14, 17)

I'm writing some code that would greatly benefit from the concise syntax of lambdas, which were introduced with C++ 11. Is this supported by the compiler?
How do I specify the compiler flags when compiling using Energia or embedXcode?
As of February 2018, up to C++14 is supported with some limitations:
http://processors.wiki.ti.com/index.php/C%2B%2B_Support_in_TI_Compilers
There isn't much about this topic on the TI site, or, at least, I don't know enough C++ to give you a detailed and precise response.
The implementation of the embedded ABI is described in this document that is mainly a derivation of the Itanium C++ ABI. It explains nothing about the implementation of lambdas nor the auto, keyword (or probably I'm not able to derive this information from the documentation).
Thus I decided to directly test in Energia. Apparently the g++ version is 4.6.3, thus it should support both.
And in fact (from a compilation point of view, I don't have my MSP here to test the code) it can compile something like:
// In template.hpp
#ifndef TEMPLATE_HPP_
#define TEMPLATE_HPP_
template<class T>
T func(T a) {
auto c = [&](int n) { return n + a; };
return c(0);
}
#endif /* TEMPLATE_HPP_ */
// in the sketch main
#include "template.hpp"
void setup() { int b = func<int>(0); }
void loop() { }
(the template works only if in an header, in the main sketch raises an error). To compile this sketch I had to modify one internal file of the editor. The maximum supported standard seems to be -std=c++0x, and the compilation flags are in the file:
$ENERGIA_ROOT/hardware/energia/msp430/platform.txt
in my setup the root is in /opt/energia. Inside that file I modified line 32 (compiler.cpp.flags) and added the option. Notice that -std=c++11 is not supported (raises an error).
compiler.cpp.flags=-std=c++0x -c -g -O2 {compiler.mlarge_flag} {compiler.warning_flags} -fno-exceptions -ffunction-sections -fdata-sections -fno-threadsafe-statics -MMD
Unfortunately I have zero experience with embedXcode :\
Mimic std::function
std::function is not provided, thus you have to write some sort of class that mimics it. Something like:
// callback.hpp
#ifndef CALLBACK_HPP_
#define CALLBACK_HPP_
template <class RET, class ARG>
class Callback {
RET (*_f)(ARG);
public:
Callback() : _f(0) { };
Callback(RET (*f)(ARG)) : _f(f) { };
bool is_set() const { return (_f) ? true : false; }
RET operator()(ARG a) const { return is_set() ? _f(a) : 0; }
};
#endif /* CALLBACK_HPP_ */
// sketch
#include "callback.hpp"
// | !! empty capture!
void setup() { // V
auto clb = Callback<int, char>([](char c) { return (int)c; });
if (clb.is_set())
auto b = clb('a');
}
void loop() {}
may do the work, and it uses a simple trick:
The closure type for a lambda-expression with no lambda-capture has a public non-virtual non-explicit const conversion function to pointer to function having the same parameter and return types as the closure type’s function call operator. [C++11 standard 5.1.2]
As soon as you leave the capture empty, you are assured to have a "conversion" to a function pointer, thus you can store it without issues. The code I have written:
requires a first template RET that is the returned type
requires a second template ARG that is one argument for the callback. In the majority of the case you may consider to use void* as common argument (cast a struct pointer in a void pointer and use it as argument, to counter-cast in the function, the operation costs nothing)
implements two constructors: the empty constructor initialize the function pointer to NULL, while the second directly assigns the callback. Notice that the copy constructor is missing, you need to implement it.
implements a method to call the function (overloading the operator ()) and to check if the callback actually exists.
Again: this stuff compiles with no warnings, but I don't know if it works on the MSP430, since I cannot test it (it works on a common amd64 linux system).

Resources