Using <chrono> as a timer in bare-metal microcontroller? - c++11

Can chrono be used as a timer/counter in a bare-metal microcontroller (e.g. MSP432 running an RTOS)? Can the high_resolution_clock (and other APIs in chrono) be configured so that it increments based on the given microcontroller's actual timer tick/register?
The Real-Time C++ book (section 16.5) seems to suggest this is possible, but I haven't found any examples of this being applied, especially within bare-metal microcontrollers.
How could this be implemented? Would this even be recommended? If not, where can chrono help in RTOS-based embedded software?

I would create a clock that implements now by reading from your timer register:
#include <chrono>
#include <cstdint>
struct clock
{
    using rep        = std::int64_t;
    using period     = std::milli;
    using duration   = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<clock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept
    {
        rep ticks = 0; // replace with the asm/register read of your hardware timer
        return time_point{duration{ticks}};
    }
};
Adjust period to whatever speed your processor ticks at (but it does have to be a compile-time constant). Above I've set it for 1 tick/ms. Here is how it should read for 1 tick == 2ns:
using period = std::ratio<1, 500'000'000>;
Now you can say things like:
auto t = clock::now(); // a chrono::time_point
and
auto d = clock::now() - t; // a chrono::duration
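If you need the result in standard units, std::chrono::duration_cast converts between tick periods at compile-time-known ratios; a minimal sketch, assuming the clock above is hooked up to a real tick counter:
#include <chrono>

void measure()
{
    auto t0 = clock::now();
    // ... code being timed, e.g. toggling a GPIO ...
    auto elapsed = clock::now() - t0;   // clock::duration, in hardware ticks
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(elapsed);
    (void)us;
    if (elapsed > std::chrono::milliseconds{10})   // durations with different periods compare directly
    {
        // took longer than 10 ms
    }
}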

Related

How to optimize SYCL kernel

I'm studying SYCL at university and I have a question about the performance of some code.
In particular I have this C/C++ code:
And I need to translate it into a SYCL kernel with parallelization, so I do this:
#include <sycl/sycl.hpp>
#include <vector>
#include <iostream>

using namespace sycl;

constexpr int size = 131072; // 2^17

int main(int argc, char** argv) {
    // Create a vector with size elements and initialize them to 1
    std::vector<float> dA(size, 1.0f);
    try {
        queue gpuQueue{ gpu_selector{} };
        buffer<float, 1> bufA(dA.data(), range<1>(dA.size()));
        gpuQueue.submit([&](handler& cgh) {
            accessor inA{ bufA, cgh };
            cgh.parallel_for(range<1>(size),
                [=](id<1> i) { inA[i] = inA[i] + 2; }
            );
        });
        gpuQueue.wait_and_throw();
    }
    catch (std::exception& e) { throw e; }
    return 0;
}
So my question is about the value c: in this case I use the value 2 directly, but will this impact performance when I run the code? Do I need to create a variable, or is it correct this way with good performance?
Thanks in advance for the help!
Interesting question. In this case the value 2 will be a literal in the instruction in your SYCL kernel - this is as efficient as it gets, I think! There's the slight complication that you have an implicit cast from int to float. My guess is that you'll probably end up with a float literal 2.0 in your device assembly. Your SYCL device won't have to fetch that 2 from memory or cast at runtime or anything like that; it just lives in the instruction.
Equally, if you had:
constexpr int c = 2;
// the rest of your code
[=](id<1> i) { inA[i] = inA[i] + c; }
// etc
The compiler is almost certainly smart enough to propagate the constant value of c into the kernel code. So, again, the 2.0 literal ends up in the instruction.
I compiled your example with DPC++ and extracted the LLVM IR, and found the following lines:
%5 = load float, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17
%add.i = fadd float %5, 2.000000e+00
store float %add.i, float addrspace(4)* %arrayidx.ascast.i.i, align 4, !tbaa !17
This shows a float load and store to/from the same address, with an 'add 2.0' instruction in between. If I modify the code to use the variable c as demonstrated above, I get the same LLVM IR.
Conclusion: you've already achieved maximum efficiency, and compilers are smart!

Redefine some functions of gcc-arm-none-eabi's stdlibc

STM32 chips (and many others) have a hardware random number generator (RNG); it is faster and more reliable than the software RNG provided by libc. The compiler knows nothing about the hardware.
Is there a way to redefine the implementation of rand()?
There are other hardware modules too, e.g. the real-time clock (RTC), which can provide data for time().
You simply override them by defining functions with an identical signature. If they are defined WEAK in the standard library they will be overridden; otherwise they are overridden on a first-resolution basis, so as long as your implementation is passed to the linker before libc is searched, it will override. Moreover, .o / .obj files are used in symbol resolution before .a / .lib files, so if your implementation is included in your project source, it will always override.
You should be careful to get the semantics of your implementation correct. For example rand() returns a signed integer from 0 to RAND_MAX, which is likely not the same as the RNG hardware. Since RAND_MAX is a macro, changing it would require changing the standard header, so your implementation needs to enforce the existing RAND_MAX.
Example using STM32 Standard Peripheral Library:
#include <stdlib.h>
#include <stm32xxx.h> // Your processor header here

#if defined __cplusplus
extern "C"
{
#endif

static int rng_running = 0 ;

int rand( void )
{
    if( rng_running == 0 )
    {
        // Enable the RNG peripheral clock and start it on first use
        RCC_AHB2PeriphClockCmd( RCC_AHB2Periph_RNG, ENABLE ) ;
        RNG_Cmd( ENABLE ) ;
        rng_running = 1 ;
    }

    // Wait for a random word to become available
    while( RNG_GetFlagStatus( RNG_FLAG_DRDY ) == RESET ) { }

    // Assumes RAND_MAX is an "all ones" integer value (check)
    return (int)( RNG_GetRandomNumber() & (unsigned)RAND_MAX ) ;
}

void srand( unsigned seed )
{
    (void)seed ; // hardware RNG needs no seeding
}

#if defined __cplusplus
}
#endif
For time() similar considerations apply, and there is an example at Problem with time() function in embedded application with C.
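A sketch of what such an override might look like; rtc_read_unix_time() is a hypothetical call standing in for whatever your RTC driver provides, and it must return seconds since the Unix epoch to match the libc semantics:
#include <time.h>

#if defined __cplusplus
extern "C"
{
#endif

// Hypothetical RTC driver call returning seconds since 1970-01-01 UTC
extern unsigned long rtc_read_unix_time( void ) ;

time_t time( time_t * t )
{
    time_t now = (time_t)rtc_read_unix_time() ;

    if( t != NULL )
    {
        *t = now ;
    }

    return now ;
}

#if defined __cplusplus
}
#endif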

How do I instantiate a boost::date_time::time_duration with nanoseconds?

I feel like I'm missing something obvious here. I can easily make a time_duration with milliseconds, microseconds, or seconds by doing:
time_duration t1 = seconds(1);
time_duration t2 = milliseconds(1000);
time_duration t3 = microseconds(1000000);
But there's no function for nanoseconds. What's the trick to converting a plain integer nanoseconds value to a time_duration?
I'm on amd64 architecture on Debian Linux. Boost version 1.55.
boost::posix_time::microseconds is actually subsecond_duration<boost::posix_time::time_duration, 1000000>. So...
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

using nanoseconds = boost::date_time::subsecond_duration<boost::posix_time::time_duration, 1000000000>;

int main() {
    boost::posix_time::time_duration t = nanoseconds(1000000000);
    std::cout << t << "\n";
}
Prints
00:00:01
UPDATE
Indeed, in the Compile Options for the Boost DateTime library you can see that there's an option to select nanosecond resolution:
By default the posix_time system uses a single 64 bit integer
internally to provide a microsecond level resolution. As an
alternative, a combination of a 64 bit integer and a 32 bit integer
(96 bit resolution) can be used to provide nano-second level
resolutions. The default implementation may provide better performance
and more compact memory usage for many applications that do not
require nano-second resolutions.
To use the alternate resolution (96 bit nanosecond) the variable
BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG must be defined in the library
users project files (ie Makefile, Jamfile, etc). This macro is not
used by the Gregorian system and therefore has no effect when building
the library.
Indeed, you can check it using:
Live On Coliru
#define BOOST_DATE_TIME_POSIX_TIME_STD_CONFIG
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>

int main() {
    using namespace boost::posix_time;
    std::cout << nanoseconds(1000000000) << "\n";
}

How can one share depth images between two processes?

I have 4 different depth cameras available to me: Kinect, Xtion, PMD nano, Softkinetic DepthSense.
I have the libraries that know how to read all of them: OpenNI, PMD drivers, Softkinetic drivers.
I would ideally like to make a simple grabber for each kind of camera and then just use it as a plugin into any other program, i.e. get fast, non-redundant access (not too many memory copies) to the data stream.
One of the problems is that in many cases I don't have the right library in 32 or 64 bit, so I can't compile all grabbers in the same project.
What is the best way to achieve this?
I am a researcher, so this idea isn't necessarily useful for production code, but given this scenario my best solution has been to create a server process for each type of camera. Each server process knows how to load its own type of camera stream and then throws it into a shared memory space that other processes can read from.
It is obviously possible to use different kinds of locking mechanisms, but I have left the code below without any locks.
The server process will include the following:
#define BOOST_ALL_NO_LIB
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>
#include <cstdio>

using namespace std;
using namespace boost::interprocess;

struct sharedImage
{
    enum { width = 320 };
    enum { height = 240 };
    enum { dataLength = width*height*sizeof(unsigned short) }; // frame size in bytes
    sharedImage(){}
    interprocess_mutex mutex;
    unsigned short data[width*height]; // dataLength bytes of depth pixels
};

shared_memory_object shm;
sharedImage * sIm;
mapped_region region;

int setupSharedMemory(){
    // Clear the object if it exists
    shared_memory_object::remove("ImageMem");
    shm = shared_memory_object(create_only /*only create*/, "ImageMem" /*name*/, read_write /*read-write mode*/);
    printf("Size:%u\n", (unsigned)sizeof(sharedImage));
    // Set size
    shm.truncate(sizeof(sharedImage));
    // Map the whole shared memory in this process
    region = mapped_region(shm, read_write);
    // Get the address of the mapped region
    void * addr = region.get_address();
    // Construct the shared structure in the preallocated memory of shm
    sIm = new (addr) sharedImage;
    return 0;
}

int shutdownSharedMemory(){
    shared_memory_object::remove("ImageMem");
    return 0;
}
To start it up, call setupSharedMemory(), and to shut down, call shutdownSharedMemory().
All the values are hard-coded in this simple example, but it's easy to imagine making it more flexible.
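To tie it together, the server's main() can be as small as the following sketch; run_camera_event_loop() is a placeholder for whatever blocking run call your camera SDK provides, and the frame callback shown next does the actual copying:
// Placeholder, not a real API: hand control to the camera SDK, which invokes the frame callback
void run_camera_event_loop();

int main()
{
    setupSharedMemory();
    run_camera_event_loop(); // the registered callback writes each new frame into sIm->data
    shutdownSharedMemory();
    return 0;
}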
Now let's assume that you are using SoftKinetic's DepthSense. Then you could write the following callback for the Depth node.
void onNewDepthSample(DepthNode node, DepthNode::NewSampleReceivedData data) {
    //scoped_lock<interprocess_mutex> lock(sIm->mutex);
    memcpy(sIm->data, data.depthMap, sIm->dataLength);
}
What this does is simply copy the latest depth map into the shared memory space.
You could also add a timestamp, a lock and anything else you need, but this basic code works well enough for me, so I will leave it as it is.
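For completeness, if you do want the lock, enabling the commented-out scoped_lock is all it takes; a sketch of the locked version of the same callback (the reader in getFrame() further down would take the same lock around its memcpy):
void onNewDepthSample(DepthNode node, DepthNode::NewSampleReceivedData data) {
    // Hold the interprocess mutex while writing so a reader never sees a half-copied frame
    scoped_lock<interprocess_mutex> lock(sIm->mutex);
    memcpy(sIm->data, data.depthMap, sIm->dataLength);
}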
Now in some other process you can access the data in a very similar fashion.
The code below is what I use to get the live SoftKinetic DepthSense depth stream into Matlab for real time processing. This method has a huge advantage over trying to write my own mex wrapper specifically for SoftKinetic because I can use the same code for all the other cameras if I write servers for them.
#include <math.h>
#include <windows.h>
#include "mex.h"

#define BOOST_ALL_NO_LIB
#include <boost/interprocess/shared_memory_object.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <boost/interprocess/sync/scoped_lock.hpp>
#include <boost/interprocess/sync/interprocess_mutex.hpp>

#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <cstdint>

using namespace boost::interprocess;

// Must match the layout (mutex followed by the pixel data) used by the server process
struct sharedImage
{
    enum { width = 320 };
    enum { height = 240 };
    enum { dataLength = width*height*sizeof(short) };
    sharedImage(): dirty(true){}
    interprocess_mutex mutex;
    uint8_t data[dataLength];
    bool dirty;
};

void getFrame(unsigned short *D)
{
    // Open the shared memory object.
    shared_memory_object shm(open_only, "ImageMem", read_write);
    // Map the whole shared memory in this process
    mapped_region region(shm, read_write);
    // Get the address of the mapped region
    void * addr = region.get_address();
    // Interpret the shared structure already constructed there by the server
    sharedImage * sIm = static_cast<sharedImage*>(addr);
    //scoped_lock<interprocess_mutex> lock(sIm->mutex);
    memcpy((char*)D, (char*)sIm->data, sIm->dataLength);
}

void mexFunction(int nlhs, mxArray *plhs[ ], int nrhs, const mxArray *prhs[ ])
{
    // Build outputs
    mwSize dims[2] = {320, 240};
    plhs[0] = mxCreateNumericArray(2, dims, mxUINT16_CLASS, mxREAL);
    unsigned short *D = (unsigned short*)mxGetData(plhs[0]);
    try
    {
        getFrame(D);
    }
    catch (interprocess_exception &ex)
    {
        mexPrintf("getFrame:%s\n", ex.what());
    }
}
On my computer I compile this in Matlab with: mex getSKFrame.cpp -IC:\Development\boost_1_48_0
And then finally to use it in Matlab: D = getSKFrame()'; imagesc(D)

How to sleep() from kernel init?

I'm debugging some kernel init code with an oscilloscope by setting values on a GPIO. What is the best way to sleep() for a given time very early, i.e. in ddr3_init()?
Thank you
You could use a busy loop that stops after a given time interval. This should sleep for one second (I'm not sure if it works; I put it together by looking at the time.h header):
#include <linux/time.h>

struct timespec start_ts = current_kernel_time();
s64 start = timespec_to_ns(&start_ts);
s64 now;

do {
    struct timespec now_ts = current_kernel_time();
    now = timespec_to_ns(&now_ts);
} while (now - start < 1000000000LL);
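Wrapped into a helper, the same loop can be dropped into ddr3_init() with an arbitrary delay; a sketch along the same lines, with the same untested caveat (it also assumes the clock source is already usable that early):
#include <linux/time.h>

/* Busy-wait for roughly wait_ns nanoseconds */
static void busy_wait_ns(s64 wait_ns)
{
    struct timespec start_ts = current_kernel_time();
    s64 start = timespec_to_ns(&start_ts);
    s64 now;

    do {
        struct timespec now_ts = current_kernel_time();
        now = timespec_to_ns(&now_ts);
    } while (now - start < wait_ns);
}

/* e.g. in ddr3_init(): busy_wait_ns(1000000000); for about one second */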
