Strict aliasing rule in C++11 - c++11

I use the following C structs in my C++11 code (the code comes from liblwgeom of PostGis, but this is not the core of the question). The code is compiled with the following options using g++-4.8:
-std=c++11 -Wall -Wextra -pedantic-errors -pedantic -Werror
and I don't get any errors during compilation (or warnings) (should I get any?)
Question
Is it safe to use an LWPOLY (actually pointed to by an LWGEOM*) in functions that accept an LWGEOM and don't modify the void *data; member? I understand that this is poor man's inheritance, but this is what I need to work with.
Details
POLYGON:
typedef struct
{
uint8_t type; /* POLYGONTYPE */
uint8_t flags;
GBOX *bbox;
int32_t srid;
int nrings; /* how many rings we are currently storing */
int maxrings; /* how many rings we have space for in **rings */
POINTARRAY **rings; /* list of rings (list of points) */
}
LWPOLY; /* "light-weight polygon" */
LWGEOM:
typedef struct
{
uint8_t type;
uint8_t flags;
GBOX *bbox;
int32_t srid;
void *data;
}
LWGEOM;
POINTARRAY:
typedef struct
{
/* Array of POINT 2D, 3D or 4D, possibly missaligned. */
uint8_t *serialized_pointlist;
/* Use FLAGS_* macros to handle */
uint8_t flags;
int npoints; /* how many points we are currently storing */
int maxpoints; /* how many points we have space for in serialized_pointlist */
}
POINTARRAY;
GBOX:
typedef struct
{
uint8_t flags;
double xmin;
double xmax;
double ymin;
double ymax;
double zmin;
double zmax;
double mmin;
double mmax;
} GBOX;
Am I violating the strict aliasing rule when I do something like this?
const LWGEOM* lwgeom;
...
const LWPOLY* lwpoly = reinterpret_cast<const LWPOLY*>(lwgeom);
I know that in PostGis types are specifically designed to be "compatible" however I'd like to know if I am violating the standard by doing so.
Also, I noticed that PostGis is not compiled with strict aliasing disabled by default (at least version 2.1.5).
Solution
My colleague helped me investigate this, and it seems the answer is no, it doesn't violate strict aliasing, but only as long as we access the LWGEOM members that have the same types as the corresponding LWPOLY members and are laid out contiguously at the beginning of the struct. Here is why (quoting the standard):
3.10.10 says that you can access a member through a pointer to "aggregate or union".
8.5.1 defines aggregates (C structs are aggregates):
An aggregate is an array or a class (Clause 9) with no user-provided constructors (12.1), no private or
protected non-static data members (Clause 11), no base classes (Clause 10), and no virtual functions (10.3).
9.2.19 says that a pointer to a struct is the same as a pointer to its first member for standard-layout classes (C structs are standard layout).
Whether this is a safe way to code is a different question.

Yes, it violates the strict aliasing rule. LWGEOM and LWPOLY are unrelated types, and so are int and void*. So, for example, a modification made through lwgeom->data may not be observed when reading lwpoly->nrings, and vice versa.
I validated this with GCC4.9. My code is as follows:
#include <cinttypes>
#include <iostream>
using namespace std;
typedef struct {
uint8_t type; /* POLYGONTYPE */
uint8_t flags;
int32_t srid;
int nrings; /* how many rings we are currently storing */
} LWPOLY; /* "light-weight polygon" */
typedef struct {
uint8_t type;
uint8_t flags;
int32_t srid;
void *data;
} LWGEOM;
void f(LWGEOM* pgeom, LWPOLY* ppoly) {
ppoly->nrings = 7;
pgeom->data = 0;
std::cout << ppoly->nrings << '\n';
}
int main() {
LWGEOM geom = {};
LWGEOM* pgeom = &geom;
LWPOLY* ppoly = (LWPOLY*)pgeom;
f(pgeom, ppoly);
}
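Guess what: the output is 7 (presumably with optimization enabled), even though pgeom->data and ppoly->nrings occupy overlapping storage. The compiler is allowed to assume that the two stores do not alias, so it prints the 7 it has just written instead of reloading the value.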
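One way to read the shared leading fields without dereferencing a cast pointer is to copy their bytes into an object of the expected type with std::memcpy. The following is only a sketch based on the trimmed-down LWPOLY from the test above. GeomHeader and read_header are hypothetical names, not part of liblwgeom, and the sketch assumes that both structs lay out their common leading members identically, which should hold for these standard-layout types on mainstream ABIs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Trimmed-down polygon struct, as in the test program above.
typedef struct {
    std::uint8_t type;
    std::uint8_t flags;
    std::int32_t srid;
    int nrings;
} LWPOLY;

// Hypothetical helper type: only the members that LWGEOM and LWPOLY share.
struct GeomHeader {
    std::uint8_t type;
    std::uint8_t flags;
    std::int32_t srid;
};

// Copies the leading bytes of a polygon into a GeomHeader.
// No LWPOLY object is ever accessed through an lvalue of another struct type.
inline GeomHeader read_header(const LWPOLY* poly) {
    static_assert(sizeof(GeomHeader) <= sizeof(LWPOLY),
                  "the header must fit inside LWPOLY");
    assert(poly != nullptr);
    GeomHeader header;
    std::memcpy(&header, poly, sizeof header);
    return header;
}
```

A caller would then write something like read_header(&poly).srid. Compilers typically turn such a small fixed-size memcpy into plain loads, so the copy usually costs nothing at run time.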

Related

how to fix wrong GCC ARM startup code pointer to initialized and zero variables?

[skip to UPDATE2 and save some time :-)]
I use ARM Cortex-M4, with CMSIS 5-5.7.0 and FreeRTOS, compiling using GCC for ARM (10_2021.10)
My variables are not initialized as they should.
My startup code is pretty simple, the entry point is the reset handler (CMSIS declared startup_ARMCM4.s as deprecated and recommend using the C code startup code so this is what I do).
Here is my code:
__attribute__((__noreturn__)) void Reset_Handler(void)
{
DataInit();
SystemInit(); /* CMSIS System Initialization */
main();
}
static void DataInit(void)
{
typedef struct {
uint32_t const* src;
uint32_t* dest;
uint32_t wlen;
} __copy_table_t;
typedef struct {
uint32_t* dest;
uint32_t wlen;
} __zero_table_t;
extern const __copy_table_t __copy_table_start__;
extern const __copy_table_t __copy_table_end__;
extern const __zero_table_t __zero_table_start__;
extern const __zero_table_t __zero_table_end__;
for (__copy_table_t const* pTable = &__copy_table_start__; pTable < &__copy_table_end__; ++pTable) {
for(uint32_t i=0u; i<pTable->wlen; ++i) {
pTable->dest[i] = pTable->src[i];
}
}
for (__zero_table_t const* pTable = &__zero_table_start__; pTable < &__zero_table_end__; ++pTable) {
for(uint32_t i=0u; i<pTable->wlen; ++i) {
pTable->dest[i] = 0u;
}
}
}
__copy_table_start__, __copy_table_end__ etc. have the wrong values, and so no data is copied to the appropriate place in RAM.
I tried adding __libc_init_array() before DataInit(), as suggested in this answer, and removing the nostartfiles flag from the linker, but at some point __libc_init_array() jumps to an illegal address and I get a HardFault interrupt.
Is there a different way to fix this? Maybe one where I can keep the nostartfiles flag?
UPDATE:
Looking at the memory, where __copy_table_start__ is located, I see the data there is valid (even without the use of __libc_init_array()). It seems that pTable doesn't get the correct value.
I tried using __data_start__, __data_end__, __bss_start__, __bss_end__ and __etext instead of the above variables. The linker file says they can be used in code without a definition, but they cannot (maybe that's a clue?). In any case, they didn't work either.
UPDATE2:
Found the actual problem: all struct members get the same value (modifying one changes all the others), and it happens with every struct. I have no idea how this is possible. In other words, __copy_table_start__.src is, for example, 0x14651234, __copy_table_start__.dest is 0x00100000, and __copy_table_start__.wlen is 0x0365, but when looking at pTable all members are 0x14651234.
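To check the table-walking logic separately from the linker script, the same two loops can be run on a host machine against ordinary arrays. This is only a sketch: CopyEntry, ZeroEntry and run_data_init are hypothetical names, and the begin/end pointers stand in for the linker-provided __copy_table_start__ / __copy_table_end__ and __zero_table_start__ / __zero_table_end__ symbols.

```cpp
#include <cassert>
#include <cstdint>

// Same member order as __copy_table_t in the question.
struct CopyEntry {
    const std::uint32_t* src;
    std::uint32_t* dest;
    std::uint32_t wlen;  // length in 32-bit words
};

// Same member order as __zero_table_t in the question.
struct ZeroEntry {
    std::uint32_t* dest;
    std::uint32_t wlen;  // length in 32-bit words
};

// Walks both tables exactly like DataInit(), but takes the table bounds as parameters.
inline void run_data_init(const CopyEntry* copy_begin, const CopyEntry* copy_end,
                          const ZeroEntry* zero_begin, const ZeroEntry* zero_end) {
    assert(copy_begin <= copy_end && zero_begin <= zero_end);
    for (const CopyEntry* t = copy_begin; t < copy_end; ++t) {
        for (std::uint32_t i = 0u; i < t->wlen; ++i) {
            t->dest[i] = t->src[i];
        }
    }
    for (const ZeroEntry* t = zero_begin; t < zero_end; ++t) {
        for (std::uint32_t i = 0u; i < t->wlen; ++i) {
            t->dest[i] = 0u;
        }
    }
}
```

If this behaves correctly on the host while the target still reads identical values for every member, the loops themselves are unlikely to be at fault, and the compiler flags or the debugger's view of the structs are worth a closer look.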

How to avoid C++ code bloat issued by template instantiation and symbol table?

I started a bare-metal (Cortex-M) project some years ago. At project setup we decided to use the gcc toolchain with C++11 / C++14 etc. enabled, and even to use C++ exceptions and RTTI.
We are currently using gcc 4.9 from launchpad.net/gcc-arm-embedded (an issue currently prevents us from updating to a more recent gcc version).
For example, I wrote a base class and a derived class like this (see also running example here):
class OutStream {
public:
explicit OutStream() {}
virtual ~OutStream() {}
OutStream& operator << (const char* s) {
write(s, strlen(s));
return *this;
}
virtual void write(const void* buffer, size_t size) = 0;
};
class FixedMemoryStream: public OutStream {
public:
explicit FixedMemoryStream(void* memBuffer, size_t memBufferSize): memBuffer(memBuffer), memBufferSize(memBufferSize) {}
virtual ~FixedMemoryStream() {}
const void* getBuffer() const { return memBuffer; }
size_t getBufferSize() const { return memBufferSize; }
const char* getText() const { return reinterpret_cast<const char*>(memBuffer); } ///< returns content as zero terminated C-string
size_t getSize() const { return index; } ///< number of bytes really written to the buffer (max = buffersize-1)
bool isOverflow() const { return overflow; }
virtual void write(const void* buffer, size_t size) override { /* ... */ }
private:
void* memBuffer = nullptr; ///< buffer
size_t memBufferSize = 0; ///< buffer size
size_t index = 0; ///< current write index
bool overflow = false; ///< flag if we are overflown
};
So that the customers of my class are now able to use e.g.:
char buffer[10];
FixedMemoryStream ms1(buffer, sizeof(buffer));
ms1 << "Hello World";
Now I wanted to make the class a bit more comfortable to use, so I introduced the following template:
template<size_t bufferSize> class FixedMemoryStreamWithBuffer: public FixedMemoryStream {
public:
explicit FixedMemoryStreamWithBuffer(): FixedMemoryStream(buffer, bufferSize) {}
private:
uint8_t buffer[bufferSize];
};
And from now, my customers can write:
FixedMemoryStreamWithBuffer<10> ms2;
ms2 << "Hello World";
But since then I have observed the size of my executable binary increasing. It seems that gcc adds symbol information for each distinct template instantiation of FixedMemoryStreamWithBuffer (because we are using RTTI for some reason).
Might there be a way to get rid of symbol information only for some specific classes / templates / template instantiations?
It's ok to get a non portable gcc only solution for this.
For some reason we decided to prefer templates over preprocessor macros, so I want to avoid a preprocessor solution.
First of all, keep in mind that the compiler also generates a separate vtable (as well as RTTI information) for every FixedMemoryStreamWithBuffer<> instantiation, as well as for every class in the inheritance chain.
To resolve the problem, I'd recommend using containment instead of inheritance, with a conversion function and/or operator inside:
template<size_t bufferSize>
class FixedMemoryStreamWithBuffer
{
uint8_t m_buffer[bufferSize]; // declared before m_stream, which is constructed from it
FixedMemoryStream m_stream;
public:
explicit FixedMemoryStreamWithBuffer() : m_stream(m_buffer, bufferSize) {}
operator FixedMemoryStream&() { return m_stream; }
FixedMemoryStream& toStream() { return m_stream; }
};
Yes, there's a way to bring the necessary symbols down to almost zero: use the standard library. Your OutStream class is a simplified version of std::basic_ostream, your OutStream::write is really just std::basic_ostream::write, and so on. Take a look at it here. It handles overflow in much the same way; for completeness' sake it also deals with underflow, i.e. the need for data retrieval, which you may leave undefined (it's virtual too).
Similarly, your FixedMemoryStream is std::basic_streambuf<T> with a fixed-size (a std::array<T>) get/put area.
So, just make your classes inherit from the standard ones and you'll cut down on binary size, since you're reusing already declared symbols.
Now, regarding template<size_t bufferSize> class FixedMemoryStreamWithBuffer. This class is very similar to std::array<std::uint8_t, bufferSize> in the way memory is specified and acquired. You can't optimize much about that: each instantiation is a different type, with all that that implies. The compiler cannot "merge" them or do anything magic about them: each instantiation must have its own type.
So either fall back on std::vector, or provide a few fixed-size specialized chunks, like 32, 128 etc., and pick the right one for any value in between; this can be done entirely at compile time, so there is no runtime cost.
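The "fixed-size chunks" idea can be sketched with a constexpr function that rounds the requested size up to a bucket, so that many different requests share one instantiation. The bucket values and the names bucketSize, BucketBuffer and FixedBuffer are assumptions made for illustration; the answer only mentions "32, 128 etc.".

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Rounds a requested size up to a hypothetical bucket (C++11: single return statement).
constexpr std::size_t bucketSize(std::size_t requested) {
    return requested <= 32  ? 32
         : requested <= 128 ? 128
         : requested <= 512 ? 512
         : requested;  // larger requests keep their exact size
}

// One class (and one set of vtable/RTTI data, if it had virtual members) per bucket.
template <std::size_t bucket>
class BucketBuffer {
public:
    std::uint8_t* data() { return storage; }
    std::size_t size() const { return bucket; }
private:
    std::uint8_t storage[bucket];
};

// The user-facing name: the request is mapped to its bucket at compile time.
template <std::size_t requested>
using FixedBuffer = BucketBuffer<bucketSize(requested)>;

static_assert(std::is_same<FixedBuffer<10>, FixedBuffer<20>>::value,
              "both requests share one instantiation");
```

The trade-off is wasted RAM: FixedBuffer<33> occupies 128 bytes here, which may matter on a small Cortex-M part.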

Structure defined in a dynamically loaded library

I am dynamically loading the cudart (Cuda Run Time Library) to access just the cudaGetDeviceProperties function. This one requires two arguments:
A cudaDeviceProp structure which is defined in a header of the run time library;
An integer which represents the device ID.
I am not including the cuda_runtime.h header in order to not get extra constants, macros, enum, class... that I do not want to use.
However, I need the cudaDeviceProp structure. Is there a way to get it without redefining it? I wrote the following code:
struct cudaDeviceProp;
class CudaRTGPUInfoDL
{
typedef int(*CudaDriverVersion)(int*);
typedef int(*CudaRunTimeVersion)(int*);
typedef int(*CudaDeviceProperties)(cudaDeviceProp*,int);
public:
struct Properties
{
char name[256]; /**< ASCII string identifying device */
size_t totalGlobalMem; /**< Global memory available on device in bytes */
size_t sharedMemPerBlock; /**< Shared memory available per block in bytes */
int regsPerBlock; /**< 32-bit registers available per block */
int warpSize; /**< Warp size in threads */
size_t memPitch; /**< Maximum pitch in bytes allowed by memory copies */
/*... Tons of members follow..*/
};
public:
CudaRTGPUInfoDL();
~CudaRTGPUInfoDL();
int getCudaDriverVersion();
int getCudaRunTimeVersion();
const Properties& getCudaDeviceProperties();
private:
QLibrary library;
private:
CudaDriverVersion cuDriverVer;
CudaRunTimeVersion cuRTVer;
CudaDeviceProperties cuDeviceProp;
Properties properties;
};
As everybody can see, I simply "copy-pasted" the declaration of the structure.
In order to get the GPU properties, I simply use this method:
const CudaRTGPUInfoDL::Properties& CudaRTGPUInfoDL::getCudaDeviceProperties()
{
// Unsafe but needed.
cuDeviceProp(reinterpret_cast<cudaDeviceProp*>(&properties), 0);
return properties;
}
Thanks for your answers.
If you need the structure to be complete, you should define it (probably by including the appropriate header).
If you're just going to be passing around references or pointers, such as in the method you show, then it doesn't need to be complete and can just be forward declared:
struct cudaDeviceProp;
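As a small illustration of what a forward declaration does and does not allow, the sketch below passes a pointer to an incomplete type through a function pointer. DeviceProp, GetPropsFn and query are hypothetical stand-ins for cudaDeviceProp, the resolved cudaGetDeviceProperties symbol, and the wrapper method; nothing here calls the real CUDA runtime.

```cpp
#include <cassert>

struct DeviceProp;  // declared but never defined in this translation unit

// A function-pointer type may mention the incomplete type through a pointer.
typedef int (*GetPropsFn)(DeviceProp*, int);

// Forwarding the pointer does not require the definition either.
inline int query(GetPropsFn fn, DeviceProp* out, int device) {
    assert(device >= 0);
    if (fn == nullptr) {
        return -1;  // the symbol was not resolved
    }
    return fn(out, device);
}
```

What the forward declaration does not allow is sizeof(DeviceProp), creating an object of that type, or reading a member such as name. Code that needs to do any of those still needs the complete definition.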

exposing a function with 2D slice as a parameter in a c-shared library (to be used in Java via JNA and C)

I am trying to write a simple matrix operations API using go and expose the APIs as a shared library. This shared library will be used from Java(using JNA) and from C.
The documentation is very sparse about using any data type beyond simple int or string as function parameters.
My requirement is to expose functions with 1 or more 2D slices as parameters AND also as return types. I am not able to figure out if such a thing is supported.
Is this possible? Are there any examples for this?
I think the key point is to have a look at the C bindings for slice, string and int generated by the go build tool. I have not tried a 2D slice, but it should be no different from a 1D slice with an unsafe pointer conversion; it may just need one more allocation and conversion.
I'm not sure it's the best way, but here's an example for a 1D slice:
The Go part:
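package main
/*
#include <stdlib.h>
#include <string.h>
*/
import "C"
import (
"reflect"
"unsafe"
)
//export CFoo
func CFoo(content []byte) string {
var ret []byte
// blablabla to get ret (ret must be non-empty, because &ret[0] is taken below)
// Copy the result into C memory so that the caller can free() it.
cbuf := C.malloc(C.size_t(len(ret)))
C.memcpy(cbuf, unsafe.Pointer(&ret[0]), C.size_t(len(ret)))
var finalString string
hdr := (*reflect.StringHeader)(unsafe.Pointer(&finalString))
hdr.Data = uintptr(cbuf)
hdr.Len = len(ret)
return finalString
}
func main() {} // required by -buildmode=c-shared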
Compile with -buildmode=c-shared to get libmygo.so.
I don't know JNA; I expect it to be similar to JNI. Here is the JNI part, which also serves as the pure C part:
#include <stdio.h>
#include <stdlib.h> /* free */
#include <string.h>
#include <jni.h>
typedef signed char GoInt8;
typedef unsigned char GoUint8;
typedef short GoInt16;
typedef unsigned short GoUint16;
typedef int GoInt32;
typedef unsigned int GoUint32;
typedef long long GoInt64;
typedef unsigned long long GoUint64;
typedef GoInt32 GoInt;
typedef GoUint32 GoUint;
typedef __SIZE_TYPE__ GoUintptr;
typedef float GoFloat32;
typedef double GoFloat64;
typedef float _Complex GoComplex64;
typedef double _Complex GoComplex128;
typedef struct { const char *p; GoInt n; } GoString;
typedef void *GoMap;
typedef void *GoChan;
typedef struct { void *t; void *v; } GoInterface;
typedef struct { void *data; GoInt len; GoInt cap; } GoSlice;
extern GoString CFoo(GoSlice content); /* exported by libmygo.so */
JNIEXPORT jbyteArray JNICALL Java_com_mynextev_infotainment_app_myev_Native_foo(JNIEnv* env, jobject obj, jbyteArray content){
JNIEnv ienv = *env;
jbyte * Ccontent = ienv->GetByteArrayElements(env, content, 0);
int Lcontent = ienv->GetArrayLength(env, content);
GoSlice Gcontent = {Ccontent, Lcontent, Lcontent};
GoString gret = CFoo(Gcontent); /* call into the Go library */
/* The input is no longer needed; JNI_ABORT skips copying it back. */
ienv->ReleaseByteArrayElements(env, content, Ccontent, JNI_ABORT);
if(!gret.n){
printf("jni CDoAESEnc");
return NULL;
}
jbyteArray ret = ienv->NewByteArray(env, gret.n);
ienv->SetByteArrayRegion(env, ret, 0, gret.n, (const jbyte*)gret.p);
free((void*)gret.p); /* allocated with C.malloc on the Go side */
return ret;
}
Build it against libmygo.so.
In the end you get two .so files: one for C, which can be used standalone, and one for Java, which must be used together with libmygo.so.
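The answer above only covers the 1D case. A common way to carry a matrix across a C boundary, and an assumption here rather than something the answer states, is to flatten it into one contiguous row-major buffer and pass the dimensions alongside. The sketch below shows that layout in C++ rather than Go; FlatMatrix, flatten and at are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous buffer plus its dimensions: the shape a C caller can consume.
struct FlatMatrix {
    std::vector<double> values;  // row-major: element (r, c) lives at r * cols + c
    std::size_t rows;
    std::size_t cols;
};

// Flattens a rectangular matrix; ragged input is not supported in this sketch.
inline FlatMatrix flatten(const std::vector<std::vector<double>>& m) {
    FlatMatrix out;
    out.rows = m.size();
    out.cols = m.empty() ? 0 : m[0].size();
    out.values.reserve(out.rows * out.cols);
    for (const auto& row : m) {
        assert(row.size() == out.cols);
        out.values.insert(out.values.end(), row.begin(), row.end());
    }
    return out;
}

// Reads element (r, c) back out of the flat buffer.
inline double at(const FlatMatrix& m, std::size_t r, std::size_t c) {
    return m.values[r * m.cols + c];
}
```

On the Go side the equivalent would be a single []float64 plus row and column counts, which maps onto the GoSlice struct shown above without nested pointers.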

OpenCL & Xcode - Incorrect kernel header being generated for custom data type argument

I'm parallelising an LBM using OpenCL and have come across a problem with how the kernel header files are generated when a custom data type is an argument to the kernel. I define the data type within the kernel file (rebound.cl) as required (typedef struct {...} t_speed;), but the declaration of t_speed generated in the header file is syntactically incorrect (double [5] speeds;), and the build subsequently fails. Whilst this is more of an annoyance than a major problem, fixing it would save a lot of time!
Kernel file: rebound.cl
#ifdef cl_khr_fp64
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
#elif defined(cl_amd_fp64)
#pragma OPENCL EXTENSION cl_amd_fp64 : enable
#else
#error "Double precision floating point not supported by OpenCL implementation."
#endif
#define NSPEEDS 5
typedef struct {
double speeds[NSPEEDS];
} t_speed;
__kernel void rebound (__global t_speed* cells,
__global t_speed* tmp_cells,
__global const unsigned char* obstacles,
const unsigned short int count)
{
int i = get_global_id(0);
if (i < count) {
if (obstacles[i]) {
cells[i].speeds[1] = tmp_cells[i].speeds[3]; /* East -> West */
cells[i].speeds[3] = tmp_cells[i].speeds[1]; /* West -> East*/
cells[i].speeds[2] = tmp_cells[i].speeds[4]; /* North -> South */
cells[i].speeds[4] = tmp_cells[i].speeds[2]; /* South -> North */
}
}
}
Kernel header file: rebound.cl.h
/***** GCL Generated File *********************/
/* Automatically generated file, do not edit! */
/**********************************************/
#include <OpenCL/opencl.h>
typedef struct {
double [5] speeds;
} _t_speed_unalign;
typedef _t_speed_unalign __attribute__ ((aligned(8))) t_speed;
extern void (^rebound_kernel)(const cl_ndrange *ndrange, t_speed* cells, t_speed* tmp_cells, cl_uchar* obstacles, cl_ushort count);
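For comparison, a well-formed host-side declaration would put the array bound after the member name. The sketch below is a hand-written equivalent and does not fix the generator. The name t_speed_unalign (without the leading underscore) is a substitution made here, and the aligned attribute uses the GCC/Clang syntax.

```cpp
#include <cstddef>

#define NSPEEDS 5

// Same content as the generated struct, with the array bound in the correct position.
typedef struct {
    double speeds[NSPEEDS];
} t_speed_unalign;

// 8-byte alignment, as requested by the generated header.
typedef t_speed_unalign t_speed __attribute__((aligned(8)));

// Five contiguous doubles, no padding, so the layout matches the kernel-side struct.
static_assert(sizeof(t_speed) == NSPEEDS * sizeof(double), "no padding expected");
```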

Resources