changing the alignment requirement while casting - gcc

I get the warning " cast increases required alignment of target type" while compiling the following code for ARM.
char data[2] = "aa";
int *ptr = (int *)(data);
I understand that the alignment requirement for char is 1 byte and that of int is 4 bytes and hence the warning.
I tried to change the alignment of char by using the aligned attribute.
char data[2] __attribute__((aligned (4)));
memcpy(data, "aa", 2);
int *ptr = (int *)(data);
But the warning doesn't go away.
My questions are
Why doesn't the warning go away?
As ARM generates hardware exception for misaligned accesses, I want to make sure that alignment issues don't occur. Is there any other way to write this code so that the alignment issue won't arise?
By the way, when I print alignof(data), it prints 4 which means the alignment of data is changed.
I'm using gcc version 4.4.1. Is it possible that the gcc would give the warning even if the aligned was changed using aligned attribute?

I don't quite understand why you would want to do this... but the problem is that the string literal "aa" isn't stored at an aligned address. The compiler likely optimized away the variable data entirely, and therefore only sees the code as int* ptr = (int*)"aa"; and then give the misalignment warning. No amount of fiddling with the data variable will change how the literal "aa" is aligned.
To avoid the literal being allocated on a misaligned address, you would have to tweak around with how string literals are stored in the compiler settings, which is probably not a very good idea.
Also note that it doesn't make sense to have a pointer to non-constant data pointing at a string literal.
So your code is nonsense. If you still for reasons unknown insist of having an int pointer to a string literal, I'd do some kind of work-around, for example like this:
typedef union
{
char arr[3];
int dummy;
} data_t;
const data_t my_literal = { .arr="aa" };
const int* strange_pointer = (const int*)&my_literal;

Related

Is ZeroMemory the windows equivalent of null terminating a buffer?

For example I by convention null terminate a buffer (set buffer equal to zero) the following way, example 1:
char buffer[1024] = {0};
And with the windows.h library we can call ZeroMemory, example 2:
char buffer[1024];
ZeroMemory(buffer, sizeof(buffer));
According to the documentation provided by microsoft: ZeroMemory Fills a block of memory with zeros. I want to be accurate in my windows application so I thought what better place to ask than stack overflow.
Are these two examples equivalent in logic?
Yes, the two codes are equivalent. The entire array is filled with zeros in both cases.
In the case of char buffer[1024] = {0};, you are explicitly setting only the first char element to 0, and then the compiler implicitly value-initializes the remaining 1023 char elements to 0 for you.
In C++11 and later, you can omit that first element value:
char buffer[1024] = {};
char buffer[1024]{};

array declaration vs pointer + new

I'm not quite sure why is it happening. I have a program:
std::ifstream file(path, std::ios::binary);
file.seekg(0, file.end);
int length = file.tellg();
file.seekg(0, file.beg);
char* buffer = new char[length];
file.read(buffer, length);
file.close();
It is running well, and reads data correctly. However, if I replace buffer's declaration with:
char buffer[length];
then I get a segmentation fault. The size of data is around a few megabytes. What is the difference?
The difference is that "a few megabytes" is too large to put on your process's "stack", where you are now putting the data.
(Furthermore, you are relying on a GCC extension; length, as a variable whose value is not known until runtime, cannot legally be used as the size of a "normal" array like this)
Put your code back the way it was, and don't forget to delete[] the buffer when you are done using it.
Actually, this would be better:
std::vector<char> buffer(length);
file.read(&buffer[0], length);

__packed qualifier ignored

Why I get a '__packed__' attribute ignored [-Wattributes] warning in an android NDK Project?
Here is the code
mem_ = malloc(size_);
uint8_t* ui8_ptr = reinterpret_cast<uint8_t*>(mem_);
*ui8_ptr++ = packet_version;
//uint32_t* ui32_ptr = reinterpret_cast<uint32_t*>(ui8_ptr);
__packed uint32_t* ui32_ptr = (__packed uint32_t*)(ui8_ptr);
*ui32_ptr++ = size_;
*ui32_ptr++ = flags;
I am using the packed attribute because I think I have alignment problem when casting from uint8_t to uint32_t (See [1]).
[1] http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
GCC doesn't seem to support the packed attribute for all variables. It is only supported for struct, union, and enum types. So you could try something like this instead:
mem_ = malloc(size_);
uint8_t* ui8_ptr = reinterpret_cast<uint8_t*>(mem_);
*ui8_ptr++ = packet_version;
struct unaligned32_t
{
uint32_t data __attribute__((packed, aligned(1)));
};
//unaligned32_t* ui32_ptr = reinterpret_cast<unaligned32_t*>(ui8_ptr);
unaligned32_t* ui32_ptr = (unaligned32_t*)(ui8_ptr);
(ui32_ptr++)->data = size_;
(ui32_ptr++)->data = flags;
This won't produce the warning unless you use it on a char type, which is already byte aligned anyway.
I'm still investigating if this produces the code I'm aiming for on my ARM micro-controller but it is the only legal way I can think of using the packed attribute.
Here be dragons! Don't take the address of unaligned32_t.data. You should only access the data member of the struct using the . or -> directly, and not via a pointer. See this answer for why.
Link you referenced is for RVDS compiler (probably armcc) and GCC doesn't support packed attribute on pointers for such usage.

How to use arrays in program (global) scope in OpenCL

AMD OpenCL Programming Guide, Section 6.3 Constant Memory Optimization:
Globally scoped constant arrays. These arrays are initialized,
globally scoped, and in the constant address space (as specified in
section 6.5.3 of the OpenCL specification). If the size of an array is
below 64 kB, it is placed in hardware constant buffers; otherwise, it
uses global memory. An example of this is a lookup table for math
functions.
I want to use this "globally scoped constant array". I have such code in pure C
#define SIZE 101
int *reciprocal_table;
int reciprocal(int number){
return reciprocal_table[number];
}
void kernel(int *output)
{
for(int i=0; i < SIZE; i+)
output[i] = reciprocal(i);
}
I want to port it into OpenCL
__kernel void kernel(__global int *output){
int gid = get_global_id(0);
output[gid] = reciprocal(gid);
}
int reciprocal(int number){
return reciprocal_table[number];
}
What should I do with global variable reciprocal_table? If I try to add __global or __constant to it I get an error:
global variable must be declared in addrSpace constant
I don't want to pass __constant int *reciprocal_table from kernel to reciprocal. Is it possible to initialize global variable somehow? I know that I can write it down into code, but does other way exist?
P.S. I'm using AMD OpenCL
UPD Above code is just an example. I have real much more complex code with a lot of functions. So I want to make array in program scope to use it in all functions.
UPD2 Changed example code and added citation from Programming Guide
#define SIZE 2
int constant array[SIZE] = {0, 1};
kernel void
foo (global int* input,
global int* output)
{
const uint id = get_global_id (0);
output[id] = input[id] + array[id];
}
I can get the above to compile with Intel as well as AMD. It also works without the initialization of the array but then you would not know what's in the array and since it's in the constant address space, you could not assign any values.
Program global variables have to be in the __constant address space, as stated by section 6.5.3 in the standard.
UPDATE Now, that I fully understood the question:
One thing that worked for me is to define the array in the constant space and then overwrite it by passing a kernel parameter constant int* array which overwrites the array.
That produced correct results only on the GPU Device. The AMD CPU Device and the Intel CPU Device did not overwrite the arrays address. It also is probably not compliant to the standard.
Here's how it looks:
#define SIZE 2
int constant foo[SIZE] = {100, 100};
int
baz (int i)
{
return foo[i];
}
kernel void
bar (global int* input,
global int* output,
constant int* foo)
{
const uint id = get_global_id (0);
output[id] = input[id] + baz (id);
}
For input = {2, 3} and foo = {0, 1} this produces {2, 4} on my HD 7850 Device (Ubuntu 12.10, Catalyst 9.0.2). But on the CPU I get {102, 103} with either OCL Implementation (AMD, Intel). So I can not stress, how much I personally would NOT do this, because it's only a matter of time, before this breaks.
Another way to achieve this is would be to compute .h files with the host during runtime with the definition of the array (or predefine them) and pass them to the kernel upon compilation via a compiler option. This, of course, requires recompilation of the clProgram/clKernel for every different LUT.
I struggled to get this work in my own program some time ago.
I did not find any way to initialize a constant or global scope array from the host via some clEnqueueWriteBuffer or so. The only way is to write it explicitely in your .cl source file.
So here my trick to initialize it from the host is to use the fact that you are actually compiling your source from the host, which also means you can alter your src.cl file before compiling it.
First my src.cl file reads:
__constant double lookup[SIZE] = { LOOKUP }; // precomputed table (in constant memory).
double func(int idx) {
return(lookup[idx])
}
__kernel void ker1(__global double *in, __global double *out)
{
... do something ...
double t = func(i)
...
}
notice the lookup table is initialized with LOOKUP.
Then, in the host program, before compiling your OpenCL code:
compute the values of my lookup table in host_values[]
on your host, run something like:
char *buf = (char*) malloc( 10000 );
int count = sprintf(buf, "#define LOOKUP "); // actual source generation !
for (int i=0;i<SIZE;i++) count += sprintf(buf+count, "%g, ",host_values[i]);
count += sprintf(buf+count,"\n");
then read the content of your source file src.cl and place it right at buf+count.
you now have a source file with an explicitely defined lookup table that you just computed from the host.
compile your buffer with something like clCreateProgramWithSource(context, 1, (const char **) &buf, &src_sz, err);
voilĂ  !
It looks like "array" is a look-up table of sorts. You'll need to clCreateBuffer and clEnqueueWriteBuffer so the GPU has a copy of it to use.

warning: cast from pointer to integer of different size

I'm working on socket programming.. my code executes the way I want it to, I'm able to use it. BUT it gives me a warning on compilation.
I compile using
gcc server1.c -o server1 -lpthread
And I get the warning
warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
This error comes for the following code
int newsockfd;
newsockfd = (int)newsockfdd; //this line
And I'm using newsockfdd (which is int) in the following chunk of code
if (pthread_create(&threadID[i++], NULL, serverThread, (void *)(intptr_t)newsockfdd) != 0)
{
perror("Thread create error");
}
As you can probably tell, the code is not written too well (I am working on making it better). I know that this warning comes because of something to do with the size of int. But I really dunno how to fix it. Before I put in (intptr_t) in the pthread_create statement, it was showing a warning on that line, but that time the warning was
warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
It seems like there should be a simple fix to this? But I can't find it. I'm using Ubuntu 64-bit. Is that a reason for the warning?
As has been established in the comments, the situation is (modulo renaming to avoid confusing occurrences of newsockfdd as passed argument or received parameter)
void *serverThread(void *arg) {
// ...
int newsockfd = (int)arg;
// ...
}
and in main (or a function called from there)
// ...
int newsockfdd = whatever;
// ...
if (pthread_create(&threadID[i++], NULL, serverThread, (void *)(intptr_t)newsockfdd) != 0)
// ..
So when the int newsockfdd is passed as an argument to serverThread, it is cast to a void*. Originally, that cast was direct, but the intermediate cast to intptr_t was inserted to remove the warning about the cast to pointer from integer of different size.
And in serverThread, the received void* is cast to int, resulting in the warning about the cast from pointer to integer of different size.
That warning could probably also be removed by inserting an intermediate cast to intptr_t.
But, while the standard allows casting integers to pointers and vice versa, the results are implementation-defined and there's no guarantee that int -> void* -> int round-trips (although, a footnote in the standard says
The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to
be consistent with the addressing structure of the execution environment.
so probably it will round-trip and work as intended in this case - but it would likely not work [only for values of small enough absolute value] if the size of a void* is smaller than that of the integer type [consider long long -> void* -> long long on 32-bit systems]).
The proper fix is to avoid the casting between integers and pointers,
void *serverThread(void *arg) {
// ... check that arg isn't NULL
int newsockfd = *(int *)arg;
// ...
}
in severThread, cast the received pointer to a pointer of appropriate type, and read the pointed-to value, and in main
if// ...
int newsockfdd = whatever;
// ...
if (pthread_create(&threadID[i++], NULL, serverThread, &newsockfdd) != 0)
pass the address of newsockfdd.
One caveat: if serverThread is called from multiple places, the calls in all these places need to be fixed.

Resources