Char conversion in gcc

What are the implicit type conversion rules for char? The following code gives an awkward output of -172.
char x = 200;
char y = 140;
printf("%d", x+y);
My guess is that, being signed, x is converted to 72 and y to 12, which should give 84 as the answer; however, that is not the case, as shown above. I am using gcc on Ubuntu.

The following code gives an awkward output of -172.
The behavior when an out-of-range value is converted to a signed char is implementation-defined, but evidently in your case (and mine) a char has 8 bits and uses two's complement representation. So the binary representations of the unsigned char values 200 and 140 are 11001000 and 10001100, which correspond to the signed char values -56 and -116, and -56 + -116 equals -172 (the chars are promoted to int to do the addition).
Here is an example forcing x and y to be signed regardless of the default signedness of char:
#include <stdio.h>

int main()
{
    signed char x = 200;
    signed char y = 140;
    printf("%d %d %d\n", x, y, x + y);
    return 0;
}
Compilation and execution:
pi@raspberrypi:/tmp $ gcc -Wall c.c
pi@raspberrypi:/tmp $ ./a.out
-56 -116 -172
pi@raspberrypi:/tmp $
My guess is that, being signed, x is converted to 72 and y to 12
You assumed that the high bit is simply dropped (11001000 -> 1001000 and 10001100 -> 0001100), but that is not the case; unlike IEEE floats, integers do not reserve a dedicated bit for the sign.
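To make the "no bit is dropped" point concrete, here is a minimal sketch (assuming, as above, a signed 8-bit two's-complement char) that reinterprets the same bytes as unsigned:

#include <stdio.h>

int main(void)
{
    signed char x = 200;  /* stored as -56  (bit pattern 11001000) */
    signed char y = 140;  /* stored as -116 (bit pattern 10001100) */
    /* Reinterpreting the same bytes as unsigned shows that all 8 bits survive. */
    printf("%d %d\n", (unsigned char)x, (unsigned char)y);  /* 200 140 */
    printf("%d\n", x + y);                                  /* -172 */
    return 0;
}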

Related

Qsort comparison

I'm converting C++ code to Go, but I'm having difficulty understanding this comparison function:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <iostream>

using namespace std;

typedef struct SensorIndex
{
    double value;
    int index;
} SensorIndex;

int comp(const void *a, const void *b)
{
    SensorIndex* x = (SensorIndex*)a;
    SensorIndex* y = (SensorIndex*)b;
    return abs(y->value) - abs(x->value);
}
int main(int argc, char *argv[])
{
    SensorIndex *s_tmp;
    s_tmp = (SensorIndex *)malloc(sizeof(SensorIndex) * 200);
double q[200] = {8.48359,8.41851,-2.53585,1.69949,0.00358129,-3.19341,3.29215,2.68201,-0.443549,-0.140532,1.64661,-1.84908,0.643066,1.53472,2.63785,-0.754417,0.431077,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256,-0.123256};
    for (int i = 0; i < 200; ++i) {
        s_tmp[i].value = q[i];
        s_tmp[i].index = i;
    }
    qsort(s_tmp, 200, sizeof(SensorIndex), comp);
    for (int i = 0; i < 200; i++) {
        cout << s_tmp[i].index << " " << s_tmp[i].value << endl;
    }
}
I expected the "comp" function to sort from the highest absolute value to the lowest, but in my environment (gcc, 32-bit) the result is:
1 8.41851
0 8.48359
2 -2.53585
3 1.69949
11 -1.84908
5 -3.19341
6 3.29215
7 2.68201
10 1.64661
14 2.63785
12 0.643066
13 1.53472
4 0.00358129
9 -0.140532
8 -0.443549
15 -0.754417
16 0.431077
17 -0.123256
18 -0.123256
19 -0.123256
20 -0.123256
...
Moreover, one thing that seems strange to me is that running the same code with online services (cpp.sh, C++98) gives a different ordering:
0 8.48359
1 8.41851
5 -3.19341
6 3.29215
2 -2.53585
7 2.68201
14 2.63785
3 1.69949
10 1.64661
11 -1.84908
13 1.53472
4 0.00358129
8 -0.443549
9 -0.140532
12 0.643066
15 -0.754417
16 0.431077
17 -0.123256
18 -0.123256
19 -0.123256
20 -0.123256
...
Any help?
This behavior is caused by using abs, a function that works with int, and passing it double arguments. The doubles are implicitly converted to int, truncating the fractional component before comparing them. Essentially, this means you take the original number, strip off the sign, then strip off everything to the right of the decimal point, and compare those values. So 8.123 and -8.9 are both converted to 8, and compare equal. Since the inputs are reversed for the subtraction, the ordering is descending by magnitude.
Your cpp.sh output reflects this; all the values with a magnitude between 8 and 9 appear first, then 3-4s, then 2-3s, 1-2s and less than 1 values.
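To see the truncation in isolation, here is a minimal sketch (with the double-to-int conversion written out explicitly, since which abs overload gets selected depends on the headers and language version, as discussed below):

#include <cstdlib>
#include <iostream>

int main()
{
    double a = 8.123, b = -8.9;
    // Conversion to int truncates toward zero, so both magnitudes become 8 ...
    std::cout << std::abs((int)a) << ' ' << std::abs((int)b) << '\n';  // prints: 8 8
    // ... and the comparator sees a difference of 0: the two values compare equal.
}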
If you wanted to fix this to actually sort in descending order in general, you'd need a comparison function that properly used the double-friendly fabs function, e.g.
int comp(const void *a, const void *b)
{
    SensorIndex* x = (SensorIndex*)a;
    SensorIndex* y = (SensorIndex*)b;
    double diff = fabs(y->value) - fabs(x->value);
    if (diff < 0.0) return -1;
    return diff > 0;
}
Update: On further reading, it looks like std::abs from <cmath> has worked with doubles for a long time, but std::abs for doubles was only added to <cstdlib> (where the integer abs functions dwell) in C++17, and implementers got this stuff wrong all the time, so different compilers would behave differently at random.
In any event, both the answers given here are right: if you haven't included <cmath> and you're on a pre-C++17 compiler, you should only have access to the integer-based versions of std::abs (or ::abs from math.h), which would truncate each value before the comparison. And even if you were using the correct std::abs, returning the result of a double subtraction as an int would drop the fractional component of the difference, making any values with a magnitude difference of less than 1.0 appear equal.
Worse, depending on the specific comparisons performed and their ordering (since not all values are compared to each other), the consequences of this effect can chain: comparison-ordering changes could make 1.0 appear equal to 1.6, which would in turn appear equal to 2.5, even though 1.0 would be correctly identified as less than 2.5 if they were compared to each other. In theory, as long as each number is within 1.0 of every other number, the comparisons might evaluate as if they're all equal to each other (a pathological case, yes, but smaller runs of such errors would definitely happen).
Point is, the only way to figure out the real intent of this code is to figure out the exact compiler version and C++ standard it was originally compiled under and test it there.
There is a bug in your comparison function. You return an int, which means you lose the distinction between element values whose absolute difference is less than 1!
int comp(const void* a, const void* b)
{
    SensorIndex* x = (SensorIndex*)a;
    SensorIndex* y = (SensorIndex*)b;
    // what about differences between 0.0 and 1.0?
    return abs(y->value) - abs(x->value);
}
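For example (values chosen only to illustrate the comment above):

#include <cmath>
#include <iostream>

int main()
{
    double x = 1.0, y = 1.6;
    int r = std::fabs(y) - std::fabs(x);  // 0.6 is truncated to 0 on conversion to int
    std::cout << r << '\n';               // prints 0: the two elements compare "equal"
}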
You can fix it like this:
int comp(const void* a, const void* b)
{
    SensorIndex* x = (SensorIndex*)a;
    SensorIndex* y = (SensorIndex*)b;
    if (std::abs(y->value) < std::abs(x->value))
        return -1;
    if (std::abs(x->value) < std::abs(y->value))
        return 1;
    return 0;  // qsort requires a consistent comparator: equal magnitudes must compare equal
}
A more modern (and safer) way to do this would be to use std::vector and std::sort:
// use a vector for dynamic arrays
std::vector<SensorIndex> s_tmp;
for (int i = 0; i < 200; ++i) {
    s_tmp.push_back({q[i], i});
}

// use std::sort
std::sort(std::begin(s_tmp), std::end(s_tmp),
          [](SensorIndex const& a, SensorIndex const& b) {
              return std::abs(b.value) < std::abs(a.value);
          });

Invalid assertion for overflow check in Frama-C

While checking for overflow of the short and char data types in an add operation, the assertions inserted by Frama-C seem to be incorrect:
For char and short data, the bounds checked are the maximum positive and negative values of the int data type.
What could be the reason for this?
Integral types of rank less than int are converted to either int or unsigned int when used in an arithmetic operation (see C11 6.3.1.8, Usual arithmetic conversions). This is why you see the cast to (int) for x and y. Note that by default -rte will not emit warnings for downcasts, as they are not undefined behavior (6.3.1.3§3 indicates that signed downcasts are implementation-defined and that an implementation may raise a signal). If you add the option -warn-signed-downcast, you'll see the assertions you were probably looking for, which are due to the cast of the result into (char):
/*# assert rte: signed_downcast: (int)x+(int)y ≤ 127; */
/*# assert rte: signed_downcast: -128 ≤ (int)x+(int)y; */
Note that if you store the result into an int, as in
void main(void) {
    char x;
    char y;
    int z;
    x = 1;
    y = 127;
    z = x + y;
    return;
}
there won't be any downcast warning (but the signed overflow warnings will still be present).
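The promotion itself is easy to observe outside Frama-C; a minimal sketch (assuming a signed 8-bit char, so the result of the downcast is implementation-defined):

#include <stdio.h>

int main(void)
{
    char x = 1;
    char y = 127;
    int z = x + y;   /* the addition happens in int after promotion: z == 128, no overflow */
    char c = x + y;  /* downcast of 128 into char: implementation-defined (often -128) */
    printf("%d %d\n", z, c);
    return 0;
}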

How do the conversions between signed, unsigned and float types work?

The compiler I use is g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.
I compile my programs with the following command:
g++ -std=c++11 -pedantic -Wall program.cpp
Program no. 1:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = -54;
    cout << b << endl;
    return 0;
}
The program prints 4294967242, and this is the value I expected, because this is the case where we assign an out-of-range value to a variable of an unsigned type, so the result is the original value modulo 2^32 (here, -54 + 4294967296 = 4294967242).
Program no. 2:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = 54.1234;
    cout << b << endl;
    return 0;
}
The program prints 54, and this is also OK, because the stored value is the part before the decimal point; the fractional part is truncated.
Program no. 3:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = -54.1234;
    cout << b << endl;
    return 0;
}
Here during compilation I get the warning "overflow in implicit constant conversion".
And the program prints 0. Why is that? I thought it would truncate the fractional part (as in program no. 2) and then store the result of the modulo division (as in program no. 1).
But if I write program no. 4:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    float k = -54.1234;
    b = k;
    cout << b << endl;
    return 0;
}
then I get no warning, and I get the result I expected, 4294967242, which is the result of the modulo division.
I would be grateful if somebody can explain it to me.
Why doesn't program no. 3 behave like program no. 4? Why don't I get a warning when compiling program no. 1, but I do get one when compiling program no. 3?
According to the standard ([conv.fpint]):
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So, your -54.1234 is truncated to -54. Since that can't be represented in an unsigned, you get undefined behavior.
When converting floating point numbers to integers, C and C++ round floating point numbers towards zero. The rounded result must then be representable in the destination type.
As a result, for 32 bit unsigned int the conversion is guaranteed to give the correct result if -1 < x < 2^32. For smaller numbers there are no guarantees. Since numbers between -1 and 0 must be rounded to zero, and numbers -1 and smaller have no requirements, it wouldn't be surprising if the compiler checks whether x < 0 and gives a result of 0 in that case. (The compiler might check whether x < 1 and give a result of 0; this handles very small positive numbers as well).
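A small sketch contrasting programs no. 3 and no. 4 (note that the runtime conversion is undefined behavior, so the wrap-like value is only what typical x86 code generation happens to produce, not something the standard guarantees):

#include <iostream>

int main()
{
    unsigned int b1 = -54.1234;  // constant conversion: gcc warns and may store 0
    float k = -54.1234f;
    unsigned int b2 = k;         // runtime conversion: undefined behavior,
                                 // often 4294967242 in practice
    std::cout << b1 << ' ' << b2 << '\n';
}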

Manually Converting rgba8 to rgba5551

I need to convert rgba8 to rgba5551 manually. I found some helpful code in another post and want to modify it to convert from rgba8 to rgba5551. I don't really have experience with bitwise operations and haven't had any luck modifying the code myself.
void* rgba8888_to_rgba4444(void* src, int src_bytes)
{
    // compute the actual number of pixel elements in the buffer.
    int num_pixels = src_bytes / 4;
    unsigned int* psrc = (unsigned int*)src;  // unsigned int assumed to be 32 bits
                                              // (unsigned long is 64 bits on LP64 systems)
    unsigned short* pdst = (unsigned short*)src;
    // convert every pixel
    for (int i = 0; i < num_pixels; i++) {
        // read a source pixel
        unsigned px = psrc[i];
        // keep the high nibble of each 8-bit channel, packed as RGBA4444
        unsigned r = (px << 8) & 0xf000;
        unsigned g = (px >> 4) & 0x0f00;
        unsigned b = (px >> 16) & 0x00f0;
        unsigned a = (px >> 28) & 0x000f;
        // and store
        pdst[i] = r | g | b | a;
    }
    return pdst;
}
The value of RGBA5551 is that it has color info condensed into 16 bits - or two bytes, with only one bit for the alpha channel (on or off). RGBA8888, on the other hand, uses a byte for each channel. (If you don't need an alpha channel, I hear RGB565 is better - as humans are more sensitive to green). Now, with 5 bits, you get the numbers 0 through 31, so r, g, and b each need to be converted to some number between 0 and 31, and since they are originally a byte each (0-255), we multiply each by 31/255. Here is a function that takes RGBA bytes as input and outputs RGBA5551 as a short:
short int RGBA8888_to_RGBA5551(unsigned char r, unsigned char g, unsigned char b, unsigned char a)
{
    // All arithmetic here is integer arithmetic, so the division truncates.
    // Adjust this code accordingly if you want to round to the nearest integer.
    unsigned char r5 = r * 31 / 255;
    unsigned char g5 = g * 31 / 255;
    unsigned char b5 = b * 31 / 255;
    // 1 if a is positive, 0 otherwise. You must decide what is sensible.
    unsigned char a1 = (a > 0) ? 1 : 0;
    // Now that we have our 5-bit r, g, and b and our 1-bit a, we need to
    // shift them into place before combining.
    short int rShift = (short int)r5 << 11;  // (short int)r5 looks like 00000000000vwxyz;
                                             // casting before shifting guards against subtle bugs
    short int gShift = (short int)g5 << 6;
    short int bShift = (short int)b5 << 1;
    // Combine and return
    return rShift | gShift | bShift | a1;
}
You can, of course, condense this code.
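If you need to convert a whole buffer, the helper can be dropped into a loop analogous to the 4444 converter above; a sketch under the assumption that the bytes are laid out R, G, B, A in memory (the function name here is just illustrative):

void rgba8888_to_rgba5551_buffer(void* src, int src_bytes)
{
    int num_pixels = src_bytes / 4;
    unsigned char* psrc = (unsigned char*)src;
    unsigned short* pdst = (unsigned short*)src;  // results are packed in place
    for (int i = 0; i < num_pixels; i++) {
        pdst[i] = RGBA8888_to_RGBA5551(psrc[4 * i + 0], psrc[4 * i + 1],
                                       psrc[4 * i + 2], psrc[4 * i + 3]);
    }
}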

Character range in Java

I've read in a book:
..characters are just 16-bit unsigned integers under the hood. That means you can assign a number literal, assuming it will fit into the unsigned 16-bit range (65535 or less).
It gives me the impression that I can assign integers to characters as long as it's within the 16-bit range.
But how come I can do this:
char c = (char) 80000; //80000 is beyond 65535.
I'm aware the cast did the magic. But what exactly happened behind the scenes?
Looks like it's using the int value mod 65536. The following code:
int i = 97 + 65536;
char c = (char)i;
System.out.println(c);
System.out.println(i % 65536);
char d = 'a';
int n = (int)d;
System.out.println(n);
Prints out 'a' and then '97' twice (a is char 97 in ASCII). In the same way, (char)80000 in your example yields 80000 mod 65536 = 14464, because the narrowing conversion keeps only the low 16 bits.
