Strict conversion of std::atomic_bool values - c++11

I have code with very simple logical arithmetic that involves values returned from a std::atomic_bool.
#include <iostream>
#include <atomic>
#include <cstdint>

int main() {
    uint16_t v1 = 0x1122;
    uint16_t v2 = 0xaaff;
    std::atomic_bool flag1(false);

    uint16_t r1 = v1 | v2;
    std::cout << std::hex << r1 << std::endl;

    uint16_t r2 = static_cast<uint16_t>(flag1.load()) | static_cast<uint16_t>(0xaaff);
    std::cout << std::hex << r2 << std::endl;

    std::cout << __VERSION__ << std::endl;
}
Compile line: g++ -std=c++17 -O3 -Wall -pedantic -Wconversion -pthread main.cpp && ./a.out.
Based on the standard library API, load() should return the underlying type stored in the atomic, so flag1.load() should return bool. However, the compiler emits a warning that it is being asked to convert an int to uint16_t:
main.cpp:13:55: warning: conversion from 'int' to 'uint16_t' {aka 'short unsigned int'} may change value [-Wconversion]
uint16_t r2 = static_cast<uint16_t>(flag1.load()) | static_cast<uint16_t>(0xaaff);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Where exactly does this conversion happen? Both sides of the | are converted to uint16_t. Why does it still print a warning?

The usual arithmetic conversions (specified in §5 of the ISO standard) dictate that the operands of most arithmetic and binary operators undergo integer promotion before the operation itself.
This means that both uint16_t operands are first promoted to int to compute the bitwise |, and the result is then truncated back to uint16_t to store in r2.
Indeed, that's what the warning is about: there's an implicit truncation of an int to a uint16_t.
These conversions also guarantee that a bool converts to exactly 1 or 0, so the first cast is useless; and since both operands are promoted to int anyway, the second cast is useless too. You could simply write
uint16_t r2 = flag1.load() | 0xaaff;
and silence the warning by explicitly casting the result to the narrower type, which makes you aware of the fact that the truncation is happening:
uint16_t r2 = static_cast<uint16_t>(flag1.load() | 0xaaff);
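To see the promotion directly, here is a minimal sketch (assuming the usual case where int is wider than 16 bits): the type of uint16_t | uint16_t is int, which is exactly what -Wconversion complains about when the result is assigned back to a narrower type.

#include <cstdint>
#include <type_traits>

int main() {
    std::uint16_t a = 0x1122, b = 0xaaff;
    // Both operands are promoted to int before |, so the expression type
    // is int, not uint16_t; assigning it to a uint16_t triggers -Wconversion.
    static_assert(std::is_same<decltype(a | b), int>::value,
                  "uint16_t | uint16_t yields int after integer promotion");
}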

Related

How to print the values in a __m128i variable?

I'm trying to learn to code using intrinsics, and below is code which does addition.
compiler used: icc
#include <stdio.h>
#include <emmintrin.h>

int main()
{
    __m128i a = _mm_set_epi32(1,2,3,4);
    __m128i b = _mm_set_epi32(1,2,3,4);
    __m128i c;
    c = _mm_add_epi32(a,b);
    printf("%d\n",c[2]);
    return 0;
}
I get the below error:
test.c(9): error: expression must have pointer-to-object type
printf("%d\n",c[2]);
How do I print the values in the variable c, which is of type __m128i?
Use this function to print them:
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <emmintrin.h>

void print128_num(__m128i var)
{
    uint16_t val[8];
    memcpy(val, &var, sizeof(val));
    printf("Numerical: %i %i %i %i %i %i %i %i \n",
           val[0], val[1], val[2], val[3], val[4], val[5],
           val[6], val[7]);
}
You split the 128 bits into 16-bit (or 32-bit) pieces before printing them.
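The 32-bit splitting mentioned above would follow the same memcpy pattern; a minimal sketch (the function name is mine):

#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <emmintrin.h>

void print128_num32(__m128i var)
{
    uint32_t val[4];
    memcpy(val, &var, sizeof(val));  // safe: no strict-aliasing violation
    printf("Numerical: %u %u %u %u\n", val[0], val[1], val[2], val[3]);
}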
And here is a way of splitting into 64-bit halves and printing them, if you have 64-bit support available:
#include <inttypes.h>
#include <string.h>
#include <stdio.h>
#include <emmintrin.h>

void print128_num(__m128i var)
{
    int64_t v64val[2];
    memcpy(v64val, &var, sizeof(v64val));
    printf("%.16" PRIx64 " %.16" PRIx64 "\n", v64val[1], v64val[0]);
}
Note: casting &var directly to an int* or uint16_t* would also work on MSVC, but it violates strict aliasing and is undefined behaviour. Using memcpy is the standard-compliant way to do the same thing, and with minimal optimization the compiler will generate the exact same binary code.
Portable across gcc/clang/ICC/MSVC, C and C++.
Fully safe with all optimization levels: no strict-aliasing violation UB.
Prints in hex as u8, u16, u32, or u64 elements (based on @AG1's answer).
Prints in memory order (least-significant element first, like _mm_setr_epiX). Reverse the array indices if you prefer printing in the same order Intel's manuals use, where the most significant element is on the left (like _mm_set_epiX). Related: Convention for displaying vector registers
Using a __m128i* to load from an array of int is safe because the __m128 types are defined to allow aliasing just like ISO C unsigned char*. (e.g. in gcc's headers, the definition includes __attribute__((may_alias)).)
The reverse isn't safe (pointing an int* onto part of a __m128i object). MSVC guarantees that's safe, but GCC/clang don't. (-fstrict-aliasing is on by default). It sometimes works with GCC/clang, but why risk it? It sometimes even interferes with optimization; see this Q&A. See also Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior?
See GCC AVX __m256i cast to int array leads to wrong values for a real-world example of GCC breaking code which points an int* at a __m256i.
(uint32_t*) &my_vector violates the C and C++ aliasing rules, and is not guaranteed to work the way you'd expect. Storing to a local array and then accessing it is guaranteed to be safe. It even optimizes away with most compilers, so you get movq / pextrq directly from xmm to integer registers instead of an actual store/reload, for example.
Source + asm output on the Godbolt compiler explorer: proof it compiles with MSVC and so on.
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

#ifndef __cplusplus
#include <stdalign.h>  // C11 defines _Alignas(). This header defines alignas()
#endif

void p128_hex_u8(__m128i in) {
    alignas(16) uint8_t v[16];
    _mm_store_si128((__m128i*)v, in);
    printf("v16_u8: %x %x %x %x | %x %x %x %x | %x %x %x %x | %x %x %x %x\n",
           v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7],
           v[8], v[9], v[10], v[11], v[12], v[13], v[14], v[15]);
}

void p128_hex_u16(__m128i in) {
    alignas(16) uint16_t v[8];
    _mm_store_si128((__m128i*)v, in);
    printf("v8_u16: %x %x %x %x, %x %x %x %x\n", v[0], v[1], v[2], v[3], v[4], v[5], v[6], v[7]);
}

void p128_hex_u32(__m128i in) {
    alignas(16) uint32_t v[4];
    _mm_store_si128((__m128i*)v, in);
    printf("v4_u32: %x %x %x %x\n", v[0], v[1], v[2], v[3]);
}

void p128_hex_u64(__m128i in) {
    alignas(16) unsigned long long v[2];  // uint64_t might give format-string warnings with %llx; it's just long in some ABIs
    _mm_store_si128((__m128i*)v, in);
    printf("v2_u64: %llx %llx\n", v[0], v[1]);
}
If you need portability to C99 or C++03 or earlier (i.e. without C11 / C++11), remove the alignas() and use storeu instead of store. Or use __attribute__((aligned(16))) or __declspec(align(16)) instead.
(If you're writing code with intrinsics, you should be using a recent compiler version. Newer compilers usually make better asm than older compilers, including for SSE/AVX intrinsics. But maybe you want to use gcc-6.3 with -std=gnu++03 C++03 mode for a codebase that isn't ready for C++11 or something.)
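For instance, the pre-C11/C++11 fallback could look like this (a sketch of just the u32 printer; the function name is mine, and the others follow the same pattern):

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

// Without alignas() the buffer's 16-byte alignment isn't guaranteed,
// so use the unaligned store, which is safe for any alignment.
void p128_hex_u32_portable(__m128i in) {
    uint32_t v[4];
    _mm_storeu_si128((__m128i*)v, in);
    printf("v4_u32: %x %x %x %x\n", v[0], v[1], v[2], v[3]);
}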
Sample output from calling all 4 functions on
// source used:
__m128i vec = _mm_setr_epi8(1, 2, 3, 4, 5, 6, 7, 8,
                            9, 10, 11, 12, 13, 14, 15, 16);
// output:
v2_u64: 0x807060504030201 0x100f0e0d0c0b0a09
v4_u32: 0x4030201 0x8070605 0xc0b0a09 0x100f0e0d
v8_u16: 0x201 0x403 0x605 0x807 | 0xa09 0xc0b 0xe0d 0x100f
v16_u8: 0x1 0x2 0x3 0x4 | 0x5 0x6 0x7 0x8 | 0x9 0xa 0xb 0xc | 0xd 0xe 0xf 0x10
Adjust the format strings if you want to pad with leading zeros for consistent output width. See printf(3).
I know this question is tagged C, but it was also the best search result when looking for a C++ solution to the same problem.
So, this could be a C++ implementation:
#include <string>
#include <cstring>
#include <sstream>
#include <emmintrin.h>

#if defined(__SSE2__)
template <typename T>
std::string __m128i_toString(const __m128i var) {
    std::stringstream sstr;
    T values[16 / sizeof(T)];
    std::memcpy(values, &var, sizeof(values));  // see discussion below
    if (sizeof(T) == 1) {
        for (unsigned int i = 0; i < sizeof(__m128i); i++) {  // C++11: range-for also possible
            sstr << (int) values[i] << " ";
        }
    } else {
        for (unsigned int i = 0; i < sizeof(__m128i) / sizeof(T); i++) {  // C++11: range-for also possible
            sstr << values[i] << " ";
        }
    }
    return sstr.str();
}
#endif
Usage:
#include <iostream>
[..]
__m128i x;
[..]
std::cout << __m128i_toString<uint8_t>(x) << std::endl;
std::cout << __m128i_toString<uint16_t>(x) << std::endl;
std::cout << __m128i_toString<uint32_t>(x) << std::endl;
std::cout << __m128i_toString<uint64_t>(x) << std::endl;
Result:
141 114 0 0 0 0 0 0 151 104 0 0 0 0 0 0
29325 0 0 0 26775 0 0 0
29325 0 26775 0
29325 26775
Note: there exists a simple way to avoid the if (sizeof(T) == 1) branch, see https://stackoverflow.com/a/28414758/2436175.
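For reference, the trick from the linked answer is to apply unary + to the streamed value, which promotes uint8_t to int before operator<< picks an overload; a condensed sketch (renamed without the reserved __ prefix):

#include <string>
#include <cstring>
#include <sstream>
#include <emmintrin.h>

template <typename T>
std::string m128i_toString(const __m128i var) {
    std::stringstream sstr;
    T values[16 / sizeof(T)];
    std::memcpy(values, &var, sizeof(values));
    for (unsigned int i = 0; i < 16 / sizeof(T); i++)
        sstr << +values[i] << " ";  // unary + promotes uint8_t to int
    return sstr.str();
}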
#include <stdio.h>
#include <stdint.h>
#include <emmintrin.h>

int main()
{
    __m128i a = _mm_set_epi32(1,2,3,4);
    __m128i b = _mm_set_epi32(1,2,3,4);
    __m128i c;
    const int32_t* q;
    // add a pointer
    c = _mm_add_epi32(a,b);
    q = (const int32_t*) &c;  // caution: pointing an int32_t* at a __m128i
                              // violates strict aliasing (see discussion above)
    printf("%d\n",q[2]);
    //printf("%d\n",c[2]);
    return 0;
}
Try this code.

Recommended way to cast a boost cpp_int to a double?

I have some code where I avoid some costly divisions by converting a boost integer to a double. For the real code I will build an fp type that's big enough to hold the maximal value (exponent). To test I am using a double. So I do this:
#define NTYPE_BITS 512
typedef number<cpp_int_backend<NTYPE_BITS, NTYPE_BITS, unsigned_magnitude, unchecked, void> > NTYPE;
NTYPE a1 = BIG_VALUE;
double a1f = (double)a1;
The code generated for that cast is quite complicated. I see it's basically looping over all the elements in a1 (least significant first), scaling them by powers of two.
Now in this case I guess only the last two elements could affect the result (64 bits for each element, and the most significant element might have fewer than 64 bits in use).
Is there a better way to do this?
First off, NEVER use C-Style casts. (Why use static_cast<int>(x) instead of (int)x?).
Second, avoid using namespace.
(Third, reserve all-caps names for macros).
That said:
double a1f = a1.convert_to<double>();
Is your ticket.
Live On Coliru
#include <boost/multiprecision/cpp_int.hpp>
#include <iostream>

namespace bmp = boost::multiprecision;

//0xDEADBEEFE1E104B1D00008BADF00D000ABADBABE000D15EA5E
#define BIG_VALUE "0xDEADBEEFE1E104B1D00008BADF00D000ABADBABE000D15EA5E"
#define NTYPE_BITS 512

int main() {
    using NTYPE = bmp::number<
        bmp::cpp_int_backend<
            NTYPE_BITS, NTYPE_BITS,
            bmp::unsigned_magnitude, bmp::unchecked, void>>;

    NTYPE a1(BIG_VALUE);
    std::cout << a1 << "\n";
    std::cout << std::hex << a1 << "\n";
    std::cout << a1.convert_to<double>() << "\n";
}
Prints
1397776821048146366831161011449418369017198837637750820563550
deadbeefe1e104b1d00008badf00d000abadbabe000d15ea5e
1.39778e+60

'round()' vs 'std::round()' and 'fabs()' vs 'std::fabs()' in C++ / GCC 4.8

By accident I was calling round() and fabs() instead of std::round() and std::fabs(), and for the largest integer a long double can hold without losing precision there was a difference.
Consider this test program round.cpp:
#include <iostream>
#include <iomanip>
#include <cstdint>
#include <limits>
#include <cmath>

using std::cout;
using std::endl;
using std::setw;
using std::setprecision;

void print(const char* msg, const long double ld)
{
    cout << msg << setprecision(20) << ld << endl;
}

void test(const long double ld)
{
    const long double ldRound = round(ld);
    const long double ldStdRound = std::round(ld);
    const long double ldFabs = fabs(ld);
    const long double ldStdFabs = std::fabs(ld);
    print("Rounding using 'round()': ", ldRound);
    print("Rounding using 'std::round()': ", ldStdRound);
    print("Absolute value using 'fabs()': ", ldFabs);
    print("Absolute value using 'std::fabs()': ", ldStdFabs);
}

int main()
{
    const int maxDigits = std::numeric_limits<long double>::digits;
    const int64_t maxPosInt = 0xffffffffffffffff >> (64 - maxDigits + 1);
    const long double maxPosLongDouble = (long double) maxPosInt;
    cout << setw(20);
    cout << "Max mantissa bits in long double: " << maxDigits << endl;
    cout << "Max positive integer to store in long double: " << maxPosInt << endl;
    print("Corresponding long double: ", maxPosLongDouble);
    test(maxPosLongDouble);
    return 0;
}
When compiling with g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-36.0.1)
/usr/bin/g++ -std=c++11 round.cpp -o round
and then running it, the results are one larger for the non-std functions compared to the std functions:
Max mantissa bits in long double: 64
Max positive integer to store in long double: 9223372036854775807
Corresponding long double: 9223372036854775807
Rounding using 'round()': 9223372036854775808 <== one larger
Rounding using 'std::round()': 9223372036854775807
Absolute value using 'fabs()': 9223372036854775808 <== one larger
Absolute value using 'std::fabs()': 9223372036854775807
I get the exact same output (including 64 bits for long double) when I compile for 32 bits using option -m32. Looking at the disassembly (using gdb on the 32-bit executable) for function test() I get:
(gdb) disassemble test(long double)
Dump of assembler code for function _Z4teste:
0x080488c0 <+0>: push %ebp
...
0x080488d2 <+18>: call 0x8048690 <round@plt>
...
0x080488ee <+46>: call 0x8048b59 <_ZSt5rounde> (demangled: std::round(long double))
...
0x080488ff <+63>: fabs
...
0x08048918 <+88>: call 0x8048b4f <_ZSt4fabse> (demangled: std::fabs(long double))
...
0x080489a4 <+228>: leave
0x080489a5 <+229>: ret
End of assembler dump.
So it seems different functions are called for round() and std::round(). For fabs() a floating-point instruction is emitted, whereas for std::fabs() a function call is emitted.
Can someone explain what is causing this difference and please tell me whether using std::round() and std::fabs() is the preferred portable choice?
As @Praetorian explains in the comments to the question above, the answer is very simple.
When including the C++ header <cmath>, GCC brings a number of C++ math functions into the std namespace with appropriate overloads, for example:
float std::round(float)
double std::round(double)
long double std::round(long double)
float std::fabs(float)
double std::fabs(double)
long double std::fabs(long double)
However, it also brings the corresponding old C functions (same names) into global scope, and as C does not support overloading, these functions take only double as argument and return double:
double round(double)
double fabs(double)
Therefore, the calls to round() and fabs() with no explicit namespace (and no using namespace std in the program) are calls to ::round() and ::fabs(). These are the C functions, which first convert the long double argument (64-bit mantissa) to double (53-bit mantissa), and that lossy conversion explains the incorrect results.
Therefore, in C++ always ensure you either prefix with std:: or have an appropriate using declaration. I would recommend being explicit and calling std::round() and std::fabs().
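The overload difference is easy to check at compile time; a minimal sketch (assuming GCC/glibc, where the global namespace has only the C double(double) versions):

#include <cmath>
#include <type_traits>

int main() {
    long double ld = 0.0L;
    // The global-scope C function takes and returns double, so the
    // long double argument is converted to double before the call:
    static_assert(std::is_same<decltype(::round(ld)), double>::value, "");
    // The std:: overload set includes a long double version:
    static_assert(std::is_same<decltype(std::round(ld)), long double>::value, "");
}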
P.S. In C there are also these functions if you need to handle float or long double:
float roundf(float)
long double roundl(long double)
float fabsf(float)
long double fabsl(long double)
P.P.S. In C++14 you can also use std::abs() for any floating-point type, whereas in C++11 std::abs() is only for integer types and std::fabs() is for floating-point types.

difference in output using std::size_t and std::bitset for bit operations

Given the following code:
#include <iostream>
#include <bitset>
#include <limits>
#include <limits.h>

using namespace std;

constexpr std::size_t maxBits = CHAR_BIT * sizeof(std::size_t);

int main() {
    std::size_t value = 47;
    unsigned int begin = 0;
    unsigned int end = 32;

    //std::size_t allBitsSet(std::numeric_limits<std::size_t>::max());
    std::bitset<maxBits> allBitsSet(std::numeric_limits<std::size_t>::max());

    //std::size_t mask((allBitsSet >> (maxBits - end)) ^ (allBitsSet >> (maxBits - begin)));
    std::bitset<maxBits> mask = (allBitsSet >> (maxBits - end)) ^ (allBitsSet >> (maxBits - begin));

    //std::size_t bitsetValue(value);
    std::bitset<maxBits> bitsetValue(value);

    auto maskedValue = bitsetValue & mask;
    auto result = maskedValue >> begin;

    //std::cout << static_cast<std::size_t>(result) << std::endl;
    std::cout << static_cast<std::size_t>(result.to_ulong()) << std::endl;
}
This should print the same value as value, but for some reason the std::bitset version works just fine while the std::size_t version (the commented-out lines) does not.
This is strange, because AFAIK std::bitset simply throws an exception when something is wrong, and moreover a bitset should behave the same way as operations on an unsigned integer of the same width. But as we can see, even though the bitset has the same number of bits, it does not behave the same. In fact it seems to me that std::bitset works fine while std::size_t does not.
My configuration is:
Intel Core i7, g++ 5.4.0-r3
[expr.shift]/1 ... The behavior [of the shift operator - IT] is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.
Emphasis mine. allBitsSet >> (maxBits - begin) (in the non-bitset version) exhibits undefined behavior.
On the other hand, the behavior of bitset::operator>> is well-defined: allBitsSet >> (maxBits - begin) produces a bitset with all zero bits.
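One way to make the std::size_t version well-defined is to guard the shift count explicitly, mirroring the all-zero result that bitset::operator>> guarantees; a minimal sketch:

#include <climits>
#include <cstddef>
#include <iostream>
#include <limits>

constexpr std::size_t maxBits = CHAR_BIT * sizeof(std::size_t);

// A shift that is defined for counts of 0..maxBits: shifting by the full
// width (undefined for built-in integers) is handled as an explicit zero.
std::size_t shiftRight(std::size_t v, unsigned n) {
    return n >= maxBits ? 0 : v >> n;
}

int main() {
    std::size_t value = 47;
    unsigned begin = 0, end = 32;
    std::size_t allBitsSet = std::numeric_limits<std::size_t>::max();
    std::size_t mask = shiftRight(allBitsSet, maxBits - end)
                     ^ shiftRight(allBitsSet, maxBits - begin);
    std::cout << ((value & mask) >> begin) << std::endl;  // prints 47
}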

How do the conversions between signed, unsigned and float types work?

The compiler I use is g++ (Ubuntu 4.8.4-2ubuntu1~14.04) 4.8.4.
I compile my programs with the following command:
g++ -std=c++11 -pedantic -Wall program.cpp
Program no. 1:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = -54;
    cout << b << endl;
    return 0;
}
The program prints 4294967242 and this is the value I expected, because this is the case when we assign an out-of-range value to a variable of unsigned type, so the result is the value reduced modulo 2^32 (4294967296 - 54 = 4294967242).
Program no. 2:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = 54.1234;
    cout << b << endl;
    return 0;
}
The program prints 54, and this is also OK, because the stored value is the part before the decimal point, and the fractional part is truncated.
Program no. 3:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    b = -54.1234;
    cout << b << endl;
    return 0;
}
Here during compilation I get the warning "overflow in implicit constant conversion".
And the program prints 0. Why is it so? I thought that it will do the truncation of the fractional part (as in program 2) and then store the result of the modulo division (as in program 1).
But if I write program no. 4:
#include <iostream>
using namespace std;

int main() {
    unsigned int b;
    float k = -54.1234;
    b = k;
    cout << b << endl;
    return 0;
}
then I get no warning, and I get the result (expected by me) 4294967242, which is the result of the modulo division.
I would be grateful if somebody can explain it to me.
Why doesn't program no. 3 behave like program no. 4? And why don't I get a warning when compiling program no. 1, but I do get one when compiling program no. 3?
According to the standard (§[conv.fpint]):
A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.
So, your -54.1234 is truncated to -54. Since that can't be represented in an unsigned, you get undefined behavior.
When converting floating point numbers to integers, C and C++ round floating point numbers towards zero. The rounded result must then be representable in the destination type.
As a result, for 32 bit unsigned int the conversion is guaranteed to give the correct result if -1 < x < 2^32. For smaller numbers there are no guarantees. Since numbers between -1 and 0 must be rounded to zero, and numbers -1 and smaller have no requirements, it wouldn't be surprising if the compiler checks whether x < 0 and gives a result of 0 in that case. (The compiler might check whether x < 1 and give a result of 0; this handles very small positive numbers as well).
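If you need the conversion to be well-defined for negative inputs too, the standard-compliant approach is to range-check before casting; a minimal sketch (assuming a 32-bit unsigned int):

#include <cmath>
#include <iostream>

// Only cast when the truncated value is representable in unsigned int;
// anything else would be undefined behavior per [conv.fpint].
unsigned toUnsignedChecked(double x) {
    double t = std::trunc(x);
    if (t < 0.0 || t >= 4294967296.0)  // 2^32
        return 0;                      // or report an error
    return static_cast<unsigned>(t);
}

int main() {
    std::cout << toUnsignedChecked(54.1234) << "\n";   // 54
    std::cout << toUnsignedChecked(-54.1234) << "\n";  // 0 instead of UB
}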
