I am using a gcc command recommended by Maxim which has the -fsingle-precision-constant option. (For the purposes of this question, assume I have to use that option, for compatibility with Maxim stuff. Although, I must say, things seem to work just fine without that option.) Is there a documented way to write a double-precision constant when using this option?
I have found that appending "L" to the constant seems to make it double-precision. For example:
double d = 0.123456789123456L;
But I did a quick Google search and did not find any documentation for this syntax. As KamilCuk points out, the "L" suffix is documented in the C spec. However, the gcc documentation seems to override it:
-fsingle-precision-constant causes floating-point constants to be loaded in single precision even when this is not exact.
To make sense of the option in the first place, we have to assume that the gcc documentation overrides the C spec. Therefore, in my reading, constants ending in "L" are not exempt from the demotion effected by -fsingle-precision-constant.
Is there a documented way to write a double-precision constant when using -fsingle-precision-constant?
With GMP rationals, am I responsible for keeping track of my calls to canonicalize() (which can be costly performance-wise)? Does GMP know that the rational was not changed since the last call to canonicalize(), and will it just return if I attempt canonicalization again?
I cannot find an answer in the documentation, and maybe someone already looked into the source for this.
It is unlikely to just return: mpq_t does not store any information about whether the fraction is already in canonical form. At least, the GMP documentation does not mention any as of 16.2 Rational Internals:
mpq_t variables represent rationals using an mpz_t numerator and
denominator (see Integer Internals).
In practice, it will likely call mpz_gcd() (or equivalent) to check whether the numerator and denominator are coprime.
As advised by an answer here, I turned on -Wbad-function-cast to see if my code had any bad behavior gcc could catch, and it turned up this example:
unsigned long n;
// ...
int crossover = (int)pow(n, .14);
(it's not critical here that crossover is an int; it could be unsigned long and the message would be the same).
This seems like a pretty ordinary and useful example of a cast. Why is this problematic? Otherwise, is there a reason to keep this warning turned on?
I generally like to set a lot of warnings, but I can't wrap my mind around the use case for this one. The code I'm working on is heavily numerical and there are lots of times that things are cast from one type to another as required to meet the varying needs of the algorithms involved.
You'd better take this warning seriously.
If you want to get an integer from the floating-point result of pow, that is a rounding operation, which should be done with one of the standard rounding functions like round. Doing it with an integer cast may yield surprises: you lose the fractional part, so both 2.76 and 2.12 end up as 2; and for negative values truncation and floor disagree, e.g. -2.76 truncates to -2 but floors to -3. Even if you want the truncating behavior, you'd better spell it out explicitly with floor (or trunc). This will improve the readability and maintainability of your code.
The utility of the -Wbad-function-cast warning is limited.
Likely it is no coincidence that neither -Wall nor -Wextra enables this warning. It is also not available for C++ (it applies to C/Objective-C only).
Your concrete example doesn't exploit undefined behavior nor implementation defined behavior (cf. ISO C11, Section 6.3.1.4). Thus, this warning gives you zero benefits.
In contrast, if you try to rewrite your code to make -Wbad-function-cast happy, you just add superfluous function calls that even recent GCC/Clang compilers don't optimize away at -O3:
#include <math.h>

int f(unsigned n)
{
    int crossover = lrint(floor(pow(n, .14)));
    return crossover;
}

(negative example: no warning is emitted with -Wbad-function-cast, but the superfluous function calls remain)
Is it possible to set the trigonometric functions to use degrees instead of radians?
Short answer
No, this is not possible. I'd suggest defining alternative functions and working with those: sinDeg[d_] := Sin[d Degree]. Or just use Degree explicitly: Sin[30 Degree]. (Try also entering ESC deg ESC.)
Longer answer
You can Unprotect the functions, and re-define them using the Gayley-Villegas trick, but this is very likely to break several things in Mathematica, as I expect it is using these functions internally.
Since this is such a nasty thing to do, I'm not going to give a code example, instead I'll leave it to you to figure out based on my link above. :-)
I think the output is based on the input. So for example Cos[60 Degree] will output in degrees.
Consider the following example:
int a[4];

int main() {
    a[4] = 12; // <--
    return 0;
}
This is clearly an out of bounds error, is it not? I was wondering when gcc would warn about this, and found that it will only do so if optimisation is -O2 or higher (this is affected by the -ftree-vrp option that is only set automatically for -O2 or higher).
I don't really see why this makes sense and whether it is correct that gcc does not warn otherwise.
The documentation has this to say about the matter:
This allows the optimizers to remove unnecessary range checks like array bound checks and null pointer checks.
Still, I don't see why that check should be unnecessary?
Your example is a case of constant propagation, not value range propagation, and it certainly triggers a warning on my version of gcc (4.5.1) whether or not -ftree-vrp is enabled.
In general, Java and Fortran are the only languages supported by gcc that will generate code for checking array bounds (Java by default, and Fortran only if you explicitly ask for it with -fbounds-check).
However, although C/C++ does not support any such thing, the compiler will still warn you at compile time if it believes that something is amiss. For constants, this is pretty obvious, for variable ranges, it is somewhat harder.
The clause "allows the compiler to remove unnecessary range checks" relates to cases where, for example, you use an unsigned 8-bit variable to index into an array with more than 256 entries, or an unsigned 16-bit value to index into an array with more than 65536 elements. It also covers iterating over an array in a loop whose (variable) counter is bounded by values provable at compile time to be legal array indices, so the counter can never go beyond the array bounds.
In such cases, the compiler will neither warn you nor generate any code for target languages where this is supported.
I have a medium-sized C99 program which uses the long double type (80 bit) for floating-point computation. I want to improve precision with the new GCC 4.6 extension __float128. As I understand it, this is software-emulated 128-bit precision math.
How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision?
What do I need to change? Compiler flags, sources?
My program reads full-precision values with strtod, performs a lot of different operations on them (like +-*/ sin, cos, exp and others from <math.h>) and prints them with printf.
PS: despite quad precision being declared only for Fortran (REAL*16), libquadmath is written in C and uses __float128. I'm unsure whether GCC will convert operations on __float128 into runtime-library calls, and I'm unsure how to migrate from long double to __float128 in my sources.
PPS: There is a documentation on "C" language gcc mode: http://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html
"GNU C compiler supports ... 128 bit (TFmode) floating types. Support for additional types includes the arithmetic operators: add, subtract, multiply, divide; unary arithmetic operators; relational operators; equality operators ... __float128 types are supported on i386, x86_64"
How should I convert my program from classic long double of 80-bit to quad floats of 128 bit with software emulation of full precision? What need I change? Compiler flags, sources?
You need recent software: a GCC version with support for the __float128 type (4.6 and newer) and libquadmath (supported only on x86 and x86_64 targets; also on IA64 and HPPA with newer GCC). You should add the linker flag -lquadmath (the error `cannot find -lquadmath` shows that you don't have libquadmath installed).
Add the #include <quadmath.h> header to get the macro and function definitions.
You should modify all long double variable definitions to __float128.
Complex variables may be changed to __complex128 type (quadmath.h) or directly with typedef _Complex float __attribute__((mode(TC))) _Complex128;
All simple arithmetic operations are automatically handled by GCC (converted to calls of helper functions like __*tf3()).
If you use any macro like LDBL_*, replace them with FLT128_* (full list http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
If you need some specific constants like pi (M_PI) or e (M_E) with quadruple precision, use predefined constants with q suffix (M_*q), like M_PIq and M_Eq (full list http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
User-defined constants may be written with Q suffix, like 1.3000011111111Q
All math function calls should be replaced with *q versions, like sqrtq(), sinq() (full list http://gcc.gnu.org/onlinedocs/libquadmath/Math-Library-Routines.html#Math-Library-Routines)
Reading a quad-float from a string should be done with __float128 strtoflt128(const char *s, char **sp) - http://gcc.gnu.org/onlinedocs/libquadmath/strtoflt128.html#strtoflt128 (Warning: in older libquadmath versions there may be some bugs in strtoflt128, so double-check the results.)
Printing a __float128 is done with the help of the quadmath_snprintf function. On Linux distributions with a recent glibc, the function is automagically registered by libquadmath to handle the Q (and possibly q) length modifier of the a, A, e, E, f, F, g, G conversion specifiers in all printf/sprintf variants, just as L is handled for long double. Example: printf("%Qe", 1.2Q), http://gcc.gnu.org/onlinedocs/libquadmath/quadmath_005fsnprintf.html#quadmath_005fsnprintf
You should also know that since 4.6, gfortran will use the __float128 type for DOUBLE PRECISION if the option -fdefault-real-8 was given and there was no -fdefault-double-8. This may be a problem, since 128-bit floating point is much slower than the standard long double on many platforms due to software computation. (Thanks to the post by Glenn Lockwood: http://glennklockwood.blogspot.com/2014/02/linux-perf-libquadmath-and-gfortrans.html)