gcc vector extensions don't work as stated in docs

gcc vector extensions don't work as stated in docs - gcc

As per Using vector instructions through built-in functions, this program should compile:
int main(){
double v_sse __attribute__ ((vector_size (16)));
/*
* Should work: "For the convenience in C it is allowed to use a binary vector operation where one operand is a scalar."
*/
v_sse=v_sse+3.4;
/*
* Should work: "Vectors can be subscripted as if the vector were an array with the same number of elements and base type."
*/
double result=v_sse[0];
}
Instead I get errors at both operations, complaining about invalid operand/types.
I compile on a x86-64 system, so -msse2 is implicit, and my compiler is 4.6.3 (tested also with 4.7.0, it doesn't work). Where's the catch?

The documentation at http://gcc.gnu.org/onlinedocs/gcc/ refers to current development.
Look at http://gcc.gnu.org/onlinedocs/gcc-X.Y.Z/ to find the documentation for the version you're using (see the index at http://gcc.gnu.org/onlinedocs/ for links to the docs for the most recent point release of each series).
In this case:
the binary operator functionality is not documented for 4.6.3 - because it was introduced as a new feature in 4.7: see the 4.7 release notes;
the subscript feature is present in 4.6.3; and
both features are specifically documented as working "in C"
...which explains what you're seeing.
Both of these do work with 4.7.0 when compiling as C -- but not when compiling as C++.

Related

Building coreboot: undefined reference __udivmoddi4

In building coreboot I got error regarding during linking:
coreboot/src/console/vtxprintf.c:102: undefined reference to '__udivmoddi4'.
Where can I find the library containing this function?
I'm building coreboot for x86_64 (Lenovo x230) using gcc (8.1.1 20180531).
Coreboot - git hash: f59a052ee8dae6f1378514cb622d229e652ad2f6

__udivmoddi4 is a function in libgcc which is used to implement a combined unsigned division/modulo operation for what GCC calls DI mode (doubled-up integers, 64-bit on i686). It is used for operations like this one:
unsigned long long
div (unsigned long long a, unsigned long long b, unsigned long long *p)
{
*p = a % b;
return a / b;
}
The use of __udivmoddi4 on i386 is a new optimization in GCC 7, related to this patch. Previous versions emitted separate calls to __umoddi3 and __udivdi3, basically doing the same work twice.
Normally, all these functions are provided by libgcc, but Coreboot does not link against the standard libraries. It supplies its own implementations of these functions in payloads/libpayload/libc/64bit_div.c, but __udivmoddi4 has yet to be added there.
Either you implement the function yourself (easiest way would be to call __umoddi3 and __udivdi3 from it), or you use GCC 6 to compile Coreboot for the time being. Lowering the optimization level just for the printf implementation might constitute a workaround.

I faced the same problem today, and found out that Coreboot is made to be built using the i386 toolchain, not the x86_64. See https://doc.coreboot.org/tutorial/part1.html:
Note that the i386 toolchain is currently used for all x86 platforms, including x86_64.
After cloning coreboot, one should run make crossgcc-i386 CPUS=$(nproc) to build a compatible toolchain.
Here is the mailing list that gave me the hint:
https://mail.coreboot.org/pipermail/coreboot/2018-August/087195.html

Storing pairs in a GCC rope with c++11

I'm using a GCC extension rope to store pairs of objects in my program and am running into some C++11 related trouble. The following compiles under C++98
#include <ext/rope>
typedef std::pair<int, int> std_pair;
int main()
{
__gnu_cxx::rope<std_pair> r;
}
but not with C++11 under G++ 4.8.2 or 4.8.3.
What happens is that the uninitialised_copy_n algorithm is pulled in from two places, the ext/memory and the C++11 version of the memory header. The gnu_cxx namespace is pulled in by rope and the std namespace is pulled in by pair and there are now two identically defined methods in scope leading to a compile error.
I assume this is a bug in a weird use case for a rarely used library but what would be the correct fix? You can't remove the function from ext/memory to avoid breaking existing code and it now required to be in std. I've worked around it using my own pair class but how should this be fixed properly?

If changing the libstdc++ headers is an option (and I asked in the comments whether you were looking for a way to fix it in libstdc++, or work around it in your program), then the simple solution, to me, seems to be to make sure there is only one uninitialized_copy_n function. ext/memory already includes <memory>, which provides std::uninitialized_copy_n. So instead of defining __gnu_cxx::uninitialized_copy_n, it can have using std::uninitialized_copy_n; inside the __gnu_cxx namespace. It can even conditionalize this on C++11 support, so that pre-C++11 code gets the custom implementation of those functions, and C++11 code gets the std implementation of those functions.
This way, code that attempts to use __gnu_cxx::uninitialized_copy_n, whether directly or through ADL, will continue to work, but there is no ambiguity between std::uninitialized_copy_n and __gnu_cxx::uninitialized_copy_n, because they are the very same function.

GCC atomic built-ins: Is there a list showing which are supported on which platform?

Is there a site listing the various platforms and their support for GCC's atomic built-ins, for the various GCC versions?
EDIT:
To be more clear:
GCC adds _sync... as intrinsics on platforms it contains support for. On all other platforms it keeps those as normal functions declarations but does not supply an implementation. This must be done by some framework.
So the question is: For which platforms does GCC supply which intrinsics without need to add a function implementation?

I'm not aware if there's such a list, however http://gcc.gnu.org/projects/cxx0x.html says atomics are supported since GCC 4.4.
GCC libstdc++ implements <atomic> on top of the builtin functions `__sync_fetch_and_add' and friends ( http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Atomic-Builtins.html ).
These functions are expanded either using machine specific expanders in the machine description of the target (usually in a file named `sync.md') or, lacking such expanders, using a CAS loop. If the presense of `sync.md' file is any indication for a proper atomics support, then you can count in MIPS, i386, ARM, BlackFin, Alpha, PowerPC, IA64 and Sparc.

[Though this is an old question, I thought I should update and complete the answer]
I am not aware of a per-architecture-version and per-gcc-version table, describing supported built-ins.
The __sync built-in functions of gcc exist since version 4.1 (see, e.g., gcc 4.1.2 manual. As stated there:
Not all operations are supported by all target processors. If a particular operation cannot be implemented on the target processor, a warning will be generated and a call an external function will be generated. The external function will carry the same name as the builtin, with an additional suffix `_n' where n is the size of the data type.
So, when there is not an implementation for a specific architecture, a compilation warning will appear and, I guess, a link-time error, unless you provide the required function with the appropriate name.
After gcc 4.7 there are also __atomic built-ins and __sync built-ins are deprecated.
For example, see how Fedora uses gcc __sync and __atomic here

Where Is gcvt or gcvtf Defined in gcc Source Code?

I'm working on some old source code for an embedded system on an m68k target, and I'm seeing massive memory allocation requests sometimes when calling gcvtf to format a floating point number for display. I can probably work around this by writing my own substitute routine, but the nature of the error has me very curious, because it only occurs when the heap starts at or above a certain address, and it goes away if I hack the .ld linker script or remove any set of global variables (which are placed before the heap in my memory map) that add up to enough byte size so that the heap starts below the mysterious critical address.
So, I thought I'd look in the gcc source code for the compiler version I'm using (m68k-elf-gcc 3.3.2). I downloaded what appears to be the source for this version at http://gcc.petsads.us/releases/gcc-3.3.2/, but I can't find the definition for gcvt or gcvtf anywhere in there. When I search for it, grep only finds some documentation and .h references, but not the definition:
$ find | xargs grep gcvt
./gcc/doc/gcc.info: C library functions `ecvt', `fcvt' and `gcvt'. Given va
lid
./gcc/doc/trouble.texi:library functions #code{ecvt}, #code{fcvt} and #code{gcvt
}. Given valid
./gcc/sys-protos.h:extern char * gcvt(double, int, char *);
So, where is this function actually defined in the source code? Or did I download the entirely wrong thing?
I don't want to change this project to use the most recent gcc, due to project stability and testing considerations, and like I said, I can work around this by writing my own formatting routine, but this behavior is very confusing to me, and it will grind my brain if I don't find out why it's acting so weird.

Wallyk is correct that this is defined in the C library rather than the compiler. However, the GNU C library is (nearly always) only used with Linux compilers and distributions. Your compiler, being a "bare-metal" compiler, almost certainly uses the Newlib C library instead.
The main website for Newlib is here: http://sourceware.org/newlib/, and this particular function is defined in the newlib/libc/stdlib/efgcvt.c file. The sources have been quite stable for a long time, so (unless this is a result of a bug) chances are pretty good that the current sources are not too different from what your compiler is using.
As with the GNU C source, I don't see anything in there that would obviously cause this weirdness that you're seeing, but it's all eventually a bunch of wrappers around the basic sprintf routines.

It is in the GNU C library as glibc/misc/efgcvt.c. To save you some trouble, the code for the function is:
char *
__APPEND (FUNC_PREFIX, gcvt) (value, ndigit, buf)
FLOAT_TYPE value;
int ndigit;
char *buf;
{
sprintf (buf, "%.*" FLOAT_FMT_FLAG "g", MIN (ndigit, NDIGIT_MAX), value);
return buf;
}
The directions for obtain glibc are here.

Getting GCC to compile without inserting call to memcpy

I'm currently using GCC 4.5.3, compiled for PowerPC 440, and am compiling some code that doesn't require libc. I don't have any direct calls to memcpy(), but the compiler seems to be inserting one during the build.
There are linker options like -nostdlib, -nostartfiles, -nodefaultlibs but I'm unable to use them as I'm not doing the linking phase. I'm only compiling. With something like this:
$ powerpc-440-eabi-gcc -O2 -g -c -o output.o input.c
If I check the output.o with nm, I see a reference to memcpy:
$ powerpc-440-eabi-nm output.o | grep memcpy
U memcpy
$
The GCC man page makes it clear how to remove calls to memcpy and other libc calls with the linker, but I don't want the compiler to insert them in the first place, as I'm using a completely different linker (not GNU's ld, and it doesn't know about libc).
Thanks for any help you can provide.

There is no need to -fno-builtins or -ffreestanding as they will unnecessarily disable many important optimizations
This is actually "optimized" by gcc's tree-loop-distribute-patterns, so to disable the unwanted behavior while keeping the useful builtin capabilities, you can just use:
-fno-tree-loop-distribute-patterns
Musl-libc uses this flag for its build and has the following note in their configure script (I looked through the source and didn't find any macros, so this should be enough)
# Check for options that may be needed to prevent the compiler from
# generating self-referential versions of memcpy,, memmove, memcmp,
# and memset. Really, we should add a check to determine if this
# option is sufficient, and if not, add a macro to cripple these
# functions with volatile...
# tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
You can also add this as an attribute to individual functions in gcc using its optimize attribute, so that other functions can benefit from calling mem*()
__attribute__((optimize("no-tree-loop-distribute-patterns")))
size_t strlen(const char *s){ //without attribute, gcc compiles to jmp strlen
size_t i = -1ull;
do { ++i; } while (s[i]);
return i;
}
Alternatively, (at least for now) you may add a confounding null asm statement into your loop to thwart the pattern recognition.
size_t strlen(const char *s){
size_t i = -1ull;
do {
++i;
asm("");
} while (s[i]) ;
return i;
}

Gcc emits call to memcpy in some circumstance, for example if you are copying a structure.
There is no way to change GCC behaviour but you can try to avoid this by modifying your code to avoid such copy. Best bet is to look at the assembly to figure out why gcc emitted the memcpy and try to work around it. This is going to be annoying though, since you basically need to understand how gcc works.
Extract from http://gcc.gnu.org/onlinedocs/gcc/Standards.html:
Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC will emit a call to abort.

You need to disable a that optimization with -fno-builtin. I had this problem once when trying to compile memcpy for a C library. It called itself. Oops!

You can also make your binary a "freestanding" one:
The ISO C standard defines (in clause 4) two classes of conforming implementation. A conforming hosted implementation supports the whole standard [...]; a conforming freestanding implementation is only required to provide certain library facilities: those in , , , and ; since AMD1, also those in ; and in C99, also those in and . [...].
The standard also defines two environments for programs, a freestanding environment, required of all implementations and which may not have library facilities beyond those required of freestanding implementations, where the handling of program startup and termination are implementation-defined, and a hosted environment, which is not required, in which all the library facilities are provided and startup is through a function int main (void) or int main (int, char *[]).
An OS kernel would be a freestanding environment; a program using the facilities of an operating system would normally be in a hosted implementation.
(paragraph added by me)
More here. And the corresponding gcc option/s (keywords -ffreestanding or -fno-builtin) can be found here.

This is quite an old question, but I've hit the same issue, and none of the solutions here worked.
So I defined this function:
static __attribute__((always_inline)) inline void* imemcpy (void *dest, const void *src, size_t len) {
char *d = dest;
const char *s = src;
while (len--)
*d++ = *s++;
return dest;
}
And then used it instead of memcpy. This has solved the inlining issue for me permanently. Not very useful if you are compiling some sort of library though.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

gcc vector extensions don't work as stated in docs - gcc

Related

Building coreboot: undefined reference __udivmoddi4

Storing pairs in a GCC rope with c++11

GCC atomic built-ins: Is there a list showing which are supported on which platform?

Where Is gcvt or gcvtf Defined in gcc Source Code?

Getting GCC to compile without inserting call to memcpy

Categories

Resources