How to use _mm256_log_ps by leveraging Intel OpenCL SVML?

How to use _mm256_log_ps by leveraging Intel OpenCL SVML? - gcc

I found that _mm256_log_ps can't be used with GCC7. Most common suggestions on stackoverflow is to use ICC or leveraging OpenCL SDK.
After downloading SDK and extracting RPM file, there are three .so files: __ocl_svml_l9.so, __ocl_svml_e9.so, __ocl_svml_h8.so
Can someone teach me how to call _mm256_log_ps with these .so files?
Thank you.

You can use the log function from the Eigen library:
#include <Eigen/Core>
void foo(float* data, int size)
{
Eigen::Map<Eigen::ArrayXf> arr(data, size);
arr = arr.log();
}
Depending on the compile flags this generates optimized SSE or AVX code (as well as SIMD for other architectures). The implementation is based on http://gruntthepeon.free.fr/ssemath/ which is based on cephes.

Related

How to call C++ code in llvm ... IRBuilder?

I am using LLVM to build a simple language along the lines demonstrated in the LLVM kaleidoscope documentation. However the doc omits two important things (at least):
how to call C++ code
how to do strings
You can imagine a custom language allows constructs like,
String str("Hello World");
String str1(" and StackOverflow");
str.append(str1);
Since LLVM ships with SmallString, the obvious thing to do is treat the user type String as a llvm::SmallString and emit code for it. But how to do this? I see very little sense in creating a String class from scratch provided there's a way to invoke those C++ calls from LLVM. Would one use IRBuilder to build out the C++ calls?
Another option is to create C++ file containing the equivalent C++ code for the user provided code in terms of llvm::SmallString intermixed with inlined IR code ala the kaleidoscope demo ... and pass the whole thing to Clang/LLVM to compile down to object code. But Clang/LLVM doesn't allow intermixing this way.
Putting aside technicalities of creating static strings that go into the initializer how do people do this? Once this is sorted out for strings, most likely the approach would work for sets, hashmaps, vectors, and all the various other containers LLVM ships with.
LLVM is a strange, cool thing. The things I need are right there, and not readily accessible based on any documentation I can find.
I want a considered approach because even this simple hello world program generates 81 lines of IR code. Replace the llvm::SimpleString with a C-string and the resulting IR code is smaller almost by an order of 10:
#include <llvm/ADT/SmallString.h>
int main(int argc, char **argv)
{
llvm::SmallString<32> str("hello world\n");
printf("%s\n", str.c_str());
return 0;
}
That's a lot of code to hand assemble and we're no where close to supporting all the methods doable for strings.
Thanks.

What is the correct jmp_buf size?

I got a library compiled wit GCC for the ARM Cortex-M3 processor compiled as static lib. This library has a jmp_buf at its interface.
struct png_struct_def {
#ifdef PNG_SETJMP_SUPPORTED
jmp_buf jmpbuf;
#endif
png_error_ptr error_fn;
// a lot more members ...
};
typedef png_struct_def png_struct;
When I pass a png_struct address to the library function it stores the value not to error_fn but the last field of jmpbuf. Obviously it has been compiled with another assumptions of the size of a jmp_buf.
Both compiler versions are arm-none-eabi-gcc. Why is the code incompatible. And what is the "correct" jmp_buf size? I can see in the disassembly that only the first half of the jmp_buf is used. Why does the size changes between the versions of GCC when it is too large anyway?
Edit:
The library is compiled with another library that I can't recompile because the source code is not available. This other library uses this interface. So I can't change the structure of the interface.

You may simply re-order the data declaration. I would suggest the following,
typedef struct if {
int some_value;
union
{
jmp_buf jmpbuf;
char pad[511];
} __attribute__ ((__transparent_union__));
} *ifp;
The issue is that depending on the ARM library, different registers maybe saved. At a maximum, 16 32bit general purpose registers and 32 64bit NEON registers might be saved. This gives around 320 bytes. If you struct is not used many times, then you can over-allocate. This should work no matter which definition of jmp_buf you get.
If you can not recompile the library, you may try to use,
typedef struct if {
char pad[LIB_JMPBUF_SZ];
int some_value;
} *ifp;
where you calculated the jmp_buf size. The libc may have changed the definition of jmp_buf between versions. Also, even though the compiler names match, one may support floating point and another one does not, etc. Even if the versions match, it is conceivable that the compiler configuration can give different jmp_buf sizes.
Both suggestion are non-portable. The 2nd suggestion will not work if your code calls setjmp() or longjmp(). Ie, I assume that the library is using these functions and the caller allocates the space.

LAPACK with gcc startup guide

I need help to setup LAPACK in Linux gcc. I am new to LAPACK and have no knowledge in using Fortran.
I have downloaded lapack-3.4.0, and make the libraries to get
liblapack.a and librefblas.a.
Afterwards, i link these libraries to my program:
-llapack -lrefblas
I wanted to use the LAPACK functions like dpotrf, dgetrf, dgetri etc,
How do I include the header files in order for my compiler to find the functions? Is it neccessary for me to use lapacke, a C interface to LAPACK?

There are several ways of using LAPACK with c++. Here's what I'd do (assuming you are on some *nix system).
Make sure you have the right libraries and you know the right set of compiler/linker options. Since these are written in Fortran, I'd start with a Fortran code. Like this one. Make sure you can compile it using gfortran. Possible linker options can be (depending on your system): -llapack, -lblas or some combinations of these.
Then move onto using C++ interface. Again, there are multiple ways of doing. I personally find it the easiest to use the clapack interface, where you are declaring the LAPACk functions like this:
extern "C" void dsyev_( char *jobz, char *uplo, int &n, double *a, int &lda, double *w, double *work, int &lwork, int &info);
The right set of linker options again depends on your system and can be something like this: -llapack -lf77blas -latlas (this set works on my Ubuntu where LAPACK comes from the atlas package).

Getting GCC to compile without inserting call to memcpy

I'm currently using GCC 4.5.3, compiled for PowerPC 440, and am compiling some code that doesn't require libc. I don't have any direct calls to memcpy(), but the compiler seems to be inserting one during the build.
There are linker options like -nostdlib, -nostartfiles, -nodefaultlibs but I'm unable to use them as I'm not doing the linking phase. I'm only compiling. With something like this:
$ powerpc-440-eabi-gcc -O2 -g -c -o output.o input.c
If I check the output.o with nm, I see a reference to memcpy:
$ powerpc-440-eabi-nm output.o | grep memcpy
U memcpy
$
The GCC man page makes it clear how to remove calls to memcpy and other libc calls with the linker, but I don't want the compiler to insert them in the first place, as I'm using a completely different linker (not GNU's ld, and it doesn't know about libc).
Thanks for any help you can provide.

There is no need to -fno-builtins or -ffreestanding as they will unnecessarily disable many important optimizations
This is actually "optimized" by gcc's tree-loop-distribute-patterns, so to disable the unwanted behavior while keeping the useful builtin capabilities, you can just use:
-fno-tree-loop-distribute-patterns
Musl-libc uses this flag for its build and has the following note in their configure script (I looked through the source and didn't find any macros, so this should be enough)
# Check for options that may be needed to prevent the compiler from
# generating self-referential versions of memcpy,, memmove, memcmp,
# and memset. Really, we should add a check to determine if this
# option is sufficient, and if not, add a macro to cripple these
# functions with volatile...
# tryflag CFLAGS_MEMOPS -fno-tree-loop-distribute-patterns
You can also add this as an attribute to individual functions in gcc using its optimize attribute, so that other functions can benefit from calling mem*()
__attribute__((optimize("no-tree-loop-distribute-patterns")))
size_t strlen(const char *s){ //without attribute, gcc compiles to jmp strlen
size_t i = -1ull;
do { ++i; } while (s[i]);
return i;
}
Alternatively, (at least for now) you may add a confounding null asm statement into your loop to thwart the pattern recognition.
size_t strlen(const char *s){
size_t i = -1ull;
do {
++i;
asm("");
} while (s[i]) ;
return i;
}

Gcc emits call to memcpy in some circumstance, for example if you are copying a structure.
There is no way to change GCC behaviour but you can try to avoid this by modifying your code to avoid such copy. Best bet is to look at the assembly to figure out why gcc emitted the memcpy and try to work around it. This is going to be annoying though, since you basically need to understand how gcc works.
Extract from http://gcc.gnu.org/onlinedocs/gcc/Standards.html:
Most of the compiler support routines used by GCC are present in libgcc, but there are a few exceptions. GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp. Finally, if __builtin_trap is used, and the target does not implement the trap pattern, then GCC will emit a call to abort.

You need to disable a that optimization with -fno-builtin. I had this problem once when trying to compile memcpy for a C library. It called itself. Oops!

You can also make your binary a "freestanding" one:
The ISO C standard defines (in clause 4) two classes of conforming implementation. A conforming hosted implementation supports the whole standard [...]; a conforming freestanding implementation is only required to provide certain library facilities: those in , , , and ; since AMD1, also those in ; and in C99, also those in and . [...].
The standard also defines two environments for programs, a freestanding environment, required of all implementations and which may not have library facilities beyond those required of freestanding implementations, where the handling of program startup and termination are implementation-defined, and a hosted environment, which is not required, in which all the library facilities are provided and startup is through a function int main (void) or int main (int, char *[]).
An OS kernel would be a freestanding environment; a program using the facilities of an operating system would normally be in a hosted implementation.
(paragraph added by me)
More here. And the corresponding gcc option/s (keywords -ffreestanding or -fno-builtin) can be found here.

This is quite an old question, but I've hit the same issue, and none of the solutions here worked.
So I defined this function:
static __attribute__((always_inline)) inline void* imemcpy (void *dest, const void *src, size_t len) {
char *d = dest;
const char *s = src;
while (len--)
*d++ = *s++;
return dest;
}
And then used it instead of memcpy. This has solved the inlining issue for me permanently. Not very useful if you are compiling some sort of library though.

How do I use LAPACK in a Visual Studio 2008 project using Armadillo

I'm trying to use an open source library http://arma.sourceforge.net for linear algebra calculations. Some of the functions in Armadillo like pinv use LAPACK. I've written a very simple piece of code to use Armadillo to calculate pinv, but it produces a runtime error. This is probably because I do not have LAPACK linker flags in the sln file.
#include <iostream>
#include "armadillo"
using namespace arma;
using namespace std;
int main(int argc, char** argv)
{
mat A = rand<mat>(4,5);
mat pinverse = pinv(A);
A.print("A=");
return 0;
}

First things first, do you have LAPACK library? If not, get one (there's a number of implementations to choose). Otherwise, check that library's documentation or readme. There's nothing specific to Visual C++ here.
Usually all that's needed is:
add "lapack.lib" to linker input (in project settings).

In order to use LAPACK, assuming you are linking the libs to your project, you also need to uncomment #define ARMA_USE_LAPACK in Armadillo's config.hpp. Same thing goes for BLAS.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

How to use _mm256_log_ps by leveraging Intel OpenCL SVML? - gcc

Related

How to call C++ code in llvm ... IRBuilder?

What is the correct jmp_buf size?

LAPACK with gcc startup guide

Getting GCC to compile without inserting call to memcpy

How do I use LAPACK in a Visual Studio 2008 project using Armadillo

Categories

Resources