I am trying to use lapack functions from C.
Here is some test code, copied from this question
#include <stdlib.h>
#include <stdio.h>
#include <time.h>
#include "clapack.h"
#include "cblas.h"
void invertMatrix(float *a, unsigned int height){
int info, ipiv[height];
info = clapack_sgetrf(CblasColMajor, height, height, a, height, ipiv);
info = clapack_sgetri(CblasColMajor, height, a, height, ipiv);
}
void displayMatrix(float *a, unsigned int height, unsigned int width)
{
int i, j;
for(i = 0; i < height; i++){
for(j = 0; j < width; j++)
{
printf("%1.3f ", a[height*j + i]);
}
printf("\n");
}
printf("\n");
}
int main(int argc, char *argv[])
{
int i;
float a[9], b[9], c[9];
srand(time(NULL));
for(i = 0; i < 9; i++)
{
a[i] = 1.0f*rand()/RAND_MAX;
b[i] = a[i];
}
displayMatrix(a, 3, 3);
return 0;
}
I compile this with gcc:
gcc -o test test.c \
-lblas -llapack -lf2c
n.b.: I've tried those libraries in various orders, and I've also tried other libs such as -latlas, -lcblas, -lgfortran, etc.
The error message is:
/tmp//cc8JMnRT.o: In function `invertMatrix':
test.c:(.text+0x94): undefined reference to `clapack_sgetrf'
test.c:(.text+0xb4): undefined reference to `clapack_sgetri'
collect2: error: ld returned 1 exit status
clapack.h is found and included (installed as part of atlas). clapack.h includes the offending functions --- so how can they not be found?
The symbols are actually in the library libalapack (found using strings). However, adding -lalapack to the gcc command seems to require adding -lcblas (lots of undefined cblas_* references). Installing cblas automatically uninstalls atlas, which removes clapack.h.
So, this feels like some kind of dependency hell.
I am on FreeBSD 10 amd64, all the relevant libraries seem to be installed and on the right paths.
Any help much appreciated.
Thanks
Ivan
I uninstalled everything remotely relevant --- blas, cblas, lapack, atlas, etc. --- then reinstalled atlas (from ports) alone, and then the lapack and blas packages.
This time around, /usr/local/lib contained a new lib file: libcblas.so --- previous random installations must have deleted it.
The gcc line that compiles is now:
gcc -o test test.c \
-llapack -lblas -lalapack -lcblas
Changing the order of the -l arguments doesn't seem to make any difference.
I have a short program that causes a segmentation fault on an RPi4 after it is run several times (e.g. 10 times in a loop).
I am using Raspbian GNU/Linux 10 (buster) and default gcc compiler (sudo apt install build-essential)
gcc --version
gcc (Raspbian 8.3.0-6+rpi1) 8.3.0
Do you think this is a gcc compiler problem? Maybe I am missing some special settings for RPi4.
I am using this to build:
gcc threads.c -o threads -l pthread
The output is sometimes (not always) something like this:
...
in thread_dummy, loop: 003
Segmentation fault
The code is here:
#include <stdio.h> /* for puts() */
#include <unistd.h> /* for sleep() */
#include <stdlib.h> /* for EXIT_SUCCESS */
#include <pthread.h>
#define PTR_SIZE (0xFFFFFF)
#define PTR_CNT (10)
void* thread_dummy(void* param)
{
void* ptr = malloc(PTR_SIZE);
//fprintf(stderr, "thread num: %03i, stack: %08X, heap: %08X - %08X\n", (int)param, (unsigned int)&param, (unsigned int)ptr, (unsigned int)((unsigned char*)ptr + PTR_SIZE));
fprintf(stderr, "in thread_dummy, loop: %03i\n", (int)param);
sleep(1);
free(ptr);
pthread_detach(pthread_self());
return NULL;
}
int main(void)
{
void* ptrs[PTR_CNT];
pthread_t threads[PTR_CNT];
for(int i=0; i<PTR_CNT; ++i)
{
ptrs[i] = malloc(PTR_SIZE);
//fprintf(stderr, "main num: %03i, stack: %08X, heap: %08X - %08X\n", i, (unsigned int)&ptrs, (unsigned int)ptrs[i], (unsigned int)((unsigned char*)ptrs[i] + PTR_SIZE));
fprintf(stderr, "in main, loop: %03i\n", i);
}
fprintf(stderr, "-----------------------------------------------------------\n");
for(int i=0; i<PTR_CNT; ++i)
pthread_create(&threads[i], 0, thread_dummy, (void*)i);
for(int i=0; i<PTR_CNT; ++i)
pthread_join(threads[i], NULL);
for(int i=0; i<PTR_CNT; ++i)
free(ptrs[i]);
return EXIT_SUCCESS;
}
UPDATE:
I also tested it with new gcc, but the problem remains...
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/local/libexec/gcc/arm-linux-gnueabihf/11.1.0/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../configure --enable-languages=c,c++,fortran --with-cpu=cortex-a72 --with-fpu=neon-fp-armv8 --with-float=hard --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 11.1.0 (GCC)
pthread_create is like malloc, and pthread_detach or pthread_join is like free. You are basically doing something like "double free" - you detach a thread and join it at the same time. Either detach or join the thread.
You could remove pthread_join from main, but the more logical fix is to remove pthread_detach(...) from inside the thread. It is useless there anyway, because the thread terminates right after the call.
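A minimal sketch of the join-only variant suggested above. The names worker, run_workers and BUF_SIZE are made up for this illustration and are not from the question; the thread function simply returns, and main-side code is the only place that releases each thread, via pthread_join.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define BUF_SIZE (0xFFFFFF)  /* same allocation size as in the question */

/* Thread body: allocate, report, free, return.
 * Note there is no pthread_detach() here, because the creator joins. */
void *worker(void *param)
{
    void *buf = malloc(BUF_SIZE);
    fprintf(stderr, "in worker, loop: %03d\n", (int)(intptr_t)param);
    free(buf);
    return NULL;
}

/* Hypothetical helper: starts `count` threads and joins every one that
 * was started. Returns the number of threads joined successfully. */
int run_workers(int count)
{
    assert(count > 0);

    pthread_t *threads = malloc((size_t)count * sizeof *threads);
    if (threads == NULL)
        return 0;

    /* Stop at the first failed pthread_create so we never join a
     * thread handle that was not initialised. */
    int started = 0;
    while (started < count &&
           pthread_create(&threads[started], NULL, worker,
                          (void *)(intptr_t)started) == 0)
        ++started;

    /* Each started thread is joined exactly once and never detached. */
    int joined = 0;
    for (int i = 0; i < started; ++i)
        if (pthread_join(threads[i], NULL) == 0)
            ++joined;

    free(threads);
    return joined;
}
```

Calling run_workers(10) from main mirrors the loop in the question. Build it the same way, with gcc and the pthread library.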
While trying to learn dynamic allocation in C++, every time I use the "new" operator, the code fails to compile.
I have already tried to use malloc and the other functions. They worked for a time, but now I need to dynamically declare an object and I can't seem to get malloc to work with that. (Also, I should be able to use new so...)
#include <stdio.h>
#include <stdlib.h>
int main(){
int * x = new int[10];
for(int i = 0; i < 10; i++){
x[i] = i;
}
for(int i = 0; i < 10; i++){
printf("%d\n", x[i]);
}
}
When I try to compile with "gcc main.cpp -o main.exe"
it always gives me the same error:
[temporary file name]cc8cDkmk.o:main.cpp:(.text+0x13): undefined reference to `operator new[](unsigned long long)'
I wrote a short program in C with OpenMP pragma, and I need to know to which libGOMP function a pragma is translated by GCC.
Here is my marvelous code:
#include <stdio.h>
#include "omp.h"
int main(int argc, char** argv)
{
int k = 0;
#pragma omp parallel private(k) num_threads(4)
{
k = omp_get_thread_num();
printf("Hello World from %d !\n", k);
}
return 0;
}
In order to generate intermediate language from GCC v8.2.0, I compiled this program with the following command:
gcc -fopenmp -o hello.exe hello.c -fdump-tree-ompexp
And the result is given by:
;; Function main (main, funcdef_no=0, decl_uid=2694, cgraph_uid=0, symbol_order=0)
OMP region tree
bb 2: gimple_omp_parallel
bb 3: GIMPLE_OMP_RETURN
Added new low gimple function main._omp_fn.0 to callgraph
Introduced new external node (omp_get_thread_num/2).
Introduced new external node (printf/3).
;; Function main._omp_fn.0 (main._omp_fn.0, funcdef_no=1, decl_uid=2700, cgraph_uid=1, symbol_order=1)
main._omp_fn.0 (void * .omp_data_i)
{
int k;
<bb 6> :
<bb 3> :
k = omp_get_thread_num ();
printf ("Hello World from %d !\n", k);
return;
}
;; Function main (main, funcdef_no=0, decl_uid=2694, cgraph_uid=0, symbol_order=0)
Merging blocks 2 and 7
Merging blocks 2 and 4
main (int argc, char * * argv)
{
int k;
int D.2698;
<bb 2> :
k = 0;
__builtin_GOMP_parallel (main._omp_fn.0, 0B, 4, 0);
D.2698 = 0;
<bb 3> :
<L0>:
return D.2698;
}
The function call to "__builtin_GOMP_parallel" is what interests me. So, I looked at the source code of libGOMP from GCC.
However, the only function calls I found were (from the parallel.c file):
GOMP_parallel_start (void (*fn) (void *), void *data, unsigned num_threads)
GOMP_parallel_end (void)
So, I can imagine that, in some way, the call to "__builtin_GOMP_parallel" is transformed into GOMP_parallel_start and GOMP_parallel_end.
How can I verify this assumption? How can I find the translation from the builtin function to the two other ones I found in the source code?
Thank you
You almost got it. __builtin_GOMP_parallel is just a compiler alias for GOMP_parallel (defined in omp-builtins.def), which is translated very late in compilation. You can see the actual call in the assembly with gcc -S.
GOMP_parallel is similar to
GOMP_parallel_start(...);
fn(...);
GOMP_parallel_end();
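To make the start / fn / end pattern above concrete, here is a rough illustration written with plain pthreads. It is not libgomp's implementation; sketch_parallel, count_call and demo_count are hypothetical names invented for this example, and a real runtime reuses a thread pool rather than creating threads on each call.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

typedef void (*region_fn)(void *);

struct launch { region_fn fn; void *data; };

/* Adapts the void-returning region body to the pthread signature. */
static void *trampoline(void *arg)
{
    struct launch *l = arg;
    l->fn(l->data);
    return NULL;
}

/* Hypothetical stand-in for the pattern: start helper threads, run the
 * body on the calling thread too, then wait for the helpers.
 * Returns how many threads executed fn. */
int sketch_parallel(region_fn fn, void *data, unsigned num_threads)
{
    assert(fn != NULL && num_threads >= 1);

    unsigned helpers = num_threads - 1;
    pthread_t *tids = malloc((helpers ? helpers : 1) * sizeof *tids);
    if (tids == NULL) {          /* fall back to running serially */
        fn(data);
        return 1;
    }

    struct launch l = { fn, data };
    unsigned started = 0;

    /* "start" phase: launch the other team members */
    while (started < helpers &&
           pthread_create(&tids[started], NULL, trampoline, &l) == 0)
        ++started;

    fn(data);                    /* the caller takes part as well */

    /* "end" phase: wait until every helper has finished */
    for (unsigned i = 0; i < started; ++i)
        pthread_join(tids[i], NULL);

    free(tids);
    return (int)started + 1;
}

/* Example region body: atomically counts how often it was run. */
void count_call(void *data)
{
    atomic_fetch_add((atomic_int *)data, 1);
}

/* Runs count_call on num_threads threads and returns the final count. */
int demo_count(unsigned num_threads)
{
    atomic_int calls = 0;
    sketch_parallel(count_call, &calls, num_threads);
    return atomic_load(&calls);
}
```

In this sketch the outlined function plays the role of main._omp_fn.0 from the dump, and the data pointer plays the role of .omp_data_i.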
I'm having an issue with my kernel.cu class
Calling nvcc -v kernel.cu -o kernel.o I'm getting this error:
kernel.cu(17): error: identifier "atomicAdd" is undefined
My code:
#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd
__global__ void dot (int *a, int *b, int *c){
__shared__ int temp[THREADS_PER_BLOCK];
int index = threadIdx.x + blockIdx.x * blockDim.x;
temp[threadIdx.x] = a[index] * b[index];
__syncthreads();
if( 0 == threadIdx.x ){
int sum = 0;
for( int i = 0; i<THREADS_PER_BLOCK; i++)
sum += temp[i];
atomicAdd(c, sum);
}
}
Any suggestions?
You need to specify an architecture to nvcc which supports atomic memory operations (the default architecture is 1.0 which does not support atomics). Try:
nvcc -arch=sm_11 -v kernel.cu -o kernel.o
and see what happens.
EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
Today with the latest cuda SDK and toolkit this solution will not work.
People also say that adding:
compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.
You have to specify this for the .cu file itself in its own properties (Under the C++/CUDA->Device->Code Generation) tab such as:
compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
I am not the best when it comes to compiling/writing makefiles.
I am trying to write a program that uses both GSL and OpenMP.
I have no problem using GSL and OpenMP separately, but I'm having issues using both. For instance, I can compile the GSL program
http://www.gnu.org/software/gsl/manual/html_node/An-Example-Program.html
By typing
$gcc -c Bessel.c
$gcc Bessel.o -lgsl -lgslcblas -lm
$./a.out
and it works.
I was also able to compile the program that uses OpenMP that I found here:
Starting a thread for each inner loop in OpenMP
In this case I typed
$gcc -fopenmp test_omp.c
$./a.out
And I got what I wanted (all 4 threads I have were used).
However, when I simply write a program that combines the two codes
#include <stdio.h>
#include <gsl/gsl_sf_bessel.h>
#include <omp.h>
int
main (void)
{
double x = 5.0;
double y = gsl_sf_bessel_J0 (x);
printf ("J0(%g) = %.18e\n", x, y);
int dimension = 4;
int i = 0;
int j = 0;
#pragma omp parallel private(i, j)
for (i =0; i < dimension; i++)
for (j = 0; j < dimension; j++)
printf("i=%d, jjj=%d, thread = %d\n", i, j, omp_get_thread_num());
return 0;
}
Then I try to compile by typing
$gcc -c Bessel_omp_test.c
$gcc Bessel_omp_test.o -fopenmp -lgsl -lgslcblas -lm
$./a.out
The GSL part works (The Bessel function is computed), but only one thread is used for the OpenMP part. I'm not sure what's wrong here...
You missed the for worksharing directive in your OpenMP part. It should be:
// Just in case GSL modifies the number of threads
omp_set_num_threads(omp_get_max_threads());
omp_set_dynamic(0);
#pragma omp parallel for private(i, j)
for (i =0; i < dimension; i++)
for (j = 0; j < dimension; j++)
printf("i=%d, jjj=%d, thread = %d\n", i, j, omp_get_thread_num());
Edit: To summarise the discussion in the comments below, the OP failed to supply -fopenmp during the compilation phase. That prevented GCC from recognising the OpenMP directives and thus no parallel code was generated.
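One way to check whether -fopenmp actually reached the compilation step is the standard _OPENMP macro, which the compiler defines only when OpenMP is enabled. The sketch below is an illustration only; openmp_enabled and grid_sum are hypothetical helper names. The loop gives the same result with or without OpenMP, because without the flag the pragma is simply ignored.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 if this file was compiled with OpenMP enabled
 * (for GCC: -fopenmp on the compile command), 0 otherwise. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Sums i*j over a dimension x dimension grid.
 * With OpenMP the outer loop is shared among threads; the loop
 * variable i is private automatically, j is made private explicitly,
 * and total is combined across threads with a reduction. */
long grid_sum(int dimension)
{
    assert(dimension >= 0);

    long total = 0;
    int i, j;
    #pragma omp parallel for private(j) reduction(+:total)
    for (i = 0; i < dimension; i++)
        for (j = 0; j < dimension; j++)
            total += (long)i * j;
    return total;
}
```

If openmp_enabled() returns 0, the object file was built without the flag. In the two-step build from the question, that means passing -fopenmp to the gcc -c command as well as to the link command.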
IMHO, it's incorrect to declare the variables i and j as shared. Try declaring them private. Otherwise, each thread would get the same j and j++ would generate a race condition among threads.