glibc set __TIMESIZE - linux-kernel

I'm trying to port my 32-bit ARM target to 64-bit time values.
The answers to "64-bit time_t in Linux Kernel" tell me the following:
All user space must be compiled with a 64-bit time_t, which will be supported in the coming musl-1.2 and glibc-2.32 releases, along with installed kernel headers from linux-5.6 or higher.
I'm using a custom Linux kernel, version 5.10.10, and I build my own GCC toolchain with crosstool-ng using glibc 2.32.
To test the time sizes, I wrote a simple print:
printf("Timesize = %d\n", __TIMESIZE);
printf("sizeof time_t = %zu\n", sizeof(tv.tv_sec));
Which gives me the output:
Timesize = 32
sizeof time_t = 4
Following the defines and typedefs in glibc-2.32, I can see the relevant types defined as follows:
#define __TIMESIZE __WORDSIZE
#define __TIME_T_TYPE __SLONGWORD_TYPE
#define __SLONGWORD_TYPE long int
But __WORDSIZE and __SLONGWORD_TYPE are both 32 bits wide on my architecture.
Is it possible to have time_t defined as a 64-bit type on my 32-bit target?
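One direction worth checking, sketched under assumptions: to my knowledge the user-facing switch for this arrived in glibc 2.34, not 2.32. From that release on, 32-bit ports that default to a 32-bit time_t can opt into a 64-bit one by defining _TIME_BITS=64 together with _FILE_OFFSET_BITS=64 at compile time. The helper names below are made up for illustration; they only report what the headers actually provide.

```c
/* Build for the 32-bit target with, for example:
 *   gcc -D_TIME_BITS=64 -D_FILE_OFFSET_BITS=64 -c timesize.c
 * Assumption: glibc 2.34 or later. Older glibc simply ignores _TIME_BITS. */
#include <limits.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: width of time_t in bits for this translation unit. */
int time_t_bits(void)
{
    return (int)(sizeof(time_t) * CHAR_BIT);
}

/* Hypothetical helper: print the width and return it so callers can check it. */
int report_time_t(FILE *out)
{
    int bits = time_t_bits();
    fprintf(out, "sizeof(time_t) = %zu bytes (%d bits)\n", sizeof(time_t), bits);
    return bits;
}
```

If report_time_t(stdout) still prints 32 bits with both macros defined, the toolchain's glibc most likely predates the _TIME_BITS support.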

Related

Simple OpenMP program for detecting GPU and other devices does not work

I'm trying to write a very simple OpenMP example, using GCC 12 and the target directive, to detect all my devices, but it does not work. I have a Lenovo W540 running Debian Sid with an integrated Intel HD Graphics 4600 (the GPU used by the GUI, Plasma) and an NVIDIA GK106GLM [Quadro K2100M]. I have installed the xserver-xorg-video-nvidia-legacy-390xx, nvidia-legacy-390xx-driver, and nvidia-legacy-390xx-kernel-dkms packages, as they are the corresponding drivers for my GPU. Running the lspci -k | grep -A 2 -i "VGA" command I obtain
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
Subsystem: Lenovo 4th Gen Core Processor Integrated Graphics Controller
Kernel driver in use: i915
--
01:00.0 VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1)
Subsystem: Lenovo GK106GLM [Quadro K2100M]
Kernel modules: nvidia
so I suppose that the NVIDIA GPU is correctly configured and detected. Am I right?
Then, I borrowed this simple code from https://enccs.github.io/openmp-gpu/target/
/* Copyright (c) 2019 CSC Training */
/* Copyright (c) 2021 ENCCS */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif
int main()
{
  int num_devices = omp_get_num_devices();
  printf("Number of available devices %d\n", num_devices);
  #pragma omp target
  {
    if (omp_is_initial_device()) {
      printf("Running on host\n");
    } else {
      int nteams = omp_get_num_teams();
      int nthreads = omp_get_num_threads();
      printf("Running on device with %d teams in total and %d threads in each team\n", nteams, nthreads);
    }
  }
}
Compiling with gcc -Wall -fopenmp example.c -o example works fine and produces neither warnings nor errors, but at run time I simply obtain
Number of available devices 0
Running on host
so neither the Intel nor the NVIDIA GPU was detected. Looking through the Debian repositories, I found and installed the packages gcc-offload-nvptx, gcc-12-offload-nvptx, libgomp-plugin-nvptx1, and nvptx-tools. Now, when I compile again with gcc -Wall -fopenmp example.c -o example, I obtain a warning:
/usr/bin/ld: /tmp/ccrWP8gn.crtoffloadtable.o: warning: relocation against `__offload_vars_end' in read-only section `.rodata'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
When I execute the code I obtain again
Number of available devices 0
Running on host
but in this case the execution is terribly slow (several seconds).
Searching the web, I've seen that I must add the option -foffload=nvptx-none to the compilation command, but I obtain the same results as before (this option is only recognized if gcc-offload-nvptx et al. are installed).
Running gcc -v I can see that GCC 12 in Debian is configured for offloading:
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
I don't know which number is my NVIDIA card, but I've tried export OFFLOAD_TARGET_DEFAULT=0 and export OFFLOAD_TARGET_DEFAULT=2 with the same wrong result of
Number of available devices 0
Running on host
So, how can I run my OpenMP code on the GPU?
Thanks
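One way to make the fallback visible, as a minimal sketch: the helper below (the name target_ran_on_host is made up) copies the result of omp_is_initial_device() out of the target region with a map clause, so the check does not depend on printf working on the device. Running the program with the environment variable OMP_TARGET_OFFLOAD=MANDATORY should, assuming the libgomp shipped with GCC 12 honours it as OpenMP 5.0 describes, stop with an error instead of silently running on the host when no device can be initialised. OFFLOAD_TARGET_DEFAULT in the gcc -v output appears to describe how the compiler driver was configured, not a device number; the default device is selected with OMP_DEFAULT_DEVICE.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: returns 1 if the target region executed on the host,
 * 0 if it executed on an offload device. Without -fopenmp the pragma is
 * not compiled in and the function reports the host. */
int target_ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENMP
    #pragma omp target map(from: on_host)
    {
        on_host = omp_is_initial_device() ? 1 : 0;
    }
#endif
    return on_host;
}

/* Hypothetical helper: number of offload devices the runtime can see
 * (always 0 when built without OpenMP). */
int visible_devices(void)
{
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}
```

Printing visible_devices() and target_ran_on_host() from main, after building with gcc -fopenmp -foffload=nvptx-none, shows whether the runtime found a device and whether the region really left the host.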

Is my OpenACC code compiled with GCC running on GPU?

I'm trying to write a very simple OpenACC example using GCC as the compiler, but I'm not sure whether the program runs on the GPU. I have a Lenovo W540 running Debian Sid with an integrated Intel HD Graphics 4600 (the GPU used by the GUI, Plasma) and an NVIDIA GK106GLM [Quadro K2100M]. I have installed the xserver-xorg-video-nvidia-legacy-390xx, nvidia-legacy-390xx-driver, and nvidia-legacy-390xx-kernel-dkms packages, as they are the corresponding drivers for my GPU. Running the lspci -k | grep -A 2 -i "VGA" command I obtain
00:02.0 VGA compatible controller: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller (rev 06)
Subsystem: Lenovo 4th Gen Core Processor Integrated Graphics Controller
Kernel driver in use: i915
--
01:00.0 VGA compatible controller: NVIDIA Corporation GK106GLM [Quadro K2100M] (rev a1)
Subsystem: Lenovo GK106GLM [Quadro K2100M]
Kernel modules: nvidia
so I suppose that the NVIDIA GPU is correctly configured and detected. Am I right?
On the other hand, I've installed GCC 12 and also the gcc-offload-nvptx package. First question: is this package necessary in order to run programs on the GPU?
Well, I wrote this very simple code:
#include <stdio.h>
#ifdef _OPENACC
#include <openacc.h>
#endif
#define N 10
int main()
{
  int i = 0, buf[N];
#ifdef _OPENACC
  printf("Number of devices: %d\n", acc_get_num_devices(acc_device_nvidia));
#endif
  #pragma acc parallel loop copyout(buf[0:N])
  for (i = 0; i < N; i++)
  {
    buf[i] = i;
  }
  for (i = 0; i < N; i++)
  {
    printf("%d ", buf[i]);
  }
  printf("\n");
  return 0;
}
If I compile as gcc -Wall -fopenacc example.c -o example I obtain a warning:
/usr/bin/ld: /tmp/cckzNCnl.crtoffloadtable.o: warning: relocation against `__offload_vars_end' in read-only section `.rodata'
/usr/bin/ld: warning: creating DT_TEXTREL in a PIE
but I can't interpret it. If I run the code, it works, but it is very, very slow and I obtain
Number of devices: 0
0 1 2 3 4 5 6 7 8 9
The result is correct, and I could believe that it was executed on the GPU because it is extremely slow (maybe due to the data copy?), but I obtain the result Number of devices: 0.
Can someone give me any hint about how to compile and run the code on the GPU?
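Timing is a weak signal here, so a direct check may help. A minimal sketch: the helper below (the name region_ran_on_host is made up) asks acc_on_device(acc_device_host) inside the compute region and copies the answer back. A result of 1, together with acc_get_num_devices(acc_device_nvidia) returning 0, would suggest the loop ran on the host as a fallback. Setting the environment variable GOMP_DEBUG=1 should additionally make libgomp print what it is doing, assuming the Debian build honours that variable.

```c
#include <stdio.h>
#ifdef _OPENACC
#include <openacc.h>
#endif

/* Hypothetical helper: returns 1 if the parallel region ran on the host,
 * 0 if it ran on an accelerator. Without -fopenacc the pragma is not
 * compiled in and the function reports the host. */
int region_ran_on_host(void)
{
    int on_host = 1;
#ifdef _OPENACC
    /* One gang is enough: the region only records where it executes. */
    #pragma acc parallel num_gangs(1) copyout(on_host)
    {
        on_host = acc_on_device(acc_device_host) ? 1 : 0;
    }
#endif
    return on_host;
}
```

Calling region_ran_on_host() from main after building with gcc -fopenacc -foffload=nvptx-none gives a yes/no answer that does not rely on how slow the run feels.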

Unknown type name ‘off64_t’

I have a problem using Apache Portable Runtime on Ubuntu with GCC 4.8.1.
The problem is that off64_t from <sys/types.h> is not available when compiling with gcc. (When compiling with g++, everything works fine.)
Does anybody know which compiler switch enables off64_t? (I know that defining _LARGEFILE_SOURCE and _LARGEFILE64_SOURCE avoids the problem, but I wonder whether this is the right way.)
To reproduce the error one can simply try to compile the following code:
#include <sys/types.h>
off64_t a_variable;
off64_t is not a language defined type. No compiler switch will make it available.
It is defined in sys/types.h, but (on a 32-bit system) only if:
_LARGEFILE64_SOURCE is defined, which makes the 64-bit interfaces available (off64_t, lseek64(), etc.). The 32-bit interfaces remain available under their original names.
_FILE_OFFSET_BITS is defined as 64, which makes the names of the (otherwise 32-bit) functions and data types refer to their 64-bit counterparts: off_t will be off64_t, lseek() will use lseek64(), and so on. The 32-bit interface is no longer available.
Make sure that if you define these macros anywhere in your program, you define them at the beginning of all your source files, before the first #include. You don't want ODR violations to be biting you in the ass.
Note, this is for a 32 bit system, where off_t is normally a 32 bit value.
On a 64 bit system, the interface is already 64 bits wide, you don't need to use these macros to get the large file support.
off_t is a 64 bit type, lseek() expects a 64 bit offset, and so on.
Additionally, the types and functions with 64 in their name add nothing there; glibc still declares them when _LARGEFILE64_SOURCE is defined, but there's no point in using them.
See http://linux.die.net/man/7/feature_test_macros
and http://en.wikipedia.org/wiki/Large_file_support
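As a small illustration of the _FILE_OFFSET_BITS route, here is a sketch that fails to compile if the macro did not take effect. The helper name off_t_bits is made up; the _Static_assert requires a C11 compiler, which is an assumption about the toolchain.

```c
/* Must appear before any #include, or be passed as -D_FILE_OFFSET_BITS=64. */
#define _FILE_OFFSET_BITS 64

#include <limits.h>
#include <sys/types.h>

/* Compilation stops here if off_t is still 32 bits wide, for example
 * because a header was included before the macro was defined. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits wide");

/* Hypothetical helper: width of off_t in bits for this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}
```

With this in place the code can use plain off_t and lseek() and never needs the off64_t name at all.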
You may also be interested to know that when using g++, _GNU_SOURCE is automatically defined, which (with the GNU C library) leads to _LARGEFILE64_SOURCE being defined. That is why compiling your test program with g++ makes off64_t visible. I assume APR uses the same logic to get _LARGEFILE64_SOURCE defined.
Redefine off64_t to __off64_t in your compile flag. Edit your Makefile so it contains:
CFLAGS= -Doff64_t=__off64_t
then, just run $ make 1 (assuming you have 1.c in your directory)
A bit late, but still current.
I simply add -Doff64_t=_off64_t to the compiler flags.
In my environment (gcc version 4.1.2), I need to define __USE_LARGEFILE64. I found this macro in /usr/include/unistd.h, which declares lseek64().
#define __USE_LARGEFILE64
#include <sys/types.h>
#include <unistd.h>
You should define $C_INCLUDE_PATH to point to linux headers, something like
export C_INCLUDE_PATH=/usr/include/x86_64-linux-gnu
To install linux header, use
sudo apt-get install linux-headers-`uname -r`
P.S.
$ cat 1.c
#include <sys/types.h>
off64_t a_variable;
int main(){return 0;}
$ gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
$ echo $C_INCLUDE_PATH
/usr/include/x86_64-linux-gnu
$ grep off64_t /usr/include/x86_64-linux-gnu/sys/types.h
typedef __off64_t off_t;
#if defined __USE_LARGEFILE64 && !defined __off64_t_defined
typedef __off64_t off64_t;
# define __off64_t_defined
Sorry for the lateness, but I never had to embed Perl code in C programs until today ^^
I solved the issue on Unix/Linux systems (I think something similar is possible on Windows since Vista) by creating a symbolic link pointing to the CORE folder of the installed Perl version:
ln -s $(perl -MConfig -e 'print $Config{archlib}')/CORE /usr/include/perl
Then, in your project's source code, simply add:
#include <perl/EXTERN.h>
#include <perl/perl.h>
...and I went from a long list of notes and errors related to off_t and off64_t to a clean build ^^
Also late to the party, but in my case the main reason for this issue was having installed the 64-bit version of MinGW instead of the 32-bit one:
https://sourceforge.net/projects/mingw/

Compiling CUDA with dynamic parallelism fallback - multiple architectures/compute capability

In one application, I've got a bunch of CUDA kernels. Some use dynamic parallelism and some don't. For the purposes of either providing a fallback option if this is not supported, or simply allowing the application to continue but with reduced/partially available features, how can I go about compiling?
At the moment I'm getting invalid device function when running kernels compiled with -arch=sm_35 on a 670 (max sm_30) that don't require compute 3.5.
AFAIK you can't use multiple -arch=sm_* arguments and using multiple -gencode=* doesn't help. Also for separable compilation I've had to create an additional object file using -dlink, but this doesn't get created when using compute 3.0 (nvlink fatal : no candidate found in fatbinary due to -lcudadevrt, which I've needed for 3.5), how should I deal with this?
I believe this issue has been addressed now in CUDA 6.
Here's my simple test:
$ cat t264.cu
#include <stdio.h>
__global__ void kernel1(){
printf("Hello from DP Kernel\n");
}
__global__ void kernel2(){
#if __CUDA_ARCH__ >= 350
kernel1<<<1,1>>>();
#else
printf("Hello from non-DP Kernel\n");
#endif
}
int main(){
kernel2<<<1,1>>>();
cudaDeviceSynchronize();
return 0;
}
$ nvcc -O3 -gencode arch=compute_20,code=sm_20 -gencode arch=compute_35,code=sm_35 -rdc=true -o t264 t264.cu -lcudadevrt
$ CUDA_VISIBLE_DEVICES="0" ./t264
Hello from non-DP Kernel
$ CUDA_VISIBLE_DEVICES="1" ./t264
Hello from DP Kernel
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Sat_Jan_25_17:33:19_PST_2014
Cuda compilation tools, release 6.0, V6.0.1
$
In my case, device 0 is a Quadro5000, a cc 2.0 device, and device 1 is a GeForce GT 640, a cc 3.5 device.
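The device-side #if __CUDA_ARCH__ >= 350 guard above has a natural host-side counterpart: deciding at run time whether to take the dynamic-parallelism path at all. A minimal sketch in plain C, assuming the major and minor values come from the cudaDeviceProp structure filled in by cudaGetDeviceProperties; the helper name is made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: dynamic parallelism needs compute capability 3.5
 * or higher, so 3.0 (for example a GTX 670) must use the fallback path. */
bool supports_dynamic_parallelism(int major, int minor)
{
    return major > 3 || (major == 3 && minor >= 5);
}
```

Host code can use the result to disable the features that depend on dynamic parallelism while still launching the remaining kernels.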
I don't believe there is a way to do this using the runtime API as of CUDA 5.5.
The only way I can think of to get around the problem is to use the driver API to perform your own architecture selection and load code from different cubin files at runtime. The APIs can be safely mixed, so it is only the context establishment-device selection-module load phase which needs to be done with the driver API. You can use the runtime API after that - you will need a little bit of homemade syntactic sugar for the kernel launches, but otherwise no code changes are required in other runtime API code.

How could I know the version of gcc is 64-bit or 32-bit?

I am using Windows 7 64-bit. I don't know whether the gcc installed on this computer is 32-bit or 64-bit. (Windows 7 supports both 32- and 64-bit programs.)
You can inspect the output of gcc -v or you can use the more direct option -dumpmachine. The first option allows you to discover if GCC is capable of multilib (so that it can compile both 32 and 64-bit binaries), the second option will only return the default target (if I am not mistaken).
Write a C program as follows:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
  int *pointer;
  /* sizeof yields a size_t, so cast it before printing with %d */
  printf("%d\n", (int)sizeof(pointer));
  return 0;
}
Then compile and run it.
If the output is 8, the compiler is generating 64-bit code;
if the output is 4, the compiler is generating 32-bit code.
The size of a C pointer matches the width of the target the compiler builds for:
8 means 8 bytes = 64 bits
4 means 4 bytes = 32 bits
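The same information is available without reading raw numbers, using macros that GCC predefines for its target. A minimal sketch with made-up helper names; _WIN64, _WIN32 and __LP64__ are predefined by the compiler depending on the target, and the final branch is only a catch-all.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: pointer width in bits for the compilation target. */
int target_pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Hypothetical helper: describes the target from predefined macros. */
const char *target_description(void)
{
#if defined(_WIN64)
    return "64-bit Windows target";
#elif defined(_WIN32)
    return "32-bit Windows target";
#elif defined(__LP64__)
    return "64-bit (LP64) target";
#else
    return "32-bit or other target";
#endif
}
```

Both helpers describe the code the compiler produces by default, which is what matters for the programs you build; they do not say whether gcc.exe itself is a 32-bit or 64-bit executable.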
