Why is my 980 TI outperforming my 1080?

I'm trying to make sure my new computer is set up properly, and noticed that its GeForce GTX 1080 GPU significantly underperforms the 980 Ti in my old system when running TensorFlow jobs. Since the systems differ in more than just the GPU, I wrote a small benchmark to isolate GPU matrix multiplication performance in TensorFlow. The results confirm that the new GPU is significantly slower. I suspect this has something to do with the installed software, but I've checked the obvious things: same python3, same cuDNN, same numpy. What could be causing this strange performance gap?
Benchmarking Script:
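import tensorflow as tf
import time

sess = tf.Session()
A = tf.random_uniform((1000, 1000))
# Chain 1000 matrix multiplications into a single graph.
for i in range(int(1e3)):
    A = tf.matmul(A, A)
# Time one execution of the whole chain.
# Note: time.clock() was removed in Python 3.8.
cur_time = time.clock()
sess.run(A)
print(time.clock() - cur_time)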
Old System (980 Ti):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.19
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.39GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
time elapsed: 0.81484
New System (1080):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.898
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
time elapsed: 1.2753620000000003
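A single timed call can be skewed by one-time setup costs, and on Unix `time.clock()` reported processor time rather than wall-clock time. The following is a minimal, framework-agnostic sketch of a fairer harness, not the script that produced the numbers above. `benchmark` and its parameters are hypothetical names, and the workload shown is only a stand-in for something like `lambda: sess.run(A)`.

```python
import statistics
import time


def benchmark(fn, warmup=2, repeats=5):
    """Return the median wall-clock seconds of fn() over `repeats` runs.

    `warmup` untimed calls are made first so that one-time setup costs
    are not counted. All names here are illustrative.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in CPU workload; on the real systems this would be the GPU call.
    elapsed = benchmark(lambda: sum(i * i for i in range(100_000)))
    print(f"median elapsed: {elapsed:.6f} s")
```

Comparing the median of several warmed-up runs on both machines should show whether the gap persists once setup costs are excluded.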

Related

Tensorflow 2 can't find gpu

OS : Ubuntu 18.04
CPU : Intel i7-8core
GPU : Nvidia GTX 1060
RAM : 16 GB
I installed TensorFlow 2 in my Anaconda virtual environment using the following command:
pip install tensorflow-gpu==2.0
The installation finished without errors, but when I call:
tf.test.is_gpu_available()
I get the following message:
I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-16 20:36:36.130788: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-11-16 20:36:36.131397: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bd440b6050 executing computations on platform Host. Devices:
2019-11-16 20:36:36.131414: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-16 20:36:36.133350: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-16 20:36:36.134418: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2019-11-16 20:36:36.134439: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: quartermaine
2019-11-16 20:36:36.134445: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: quartermaine
2019-11-16 20:36:36.134490: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.50.0
2019-11-16 20:36:36.134511: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.50.0
2019-11-16 20:36:36.134518: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.50.0
False
This indicates that no GPU is available.

First, try rebooting your system, as there might be some pending updates.
Second, try setting this before TensorFlow initializes CUDA (in practice, before importing it):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
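The environment variable only has an effect if it is set before CUDA is initialized, so it is safest to set it at the very top of the script. A minimal sketch under that assumption follows. `prepare_gpu_env` is a hypothetical helper, not a TensorFlow API, and the `nvidia-smi` lookup is only a rough sanity check that the driver tools are installed.

```python
import os
import shutil


def prepare_gpu_env(device_ids="0"):
    """Set CUDA_VISIBLE_DEVICES and report basic facts about the environment.

    Hypothetical helper: call it before importing TensorFlow.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = device_ids
    return {
        "CUDA_VISIBLE_DEVICES": os.environ["CUDA_VISIBLE_DEVICES"],
        # True when the NVIDIA driver's command line tool is on PATH.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    print(prepare_gpu_env("0"))
    # Only import TensorFlow after this point, for example:
    # import tensorflow as tf
```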

Xcode + OSX upgrade issue

ALL,
I hope someone here knows something about OS X and Xcode.
I have had a Mac laptop running OS X 10.8 for a long time. I installed the latest Xcode available for that version and developed my program on it.
Compilation and linking worked, and I was able to start the program from inside Xcode and, of course, debug it there.
Recently I bought a newer Mac laptop. I installed a newer Xcode on it and, since my program is hosted on GitHub, cloned my repository there and tried to compile and run it.
While compilation and linking were successful, execution failed before the program even started (the failure was somewhere inside the assembly code).
All the libraries I'm using were compiled with the same set of options, and compilation of my software was successful.
How do I find where/how to fix the issue?
I'm sure something inside Xcode changed which made the code generation fail. But how do I find what exactly?
Thank you.
[EDIT]
I just tried to run the program from the Terminal. Here is the output from the crash:
Process: dbhandler [11125]
Path: /Users/USER/*/dbhandler.app/Contents/MacOS/dbhandler
Identifier: abc.dbhandler
Version: 1.0 (1)
Code Type: X86-64 (Native)
Parent Process: ??? [1]
Responsible: dbhandler [11125]
User ID: 501
Date/Time: 2019-03-31 15:58:19.681 -0500
OS Version: Mac OS X 10.13.6 (17G65)
Report Version: 12
Anonymous UUID: AB9F6124-7868-5E43-BBB7-1A7D8A2DEF30
Sleep/Wake UUID: 2F231255-E0CA-47D4-8FD1-08C6F47A0627
Time Awake Since Boot: 58000 seconds
Time Since Wake: 350 seconds
System Integrity Protection: enabled
Crashed Thread: 0
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Termination Reason: DYLD, [0x1] Library missing
Application Specific Information:
dyld: launch, loading dependent libraries
Dyld Error Message:
Library not loaded: /usr/local/lib/liblibdbwindow.dylib
Referenced from: /Users/USER/*/dbhandler.app/Contents/MacOS/dbhandler
Reason: image not found
Binary Images:
0x109ff9000 - 0x10a02bff7 +abc.dbhandler (1.0 - 1) <F1A3B876-188B-3BCD-839F-D586CC3F400A> /Users/USER/*/dbhandler.app/Contents/MacOS/dbhandler
0x10a071000 - 0x10a113ff7 +libwx_osx_cocoau_xrc-3.1.1.0.0.dylib (0) <FCF4309C-08BD-3559-8E0F-47851DED7DE5> /Users/USER/*/libwx_osx_cocoau_xrc-3.1.1.0.0.dylib
0x10a1e1000 - 0x10a27cfff +libwx_osx_cocoau_html-3.1.1.0.0.dylib (0) <B24F932F-CA10-3FDF-8159-8E94046BA19D> /Users/USER/*/libwx_osx_cocoau_html-3.1.1.0.0.dylib
0x10a357000 - 0x10a378ff7 +libwx_osx_cocoau_qa-3.1.1.0.0.dylib (0) <F5CE0D9C-260C-3362-88CC-AE3D8D9F3119> /Users/USER/*/libwx_osx_cocoau_qa-3.1.1.0.0.dylib
0x10a3ad000 - 0x10a4d8fff +libwx_osx_cocoau_adv-3.1.1.0.0.dylib (0) <2AF1A22E-8DC9-30C3-898E-7BE762476889> /Users/USER/*/libwx_osx_cocoau_adv-3.1.1.0.0.dylib
0x10a6c3000 - 0x10abb9ff7 +libwx_osx_cocoau_core-3.1.1.0.0.dylib (0) <4EEDAE00-B05C-3D05-8488-7EC8F8F7A824> /Users/USER/*/libwx_osx_cocoau_core-3.1.1.0.0.dylib
0x10b0ad000 - 0x10b0c0fff +libwx_baseu_xml-3.1.1.0.0.dylib (0) <7CB5A2DA-A81D-3352-AE8D-D23ACBBC426D> /Users/USER/*/libwx_baseu_xml-3.1.1.0.0.dylib
0x10b0d7000 - 0x10b10cfff +libwx_baseu_net-3.1.1.0.0.dylib (0) <F758CDB6-A8A3-32A9-95D2-2155EBEB183B> /Users/USER/*/libwx_baseu_net-3.1.1.0.0.dylib
0x10b159000 - 0x10b348ff7 +libwx_baseu-3.1.1.0.0.dylib (0) <34B6DA7C-F404-3B1C-8EC9-F5FD781F0629> /Users/USER/*/libwx_baseu-3.1.1.0.0.dylib
0x10c57c000 - 0x10c5c6acf dyld (551.4) <8A72DE9C-A136-3506-AA02-4BA2B82DCAF3> /usr/lib/dyld
0x7fff51f5f000 - 0x7fff521edff7 com.apple.audio.toolbox.AudioToolbox (1.14 - 1.14) <E0B8B5D8-80A0-308B-ABD6-F8612102B5D8> /System/Library/Frameworks/AudioToolbox.framework/Versions/A/AudioToolbox
0x7fff528c1000 - 0x7fff528c1fff com.apple.Carbon (158 - 158) <F8B370D9-2103-3276-821D-ACC756167F86> /System/Library/Frameworks/Carbon.framework/Versions/A/Carbon
0x7fff52dd3000 - 0x7fff52dd3fff com.apple.Cocoa (6.11 - 22) <78E6C28E-4308-3D10-AD14-0CBCF6789B3F> /System/Library/Frameworks/Cocoa.framework/Versions/A/Cocoa
0x7fff55dc4000 - 0x7fff55e5ffff com.apple.framework.IOKit (2.0.2 - 1445.71.1) <2EA4F383-CAA9-3AF0-99C5-90C22ADAA6B6> /System/Library/Frameworks/IOKit.framework/Versions/A/IOKit
0x7fff5db0d000 - 0x7fff5db1cff3 com.apple.opengl (16.7.4 - 16.7.4) <9BDE8FF9-5418-3C70-8D1C-09656884CE48> /System/Library/Frameworks/OpenGL.framework/Versions/A/OpenGL
0x7fff7916b000 - 0x7fff7916cffb libSystem.B.dylib (1252.50.4) <CD555F3B-FDDB-35E5-A2FB-FBBF3D62031A> /usr/lib/libSystem.B.dylib
Model: MacBookAir7,2, BootROM MBA71.0178.B00, 2 processors, Intel Core i5, 1.8 GHz, 8 GB, SMC 2.27f2
Graphics: Intel HD Graphics 6000, Intel HD Graphics 6000, Built-In
Memory Module: BANK 0/DIMM0, 4 GB, DDR3, 1600 MHz, 0x802C, 0x4D5435324C3531324D3332443250462D3130
Memory Module: BANK 1/DIMM0, 4 GB, DDR3, 1600 MHz, 0x802C, 0x4D5435324C3531324D3332443250462D3130
AirPort: spairport_wireless_card_type_airport_extreme (0x14E4, 0x117), Broadcom BCM43xx 1.0 (7.77.37.31.1a9)
Bluetooth: Version 6.0.7f10, 3 services, 27 devices, 1 incoming serial ports
Network Service: Wi-Fi, AirPort, en0
Serial ATA Device: APPLE SSD SM0128G, 121.33 GB
USB Device: USB 3.0 Bus
USB Device: BRCM20702 Hub
USB Device: Bluetooth USB Host Controller
Thunderbolt Bus: MacBook Air, Apple Inc., 27.2
The error says that the library can't be found.
This library comes from my project and is written by me, but I don't see a way in Xcode to install the project.
On top of that, everything works out of the box on the old Mac.
Is this an indication of a 32/64-bit incompatibility?
[/EDIT]

Some questions that might help narrow this down:
Do you get any message in the output or the Issue navigator?
What type of program are you writing?
Since this was a new install, did you also install the command line tools?
If it is an iOS program, do you have the right components?
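The crash report in the question already names the immediate cause: dyld could not find `/usr/local/lib/liblibdbwindow.dylib`, and the binary is reported as X86-64 (Native), so this looks like a missing or misplaced library rather than a 32/64-bit mismatch. As a rough sketch, the relevant lines can be pulled out of a saved crash report and checked against the file system. `missing_library` is a hypothetical helper name, and the sample text is copied from the report above.

```python
import os
import re


def missing_library(crash_report):
    """Extract the 'Library not loaded' path and the image that referenced it.

    Returns None when the report contains no dyld 'Library not loaded' line.
    """
    lib = re.search(r"Library not loaded:\s*(.+)", crash_report)
    if lib is None:
        return None
    ref = re.search(r"Referenced from:\s*(.+)", crash_report)
    path = lib.group(1).strip()
    return {
        "library": path,
        "referenced_from": ref.group(1).strip() if ref else None,
        # Whether the path the binary asks for exists on this machine.
        "exists_on_disk": os.path.exists(path),
    }


if __name__ == "__main__":
    report = """Dyld Error Message:
Library not loaded: /usr/local/lib/liblibdbwindow.dylib
Referenced from: /Users/USER/*/dbhandler.app/Contents/MacOS/dbhandler
Reason: image not found
"""
    print(missing_library(report))
```

If `exists_on_disk` is false on the new Mac but true on the old one, the library was probably installed or copied to `/usr/local/lib` on the old machine at some point and never on the new one.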

Intel MPI benchmark fails when # bytes > 128: IMB-EXT

I just installed Linux and Intel MPI to two machines:
(1) Quite old (~8 years old) SuperMicro server, which has 24 cores (Intel Xeon X7542 X 4). 32 GB memory.
OS: CentOS 7.5
(2) New HP ProLiant DL380 server, which has 32 cores (Intel Xeon Gold 6130 X 2). 64 GB memory.
OS: OpenSUSE Leap 15
After installing the OS and Intel MPI, I compiled the Intel MPI Benchmarks and ran them:
$ mpirun -np 4 ./IMB-EXT
It is quite surprising that I get the same error when running IMB-EXT and IMB-RMA on both machines, even though the OS and everything else differ (even the GCC version used to compile the Intel MPI Benchmarks is different: GCC 6.5.0 on CentOS and GCC 7.3.1 on OpenSUSE).
On the CentOS machine, I get:
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.05 0.00
4 1000 30.56 0.13
8 1000 31.53 0.25
16 1000 30.99 0.52
32 1000 30.93 1.03
64 1000 30.30 2.11
128 1000 30.31 4.22
and on the OpenSUSE machine, I get
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.04 0.00
4 1000 14.40 0.28
8 1000 14.04 0.57
16 1000 14.10 1.13
32 1000 13.96 2.29
64 1000 13.98 4.58
128 1000 14.08 9.09
When I don't use mpirun (so only one process runs IMB-EXT), the benchmark runs through, but Unidir_Put needs at least 2 processes, so that doesn't help much. I also find that the functions using MPI_Put and MPI_Get are much slower than I expected from experience. Using MVAPICH on the OpenSUSE machine did not help either. The output is:
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 6 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.03 0.00
4 1000 17.37 0.23
8 1000 17.08 0.47
16 1000 17.23 0.93
32 1000 17.56 1.82
64 1000 17.06 3.75
128 1000 17.20 7.44
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 49213 RUNNING AT iron-0-1
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Update: I tested Open MPI, and it runs through smoothly (although my application does not recommend using Open MPI, and I still don't understand why Intel MPI and MVAPICH don't work...)
#---------------------------------------------------
# Benchmarking Unidir_Put
# #processes = 2
# ( 2 additional processes waiting in MPI_Barrier)
#---------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t[usec] Mbytes/sec
0 1000 0.06 0.00
4 1000 0.23 17.44
8 1000 0.22 35.82
16 1000 0.22 72.36
32 1000 0.22 144.98
64 1000 0.22 285.76
128 1000 0.30 430.29
256 1000 0.39 650.78
512 1000 0.51 1008.31
1024 1000 0.84 1214.42
2048 1000 1.86 1100.29
4096 1000 7.31 560.59
8192 1000 15.24 537.67
16384 1000 15.39 1064.82
32768 1000 15.70 2086.51
65536 640 12.31 5324.63
131072 320 10.24 12795.03
262144 160 12.49 20993.49
524288 80 30.21 17356.93
1048576 40 81.20 12913.67
2097152 20 199.20 10527.72
4194304 10 394.02 10644.77
Is there any chance that I am missing something in installing MPI or the OS on these servers? I assume the OS is the problem, but I'm not sure where to start...
Thanks a lot in advance,
Jae

Although this question is well written, you were not explicit about:
Intel MPI benchmark (please add header)
Intel MPI
Open MPI
MVAPICH
supported host network fabrics - for each MPI distribution
selected fabric while running MPI benchmark
Compilation settings
Debugging this kind of trouble with disparate host machines, multiple Linux distributions and compiler versions can be quite hard. Remote debugging on Stack Overflow is even harder.
First of all, ensure reproducibility. This seems to be the case. One of many debugging approaches, and the one I would recommend, is to reduce the complexity of the system as a whole, test smaller sub-systems, and start shifting responsibility to third parties. You may replace self-compiled executables with software packages provided by distribution package repositories or third parties like Conda.
Intel recently started to provide its libraries through YUM/APT repos as well as for Conda and PyPI. I found that this helps a lot with reproducible deployments of HPC clusters and even runtime/development environments. I recommend using them for CentOS 7.5.
YUM/APT repository for Intel MKL, Intel IPP, Intel DAAL, and Intel® Distribution for Python* (for Linux*):
Installing Intel® Performance Libraries and Intel® Distribution for Python* Using YUM Repository
Installing Intel® Performance Libraries and Intel® Distribution for Python* Using APT Repository
Conda* package/ Anaconda Cloud* support (Intel MKL, Intel IPP, Intel DAAL, Intel Distribution for Python):
Installing Intel Distribution for Python and Intel Performance Libraries with Anaconda
Available Intel packages can be viewed here
Install from the Python Package Index (PyPI) using pip (Intel MKL, Intel IPP, Intel DAAL)
Installing the Intel® Distribution for Python* and Intel® Performance Libraries with pip and PyPI
I do not know much about OpenSUSE Leap 15.
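One detail from the crash banner in the question is worth decoding: `EXIT CODE: 139` follows the common shell convention of 128 plus the signal number, which agrees with the reported "Segmentation fault (signal 11)". A small sketch of that arithmetic follows; `describe_exit_code` is a hypothetical helper name.

```python
import signal


def describe_exit_code(code):
    """Translate a shell-style exit code into a signal name when code > 128."""
    if code > 128:
        try:
            return signal.Signals(code - 128).name
        except ValueError:
            # The number does not correspond to a signal on this platform.
            return f"unknown signal {code - 128}"
    return f"exit status {code}"


if __name__ == "__main__":
    print(describe_exit_code(139))
```

A segmentation fault inside the benchmark process suggests looking at a core dump or a debugger backtrace from that rank next.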

AT91SAM9263ek booting Linux with Device Tree failed

I have a problem booting Linux 3.16.1. I compiled the sources from http://www.kernel.org with at91sam9263_defconfig.
I added Flattened Device Tree support in Boot options.
Following the tips in this presentation (https://www.slideshare.net/softpapa/device-tree-support-on-arm-linux-8930303), I tried to turn on "Support device tree in /proc", but I don't have that option in menuconfig.
I have U-Boot bootloader version 2014.10rc2 which supports device tree.
I generated the dtb using the kernel's build system:
make at91sam9263ek.dtb
And now I'm getting this error:
Welcome to minicom 2.5
OPTIONS: I18n
Compiled on Feb 9 2011, 14:45:00.
Port /dev/ttyS0
Press CTRL-A Z for help on special keys
RomBOOT
>
U-Boot 2014.10-rc2-00200-g9170818-dirty (Sep 23 2014 - 15:16:39)
CPU: AT91SAM9263
Crystal frequency: 16.368 MHz
CPU clock : 199.919 MHz
Master clock : 99.960 MHz
DRAM: 64 MiB
WARNING: Caches not enabled
NAND: 256 MiB
MMC: mci: 0
In: serial
Out: serial
Err: serial
Net: macb0
Warning: Your board does not use generic board. Please read
doc/README.generic-board and take action. Boards not
upgraded by the late 2014 may break or be removed.
Hit any key to stop autoboot: 0
U-Boot> tftp uImage
macb0: Starting autonegotiation...
macb0: Autonegotiation complete
macb0: link up, 100Mbps full-duplex (lpa: 0xcde1)
Using macb0 device
TFTP from server 192.168.1.247; our IP address is 192.168.1.240
Filename 'uImage'.
Load address: 0x22000000
Loading: #################################################################
#################################################################
#################################################################
##############
1.2 MiB/s
done
Bytes transferred = 3068016 (2ed070 hex)
U-Boot> tftp 20000000 dt
macb0: link up, 100Mbps full-duplex (lpa: 0xcde1)
Using macb0 device
TFTP from server 192.168.1.247; our IP address is 192.168.1.240
Filename 'dt'.
Load address: 0x20000000
Loading: #
340.8 KiB/s
done
Bytes transferred = 13279 (33df hex)
U-Boot> bootm 22000000 - 20000000
## Booting kernel from Legacy Image at 22000000 ...
Image Name: Linux-3.16.1
Image Type: ARM Linux Kernel Image (uncompressed)
Data Size: 3067952 Bytes = 2.9 MiB
Load Address: 20008000
Entry Point: 20008000
Verifying Checksum ... OK
## Flattened Device Tree blob at 20000000
Booting using the fdt blob at 0x20000000
Loading Kernel Image ... OK
Loading Device Tree to 23ea3000, end 23ea93de ... OK
Starting kernel ...
Uncompressing Linux... done, booting the kernel.
Error: unrecognized/unsupported device tree compatible list:
[ 'atmel,at91sam9263ek' 'atmel,at91sam9263' 'atmel,at91sam9' ]
Available machine support:
ID (hex) NAME
000004b2 Atmel AT91SAM9263-EK
Please check your kernel config and/or bootloader.

Solution:
Add this line to .config:
CONFIG_MACH_AT91SAM9_DT=y
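The kernel's "Available machine support" list shows only the board-file machine (ID 0x4b2), which is consistent with device tree machine support not being built in. After editing `.config`, it can be worth confirming that the option is actually set to `y` before rebuilding. A minimal sketch follows; `enabled_options` is a hypothetical helper and the sample text is illustrative, not the asker's real configuration.

```python
def enabled_options(config_text):
    """Return the set of option names set to 'y' in kernel .config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip comments such as '# CONFIG_FOO is not set' and blank lines.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return enabled


if __name__ == "__main__":
    sample = "# CONFIG_FOO is not set\nCONFIG_MACH_AT91SAM9_DT=y\n"
    print("CONFIG_MACH_AT91SAM9_DT" in enabled_options(sample))
```

In practice the text would come from reading the `.config` file in the kernel source tree.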

The correct configuration for this board when using device tree is at91_dt_defconfig.
However, I am quite surprised to see someone trying to use such an old kernel. This board is fully supported upstream. Why don't you use v5.3? If it doesn't work, please report any bugs; we will be happy to help fix them.

# of OpenCL devices on 2012 Macbook pro

I'm writing an OpenCL program on a mid-2012 13" MacBook Pro with the following specs:
Processor: 2.9 GHz Intel Core i7
Graphics: Intel HD Graphics 4000
In my program I do the following to check how many devices I have access to:
// get first platform
cl_platform_id platform;
err = clGetPlatformIDs(1, &platform, NULL);
// get device count
cl_uint gpuCount;
err = clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 0, NULL, &gpuCount);
cl_uint cpuCount;
err |= clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 0, NULL, &cpuCount);
std::cout<<"NUM CPUS: "<<cpuCount<<" NUM GPUS: "<<gpuCount<<std::endl;
After execution, my program states that I have only one CPU and zero GPUs.
How can that be? Is OpenCL not compatible with the Intel HD Graphics 4000? I also thought my computer had a dual-core processor, so shouldn't there be 2 CPUs and 1 GPU?
Or am I simply not fetching the data correctly?
EDIT: I have found the issue. After upgrading my OS to Mavericks (I was previously running Mountain Lion), OpenCL now recognizes my graphics card as a valid device.

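Your processor has multiple cores, which OpenCL reports as compute units of a single CPU device. Run the following snippet and check that the number of compute units is as expected:
cl_device_id device;
cl_uint max_compute_units;
// Fetch the first CPU device on the platform obtained in the question's code.
cl_int ret = clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
// Query how many compute units that device exposes.
ret = clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(cl_uint), &max_compute_units, NULL);
printf("Number of computing units: %u\n", max_compute_units);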

This doesn't answer your question, (and doesn't need downvoting please) but hopefully will help you work out what you have actually got installed. I would have posted it as a comment, but the formatting there would be useless.
If you want a legible list of your installed CPU and Graphics equipment, the following command does it nicely:
system_profiler | awk '/^Hardware/ || /^Graphics/{p=1;print;next} /^[A-Za-z]/{p=0} p'
Graphics/Displays:
AMD Radeon HD 6970M:
Chipset Model: AMD Radeon HD 6970M
Type: GPU
Bus: PCIe
PCIe Lane Width: x16
VRAM (Total): 1024 MB
Vendor: ATI (0x1002)
Device ID: 0x6720
Revision ID: 0x0000
ROM Revision: 113-C2960H-203
EFI Driver Version: 01.00.560
Displays:
iMac:
Display Type: LCD
Resolution: 2560 x 1440
Pixel Depth: 32-Bit Color (ARGB8888)
Main Display: Yes
Mirror: Off
Online: Yes
Built-In: Yes
Hardware:
Hardware Overview:
Model Name: iMac
Model Identifier: iMac12,2
Processor Name: Intel Core i7
Processor Speed: 3.4 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
Memory: 16 GB
Boot ROM Version: IM121.0047.B1F
SMC Version (system): 1.72f2
Serial Number (system): DGKH90PWDHJW
Hardware UUID: 1025AC04-9F8E-5342-9EF4-XXXXXXXXXXXXX
And also this for the actual CPU details:
sysctl -a | grep "brand_string"
machdep.cpu.brand_string: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz
And this for OpenCL version:
system_profiler | grep -A 11 OpenCL:
OpenCL:
Version: 2.3.59
Obtained from: Apple
Last Modified: 19/09/2014 10:28
Kind: Intel
64-Bit (Intel): Yes
Signed by: Software Signing, Apple Code Signing Certification Authority, Apple Root CA
Get Info String: 2.3.59, Copyright 2008-2013 Apple Inc.
Location: /System/Library/Frameworks/OpenCL.framework
Private: No
P.S. If there is a better way than this to provide additional, useful information on SO (which is not really a proper answer), please let me know.
