Tensorflow 2 can't find gpu - installation

OS : Ubuntu 18.04
CPU : Intel i7-8core
GPU : Nvidia GTX 1060
RAM : 16 GB
I installed TensorFlow 2 in my Anaconda virtual environment with the following command:
pip install tensorflow-gpu==2.0
The installation reported no errors, but when I run:
tf.test.is_gpu_available()
I get the following output:
I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-16 20:36:36.130788: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2808000000 Hz
2019-11-16 20:36:36.131397: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55bd440b6050 executing computations on platform Host. Devices:
2019-11-16 20:36:36.131414: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-16 20:36:36.133350: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-16 20:36:36.134418: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
2019-11-16 20:36:36.134439: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:169] retrieving CUDA diagnostic information for host: quartermaine
2019-11-16 20:36:36.134445: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:176] hostname: quartermaine
2019-11-16 20:36:36.134490: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:200] libcuda reported version is: 430.50.0
2019-11-16 20:36:36.134511: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:204] kernel reported version is: 430.50.0
2019-11-16 20:36:36.134518: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:310] kernel version seems to match DSO: 430.50.0
False
The final False indicates that no GPU is available.

First, try rebooting your system, as there might be pending updates that have not been applied yet.
Second, try this (set it before importing TensorFlow):
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
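
A minimal sketch of the second suggestion, assuming the variable has to be in place before TensorFlow initializes CUDA. The helper name `select_gpu` is hypothetical and not part of TensorFlow. It only controls which devices CUDA exposes, so it may not resolve a `cuInit` failure on its own.

```python
import os
import sys


def select_gpu(index=0):
    """Expose a single GPU to CUDA. `select_gpu` is a hypothetical helper.

    Returns True if TensorFlow had not been imported yet, False otherwise.
    """
    # CUDA reads this variable when it initializes, so it should be set
    # before TensorFlow is imported in the current process.
    already_imported = "tensorflow" in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return not already_imported


if __name__ == "__main__":
    if not select_gpu(0):
        print("TensorFlow was already imported; restart the interpreter.")
    # import tensorflow as tf  # import only after the variable is set
```

If the function returns False, the setting was probably applied too late and the interpreter needs a restart before it takes effect.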


Error: no device found Error: unable to open ftdi device with vid 0403, pid 6010, description '*' When trying to debug ESP32-S3 on macOS

Trying to debug ESP32-S3 with PlatformIO on VSCode with macOS on M1.
I installed the FTDI drivers from their website (the VCP drivers, not the D3XX ones, as I couldn't find a way to compile and install them).
As the ESP32-S3 has an internal debugger, I made a USB cable that connects the D-/D+ pins to the board's GPIO 19 and 20 (and GND). BTW, when I connect it to the MacBook, I don't see any additional port under /dev/*
I get the following error, regardless of my platformio.ini configuration.
http://openocd.org/doc/doxygen/bugs.html
adapter speed: 20000 kHz
adapter speed: 5000 kHz
Info : tcl server disabled
Info : telnet server disabled
Error: no device found
Error: unable to open ftdi device with vid 0403, pid 6010, description '*', serial '*' at bus location '*'
Error: no device found
Error: unable to open ftdi device with vid 0403, pid 6014, description '*', serial '*' at bus location '*'
.pioinit:11: Error in sourced command file:
Remote connection closed
My platformio.ini:
[env:esp32-s3-devkitc-1]
platform = espressif32
board = esp32-s3-devkitc-1
framework = arduino
upload_port = /dev/cu.wchusbserial553C0085431
monitor_speed=115200
build_type = debug
debug_init_break = tbreak setup
;debug_tool = esp-builtin
debug_tool = esp-prog
I removed and reinstalled the FTDI drivers.
Got a similar error when trying with ESP-IDF.
Any thoughts?
If you are connecting the USB connector directly to the ESP32-S3 module, try changing the board parameter from esp32-s3-devkitc-1 to esp32s3-builtin. This specifies that you are using the built-in debugger.

Can't debug on vscode PlatformIO

I'm trying to emulate and debug a simple main.c project for HiFive, with Freedom E SDK, in vscode + PlatformIO, on Ubuntu. It builds but, when I try to start a debug session I get:
Reading symbols from /home/dan/Documents/PlatformIO/Projects/qwe/.pio/build/hifive1/firmware.elf...
PlatformIO Unified Debugger -> ...
PlatformIO: debug_tool = ftdi
PlatformIO: Initializing remote target...
Open On-Chip Debugger 0.10.0+dev (SiFive OpenOCD 0.10.0-2019.08.2)
Licensed under GNU GPL v2
For bug reports:
https://github.com/sifive/freedom-tools/issues
adapter speed: 10000 kHz
Info : auto-selecting first available session transport "jtag". To override use 'transport select <transport>'.
Error: no device found
Error: unable to open ftdi device with vid 0403, pid 6010, description 'Dual RS232-HS', serial '*' at bus location '*'
.pioinit:11: Error in sourced command file:
Remote communication error. Target disconnected.: Connection reset by peer.
It seems it can't find a device, which makes sense because there isn't one, but I want to simulate the execution. How can I do that?

Tensorflow2 not working on Ubuntu 18.04. Probably incorrect installation, but what?

I installed TensorFlow 2.1 on Ubuntu 18.04.3 with CUDA 10.1, cuDNN 7.6.5.32, Nvidia driver 430.5.
I could not follow the instructions on the tensorflow site properly as many parts do not work, but, after many hours, I did finally get all the components installed. When I try to run a 20 line mnist example I get:
2020-02-19 03:02:24.915143: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce GTX 1080 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 28 deviceMemorySize: 10.91GiB deviceMemoryBandwidth: 451.17GiB/s
2020-02-19 03:02:24.915194: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-19 03:02:24.915216: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-19 03:02:24.915234: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-02-19 03:02:24.915253: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-02-19 03:02:24.915271: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-02-19 03:02:24.915289: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-02-19 03:02:24.915308: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-02-19 03:02:24.917997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1697] Adding visible gpu devices: 0
2020-02-19 03:02:24.918060: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-02-19 03:02:24.920974: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-02-19 03:02:24.921000: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] 0
2020-02-19 03:02:24.921013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0: N
2020-02-19 03:02:24.924091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1241] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10258 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Train on 60000 samples
Epoch 1/5
2020-02-19 03:02:26.155747: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-02-19 03:02:26.156063: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156110: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156225: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156253: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.156483: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.158110: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
2020-02-19 03:02:26.158133: W tensorflow/stream_executor/stream.cc:2041] attempting to perform BLAS operation using StreamExecutor without BLAS support
2020-02-19 03:02:26.158158: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Internal: Blas GEMM launch failed : a.shape=(32, 784), b.shape=(784, 128), m=32, n=128, k=784
[[{{node sequential/dense/MatMul}}]]
I understand that this error probably means something is wrong with the installation, but how can I determine what? Is there ANY way to determine which version of cuDNN is being used?
I have googled extensively and there are loads of people with the same problem, but no solutions.
I spent 2 days trying to get this rubbish to work, and in the end I'll never know why it finally started working. I now know that everything I did was correct in principle, but although the install apparently succeeded, a simple mnist example failed with CUBLAS_STATUS_NOT_INITIALIZED.
When I finally got it to work, I:
Removed every CUDA-related package before I started.
Followed the procedure here. It is way clearer than the official docs.
At step 4) installing CUDA 10.1, I executed:
sudo apt-get install cuda-10-1
instead of:
sudo apt-get install cuda
This ensures that CUDA is not automatically upgraded to the most recent version (10.2 at the time of writing).
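
On the question of which cuDNN version is installed: one approach, assuming the cuDNN development header is present, is to read the version macros `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` from it. The sketch below parses header text; the function name `parse_cudnn_version` is hypothetical and the header path in the comment is an assumption that varies by install method. It reports the installed header, which is not necessarily the library TensorFlow loads at run time.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from cuDNN header text, or None if absent."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7"
        pattern = r"^\s*#define\s+%s\s+(\d+)" % name
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


# Sample text standing in for the real header contents.
sample = (
    "#define CUDNN_MAJOR 7\n"
    "#define CUDNN_MINOR 6\n"
    "#define CUDNN_PATCHLEVEL 5\n"
)
print(parse_cudnn_version(sample))

# To inspect a real install (the path is an assumption; adjust as needed):
# with open("/usr/include/cudnn.h") as f:
#     print(parse_cudnn_version(f.read()))
```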

Why is my 980 TI outperforming my 1080?

I'm trying to make sure my new computer is set up properly, and noticed that its GeForce 1080 GPU significantly underperforms the 980 Ti in my old system when running TensorFlow jobs. Since the systems differ in more than just the GPU, I wrote a small benchmark to isolate GPU matrix multiplication performance in TensorFlow. The results confirm that the new GPU is significantly slower. I know this has something to do with the installed software, but I've checked the obvious things: same python3, same cuDNN, same numpy. What could be causing this strange performance gap?
Benchmarking Script:
import tensorflow as tf
import time

sess = tf.Session()  # TensorFlow 1.x API
A = tf.random_uniform((1000, 1000))
# Chain 1000 matrix multiplications into a single graph
for i in range(int(1e3)):
    A = tf.matmul(A, A)
cur_time = time.clock()
sess.run(A)
print("time elapsed:", time.clock() - cur_time)
Old System (980 Ti):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:937] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 980 Ti
major: 5 minor: 2 memoryClockRate (GHz) 1.19
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.39GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 980 Ti, pci bus id: 0000:01:00.0)
time elapsed: 0.81484
New System (1080):
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.898
pciBusID 0000:03:00.0
Total memory: 7.92GiB
Free memory: 7.57GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:03:00.0)
time elapsed: 1.2753620000000003
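
The script above times a single `sess.run` call, so any one-time initialization cost is folded into the figure, and `time.clock()` reports processor time rather than wall-clock time on Linux (it was removed in Python 3.8). Below is a sketch of a timing helper that warms up first and repeats the measurement. The name `time_runs` is hypothetical, and whether warm-up accounts for the gap here is an assumption to test, not a diagnosis.

```python
import time


def time_runs(fn, warmup=1, repeats=5):
    """Call fn `warmup` times untimed, then return wall-clock seconds per timed call."""
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Stand-in workload; in the benchmark above this would be
    # `lambda: sess.run(A)` with the session and graph already built.
    timings = time_runs(lambda: sum(i * i for i in range(100000)))
    print("best: %.6f s, worst: %.6f s" % (min(timings), max(timings)))
```

Comparing the first run against the later ones on both machines would show whether the difference is in startup or in steady-state throughput.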

VisualGDB with STM32L476RG Nucleo

I set up Visual Studio 2015 with VisualGDB, and created an LED Blink project using the HAL, as described in this example: http://visualgdb.com/tutorials/arm/stm32/stm32l4/
The tools installed correctly, and my project follows the example exactly, including Step 6, and then up through Step 7. However, after setting a breakpoint and attempting to run to it per Step 8, I get the following in the Output window:
Open On-Chip Debugger 0.9.0 (2015-10-08-15:57)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.org/doc/doxygen/bugs.html
Info : auto-selecting first available session transport "hla_swd". To override use 'transport select <transport>'.
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
adapter speed: 500 kHz
adapter_nsrst_delay: 100
none separate
Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : clock speed 480 kHz
Info : STLINK v2 JTAG v24 API v2 SWIM v10 VID 0x0483 PID 0x374B
Info : using stlink api v2
Info : Target voltage: 3.263434
Info : stm32l4x.cpu: hardware has 6 breakpoints, 4 watchpoints
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0800027c msp: 0x20020000
adapter speed: 4000 kHz
Info : accepting 'gdb' connection on tcp/3333
Info : device id = 0x10076415
Info : flash size = 1024kbytes
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0800027c msp: 0x20020000
adapter speed: 4000 kHz
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0800027c msp: 0x20020000
adapter speed: 4000 kHz
Warn : Padding 4 bytes to keep 8-byte write size
target state: halted
target halted due to breakpoint, current mode: Thread
xPSR: 0x61000000 pc: 0x2000004a msp: 0x20020000
Warn : block write succeeded
target state: halted
target halted due to debug-request, current mode: Thread
xPSR: 0x01000000 pc: 0x0800027c msp: 0x20020000
Error: Memory write failure!
At the same time, I get a dialog pop-up that states: "The memory location used for the stack is not writable. Please check the device type and the linker script. You can disable automatic stack checking via VisualGDB Project Properties"
Looking in the .map file that was generated during the build, there is nothing near 0x61000000 or anything at 0x01000000. There is a _estack = 0x20020000.
I added a -N to the linker flags (LDFLAGS := -Wl,-N,-gc-sections) to see if this would affect anything, and it didn't.
Any ideas on what may be wrong?
Thank you in advance.
The "The memory location used for the stack is not writable" error occurs when VisualGDB tries to test whether the end-of-stack (_estack - 4) is writable.
If you switch the GDB Session window to the All GDB Interaction mode, you will see that VisualGDB is trying to write a random value there and then checks whether it can be read back:
-data-evaluate-expression "&_estack"
^done,value="0x20020000"
-var-create - * "*((void **)0x2001fffc)"
^done,name="var1",numchild="0",value="0x80002ad ",type="void *",has_more="0"
-var-assign "var1" 0x1b5bfd22
^done,value="0x1b5bfd22"
-data-evaluate-expression "\*\(\(void\ \*\*\)0x2001fffc\)"
^done,value="0x1b5bfd22"
If it does not, you have most likely selected an incorrect device while creating your project (e.g. your device actually has 32 KB of RAM while you selected a device with 64 KB of RAM). There can also be a bug in the VisualGDB device definitions.
You can find this out by comparing the address of _estack from your linker script with the end address of the RAM described in your device datasheet.
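
The comparison can be sketched as simple address arithmetic. The helper names are hypothetical, and the RAM base and the two sizes used below are illustrative assumptions; take the real values from your device datasheet.

```python
def ram_end(base, size_kb):
    """Address one past the last byte of a RAM region."""
    return base + size_kb * 1024


def stack_probe_fits(estack, base, size_kb):
    """True if the 4-byte word at _estack - 4 lies inside the RAM region."""
    probe = estack - 4
    return base <= probe and probe + 4 <= ram_end(base, size_kb)


ESTACK = 0x20020000  # _estack from the .map file in the question
BASE = 0x20000000    # assumed SRAM base; confirm in the datasheet

print(hex(ESTACK - 4))                      # the address VisualGDB probes
print(stack_probe_fits(ESTACK, BASE, 128))  # assuming 128 KB contiguous RAM
print(stack_probe_fits(ESTACK, BASE, 96))   # assuming 96 KB contiguous RAM
```

If the check fails for the RAM size given in the datasheet, the linker script places the stack outside the writable region, which is consistent with the error described above.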
