TensorFlow GDB line numbers - debugging

I am creating a new Platform/Device for TensorFlow.
I have registered my Platform and am now developing an Operation.
As happens in development, I am getting a crash and am trying to debug it with gdb.
The issue is that although I have added flags in bazel to generate a debug build of TensorFlow, gdb shows no file names or line numbers in the backtrace, and I cannot see variables or source code:
#0 0x00007fffe288d7f4 in tensorflow::Tensor::DebugString(int) const () from /tmp/TensorflowVT/vt_tf/lib/python3.4/site-packages/tensorflow/python/../libtensorflow_framework.so
#1 0x00007fffe4cd1373 in ConstOp::Compute(tensorflow::OpKernelContext*) () from /tmp/TensorflowVT/vt_tf/lib/python3.4/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#2 0x00007fffe2a9ab4d in tensorflow::(anonymous namespace)::ExecutorState::Process(tensorflow::(anonymous namespace)::ExecutorState::TaggedNode, long long) () from /tmp/TensorflowVT/vt_tf/lib/python3.4/site-packages/tensorflow/python/../libtensorflow_framework.so
I am building tensorflow like this:
bazel build --incompatible_remove_native_http_archive=false --incompatible_package_name_is_a_function=false --config=opt --verbose_failures --compilation_mode=dbg -c dbg --strip=never //tensorflow/tools/pip_package:build_pip_package
I also tried manually finding the crash point, and added debug prints in
tensorflow/core/framework/tensor.cc, but
Is it possible to get file names, line numbers and annotated code, to better understand where and why I get the crash?
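A quick way to see which shared objects are missing debug info is to scan the backtrace for frames that end in "from <library>" rather than "at <file>:<line>". The sketch below is only an illustration: the function name is hypothetical, and it assumes the backtrace has the plain-text layout gdb prints above.

```python
import re

# Start of a gdb frame, e.g. "#3 0xd8be53f3 in func (...)".
FRAME_START = re.compile(r"^#(\d+)\s")
# Trailing source location, e.g. "at /media/src/mymodule.h:954".
SOURCE_LOC = re.compile(r"\bat\s+\S+:\d+\s*$")
# Trailing library reference, e.g. "from /path/to/libfoo.so".
FROM_LIB = re.compile(r"\bfrom\s+(\S+)\s*$")


def libraries_without_line_info(backtrace):
    """Return the sorted libraries whose frames carry no file:line location."""
    frames = []
    for raw in backtrace.splitlines():
        line = raw.strip()
        if not line:
            continue
        if FRAME_START.match(line):
            frames.append(line)
        elif frames:
            # gdb wraps long frames; glue continuation lines to the frame.
            frames[-1] += " " + line
    libs = set()
    for text in frames:
        if SOURCE_LOC.search(text):
            continue  # this frame already has debug line info
        match = FROM_LIB.search(text)
        if match:
            libs.add(match.group(1))
    return sorted(libs)
```

Every library this returns is a candidate for having been built or installed without debug info, so it is worth checking that the .so files gdb loads are the ones produced by the debug build.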

Related

Issue with running Cobalt built with linux-x64x11 config

I'm trying to run current Cobalt trunk (12.81256) on Ubuntu 16.04, but it fails:
[0814/100203:FATAL:graphics_system.cc(130)] Check failed: 1 == num_configs (1 vs. 0)
base::debug::StackTrace::StackTrace() [0x1f6202d]
logging::LogMessage::~LogMessage() [0x1f5fe99]
cobalt::renderer::backend::GraphicsSystemEGL::GraphicsSystemEGL() [0x67e5bdd]
cobalt::renderer::backend::CreateDefaultGraphicsSystem() [0x67e549e]
cobalt::renderer::RendererModule::Resume() [0x67dbe65]
cobalt::renderer::RendererModule::RendererModule() [0x67db776]
cobalt::browser::BrowserModule::BrowserModule() [0x1ce38c3]
cobalt::browser::Application::Application() [0x1cb71a5]
cobalt::browser::ApplicationStarboard::ApplicationStarboard() [0x1cb09c7]
cobalt::browser::CreateApplication() [0x1cb072e]
(anonymous namespace)::StartApplication() [0x1caef05]
cobalt::wrap_main::BaseEventHandler<>() [0x1cae9af]
SbEventHandle [0x1cae225]
starboard::shared::starboard::Application::DispatchAndDelete() [0x214dc7e]
starboard::shared::starboard::Application::DispatchStart() [0x214c07e]
starboard::shared::starboard::Application::Run() [0x214b8b7]
main [0x2120f95]
<unknown> [0x7f864632f830]
_start [0x1bd6029]
_start [0x1bd6029]
I found that the EGL configuration issue comes from using:
EGL_BIND_TO_TEXTURE_RGBA, EGL_TRUE
Without it, eglChooseConfig returns 1 configuration.
But then it fails again, after the call to
eglCreateWindowSurface()
in cobalt/renderer/backend/egl/display.cc
[0814/111151:FATAL:display.cc(53)] Check failed: 0x3000 == eglGetError() (12288 vs. 12297)
Since this is an EGL_BAD_MATCH error, the chosen EGL config is not suitable, but neither was the original one.
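The check prints the error as a decimal number (12297), while the EGL headers define the codes in hex. A small lookup table makes the translation explicit. This is a sketch: the values are the standard error codes from the Khronos egl.h header, and the helper name is hypothetical.

```python
# EGL error codes as defined in the Khronos EGL header (egl.h).
EGL_ERRORS = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def egl_error_name(code):
    """Translate a numeric eglGetError() value into its symbolic name."""
    return EGL_ERRORS.get(code, "unknown (0x%04X)" % code)


print(egl_error_name(12297))  # the value from the failed check
```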
I tried with setting
'gl_type%': 'system_gles2',
in starboard/linux/shared/gyp_configuration.gypi, but results were the same.
Am I missing something?
Steps for reproduction of crash:
Build:
cobalt/build/gyp_cobalt -C debug linux-x64x11
ninja -C out/linux-x64x11_debug cobalt
Run:
./out/linux-x64x11_debug/cobalt
Is there maybe some dependency on EGL or GLES libraries?
The issue is related to which libEGL and libGLES libraries are used.
The following libraries were used on my system:
libEGL.so.1 => /usr/lib/nvidia-375/libEGL.so.1 (0x00007f66bbebc000)
libGLESv2.so.2 => /usr/lib/nvidia-375/libGLESv2.so.2 (0x00007f66bbcad000)
When using the libraries from Mesa:
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/mesa-egl ./cobalt
Cobalt starts and works.
Thanks Daniel and Andrew for help.
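To check which EGL/GLES libraries a binary resolves to, the library listing can be filtered programmatically. The sketch below assumes the listing comes from ldd (the lines above have that format); the function names are hypothetical.

```python
import re
import subprocess

# One resolved entry of ldd output: "name => /path/to/lib (0xaddress)".
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)\s+\(0x[0-9a-fA-F]+\)\s*$")


def resolved_libraries(ldd_output, prefixes=("libEGL", "libGLES")):
    """Map each library whose name starts with one of the prefixes to its path."""
    found = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and match.group(1).startswith(prefixes):
            found[match.group(1)] = match.group(2)
    return found


def ldd(binary):
    """Run ldd on a binary and return its text output (assumes ldd is on PATH)."""
    result = subprocess.run(["ldd", binary], capture_output=True, text=True, check=True)
    return result.stdout
```

For example, resolved_libraries(ldd("./out/linux-x64x11_debug/cobalt")) would show whether the NVIDIA or the Mesa copies are being picked up; running it again with LD_LIBRARY_PATH set shows the effect of the override.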

select subset of debug info files in gdb session

On my Fedora box I have installed a lot of separate debuginfo packages.
sudo dnf debuginfo-install <list of packages>
Now, if I debug some simple code, it takes a very long time until a symbol is displayed or a value is printed. Clearly gdb has to read all the installed symbol files to have the complete information.
But if I have a problem in, say, a library like goocanvas, I only want my local debug symbols, generated from my own code compiled with the -g option, plus the debug info for the goocanvas libraries.
How can that kind of selection be achieved? Only by renaming the folder of debug info files and creating a copy of the needed ones? Maybe as a symlink? Or is there a common selection option somewhere?
You can skip all debug info from shared libraries and load only the goocanvas library symbols. Here is a sample of how to do it in a gdb session:
[ ~]$ gdb -q /your/binary
(gdb) set auto-solib-add off
(gdb) start
Temporary breakpoint 1, 0x000055555564edd0 in main ()
(gdb) sharedlibrary goocanvas
From the gdb documentation:
If your program uses lots of shared libraries with debug info that
takes large amounts of memory, you can decrease the gdb memory
footprint by preventing it from automatically loading the symbols from
shared libraries. To that end, type set auto-solib-add off before
running the inferior, then load each library whose debug symbols you
do need with sharedlibrary regexp, where regexp is a regular
expression that matches the libraries whose symbols you want to be
loaded.
See also this related question: How to prevent GDB from loading debugging symbol for a (large) library?
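The same session can be started non-interactively by passing the commands with gdb's -ex option. The following is a minimal sketch that only builds the command line; the helper name is hypothetical, and the binary path and library pattern are the placeholders from the session above.

```python
import shlex


def gdb_command(binary, library_patterns):
    """Build a gdb invocation that loads symbols only for matching libraries."""
    argv = [
        "gdb", "-q",
        "-ex", "set auto-solib-add off",  # stop automatic symbol loading
        "-ex", "start",                   # run to main so libraries get mapped
    ]
    for pattern in library_patterns:
        # sharedlibrary takes a regular expression matching library names.
        argv += ["-ex", "sharedlibrary " + pattern]
    argv.append(binary)
    return argv


print(shlex.join(gdb_command("/your/binary", ["goocanvas"])))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run to launch gdb directly.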

Class "Veins::ObstacleControl" not found

I have followed step by step the tutorial to install Veins, but when I tried running the example scenario (final step) I ended up with the above error.
The whole error was:
Error in module (cModule) RSUExampleScenario (id=1) during network
setup: Class "Veins::ObstacleControl" not found -- perhaps its code
was not linked in, or the class wasn't registered with
Register_Class(), or in the case of modules and channels, with
Define_Module()/Define_Channel().
TRAPPING on the exception above, due to a debug-on-errors=true
configuration option. Is your debugger ready?
Simulation terminated with exit code: -2147483645
Working directory: C:/Users/user/src/veins-4.3/examples/veins
Command line: ../../../omnetpp-4.6/bin/opp_run.exe -r 0 -n .;../../src/veins --tkenv-image-path=../../images -l ../../src/veins omnetpp.ini
I don't think I have missed a step during the tutorial as I have tried it two times. I did not make any change in anything, I've just strictly followed the tutorial like a robot, so I cannot provide an MCVE with more details than the tutorial.
Here is what I'm using:
- Windows 7 Pro 64 bits
- SUMO 0.25.0 64 bits
All other steps of the tutorial successfully worked until the final step.
I assume this error occurs when running Veins via the OMNeT++ IDE, or when Veins has been compiled with GCC (the error does not happen with Clang).
There are two ways to bypass this error:
- Use the .run as executable from your examples directory, which calls veins/run and includes all the required libraries.
- Use opp_run as executable and set the dynamic libraries to the directory where libveins.so is located (usually src/veins).
PS: to answer @ChristopSommer's question: Veins::ObstacleControl appears in opp_run -l src/veins -h classes
This could be a solution too, but I never tested it: Compiler flags in Eclipse

What is a simple way to play a .wav file in Nim on OSX?

I am trying to play a wav file in a very simple program that looks like this, currently attempting to use nim-csfml:
import csfml_audio
var alarmsong = newMusic("alarm.wav")
alarmsong.play()
but it appears to rely on the existence of libcsfml-audio, and while my program compiles just fine, when I try to actually run it I get an error:
| => ./alarm
could not load: libcsfml-audio.so
(I have a libcsfml-audio.dylib instead, being that I used the OSX shared libraries for csfml/sfml)
Is there some other way to play a .wav file in Nim?
Edit 1:
After the PR made by @def-, I now get a different, slightly more comforting error, which is probably due to some poor understanding of how Nim deals with shared libraries:
| => ./alarm
could not load: libcsfml-audio.dylib
I added path = "/usr/local/lib" to my nim.cfg file, but it didn't seem to affect anything. I also exported $LD_LIBRARY_PATH="/usr/local/lib" (/usr/local/bin is where libcsfml-audio.dylib is.), and tried compilation through
nim c alarm.nim --clib:/usr/local/lib/libcsfml-audio.dylib
Thanks for the help!
This program would just exit immediately; you need to keep it alive while the sound plays. Append this to the program:
import csfml_system
while alarmsong.status == SoundStatus.Playing:
  sleep 100.milliseconds
For nim-csfml to work you'll need SFML 2.1 and CSFML 2.1. Also, it seems that nim-csfml is actually broken for Mac OS X, so I've made a pull request with a fix: https://github.com/BlaXpirit/nim-csfml/pull/4
Other modules that could play sound are sdl_mixer, sdl2/audio and allegro5.
An OSX-only alternative that uses no libraries is to call the afplay binary:
import osproc
discard execProcess("afplay", ["file.wav"])
Edit1:
When Nim reports "could not load: libcsfml-audio.dylib", that could also mean that one of the dependencies of that library is missing or has the wrong version. In particular, SFML 2.2 doesn't work with CSFML 2.1. Make sure libsfml-audio.dylib is in your LD_LIBRARY_PATH as well. If that doesn't work either, you could try to compile and run a regular C CSFML example like this one: https://gist.github.com/def-/fee8bb041719337c8812
Compile it with clang -o mainpage -lcsfml-graphics -lcsfml-audio -lGL -lGLEW mainpage.c to see the errors/warnings about missing libraries.
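Before digging into compiler output, it can help to confirm that both the CSFML library and the SFML library it depends on are actually present in the directories being searched. The sketch below is a plain file-existence check written in Python rather than Nim; the function name is hypothetical, and it does not verify versions or other dependencies.

```python
import os


def missing_libraries(names, search_path):
    """Return the library file names not found in any directory of search_path.

    search_path is a path list in the style of LD_LIBRARY_PATH,
    separated by os.pathsep (":" on OS X and Linux).
    """
    dirs = [d for d in search_path.split(os.pathsep) if d]
    return [
        name for name in names
        if not any(os.path.isfile(os.path.join(d, name)) for d in dirs)
    ]


wanted = ["libcsfml-audio.dylib", "libsfml-audio.dylib"]
print(missing_libraries(wanted, os.environ.get("LD_LIBRARY_PATH", "")))
```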

How to 'reload' source files in GDB

Is there a command in gdb that I can use to (re)load / "refresh" source files? (As far as I can see, gdb works only with source directories, according to Debugging with GDB: Source - and there is no specific command to "refresh")
Background about my problem:
I use a virtual machine with a debug kernel, so I can connect to a local instance of gdb, and can debug kernel modules. The modules are compiled with debug info on, and this specifies folders where the source of the modules is kept (Instruct GDB 6.5 to use source embedded in object file - Stack Overflow). I have the source directories in the same path(s) in both VM and local machine.
The problem is this - I need to do quite a bit of steps in order to get the module to segfault - and the remote gdb to go into the stack. Then I do a backtrace, and I can see the source files referenced, i.e.
#0 0xc0132a13 in ?? ()
#1 0xc056e551 in ?? ()
#2 0xc056e506 in ?? ()
#3 0xd8be53f3 in mymodule_func1 (var1=0xd79f9b44, var2=0x0, var3=825269148)
at /media/src/mymodule.h:954
#4 0xd8be53d0 in mymodule_func2 (data=3617561412)
at /media/src/mymodule.h:936
#5 0xc014fe87 in ?? ()
#6 0xc0151478 in ?? ()
Then I try to do say, list /media/src/mymodule.h:954 - and I realize that I have changed stuff on the local version of mymodule.h file!!
So I undo the changes - but unfortunately, GDB does not see these changes! And, of course, I don't want to restart GDB - because that means I have to restart the VM, and go through the entire procedure in order to get the kernel module to segfault again :( !!
Then I try to do something like this:
(gdb) show symbol-reloading
Dynamic symbol table reloading multiple times in one run is off.
(gdb) set symbol-reloading on
(gdb) add-symbol-file ~/mymodule.o 0xd8be4000
add symbol table from file "/media/src/mymodule.o" at
.text_addr = 0xd8be4000
(y or n) y
Reading symbols from /media/src/mymodule.o...done.
... in hope that it will somehow "reload" the source files - but unfortunately, list /media/src/mymodule.h:954 shows that it doesn't, nothing is changed - even though gdb does recognize that something has changed, as in warning: Source file is more recent than executable.... (so, for the time being, I have to restart entire VM and gdb as well :( :( )
Resetting the directory list using the directory command appears to have the desired effect.
From https://www.cs.rochester.edu/~nelson/courses/csc_173/review/gdb.html:
After changing program, reload executable with file command
(gdb) file gdbprog
A program is being debugged already. Kill it? (y or n) y
Load new symbol table from "gdbprog"? (y or n) y
Reading symbols from gdbprog...
done.
Breakpoint 1 at 0x2298: file gdbprog.cc, line 10.
(gdb) run
Starting program: gdbprog
Breakpoint 1, InitArrays (array=0x18be8)
at gdbprog.cc:10
10 for(i = 0;i < 10;i++)
This warning means that the source files from which the binary was built have been modified since the build.
To remove this warning, rebuild the binary you are debugging from the modified files.
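The warning appears to be based on file timestamps, so the same condition can be checked outside gdb before starting a long debugging session. This is a minimal sketch under that assumption; the function name is hypothetical and the paths in the usage note are the ones from the question.

```python
import os


def stale_sources(binary, sources):
    """Return the source files modified after the binary was last written."""
    built = os.path.getmtime(binary)
    return [src for src in sources if os.path.getmtime(src) > built]
```

For instance, stale_sources("/media/src/mymodule.o", ["/media/src/mymodule.h"]) returning a non-empty list would indicate that the listing gdb shows may no longer match the code that was compiled.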

Resources