I use GDB to understand how CPython executes the test.py source file, and I want to stop CPython when it starts executing the opcode I am interested in.
OS: Ubuntu 18.04.2 LTS
Debugger: GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
The first problem: many of CPython's own .py files are executed before my test.py gets its turn, so I can't just break at _PyEval_EvalFrameDefault. It is hit many times, so I need to distinguish my file from the others.
The second problem: I can't set a condition like "when the filename equals test.py", because the filename is not a simple C string. It is a CPython Unicode object, so the standard GDB string functions can't be used for the comparison.
At the moment I use the following trick to break at the desired line of test.py.
For example, I have the source file:
x = ['a', 'b', 'c']
# I want to set the breakpoint at this line.
for e in x:
    print(e)
I add the binary left shift operator to the code:
x = ['a', 'b', 'c']
# Added for breakpoint
a = 12
b = 2 << a
for e in x:
    print(e)
Then I track the execution of the BINARY_LSHIFT opcode in Python/ceval.c with this GDB command:
break ceval.c:1327
I chose the BINARY_LSHIFT opcode because it is rarely used. Thus I can reach the needed part of the .py file quickly: the opcode is executed only once in all the other .py modules that run before my test.py.
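To see which opcode the marker line actually compiles to on a given interpreter, the dis module can be used. This is only a sketch: the source string repeats the two added lines, and the helper name is_left_shift is made up for the example. On CPython 3.11 and later there is no separate BINARY_LSHIFT; the shift is a generic BINARY_OP, which is shared by all binary operators and therefore no longer rare. The ceval.c line number used for the breakpoint also depends on the exact source version.

```python
import dis

# The two lines added to test.py in the trick above.
source = "a = 12\nb = 2 << a\n"
code = compile(source, "test.py", "exec")

def is_left_shift(instruction):
    # Before 3.11 the shift has its own opcode; from 3.11 on it is a
    # generic BINARY_OP whose argument selects "<<".
    if instruction.opname == "BINARY_LSHIFT":
        return True
    return instruction.opname == "BINARY_OP" and instruction.argrepr == "<<"

shift_instructions = [
    ins for ins in dis.get_instructions(code) if is_left_shift(ins)
]
for ins in shift_instructions:
    print(ins.offset, ins.opname, ins.argrepr)
```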
I am looking for a more straightforward way of doing the same, so
the questions:
Can I catch the moment test.py starts executing? I should mention that the test.py filename appears at different stages: parsing, compilation, execution. So it would also be good to be able to break CPython's execution at any of these stages.
Can I specify the line of test.py where I want to break? This is easy for .c files, but not for .py files.
My idea would be to use a C extension that makes it possible to set C breakpoints from a Python script (similar to pdb.set_trace(), or breakpoint() since Python 3.7); I will call it cbreakpoint.
Consider the following python-script:
#example.py
from cbreakpoint import cbreakpoint
cbreakpoint(breakpoint_id=1)
print("hello")
cbreakpoint(breakpoint_id=2)
It could be used as follows in gdb:
>>> gdb --args python example.py
[gdb] b cbreakpoint
[gdb] run
Now the debugger would stop at cbreakpoint(breakpoint_id=1) and cbreakpoint(breakpoint_id=2).
Here is a proof of concept, written in Cython to avoid the otherwise needed boilerplate code:
#cbreakpoint.pyx
cdef extern from *:
    """
    long long last_breakpoint_id = -1;
    void cbreakpoint(long long breakpoint_id){
        last_breakpoint_id = breakpoint_id;
    }
    """
    void c_cbreakpoint "cbreakpoint"(long long breakpoint_id)
def cbreakpoint(breakpoint_id = 0):
    c_cbreakpoint(breakpoint_id)
which can be built in place via:
cythonize -i cbreakpoint.pyx
If Cython isn't installed: I have uploaded a version that doesn't depend on Cython (too much code for this post) to GitHub.
It is also possible to break conditionally, given the breakpoint_id, i.e.:
>>> gdb --args python example.py
[gdb] break src/cbreakpoint.c:595 if breakpoint_id == 2
[gdb] run
This breaks only after hello has been printed, at the cbreakpoint with id=2 (the cbreakpoint with id=1 is skipped). The line number can vary with the Cython version, but it can be found once gdb stops at cbreakpoint.
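If editing test.py is undesirable, one possible variation (my own sketch, not part of the proof of concept) is to let a sys.settrace hook call the C-level function when a given file and line are reached. Below, fake_cbreakpoint is a stand-in that only records the call, and make_line_tracer is a made-up helper name; in real use the imported cbreakpoint would be passed instead.

```python
import os
import sys

def make_line_tracer(filename, lineno, on_hit):
    """Build a trace function that calls on_hit() at filename:lineno."""
    def tracer(frame, event, arg):
        if os.path.basename(frame.f_code.co_filename) != filename:
            return None  # leave frames from other files untraced
        if event == "line" and frame.f_lineno == lineno:
            on_hit(lineno)
        return tracer  # keep receiving line events for this frame
    return tracer

hits = []

def fake_cbreakpoint(breakpoint_id):
    # Stand-in for the C-level cbreakpoint(); it only records the call.
    hits.append(breakpoint_id)

# The example file from the question, compiled under the name test.py.
source = (
    "x = ['a', 'b', 'c']\n"
    "# I want to set the breakpoint at this line.\n"
    "for e in x:\n"
    "    print(e)\n"
)
code = compile(source, "test.py", "exec")

sys.settrace(make_line_tracer("test.py", 3, fake_cbreakpoint))
try:
    exec(code, {})
finally:
    sys.settrace(None)
```

The for line is reported once per loop pass, so the hook fires several times. Tracing also slows the interpreter down noticeably, so this is only worth it when the script must stay unmodified.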
It is also possible to do something similar without any additional modules:
add breakpoint() or import pdb; pdb.set_trace() instead of cbreakpoint
start gdb --args python example.py, then run
When pdb interrupts the program, hit Ctrl+C to interrupt it in gdb.
Activate the breakpoints in gdb.
continue in gdb and then in pdb (i.e. c+Enter twice).
A small problem is that afterwards the breakpoints might be hit while still in pdb, so the first method is a little more robust.
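Another module-free variant, which is my assumption rather than something tested in this answer: raise SIGTRAP from the script. By default gdb stops on SIGTRAP and does not pass it on to the program, so execution can be resumed with continue. The function name trap_breakpoint is made up, and the sketch is POSIX-only.

```python
import os
import signal

def trap_breakpoint():
    """Send SIGTRAP to the current process.

    Under gdb this stops the program inside the debugger. Without a
    debugger (and without a handler) the default action terminates the
    process, so only call this when running under gdb.
    """
    os.kill(os.getpid(), signal.SIGTRAP)

# Usage inside example.py, in place of cbreakpoint(...):
#     trap_breakpoint()
```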
I've tried hooking other commands such as echo and it works well. But hooking the x command fails. Here is the code from my .gdbinit file.
set $pince_injection_failed = 1
set $pince_debugging_mode = 0
define hook-x
  if $pince_injection_failed == 1
    echo asdf
  end
end
define hookpost-x
  if $pince_debugging_mode == 0
    echo zxcv
  end
end
I'm aware that gdb doesn't accept aliases of a command for hooking. But x is already a full command, isn't it? I couldn't find any aliases for it. I also have doubts about it because a single character is too short for a command to be
I found the solution thanks to Mark Plotnick. It was another mistake of mine: one function had a misplaced end, so all the functions defined after it were ignored by gdb.
define keks
set $lel=0
while($lel<10)
x/x 0x00400000
set $lel = $lel+1
end
Notice the missing end at the end of the while loop.
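A mistake like this is easy to miss in a long .gdbinit, so a rough balance check can help. The following is a simplified sketch of my own, not a gdb feature: it only knows the block keywords listed in BLOCK_OPENERS and treats the bodies of python and document blocks as opaque text.

```python
import re

BLOCK_OPENERS = {"define", "document", "if", "while", "commands", "python"}
RAW_BLOCKS = {"document", "python"}  # bodies are not gdb commands

def unclosed_blocks(script_text):
    """Return (keyword, line_number) for every block that is never closed."""
    stack = []
    for lineno, raw_line in enumerate(script_text.splitlines(), start=1):
        line = raw_line.strip()
        match = re.match(r"[a-z]+", line)
        if match is None:
            continue  # blank line, comment, or something not modelled here
        word = match.group(0)
        in_raw_block = bool(stack) and stack[-1][0] in RAW_BLOCKS
        if line == "end":
            if stack:
                stack.pop()
        elif in_raw_block:
            continue  # Python code or help text, not gdb commands
        elif word in BLOCK_OPENERS:
            # "python <code>" on one line is a single command, not a block.
            if word == "python" and line != "python":
                continue
            stack.append((word, lineno))
    return stack

# The broken function from above.
keks = """define keks
set $lel=0
while($lel<10)
x/x 0x00400000
set $lel = $lel+1
end
"""
print(unclosed_blocks(keks))
```

The check reports the outermost block left open (here the define on line 1), not the exact place where the end is missing, but that is usually enough to find the culprit.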
I share my .gdbinit script (via NFS) across machines running different versions of gcc. I would like some gdb commands to be executed if the code I am debugging has been compiled with a specific compiler version. Can gdb do that?
I came up with this:
define hook-run
python
from subprocess import Popen, PIPE
from re import search
# grab the executable filename from gdb
# this is probably not general enough --
# there might be several objfiles around
objfilename = gdb.objfiles()[0].filename
# run readelf
process = Popen(['readelf', '-p', '.comment', objfilename], stdout=PIPE)
# decode so that the pattern also matches under Python 3, where the output is bytes
output = process.communicate()[0].decode('utf-8', 'replace')
# match the version number with a regular expression
regex = r'GCC: \(GNU\) ([\d.]+)'
match = search(regex, output)
if match:
    compiler_version = match.group(1)
    gdb.execute('set $compiler_version="'+str(compiler_version)+'"')
gdb.execute('init-if-undefined $compiler_version="None"')
# do what you want with the python compiler_version variable and/or
# with the $compiler_version convenience variable
# I use it to load version-specific pretty-printers
end
end
It is good enough for my purpose, although it is probably not general enough.
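One way to make the matching a bit more general is to accept any vendor tag inside the parentheses and to compare versions numerically. This is a standalone sketch outside gdb; the function names and the sample dump text are made up for illustration and were not produced by a real readelf run.

```python
import re

# Accept any vendor tag in the parentheses, e.g. "(GNU)" or a distribution name.
GCC_COMMENT = re.compile(r"GCC: \([^)]*\) (\d+(?:\.\d+)*)")

def gcc_version(comment_dump):
    """Return the first GCC version found in the dump text, or None."""
    match = GCC_COMMENT.search(comment_dump)
    return match.group(1) if match else None

def version_tuple(version):
    """Turn "4.8.5" into (4, 8, 5) so that versions compare numerically."""
    return tuple(int(part) for part in version.split("."))

# Hypothetical dump text, written by hand for this example.
sample = "String dump of section '.comment':\n  [     0]  GCC: (GNU) 4.8.5\n"
version = gcc_version(sample)
print(version, version_tuple(version) >= (4, 8))
```

Comparing tuples avoids the trap of comparing version strings, where "4.10" would sort before "4.9".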
A Python GUI that I develop executes an exe file in the same directory. I need to allow the user to open multiple instances of the GUI. This results in the same exe being called simultaneously and raises the following error: the process can not access the file because it is being used by another process. I use a dedicated thread in the python GUI to run the exe.
How can I allow the multiple GUIs to run the same exe simultaneously?
I would appreciate code examples.
The thread is shown below. Its run method executes the exe, which was built from Fortran.
class LineariseThread(threading.Thread):
    def __init__(self, parent):
        threading.Thread.__init__(self)
        self._parent = parent
    def run(self):
        self.p = subprocess.Popen([exe_linearise], shell=True, stdout=subprocess.PIPE)
        print threading.current_thread()
        print "Subprocess started"
        while True:
            line = self.p.stdout.readline()
            if not line:
                break
            print line.strip()
            self._parent.status.SetStatusText(line.strip())
            # Publisher().sendMessage(('change_statusbar'), line.strip())
            sys.stdout.flush()
        if not self.p.poll():
            print " process done"
            evt_show = LineariseEvent(tgssr_show, -1)
            wx.PostEvent(self._parent, evt_show)
    def killtree(self, pid):
        print pid
        parent = psutil.Process(pid)
        print "in killtree sub: "
        for child in parent.get_children(recursive=True):
            child.kill()
        parent.kill()
    def abort(self):
        if self.isAlive():
            print "Linearisation thread is alive"
            # kill the respective subprocesses
            if not self.p.poll():
                # stop them all
                self.killtree(int(self.p.pid))
            self._Thread__stop()
            print str(self.getName()) + " could not be terminated"
            self._parent.LineariseThread_killed=True
I think I figured out a way to avoid the error. It was not actually the execution of the exe that raised the error. The error is raised when the exe accesses other files that are locked by another instance of the same exe. Therefore, I decided not to allow multiple instances of the exe to run. Instead, I decided to allow multiple cases to be opened within a single GUI instance. That way I can manage the process threads to avoid the issue described above.
I should mention that the comments I received helped me study the error messages in detail and figure out what was really going on.
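For anyone looking for code: one way to manage the threads inside a single instance is to serialise the external runs with a lock. This is a minimal Python 3 sketch (the thread above is Python 2), not the code I ended up with. The name run_serialised is made up, and the child Python process stands in for the Fortran exe.

```python
import subprocess
import sys
import threading

# Only one external process may run at a time, so two runs never
# compete for the same locked files.
_run_lock = threading.Lock()

def run_serialised(command):
    """Run command while holding the lock and return its stdout as text."""
    with _run_lock:
        completed = subprocess.run(
            command, stdout=subprocess.PIPE, universal_newlines=True
        )
    return completed.stdout

results = {}

def worker(case_name):
    # Stand-in command: a child Python process that echoes the case name.
    command = [sys.executable, "-c", "print(%r)" % case_name]
    results[case_name] = run_serialised(command).strip()

threads = [
    threading.Thread(target=worker, args=("case-%d" % i,)) for i in range(3)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(sorted(results.items()))
```

The lock only protects runs started from the same GUI process, which is why it fits the single-instance design and would not help with several independent GUI instances.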
If I am running a long R script from the command line (R --slave script.R), then how can I get it to give line numbers at errors?
I don't want to add debug commands to the script if at all possible; I just want R to behave like most other scripting languages.
This won't give you the line number, but it will tell you where the failure happens in the call stack which is very helpful:
traceback()
[Edit:] When running a script from the command line you will have to skip one or two calls, see traceback() for interactive and non-interactive R sessions
I'm not aware of another way to do this without the usual debugging suspects:
debug()
browser()
options(error=recover) [followed by options(error = NULL) to revert it]
You might want to look at this related post.
[Edit:] Sorry...just saw that you're running this from the command line. In that case I would suggest working with the options(error) functionality. Here's a simple example:
options(error = quote({dump.frames(to.file=TRUE); q()}))
You can create as elaborate a script as you want on an error condition, so you should just decide what information you need for debugging.
Otherwise, if there are specific areas you're concerned about (e.g. connecting to a database), then wrap them in a tryCatch() function.
Doing options(error=traceback) provides a little more information about the content of the lines leading up to the error. It causes a traceback to appear if there is an error, and for some errors it includes the line number, prefixed by #. But it's hit or miss: many errors won't get line numbers.
Support for this will be forthcoming in R 2.10 and later. Duncan Murdoch just posted to r-devel on Sep 10 2009 about findLineNum and setBreakpoint:
I've just added a couple of functions to R-devel to help with
debugging. findLineNum() finds which line of which function
corresponds to a particular line of source code; setBreakpoint() takes
the output of findLineNum, and calls trace() to set a breakpoint
there.
These rely on having source reference debug information in the code.
This is the default for code read by source(), but not for packages.
To get the source references in package code, set the environment
variable R_KEEP_PKG_SOURCE=yes, or within R, set
options(keep.source.pkgs=TRUE), then install the package from source
code. Read ?findLineNum for details on how to tell it to search
within packages, rather than limiting the search to the global
environment.
For example,
x <- " f <- function(a, b) {
         if (a > b) {
             a
         } else {
             b
         }
     }"
eval(parse(text=x)) # Normally you'd use source() to read a file...
findLineNum("<text>#3") # <text> is a dummy filename used by parse(text=)
This will print
f step 2,3,2 in <environment: R_GlobalEnv>
and you can use
setBreakpoint("<text>#3")
to set a breakpoint there.
There are still some limitations (and probably bugs) in the code; I'll
be fixing thos
You do it by setting
options(show.error.locations = TRUE)
I just wonder why this setting is not a default in R? It should be, as it is in every other language.
Specifying the global R option for handling non-catastrophic errors worked for me, along with a customized workflow for retaining info about the error and examining this info after the failure. I am currently running R version 3.4.1.
Below, I've included a description of the workflow that worked for me, as well as some code I used to set the global error handling option in R.
As I have it configured, the error handling also creates an RData file containing all objects in working memory at the time of the error. This dump can be read back into R using load() and then the various environments as they existed at the time of the error can be inspected interactively using debugger(errorDump).
I will note that I was able to get line numbers in the traceback() output from any custom functions within the stack, but only if I used the keep.source=TRUE option when calling source() for any custom functions used in my script. Without this option, setting the global error handling option as below sent the full output of the traceback() to an error log named error.log, but line numbers were not available.
Here are the general steps I took in my workflow, and how I was able to access the memory dump and error log after a non-interactive R failure.
I put the following at the top of the main script I was calling from the command line. This sets the global error handling option for the R session. My main script was called myMainScript.R. The various lines in the code have comments after them describing what they do. Basically, with this option, when R encounters an error that triggers stop(), it will create an RData (*.rda) dump file of working memory across all active environments in the directory ~/myUsername/directoryForDump and will also write an error log named error.log with some useful information to the same directory. You can modify this snippet to add other handling on error (e.g., add a timestamp to the dump file and error log filenames, etc.).
options(error = quote({
  setwd('~/myUsername/directoryForDump'); # Set working directory where you want the dump to go, since dump.frames() doesn't seem to accept absolute file paths.
  dump.frames("errorDump", to.file=TRUE, include.GlobalEnv=TRUE); # First dump to file; this dump is not accessible by the R session.
  sink(file="error.log"); # Specify sink file to redirect all output.
  dump.frames(); # Dump again to be able to retrieve error message and write to error log; this dump is accessible by the R session since not dumped to file.
  cat(attr(last.dump,"error.message")); # Print error message to file, along with simplified stack trace.
  cat('\nTraceback:');
  cat('\n');
  traceback(2); # Print full traceback of function calls with all parameters. The 2 passed to traceback omits the outermost two function calls.
  sink();
  q()}))
Make sure that from the main script and any subsequent function calls, anytime a function is sourced, the option keep.source=TRUE is used. That is, to source a function, you would use source('~/path/to/myFunction.R', keep.source=TRUE). This is required for the traceback() output to contain line numbers. It looks like you may also be able to set this option globally using options( keep.source=TRUE ), but I have not tested this to see if it works. If you don't need line numbers, you can omit this option.
From the terminal (outside R), call the main script in batch mode using Rscript myMainScript.R. This starts a new non-interactive R session and runs the script myMainScript.R. The code snippet given in step 1 that has been placed at the top of myMainScript.R sets the error handling option for the non-interactive R session.
Encounter an error somewhere within the execution of myMainScript.R. This may be in the main script itself, or nested several functions deep. When the error is encountered, handling will be performed as specified in step 1, and the R session will terminate.
An RData dump file named errorDump.rda and an error log named error.log are created in the directory specified by '~/myUsername/directoryForDump' in the global error handling option setting.
At your leisure, inspect error.log to review information about the error, including the error message itself and the full stack trace leading to the error. Here's an example of the log that's generated on error; note the numbers after the # character are the line numbers of the error at various points in the call stack:
Error in callNonExistFunc() : could not find function "callNonExistFunc"
Calls: test_multi_commodity_flow_cmd -> getExtendedConfigDF -> extendConfigDF
Traceback:
3: extendConfigDF(info_df, data_dir = user_dir, dlevel = dlevel) at test_multi_commodity_flow.R#304
2: getExtendedConfigDF(config_file_path, out_dir, dlevel) at test_multi_commodity_flow.R#352
1: test_multi_commodity_flow_cmd(config_file_path = config_file_path,
spot_file_path = spot_file_path, forward_file_path = forward_file_path,
data_dir = "../", user_dir = "Output", sim_type = "spot",
sim_scheme = "shape", sim_gran = "hourly", sim_adjust = "raw",
nsim = 5, start_date = "2017-07-01", end_date = "2017-12-31",
compute_averages = opt$compute_averages, compute_shapes = opt$compute_shapes,
overwrite = opt$overwrite, nmonths = opt$nmonths, forward_regime = opt$fregime,
ltfv_ratio = opt$ltfv_ratio, method = opt$method, dlevel = 0)
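If the log is post-processed by another tool, the file#line markers can be pulled out with a regular expression. Here is a rough sketch in Python; the choice of language and the function name source_locations are my own assumptions. The input is the first two traceback entries from the log above.

```python
import re

# "<depth>: <function>(... ) at <file>#<line>" as in the traceback section.
LOCATION = re.compile(r"^(\d+): (\w+)\(.* at (\S+)#(\d+)$")

def source_locations(log_text):
    """Return (depth, function, file, line) for each entry with a location."""
    locations = []
    for line in log_text.splitlines():
        match = LOCATION.match(line.strip())
        if match:
            depth, function, filename, lineno = match.groups()
            locations.append((int(depth), function, filename, int(lineno)))
    return locations

log = """\
Traceback:
3: extendConfigDF(info_df, data_dir = user_dir, dlevel = dlevel) at test_multi_commodity_flow.R#304
2: getExtendedConfigDF(config_file_path, out_dir, dlevel) at test_multi_commodity_flow.R#352
"""
for entry in source_locations(log):
    print(entry)
```

Entries whose call text wraps over several lines, like frame 1 above, are not handled by this pattern; it only matches a location that sits on the same line as the frame number.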
At your leisure, you may load errorDump.rda into an interactive R session using load('~/path/to/errorDump.rda'). Once loaded, call debugger(errorDump) to browse all R objects in memory in any of the active environments. See the R help on debugger() for more info.
This workflow is enormously helpful when running R in some type of production environment where you have non-interactive R sessions being initiated at the command line and you want information retained about unexpected errors. The ability to dump memory to a file you can use to inspect working memory at the time of the error, along with having the line numbers of the error in the call stack, facilitate speedy post-mortem debugging of what caused the error.
First, options(show.error.locations = TRUE) and then traceback(). The error line number will be displayed after #