How to experiment with source code using `pdb`, `inspect`, and `pprint`? - debugging

Problem
I want to understand the source code of Kur (a deep learning library).
I have no formal training in programming, and I prefer to learn by experimenting rather than from prior theory.
I want an easy tool to help me dig into the detailed workings of source code.
Debugging tools like the pdb library seem to be a good choice to try.
But what is the easiest way to get started experimenting with source code using pdb?
I just want to write one line of code to dig into the details, rather than the several lines demonstrated in many of the examples you find when you google pdb.
In other words, which function of pdb should I use, and how do I use it effectively for experimenting with source code?
Toy Example
I want to explore the inner workings of `kur dump mnist.yml`.
To keep it simple, I don't want to explore beyond __main__.py and kurfile.py.
To be more specific, I want to explore dump() and parse_kurfile() in __main__.py, and Kurfile.__init__() in kurfile.py.
Their relationship is as follows:
console: kur dump mnist.yml -->
python: __main__.py : main() --> dump() --> parse_kurfile() -->
python: kurfile.py : Kurfile class --> __init__() ...
python: ... the rest is not to be explored
Which function of pdb should I use to explore the execution flow from dump() to parse_kurfile(), on to Kurfile.__init__(), and back to dump() again?
Update
How can I effectively explore a Jupyter notebook using pdb?
pdb inside Jupyter can't even remember console history, which is not good.

One possible solution
Use pdb.set_trace only.
set_trace traces the details at the level of the current code block; it will not go deeper into the next inner function on its own.
For example, when I put a single pdb.set_trace inside dump(), pdb will not trace into parse_kurfile() for me, but stays in the current dump() block:
def dump(args):
    """ Dumps the Kurfile to stdout as a JSON blob.
    """
    import pdb; pdb.set_trace()  # execution pauses here; use n/s/c to step around dump()
    # parse the kurfile into parts to be used in Python code
    spec = parse_kurfile(args.kurfile, args.engine)
If I want to go deeper into parse_kurfile() in __main__.py and Kurfile.__init__() in kurfile.py, then I just need to put one pdb.set_trace in each of the two functions, as in the sketch below:
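This is a minimal sketch only: the argument names are placeholders rather than Kur's exact signatures, and the real function bodies are elided.

# __main__.py -- placeholder signature, for illustration
def parse_kurfile(filename, engine):
    import pdb; pdb.set_trace()   # pdb stops here when dump() calls in
    ...

# kurfile.py -- placeholder signature, for illustration
class Kurfile:
    def __init__(self, source, engine):
        import pdb; pdb.set_trace()   # pdb stops here when the instance is constructed
        ...

With one set_trace per function, pressing c (continue) carries you from one breakpoint to the next, so the whole flow from dump() to parse_kurfile() to Kurfile.__init__() and back can be followed with a single extra line in each function.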
Update
From my experience so far, two libraries, inspect and pprint, go well with the pdb library.
From the inspect library, I use the following functions the most:
inspect.getdoc: to see the docstring of a function
inspect.getmodule: to find out where a function or object comes from
inspect.getfullargspec: to find out all the inputs the function takes
inspect.getsourcelines: to get the source code of the function
With the functions above, when I want to check out other functions, I don't have to go hunting for the source code in an editor; I can see it right where I am in pdb.
From the pprint library, as you can guess, I use pprint.pprint to print out the source code and the docs in a more readable format right inside pdb.
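For example, at a pdb prompt (or in a plain interpreter) these calls look like the sketch below, using the standard library's json.dumps as a stand-in for whatever function you are inspecting:

import inspect
import pprint
import json

print(inspect.getdoc(json.dumps))                   # the function's docstring
print(inspect.getmodule(json.dumps))                # the module it comes from
pprint.pprint(inspect.getfullargspec(json.dumps))   # every argument it takes

lines, start_line = inspect.getsourcelines(json.dumps)
pprint.pprint(lines)                                # its source, one line per list entry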
More Update
A working setup for exploring and experimenting with source:
use Atom to split the window and see different source files at the same time;
use iTerm2 to split the window and use IPython to execute Python or bash code;
organise the editor and terminal panes side by side.
More update
While exploring, I want to have all the attributes and methods of a module or class ready at hand.
To achieve this, I can use inspect.getmembers(module_or_class) and view the output in an iTerm2 split window:
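A minimal sketch, using the standard library's json module as a stand-in for the module or class being explored:

import inspect
import pprint
import json

# every (name, value) pair the module exposes
pprint.pprint([name for name, _ in inspect.getmembers(json)])

# or narrow the listing down, e.g. to functions only
pprint.pprint(inspect.getmembers(json, inspect.isfunction))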
Update: How to change color of iterm2 for the eyes?
Go to iTerm2 Preferences → Colors, change the preset to Tango Dark, and gray the foreground color to make white text look softer.
Change the Kur logger color settings to:
## logcolor.py
# Color codes for each log-level.
COLORS = {
    'DEBUG': BLUE,
    'INFO': MAGENTA,
    'WARNING': RED,
    'ERROR': RED,
    'CRITICAL': GREEN
}

How to effectively use pdb in a Jupyter notebook?
One way to avoid the drawbacks of pdb in Jupyter:
download the notebook as a .py file;
insert code such as import pdb and pdb.set_trace() into the Python source;
in a console, run python your.py.
Now you can explore this .py file just as in the answer above.
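For the download step, a console one-liner also works; this assumes jupyter with nbconvert is installed, and the filename is a placeholder:

jupyter nbconvert --to script your_notebook.ipynb   # writes your_notebook.py
python your_notebook.py                             # runs it, stopping at any pdb.set_trace()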

Related

Can I use \input{file.tex} or similar to efficiently bring content (not a whole document) from a LaTeX file into a Jupyter notebook?

I am new to Jupyter notebooks and Python and have started using these to make materials for a workshop, so that I can also produce a handout. I have useful content in various LaTeX files that I would like to include in a notebook.
I would like to know whether there is a command that will allow me to efficiently bring my already-modularized content into a notebook. I am happy to take suggestions on any approach to my problem in case the LaTeX route is the wrong way to proceed.
As a particular case, suppose an external file fig.tex contains only a stand-alone tikzpicture (that I have successfully included in another LaTeX document). If I start a notebook code cell with %%itikz and follow it with \input{fig}, I obtain error messages.
I can remedy the problem if I add a preamble with \documentclass{}, the many necessary \usepackage{} and \usetikzlibrary{} commands (which I have already included at the top of the notebook), and wrap the content in begin/end document commands.
That is more manual handling than I would like. Is there a more efficient way to include the tikzpicture content?
So it turns out with itikz you have an --implicit-pic option that fills in the preamble for you.
In principle, with this option, your cell would look like the following:
%%itikz --implicit-pic
% my awesome figure
\input{path/to/fig}
This creates a tex file populated like so:
\documentclass[tikz]{standalone}
\begin{document}
\begin{tikzpicture}[scale=1.0]
% my awesome figure
\input{path/to/fig}
\end{tikzpicture}
\end{document}
In addition, when using an --implicit-pic it is useful to load TikZ packages and set options. To quote from the Quickstart guide:
In an --implicit-pic, it's often useful to:
Set the \tikzpicture[scale=X] via --scale=<X> while iterating.
Set the \usepackage{X,Y,Z} via --tex-packages=<X,Y,Z>
Set the \usetikzlibrary{X,Y,Z} via --tikz-libraries=<X,Y,Z>
For more info, see items 16-20 in the Quickstart notebook.
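Putting these together, a cell might look like the following sketch (the path, packages, and libraries are placeholders):

%%itikz --implicit-pic --scale=1.5 --tex-packages=amsmath --tikz-libraries=arrows,calc
% my awesome figure
\input{path/to/fig}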

Pycharm debugging from saved state

Is there a way to debug code in parts? Meaning, I wish to debug the code up to a certain point, save the variables' state, and continue debugging from that point on.
Thanks
With its debug function, PyCharm offers a fantastic opportunity to inspect the properties of variables once a breakpoint has been set accordingly.
Apart from that, Python itself offers an amazing way to serialize and de-serialize object structures with its built-in pickle feature (see the pickle documentation).
pickle.dump(obj, file) can be used to write a variable's state at a certain point to a file.
I sometimes use pickle, for example, to dump a response variable into a file for later use.
Example Code
import pickle
import requests

r = requests.get('https://www.strava.com/api/v3/athlete')

# save the parsed response for later
with open('assets/requests_test.pickle', 'wb') as write:
    pickle.dump(r.json(), write)
With that you're able to open this file manually, or load it later in your code with pickle.load(file) to do something useful with it.
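The loading side is symmetrical; a minimal sketch, assuming the file written above exists:

import pickle

# restore the saved response for offline experimentation
with open('assets/requests_test.pickle', 'rb') as read:
    data = pickle.load(read)

print(data)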

Rstudio difference between run and source

I am using RStudio and am not sure how the options "run" and "source" differ.
I tried googling these terms, but 'source' is a very common word and I wasn't able to get good search results :(
Run and source have subtly different meanings. According to the RStudio documentation,
The difference between running lines from a selection and invoking
Source is that when running a selection all lines are inserted
directly into the console whereas for Source the file is saved to a
temporary location and then sourced into the console from there
(thereby creating less clutter in the console).
Something to be aware of is that sourcing functions in files makes them available for scripts to use. What does this mean? Imagine you are trying to troubleshoot a function that is called from a script. You need to source the file containing the function so that your changes to the function are used when that line in the script is then run.
A further aspect of this is that you can source functions from your scripts. I use this code to automatically source all of the functions in a directory, which makes it easy to run a long script with a single run:
# source our functions
code.dir <- "c:/temp"  # backslashes must be escaped in R strings, so use "/" or "\\"
code.files <- dir(code.dir, pattern = "\\.[rR]$")  # only files ending in .r or .R
for (file in code.files){
source(file = file.path(code.dir,file))
}
Sometimes, for reasons I don't understand, you will get different behavior depending on whether you select all the lines of code and press the Run button, or go to the Code menu and choose 'Source'. For example, in one specific case, writing a ggplot to a png file worked when I selected and ran all my lines of code, but failed when I went to the Code menu and chose 'Source'. However, if I choose 'Source with Echo', I'm able to print to a png file again.
I'm simply reporting a difference I've seen between selecting and running all your lines of code and going to the Code menu and choosing 'Source', at least in the case of trying to print a ggplot to a png file.
An important implication of @AndyClifton's answer is:
RStudio breakpoints work in source (Ctrl-Shift-S) but not in run (Ctrl-Enter).
Presumably the reason is that with run, the code is passed straight into the console with no support for a partial submission.
You can still use browser() with run, though.
print() to the console is supported in debugSource (Ctrl-Shift-S) as well as in run.
The "run" button simply executes the selected line or lines. The "source" button will execute the entire active document. But why not just try them and see the difference?
I also just discovered that the encoding used to read a sourced function can differ depending on whether you source the file or add the function to your environment with Ctrl+Enter!
In my case there was a regex with a special character (µ) in my function. When I loaded the function directly (Ctrl+Enter), everything worked, while I got an error when sourcing the file containing the function.
To solve this issue I specified the encoding of the sourced file in the source function (source("utils.R", encoding = "UTF-8")).
Run will run each line of code, which means that it hits enter at the beginning of each line, printing the output to the console. Source won't print anything unless you source with echo, which means that ggplot won't print to pngs, as another poster mentioned.
A big practical difference between run and source is that an unaccounted-for error in source will break you out of the code without finishing, whereas run will just pass the next line to the console and keep going. This has been the main practical difference I've seen when cleaning up other people's scripts.
When using RStudio, you can press the Run button in the script section; it will run the selected line.
Next to it you have the Re-run button, to run the line again, and the Source button next to that, which will run entire chunks of code.
I found a video about this topic:
http://www.youtube.com/watch?v=5YmcEYTSN7k
Source/Source with Echo executes the whole file, whereas Run, as far as my personal experience goes, executes the line on which your cursor sits.
Thus, Run helps you to debug your code. Keep an eye on the Environment pane: it displays what's happening on the stack.
To those saying plots do not show: they won't show in the Plots pane, but you can definitely save the plot to disk using Source in RStudio, with this snippet:
png(filename)
print(p)
dev.off()
I can confirm plots are written to disk. Furthermore, print statements are also output to the console.

How to diff 2 notebooks at the source level?

Does anyone know a tool to find the differences between 2 notebooks at the source level?
The compare notebooks tool in Workbench 2 seems to work at the internal data structure level, which is not useful for me. I am looking for a tool that looks at differences at the source level (what one sees when looking at a notebook, i.e. not the FullForm).
I am using V8 of Mathematica on Windows.
EDIT1:
How do I display the output/report from NotebookDiff in a more readable form?
This answer is based on discussion in the comments to other parts of this question.
It also could (and should) be automated if it's going to be used with any regularity.
This could be done by tagging the cells you want compared and using NotebookFind to find the cells for extraction and comparison.
A solution for comparing just a single large cell of code (as sometimes occurs when making demonstrations) is to copy the code in InputForm from both notebooks
and paste it into a simple diff tool such as Quick Diff Online
which will then display the standard diff for you:
The above code was taken from one of Nasser's demonstrations.
Another option is to use CellDiff from the AuthorTools package.
Needs["AuthorTools`"];
CellDiff[Cell["Some text.", "Text"],
Cell["Some different text.", "Text"]]
To use on your demonstrations you can copy the cell expressions from the two versions by right clicking on the cell brackets:
There is an undocumented package in the built-in add-ons (in $InstallationDirectory/AddOns/Applications) called AuthorTools. Once loaded, it exposes a NotebookDiff function which provides some basic diff features:
Needs["AuthorTools`"];
nb1 = NotebookPut[
   Notebook[{Cell["Subsection heading", "Subsection"],
     Cell["Some text.", "Text"]}]];
nb2 = NotebookPut[
   Notebook[{Cell["Edited Subsection heading", "Subsection"],
     Cell["Some different text.", "Text"]}]];
NotebookPut@NotebookDiff[nb1, nb2]
As this package is undocumented, please realize it is potentially buggy and is not considered a supported feature, but hopefully you still find it useful.
Note that you can also get handles to notebooks with e.g.:
nb1 = NotebookOpen["path/to/a/notebook.nb"]
and a list of notebooks currently open in the front end
Notebooks[]
If you must work with notebooks, then NotebookDiff in AuthorTools is probably your best bet. If this is an important part of your workflow (due to version control or some other constraint) and you have some flexibility, you may want to consider moving the code from the existing notebook (.nb) into a package file (.m), which is saved as plain text. You can still open and edit package files in the Mathematica notebook front end, but you get the added benefit of being able to diff them using existing text-diffing tools.

Generate line graph for any benchmark?

I have spent many hours failing to find a line graph generator for my benchmark results that I could just plug numbers into. I tried quite a few, like Google's chart API, but they all seemed confusing or not graceful-looking; I am clueless.
Examples of benchmark images I wish to make something like are these:
What specific applications or web services do you recommend for generating something even close to this? I want something "neat".
You can use Python's matplotlib, which generates beautiful graphs like:
(Source code)
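A minimal matplotlib sketch of a benchmark-style line graph (the numbers below are made up purely for illustration):

import matplotlib.pyplot as plt

# hypothetical benchmark results: runtime in ms for two libraries
sizes = [1000, 10000, 100000, 1000000]   # input sizes
lib_a = [0.8, 7.5, 80.2, 910.0]          # library A timings (made up)
lib_b = [1.1, 9.0, 70.4, 650.3]          # library B timings (made up)

plt.plot(sizes, lib_a, marker='o', label='library A')
plt.plot(sizes, lib_b, marker='s', label='library B')
plt.xscale('log')                        # benchmark sizes often span decades
plt.xlabel('input size')
plt.ylabel('runtime (ms)')
plt.title('Benchmark comparison')
plt.legend()
plt.grid(True)
plt.savefig('benchmark.png', dpi=150)    # or plt.show()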
I use gnuplot. It is not a library but a separate executable. You can write the plotting data to one file and the plotting commands to another (a script file that refers to the data file), then call gnuplot with that script file.
Another way is to use Qwt. It is a real library, but it depends on Qt. If you already use Qt in your project, it is a very straightforward way to plot graphs. If not, just use gnuplot.
