I'd like to know how debugging is achieved in a lazy functional language.
Can you use breakpoints, print statements and traditional techniques? Is this even a good idea?
It is my understanding that pure functional programming does not allow side-effects, with the exception of monads.
Order of execution is also not guaranteed.
Would you have to program a monad for every section of code you want to test?
I'd like some insight into this question from someone more experienced in this area.
Nothing prevents you from using breakpoints in a lazily evaluated functional program. The difference from eager evaluation is when the program stops at the breakpoint and what the trace looks like. The program stops when the expression a breakpoint is set on is actually being reduced (obviously).
Instead of the stack trace you're used to, you get the chain of reductions that led up to the reduction of the expression the breakpoint is set on.
A small, silly example. Say you have this Haskell program:
add_two x = 2 + x
times_two x = 2 * x
foo = times_two (add_two 42)
Put a breakpoint on the first line (add_two), then evaluate foo. When the program stops at the breakpoint, in an eager language you'd expect a trace like
add_two
foo
and times_two hasn't even begun to be evaluated, but in the GHCi debugger you get
-1 : foo (debug.hs:5:17-26)
-2 : times_two (debug.hs:3:14-18)
-3 : times_two (debug.hs:3:0-18)
-4 : foo (debug.hs:5:6-27)
<end of history>
which is the list of reductions that led up to the reduction of the expression you put the breakpoint on. Note that it looks like times_two "called" foo even though it does not do so explicitly. You can see from this that the evaluation of 2 * x in times_two (-2) did force the evaluation of (add_two 42) (-1) from the foo line. From there you can perform a step as in an imperative debugger (perform the next reduction).
Another difference from debugging in an eager language is that variables may be not-yet-evaluated thunks. For example, if you jump to step -2 in the above trace and inspect x, you'll find it's still an unevaluated thunk (indicated by brackets in GHCi).
For far more detailed information and examples (how to step through the trace, inspect values, ...), see the GHCi Debugger section in the GHC manual. There's also the Leksah IDE, which I haven't used yet as I'm a Vim and terminal user, but which, according to the manual, has a graphical front end to the GHCi debugger.
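For a flavour of the workflow, here is a minimal sketch of a GHCi session using the debugger commands from the manual (prompt and output abbreviated; exact locations and messages vary by GHC version):

ghci> :load debug.hs
ghci> :break add_two        -- set a breakpoint on add_two
ghci> foo                   -- evaluate foo; stops when add_two is reduced
Stopped in Main.add_two, debug.hs:1:13-17
ghci> :hist                 -- show the reduction history, as above
ghci> :print x              -- inspect x without forcing the thunk
ghci> :force x              -- force the thunk and show its value
ghci> :step                 -- perform the next reduction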
You also asked about print statements. With pure functions this is not so easy, since a print statement would have to live in the IO monad. Say you have a pure function
foo :: Int -> Int
and you wish to add a trace statement. The print would return an action in the IO monad, so you'd have to adjust the signature of the function you put the trace statement in, and the signatures of the functions that call it, and so on.
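As a hypothetical sketch of what that pollution looks like (foo's body is made up for illustration):

foo :: Int -> IO Int        -- was: Int -> Int
foo x = do
  putStrLn ("foo called with " ++ show x)   -- the debug print forces IO
  return (2 * x)                            -- hypothetical body

-- ...and now every caller of foo has to run in IO as well.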
This is not a good idea, so you need some way to break purity to get trace statements. In Haskell, this can be done with unsafePerformIO. The Debug.Trace module already has a function
trace :: String -> a -> a
which outputs the string and returns its second parameter. It would be impossible to write as a pure function (well, if you intend to really output the string, that is). It uses unsafePerformIO under the hood. You can put trace into a pure function to get a trace print.
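A minimal sketch of using it (foo's body is made up for illustration):

import Debug.Trace (trace)

foo :: Int -> Int
foo x = trace ("foo called with " ++ show x) (x * x)

-- Note: because of laziness, the message is only printed
-- when (and if) the result of foo is actually forced.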
Would you have to program a monad for every section of code you want to test?
I'd suggest rather the opposite: make as many functions pure as possible (I'm assuming you mean the IO monad for printing; monads are not necessarily impure). Lazy evaluation allows you to separate IO code from processing code very cleanly.
Whether imperative debugging techniques are a good idea or not depends on the situation (as usual). I find testing with QuickCheck/SmallCheck much more useful than unit testing in imperative languages, so I'd go that route first to avoid as much debugging as possible. QuickCheck properties actually make nice concise function specifications (a lot of test code in imperative languages looks like just another blob of code to me).
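For example, a property like this reads as a concise specification (a standard QuickCheck sketch; the property is made up):

import Test.QuickCheck

-- Specification: reversing a list twice gives back the original list.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseTwice   -- checks it on 100 random lists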
One trick to avoid a lot of debugging is to decompose the function into many smaller subfunctions and test as many of them as possible. This may be a bit unusual when coming from imperative programming, but it's a good habit no matter what language you're using.
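A tiny illustration of the idea (the function and its pieces are made up):

import Data.Char (toLower)

-- Instead of one opaque definition, expose the pieces...
lowercase :: String -> String
lowercase = map toLower

collapseSpaces :: String -> String
collapseSpaces = unwords . words

-- ...and compose them, so each piece can be tested on its own.
normalize :: String -> String
normalize = collapseSpaces . lowercase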
Then again, debugging != testing and if something goes wrong somewhere, breakpoints and traces may help you out.
I don't think this topic can be dealt with in a short space. Please read the papers available at the following links:
A Theory of Tracing Pure Functional Programs.
The Haskell Tracer publications.
Haskell Debugging Technologies.
I've never delved into anything terribly complicated in Haskell, but the fact that side effects are virtually gone has eliminated most of the need for debugging. Pure functions are extremely simple to test and verify without a debugger.
On the other hand, there were a couple of times I needed to debug something within a monad, in which case I was already able to print/log/whatever.
At least for smaller programs or systems, debugging kind of goes out the window. Strong typing and static type-checking further eliminate the traditional bugs you find in procedural programming. Most bugs, if any, are logical bugs (calling the wrong function, a mathematical error, etc.) -- very easy to test interactively.
From experience with Clojure (which is lazy, functional, and encourages but does not enforce purity):
You can set breakpoints just as with any other language. However, because of lazy evaluation, these might not get hit immediately, but will be hit as soon as evaluation of the lazy structure is forced.
In lazy functional languages that allow side effects (Clojure included), you can insert printlns and other debug logging relatively easily. I personally find these very useful. You have to be careful about when these get called because of laziness, but if you don't see the output at all, it can be a hint that your code isn't being evaluated because of laziness.
Having said all the above, I have never so far needed to resort to the debugger. Often a few simple tests (perhaps on the REPL) are enough to verify that functional code is working correctly, and if these fail then it's usually quite obvious what is going wrong.
Allow me to advertise a tool of my own for debugging laziness problems. It helped me resolve in an hour a laziness-related memory leak that I had already spent two days debugging.
http://www.haskell.org/pipermail/haskell-cafe/2012-January/098847.html
http://hackage.haskell.org/package/htrace
Is there a simple definition?
What is the nature, so to speak, of "assertive code"?
All the definitions of it I have found so far are very vague.
Is there something I can read that is concise and to the point without using a lot of jargon?
I think the jargon could be a problem in my case. I'm quite new to this, but I want to learn it, so any help and pointers are welcome.
When you write "imperative code", you tell the computer what to do.
When you write "declarative code", you tell the computer what to produce.
When you write "assertive code", you tell the computer what you expect to be true.
The phrase "assertive code" isn't nearly as common as the other two, and is used in different ways in practice. In a common OO language it usually just refers to using assert expressions to catch bugs. In functional programming (the example you provide), it usually refers to pattern matching and destructuring constructs that imply a particular shape for their inputs. In a language like Prolog, it can refer to a definition of goals that the program must resolve.
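To make the functional-programming sense concrete, a small Haskell sketch (functions made up for illustration):

-- Declarative: state what the result is.
total :: [Int] -> Int
total = sum

-- "Assertive" in the pattern-matching sense: the pattern asserts
-- that the input has at least two elements; anything shorter fails.
firstTwo :: [a] -> (a, a)
firstTwo (x:y:_) = (x, y)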
An assert statement is essentially an if statement that will print an error (and, sometimes, stop the program) if the condition is false. If the condition is true, it will do nothing.
Assertions are normally used in software testing. You use them to check that a program behaves in a way that you expect it to. In other words, they will sound an alarm when a program violates an assumption that the programmer wanted to check.
However, there's nothing preventing you from leaving assertions in your production code too. This can sometimes be beneficial, especially in cases where you cannot easily simulate the program with a test - for example because you don't have the real data to test it with.
In such cases you typically want failed assertions to just print a message to a log file. After the program has run for a while, you check the log file; if everything is OK, there should be no messages about failed assertions.
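As a concrete example of the assert-as-checked-condition idea, Haskell's Control.Exception provides one (a minimal sketch; safeDiv is made up):

import Control.Exception (assert)

-- Throws an AssertionFailed exception if the condition is False,
-- otherwise returns the second argument unchanged. GHC strips
-- these checks when compiling with -fignore-asserts (implied by -O).
safeDiv :: Int -> Int -> Int
safeDiv x y = assert (y /= 0) (x `div` y)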
In python, you can use pdb.set_trace() in the code to launch a pdb debugger right there when the code reaches that point, without having to deal with debuggers or breakpoints. Is there such an equivalent with gdb or any other debugger for go? I see https://golang.org/doc/gdb#Naming but I don't see how to apply it the same way.
No, there is no such equivalent. Python is inherently interpreted [1] and pdb is simply part of any running instance of Python, so this is a lot easier there.
Once you are running under gdb or dlv, though, it's not that hard to set a breakpoint in some known function. Calling that function from the point at which you want to drop into the debugger will drop you into the debugger. So instead of pdb.set_trace just call debugging.Stop() and write a debugging package with a Stop function that just returns. Set your breakpoint there and run your program.
[1] Python can in theory be compiled or JIT-ted, but this tends not to work as well as with other languages due to the extremely dynamic nature of the language (method invocation, for instance). Adding a few small restrictions to the language, none of which would make it less usable, would make compilation to fast code much easier. For further details, see Does the Python 3 interpreter have a JIT feature? (Removing the Global Interpreter Lock would have a big payoff as well, but is also hard: see the PyPy FAQ.)
Does anyone know of a debugger or programming language that allows you to set a breakpoint, then modify the code, and then execute the newly modified code?
This would be even more useful if the debugger also supported reverse debugging. Then you could step through the buggy code, step backwards, fix the code, and then step through it again to see if you fixed the bug. Now that's sexy. Is anyone doing this?
I believe Hot Code Replace in Eclipse is what you mean:
The idea is that you can start a debugging session on a given runtime workbench and change a Java file in your development workbench, and the debugger will replace the code in the receiving VM while it is running. No restart is required, hence the reference to "hot".
But there are limitations:
HCR only works when the class signature does not change; you cannot remove or add fields to existing classes, for instance. However, HCR can be used to change the body of a method.
The TotalView debugger provides the concept of an Evaluation Point, which allows the user to "fix the code on the fly", to "patch it", or to examine a what-if scenario without having to recompile.
Basically, the user plants an Evaluation Point at some line and writes a piece of C/C++ or Fortran code to be executed instead. It could be a simple printf, a goto, a set of if-then-else tests, some for loops, etc. This is really powerful and time-saving.
As for reverse-debugging, it's a highly desirable feature, but I'm not sure it already exists.
http://msdn.microsoft.com/en-us/library/bcew296c%28v=vs.80%29.aspx
The link is for VS 2005 but applies to 2008 and 2010 as well.
Edit, 2015: Read chapters 1 and 2 of my MSc thesis, Combining reverse debugging and live programming towards visual thinking in computer programming, it answers the question in detail.
The Python debugger, Pdb, allows you to run arbitrary code while paused (such as at a breakpoint). For example, let's say you are debugging and have paused at the following line in your program, where the variable hasn't been declared in the program itself:
print (x)
so that moving forward (i.e., running that line) would result in:
NameError: name 'x' is not defined
You can define that variable in the debugger, and have the program continue executing with it:
(Pdb) 'x' in locals()
False
(Pdb) x = 1
(Pdb) 'x' in locals()
True
If you meant that the change should not be provided at the debugger console, but that you want to change the original code in some editor and have the debugger automatically update the state of the live program so that the executing program reflects that change, that is called "live programming". (Not to be confused with "live coding", which is live performance of coding -- see TOPLAP -- though there is some confusion between the terms.)

There has been interest in research into live programming (and live coding) in the last two or three years. It is a very difficult problem to solve, and there are many different approaches. You can watch Bret Victor's talk, Inventing on Principle, for some examples. Note that those are prototypes only, to illustrate the idea.

Hot-swapping of code so that the tree is drawn differently in the next loop of some draw() function, or so that the game character responds differently next time (or so that the music or visuals change during a live-coding session), is not that difficult; some languages and systems cater for that explicitly. However, the state of the program is not necessarily then a true reflection of the code (as also in the Pdb example above). If, for example, the game character could access an area based on some ability like jumping, and the code is then swapped out, he might never be able to access that area any longer should the game be played from the start. Solving change propagation for general programming is difficult -- you can see that his search example re-runs the code from the start each time a change is made.
True reverse execution is also a tricky problem. There are a number of commercial projects, but almost all of them only record trace data to browse afterwards, which is called omniscient debugging (though such tools are often called reverse, back-in-time, bidirectional or time-travel debuggers, with a lot of confusion between the terms).

In terms of free and open-source projects, the GNU debugger, gdb, has two modes: one is process record and replay, which also only records the program for browsing it afterwards; the other is true reverse debugging, which allows you to step backwards in a live program. It is extremely slow, as it undoes one machine instruction at a time. The extended Python debugger prototype, epdb, also allows true reversing in a live program, and is much faster as it uses a snapshot/checkpoint-and-replay mechanism. Here is the thesis and here is the program and the code.
Are there any languages with the possibility of declaring global assertions - that is, assertions that should hold during the whole program's execution? So that it would be possible to write something like:
global assert (-10 < speed < 10);
and this assertion will be checked every time speed changes state?
Eiffel supports all the different contracts: preconditions, postconditions, invariants... you may want to use that.
On the other hand, why do you have a global variable? Why not create a class which modifies the speed? Doing so, you can easily check your condition every time the value changes.
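In Haskell terms, the same idea is a smart constructor that re-checks the invariant on every change (a hypothetical sketch; Speed and mkSpeed are made up):

newtype Speed = Speed Int deriving Show

-- The only way to build or modify a Speed; the invariant
-- -10 < s < 10 is checked at every state change.
mkSpeed :: Int -> Speed
mkSpeed s
  | -10 < s && s < 10 = Speed s
  | otherwise         = error ("speed out of range: " ++ show s)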
I'm not aware of any languages that truly do such a thing, and I would doubt that there exist any since it is something that is rather hard to implement and at the same time not something that a lot of people need.
It is often better to simply assert that the inputs are valid and that modifications are only done when allowed, in a defined, sane way. This removes the need for "global asserts".
You can get this effect "through the backdoor" in several ways, though none is truly elegant, and two are rather system-dependent:
If your language allows operator overloading (such as C++), you can make a class that overloads every operator which modifies the value. It is considerable, though trivial, work to do the assertions in there.
On pretty much every system, you can change the protection of the memory pages that belong to your process. You could put the variable (and any other variables you want to assert) on a separate page and set that page to read-only. This causes a segmentation fault when the value is written to, which you can catch (and then verify that the assertion is true). Windows even makes this explicitly available via "guard pages" (which are really just "read-only pages in disguise").
Most modern processors support hardware breakpoints. Unless your program is to run on some very exotic platform, you can exploit these for more fine-grained control, in a similar way to tampering with page protections. See, for example, this article on another site, which describes how to do it under Windows on x86. This solution requires you to write a kind of "mini-debugger", and means you may run into trouble when running your program under a real debugger.
I commonly place values into variables even when they are only used once after assignment. I do this to make debugging more convenient later, as I'm able to hover over the variable on the one line where it's later used.
For example, this code doesn't let you hover to see the value of GetFoo():
return GetFoo();
But this code does:
var foo = GetFoo();
return foo; // your hover-foo is great
This smells very YAGNI-esque, as foo's assignment won't ever be used until someone needs to debug its value, which may never happen. If it weren't for the merely foreseen debugging session, the first snippet would keep the code simpler.
How would you write the code to best compromise between simplicity and ease of debugger use?
I don't know about other debuggers, but the integrated Visual Studio debugger will report what was returned from a function in the "Autos" window; once you step over the return statement, the return value shows up as "[function name] returned" with a value of whatever value was returned.
gdb supports the same functionality as well; the "finish" command executes the rest of the current function and prints the return value.
This being a very useful feature, I'd be surprised if most other debuggers didn't support this capability.
As for the more general "problem" of "debugger-only variables," are they really debugger-only? I tend to think that the use of well-named temporary variables can significantly improve code readability as well.
Another possibility is to learn enough assembly programming that you can read the code your compiler generates. With that skill, you can figure out where the value is being held (in a register, in memory) and see the value without having to store it in a variable.
This skill is very useful if you ever need to debug an optimized executable. The optimizer can generate code that is significantly different from what you wrote, such that symbolic debugging is not helpful.
Another reason you don't need intermediate variables in the Visual Studio debugger is that you can evaluate the function in the Watch window and the Immediate window. For the Watch window, simply highlight the statement you want evaluated and drag it into the window.
I'd argue that it's not worth worrying about. Given that there's no runtime overhead in the typical case, go nuts. I think that breaking down complex statements into multiple simple statements usually increases readability.
I would leave out the assignment until it is needed. If you never happen to be in that bit of code, wanting a look at that variable, you haven't cluttered up your code unnecessarily. When you run across the need, put it in (it should be a trivial Extract Variable refactoring). And when you're done with that debugging session, get rid of it (Inline Variable). If you find yourself debugging so much - and so much at that particular point - that you're weary of refactoring back and forth, then think about ways to avoid the need; maybe more unit tests would help.