Haskell redirect traces to a file - debugging

So here I was, trying to figure out how to log any anomalies in my code to a log file. First I noticed the trace function, but then I saw that it only writes to stderr.
Then I saw the logger module, but that runs inside the IO monad, so it's a bit of a hassle, what with compromising purity and all. Then I figured that if I made a function a -> b -> b, with the a parameter being IO () in my case, all would be OK.
Indeed, the compiler didn't see anything wrong with it, but alas, the append was never actually called, so I was back to square one. What I actually want to know is:
a) Are there any functions that perform IO while still having a pure signature (like unsafePerformIO) that could help me with my logging?
b) Is there any way to force the compiler to evaluate the first parameter of the function I built, even though it's never actually used?
Thank you in advance.

Then I figured that if I made a function a -> b -> b, with the a parameter being IO () in my case, all would be OK.
Nope, wrong. This will do nothing, even if you "evaluate" the first argument. You cannot implement trace without unsafePerformIO.
IO values are just values, no more. Only when they happen in the course of the execution of main (or due to unsafePerformIO) are they actually executed.
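To see why concretely, here is a minimal sketch (my own example; logged is a made-up name): even forcing the IO value with seq only evaluates the thunk to an action value, it never runs the action.

logged :: IO () -> b -> b
logged action x = action `seq` x

main :: IO ()
main = print (logged (putStrLn "never printed") 42)
-- prints 42; "never printed" does not appear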
It's not clear why redirection won't do, though -- trace outputs to stderr. Is there a reason you can't just do
./MyHaskellExecutable 2>dumpStdErrToThisFile
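If you really do want a file, here is a hedged sketch of what unsafePerformIO makes possible (traceToFile and debug.log are my own names; the usual caveats apply: no guarantees about evaluation order, and the optimizer may discard or duplicate calls):

import System.IO.Unsafe (unsafePerformIO)

-- Sketch: mimic Debug.Trace.trace, but append to a file instead of stderr.
traceToFile :: String -> a -> a
traceToFile msg x = unsafePerformIO $ do
  appendFile "debug.log" (msg ++ "\n")
  return x
{-# NOINLINE traceToFile #-}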

Logging is a side effect, so it has to live in some monad for that effect. Doing otherwise risks the compiler optimizing away your semantically unnecessary logging calls.
If you are building an app with a plan to support logging, you will need to have it run in some kind of logging environment. IO is overkill, but perhaps a simpler Log monad would be more appropriate (kind of a magic Writer, with ST-like properties of local encapsulation).
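As a rough illustration of that idea, a minimal sketch built on Writer from the mtl package (logMsg and addTwo are illustrative names, not a real library API; a real Log monad would want ST-like encapsulation on top of this):

import Control.Monad.Writer

type Log = Writer [String]

-- Record one log message.
logMsg :: String -> Log ()
logMsg s = tell [s]

-- A function that both computes and logs.
addTwo :: Int -> Log Int
addTwo x = do
  logMsg ("addTwo called with " ++ show x)
  return (x + 2)

-- runWriter (addTwo 40) evaluates to (42, ["addTwo called with 40"])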

Related

Why do mJIT functions get invoked?

I'm doing research on the Ruby interpreter and mJIT.
As a first step, I would like to understand the behavior of both, so I ran a very simple Ruby program, puts("hello world!"), without the --jit option and captured its execution trace. One thing I found is that even without mJIT enabled, some mJIT functions get invoked, such as mjit_add_class_serial, mjit_remove_class_serial, mjit_mark, mjit_gc_finish_hook, mjit_free_iseq, and mjit_finish.
I would like to understand why that is. My guess is that the interpreter and mJIT share some of this code, but I'm not 100% sure. In particular, the description of mjit_finish briefly says that it finishes up whatever operation the mJIT compiler is performing. In that case, why does this function get invoked during interpreter-only execution?
If anyone has an idea regarding my question, any recommendation would be very much appreciated.
Thank you.
This is for Ruby version 2.6.2. I've gone through the source code as well as the comments explaining it, but they are not very clear.

Is function parameter validation using errors a good pattern in Go?

Is parameter validation using error return codes considered good practice? I mean, where should one use errors vs. panics (are there any guidelines?).
For instance:
Is checking for non-nil and returning an error if it is nil good practice?
Or checking for correct integer ranges, etc.?
I imagine that using errors that often would make Go feel very C-ish and would look pretty bad. Are panics a good alternative in those situations?
Or should a Gopher use the Python/Ruby/JS approach of "just let it fail"?
I'm a bit confused, because in my understanding panics are for real "errors", but using errors all the time seems bad too.
And even if I did return an error: what could I do if somebody passes a wrong parameter to my function but ignores the returned error? Nothing! So honestly, I would say panics are nice for those situations, but in a language where errors are preferred over panics, this is not very clear.
"Escaping" panics¹ in Go (I mean those which might be produced by the functions comprising the public API of your package) are for dealing with programmer errors. So, if your function gets a pointer to an object, and that pointer can't be nil (say, to indicate that the value is missing), just go ahead and dereference it and let the runtime panic if it happens to be nil. If a function expects an integer that must be in a certain range, panic if it's not in that range: in a correct program, all values which might be passed to your function are in that range, and if they aren't, then either the programmer failed to obey the API or they did not sanitize a value acquired from the outside, which, again, is not your fault.
On the other hand, problems like failure to open a file or to perform some other action your function is supposed to perform when called correctly should not cause panics; the function should return an appropriate error instead.
Note that the recommendation to explicitly check for null parameters in the functions of public APIs in .NET and Java code has a different goal: making such errors somewhat more readable. But since 99% of .NET and Java code just lets all exceptions propagate to the top level (to be displayed or maybe logged), this merely replaces one (runtime-generated) exception with another. It might make errors more obvious, since execution fails in the API function rather than somewhere deeper down the call stack, but it adds unnecessary cruft to those API functions. So yes, this is opinionated, but my subjective opinion is that letting it just crash is OK in Go: you'll get a descriptive stack trace.
TL;DR
With regard to handling run-time problems:
panics are for programming errors;
returning errors is for problems in carrying out the intended tasks of functions.
¹ Another legitimate use for panics is quickly returning along a "cold path" from deep recursive processing/computation; in this case the panic should be caught and processed by your package, and the corresponding public API functions should return errors. See this and this for more info.
The answer to this is subjective. Here are my thoughts:
Regarding panic, I like this quote from Go By Example (ref):
A panic typically means something went unexpectedly wrong. Mostly we use it to fail fast on errors that shouldn’t occur during normal operation, or that we aren’t prepared to handle gracefully.
In your use case, I would argue that you should return errors and handle them. I would further argue that it is good practice to check the error status when a function you use provides one, and to check the documentation to see whether it does.
I would use panics to stop execution when I run across a returned error that I check and have no way to recover from.

If an assert fails, is there a bug?

I've always followed the logic: if an assert fails, then there is a bug. The root cause could be either:
Assert itself is invalid (bug)
There is a programming error (bug)
(no other options)
That is, are there any other conclusions one could come to? Are there cases where an assert would fail and there is no bug?
If an assert fails, there is a bug in either the caller or the callee. Why else would the assertion be there?
Yes, there is a bug in the code.
Code Complete:
Assertions check for conditions that should never occur. [...] If an assertion is fired for an anomalous condition, the corrective action is not merely to handle an error gracefully - the corrective action is to change the program's source code, recompile, and release a new version of the software.
A good way to think of assertions is as executable documentation - you can't rely on them to make the code work, but they can document assumptions more actively than program-language comments can.
That's a good question.
My feeling is that if the assert fails due to your code, then it is a bug. The assertion encodes expected behaviour/results of your code, so an assertion failure is a failure of your code.
The only exception is an assert that was meant to signal a warning condition - in which case a special class of assert should have been used.
So yes, any assert failure should indicate a bug, as you suggest.
If you are using assertions you're following Bertrand Meyer's Design by Contract philosophy. It's a programming error - the contract (assertion) you have specified is not being followed by the client (caller).
If you are trying to be logically inclusive about all the possibilities, remember that electronic circuitry is known to be affected by radiation from space. If the right photon/particle hits in just the right place at just the right time, it can cause an otherwise logically impossible state transition.
The probability is vanishingly small but still non-zero.
I can think of one case that wouldn't really class as a bug:
An assert placed to check for something external that normally should be there. You're hunting something nutty that occurs on one machine, and you want to know whether a certain factor is responsible.
A real-world example (although from before the era of asserts): if a certain directory was hidden on a certain machine, the program would barf. I never found any piece of code that should have cared whether the directory was hidden. I had only very limited access to the offending machine (it held a bunch of accounting data), so I couldn't hunt the problem properly there, and I couldn't reproduce it elsewhere. Something that was done with that machine (the culprit was never identified) occasionally turned that directory hidden.
I finally resorted to putting a test in the startup to see if the directory was hidden and stopping with an error if it was.
No. An assertion failure means something happened that the original programmer did not intend or expect to occur.
This can indicate:
A bug in your code (you are simply calling the method incorrectly)
A bug in the assertion (the original programmer has been too zealous and is complaining about you doing something that is quite reasonable and that the method will actually handle perfectly well).
A bug in the called code (a design flaw). That is, the called code provides a contract that does not allow you to do what you need to do. The assertion warns you that you can't do things that way, but the solution is to extend the called method to handle your input.
A known but unimplemented feature. Imagine I implement a method that could process positive and negative integers, but I only need it (for now) to handle positive ones. I know that the "perfect" implementation would handle both, but until I actually need the negative case, implementing it is a waste of effort (and it would add code bloat and possibly slow down my application). So I have considered the case but decided not to implement it until the need is proven. I therefore add an assert to mark this unimplemented code. When I later trigger the assert by passing a negative value, I know that the additional functionality is now needed, so I must augment the implementation. Deferring writing the code until it is actually required saves me a lot of time (in most cases I never implement the additional feature), but the assert makes sure that I don't get any bugs when I try to use the unimplemented feature.
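In Haskell, for instance, that last pattern might look like the following sketch using Control.Exception.assert (process is a made-up example):

import Control.Exception (assert)

-- Only non-negative inputs are supported for now; the assert marks
-- the deliberately unimplemented negative case.
process :: Int -> Int
process n = assert (n >= 0) (n * 2)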

How to avoid debugger-only variables?

I commonly place values into variables even when they are used only once after assignment. I do this to make debugging more convenient later, as I can hover over the value on the one line where it's used.
For example, this code doesn't let you hover over the value of GetFoo():
return GetFoo();
But this code does:
var foo = GetFoo();
return foo; // your hover-foo is great
This smells very YAGNI-esque, as foo's assignment won't ever be used until someone needs to debug its value, which may never happen. If it weren't for the merely foreseen debugging session, the first snippet would keep the code simpler.
How would you write the code to best compromise between simplicity and ease of debugger use?
I don't know about other debuggers, but the integrated Visual Studio debugger will report what was returned from a function in the "Autos" window; once you step over the return statement, the return value shows up as "[function name] returned" with the value that was returned.
gdb supports the same functionality as well; the "finish" command executes the rest of the current function and prints the return value.
This being a very useful feature, I'd be surprised if most other debuggers didn't support this capability.
As for the more general "problem" of "debugger-only variables," are they really debugger-only? I tend to think that the use of well-named temporary variables can significantly improve code readability as well.
Another possibility is to learn enough assembly programming to read the code your compiler generates. With that skill, you can figure out where the value is being held (in a register or in memory) and inspect it without having to store it in a variable.
This skill is very useful if you ever need to debug an optimized executable. The optimizer can generate code that is significantly different from what you wrote, to the point that symbolic debugging is not helpful.
Another reason you don't need intermediate variables in the Visual Studio debugger is that you can evaluate the function in the Watch window and the Immediate window. For the Watch window, simply highlight the statement you want evaluated and drag it into the window.
I'd argue that it's not worth worrying about. Given that there's no runtime overhead in the typical case, go nuts. I think that breaking down complex statements into multiple simple statements usually increases readability.
I would leave out the assignment until it is needed. If you never happen to be in that bit of code, wanting a look at that variable, you haven't cluttered up your code unnecessarily. When you run across the need, put it in (it should be a trivial Extract Variable refactoring). And when you're done with that debugging session, get rid of it (Inline Variable). If you find yourself debugging so much - and so much at that particular point - that you're weary of refactoring back and forth, then think about ways to avoid the need; maybe more unit tests would help.

How is debugging achieved in a lazy functional programming language?

I'd like to know how debugging is achieved in a lazy functional language.
Can you use breakpoints, print statements and traditional techniques? Is this even a good idea?
It is my understanding that pure functional programming does not allow side-effects, with the exception of monads.
Order of execution is also not guaranteed.
Would you have to program a monad for every section of code you want to test?
I'd like some insight into this question from someone more experienced in this area.
Nothing prevents you from using breakpoints in a lazily evaluated functional program. The difference from eager evaluation is when the program stops at the breakpoint and what the trace looks like. The program stops when the expression a breakpoint is set on is actually being reduced (obviously).
Instead of the stack trace you're used to, you get the list of reductions that led up to the reduction of the expression with the breakpoint on it.
Small silly example. You have this Haskell program.
add_two x = 2 + x
times_two x = 2 * x
foo = times_two (add_two 42)
Now put a breakpoint on the first line (add_two) and evaluate foo. When the program stops on the breakpoint, in an eager language you'd expect a trace like
add_two
foo
and times_two hasn't even begun to be evaluated, but in the GHCi debugger you get
-1 : foo (debug.hs:5:17-26)
-2 : times_two (debug.hs:3:14-18)
-3 : times_two (debug.hs:3:0-18)
-4 : foo (debug.hs:5:6-27)
<end of history>
which is the list of reductions that led up to the reduction of the expression you put the breakpoint on. Note that it looks as if times_two "called" foo, even though it does not do so explicitly. You can see from this that the evaluation of 2 * x in times_two (-2) forced the evaluation of (add_two 42) (-1) from the foo line. From there you can step, as in an imperative debugger (performing the next reduction).
Another difference from debugging in an eager language is that variables may be not-yet-evaluated thunks. For example, go to step -2 in the above trace and inspect x: you'll find it's still an unevaluated thunk (indicated by brackets in GHCi).
For far more detailed information and examples (how to step through the trace, inspect values, ...), see the GHCi Debugger section of the GHC manual. There's also the Leksah IDE, which I haven't used yet since I'm a Vim and terminal user, but according to the manual it has a graphical frontend to the GHCi debugger.
You also asked about print statements. With pure functions this is not so easily possible, as a print statement would have to live within the IO monad. Say you have a pure function
foo :: Int -> Int
and wish to add a trace statement. The print would return an action in the IO monad, so you'd have to adjust the signature of the function you want to put that trace statement in, and the signatures of the functions that call it, and so on.
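A sketch of the signature change this forces (fooLogged is a made-up name; note the result type is now IO Int, and every caller's type must change with it):

fooLogged :: Int -> IO Int
fooLogged x = do
  putStrLn ("foo called with " ++ show x)
  return (x * x)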
This is not a good idea. So you need some way to break purity to get trace statements. In Haskell, this can be done with unsafePerformIO. The Debug.Trace module already has a function
trace :: String -> a -> a
which outputs the string and returns the second parameter. It would be impossible to write as a pure function (if you intend to really output the string, that is); it uses unsafePerformIO under the hood. You can put it into a pure function to output a trace print.
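For example (my own toy function):

import Debug.Trace (trace)

foo :: Int -> Int
foo x = trace ("foo called with " ++ show x) (x * x)
-- evaluating foo 3 prints "foo called with 3" to stderr and yields 9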
Would you have to program a monad for every section of code you want to test?
I'd suggest rather the opposite: make as many functions pure as possible (I'm assuming you mean the IO monad for printing; monads are not necessarily impure). Lazy evaluation allows you to separate IO code from processing code very cleanly.
Whether imperative debugging techniques are a good idea or not depends on the situation (as usual). I find testing with QuickCheck/SmallCheck much more useful than unit testing in imperative languages, so I'd go that route first to avoid as much debugging as possible. QuickCheck properties actually make nice, concise function specifications (a lot of test code in imperative languages looks like just another blob of code to me).
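To give a flavor of that, a minimal QuickCheck property (my own example):

import Test.QuickCheck

-- A concise executable specification: reversing a list twice
-- yields the original list.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseTwice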
One trick to avoid a lot of debugging is to decompose the function into many smaller subfunctions and test as many of them as possible. This may be a bit unusual when coming from imperative programming, but it's a good habit no matter what language you're using.
Then again, debugging != testing, and if something goes wrong somewhere, breakpoints and traces may help you out.
I don't think this topic can be dealt with in a short space. Please read the papers available at the following links:
A Theory of Tracing Pure Functional Programs.
The Haskell Tracer publications.
Haskell Debugging Technologies.
I've never delved into anything terribly complicated in Haskell, but the fact that side effects are virtually gone has eliminated most of the need for debugging. Pure functions are extremely simple to test and verify without a debugger.
On the other hand, there were a couple of times I needed to debug something within a monad, in which case I was already able to print/log/whatever.
At least for smaller programs or systems, debugging kind of goes out the window. Strong typing and static type-checking further eliminate the traditional bugs you find in procedural programming. Most bugs, if any, are logical bugs (calling the wrong function, a mathematical error, etc.), which are very easy to test interactively.
From experience with Clojure (which is lazy, functional, and encourages but does not enforce purity):
You can set breakpoints just as in any other language. However, because of lazy evaluation, these might not be hit immediately, but only as soon as evaluation of the lazy structure is forced.
In lazy functional languages that allow side effects (Clojure included), you can insert printlns and other debug logging relatively easily. I personally find these very useful. You have to be careful about when they get called because of laziness, but if you don't see the output at all, it can be a hint that your code isn't being evaluated because of laziness...
Having said all the above, I have never so far needed to resort to the debugger. Often a few simple tests (perhaps on the REPL) are enough to verify that functional code is working correctly, and if these fail then it's usually quite obvious what is going wrong.
Allow me to advertise a tool of my own for debugging laziness problems. It helped me resolve in an hour a laziness-related memory leak that I had already spent two days debugging.
http://www.haskell.org/pipermail/haskell-cafe/2012-January/098847.html
http://hackage.haskell.org/package/htrace
