Suppose a large and complicated Haskell program produces a NaN at some point during execution. How do I find where in my code this happened without spending a lot of time adding NaN checks all over my code? I'm only interested in debugging, so I don't care about portability or performance.
This was discussed five years ago on Haskell-cafe. A possible solution was proposed, but wasn't discussed further.
https://mail.haskell.org/pipermail/haskell-cafe/2011-May/091858.html
Below is my attempt to get a stack trace at the point where a NaN is generated (in a small example program), using feenableexcept as suggested in the Haskell-cafe discussion:
import Foreign.C.Types (CInt)
-- https://www.gnu.org/software/libc/manual/html_node/Control-Functions.html
-- feenableexcept takes and returns a C int, so use CInt rather than Int
foreign import ccall "feenableexcept" enableFloatException :: CInt -> IO CInt
allFloatExceptions :: CInt
allFloatExceptions = 1 {-INVALID-} + 4 {-DIVBYZERO-} + 8 {-OVERFLOW-} + 16 {-UNDERFLOW-}
main :: IO ()
main = do
    _ <- enableFloatException allFloatExceptions
    print $ (0/0 :: Double)
Unfortunately, running this code doesn't produce a stack trace :(
$ ghc -rtsopts -prof -fprof-auto testNaN.hs && ./testNaN +RTS -xc
[1 of 1] Compiling Main ( testNaN.hs, testNaN.o )
Linking testNaN ...
Floating point exception (core dumped)
I assume (but really have no idea) that I don't get a stack trace because the GHC runtime wasn't in control when the exception aborted the program. So next I tried using installHandler from System.Posix.Signals to try to make the program crash within GHC's runtime:
import Foreign.C.Types (CInt)
import qualified System.Posix.Signals as Signals
-- https://www.gnu.org/software/libc/manual/html_node/Control-Functions.html
foreign import ccall "feenableexcept" enableFloatException :: CInt -> IO CInt
allFloatExceptions :: CInt
allFloatExceptions = 1 {-INVALID-} + 4 {-DIVBYZERO-} + 8 {-OVERFLOW-} + 16 {-UNDERFLOW-}
catchFloatException :: IO ()
catchFloatException = error "print stack trace?"
main :: IO ()
main = do
    _ <- enableFloatException allFloatExceptions
    _ <- Signals.installHandler Signals.floatingPointException (Signals.Catch catchFloatException) Nothing
    print $ (0/0 :: Double)
Unfortunately, this results in a more mysterious error, and still doesn't give me a stack trace :(
$ ghc -rtsopts -prof -fprof-auto testNaN.hs && ./testNaN +RTS -xc
[1 of 1] Compiling Main ( testNaN.hs, testNaN.o )
Linking testNaN ...
testNaN: too many pending signals
I also tried the threaded runtime with two capabilities. Most of the time, the following happens:
$ ghc -rtsopts -prof -fprof-auto -threaded testNaN.hs && ./testNaN +RTS -xc -N2
[1 of 1] Compiling Main ( testNaN.hs, testNaN.o )
Linking testNaN ...
testNaN: lost signal due to full pipe: 8
testNaN: lost signal due to full pipe: 8
testNaN: lost signal due to full pipe: 8
... (repeated many, many times, very quickly)
Once, though, this happened instead:
http://pastebin.com/u3u2cnHE
Am I taking the right approach? Is there a way to modify my example so that I can get a stack trace?
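If trapping turns out to be impractical, a low-tech fallback is to wrap only a few suspect results in a check. The sketch below is not from the Haskell-cafe thread: checkNaN is a hypothetical helper, and it assumes GHC 8.0 or later for HasCallStack, so that the error it raises names the place it was called from without needing a profiling build.

```haskell
import Control.Exception (ErrorCall, evaluate, try)
import GHC.Stack (HasCallStack)

-- Hypothetical helper (not from the original post): returns its argument
-- unchanged unless it is NaN. Thanks to the HasCallStack constraint, the
-- ErrorCall raised by error carries the location checkNaN was called from.
checkNaN :: HasCallStack => Double -> Double
checkNaN x
  | isNaN x   = error "checkNaN: NaN produced"
  | otherwise = x

main :: IO ()
main = do
  print (checkNaN (1 / 2))
  -- Catching is only for this demo; left uncaught, the program dies with
  -- the message followed by a call stack naming the line below.
  r <- try (evaluate (checkNaN (0 / 0))) :: IO (Either ErrorCall Double)
  case r of
    Left err -> putStrLn ("caught: " ++ show err)
    Right v  -> print v
```

Narrowing the search then means moving checkNaN closer to the computation that first produces the NaN.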
Related
I'm trying to debug why ngspice prints annoying newlines to stderr while running a simulation. I'm trying to locate the culprit in one of the 2400 source files, which date back to 1993, but it's not as easy as it sounds! I do, however, have a binary with all debug information embedded.
My first idea was that strace could help me locate what I believe is the offending call and trace it back to the source code. For example, I'm pretty sure that this is the offending syscall:
brk(0x55d1a84e9000) = 0x55d1a84e9000
clock_gettime(CLOCK_PROCESS_CPUTIME_ID, {tv_sec=0, tv_nsec=61462905}) = 0
>> write(2, "\n", 1) = 1
getrusage(RUSAGE_SELF, {ru_utime={tv_sec=0, tv_usec=26269}, ru_stime={tv_sec=0, tv_usec=35243}, ...}) = 0
openat(AT_FDCWD, "/proc/self/statm", O_RDONLY) = 3
I had hoped that if I traced an executable that had debug information, strace would show me the place in the source code, but that did not happen automatically and the manual is a little overwhelming.
I found a section in the manual called Tracing but couldn't find anything specific.
Is it possible with strace, and if so: How? If not, do you have any other suggestions?
Obvious in hindsight, but one very useful flag is -k. From the man page:
-k Print the execution stack trace of the traced processes after each system call.
This needs a binary with debug information (and an strace built with stack-trace support), and it gets extremely noisy, but combined with a simple filter (-e write in this case) you will eventually get something like this:
write(2, "\n", 1
) = 1
> /lib/x86_64-linux-gnu/libc-2.28.so(__write+0x14) [0xea504]
> /lib/x86_64-linux-gnu/libc-2.28.so(_IO_file_write+0x2d) [0x7b3bd]
> /lib/x86_64-linux-gnu/libc-2.28.so(_IO_file_setbuf+0xef) [0x7a75f]
> /lib/x86_64-linux-gnu/libc-2.28.so(_IO_do_write+0x19) [0x7c509]
> /lib/x86_64-linux-gnu/libc-2.28.so(_IO_file_overflow+0x103) [0x7c8f3]
> /home/pipe/src/ngspice/debug/src/ngspice(OUTendPlot+0x1ae) [0xd7643]
> /home/pipe/src/ngspice/debug/src/ngspice(DCop+0x167) [0x4cd788]
> /home/pipe/src/ngspice/debug/src/ngspice(CKTdoJob+0x428) [0x4c70dd]
> /home/pipe/src/ngspice/debug/src/ngspice(if_run+0x3b9) [0xe5d3e]
> /home/pipe/src/ngspice/debug/src/ngspice(dosim+0x428) [0xe02ee]
From this I eventually found the right place, after untangling some functions that the optimizer had inlined.
Using gdb, you can set conditional syscall catchpoints based on the args to the system call (analogous to the way you'd set conditional breakpoints on entry to a function based on the args to the function call). Then, when the catchpoint is triggered, you can see where the caller is (file name, line number, and source code).
Here's an example for x86_64.
$ cat gtest.c
#include <unistd.h>
int main()
{
write(1, "text\n", 5);
write(2, "text2\n", 6);
write(2, "\n", 1);
return 0;
}
$ cc gtest.c -g -o gtest
$ gdb -q gtest
Reading symbols from gtest...done.
(gdb) list
1 #include <unistd.h>
2 int main()
3 {
4 write(1, "text\n", 5);
5 write(2, "text2\n", 6);
6 write(2, "\n", 1);
7 return 0;
8 }
(gdb) catch syscall write
Catchpoint 1 (syscall 'write' [1])
(gdb) condition 1 $rdi == 2 && *(char *)$rsi == '\n' && $rdx == 1
(gdb) r
Starting program: /home/mp/gtest
text
text2
Catchpoint 1 (call to syscall write), 0x00007fffff13b970 in __write_nocancel ()
at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007fffff13b970 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1 0x00000000080006f6 in main () at gtest.c:6
(gdb) up
#1 0x00000000080006f6 in main () at gtest.c:6
6 write(2, "\n", 1);
(gdb)
My first idea was that strace could help me locate what I believe is the offending call and trace it back to the source code.
You guessed right, but must have overlooked this in the strace manual page:
-i Print the instruction pointer at the time of the system call.
I took this example of data transfer between host and device in CUDA Fortran:
Host Code:
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
endif
end program incTest
Device Code:
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
The expected outcome is that the console prints "Success", but this does not happen. Nothing appears on the screen: no errors, no messages.
This happens because the if branch is never entered: a_d has the same values as before the call to the inc subroutine.
I'm using:
OS: Linux - Ubuntu 16
Cuda 8
PGI to compile
Commands used to compile and run:
pgf90 -Mcuda -c Device.cuf
pgf90 -Mcuda -c Host.cuf
pgf90 -Mcuda -o HostDevice Device.o Host.o
./HostDevice
I tried other examples and they did not work either.
Plain Fortran (.f90) code compiled with the same commands works!
How can I fix this problem?
What type of device are you using? (If you don't know, post the output from the "pgaccelinfo" utility).
My best guess is that you have a Pascal-based device, in which case you need to compile with "-Mcuda=cc60".
For example, if I add error checking to the example code, we get an "invalid device function" error when running on a Pascal device without "cc60" in the compile options.
% cat test.cuf
module simpleOps_m
contains
attributes(global) subroutine inc(a, b)
implicit none
integer :: a(:)
integer, value :: b
integer :: i
i = threadIdx%x
a(i) = a(i)+b
end subroutine inc
end module simpleOps_m
program incTest
use cudafor
use simpleOps_m
implicit none
integer, parameter :: n = 256
integer :: a(n), b, i, istat
integer, device :: a_d(n)
a = 1
b = 3
a_d = a
call inc<<<1,n>>>(a_d, b)
istat=cudaDeviceSynchronize()
istat=cudaGetLastError()
a = a_d
if (all(a == 4)) then
write(*,*) 'Success'
else
write(*,*) 'Error code:', cudaGetErrorString(istat)
endif
end program incTest
% pgf90 test.cuf -Mcuda
% a.out
Error code:
invalid device function
% pgf90 test.cuf -Mcuda=cc60
% a.out
Success
I have the following code from the question "Preventing caching of computation in Criterion benchmark", and my aim is to be able to step from main directly into the function defaultMain in Criterion.Main:
{-# OPTIONS -fno-full-laziness #-}
{-# OPTIONS_GHC -fno-cse #-}
{-# LANGUAGE BangPatterns #-}
module Main where
import Criterion.Main
import Data.List
num :: Int
num = 100000000
lst :: a -> [Int]
lst _ = [1,2..num]
myadd :: Int -> Int -> Int
myadd !x !y = let !result = x + y in
    result
mysum :: [Int] -> Int
mysum = foldl' myadd 0
main :: IO ()
main = defaultMain [
    bgroup "summation"
        [bench "mysum" $ whnf (mysum . lst) ()]
    ]
and the cabal file is:
name: test
version: 0.1.0.0
build-type: Simple
cabal-version: >=1.10
executable test
    main-is: Main.hs
    build-depends: base >=4.8 && <4.9,
                   criterion==1.1.0.0
    default-language: Haskell2010
    ghc-options: "-O3"
(using ghc 7.10.1 and cabal 1.22.0.0).
If I try to set a breakpoint in criterion from within cabal repl, I get the following error:
*Main> :break Criterion.Main.defaultMain
cannot set breakpoint on defaultMain: module Criterion.Main is not interpreted
Furthermore, if I try to add the package, I get the following error:
*Main> :add *Criterion
<no location info>: module ‘Criterion’ is a package module
Failed, modules loaded: Main.
If I run git clone https://github.com/bos/criterion within the directory and then add the following two fields to my cabal file:
other-modules: Criterion
hs-source-dirs: .
                ./criterion
then running cabal build gives the following error:
criterion/Criterion/IO.hs:23:0:
error: missing binary operator before token "("
#if MIN_VERSION_binary(0, 6, 3)
So I suspect that I would have to do a full merge of criterion's cabal file with my cabal file above, which feels a bit excessive.
Is there an easier way to set a breakpoint in Criterion, so that I can step (when debugging in cabal repl/ghci) directly from my source into criterion's source? Thanks
P.S. There is a related question, "Debugging IO in a package module inside GHCi", but unfortunately it did not help.
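The cpp error above is what happens when a MIN_VERSION_<package> macro is not defined: in an #if line, cpp treats the unknown name as 0 and then fails on the parenthesised arguments. Cabal generates these macros only for packages listed in build-depends (GHC 8.0 and later also defines them for every exposed package), which is why compiling criterion's sources inside another package needs criterion's dependencies in that package's cabal file. A minimal sketch of the guard pattern, using base so that it compiles on its own; baseInfo is an illustrative name:

```haskell
{-# LANGUAGE CPP #-}

-- Cabal defines MIN_VERSION_pkgname macros for each package in
-- build-depends, and GHC 8.0 or later defines them for exposed packages.
-- If the macro is undefined, cpp treats the name as 0 and fails with
-- missing binary operator before token (
baseInfo :: String
#if MIN_VERSION_base(4,8,0)
baseInfo = "base >= 4.8.0"
#else
baseInfo = "base < 4.8.0"
#endif

main :: IO ()
main = putStrLn baseInfo
```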
Here is how I managed to step (within cabal repl) from my code into the criterion source.
First run:
mkdir /tmp/testCrit
cd /tmp/testCrit
Download criterion-1.1.0.0.tar.gz.
Unpack it into /tmp/testCrit, so that we have /tmp/testCrit/criterion-1.1.0.0, which contains Criterion.hs etc.
Then change into the folder containing the criterion source and run:
cd /tmp/testCrit/criterion-1.1.0.0
cabal sandbox init
cabal install -j
Note that this creates the directory /tmp/testCrit/criterion-1.1.0.0/dist/dist-sandbox-782e42f0/build/autogen, which we will use later.
Back in /tmp/testCrit, create a Main.hs file containing the benchmark code above. Also create the cabal file above, but merged with the criterion cabal file from /tmp/testCrit/criterion-1.1.0.0 as follows. The main new additions are the line:
cc-options: -fPIC
which allows the code to be loaded in cabal repl, and the lines:
hs-source-dirs:
    ./
    ./criterion-1.1.0.0
    ./criterion-1.1.0.0/dist/dist-sandbox-782e42f0/build/autogen
The full cabal file should then look like this:
name: test
version: 0.1.0.0
build-type: Simple
cabal-version: >=1.10
executable test
    main-is: Main.hs
    build-depends:
        base >=4.8 && <4.9,
        aeson >= 0.8,
        ansi-wl-pprint >= 0.6.7.2,
        base >= 4.5 && < 5,
        binary >= 0.5.1.0,
        bytestring >= 0.9 && < 1.0,
        cassava >= 0.3.0.0,
        containers,
        deepseq >= 1.1.0.0,
        directory,
        filepath,
        Glob >= 0.7.2,
        hastache >= 0.6.0,
        mtl >= 2,
        mwc-random >= 0.8.0.3,
        optparse-applicative >= 0.11,
        parsec >= 3.1.0,
        statistics >= 0.13.2.1,
        text >= 0.11,
        time,
        transformers,
        transformers-compat >= 0.4,
        vector >= 0.7.1,
        vector-algorithms >= 0.4
    default-language: Haskell2010
    ghc-options: "-O3"
    c-sources:
        ./criterion-1.1.0.0/cbits/cycles.c
        ./criterion-1.1.0.0/cbits/time-posix.c
    hs-source-dirs:
        ./
        ./criterion-1.1.0.0
        ./criterion-1.1.0.0/dist/dist-sandbox-782e42f0/build/autogen
    cc-options: -fPIC
Then, in the main directory, run:
cd /tmp/testCrit/
cabal sandbox init
cabal install -j
Then we can start a cabal repl and step directly into criterion from our Main.hs code:
*Main> :break Criterion.Main.defaultMain
Breakpoint 0 activated at criterion-1.1.0.0/Criterion/Main.hs:79:15-43
*Main> main
Stopped at criterion-1.1.0.0/Criterion/Main.hs:79:15-43
_result :: [Benchmark] -> IO () = _
[criterion-1.1.0.0/Criterion/Main.hs:79:15-43] *Main> :step
Stopped at criterion-1.1.0.0/Criterion/Main.hs:(131,1)-(147,39)
_result :: IO () = _
[criterion-1.1.0.0/Criterion/Main.hs:(131,1)-(147,39)] *Main> :step
Stopped at criterion-1.1.0.0/Criterion/Main.hs:(131,29)-(147,39)
_result :: IO () = _
bs :: [Benchmark] = [_]
defCfg :: Criterion.Types.Config = _
[criterion-1.1.0.0/Criterion/Main.hs:(131,29)-(147,39)] *Main> :step
Stopped at criterion-1.1.0.0/Criterion/Main.hs:132:10-37
_result :: IO Criterion.Main.Options.Mode = _
defCfg :: Criterion.Types.Config = _
I am trying out Haskell's kafka library from git and got this error.
To debug it, I would like to print a stack trace at the line where the error occurs.
In the Python world, it is just:
import traceback; traceback.print_exc()
or, in Java:
e.printStackTrace()
So how do I do the same in Haskell?
You can get stack traces in Haskell but it is not as convenient as just e.printStackTrace(). Here is a minimal example:
import Control.Exception
import Debug.Trace

getStack :: String -> SomeException -> IO a
getStack msg e = traceStack (show e) $ error msg

main :: IO ()
main = do
    (head []) `catch` (getStack "error on main at head")
Compile it with ghc -prof -fprof-auto StackTrace.hs, and running it will produce:
Prelude.head: empty list
Stack trace:
Main.getStack (StackTrace.hs:5:9-56)
Main.main (StackTrace.hs:(8,9)-(9,74))
GHC.List.CAF (<entire-module>)
StackTrace.exe: error on main at head
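On GHC 8.0 and later there is also a lighter option that needs no profiling build: a function with a HasCallStack constraint receives its call site, and error reports it. The sketch below uses GHC.Stack; headOrFail and process are illustrative names, not library functions. For profiled builds, running with +RTS -xc additionally prints the cost-centre stack whenever an exception is raised.

```haskell
import GHC.Stack (HasCallStack, callStack, prettyCallStack)

-- Returns a printable description of where it was called from.
currentStack :: HasCallStack => String
currentStack = prettyCallStack callStack

-- Illustrative replacement for head. Because headOrFail and process both
-- carry HasCallStack, an empty-list error lists where headOrFail was
-- called in process and where process was called in main.
headOrFail :: HasCallStack => [a] -> a
headOrFail (x : _) = x
headOrFail []      = error "headOrFail: empty list"

process :: HasCallStack => [Int] -> Int
process xs = headOrFail xs + 1

main :: IO ()
main = do
  putStrLn currentStack  -- prints a CallStack pointing at this line
  print (process [41])   -- prints 42
```

Unlike traceStack, this only shows the functions that opt in via the constraint, so it is worth adding HasCallStack along the path you suspect.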
I've tried this:
import System.IO
main = do
    hSetBuffering stdin NoBuffering
    c <- getChar
    print c -- do something with c
but it waits until Enter is pressed, which is not what I want. I want to read the character as soon as the user presses the key.
I am using ghc v6.12.1 on Windows 7.
EDIT: my workaround was to move from GHC to WinHugs, which handles this correctly.
Yes, it's a bug. Here's a workaround to save folks clicking and scrolling:
{-# LANGUAGE ForeignFunctionInterface #-}
import Data.Char
import Foreign.C.Types
getHiddenChar :: IO Char
getHiddenChar = fmap (chr . fromEnum) c_getch
foreign import ccall unsafe "conio.h getch"
    c_getch :: IO CInt
So you can replace calls to getChar with calls to getHiddenChar.
Note that this workaround is only for GHC/GHCi on Windows. For example, WinHugs doesn't have the bug, and this code doesn't work in WinHugs.
Might be a bug:
http://hackage.haskell.org/trac/ghc/ticket/2189
The following program repeats inputted characters until the escape key is pressed.
import IO
import Monad
import Char
main :: IO ()
main = do hSetBuffering stdin NoBuffering
          inputLoop
inputLoop :: IO ()
inputLoop = do i <- getContents
               mapM_ putChar $ takeWhile ((/= 27) . ord) i
Because of the hSetBuffering stdin NoBuffering line it should not be necessary to press the enter key between keystrokes. This program works correctly in WinHugs (sep 2006 version). However, GHC 6.8.2 does not repeat the characters until the enter key is pressed. The problem was reproduced with all GHC executables (ghci, ghc, runghc, runhaskell), using both cmd.exe and command.com on Windows XP Professional...
Hmm... Actually, I don't see this behaviour as a bug. Reading stdin means you want to work with a "file", and turning off buffering says that no read buffer is needed. But that doesn't mean the application emulating that "file" won't use a write buffer. On Linux, if your terminal is in "icanon" mode, it doesn't send any input until some special event occurs (like pressing Enter or Ctrl+D). The Windows console probably has similar modes.
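To make the icanon point concrete: on POSIX systems the unix package exposes the terminal settings directly. The sketch below (POSIX only, so it does not apply to the Windows console) turns canonical mode off around an action; in System.Posix.Terminal, ProcessInput is the mode that corresponds to ICANON. GHC's hSetBuffering stdin NoBuffering already does something similar for a terminal on POSIX systems, so this mainly illustrates the mechanism; withNonCanonical is a hypothetical name.

```haskell
import Control.Exception (bracket)
import System.IO (isEOF)
import System.Posix.IO (stdInput)
import System.Posix.Terminal

-- Run an action with the terminal out of canonical mode, so input is
-- delivered byte by byte instead of line by line; the original settings
-- are restored afterwards, even if the action throws.
withNonCanonical :: IO a -> IO a
withNonCanonical action = do
  tty <- queryTerminal stdInput
  if not tty
    then action -- stdin is a pipe or file: no line discipline to change
    else bracket
      (getTerminalAttributes stdInput)
      (\old -> setTerminalAttributes stdInput old Immediately)
      (\old -> do
          -- ProcessInput is ICANON; MIN = 1, TIME = 0 makes a read
          -- return as soon as a single byte is available.
          let raw = ((old `withoutMode` ProcessInput) `withMinInput` 1) `withTime` 0
          setTerminalAttributes stdInput raw Immediately
          action)

main :: IO ()
main = withNonCanonical $ do
  eof <- isEOF
  if eof then putStrLn "no input" else getChar >>= print
```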
The Haskeline package worked for me.
If you need individual characters, just change the sample slightly:
getInputLine becomes getInputChar
"quit" becomes 'q'
++ input becomes ++ [input]
import System.Console.Haskeline
main :: IO ()
main = runInputT defaultSettings loop
  where
    loop :: InputT IO ()
    loop = do
        minput <- getInputChar "% "
        case minput of
            Nothing -> return ()
            Just 'q' -> return ()
            Just input -> do
                outputStrLn $ "Input was: " ++ [input]
                loop
From a comment by @Richard Cook:
Use hidden-char: it provides a cross-platform getHiddenChar function.
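For a rough idea of what such a cross-platform function has to do, here is a sketch in the same spirit; it is not hidden-char's actual source. It uses CPP to pick the conio getch workaround from the answer above on Windows, and on other platforms uses unbuffered stdin with echo turned off.

```haskell
{-# LANGUAGE CPP #-}
{-# LANGUAGE ForeignFunctionInterface #-}

import Control.Exception (IOException, finally, try)
import System.IO

#if defined(mingw32_HOST_OS)
import Data.Char (chr)
import Foreign.C.Types (CInt)

foreign import ccall unsafe "conio.h getch"
  c_getch :: IO CInt

-- Windows: read directly from the console with getch, bypassing the
-- Handle layer; getch does not echo the character.
getHiddenChar :: IO Char
getHiddenChar = fmap (chr . fromEnum) c_getch
#else
-- Elsewhere: unbuffered stdin with echo off, restoring both settings
-- afterwards (hSetEcho does nothing when stdin is not a terminal).
getHiddenChar :: IO Char
getHiddenChar = do
  oldBuf  <- hGetBuffering stdin
  oldEcho <- hGetEcho stdin
  hSetBuffering stdin NoBuffering
  hSetEcho stdin False
  getChar `finally` (hSetEcho stdin oldEcho >> hSetBuffering stdin oldBuf)
#endif

main :: IO ()
main = do
  r <- try getHiddenChar :: IO (Either IOException Char)
  case r of
    Left _  -> putStrLn "no input"
    Right c -> print c
```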
I used the haskeline package, suggested in other answers, to put together this simple alternative to getChar. It requests input again in the case that getInputChar returns Nothing. This worked for me to get past the issue; modify as needed.
import System.Console.Haskeline
    ( runInputT
    , defaultSettings
    , getInputChar
    )
betterInputChar :: IO Char
betterInputChar = do
    mc <- runInputT defaultSettings (getInputChar "")
    case mc of
        Nothing -> betterInputChar
        (Just c) -> return c
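One caveat: according to Haskeline's documentation, getInputChar returns Nothing when the end of input is reached, so betterInputChar above would retry forever once stdin is exhausted (for example after Ctrl-D, or with piped input). A sketch that treats Nothing as a signal to stop instead; echoUntilQuit is an illustrative name:

```haskell
import System.Console.Haskeline
  (InputT, defaultSettings, getInputChar, outputStrLn, runInputT)

-- Read characters until 'q' or end of input. getInputChar returns Nothing
-- at end of input, so Nothing ends the loop instead of retrying forever.
echoUntilQuit :: InputT IO ()
echoUntilQuit = do
  mc <- getInputChar "> "
  case mc of
    Nothing  -> outputStrLn "end of input"
    Just 'q' -> outputStrLn "bye"
    Just c   -> do
      outputStrLn ("got " ++ show c)
      echoUntilQuit

main :: IO ()
main = runInputT defaultSettings echoUntilQuit
```

If a bare IO Char is still needed, returning IO (Maybe Char) and letting the caller decide what end of input means is the safer shape.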