Simple regular expression substitution crashes on Windows using regex-compat - windows

The following code crashes when using GHC on Windows. It works perfectly on Linux.
Does this make any sense or is there a bug?
module Main where
import qualified Text.Regex as Re -- from regex-compat
import Debug.Trace
main :: IO ()
main = do
putStr $ cleanWord "jan"
putStr $ cleanWord "dec"
putStr $ cleanWord "jun" -- crashes here
cleanWord :: String -> String
cleanWord word_ =
let word = trace (show word_) word_ in
let re = Re.mkRegexWithOpts "(jun|jul|aug|sep|oct|nov|dec)" True False in
Re.subRegex re word ""
Some additional details
I'm building with stack
It crashes in both GHCi and running the compiled executable
I tried to enable profiling but can't seem to figure out how to get that to work correctly.

Shuffling the regex around avoids the bug in this case.
This works fine for me now
let re = Re.mkRegexWithOpts "(jul|aug|sep|oct|jun|nov|dec)" True False in
I.e. moving the jun towards the end of the regex. Not particularly satisfactory but it works and it means I can continue.
#Zeta's comment suggest that this is a bug with the underlying library
It's a bug, probably in regex-posix, due to its difference in the
inclusion of the underlying BSD regex library from 1994. In sslow, an
invalid memory address (-1(%rdx) with rdx = 0) will be accessed.

Related

How to make a change to the source code in Haskell using GHCi?

I am new to Haskell and am using GHCi to edit and run Haskell files. For some reason, I am not able to edit the source code of the file. The behavior I am getting is extremely odd.
Below is a screenshot of what is happening. I am loading the file lec3.hs and am attempting to edit this file to add the following function: myfun = \w -> not w. For some reason, this function successfully runs when I call it immediately after: myfun False. I do not need to reload the file.
It is clear that the function is not being added to the source code. When I reload the file, I am getting an error stating that myfun does not exist.
Could someone help me understand why GHCi is behaving this way, and how to fix such behaviour? I have already spent an hour trying to figure this out. I would sincerely appreciate any help.
Typing things into GHCi isn't supposed to add them to the source. But if you have loaded a file into GHCi you can edit it using the :e command, and when you close the editor it will be automatically reloaded.
You can use :e filename.hs if you are dealing with more than one file and you need to specify.
Generally it's easier to work in a separate editor and just reload into GHCi with :r, but :e is occasionally useful.
To answer
Is it possible to edit a .hs file from GHCi?
It technically speaking is possible, since Haskell has like any other language file-IO operations. Concretely, appendFile allows you to add content to a file.
$ cat >> Foo.hs # Creating a simple Haskell file
foo :: Int
foo = 3
$ ghci Foo.hs
GHCi, version 8.2.2: http://www.haskell.org/ghc/ :? for help
Loaded GHCi configuration from /home/sagemuej/.ghc/ghci.conf
Loaded GHCi configuration from /home/sagemuej/.ghci
[1 of 1] Compiling Main ( Foo.hs, interpreted )
Ok, one module loaded.
*Main> foo
3
*Main> appendFile "Foo.hs" $ "myfun = \\w -> not w"
*Main> :r
[1 of 1] Compiling Main ( Foo.hs, interpreted )
Ok, one module loaded.
*Main> myfun False
True
But this really is not a good way to edit files. It makes much more sense to have two windows open, one with a text editor and one with the REPL, i.e. with GHCi. It can be either two completely separate OS windows, or two sub-windows of your IDE or whatever, but in either case GHCi will be used only for evaluation and one-line prototyping, not for actually adding/editing code. (The one-line prototypes can be copy&pasted into the editor, if it helps.)

The optimal way to set a breakpoint in the Python source code while debugging CPython by GDB

I use GDB to understanding how CPython executes the test.py source file and I want to stop the CPython when it starts the execution of opcode I am interested.
OS: Ubuntu 18.04.2 LTS
Debugger: GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
The first problem - many CPython's .py own files are executed before my test.py gets its turn, so I can't just break at the _PyEval_EvalFrameDefault - there are many of them, so I should distinguish my file from others.
The second problem - I can't set the condition like "when the filename is equal to the test.py", because the filename is not a simple C string, it is the CPython's Unicode object, so the standard GDB string functions can't be used for comparing.
At this moment I do the next trick for breaking the execution at the needed line of test.py source:
For example, I have the source file:
x = ['a', 'b', 'c']
# I want to set the breakpoint at this line.
for e in x:
print(e)
I add the binary left shift operator to the code:
x = ['a', 'b', 'c']
# Added for breakpoint
a = 12
b = 2 << a
for e in x:
print(e)
And then, track the BINARY_LSHIFT opcode execution in the Python/ceval.c file by this GDB command:
break ceval.c:1327
I have chosen the BINARY_LSHIFT opcode, because of its seldom usage in the code. Thus, I can reach the needed part of .py file quickly - it happens once in the all other .py modules executed before my test.py.
I look the more straightforward way of doing the same, so
the questions:
Can I catch the moment the test.py starts executing? I should mention, what the test.py filename is appearing on different stages: parsing, compilation, execution. So, it also will be good to can break the CPython execution at the any stage.
Can I specify the line of the test.py, where I want to break? It is easy for .c files, but is not for .py files.
My idea would be to use a C-extension, to make setting C-breakpoints possible in a python-script (similar to pdb.set_trace() or breakpoint() since Python3.7), which I will call cbreakpoint.
Consider the following python-script:
#example.py
from cbreakpoint import cbreakpoint
cbreakpoint(breakpoint_id=1)
print("hello")
cbreakpoint(breakpoint_id=2)
It could be used as follows in gdb:
>>> gdb --args python example.py
[gdb] b cbreakpoint
[gdb] run
Now, the debuger would stops at cbreakpoint(breakpoint_id=1) and cbreakpoint(breakpoint_id=2).
Here is proof of concept, written in Cython to avoid the otherwise needed boilerplate-code:
#cbreakpoint.pyx
cdef extern from *:
"""
long long last_breakpoint_id = -1;
void cbreakpoint(long long breakpoint_id){
last_breakpoint_id = breakpoint_id;
}
"""
void c_cbreakpoint "cbreakpoint"(long long breakpoint_id)
def cbreakpoint(breakpoint_id = 0):
c_cbreakpoint(breakpoint_id)
which can be build inplace via:
cythonize -i cbreakpoint.pyx
If Cython isn't installed, I have uploaded a version which doesn't depend on Cython (too much code for this post) on github.
It is also possible to break conditionally, given the breakpoint_id, i.e.:
>>> gdb --args python example.py
[gdb] break src/cbreakpoint.c:595 if breakpoint_id == 2
[gdb] run
will break only after hello was printed - at cbreakpoint with id=2 (while cbreakpoint with id=1 will be skipped). Depending on Cython version the line can vary, but can be found out once gdb stops at cbreakpoint.
It would also do something similar without any additional modules:
add breakpoint or import pdb; pdb.set_trace() instead of cbreakpoint
gdb --args python example.py + run
When pdb interrupts the program, hit Ctrl+C in order to interrupt in gdb.
Activate breakpoints in gdb.
continue in gdb and then in pdb (i.e. c+enter twice).
A small problem is, that after that the breakpoints might be hit while in pdb, so the first method is a little bit more robust.

Determine compiler name/version from gdb

I share my .gdbinit script (via NFS) across machines running different versions of gcc. I would like some gdb commands to be executed if the code I am debugging has been compiled with a specific compiler version. Can gdb do that?
I came up with this:
define hook-run
python
from subprocess import Popen, PIPE
from re import search
# grab the executable filename from gdb
# this is probably not general enough --
# there might be several objfiles around
objfilename = gdb.objfiles()[0].filename
# run readelf
process = Popen(['readelf', '-p', '.comment', objfilename], stdout=PIPE)
output = process.communicate()[0]
# match the version number with a
regex = 'GCC: \(GNU\) ([\d.]+)'
match=search(regex, output)
if match:
compiler_version = match.group(1)
gdb.execute('set $compiler_version="'+str(compiler_version)+'"')
gdb.execute('init-if-undefined $compiler_version="None"')
# do what you want with the python compiler_version variable and/or
# with the $compiler_version convenience variable
# I use it to load version-specific pretty-printers
end
end
It is good enough for my purpose, although it is probably not general enough.

How to debug Haskell code?

I have a problem. I wrote a big Haskell program, and it always works with small input. Now, when I want to test it and generate a bigger input, I always get the message:
HsProg: Prelude.head: empty list
I use Prelude.head many times. What can I do to find out more or get a better error output to get the code line in which it happens?
The GHCi option -fbreak-on-exception can be useful. Here's an example debugging session. First we load our file into GHCi.
$ ghci Broken.hs
GHCi, version 7.0.2: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
Loading package ffi-1.0 ... linking ... done.
[1 of 1] Compiling Main ( Broken.hs, interpreted )
Ok, modules loaded: Main.
Now, we turn on -fbreak-on-exceptions and trace our expression (main in this case for the whole program).
*Main> :set -fbreak-on-exception
*Main> :trace main
Stopped at <exception thrown>
_exception :: e = _
We've stopped at an exception. Let's try to look at the code with :list.
[<exception thrown>] *Main> :list
Unable to list source for <exception thrown>
Try :back then :list
Because the exception happened in Prelude.head, we can't look at the source directly. But as GHCi informs us, we can go :back and try to list what happened before in the trace.
[<exception thrown>] *Main> :back
Logged breakpoint at Broken.hs:2:23-42
_result :: [Integer]
[-1: Broken.hs:2:23-42] *Main> :list
1
2 main = print $ head $ filter odd [2, 4, 6]
3
In the terminal, the offending expression filter odd [2, 4, 6] is highlighted in bold font. So this is the expression that evaluated to the empty list in this case.
For more information on how to use the GHCi debugger, see the GHC User's Guide.
You may want to take a look at Haskell Wiki - Debugging, which contains many useful approaches to your problem.
One promising tool is LocH, which would would help you locate the head invocation in your code which triggered the empty list error.
Personally, I recommend the safe package, which allows to annotate most partial functions from the Prelude (and thus leads to a more conscious use of those partial functions) or better yet, use the total variants of functions such as head which always return a result (if the input value was defined at least).
Since GHC 8, you can use the GHC.Stack module or some profiling compiler flags detailed on a Simon's blog.
You can use this library:
Debug.Trace
You can replace any value a with the function:
trace :: String -> a -> a
unlike putStrLn there is no IO in the output, e.g.:
>>> let x = 123; f = show
>>> trace ("calling f with x = " ++ show x) (f x)
calling f with x = 123
123
The trace function should only be used for debugging, or for monitoring execution. The function is not referentially transparent: its type indicates that it is a pure function but it has the side effect of outputting the trace message.

Is there a quick-starting Haskell interpreter suitable for scripting?

Does anyone know of a quick-starting Haskell interpreter that would be suitable for use in writing shell scripts? Running 'hello world' using Hugs took 400ms on my old laptop and takes 300ms on my current Thinkpad X300. That's too slow for instantaneous response. Times with GHCi are similar.
Functional languages don't have to be slow: both Objective Caml and Moscow ML run hello world in 1ms or less.
Clarification: I am a heavy user of GHC and I know how to use GHCi. I know all about compiling to get things fast. Parsing costs should be completely irrelevant: if ML and OCaml can start 300x faster than GHCi, then there is room for improvement.
I am looking for
The convenience of scripting: one source file, no binary code, same code runs on all platforms
Performance comparable to other interpreters, including fast startup and execution for a simple program like
module Main where
main = print 33
I am not looking for compiled performance for more serious programs. The whole point is to see if Haskell can be useful for scripting.
Why not create a script front-end that compiles the script if it hasn't been before or if the compiled version is out of date.
Here's the basic idea, this code could be improved a lot--search the path rather then assuming everything's in the same directory, handle other file extensions better, etc. Also i'm pretty green at haskell coding (ghc-compiled-script.hs):
import Control.Monad
import System
import System.Directory
import System.IO
import System.Posix.Files
import System.Posix.Process
import System.Process
getMTime f = getFileStatus f >>= return . modificationTime
main = do
scr : args <- getArgs
let cscr = takeWhile (/= '.') scr
scrExists <- doesFileExist scr
cscrExists <- doesFileExist cscr
compile <- if scrExists && cscrExists
then do
scrMTime <- getMTime scr
cscrMTime <- getMTime cscr
return $ cscrMTime <= scrMTime
else
return True
when compile $ do
r <- system $ "ghc --make " ++ scr
case r of
ExitFailure i -> do
hPutStrLn stderr $
"'ghc --make " ++ scr ++ "' failed: " ++ show i
exitFailure
ExitSuccess -> return ()
executeFile cscr False args Nothing
Now we can create scripts such as this (hs-echo.hs):
#! ghc-compiled-script
import Data.List
import System
import System.Environment
main = do
args <- getArgs
putStrLn $ foldl (++) "" $ intersperse " " args
And now running it:
$ time hs-echo.hs "Hello, world\!"
[1 of 1] Compiling Main ( hs-echo.hs, hs-echo.o )
Linking hs-echo ...
Hello, world!
hs-echo.hs "Hello, world!" 0.83s user 0.21s system 97% cpu 1.062 total
$ time hs-echo.hs "Hello, world, again\!"
Hello, world, again!
hs-echo.hs "Hello, world, again!" 0.01s user 0.00s system 60% cpu 0.022 total
Using ghc -e is pretty much equivalent to invoking ghci. I believe that GHC's runhaskell compiles the code to a temporary executable before running it, as opposed to interpreting it like ghc -e/ghci, but I'm not 100% certain.
$ time echo 'Hello, world!'
Hello, world!
real 0m0.021s
user 0m0.000s
sys 0m0.000s
$ time ghc -e 'putStrLn "Hello, world!"'
Hello, world!
real 0m0.401s
user 0m0.031s
sys 0m0.015s
$ echo 'main = putStrLn "Hello, world!"' > hw.hs
$ time runhaskell hw.hs
Hello, world!
real 0m0.335s
user 0m0.015s
sys 0m0.015s
$ time ghc --make hw
[1 of 1] Compiling Main ( hw.hs, hw.o )
Linking hw ...
real 0m0.855s
user 0m0.015s
sys 0m0.015s
$ time ./hw
Hello, world!
real 0m0.037s
user 0m0.015s
sys 0m0.000s
How hard is it to simply compile all your "scripts" before running them?
Edit
Ah, providing binaries for multiple architectures is a pain indeed. I've gone down that road before, and it's not much fun...
Sadly, I don't think it's possible to make any Haskell compiler's startup overhead any better. The language's declarative nature means that it's necessary to read the entire program first even before trying to typecheck anything, nevermind execution, and then you either suffer the cost of strictness analysis or unnecessary laziness and thunking.
The popular 'scripting' languages (shell, Perl, Python, etc.) and the ML-based languages require only a single pass... well okay, ML requires a static typecheck pass and Perl has this amusing 5-pass scheme (with two of them running in reverse); either way, being procedural means that the compiler/interpreter has a lot easier of a job assembling the bits of the program together.
In short, I don't think it's possible to get much better than this. I haven't tested to see if Hugs or GHCi has a faster startup, but any difference there is still faaar away from non-Haskell languages.
If you are really concerned with speed you are going to be hampered by re-parsing the code for every launch. Haskell doesn't need to be run from an interpreter, compile it with GHC and you should get excellent performance.
You have two parts to this question:
you care about performance
you want scripting
If you care about performance, the only serious option is GHC, which is very very fast: http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
If you want something light for Unix scripting, I'd use GHCi. It is about 30x faster than Hugs, but also supports all the libraries on hackage.
So install GHC now (and get GHCi for free).
What about having a ghci daemon and a feeder script that takes the script path and location, communicates with the already running ghci process to load and execute the script in the proper directory and pipes the output back to the feeder script for stdout?
Unfortunately, I have no idea how to write something like this, but it seems like it could be really fast judging by the speed of :l in ghci. As it seems most of the cost in runhaskell is in starting up ghci, not parsing and running the script.
Edit: After some playing around, I found the Hint package (a wrapper around the GHC API) to be of perfect use here. The following code will load the passed in module name (here assumed to be in the same directory) and will execute the main function. Now 'all' that's left is to make it a daemon, and have it accept scripts on a pipe or socket.
import Language.Haskell.Interpreter
import Control.Monad
run = runInterpreter . test
test :: String -> Interpreter ()
test mname =
do
loadModules [mname ++ ".hs"]
setTopLevelModules [mname]
res <- interpret "main" (as :: IO())
liftIO res
Edit2: As far as stdout/err/in go, using this specific GHC trick It looks like it would be possible to redirect the std's of the client program into the feeder program, then into some named pipes (perhaps) that the daemon is connected to at all times, and then have the daemon's stdout back to another named pipe that the feeder program is listening to. Pseudo-example:
grep ... | feeder my_script.hs | xargs ...
| ^---------------- <
V |
named pipe -> daemon -> named pipe
Here the feeder would be a small compiled harness program to just redirect the std's into and then back out of the daemon and give the name and location of the script to the daemon.

Resources