On a fresh install of Haskell Platform for Max OSX, the following code fails on import Test.HUnit when run using the runghc interpreter.
{--
- Save this file as Main.hs and run with % runghc Main.hs
-}
module Main where
import Test.HUnit
derp = test [ "a silly test" ~: 'a' ~=? 'a' ]
tests = TestList [ derp ]
main::IO()
main = (runTestTT tests) >>= (\x -> putStrLn $ show x)
However, when using ghci, doing a simple import Test.HUnit works just fine.
How can I resolve this discrepancy between ghc from the command line and the ghci REPL?
Reboot the machine. Not sure whether it's paths not containing the right directories or something else, but a fresh start after install seems to have resolved it.
Related
I use GDB to understanding how CPython executes the test.py source file and I want to stop the CPython when it starts the execution of opcode I am interested.
OS: Ubuntu 18.04.2 LTS
Debugger: GNU gdb (Ubuntu 8.1-0ubuntu3) 8.1.0.20180409-git
The first problem - many CPython's .py own files are executed before my test.py gets its turn, so I can't just break at the _PyEval_EvalFrameDefault - there are many of them, so I should distinguish my file from others.
The second problem - I can't set the condition like "when the filename is equal to the test.py", because the filename is not a simple C string, it is the CPython's Unicode object, so the standard GDB string functions can't be used for comparing.
At this moment I do the next trick for breaking the execution at the needed line of test.py source:
For example, I have the source file:
x = ['a', 'b', 'c']
# I want to set the breakpoint at this line.
for e in x:
print(e)
I add the binary left shift operator to the code:
x = ['a', 'b', 'c']
# Added for breakpoint
a = 12
b = 2 << a
for e in x:
print(e)
And then, track the BINARY_LSHIFT opcode execution in the Python/ceval.c file by this GDB command:
break ceval.c:1327
I have chosen the BINARY_LSHIFT opcode, because of its seldom usage in the code. Thus, I can reach the needed part of .py file quickly - it happens once in the all other .py modules executed before my test.py.
I look the more straightforward way of doing the same, so
the questions:
Can I catch the moment the test.py starts executing? I should mention, what the test.py filename is appearing on different stages: parsing, compilation, execution. So, it also will be good to can break the CPython execution at the any stage.
Can I specify the line of the test.py, where I want to break? It is easy for .c files, but is not for .py files.
My idea would be to use a C-extension, to make setting C-breakpoints possible in a python-script (similar to pdb.set_trace() or breakpoint() since Python3.7), which I will call cbreakpoint.
Consider the following python-script:
#example.py
from cbreakpoint import cbreakpoint
cbreakpoint(breakpoint_id=1)
print("hello")
cbreakpoint(breakpoint_id=2)
It could be used as follows in gdb:
>>> gdb --args python example.py
[gdb] b cbreakpoint
[gdb] run
Now, the debuger would stops at cbreakpoint(breakpoint_id=1) and cbreakpoint(breakpoint_id=2).
Here is proof of concept, written in Cython to avoid the otherwise needed boilerplate-code:
#cbreakpoint.pyx
cdef extern from *:
"""
long long last_breakpoint_id = -1;
void cbreakpoint(long long breakpoint_id){
last_breakpoint_id = breakpoint_id;
}
"""
void c_cbreakpoint "cbreakpoint"(long long breakpoint_id)
def cbreakpoint(breakpoint_id = 0):
c_cbreakpoint(breakpoint_id)
which can be build inplace via:
cythonize -i cbreakpoint.pyx
If Cython isn't installed, I have uploaded a version which doesn't depend on Cython (too much code for this post) on github.
It is also possible to break conditionally, given the breakpoint_id, i.e.:
>>> gdb --args python example.py
[gdb] break src/cbreakpoint.c:595 if breakpoint_id == 2
[gdb] run
will break only after hello was printed - at cbreakpoint with id=2 (while cbreakpoint with id=1 will be skipped). Depending on Cython version the line can vary, but can be found out once gdb stops at cbreakpoint.
It would also do something similar without any additional modules:
add breakpoint or import pdb; pdb.set_trace() instead of cbreakpoint
gdb --args python example.py + run
When pdb interrupts the program, hit Ctrl+C in order to interrupt in gdb.
Activate breakpoints in gdb.
continue in gdb and then in pdb (i.e. c+enter twice).
A small problem is, that after that the breakpoints might be hit while in pdb, so the first method is a little bit more robust.
I have a fairly large codebase written in numba, and I have noticed that when the cache is enabled for a function calling another numba compiled function in another file, changes in the called function are not picked up when the called function is changed. The situation occurs when I have two files:
testfile2:
import numba
#numba.njit(cache=True)
def function1(x):
return x * 10
testfile:
import numba
from tests import file1
#numba.njit(cache=True)
def function2(x, y):
return y + file1.function1(x)
If in a jupyter notebook, I run the following:
# INSIDE JUPYTER NOTEBOOK
import sys
sys.path.insert(1, "path/to/files/")
from tests import testfile
testfile.function2(3, 4)
>>> 34 # good value
However, if I change then change testfile2 to the following:
import numba
#numba.njit(cache=True)
def function1(x):
return x * 1
Then I restart the jupyter notebook kernel and rerun the notebook, I get the following
import sys
sys.path.insert(1, "path/to/files/")
from tests import testfile
testfile.function2(3, 4)
>>> 34 # bad value, should be 7
Importing both files into the notebook has no effect on the bad result. Also, setting cache=False only on function1 also has no effect. What does work is setting cache=False on all njit'ted functions, then restarting the kernel, then rerunning.
I believe that LLVM is probably inlining some of the called functions and then never checking them again.
I looked in the source and discovered there is a method that returns the cache object numba.caching.NullCache(), instantiated a cache object and ran the following:
cache = numba.caching.NullCache()
cache.flush()
Unfortunately that appears to have no effect.
Is there a numba environment setting, or another way I can manually clear all cached functions within a conda env? Or am I simply doing something wrong?
I am running numba 0.33 with Anaconda Python 3.6 on Mac OS X 10.12.3.
I "solved" this with a hack solution after seeing Josh's answer, by creating a utility in the project method to kill off the cache.
There is probably a better way, but this works. I'm leaving the question open in case someone has a less hacky way of doing this.
import os
def kill_files(folder):
for the_file in os.listdir(folder):
file_path = os.path.join(folder, the_file)
try:
if os.path.isfile(file_path):
os.unlink(file_path)
except Exception as e:
print("failed on filepath: %s" % file_path)
def kill_numba_cache():
root_folder = os.path.realpath(__file__ + "/../../")
for root, dirnames, filenames in os.walk(root_folder):
for dirname in dirnames:
if dirname == "__pycache__":
try:
kill_files(root + "/" + dirname)
except Exception as e:
print("failed on %s", root)
This is a bit of a hack, but it's something I've used before. If you put this function in the top-level of where your numba functions are (for this example, in testfile), it should recompile everything:
import inspect
import sys
def recompile_nb_code():
this_module = sys.modules[__name__]
module_members = inspect.getmembers(this_module)
for member_name, member in module_members:
if hasattr(member, 'recompile') and hasattr(member, 'inspect_llvm'):
member.recompile()
and then call it from your jupyter notebook when you want to force a recompile. The caveat is that it only works on files in the module where this function is located and their dependencies. There might be another way to generalize it.
The following code crashes when using GHC on Windows. It works perfectly on Linux.
Does this make any sense or is there a bug?
module Main where
import qualified Text.Regex as Re -- from regex-compat
import Debug.Trace
main :: IO ()
main = do
putStr $ cleanWord "jan"
putStr $ cleanWord "dec"
putStr $ cleanWord "jun" -- crashes here
cleanWord :: String -> String
cleanWord word_ =
let word = trace (show word_) word_ in
let re = Re.mkRegexWithOpts "(jun|jul|aug|sep|oct|nov|dec)" True False in
Re.subRegex re word ""
Some additional details
I'm building with stack
It crashes in both GHCi and running the compiled executable
I tried to enable profiling but can't seem to figure out how to get that to work correctly.
Shuffling the regex around avoids the bug in this case.
This works fine for me now
let re = Re.mkRegexWithOpts "(jul|aug|sep|oct|jun|nov|dec)" True False in
I.e. moving the jun towards the end of the regex. Not particularly satisfactory but it works and it means I can continue.
#Zeta's comment suggest that this is a bug with the underlying library
It's a bug, probably in regex-posix, due to its difference in the
inclusion of the underlying BSD regex library from 1994. In sslow, an
invalid memory address (-1(%rdx) with rdx = 0) will be accessed.
I have a Main.hs file with two functions.
Module Main where
import Data.List
main :: IO()
main = interact reverse
functionThatWorks = putStrLn "Ajax"
After I set the directory and load Main.hs I have no problem with calling functionThatWorks.
Except when I want to take a text file as input like so:
Main<in.txt or ./Main<in.txt
I get an error saying 'parse error on input 'in' '
Does anyone know I can make this work in the Terminal?
p.s. I use a Mac.
Unfortunately ghci doesn't understand input redirection like the shell does.
I would suggest running your program with runhaskell:
runhaskell Main.hs < in.txt
Does anyone know of a quick-starting Haskell interpreter that would be suitable for use in writing shell scripts? Running 'hello world' using Hugs took 400ms on my old laptop and takes 300ms on my current Thinkpad X300. That's too slow for instantaneous response. Times with GHCi are similar.
Functional languages don't have to be slow: both Objective Caml and Moscow ML run hello world in 1ms or less.
Clarification: I am a heavy user of GHC and I know how to use GHCi. I know all about compiling to get things fast. Parsing costs should be completely irrelevant: if ML and OCaml can start 300x faster than GHCi, then there is room for improvement.
I am looking for
The convenience of scripting: one source file, no binary code, same code runs on all platforms
Performance comparable to other interpreters, including fast startup and execution for a simple program like
module Main where
main = print 33
I am not looking for compiled performance for more serious programs. The whole point is to see if Haskell can be useful for scripting.
Why not create a script front-end that compiles the script if it hasn't been before or if the compiled version is out of date.
Here's the basic idea, this code could be improved a lot--search the path rather then assuming everything's in the same directory, handle other file extensions better, etc. Also i'm pretty green at haskell coding (ghc-compiled-script.hs):
import Control.Monad
import System
import System.Directory
import System.IO
import System.Posix.Files
import System.Posix.Process
import System.Process
getMTime f = getFileStatus f >>= return . modificationTime
main = do
scr : args <- getArgs
let cscr = takeWhile (/= '.') scr
scrExists <- doesFileExist scr
cscrExists <- doesFileExist cscr
compile <- if scrExists && cscrExists
then do
scrMTime <- getMTime scr
cscrMTime <- getMTime cscr
return $ cscrMTime <= scrMTime
else
return True
when compile $ do
r <- system $ "ghc --make " ++ scr
case r of
ExitFailure i -> do
hPutStrLn stderr $
"'ghc --make " ++ scr ++ "' failed: " ++ show i
exitFailure
ExitSuccess -> return ()
executeFile cscr False args Nothing
Now we can create scripts such as this (hs-echo.hs):
#! ghc-compiled-script
import Data.List
import System
import System.Environment
main = do
args <- getArgs
putStrLn $ foldl (++) "" $ intersperse " " args
And now running it:
$ time hs-echo.hs "Hello, world\!"
[1 of 1] Compiling Main ( hs-echo.hs, hs-echo.o )
Linking hs-echo ...
Hello, world!
hs-echo.hs "Hello, world!" 0.83s user 0.21s system 97% cpu 1.062 total
$ time hs-echo.hs "Hello, world, again\!"
Hello, world, again!
hs-echo.hs "Hello, world, again!" 0.01s user 0.00s system 60% cpu 0.022 total
Using ghc -e is pretty much equivalent to invoking ghci. I believe that GHC's runhaskell compiles the code to a temporary executable before running it, as opposed to interpreting it like ghc -e/ghci, but I'm not 100% certain.
$ time echo 'Hello, world!'
Hello, world!
real 0m0.021s
user 0m0.000s
sys 0m0.000s
$ time ghc -e 'putStrLn "Hello, world!"'
Hello, world!
real 0m0.401s
user 0m0.031s
sys 0m0.015s
$ echo 'main = putStrLn "Hello, world!"' > hw.hs
$ time runhaskell hw.hs
Hello, world!
real 0m0.335s
user 0m0.015s
sys 0m0.015s
$ time ghc --make hw
[1 of 1] Compiling Main ( hw.hs, hw.o )
Linking hw ...
real 0m0.855s
user 0m0.015s
sys 0m0.015s
$ time ./hw
Hello, world!
real 0m0.037s
user 0m0.015s
sys 0m0.000s
How hard is it to simply compile all your "scripts" before running them?
Edit
Ah, providing binaries for multiple architectures is a pain indeed. I've gone down that road before, and it's not much fun...
Sadly, I don't think it's possible to make any Haskell compiler's startup overhead any better. The language's declarative nature means that it's necessary to read the entire program first even before trying to typecheck anything, nevermind execution, and then you either suffer the cost of strictness analysis or unnecessary laziness and thunking.
The popular 'scripting' languages (shell, Perl, Python, etc.) and the ML-based languages require only a single pass... well okay, ML requires a static typecheck pass and Perl has this amusing 5-pass scheme (with two of them running in reverse); either way, being procedural means that the compiler/interpreter has a lot easier of a job assembling the bits of the program together.
In short, I don't think it's possible to get much better than this. I haven't tested to see if Hugs or GHCi has a faster startup, but any difference there is still faaar away from non-Haskell languages.
If you are really concerned with speed you are going to be hampered by re-parsing the code for every launch. Haskell doesn't need to be run from an interpreter, compile it with GHC and you should get excellent performance.
You have two parts to this question:
you care about performance
you want scripting
If you care about performance, the only serious option is GHC, which is very very fast: http://shootout.alioth.debian.org/u64q/benchmark.php?test=all&lang=all
If you want something light for Unix scripting, I'd use GHCi. It is about 30x faster than Hugs, but also supports all the libraries on hackage.
So install GHC now (and get GHCi for free).
What about having a ghci daemon and a feeder script that takes the script path and location, communicates with the already running ghci process to load and execute the script in the proper directory and pipes the output back to the feeder script for stdout?
Unfortunately, I have no idea how to write something like this, but it seems like it could be really fast judging by the speed of :l in ghci. As it seems most of the cost in runhaskell is in starting up ghci, not parsing and running the script.
Edit: After some playing around, I found the Hint package (a wrapper around the GHC API) to be of perfect use here. The following code will load the passed in module name (here assumed to be in the same directory) and will execute the main function. Now 'all' that's left is to make it a daemon, and have it accept scripts on a pipe or socket.
import Language.Haskell.Interpreter
import Control.Monad
run = runInterpreter . test
test :: String -> Interpreter ()
test mname =
do
loadModules [mname ++ ".hs"]
setTopLevelModules [mname]
res <- interpret "main" (as :: IO())
liftIO res
Edit2: As far as stdout/err/in go, using this specific GHC trick It looks like it would be possible to redirect the std's of the client program into the feeder program, then into some named pipes (perhaps) that the daemon is connected to at all times, and then have the daemon's stdout back to another named pipe that the feeder program is listening to. Pseudo-example:
grep ... | feeder my_script.hs | xargs ...
| ^---------------- <
V |
named pipe -> daemon -> named pipe
Here the feeder would be a small compiled harness program to just redirect the std's into and then back out of the daemon and give the name and location of the script to the daemon.