Inputting Data with Haskell - shell

Back Story: In an attempt to better understand Haskell and functional programming, I've given myself a few assignments. My first assignment is to make a program that can look through a data set (a set of numbers, words in a blog, etc), search for patterns or repetitions, group them, and report them.
Sounds easy enough. :)
Question: I'd like for the program to start by creating a list variable from the data in a text file. I'm familiar with the readFile function, but I was wondering if there was a more elegant way to input data.
For example, I'd like to allow the user to type something like this in the command line to load the program and the data set.
./haskellprogram textfile.txt
Is there a function that will allow this?

import System.Environment
main :: IO ()
main = do
  args <- getArgs
  -- args is a list of arguments
  if null args
    then putStrLn "usage: ./haskellprogram textfile.txt"
    else do
      contents <- readFile $ head args
      putStrLn $ doSomething contents
doSomething :: String -> String
doSomething = reverse
That should be enough to get you started. Now replace reverse with something more valuable :)
Speaking of parsing input data, you might consider breaking your data into lines or words using the lines and words functions from Prelude.
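As a rough illustration of where that could lead for the "find repetitions" assignment, here is a minimal sketch that replaces reverse with a word-frequency count. The name wordCounts and the sample sentence are made up for this example; only words, sort, group and sortBy come from the standard libraries.

```haskell
import Data.List (group, sort, sortBy)
import Data.Ord (Down (..), comparing)

-- Count how often each word occurs, most frequent first.
-- Sorting first puts equal words next to each other so 'group' can
-- collect them; 'sortBy' is stable, so ties stay in alphabetical order.
wordCounts :: String -> [(String, Int)]
wordCounts s =
  sortBy (comparing (Down . snd))
    [ (w, length ws) | ws@(w : _) <- group (sort (words s)) ]

main :: IO ()
main = mapM_ print (wordCounts "the cat and the hat and the bat")
```

To use it on a file, the same function could be applied to the contents read via readFile in the program above.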

You're looking for the getArgs function from System.Environment.

Subtle Array, I can never resist mentioning my favorite when I was first learning Haskell, interact:
module Main where
main = interact doSomething
doSomething :: String -> String
doSomething xs = reverse xs
You then use it as cat textfile.txt | ./haskellprogram | grep otto or whatever. There is also a variant in Data.Text.IO which you might get to know, and a few others in other string-ish libraries.
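For reference, a minimal sketch of the Text-based variant might look like the following. It assumes the text package is available (it ships with GHC) and, unlike the String version above, reverses each line separately; that per-line behaviour is just a choice made for this example.

```haskell
import qualified Data.Text as T
import qualified Data.Text.IO as TIO

-- Reverse every input line on its own, keeping the line order.
doSomething :: T.Text -> T.Text
doSomething = T.unlines . map T.reverse . T.lines

main :: IO ()
main = TIO.interact doSomething
```

It is used the same way, reading stdin and writing stdout.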

Playing with the relatively new ReadArgs package:
{-# LANGUAGE ScopedTypeVariables #-}
import ReadArgs (readArgs)
main = do
  (fname :: String, foo :: Int) <- readArgs
  putStrLn fname
Testing...
$ runhaskell args.hs blahblah 3
blahblah
One irritation with readArgs is that it doesn't work if you have only a single argument. Hmmm...
Once you have the desired file name as a String, you can use readFile as usual.
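If only a single file name is needed, one way around that irritation is to pattern match on the result of getArgs directly. This is a sketch using only base, not part of ReadArgs; parseArgs is a hypothetical helper name.

```haskell
import System.Environment (getArgs)
import System.Exit (die)

-- Accept exactly one argument (the file name); anything else is a usage error.
parseArgs :: [String] -> Either String FilePath
parseArgs [fname] = Right fname
parseArgs _       = Left "usage: ./haskellprogram textfile.txt"

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Left usage  -> die usage                    -- print to stderr and exit with failure
    Right fname -> readFile fname >>= putStr    -- echo the file as a placeholder action
```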

Related

Why does ghci behave differently to runHaskell?

My goal is to pipe some steps for ghci to run from a bash script and then exit cleanly. The commentary online says to use runhaskell for this.
This is the command I'm trying to run:
ghci> import System.Random
ghci> random (mkStdGen 100) :: (Int, StdGen)
With expected result similar to:
(-3633736515773289454,693699796 2103410263)
When I drop this into a file randomtest.hs and execute it with runhaskell I get the following error:
randomtest.hs:3:1: error:
Invalid type signature: random (mkStdGen 100) :: ...
Should be of form <variable> :: <type>
I need a hint to go in the right direction.
My question is: Why does ghci behave differently to runHaskell?
ghci is a REPL (Read-Eval-Print Loop). runhaskell, by contrast, is nearly the same as compiling a program into an executable and then running it, except that it interprets the file instead of compiling it. GHCi lets us evaluate individual functions and arbitrary expressions, whereas runhaskell just runs the main function.
As @AJFarmar points out, GHCi is best used to debug and test a program you're building, whilst runhaskell is a nice way to run a whole program without having to compile.
So, to fix your issue, we just need to give the program a main function. ghci calls print on the result of every expression which is typed into the interpreter and not bound to a variable.
So, our main function can just be:
main = print (random (mkStdGen 100) :: (Int, StdGen))
We still need to import System.Random, so the whole file becomes:
import System.Random
main = print (random (mkStdGen 100) :: (Int, StdGen))
Then, we can run as expected:
[~]λ runhaskell randomtest.hs
(-3633736515773289454,693699796 2103410263)
If we want to run multiple commands from runhaskell, we can just add more to a do block in main:
import System.Random
main = do
  print (random (mkStdGen 100) :: (Int, StdGen))
  let x = 5 * 5
  print x
  putStrLn "Hello world!"

Subtle bug in Haskell tests involving Tasty, stdout capture, and the unix terminal

I'm writing a language interpreter in Haskell, and for the most part it works (hooray!). But some of the tests fail in an odd way that depends on whether stdout is sent to the terminal or a pipe...
I'm using tasty-golden to test the REPL. There are text files with pasted REPL sessions like this:
Welcome to the ShortCut interpreter!
Type :help for a list of the available commands.
shortcut >> v1 = "one"
shortcut >> v2 = "two"
shortcut >> v3 = [v1, v2]
shortcut >> :show
v1 = "one"
v2 = "two"
v3 = [v1, v2]
shortcut >> :rdepends v1
v3 = [v1, v2]
shortcut >> :quit
Bye for now!
I have a function that reads those files and splits them into user input (anything after shortcut >>) and what the program should print to stdout. It passes the input lines to the REPL, captures stdout using the silently package, intercalates them, and checks that the combined output is the same as that file.
Everything works, but only if I pipe the overall program output through less! When printing directly to the terminal, certain lines slip through and are actually printed, then the tests fail because they aren't part of the captured output.
I think it might be a bug in the interaction between Tasty and Silently, which both do fancy things with terminal output, but have no idea how to debug it or write a reproducible example.
Here's the complete output run in three different ways:
Copied directly from the terminal
Piped through less and copied from the terminal
Piped through tee and copied from the resulting file
As you can see:
Each FAIL is preceded by printing part of the REPL output to the terminal instead of capturing it
less somehow gets around that and the tests pass
tee misses the printed bits (they end up in terminal but not the file)
Any ideas about what could be going on? I can post any parts of the code you think might be relevant, but didn't want to muddy the question with hundreds and hundreds of mostly-unrelated lines.
Update: tried replacing Silently's hCapture_ with this function from another question:
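import GHC.IO.Handle (hDuplicate, hDuplicateTo)
import System.Directory (getTemporaryDirectory, removeFile)
import System.IO (hClose, openTempFile, stdout)
catchOutput :: IO () -> IO String
catchOutput action = do
  tmpd <- getTemporaryDirectory
  (tmpf, tmph) <- openTempFile tmpd "haskell_stdout"
  stdout_dup <- hDuplicate stdout
  hDuplicateTo tmph stdout
  hClose tmph
  action
  hDuplicateTo stdout_dup stdout
  str <- readFile tmpf
  removeFile tmpf
  return str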
Unfortunately it does the same thing.
Update 2:
Well, figured it out. Sorry to everyone who tried to puzzle through it, as I don't think I gave enough information! The bug was in my REPL monad, which was defined like this:
type ReplM a = StateT CutState (MaybeT (InputT IO)) a
runReplM :: ReplM a -> CutState -> IO (Maybe CutState)
runReplM r s = runInputT defaultSettings $ runMaybeT $ execStateT r s
prompt :: String -> ReplM (Maybe String)
prompt = lift . lift . getInputLine
print :: String -> ReplM ()
print = lift . lift . outputStrLn
I noticed that all the erroneous prints came from functions that use print, whereas a couple cases where I used liftIO . putStrLn instead worked as expected. So I just redefined print as that:
print :: String -> ReplM ()
print = liftIO . putStrLn
I still don't really get why the other version didn't work though, so I'll give the answer to whoever can explain it. outputStrLn is defined in Haskeline.hs.

Making "trace" optimise away like "assert"?

When optimising, GHC rewrites assert to just id; alternatively, its behaviour can be changed with a compiler flag. However, I noticed the same doesn't happen with trace. Is there a version of trace that ends up as id depending on whether a flag is set?
More generally, is there a way to alter the behaviour of a function based on the compiler flags used to compile the calling module (not the flags used to compile the function itself), much like assert does? Or is this GHC magic that can only happen with assert?
Warning: I haven't tried this ...
You can replace the Debug.Trace module completely with compiler flags. Make another module with trivial implementations of the functions in Debug.Trace:
module NoTrace (trace) where
trace :: String -> a -> a
{-# INLINE trace #-}
trace _message = id
...
Put this module in another package named no-trace.
Hide the Debug.Trace module in the arguments to ghc by including every module from the base package except Debug.Trace. Replace Debug.Trace with NoTrace from the no-trace package.
ghc -package="base (Control, Control.Applicative, ..., Data.Word, Foreign, ...)" \
-package="no-trace (NoTrace as Debug.Trace)" \
...
This came from the crazy idea of using the compiler flag that changes the prelude to replace the prelude with one that had rewrite rules to remove traces, but those rewrite rules would taint anything that imported a module compiled with them, even if a downstream importer still wanted to use traces. When looking up how to replace the prelude I found that ghc can replace any module instead.
No, at least not based on assert. The magic for assert works the other direction and replaces the identity function with an assertion.
Here's assert from base 4.9:
-- Assertion function. This simply ignores its boolean argument.
-- The compiler may rewrite it to @('assertError' line)@.
-- | If the first argument evaluates to 'True', then the result is the
-- second argument. Otherwise an 'AssertionFailed' exception is raised,
-- containing a 'String' with the source file and line number of the
-- call to 'assert'.
--
-- Assertions can normally be turned on or off with a compiler flag
-- (for GHC, assertions are normally on unless optimisation is turned on
-- with @-O@ or the @-fignore-asserts@
-- option is given). When assertions are turned off, the first
-- argument to 'assert' is ignored, and the second argument is
-- returned as the result.
-- SLPJ: in 5.04 etc 'assert' is in GHC.Prim,
-- but from Template Haskell onwards it's simply
-- defined here in Base.lhs
assert :: Bool -> a -> a
assert _pred r = r
OK, back at my computer and finally remembered I wanted to demonstrate this. Here goes:
import Control.Exception
import Debug.Trace
import Control.Monad
traceDebug :: String -> a -> a
traceDebug msg = assert (trace msg True)
main :: IO ()
main = replicateM_ 2 $ do
  print $ traceDebug "here1" ()
  print $ traceDebug "here2" ()
  print $ traceDebug "here3" ()
When compiled without optimizations, the output is:
here1
()
here2
()
here3
()
()
()
()
With optimizations:
()
()
()
()
()
()
So I think this addresses the request, with the standard caveat around trace that once the thunk has been evaluated, it won't be evaluated a second time (which is why the messages only happen the first time through the do-block).

How does Golang implement stdin/stdout/stderr?

I wrote a little program that parses input from the command line. It worked well using os.Stdin. However, when I looked up the official documentation to learn more, I found there was too much for me to take in.
var (
    Stdin = NewFile(uintptr(syscall.Stdin), "/dev/stdin")
)
I read the documentation for func NewFile, type uintptr, and package syscall individually, but could not figure out how it all fits together. I also don't know what /dev/stdin means.
I have never learned a statically typed programming language other than Go. How can I understand the magic behind stdin?
From the syscall package, Stdin is just the number 0:
var (
    Stdin  = 0
    Stdout = 1
    Stderr = 2
)
This is simply because the POSIX standard attaches stdin to the first file descriptor, 0.
Since stdin is always present and open by default, os.NewFile can just turn this file descriptor into an os.File, and uses the standard Linux filepath "/dev/stdin" as an easily recognizable file name.

Remove n characters from console

I'm writing a program in D that doesn't need a GUI. I remember that in C++, there was a way to remove a number of characters from console/terminal, but I don't know how to do this in D.
How do I remove a number of characters from the console/terminal?
(This didn't fit into a comment and I think it's what you are referring to)
Do you mean getchar? You have direct access to the entire standard C library in D. For example have a look at this simple script:
void main()
{
    import core.stdc.stdio : getchar;
    foreach (i; 0 .. 3)
        getchar();
    import std.stdio;
    writeln(readln());
}
When you compile & execute this script (e.g. here with rdmd)
echo "Hello world" | rdmd main.d
it would print:
lo world
But I have to agree with Adam that just slicing readln is easier and looks nicer ;-)
