Debug Haskell composition chain without converting to point-full representation

I have learned that point-free style is preferred in the Haskell community, and I often write expressions like this:
naive = (slugifyUnicode . T.take maxFilenameSize . T.pack . stripHtmlTags . T.unpack . plainToHtml) sfld
However, while debugging, I find myself repeatedly converting expressions like this into chains of $ operators in order to use some variant of trace, and I am starting to think it is preferable to forgo the point-free style and just write lots of $s to start with, in order to make debugging less cumbersome. After all, it ends up being about the same number of characters.
Does anyone know of a way to debug long chains of composed functions without de-composing them?
And more generally, any comments on this inconvenience are very welcome.

For any intermediate value that has a Show instance you can just use traceShowId inline:
naive = (slugifyUnicode . T.take maxFilenameSize . traceShowId . T.pack . stripHtmlTags . T.unpack . plainToHtml) sfld
If the intermediary value is a String you can use traceId instead.
For anything that doesn't have a Show instance you'd have to define a helper:
import Debug.Trace (trace)
data CustomType = CustomType String
traceHelper :: CustomType -> CustomType
traceHelper s@(CustomType c) = trace c s
-- arbitrary functions we want to compose
a :: aIn -> CustomType
b :: CustomType -> bOut
c :: aIn -> bOut
c = b . traceHelper . a
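As a minimal sketch of the same idea, a small labelled variant of traceShowId makes it easier to tell several trace points in one chain apart. The names traceStage and shout are hypothetical and only for illustration; trace itself comes from Debug.Trace and writes to stderr.

```haskell
import Data.Char (toUpper)
import Debug.Trace (trace)

-- | Hypothetical helper: behaves like traceShowId, but prefixes the
-- traced value with a label so several stages can be told apart.
traceStage :: Show a => String -> a -> a
traceStage label x = trace (label ++ ": " ++ show x) x

-- | A composition chain with two trace points dropped in, without
-- rewriting the chain into ($) form.
shout :: String -> String
shout =
  map toUpper
    . traceStage "after take"
    . take 5
    . traceStage "after filter"
    . filter (/= ' ')
```

Because of laziness, the trace messages only appear when the result is actually demanded, and their order follows evaluation order rather than the order in the source.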

Related

Haskell debugging an arbitrary lambda expression

I have a set of lambda expressions which I'm passing to other lambdas. All lambdas rely only on their arguments; they don't call any outside functions. Of course, sometimes it gets quite confusing and I'll pass a function with the incorrect number of arguments to another, creating a GHCi exception.
I want to make a debug function which will take an arbitrary lambda expression (with an unknown number of arguments) and return a string based on the structure and function of the lambda.
For example, say I have the following lambda expressions:
i = \x -> x
k = \x y -> x
s = \x y z -> x z (y z)
debug (s k) should return "\a b -> b"
debug (s s k) should return "\a b -> a b a" (if I simplified that correctly)
debug s should return "\a b c -> a c (b c)"
What would be a good way of doing this?
I think the way to do this would be to define a small lambda calculus DSL in Haskell (or use an existing implementation). This way, instead of using the native Haskell formulation, you would write something like
k = Lam "x" (Lam "y" (Var "x"))
s = Lam "x" (Lam "y" (Lam "z" (App (App (Var "x") (Var "z"))
                                   (App (Var "y") (Var "z")))))
and similarly for i. You would then write/use an evaluation function so that you could write
debug e = eval e
debug (App s k)
which would give you the final form in your own syntax. Additionally you would need a sort of interpreter to convert your DSL syntax to Haskell, so that you can actually use the functions in your code.
Implementing this does seem like quite a lot of (tricky) work, and it's probably not exactly what you had in mind (especially if you need the evaluation for typed syntax), but I'm sure it would be a great learning experience. A good reference would be chapter 6 of "Write You a Haskell". Using an existing implementation would be a lot easier (but less fun :)).
If this is merely for debugging purposes you might benefit from looking at the Core syntax GHC compiles to. See chapter 25 of Real World Haskell; the GHC flag to use is -ddump-simpl. But this would mean looking at generated code rather than generating a representation inside your program. I'm also not sure to what extent you would be able to identify specific functions in the Core code easily (I have no experience with this so YMMV).
It would of course be pretty cool if using show on functions would give the kind of output you describe but there are probably very good reasons functions are not an instance of Show (I wouldn't be able to tell you).
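A rough sketch of the DSL approach described above, assuming the Var/Lam/App constructors from the answer. The type name Term and the helpers freeVars, fresh, subst, whnf, normalize and pretty are my own illustrative choices, not an existing library. It uses normal-order reduction with capture-avoiding substitution, so it will loop on terms that have no normal form.

```haskell
import Data.List (union)

-- | Untyped lambda terms, using the constructor names from the answer.
data Term
  = Var String
  | Lam String Term
  | App Term Term

freeVars :: Term -> [String]
freeVars (Var x)   = [x]
freeVars (Lam x b) = filter (/= x) (freeVars b)
freeVars (App f a) = freeVars f `union` freeVars a

-- | First of x, x1, x2, ... that does not occur in the avoid list.
fresh :: [String] -> String -> String
fresh avoid x = go (x : [x ++ show n | n <- [1 :: Int ..]])
  where
    go (v : vs)
      | v `notElem` avoid = v
      | otherwise         = go vs
    go []                 = x -- unreachable: the candidate list is infinite

-- | Capture-avoiding substitution: replace free occurrences of x by s.
subst :: String -> Term -> Term -> Term
subst x s (Var y)
  | x == y    = s
  | otherwise = Var y
subst x s (App f a) = App (subst x s f) (subst x s a)
subst x s (Lam y b)
  | y == x              = Lam y b -- x is shadowed, nothing to replace
  | y `elem` freeVars s = Lam y' (subst x s (subst y (Var y') b))
  | otherwise           = Lam y (subst x s b)
  where
    y' = fresh (x : freeVars s ++ freeVars b) y

-- | Reduce only until the head of the term is no longer a redex.
whnf :: Term -> Term
whnf (App f a) = case whnf f of
  Lam x b -> whnf (subst x a b)
  f'      -> App f' a
whnf t = t

-- | Normal-order reduction to normal form (may not terminate).
normalize :: Term -> Term
normalize (Var x)   = Var x
normalize (Lam x b) = Lam x (normalize b)
normalize (App f a) = case whnf f of
  Lam x b -> normalize (subst x a b)
  f'      -> App (normalize f') (normalize a)

-- | Print a term in Haskell-like lambda syntax.
pretty :: Term -> String
pretty (Var x)   = x
pretty (Lam x b) = "\\" ++ x ++ binders b
  where
    binders (Lam y c) = " " ++ y ++ binders c
    binders c         = " -> " ++ pretty c
pretty (App f a) = fun f ++ " " ++ arg a
  where
    fun t@(Lam _ _) = parens t
    fun t           = pretty t
    arg t@(Var _)   = pretty t
    arg t           = parens t
    parens t        = "(" ++ pretty t ++ ")"

debug :: Term -> String
debug = pretty . normalize

k, s :: Term
k = Lam "x" (Lam "y" (Var "x"))
s = Lam "x" (Lam "y" (Lam "z" (App (App (Var "x") (Var "z"))
                                   (App (Var "y") (Var "z")))))
```

With these definitions, debug (App s k) evaluates to "\y z -> z", which matches the "\a b -> b" from the question up to the choice of variable names.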
You can actually achieve that by utilising pretty-printing from Template Haskell, which comes with GHC out of the box.
First, the formatting function should be defined in a separate module (that's a TH restriction):
module LambdaPrint where
import Control.Monad
import Language.Haskell.TH.Ppr
import Language.Haskell.TH.Syntax
showDef :: Name -> Q Exp
showDef = liftM (LitE . StringL . pprint) . reify
Then use it:
{-# LANGUAGE TemplateHaskell #-}
import LambdaPrint
y :: a -> a
y = \a -> a
$(return []) --workaround for GHC 7.8+
test = $(showDef 'y)
The result is more or less readable, not counting fully qualified names:
*Main> test
"Main.y :: forall a_0 . a_0 -> a_0"
A few words about what's going on. showDef is a macro function which reifies the definition of some name from the environment and pretty-prints it in a string literal expression (in the output above, that is the type signature of y). To use it, you need to quote the name of the lambda (using ') and splice the result (which is a quoted string expression) into some expression (using $(...)).
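A complementary sketch, assuming you are able to write the lambda inside a Template Haskell quotation: pprint can then print the expression body itself rather than the type of a reified name. The name showLambda is hypothetical. Note that this only pretty-prints the quoted syntax; it does not reduce applications such as s k.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH (pprint, runQ)

-- | Pretty-print a quoted lambda. runQ executes the Q computation in IO,
-- which is sufficient for plain quotations like this one.
showLambda :: IO String
showLambda = pprint <$> runQ [| \x y -> x |]
```

The printed variable names carry numeric suffixes generated by Template Haskell, so the exact text differs from the source.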

Haskell grammar to validate a string in specific format

I would like to define a grammar in Haskell that matches a string in the format "XY12XY" (some alpha followed by some numerics), e.g. variable names in programming languages.
customer123 is a valid variable name, but '123customer' is not a valid variable name.
I am at a loss how to define the grammar and write a validator function that would validate whether a given string is valid variable name. I have been trying to understand and adapt the parser example at: https://wiki.haskell.org/GADT but I just can't get my head around how to tweak it to make it work for my need.
If any kind fellow Haskell gurus would help me define this please:
validate :: ValidFormat -> String -> Bool
validate f [] = False
validate f s = ...
I would like to define the ValidFormat grammar as:
varNameFormat = Concat Alpha $ Concat Alpha Numeric
I'd start with a simple parser and see if that satisfies your needs, unless you can explain why this is not enough for your use case. Parsers are pretty straightforward. I'll give a very simple (and maybe incomplete) example with attoparsec:
import Control.Applicative
import Data.Attoparsec.ByteString.Char8
import qualified Data.ByteString.Char8 as B
validateVar :: B.ByteString -> Bool
validateVar bstr = case parseOnly variableP bstr of
  Right _ -> True
  Left _ -> False
variableP :: Parser String
variableP =
  (++)
    <$> many1 letter_ascii -- must start with one or more letters
    <*> many (digit <|> letter_ascii) -- then can have any combination of letters/digits
    <* endOfInput -- make sure we don't ignore invalid trailing chars
variableP combines parsers via <*> and will require you to handle both results of many1 letter_ascii and many (digit <|> letter_ascii). In this case we just concatenate both results via (++); check the types of many1, many, letter_ascii and digit to see why that works. The <* says "parse this, but discard the result of the right hand parser" (otherwise you'd have to handle 3 results).
That means if you run the parser on "abc123" you'll get back "abc123". If you parse "1abc" the parser will fail.
Check the type of parseOnly:
parseOnly :: Parser a -> ByteString -> Either String a
We pass it our parser and the bytestring it should parse. If the parser fails we'll get Left <something went wrong>. If the parser succeeds, we'll get Right <our string>. The cool thing is... instead of just giving a string on success, we could do pretty much anything with the results in variableP, as in: use something different than (++), convert the types and whatnot (mind that the Parser type might also have to change then).
Since we only care if the parser succeeded in validateVar, we can just ignore the result in either case.
So instead of defining GADTs for your grammar, you just define Parsers.
You might also find this link useful for a tutorial: http://www.seas.upenn.edu/~cis194/fall14/spring13/lectures.html (week 10 and 11, including the assignments where you basically write your own little parser library)
I've taken this from examples of regex-applicative
import Text.Regex.Applicative
import Data.Char
import Data.Maybe
varNameFormat :: RE Char String
varNameFormat = (:) <$> psym isAlpha <*> many (psym isAlphaNum)
validate :: RE Char String -> String -> Bool
validate re str = isJust $ str =~ re
You will have
*Main> validate varNameFormat "a123"
True
*Main> validate varNameFormat "1a23"
False
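To connect this back to the validate :: ValidFormat -> String -> Bool signature from the question, here is a minimal sketch that interprets a small grammar data type as a parser, using only ReadP from base. The constructors and their meaning are assumptions made for illustration: here Alpha stands for a run of one or more letters and Numeric for a run of one or more digits, which differs slightly from the nested Concat in the question.

```haskell
import Data.Char (isAlpha, isDigit)
import Text.ParserCombinators.ReadP (ReadP, eof, munch1, readP_to_S)

-- | Hypothetical grammar description.
data ValidFormat
  = Alpha                          -- one or more letters
  | Numeric                        -- one or more digits
  | Concat ValidFormat ValidFormat -- first format, then the second

-- | Turn a format description into a parser that consumes a match.
toParser :: ValidFormat -> ReadP ()
toParser Alpha        = () <$ munch1 isAlpha
toParser Numeric      = () <$ munch1 isDigit
toParser (Concat a b) = toParser a *> toParser b

-- | A string is valid if the whole input matches the format.
validate :: ValidFormat -> String -> Bool
validate f str = not (null (readP_to_S (toParser f <* eof) str))

varNameFormat :: ValidFormat
varNameFormat = Concat Alpha Numeric
```

With this, validate varNameFormat "customer123" is True, while "123customer" and the empty string are rejected.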

How to iterate through a UTF-8 string correctly in OCaml?

Say I have some input word like "føøbær" and I want a hash table of letter frequencies s.t. f→1, ø→2 – how do I do this in OCaml?
The http://pleac.sourceforge.net/pleac_ocaml/strings.html examples only work on ASCII and https://ocaml-batteries-team.github.io/batteries-included/hdoc2/BatUTF8.html doesn't say how to actually create a BatUTF8.t from a string.
The BatUTF8 module you refer to defines its type t as string, thus there is no conversion needed: a BatUTF8.t is a string. Apparently, the module encourages you to validate your string before using other functions. I guess that a proper way of operating would be something like:
let s = "føøbær"
let () = BatUTF8.validate s
let () = BatUTF8.iter add_to_table s
Looking at the code of Batteries, I found this of_string_unsafe, so perhaps this is the way:
open Batteries
BatUTF8.iter (fun c -> …Hashtbl.add table c …) (BatUTF8.of_string_unsafe "føøbær")
although, since it's termed "unsafe" (the docs don't say why), maybe this is equivalent:
BatUTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"
At least it works for the example word here.
Camomile also seems to iterate through it correctly:
module C = CamomileLibraryDefault.Camomile
C.UTF8.iter (fun c -> …Hashtbl.add table c …) "føøbær"
I don't know of the tradeoffs between Camomile and BatUTF8 here, though they end up storing different types (BatUChar vs C.Pervasives.UChar).

Traverse an Abstract Syntax Tree

I have plunged into an attempt to translate Haskell.
I need to walk the HsModule structure (returned by parseModule source),
to translate every HsIdent String, where String is an English identifier,
into HsIdent String, where String is an identifier in some other natural language (e.g. Italian, French, ...).
I wonder if there is some direct strategy, perhaps in TH, to walk an HsModule structure (i.e. to apply a function to every HsIdent String) without writing explicit unfolding functions for the substructures involved?
I hope I was plain enough in my request; many thanks for your precious aid.
Best regards.
I found a solution in Data.Generics packages.
HsModule is an instance of Data and Typeable, so it can be processed with a traversal function from a generics package. I chose SYB because it is quite well documented.
My solution is:
module Main where
import Data.Generics
import Language.Haskell.Syntax
import Language.Haskell.Parser
import Language.Haskell.Pretty
import Control.Monad
translate :: ParseResult HsModule -> Maybe String
translate r = case r of
  ParseOk a -> Just (show $ prettyPrint $ translateHsIdent "_italian" a)
  ParseFailed _ _ -> Nothing
translateHsIdent :: Data a => String -> a -> a
translateHsIdent k = everywhere (mkT (addStrangerIdentifier k))
  where
    addStrangerIdentifier :: String -> HsName -> HsName
    addStrangerIdentifier s (HsIdent i) = HsIdent (i ++ s)
    addStrangerIdentifier _ n = n -- leave operator names (HsSymbol) untouched
main = maybe (putStrLn "Parse Error") putStrLn result
  where
    result :: Maybe String
    result = translate $ parseModule "main = putStrLn \"Just a Try\""
I hope it can be useful for someone else.
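To illustrate what everywhere and mkT do, here is a small sketch that re-implements both on top of Data.Data from base and applies them to a toy syntax tree. The primed names everywhere' and mkT', and the Name/Expr types, are hypothetical stand-ins for illustration; in real code the SYB versions would be used directly.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE RankNTypes #-}
import Data.Data (Data, Typeable, cast, gmapT)
import Data.Maybe (fromMaybe)

-- | Toy AST standing in for HsModule / HsName.
newtype Name = Name String
  deriving (Show, Eq, Data)

data Expr
  = V Name
  | Ap Expr Expr
  | Lm Name Expr
  deriving (Show, Eq, Data)

-- | Lift a function on one specific type to a function on any type:
-- it acts as f where the types match and as id everywhere else.
mkT' :: (Typeable a, Typeable b) => (b -> b) -> a -> a
mkT' f = fromMaybe id (cast f)

-- | Apply a generic transformation to every node, bottom-up.
everywhere' :: Data a => (forall b. Data b => b -> b) -> a -> a
everywhere' f x = f (gmapT (everywhere' f) x)

-- | Append a suffix to every Name anywhere inside the structure.
translateNames :: Data a => String -> a -> a
translateNames suffix = everywhere' (mkT' rename)
  where
    rename :: Name -> Name
    rename (Name n) = Name (n ++ suffix)
```

The traversal never mentions Expr's constructors, which is why the same function keeps working when the AST gains new constructors.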

How can I improve performance of producing String for output in Haskell

I have a program that I am trying to make faster, mostly for the sake of making it faster to learn more about Haskell. For comparison I have written the same program in C and have a 4x speed improvement. I expected faster from C, but that kind of difference makes me think I have something wrong.
So I have profiled the Haskell code and over 50% of the time is spent producing the formatted String for output. So just this section takes more than my entire C program. The function is similar to this:
display :: POSIXTime -> [(Int, Centi)] -> IO()
display t l = putStrLn $ t_str ++ " " ++ l_str
  where
    t_str = show . timeToTimeOfDay . unsafeCoerce $ (t `mod` posixDayLength)
    l_str = intercalate " " $ map displayPair l
    displayPair (a,b) = show a ++ " " ++ show b
Notes about the code:
The unsafeCoerce converts NominalDiffTime to DiffTime, which have the same representation; this is faster than fromRational . toRational, which I had been using.
Centi is defined in Data.Fixed and is a number with 2 decimal places
TimeOfDay is as you would expect just hours, minutes and seconds (stored with picosecond accuracy).
`mod` posixDayLength is so we just get the time of day ignoring which day it is (because that is all I care about ... it is from a timestamp and I know that it had to be today - I just care what time today!).
I have tried using ShowS (String -> String) to concatenate results and this is not significantly faster.
I have tried using Data.Text but that makes the code slower (presumably spends too much time packing strings).
I used to have the putStrLn in a separate function but it is faster here (fewer thunks built up? but why?).
Is there an easy way to improve output performance in Haskell that I'm missing?
For producing output, the highest performance can be found by avoiding String in favour of either ByteString or Text. To build the output there is a special data type called Builder. There is a good description with examples in the bytestring package documentation on Hackage.
The resulting code looks like this:
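import Data.ByteString.Builder (byteString, hPutBuilder, intDec)
import Data.ByteString.Char8 (pack)
import Data.List (intersperse)
import Data.Monoid ((<>))
import System.IO (stdout)
display :: POSIXTime -> [(Int, Centi)] -> IO ()
display t l = hPutBuilder stdout $ t_str <> space <> l_str
  where
    space = byteString (pack " ")
    t_str = (byteString . pack . show . timeToTimeOfDay . unsafeCoerce) (t `mod` posixDayLength)
    -- join the pairs with single spaces (the Builder analogue of intercalate)
    l_str = mconcat . intersperse space $ map displayPair l
    displayPair (a,b) = intDec a <> space <> (byteString . pack . show) b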
The Builder data type builds up chunks that it then concatenates in O(1) into a buffer for the output. Unfortunately, builders are provided only for the basic types, not for every type. So for outputting the others, the only solution is to pack the string ... or perhaps to write a function to create a builder (and add it to the library?).
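As a sketch of the last suggestion, a Builder for Centi can be written by hand, assuming it is acceptable to rely on the MkFixed constructor that Data.Fixed exports. The names centiDec, pairsLine and render are hypothetical; I have not benchmarked this against the show-based version.

```haskell
import Data.ByteString.Builder (Builder, char7, intDec, integerDec, toLazyByteString)
import qualified Data.ByteString.Lazy.Char8 as BL
import Data.Fixed (Centi, Fixed (MkFixed))
import Data.List (intersperse)

-- | Hypothetical Builder for Centi that avoids going through String.
-- MkFixed n stores the value as an integer number of hundredths.
centiDec :: Centi -> Builder
centiDec (MkFixed n) =
  sign <> integerDec whole <> char7 '.' <> pad <> integerDec frac
  where
    sign          = if n < 0 then char7 '-' else mempty
    (whole, frac) = abs n `quotRem` 100
    pad           = if frac < 10 then char7 '0' else mempty -- keep two decimals

-- | Build "a b a b ..." for a list of pairs, separated by single spaces.
pairsLine :: [(Int, Centi)] -> Builder
pairsLine = mconcat . intersperse (char7 ' ') . map pair
  where
    pair (a, b) = intDec a <> char7 ' ' <> centiDec b

-- | Render a Builder to a String, for inspection only.
render :: Builder -> String
render = BL.unpack . toLazyByteString
```

In the display function, pairsLine l could take the place of l_str, with the result still written via hPutBuilder stdout.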
