Pandoc `writeMarkdown` does not include metadata, i.e. write (read a) is not a - what extension is required? - pandoc

I want to create a utility that transforms some text in a markdown file and produces a new markdown file. The metadata is not changed.
The operations read and write should be inverse (or at least idempotent), but I cannot find a way to have pandoc reproduce the input file including the metadata. What combination of Extensions and Options are required?
Here is my minimal working example with the newest pandoc from lts-15.13.
-- ---------------------------------------------------------------------------
--
-- Module : a test for pandoc output of yaml
-- ---------------------------------------------------------------------------
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE OverloadedStrings #-}
module Main where
import qualified Data.Text as T
import Text.Pandoc
import Control.Monad.IO.Class (liftIO )
main :: IO ()
main = do
    putStrLns $ ["input", showT inputMd]
    res <- sub inputMd
    case res of
        Left st -> putStrLns $ ["error", show st]
        Right q ->
            do
                putStrLns $ ["result q\n", showT $ q]
                putStrLns $ ["should be the same as input p\n", showT inputMd]
                putStrLns $ ["same", show (inputMd == ( q))]
                return ()
    return ()
showT = show . T.unpack
putStrLns = putStrLn . unwords
sub :: T.Text -> IO (Either PandocError T.Text)
sub input1 = do
    res <- runIO $
        do
            let readOptions = def{readerStandalone = True
                                 , readerExtensions = extensionsFromList
                                       [Ext_yaml_metadata_block ]
                                 }
            p <- readMarkdown readOptions ( input1)
            -- def does not analyse metadata
            -- readOptions analysis metadata
            -- output is then only the markdown
            -- t1 :: String <- liftIO $readFile "/home/frank/Workspace8/pandocTest/temp.tpl"
            -- t2 <- compileDefaultTemplate (T.pack t1)
            let writeOptions = def {writerSetextHeaders = False
                                   , writerExtensions = extensionsFromList
                                         [Ext_yaml_metadata_block]
                                   -- , writerTemplate = Just t2
                                   }
            q <- writeMarkdown writeOptions p
            -- def gives only the markdown part
            -- but not including the titel
            liftIO $ putStrLns ["AST \n", show p]
            return q
    res2 <- handleError res
    return res
inputMd = T.unlines ["---"
                    ,"title: The Future of AA"
                    ,"..."
                    ,""
                    ,"## second level"
                    ,"text for nothing"
                    ] :: T.Text
The use of a template compiles but gives a runtime error:
Could not find data file /home/frank/.stack/snapshots/x86_64-linux/668b320207ef95ba5255b2b20895a7f7315ff61076bb3ab82e76f7ef56076320/8.8.3/share/x86_64-linux-ghc-8.8.3/pandoc-2.9.1.1/data/templates/default.
$if(titleblock)$ $titleblock$
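One possible explanation, offered as a sketch rather than a confirmed fix: the Markdown writer only emits the metadata block when `writerTemplate` is set, and `compileDefaultTemplate` expects a format name such as `"markdown"`, not template text. Passing the contents of a template file would make pandoc look for a data file named `default.` followed by that text, which matches the error above. The name `roundTrip` is hypothetical; the options mirror those in the question. The output may still differ from the input in details such as the closing delimiter of the YAML block, so the round trip is not guaranteed to be byte-for-byte identical.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as T
import Text.Pandoc

-- | Read Markdown with a YAML metadata block and write it back out,
-- using pandoc's built-in default template for the "markdown" format
-- so that the writer includes the metadata (assumption: pandoc >= 2.8).
roundTrip :: T.Text -> IO (Either PandocError T.Text)
roundTrip input = runIO $ do
  let exts = extensionsFromList [Ext_yaml_metadata_block]
      readOpts = def { readerStandalone = True, readerExtensions = exts }
  doc <- readMarkdown readOpts input
  -- The argument is the format name, not the template contents.
  tpl <- compileDefaultTemplate "markdown"
  let writeOpts = def { writerTemplate = Just tpl
                      , writerExtensions = exts
                      , writerSetextHeaders = False
                      }
  writeMarkdown writeOpts doc
```

Calling `roundTrip inputMd` in place of `sub inputMd` would then be the way to check whether the title survives in the result.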

Related

OCaml - How to serialize and deserialize Yaml to records

What are the current community preferred libraries to parse and work with YAML and how do you use them to serialize and deserialize a record like this:
type book = {
title: string;
authors: string list
}
This is how I got string -> record and back.
$ opam update
$ opam install yaml ppx_deriving_yaml
Update dune with the preprocess clause:
; `dune` file
(executable
(name main)
(libraries yaml)
(preprocess
(pps ppx_deriving_yaml)))
Short version:
let serialize_book (book_rec : book) : (string, string) result =
let res = Yaml.to_string (book_to_yaml book_rec) in
map_error ~f:(fun (`Msg m) -> m) res
let deserialize_book (book_str : string) : (book, string) result =
let res =
Yaml.of_string book_str >>= fun yaml_value -> book_of_yaml yaml_value
in
map_error ~f:(fun (`Msg m) -> m) res
More verbose/descriptive version:
(* Define a record *)
(* `[@@deriving yaml]` generates a bunch of functions, one being `book_to_yaml` to convert the record into a Yaml type, another `book_of_yaml` to convert Yaml type to record *)
type book = {
  title: string;
  authors: string list
} [@@deriving yaml]
let serialize =
let (v: book) = { title = "Cryptonomicon"; authors = [ "Neal Stephenson" ] } in
(* `book_to_yaml` converts from record to `yaml res` where res is a Result *)
let yaml_structure = book_to_yaml v in
(* `to_string` converts from a `yaml` type'ed data structure to string *)
match Yaml.to_string yaml_structure with
| Ok s ->
print_endline ("Serialize:");
print_endline (s)
| Error (`Msg e) -> print_endline e
let deserialize =
let str = "title: Cryptonomicon\nauthors:\n- Neal Stephenson" in
(* `of_string` converts from string to a `yaml res` data structure, where `res` is Result *)
match Yaml.of_string str with
| Ok yaml_value ->
(* `book_of_yaml` is generated by `[@@deriving yaml]` *)
(* `book_of_yaml` converts from `yaml` type to `book res` where res is Result *)
(match book_of_yaml yaml_value with
| Ok t ->
print_endline ("Deserialize:");
print_endline ("Title: " ^ t.title);
print_endline ("Authors: " ^ String.concat ", " t.authors);
| Error `Msg e -> print_endline ("Error - convert to book: " ^ e))
| Error `Msg e -> print_endline ("Error - parsing: " ^ e)

How to write big file efficiently in Haskell

Hello, I am trying to write a ~1 GB file in a timely manner. Is there a recommended method? Up until now the process has taken on the order of tens of minutes. Am I wrong in using Text? Should I use ByteString? (I have also used String.)
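import Control.Monad (forM_)
import Data.Text (Text)
import qualified Data.Text
import qualified Data.Text.IO
import System.IO (Handle, IOMode (WriteMode), openFile)
pt :: FilePath
pt = "d:\\data2.csv"
cnt :: Int
cnt = 400000000
main :: IO ()
main = do
    let payload = dat
    writeWithHandle pt payload
dat :: Text
dat = Data.Text.pack "0744442339"
writeWithHandle :: FilePath -> Text -> IO ()
writeWithHandle path tx = do
    handle <- openFile path WriteMode
    writeTimes cnt handle tx
-- One hPutStrLn call per line written.
writeTimes :: Int -> Handle -> Text -> IO ()
writeTimes cnt handle payload = forM_ [0..cnt] (\x -> Data.Text.IO.hPutStrLn handle payload)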
I do not understand why it takes so long, on the order of tens of minutes. Initially I was using writeFile, but I thought that would mean continuously opening and closing the file for each row, so I used appendFile, to no avail.
I would recommend using a Builder for this, which is an efficient way to fill up buffers and can be written directly to a Handle.
#!/usr/bin/env stack
-- stack --resolver ghc-8.6.4 script
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder, hPutBuilder)
import Data.Foldable (fold)
import System.IO (IOMode (WriteMode), withBinaryFile)
pt :: FilePath
pt = "data2.csv"
cnt :: Int
cnt = 400000000
main :: IO ()
main = writeWithHandle pt dat
dat :: Builder
dat = "0744442339"
writeWithHandle :: FilePath -> Builder -> IO ()
writeWithHandle path tx =
    withBinaryFile path WriteMode $ \h ->
        hPutBuilder h $ makeBuilder cnt tx
makeBuilder :: Int -> Builder -> Builder
makeBuilder cnt payload = fold $ replicate cnt $ payload <> "\n"
You can keep payload as a Text value instead if you'd like, and convert to a Builder using encodeUtf8Builder.
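A minimal sketch of that conversion, assuming the `text` and `bytestring` packages. The names `linesBuilder` and `writeLines` are hypothetical and not part of either library:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.ByteString.Builder (Builder, charUtf8, hPutBuilder)
import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8Builder)
import System.IO (IOMode (WriteMode), withBinaryFile)

-- | The payload repeated n times, each copy followed by a newline,
-- encoded as UTF-8.
linesBuilder :: Int -> Text -> Builder
linesBuilder n payload =
  mconcat (replicate n (encodeUtf8Builder payload <> charUtf8 '\n'))

-- | Write the repeated payload to a file through a single handle.
writeLines :: FilePath -> Int -> Text -> IO ()
writeLines path n payload =
  withBinaryFile path WriteMode $ \h ->
    hPutBuilder h (linesBuilder n payload)
```

With this, `writeLines pt cnt dat` would accept the original `Text` value of `dat` unchanged.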

Programming pattern or library (i.e. idiomatic way) to handle CLI arguments semantic errors?

I have a Haskell application which uses the optparse-applicative library for CLI argument parsing. My data type for CLI arguments contains FilePaths (both files and directories), Doubles, etc. optparse-applicative can handle parse errors, but I want to ensure that some files and directories exist (or don't exist), that numbers are >= 0, and so on.
What can be done is an implementation of a bunch of helper functions like these ones:
exitIfM :: IO Bool -> Text -> IO ()
exitIfM predicateM errorMessage = whenM predicateM $ putTextLn errorMessage >> exitFailure
exitIfNotM :: IO Bool -> Text -> IO ()
exitIfNotM predicateM errorMessage = unlessM predicateM $ putTextLn errorMessage >> exitFailure
And then I use it like this:
body :: Options -> IO ()
body (Options path1 path2 path3 count) = do
    exitIfNotM (doesFileExist path1) ("File " <> toText path1 <> " does not exist")
    exitIfNotM (doesDirectoryExist path2) ("Directory " <> toText path2 <> " does not exist")
    exitIfM (doesFileExist path3) ("File " <> toText path3 <> " already exists")
    exitIf (count <= 0) "--counter should be positive"
This looks too ad-hoc and ugly to me. Also, I need similar functionality for almost every application I write. Are there some idiomatic ways to deal with this sort of programming pattern when I want to do a bunch of checks before actually doing something with data type? The less boilerplate involved the better it is :)
Instead of validating the options record after it has been constructed, perhaps we could use applicative functor composition to combine argument parsing and validation:
import Control.Monad
import Data.Functor.Compose
import Control.Lens ((<&>)) -- flipped fmap
import Control.Applicative.Lift (runErrors,failure) -- from transformers
import qualified Options.Applicative as O
import System.Directory -- from directory
data Options = Options { path :: FilePath, count :: Int } deriving Show
main :: IO ()
main = do
    let pathOption = Compose (Compose (O.argument O.str (O.metavar "FILE") <&> \file ->
            do exists <- doesPathExist file
               pure $ if exists
                          then pure file
                          else failure ["Could not find file."]))
        countOption = Compose (Compose (O.argument O.auto (O.metavar "INT") <&> \i ->
            do pure $ if i < 10
                          then pure i
                          else failure ["Incorrect number."]))
        Compose (Compose parsy) = Options <$> pathOption <*> countOption
    io <- O.execParser $ O.info parsy mempty
    errs <- io
    case runErrors errs of
        Left msgs -> print msgs
        Right r -> print r
The composed parser has type Compose (Compose Parser IO) (Errors [String]) Options. The IO layer is for performing file existence checks, while Errors is a validation-like Applicative from transformers that accumulates error messages. Running the parser produces an IO action that, when run, produces an Errors [String] Options value.
The code is a bit verbose but those argument parsers could be packed in a library and reused.
Some examples from the REPL:
Λ :main "/tmp" 2
Options {path = "/tmp", count = 2}
Λ :main "/tmpx" 2
["Could not find file."]
Λ :main "/tmpx" 22
["Could not find file.","Incorrect number."]
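The error-accumulating part of this approach can be sketched on its own, without optparse-applicative or IO, using only `Control.Applicative.Lift` from transformers. The helper `check` and the function `validate` are hypothetical names; the `Bool` argument stands in for the result of an existence test such as `doesPathExist`:

```haskell
import Control.Applicative.Lift (Errors, failure, runErrors)

data Options = Options { path :: FilePath, count :: Int }
  deriving (Eq, Show)

-- | Keep the value when the condition holds, otherwise record one message.
check :: Bool -> String -> a -> Errors [String] a
check ok msg x
  | ok = pure x
  | otherwise = failure [msg]

-- | Validate both fields; every failed check contributes its message,
-- in the order the fields are combined.
validate :: Bool -> FilePath -> Int -> Either [String] Options
validate exists file n =
  runErrors $
    Options
      <$> check exists "Could not find file." file
      <*> check (n < 10) "Incorrect number." n
```

Because `Errors` is only an `Applicative`, a failure in the first check does not prevent the second from being evaluated, which is why both messages can be reported together.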

How do I execute a series of shell commands in Haskell and break on an error?

Suppose I have a list of Strings representing shell commands to execute.
commands = ["git clone https://github.com/test/repo.git", "git checkout origin"]
Also, suppose I have a command, execCommand that takes a string, executes it as a shell command, retrieves the exit code, stdout and stderr, and, if the exit code is nonzero, returns Just the concatenation of stdout and stderr; otherwise, it returns Nothing.
Now, how would I execute that list of commands sequentially while ensuring that subsequent commands do not execute after one command yields an error?
Below is the full code for execCommand.
import System.IO
import System.Process
import System.Exit
createCommand :: String -> FilePath -> CreateProcess
createCommand command curDir =
    (shell command){std_out = CreatePipe, std_err = CreatePipe, cwd = Just curDir}
execCommand :: String -> FilePath -> IO (Maybe String)
execCommand command curDir = do
    (_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command curDir
    exitCode <- waitForProcess procHandle
    stdOut <- hGetContents hout
    stdErr <- hGetContents herr
    if exitCode /= ExitSuccess
        then return $ Just $ concat [stdOut, stdErr]
        else return $ Nothing
Well, this might solve your problem:
{-# LANGUAGE FlexibleInstances #-}
import System.IO
import System.Process
import System.Exit
import Control.Exception
import Control.Monad
import Control.Monad.IO.Class
import Control.Monad.Trans.Maybe
createCommand :: CmdSpec -> FilePath -> CreateProcess
createCommand (ShellCommand command) curDir =
    (shell command){std_out = CreatePipe, std_err = CreatePipe, cwd = Just curDir}
createCommand (RawCommand command arguments) curDir =
    (proc command arguments){std_out = CreatePipe, std_err = CreatePipe, cwd = Just curDir}
execCommand :: CmdSpec -> FilePath -> IO ()
execCommand command curDir = do
    (_, Just hout, Just herr, procHandle) <- createProcess $ createCommand command curDir
    exitCode <- waitForProcess procHandle
    when (exitCode /= ExitSuccess) $ do
        stdOut <- hGetContents hout
        stdErr <- hGetContents herr
        throwIO $ stdOut ++ stdErr
instance Exception String
execList :: [(CmdSpec, FilePath)] -> MaybeT IO String
execList xs = do
    out <- liftIO $ try $ mapM_ (uncurry execCommand) xs
    case out of
        Left c -> return c
        Right _ -> mzero
Notice that this uses FlexibleInstances. This was required for making String an instance of the Exception typeclass (the problem lies in the fact that String = [Char]). You could remove the extension by creating a new type which encloses a string and making it an instance of Exception.
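The newtype variant mentioned above could look roughly like this. It is a sketch using only `Control.Exception` from base; `CommandFailed`, `failWith` and `firstFailure` are hypothetical names, and the actions are plain `IO ()` values rather than real shell commands:

```haskell
import Control.Exception (Exception, throwIO, try)

-- | Wraps the error text so that no instance for String is needed.
newtype CommandFailed = CommandFailed String
  deriving (Eq, Show)

instance Exception CommandFailed

-- | Abort the current action with the given message.
failWith :: String -> IO a
failWith = throwIO . CommandFailed

-- | Run the actions in order; the first CommandFailed stops the run
-- and its message is returned. Nothing means every action succeeded.
firstFailure :: [IO ()] -> IO (Maybe String)
firstFailure actions = do
  r <- try (sequence_ actions)
  pure $ case r of
    Left (CommandFailed msg) -> Just msg
    Right () -> Nothing
```

No language extension is required here, because the instance is declared for a type constructor of your own rather than for `[Char]`.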
Well, I am pretty sure I figured it out. Instead of using "fancy" stuff, I fell back to good ol' recursion.
runCommands :: [String] -> FilePath -> IO (Maybe String)
runCommands [] _ = return Nothing
runCommands (command:rest) curDir = do
    result <- execCommand command curDir
    case result of
        Nothing -> runCommands rest curDir
        Just err -> return $ Just err
You can use mapM_, which has the type
mapM_ :: ( (CmdSpec, FilePath) -> ExceptT String IO ())
-> [(CmdSpec, FilePath)] -> ExceptT String IO ()
and appropriate short-circuiting behavior.
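A sketch of how that might be wired up, assuming the `process` and `transformers` packages. `runStep` and `runAll` are hypothetical names, and commands are plain shell strings here rather than `CmdSpec` values. The sketch uses `readCreateProcessWithExitCode`, which collects stdout and stderr while the process runs, instead of waiting for the process before reading the pipes:

```haskell
import Control.Monad.Trans.Except (ExceptT (..), runExceptT)
import System.Exit (ExitCode (..))
import System.Process (CreateProcess (..), readCreateProcessWithExitCode, shell)

-- | Run one shell command in the given directory; a nonzero exit code
-- becomes a Left carrying stdout followed by stderr.
runStep :: FilePath -> String -> ExceptT String IO ()
runStep dir cmd = ExceptT $ do
  (code, out, err) <-
    readCreateProcessWithExitCode ((shell cmd) { cwd = Just dir }) ""
  pure $ case code of
    ExitSuccess -> Right ()
    ExitFailure _ -> Left (out ++ err)

-- | Run the commands in order, stopping at the first failure.
runAll :: FilePath -> [String] -> IO (Either String ())
runAll dir = runExceptT . mapM_ (runStep dir)
```

The short-circuiting comes from the `Monad` instance of `ExceptT`: once one step yields `Left`, the remaining steps in `mapM_` are skipped.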

Streaming recursive descent of a directory in Haskell

I am trying to do a recursive descent of a directory structure using Haskell. I would like to only retrieve the child directories and files as needed (lazily).
I wrote the following code, but when I run it, the trace shows that all directories are visited before the first file:
module Main where
import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )
-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
    names <- getDirectoryContents topPath
    let
        properNames =
            filter (`notElem` [".", ".."]) $
            trace ("Processing " ++ topPath) names
    paths <- forM properNames $ \name -> do
        let path = topPath </> name
        isDirectory <- doesDirectoryExist path
        if isDirectory
            then getRecursiveContents path
            else return [path]
    return (concat paths)
main :: IO ()
main = do
    [path] <- getArgs
    files <- getRecursiveContents path
    forM_ files $ \file -> putStrLn $ "Found file " ++ file
How can I interleave the file processing with the descent? Is the problem that the files <- getRecursiveContents path action gets performed before the following forM_ in main?
This is exactly the kind of problem that iteratees/coroutines were designed to solve.
You can easily do this with pipes. The only change I made to your getRecursiveContents was to make it a Producer of FilePaths and to respond with the file name instead of returning it. This lets downstream handle the file name immediately instead of waiting for getRecursiveContents to complete.
module Main where
import Control.Monad ( forM_, liftM )
import Control.Proxy
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )
getRecursiveContents :: (Proxy p) => FilePath -> () -> Producer p FilePath IO ()
getRecursiveContents topPath () = runIdentityP $ do
    names <- lift $ getDirectoryContents topPath
    let properNames = filter (`notElem` [".", ".."]) names
    forM_ properNames $ \name -> do
        let path = topPath </> name
        isDirectory <- lift $ doesDirectoryExist path
        if isDirectory
            then getRecursiveContents path ()
            else respond path
main :: IO ()
main = do
    [path] <- getArgs
    runProxy $
        getRecursiveContents path
        >-> useD (\file -> putStrLn $ "Found file " ++ file)
This prints out each file immediately as it traverses the tree, and it does not require lazy IO. It's also very easy to change what you do with the file names, since all you have to do is switch out the useD stage with your actual file handling logic.
To learn more about pipes, I highly recommend you read Control.Proxy.Tutorial.
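The underlying idea, handing each path to the consumer as soon as it is found, can also be sketched without a streaming library by passing the consumer in as a function. This is not the pipes API, only a dependency-light illustration using `directory` and `filepath`; `foldFiles` and `printFiles` are hypothetical names. Note that `doesDirectoryExist` follows symbolic links, so a link cycle would make this loop.

```haskell
import Control.Monad (foldM)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))

-- | Depth-first walk that passes each file to the step function
-- as soon as it is encountered, threading an accumulator through.
foldFiles :: (acc -> FilePath -> IO acc) -> acc -> FilePath -> IO acc
foldFiles step acc0 top = do
  names <- listDirectory top
  foldM visit acc0 names
  where
    visit acc name = do
      let path = top </> name
      isDir <- doesDirectoryExist path
      if isDir
        then foldFiles step acc path
        else step acc path

-- | Print every file under the given directory as it is found.
printFiles :: FilePath -> IO ()
printFiles = foldFiles (\() file -> putStrLn ("Found file " ++ file)) ()
```

Unlike a `Producer`, this gives the consumer no way to stop the traversal early, which is one reason a streaming library is the more flexible choice.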
Using lazy IO / unsafe... is not a good way to go. Lazy IO causes many problems, including unclosed resources and executing impure actions within pure code. (See also The problem with lazy I/O on Haskell Wiki.)
A safe way is to use some iteratee/enumerator library. (Replacing problematic lazy IO was the motivation for developing these concepts.) Your getRecursiveContents would become a source of data (AKA enumerator). And the data will be consumed by some iterator. (See also Enumerator and iteratee on Haskell wiki.)
There is a tutorial on the enumerator library that just gives an example of traversing and filtering directory tree, implementing a simple find utility. It implements method
enumDir :: FilePath -> Enumerator FilePath IO b
which is basically just what you need. I believe you will find it interesting.
Also there is a nice article explaining iteratees in The Monad Reader, Issue 16: Iteratee: Teaching an Old Fold New Tricks by John W. Lato, the author of the iteratee library.
Today many people prefer newer libraries such as pipes. You may be interested in a comparison: What are the pros and cons of Enumerators vs. Conduits vs. Pipes?.
Thanks to the comment by Niklas B., here is the solution that I have:
module Main where
import Control.Monad ( forM, forM_, liftM )
import Debug.Trace ( trace )
import System.Directory ( doesDirectoryExist, getDirectoryContents )
import System.Environment ( getArgs )
import System.FilePath ( (</>) )
import System.IO.Unsafe ( unsafeInterleaveIO )
-- From Real World Haskell, p. 214
getRecursiveContents :: FilePath -> IO [FilePath]
getRecursiveContents topPath = do
    names <- unsafeInterleaveIO $ getDirectoryContents topPath
    let
        properNames =
            filter (`notElem` [".", ".."]) $
            trace ("Processing " ++ topPath) names
    paths <- forM properNames $ \name -> do
        let path = topPath </> name
        isDirectory <- doesDirectoryExist path
        if isDirectory
            then unsafeInterleaveIO $ getRecursiveContents path
            else return [path]
    return (concat paths)
main :: IO ()
main = do
    [path] <- getArgs
    files <- unsafeInterleaveIO $ getRecursiveContents path
    forM_ files $ \file -> putStrLn $ "Found file " ++ file
Is there a better way?
I was recently looking at a very similar problem, where I'm trying to do a somewhat complicated search using the IO monad, stopping after I find the file I'm interested in. While the solutions using libraries like Enumerator, Conduit, etc. seem to be the best you could do at the time those answers were posted, I just learned IO became an instance of Alternative in GHC's base library about a year ago, which opens up some new possibilities. Here's the code I wrote to try it out:
import Control.Applicative (empty)
import Data.Foldable (asum)
import Data.List (isSuffixOf)
import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath ((</>))
searchFiles :: (FilePath -> IO a) -> FilePath -> IO a
searchFiles f fp = do
    isDir <- doesDirectoryExist fp
    if isDir
        then do
            entries <- listDirectory fp
            asum $ map (searchFiles f . (fp </>)) entries
        else f fp
matchFile :: String -> FilePath -> IO ()
matchFile name fp
    | name `isSuffixOf` fp = putStrLn $ "Found " ++ fp
    | otherwise = empty
The searchFiles function does a depth-first search of a directory tree, stopping when it finds what you're looking for, as determined by the function passed as the first argument. The matchFile function is just there to show how to construct a suitable function to use as the first argument for searchFiles; in real life you'd probably do something more complicated.
The interesting thing here is that now you can use empty to make an IO computation "give up" without returning a result, and you can chain computations together with asum (which is just foldr (<|>) empty) to keep trying computations until one of them succeeds.
I find it a little unnerving that the type signature of an IO action no longer reflects the fact that it may deliberately not produce a result, but it sure simplifies the code. I was previously trying to use types like IO (Maybe a), but doing so made it very hard to compose actions.
IMHO there's no longer much reason to use a type like IO (Maybe a), but if you need to interface with code that uses a type like that, it's easy to convert between the two types. To convert IO a to IO (Maybe a), you can just use Control.Applicative.optional, and going the other way, you can use something like this:
maybeEmpty :: IO (Maybe a) -> IO a
maybeEmpty m = m >>= maybe empty pure
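A short sketch of both conversions working together, using only base. `firstJust` and `swallowed` are hypothetical names. One caveat worth keeping in mind: in base, `empty` for `IO` is implemented by throwing an `IOError`, and `<|>` recovers from any `IOException`, so a genuine I/O failure in one branch also falls through to the next branch, as `swallowed` illustrates.

```haskell
import Control.Applicative (empty, optional, (<|>))
import Data.Foldable (asum)

maybeEmpty :: IO (Maybe a) -> IO a
maybeEmpty m = m >>= maybe empty pure

-- | Try the actions in order and return the first Just result;
-- actions after the first success are not run.
firstJust :: [IO (Maybe a)] -> IO (Maybe a)
firstJust = optional . asum . map maybeEmpty

-- | A real I/O error is treated the same way as 'empty':
-- the alternative on the right is used instead.
swallowed :: IO String
swallowed = ioError (userError "disk error") <|> pure "fallback"
```

Whether swallowing unrelated I/O errors is acceptable depends on the application; `IO (Maybe a)` keeps the two cases apart.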
