pandoc-citeproc as API: processCites' does not add references - pandoc-citeproc

I have a small text file in markdown :
---
title: postWithReference
author: auf
date: 2010-07-29
keywords: homepage
abstract: |
What are the objects of
ontologists .
bibliography: "/home/frank/Workspace8/SSG/site/resources/BibTexLatex.bib"
csl: "/home/frank/Workspace8/SSG/site/resources/chicago-fullnote-bibliography-bb.csl"
---
An example post. With a reference to [#Frank2010a] and more[#navratil08].
## References
and process it in Haskell with processCites' which has a single argument, namely the Pandoc data resulting from readMarkdown. The bibliography and the csl style should be taken from the input file.
The process does not produce errors, but the result of processCites is the same text as the input; references are not treated at all. For the same input the references are resolved with the standalone pandoc (this excludes errors in the bibliography and the csl style)
pandoc -f markdown -t html --filter=pandoc-citeproc -o p1.html postWithReference.md
The issue is therefore in the API. The code I have is:
markdownToHTML4 :: Text -> PandocIO Value
markdownToHTML4 t = do
pandoc <- readMarkdown markdownOptions t
let meta2 = flattenMeta (getMeta pandoc)
-- test if biblio is present and apply
let bib = Just $ ( meta2) ^? key "bibliography" . _String
pandoc2 <- case bib of
Nothing -> return pandoc
_ -> do
res <- liftIO $ processCites' pandoc -- :: Pandoc -> IO Pandoc
when (res == pandoc) $
liftIO $ putStrLn "*** markdownToHTML3 result without references ***"
return res
htmltex <- writeHtml5String html5Options pandoc2
let withContent = ( meta2) & _Object . at "contentHtml" ?~ String ( htmltex)
return withContent
getMeta :: Pandoc -> Meta
getMeta (Pandoc m _) = m
What do I misunderstand? are there any reader options necessary for citeproc? The bibliography is a BibLatex file.
I found in hakyll code a comment, which I cannot understand in light of the code there - perhaps somebody knows what the intention is.
-- We need to know the citation keys, add then *before* actually parsing the
-- actual page. If we don't do this, pandoc won't even consider them
-- citations!

I have a workaround (not an answer to the original question, I still hope that somebody can identify my error!). It is simple to call the standalone pandoc with System.readProess and pass the text and get the result back, not even reading and writing files:
processCites2x :: Maybe FilePath -> Maybe FilePath -> Text -> ErrIO Text
-- porcess the cites in the text (not with the API)
-- using systemcall because the standalone pandoc works with
-- call: pandoc -f markdown -t html --filter=pandoc-citeproc
-- with the input text on stdin and the result on stdout
-- the csl and bib file are used from text, not from what is in the arguments
processCites2x _ _ t = do
putIOwords ["processCite2" ] -- - filein\n", showT styleFn2, "\n", showT bibfn2]
let cmd = "pandoc"
let cmdargs = ["--from=markdown", "--to=html5", "--filter=pandoc-citeproc" ]
let cmdinp = t2s t
res :: String <- callIO $ System.readProcess cmd cmdargs cmdinp
return . s2t $ res
-- error are properly caught and reported in ErrIO
t2s and s2t are conversion utilities between string and text, ErrIO is ErrorT Text a IO and callIO is essentially liftIO with handling of errors.

The original problem was very simple: I had not included the option Ext_citations in the markdownOptions. When it is included, the example works (thanks to help I received from the pandoc-citeproc issue page). The referenced code is updated...

Related

Keep custom code block attributes in pandoc when converting to Markdown

I am converting an org file to Markdown (specifically commonmark). I am adding a custom attribute to my code blocks, which the commonmark writer does not support, and strips them from the code block during conversion. I am trying to find a way to keep my custom attributes.
This is what I have:
#+begin_src python :hl_lines "2"
def some_function():
print("foo bar")
return
#+end_src
This is what I want in my .md file:
``` python hl_lines="2"
def some_function():
print("foo bar")
return
```
After doing some research, I think a filter can solve my issue: I am now playing with panflute, a python lib for writing pandoc filters.
I found some relevant questions, but they apply to other conversions (rST -> html, rst -> latex) and I don't know enough Lua to translate the code into Python and the org -> md conversion.
Thanks for any help.
I was able to write a script, posting it here for future Python-based questions about pandoc filters.
The filter below requires panflute, but there are other libs for pandoc filters in Python.
import panflute
def keep_attributes_markdown(elem, doc, format="commonmark"):
"""Keep custom attributes specified in code block headers when exporting to Markdown"""
if type(elem) == panflute.CodeBlock:
language = "." + elem.classes[0]
attributes = ""
attributes = " ".join(
[key + "=" + value for key, value in elem.attributes.items()]
)
header = "``` { " + " ".join([language, attributes]).strip() + " }"
panflute.debug(header)
code = elem.text.strip()
footer = "```"
content = [
panflute.RawBlock(header, format=format),
panflute.RawBlock(code, format=format),
panflute.RawBlock(footer, format=format),
]
return content
def main(doc=None):
return panflute.run_filter(keep_attributes_markdown, doc=doc)
if __name__ == "__main__":
main()
You can now run the following command:
pandoc --from=org --to=commonmark --filter=/full/path/to/keep_attributes_markdown.py --output=target_file.md your_file.org

How to debug with PureScript?

Issue
Following is a minimal, contrived example:
read :: FilePath -> Aff String
read f = do
log ("File: " <> f) -- (1)
readTextFile UTF8 f -- (2)
I would like to do some debug logging in (1), before a potential error on (2) occurs. Executing following code in Spago REPL works for success cases so far:
$ spago repl
> launchAff_ $ read "test/data/tree/root.txt"
File: test/data/tree/root.txt
unit
Problem: If there is an error with (2) - file is directory here - , (1) seems to be not executed at all:
$ spago repl
> launchAff_ $ read "test/data/tree"
~/purescript-book/exercises/chapter9/.psci_modules/node_modules/Effect.Aff/foreign.js:532
throw util.fromLeft(step);
^
[Error: EISDIR: illegal operation on a directory, read] {
errno: -21,
code: 'EISDIR',
syscall: 'read'
}
The original problem is more complex including several layers of recursions (see E-Book exercise 3), where I need logging to debug above error.
Questions
How can I properly log regardless upcoming errors here?
(Optional) Is there a more sophisticated, well-established debugging alternative - purescript-debugger? A decicated VS Code debug extension/functionality would be the cherry on the cake.
First of all, the symptoms you observe do not mean that the first line doesn't execute. It does always execute, you're just not seeing output from it due to how console works in the PureScript REPL. The output gets swallowed. Not the only problem with REPL, sadly.
You can verify that the first line is always executed by replacing log with throwError and observing that the error always gets thrown. Or, alternatively, you can make the first line modify a mutable cell instead of writing to the console, and then examine the cell's contents.
Finally, this only happens in REPL. If you put that launchAff_ call inside main and run the program, you will always get the console output.
Now to the actual question at hand: how to debug trace.
Logging to console is fine if you can afford it, but there is a more elegant way: Debug.trace.
This function has a hidden effect - i.e. its type says it's pure, but it really produces an effect when called. This little lie lets you use trace in a pure setting and thus debug pure code. No need for Effect! This is ok as long as used for debugging only, but don't put it in production code.
The way it works is that it takes two parameters: the first one gets printed to console and the second one is a function to be called after printing, and the result of the whole thing is whatever that function returns. For example:
calculateSomething :: Int -> Int -> Int
calculateSomething x y =
trace ("x = " <> show x) \_ ->
x + y
main :: Effect Unit
main =
log $ show $ calculateSomething 37 5
> npx spago run
'x = 37'
42
The first parameter can be anything at all, not just a string. This lets you easily print a lot of stuff:
calculateSomething :: Int -> Int -> Int
calculateSomething x y =
trace { x, y } \_ ->
x + y
> npx spago run
{ x: 37, y: 5 }
42
Or, applying this to your code:
read :: FilePath -> Aff String
read f = trace ("File: " <> f) \_ -> do
readTextFile UTF8 f
But here's a subtle detail: this tracing happens as soon as you call read, even if the resulting Aff will never be actually executed. If you need tracing to happen on effectful execution, you'll need to make the trace call part of the action, and be careful not to make it the very first action in the sequence:
read :: FilePath -> Aff String
read f = do
pure unit
trace ("File: " <> f) \_ -> pure unit
readTextFile UTF8 f
It is, of course, a bit inconvenient to do this every time you need to trace in an effectful context, so there is a special function that does it for you - it's called traceM:
read :: FilePath -> Aff String
read f = do
traceM ("File: " <> f)
readTextFile UTF8 f
If you look at its source code, you'll see that it does exactly what I did in the example above.
The sad part is that trace won't help you in REPL when an exception happens, because it's still printing to console, so it'll still get swallowed for the same reasons.
But even when it doesn't get swallowed, the output is a bit garbled, because trace actually outputs in color (to help you make it out among other output), and PureScript REPL has a complicated relationship with color:
> calculateSomething 37 5
←[32m'x = 37'←[39m
42
In addition to Fyodor Soikin's great answer, I found a variant using VS Code debug view.
1.) Make sure to build with sourcemaps:
spago build --purs-args "-g sourcemaps"
2.) Add debug configuration to VS Code launch.json:
{
"version": "0.2.0",
"configurations": [
{
"type": "pwa-node",
"request": "launch",
"name": "Launch Program",
"skipFiles": ["<node_internals>/**"],
"runtimeArgs": ["-e", "require('./output/Main/index.js').main()"],
"smartStep": true // skips files without (valid) source map
}
]
}
Replace "./output/Main/index.js" / .main() with the compiled .js file / function to be debugged.
3.) Set break points and step through the .purs file via sourcemap support.

Trying to get all paths in a YAML file

I've got an input YAML file (test.yml) as follows:
# sample set of lines
foo:
x: 12
y: hello world
ip_range['initial']: 1.2.3.4
ip_range[]: tba
array['first']: Cluster1
array2[]: bar
The source contains square brackets for some keys (possibly empty).
I'm trying to get a line by line list of all the paths in the file, ideally like:
foo.x: 12
foo.y: hello world
foo.ip_range['initial']: 1.2.3.4
foo.ip_range[]: tba
foo.array['first']: Cluster1
array2[]: bar
I've used the yamlpaths library and the yaml-paths CLI, but can't get the desired output. Trying this:
yaml-paths -m -s =foo -K test.yml
outputs:
foo.x
foo.y
foo.ip_range\[\'initial\'\]
foo.ip_range\[\]
foo.array\[\'first\'\]
Each path is on one line, but the output has all the escape characters ( \ ). Modifying the call to remove the -m option ("expand matching parent nodes") fixes that problem but the output is then not one path per line:
yaml-paths -s =foo -K test.yml
gives:
foo: {"x": 12, "y": "hello world", "ip_range['initial']": "1.2.3.4", "ip_range[]": "tba", "array['first']": "Cluster1"}
Any ideas how I can get the one line per path entry but without the escape chars? I was wondering if there is anything for path querying in the ruamel modules?
Your "paths" are nothing more than the joined string representation of the keys (and probably indices) of the
mappings (and potentially sequences) in your YAML document.
That can be trivially generated from data loaded from YAML with a recursive function:
import sys
import ruamel.yaml
yaml_str = """\
# sample set of lines
foo:
x: 12
y: hello world
ip_range['initial']: 1.2.3.4
ip_range[]: tba
array['first']: Cluster1
array2[]: bar
"""
def pathify(d, p=None, paths=None, joinchar='.'):
if p is None:
paths = {}
pathify(d, "", paths, joinchar=joinchar)
return paths
pn = p
if p != "":
pn += '.'
if isinstance(d, dict):
for k in d:
v = d[k]
pathify(v, pn + k, paths, joinchar=joinchar)
elif isinstance(d, list):
for idx, e in enumerate(d):
pathify(e, pn + str(idx), paths, joinchar=joinchar)
else:
paths[p] = d
yaml = ruamel.yaml.YAML(typ='safe')
paths = pathify(yaml.load(yaml_str))
for p, v in paths.items():
print(f'{p} -> {v}')
which gives:
foo.x -> 12
foo.y -> hello world
foo.ip_range['initial'] -> 1.2.3.4
foo.ip_range[] -> tba
foo.array['first'] -> Cluster1
array2[] -> bar
While Anthon's answer certainly produces the output you were after, I think your question was specifically about how to get the yaml-paths command to produce the desired output. I'll address that original question.
As of version 3.5.0, the yamlpath project's yaml-paths command supports a --noescape option which removes the escape symbols from output. Using your input file and the new option, you may find this output more to your liking:
$ yaml-paths --nofile --expand --keynames --noescape --values --search='=~/.*/' test.yml
foo.x: 12
foo.y: hello world
foo.ip_range['initial']: 1.2.3.4
foo.ip_range[]: tba
foo.array['first']: Cluster1
array2[]: bar
Note:
Using the --values option includes the value with each YAML Path.
For interest, I changed the --search expression to match every node in the input file rather than only the "foo" data.
The default output (without setting --noescape) produces YAML Paths which can be used as direct input into other YAML Path parsers and processors; setting --noescape changes this to render human-friendly paths which may not work as downstream YAML Path input.
Disclaimer: I am the author of the yamlpath project. Should you ever run into issues or have questions about it, please visit the project's GitHub project site and engage me via Issues (bugs and feature requests) or Discussions (questions). Thank you!

Plot code does not work in R Studio but does in R

i am trying to move plot results in to rmarkdown in R studio
the following code fails
```{r front_stuff ,echo=FALSE,fig.height=3,fig.width=4}
library(ggplot2)
library(cowplot)
library(lubridate)
library(reshape2)
library(htmlTable)
library(data.table)
library(png)
project_folder<-"C:\\Users\\jciconsult\\SkyDrive\\trial_retail\\"
load(paste0(project_folder,"sa_prov_html.RSave"))
load(paste0(project_folder,"Ontario_plot_save.RSave"))
ls()
```
`r ggdraw(cow_plot1)`
Error message is
Quitting from lines 29-29 (test1.Rmd)
Error in vapply(x, format_sci_one, character(1L), ..., USE.NAMES = FALSE) :
values must be length 1,
but FUN(X[[1]]) result is length 2
Calls: ... paste -> hook -> .inline.hook -> format_sci -> vapply
Execution halted
If I take the same code and copy it into a clear R session (eliminating the stuf for code blocks), everything works.
What I am trying to do is get a document that can convert to word. I am using the knit HTML option because that is needed to get my htmlTable output to work.
I want something that I can cut and paste into word for final formatting,
The plot cannot be drawn because it is inline code. Try using a code chunk instead:
```{r}
ggdraw(cow_plot1)
```
Also, the proper way to set a working directory with knitr (which seems to be what you want to achieve) is with the knitr option root.dir:
library(knitr)
opts_knit$set(root.dir = project_folder)
load("sa_prov_html.RSave")
load("Ontario_plot_save.RSave")

Convert markdown italics and boldface to latex

I want to be able to convert markdown italics and boldface to latex versions on the fly (i.e., give a text string(s) return a text string(s)). I thought easy. Wrong! (Which it still may be). See the sill buisness and error I tried at the bottom.
What I have (note the starting asterisk that's been escaped as in markdown):
x <- "\\*note: I *like* chocolate **milk** too ***much***!"
What I would like:
"*note: I \\emph{like} chocolate \\textbf{milk} too \\textbf{\\emph{much}}!"
I'm not attached to regex but would prefer a base solution (though not essential).
Silly business:
helper <- function(ins, outs, x) {
gsub(paste0(ins[1], ".+?", ins[2]), paste0(outs[1], ".+?", outs[2]), x)
}
helper(rep("***", 2), c("\\textbf{\\emph{", "}}"), x)
Error in gsub(paste0(ins[1], ".+?", ins[2]), paste0(outs[1], ".+?", outs[2]), :
invalid regular expression '***.+?***', reason 'Invalid use of repetition operators'
I have this toy that Ananda Mahto helped me make if it's helpful. You could access it from reports via wheresPandoc <- reports:::wheresPandoc
EDIT Per Ben's comments I tried:
action <- paste0(" echo ", x, " | ", wheresPandoc(), " -t latex ")
system(action)
*note: I *like* chocolate **milk** too ***much***! | C:\PROGRA~2\Pandoc\bin\pandoc.exe -t latex
EDIT2 Per Dason's comments I tried:
out <- paste("echo", shQuote(x), "|", wheresPandoc(), " -t latex"); system(out)
system(out, intern = T)
> system(out, intern = T)
\*note: I *like* chocolate **milk** too ***much***! | C:\PROGRA~2\Pandoc\bin\pandoc.exe -t latex
The lack of pipes on Windows made this tricky, but you can get around it using input to provide the stdin:
> x = system("pandoc -t latex", intern=TRUE, input="\\*note: I *like* chocolate **milk** too ***much***!")
> x
[1] "*note: I \\emph{like} chocolate \\textbf{milk} too \\textbf{\\emph{much}}!"
Noting I am working on windows and from ?system
This means that redirection, pipes, DOS internal commands, ... cannot be used
and the note from ?system2
Note
system2 is a more portable and flexible interface than system,
introduced in R 2.12.0. It allows redirection of output without
needing to invoke a shell on Windows, a portable way to set
environment variables for the execution of command, and finer control
over the redirection of stdout and stderr. Conversely, system (and
shell on Windows) allows the invocation of arbitrary command lines.
Using system2
system2('pandoc', '-t latex', input = '**em**', stdout = TRUE)

Resources