Why can't I include images in PDF output using Pandoc?

I can successfully produce the images when the output is HTML, but Pandoc errors out when attempting PDF output.
The input file text for the image:
![](images\icon.png "test")
The error produced:
pandoc: Error producing PDF from TeX source.
! Undefined control sequence.
images\icon
l.535 \includegraphics{images\icon.png}

Note that pandoc produces the PDF via LaTeX, as the error message reveals. Your input
![](images\icon.png "test")
is converted into LaTeX
\includegraphics{images\icon.png}
\ in LaTeX has a special meaning: it begins a control sequence. So LaTeX is looking for an \icon command here and not finding it. The fix is to use a forward slash / instead of a backslash \ as the path separator. LaTeX allows you to use / for paths even on Windows.
Of course, this may cause problems in some other output formats. I suppose I should change pandoc to convert backslashes in paths to forward slashes when writing LaTeX.
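For example, the image reference above would become:
![](images/icon.png "test")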

I've had a similar problem on Windows. My images are stored in a subdirectory named "figures". No matter what I tried, the path wasn't followed. I solved it by including --resource-path=.;figures in the call to pandoc.
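A minimal sketch of such a call (input.md and output.pdf are placeholders; the ; is the Windows path-list separator, so the option may need quoting depending on your shell):
pandoc input.md -o output.pdf --resource-path=.;figures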

Related

Add part of filename as PDF metadata using bash script and exiftool

I have about 600 books in PDF format where the filename is in the format:
AuthorForename AuthorSurname - Title (Date).pdf
For example:
Foo Z. Bar - Writing Scripts for Idiots (2017)
Bar Foo - Fun with PDFs (2016)
The metadata is unfortunately missing for pretty much all of them, so when I import them into Calibre the Author field is blank.
I'm trying to write a script that takes everything that appears before the '-', removes the trailing space, and then adds it as the author in the PDF metadata using exiftool.
So far I have the following:
for i in "*.pdf";
do exiftool -author=$(echo $i | sed 's/-.*//' | sed 's/[ \t]*$//') "$i";
done
When trying to run it, however, the following is returned:
Error: File not found - Z.
Error: File not found - Bar
Error: File not found - *.pdf
0 image files updated
3 files weren't updated due to errors
What about the -author= part is breaking here? Could someone please enlighten me?
You don't need to script this. In fact, doing so will be much slower than letting exiftool do it by itself, as you would require exiftool to start up once for every file.
Try this
exiftool -ext pdf '-author<${filename;s/\s+-.*//}' /path/to/target/directory
Breakdown:
-ext pdf process only PDF files
-author the tag to copy to
< The copy from another tag option. In this case, the filename will be treated as a pseudo-tag
${filename;s/\s+-.*//} Copying from the filename, but first performing a regex on it. In this case, looking for 1 or more spaces, a dash, and the rest of the name and removing it.
Add -r if you want to recurse into subdirectories. Add -overwrite_original to avoid making backup files with _original added to the filename.
Two things were breaking your original command: the glob was quoted ("*.pdf"), so it was never expanded into actual filenames, and the value you wanted to assign contained spaces but the command substitution was unquoted, so it was split into separate arguments; both needed proper quoting.
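If you do want the loop form anyway, a minimal sketch with the quoting fixed (still slower than the single exiftool call above, since exiftool starts once per file) might look like:
for i in *.pdf; do
  # strip the first dash, everything after it, and the space before it to get the author
  exiftool -author="$(echo "$i" | sed 's/[[:space:]]*-.*//')" "$i"
done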

Append to PDF file with convert in bash

I'm basically downloading some images from a website using wget and then appending them into a PDF file using the command line program "convert". But this last part seems not to work.
I'm getting all the .jpg images and storing them in one folder with no problems, but when I try to merge them into the PDF file, it always ends up containing only the last appended image. I've read about convert's -append argument, but it still won't work.
This is what my code looks like:
for file in *.jpg
do
convert "${file}" -append "myfile.pdf"
done
But as logical as it seems, myfile.pdf always ends up containing only the last appended jpg image.
I know that using convert like:
convert img1.jpg img2.jpg img3.jpg myfile.pdf
Would do the trick. But as I don't know how many images I will have in the download directory, I cannot hardcode the arguments, so I guess a loop over each image in that directory, as I'm trying, would be the best solution.
Does anybody know how to achieve my goal? Any help will be much appreciated.
Thanks in advance.
Your loop ends up with only the last image because each convert invocation overwrites myfile.pdf with a PDF of just that one image; -append stacks the images given in a single invocation, it does not append to an existing file. Fortunately you don't need the loop: bash automatically expands wildcard arguments (unless they are quoted or escaped), so even if convert does not support wildcard expansion, bash does. So you could just do
convert *.jpg myfile.pdf
Note that if there are too many files, this can result in an "argument list too long" error, but that should be OK for several hundred files.
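If you want to collect the file list in a script anyway, a minimal sketch using a bash array (so every image still goes to a single convert call) might be:
# gather all jpg files in the current directory, then convert them in one go
files=( *.jpg )
convert "${files[@]}" myfile.pdf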
If your file names follow a pattern like img1.jpg img2.jpg ..., then you may also use a bash brace range:
convert img{1..5}.jpg myfile.pdf
This will work for img1.jpg img2.jpg img3.jpg img4.jpg img5.jpg. You can change the range as per your requirements.
For converting all the jpg files, the answer is already present in the other answer by @Jean-François Fabre.

Can I set command line arguments using the YAML metadata?

Pandoc supports a YAML metadata block in markdown documents. This can set the title and author, etc. It can also manipulate the appearance of the PDF output by changing the font size, margin width and the frame sizes given to figures that are included. Lots of details are given here.
I'd like to use the metadata block to remember the command line arguments that I'm supposed to be using, such as --toc and --number-sections. I tried this, adding the following to the top of my markdown:
---
title: My Title
toc: yes
number-sections: yes
---
Then I used the command line:
pandoc -o guide.pdf articheck_guide.md
This did produce a table of contents, but didn't number the sections. I wondered why this was, and if there is a way I can specify this kind of thing from the document so that I don't need to add it on the command line.
YAML metadata is not passed to pandoc as arguments, but as variables. When you call pandoc on your MWE, it does not produce this:
pandoc -o guide.pdf articheck_guide.md --toc --number-sections
as we might think it would. Rather, it calls:
pandoc -o guide.pdf articheck_guide.md -V toc:yes -V number-sections:yes
Why, then, does your MWE produce a toc? Because the default LaTeX template makes use of a toc variable:
~$ pandoc -D latex | grep toc
$if(toc)$
\setcounter{tocdepth}{$toc-depth$}
So setting toc to any value should produce a table of contents, at least in LaTeX output. In this template, there is no number-sections variable, so that one doesn't work. However, there is a numbersections variable:
~$ pandoc -D latex | grep number
$if(numbersections)$
Setting numbersections to any value will produce numbering in LaTeX output with the default template:
---
title: My Title
toc: yes
numbersections: yes
---
The trouble with this solution is that it only works with some output formats. I thought I had read somewhere on the pandoc mailing list that we would soon be able to use metadata in YAML blocks as intended (i.e. as arguments rather than variables), but I can't find it anymore, so maybe it won't happen very soon.
Have a look at panzer (GitHub repository).
This was recently announced and released by Mark Sprevak -- a piece of software that adds the notion of 'styles' to Pandoc.
It's basically a wrapper around Pandoc. It exploits the concept of YAML metadata blocks to the maximum.
The 'styles' provide a way to set all options for a Pandoc document conversion process with one line ("I want this document be an article/CV/notes/letter.").
You can regard this as more general abstraction than Pandoc templates. Styles are combinations of...
...Pandoc command line options,
...metadata settings,
...templates,
...instructions to run filters, and
...instructions to run pre/postprocessors.
These settings can be customized on a per-output-type as well as a per-document basis. Styles can be...
...combined and
...can bear inheritance relations to each other.
panzer styles simplify Makefiles: they bundle everything concerning the look of a document in one place -- the YAML metadata (a block in the Markdown file, or a separate file).
You just add one line of metadata (style: ...) to your document, and it will be treated as a letter/article/CV/notebook or whatever.
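A hypothetical sketch of such a block (the style name here is only an illustration; see the panzer documentation for the styles actually available):
---
title: My Title
style: Letter
---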

No Norwegian characters in LaTeX

I have translated a document from English to Norwegian in LaTeX format, and while using Norwegian special characters, I get an error when using
\usepackage[utf8x]{inputenc}
to try and display the Norwegian (Scandinavian) special characters in PostScript/PDF/DVI format, saying
Package utf8x Error: Malformed UTF-8 sequence.
So while that didn't work, I tried out another possible solution:
\usepackage{ucs}
\usepackage[norsk]{babel}
And when I tried to save that in Emacs I get this message:
These default coding systems were tried to encode text
in the buffer `lol.tex':
(utf-8-unix (905 . 4194277) (916 . 4194245) (945 . 4194278) (950
. 4194277) (954 . 4194296) (990 . 4194277) (1010 . 4194277) (1013
. 4194278) (1051 . 4194277) (1078 . 4194296) (1105 . 4194296))
However, each of them encountered characters it couldn't encode:
utf-8-unix cannot encode these: \345 \305 \346 \345 \370 \345 \345 \346 \345 \370 ...
Thanks to Emacs I have the possibility to check out the properties of those characters and the first one tells me:
character: \345 (4194277, #o17777745, #x3fffe5)
preferred charset: eight-bit (Raw bytes 128-255)
code point: 0xE5
syntax: w which means: word
buffer code: #xE5
file code: not encodable by coding system utf-8-unix
display: not encodable for terminal
Which doesn't tell me much. When I try to build this with texi2dvi --dvipdf filename.text I get a perfectly fine PDF, just without the special Norwegian characters.
When I am about to save, Emacs also asks me:
"Select coding system (default raw-text):"
And I type in utf-8 to choose that coding system. I have also tried choosing the default raw-text to see if I get a different result, but nothing changes.
At last I tried
\lstset{inputencoding=utf8x, extendedchars=\true}
... code I came across while trying to google a solution to this problem, which gives me this error:
Undefined control sequence.
So basically, I have tried every encoding option I have been able to find and nothing works. I am desperately trying to make this work, since the Norwegian translation must be published before the deadline.
As additional information I may add that I found out later on that I only had en_US.UTF-8 in my locale, so I added nb_NO.UTF-8 and nb_NO.ISO-8859-15 and ran locale-gen + reboot, without any changes.
I hope I provided enough information to get some assistance; the characters in question are æ, ø and å.
Apparently your Emacs is having a hard time saving the file as UTF-8 (which doesn't make much sense, since UTF-8 should be able to represent all of those characters). You should try using another editor with multiple-encoding support to save the file as UTF-8.
As long as you're unable to save the file in UTF-8, LaTeX will not be able to read it correctly unless you specify your current file encoding as the inputenc package parameter. You may want to try, for instance, saving the file as-is in Emacs but specifying \usepackage[latin1]{inputenc}, which should do the trick if Emacs is writing the file using something in the iso-8859-* family.
I solved this error by setting the coding system for saving the file:
C-x C-m f utf-8-unix
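If you would rather fix the encoding from the shell instead, a minimal sketch (assuming the file really was saved as ISO-8859-1/Latin-1, which the file command can confirm) would be:
# check what encoding the file is actually in
file lol.tex
# convert it to UTF-8 so an inputenc setting of utf8/utf8x can read it
iconv -f ISO-8859-1 -t UTF-8 lol.tex > lol-utf8.tex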

ruby mechanize: how to read a downloaded binary csv file

I'm not very familiar with using Ruby on binary data. I'm using mechanize to download a large number of csv files to my local disk. I then need to search these files for specific strings.
I use the save_as method in mechanize to save the file (which saves the file as binary). The content type of the file (according to mechanize) is:
application/vnd.ms-excel;charset=x-UTF-16LE-BOM
From here, I'm not sure how to read the file. I've tried reading it in as a normal file in Ruby, but I just get the binary data. I've also tried just using standard Unix tools (strings/grep) to search it, without any luck.
When I run the 'file' command on one of the files, I get:
foo.csv: Little-endian UTF-16 Unicode Pascal program text, with very long lines, with CRLF, CR, LF line terminators
I can see the data just fine with cat or vi. With vi I also see some control characters.
I've also tried both the csv and fastercsv Ruby libraries, but I get an 'IllegalFormatError' exception with these. I've also tried this solution without any luck.
Any help would be greatly appreciated. Thanks.
You can use the command 'iconv' to convert to UTF-8:
# iconv -f 'UTF-16LE' -t 'UTF-8' bad_file.csv > good_file.csv
There is also a wrapper for iconv in the Ruby standard library; you could use that to convert the data after reading the file into your program.
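Once the data is UTF-8, the standard Unix tools from the question also work as expected; for example (the search string is just a placeholder):
iconv -f 'UTF-16LE' -t 'UTF-8' foo.csv | grep 'some string'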
