My aim is to produce a document by mixing hand-written .tex files and .tex files generated from an Rmarkdown file using knitr with figures rendered as tikzpicture.
How do I get the generated .tex file to simply contain the following?
\begin{figure}
\input{unnamed-chunk-1-1.tex}
\end{figure}
I am struggling with tikz as the device. I tried the tikzDevice library as well as knitr::opts_chunk$set(dev="tikz"), and I managed to create a .tex file with a tikzpicture environment for every chunk. Unfortunately, knitr rendered each of those .tex files to PDF and generated an \includegraphics command. The most annoying part is that I need to copy my header redundantly into options(tikzLatexPackages), or else "TeX is unable to calculate metrics".
Additionally, I did not manage to put the graphics into a figure environment. If I simply enclose the chunk in \begin{figure} and \end{figure}, the PDF is no longer included at all and the command is replaced with ![](figures/unnamed-chunk-1-1.pdf)<!-- -->.
After going through the Rmarkdown documentation, I tried setting the chunk option external=TRUE, but that had no effect.
EDIT: The environment should be generated when setting the chunk option fig.env='figure', but like external=TRUE, that had no effect on the generated .tex file.
I am trying to do exactly what is described in the following thread:
AppleScript/Automator: renaming PDF with extracted text content of this PDF
I am using Chino22's version, and there are two issues with it:
First, instead of the contents of the pdf, theFileContentsText gets some metadata stuff.
Second, although the script runs to the end, I get the following error for the last step:
error "The variable thisFile is not defined." number -2753 from "thisFile"
So, how do I get the text contents instead, and how do I set thisFile to the PDF that is currently being processed in the loop?
Thanks in advance!
I would not expect the linked script to work.
Except for document metadata, extracting text content from PDF is notoriously difficult and unreliable, and not a road you want to go down if you can possibly avoid it. Adobe’s PDF file format is designed for printing, not for data processing. PDF files contain blocks of Postscript-like page drawing instructions, typically compressed, and while it’s possible for PDFs also to include the original plain text for accessibility use, most PDF generators do not do this so the only way to get the original text is by reconstructing it from those low-level drawing instructions—not a trivial job.
AppleScript’s read command only reads that raw file data; it does not parse it into drawing instructions, never mind translating those drawing instructions back into plain text. Change a PDF file’s extension to .txt and open it in a plain text editor, and you’ll see what I mean. Nasty.
If you need to work with the PDF’s original content (text, images, whatever), your best solution is to get those files before they were converted into a PDF.
If you must extract content from a PDF file, use an existing tool that knows how to do it.
For instance, if you’re lucky enough to have PDFs that contain XFDF (XML form) or accessibility data, there are 3rd-party apps and libraries to extract that content in readable form. I can’t think offhand of any that are AppleScriptable (Adobe Acrobat has only minimal AS support) so you’ll probably need to find one you can run from command line (do shell script in AS).
Or, if the PDFs have a consistent visual structure, a 3rd-party library such as Python’s PDFMiner (which I’ve used in the past) can identify blocks of characters by position and convert those back into strings with varying degrees of reliability (it has to convert font glyphs back into Unicode characters, guess at which characters are close enough to constitute a word, and where to insert space and return characters between those words). You’ll have to write some Python code to extract the bits you want, so look for tutorials to get started (or pay someone to write it for you).
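As a starting point, here is a minimal sketch of the PDFMiner route for the renaming task in the question. It assumes the pdfminer.six package is installed and uses its high-level extract_text function. The helper names (extract_pdf_text, title_from_text, rename_by_content) and the rule "the first non-blank line is the title" are hypothetical, chosen only for illustration, and extraction quality will vary with how each PDF was generated.

```python
import re
from pathlib import Path


def extract_pdf_text(pdf_path):
    """Return whatever text pdfminer.six can recover from a PDF.

    pdfminer.six is a third-party package (pip install pdfminer.six).
    It is imported lazily so the helpers below work without it.
    """
    from pdfminer.high_level import extract_text
    return extract_text(str(pdf_path))


def title_from_text(text, max_len=60):
    """Build a filesystem-safe name from the first non-blank line of text.

    Assumption for illustration: the first non-blank line is the title.
    Real documents will probably need a more specific pattern.
    """
    for line in text.splitlines():
        line = line.strip()
        if line:
            # Collapse anything other than letters, digits, '.', '-' and '_'.
            safe = re.sub(r"[^A-Za-z0-9._-]+", "_", line).strip("_")
            if safe:
                return safe[:max_len]
    return None


def rename_by_content(pdf_path):
    """Rename one PDF after its extracted title and return the resulting path."""
    pdf_path = Path(pdf_path)
    title = title_from_text(extract_pdf_text(pdf_path))
    if title is None:
        return pdf_path  # nothing usable was extracted, leave the file alone
    target = pdf_path.with_name(title + ".pdf")
    if target.exists():
        return pdf_path  # never overwrite an existing file
    pdf_path.rename(target)
    return target
```

Calling rename_by_content on each file in a folder would replace the loop in the AppleScript. For scanned PDFs without a text layer, extract_text will most likely return little or nothing, and the file is then left untouched.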
But again, if you can possibly avoid having to extract text from PDF, you should. You will save yourself a lot of trouble.
I'm trying to convert my existing asciidoc documentation into pdf. Asciidoctor-pdf seems quite easy and I'm able to convert single files into pdf.
asciidoctor-pdf -a pdf-theme='./theme/styles.yml' -a pdf-fontsdir='GEM_FONTS_DIR, theme/fonts/' 01-intro.adoc
But my docs are spread across many files. I want to create a single pdf from all those files. Does anyone know how to do this?
Secondly, I don't want the generated pdf to be located next to the adoc file. I want to specify a target path.
I'd appreciate every hint. Thanks and best regards. Sebastian
(Dec 26, 2021)
The easiest and most convenient way is to use the VSCode editor with the AsciiDoc extension installed. This extension is developed by the same team that develops the Asciidoctor text processor. It is a GUI-based approach that solves all your problems, so I'm pretty sure you're going to love it.
(Step 1) After the extension is installed, use the keyboard shortcut Cmd + , to go to the settings and then enter asciidoc.use_asciidoctorpdf in the search bar and tick the check box (see the demonstration below)
(Step 2) To create a single PDF from multiple .adoc files, include all of them in a single .adoc file with include::directory-to-the-adoc-file.adoc[] (see the illustration below)
(Step 3) Press F1, type as pdf, and hit Enter to export this single .adoc file as a single PDF. This lets you specify the target export directory for the PDF. Please be patient and wait a few seconds for the export to complete; the editor will inform you as soon as it is done (see the image at the bottom)
Have you considered working with includes?
Just add this line to your document "01-intro.adoc" at any position:
include::02-next-file.adoc[]
When you build 01-intro.adoc with your regular command, the contents of 02-next-file.adoc will be inserted at the position of the include line. Using this method, we create a file with many includes and just build that file. We're very happy with that.
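The include approach can also be scripted. The following is a rough sketch, not a definitive setup: it generates the text of a master file made up of include lines and assembles an asciidoctor-pdf command that uses the -o (output file) option to put the PDF at a chosen target path. The function names and the file names in the usage note are hypothetical.

```python
def build_master(adoc_files):
    """Return the text of a master document that includes each file in order."""
    lines = []
    for name in adoc_files:
        lines.append("include::" + str(name) + "[]")
        lines.append("")  # blank line so adjacent files do not run together
    return "\n".join(lines)


def build_command(master, out_file, theme=None):
    """Return the asciidoctor-pdf call as an argument list (it is not executed here)."""
    cmd = ["asciidoctor-pdf"]
    if theme:
        # Same attribute the question already passes on the command line.
        cmd += ["-a", "pdf-theme=" + str(theme)]
    # -o sets the output file, so the PDF need not sit next to the source.
    cmd += ["-o", str(out_file), str(master)]
    return cmd
```

For example, write build_master(["01-intro.adoc", "02-next-file.adoc"]) to a file such as master.adoc in the same directory as the chapters, then pass build_command("master.adoc", "build/docs.pdf") to subprocess.run. Include paths are resolved relative to the including file, which is why the master file should live next to the chapters in this sketch.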
This question is motivated by the answer given in this question
Using the animate package without adobe
I want to create LaTeX Beamer presentations without relying on Adobe, as it is a pain.
I followed the instructions given in the post's answer, and when I compiled the given example code, the output was four .svg files. I have no idea what to do with them.
Something tells me they should be embedded into an HTML file that produces a slide presentation, but I'm a complete noob in HTML and I have not been able to find out how to achieve this.
No additional wrapper for the individual .svg files is necessary. Simply open the first .svg file in your browser and use the little arrows at the top right for navigation. They automatically link to the next slide.
In LaTeX one could for example have a nice inline equation like $x^2=4$, which in docx format I would be glad to have as italic text.
Is there a way to tell Pandoc to use one of these solutions depending on the output format?
When searching for a possible solution, I realized pandoc has filters and templates, but I do not really understand which direction to follow.
I would really like to arrive at a more general solution that would also work for analogous tasks, for example smaller spaces between a number and its unit: in LaTeX this is a straightforward $\;$, but including this in my Markdown document does not give me a satisfactory result in DOCX or ODT output.
This is what I found in the pandoc manual:
For docx output, styles will be defined in the output file as inheriting from normal
text, if the styles are not yet in your reference.docx. If they are already defined,
pandoc will not alter the definition.
Please also read the --reference-doc=FILE part of the manual:
--reference-doc=FILE
Use the specified file as a style reference in producing a docx or ODT file.
...
How to use --reference-doc in pandoc:
Create an empty docx file and rename it (e.g. refer.docx).
Define the styles you want to use in it.
Add "--reference-doc=(refer.docx path)" to your pandoc command line.
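The last step can be sketched as a small wrapper, assuming pandoc is on the PATH. The function name and the file names in the usage note are placeholders, and the only pandoc options used are -o and --reference-doc, both from the manual quoted above.

```python
def pandoc_docx_command(source, output, reference_doc):
    """Return the pandoc call that applies the styles of reference_doc to output."""
    return [
        "pandoc",
        str(source),
        "-o", str(output),
        "--reference-doc=" + str(reference_doc),
    ]
```

Something like subprocess.run(pandoc_docx_command("paper.md", "paper.docx", "refer.docx"), check=True) would then run the conversion. This only controls how existing styles look in the docx. It does not by itself change how inline math is converted.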
I'm trying to use Ghostscript to append a PDF as "last page" to multiple other PDFs. The problem I'm encountering is that Ghostscript walks through the whole PDF and does a bunch of font substitution.
I'm using the following batch script:
FOR %%G IN (*.pdf) DO IF NOT %%G==lastpage.pdf gswin64c -sDEVICE=pdfwrite -sOutputFile="output\%%G" -dNOPAUSE -dBATCH "%%G" lastpage.pdf
Example Error:
Page 12
Substituting font Courier for GGCJBF+Courier.
I will also sometimes get other errors, like this:
jbig2dec FATAL ERROR decoding image: prevent DOS while decoding height classes (segment 0x00)
failed to create parsed JBIG2GLOBALS object.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
All I need gs to do is append my lastpage.pdf to the existing PDFs without walking through the entire PDF I'm appending to, especially with font substitution, because I will not have most of the fonts other people are using in their PDFs.
Is it possible in gs to simply append without walking through every page of the PDF? Is there another tool that will allow appending of PDFs in batches without this issue?
You need to be aware that Ghostscript does not simply manipulate the incoming PDF file, so you aren't 'appending' a page. What it does is interpret the incoming file into marking operations, pass those to a device, and that device takes further action on them. Rendering devices write to a bitmap, pdfwrite reassembles the marking operations into a brand new file.
That's why it 'walks through the whole file'; it's the way it works. There are advantages to this (it's possible to alter the file contents, for example) and disadvantages.
Now if you are getting a font substitution for an embedded font, there's something wrong with the embedded font (or possibly you are using a really old version of Ghostscript with a bug). You could try a newer version of Ghostscript but you're never going to get away from processing the entire input file.
Why not try pdftk?
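A rough sketch of what the pdftk route could look like for the batch job in the question, assuming pdftk is installed and on the PATH. It relies on pdftk's cat operation, which concatenates the pages of the input files in the order given and, as far as I know, does not re-interpret the page contents the way pdfwrite does. The function name is hypothetical, and the defaults lastpage.pdf and output are taken from the question's script.

```python
from pathlib import Path


def pdftk_append_commands(pdf_names, last_page="lastpage.pdf", out_dir="output"):
    """Return one pdftk argument list per input PDF (nothing is executed here)."""
    commands = []
    for name in pdf_names:
        if Path(name).name.lower() == last_page.lower():
            continue  # do not append the last page to itself
        # Write into out_dir so no input file is used as its own output.
        target = Path(out_dir) / Path(name).name
        commands.append(["pdftk", str(name), last_page, "cat", "output", str(target)])
    return commands
```

Each returned list can be passed to subprocess.run, for instance after collecting the names with sorted(p.name for p in Path(".").glob("*.pdf")) and creating the output directory first.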