Migrate from bookdown to pure Pandoc: split the HTML output into one page per section

I have a book project in R Markdown, but since I do not use knitr or other R Markdown-specific features, I am considering switching to pure Pandoc to remove the R burden from the dependencies.
For PDF and EPUB output everything seems straightforward to me, but I have some trouble with the HTML output: Pandoc generates a single HTML file containing the entire book.
With bookdown I used the gitbook HTML output, which generates a page for each section; each page has the complete TOC in the left sidebar and its own footnotes and partial bibliography at the bottom.
To achieve this, I thought of writing a Markdown file for each section and converting them one by one with Pandoc for the HTML output (and merging them into one file for the PDF and EPUB conversion). But this way I cannot have cross-references between sections or a full bibliography at the end, and I cannot easily create a TOC.
So my question is: is there an easy way (e.g. a Pandoc filter or a script) to generate an HTML book (similar to gitbook in behavior; the style doesn't matter) without installing R and bookdown?

Pandoc follows the philosophy of only writing files that have been explicitly specified on the command line. This is why no such feature is built in.
It would be possible to do what you want with the help of a custom writer. The basics would be doable in a few lines of Lua code, but you would likely have to implement all of bookdown's features yourself.
The best alternative (IMHO) is to use Quarto, a standalone tool built on top of Pandoc, created in part by the authors of bookdown. That way you can remove R from your dependencies but keep the features of bookdown, and more.
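The answer's route is a Lua custom writer. As a much rougher illustration of the "basic" part only, here is a Python preprocessing sketch (a different technique, not a Pandoc writer) that splits one Markdown book into per-chapter chunks at level-1 headings and builds a Markdown TOC that links to one HTML file per chapter. The function names are hypothetical, the slug rule only approximates Pandoc's automatic identifiers, and explicit {#id} attributes are not used for file names. Cross-references and the bibliography, the hard parts the question mentions, are not handled.

```python
import re

# A line opening or closing a fenced code block (``` or ~~~).
FENCE = re.compile(r"^(```|~~~)")
# A level-1 ATX heading, with an optional trailing attribute block like {#intro}.
HEADING = re.compile(r"^#\s+(.*?)\s*(\{.*\})?\s*$")


def slugify(title: str) -> str:
    """Rough approximation of Pandoc's auto identifiers: lowercase,
    drop punctuation, turn runs of whitespace into hyphens."""
    slug = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", slug.strip())


def split_chapters(markdown: str) -> list[tuple[str, str]]:
    """Split a Markdown book into (title, text) pairs at level-1 headings.

    '#' lines inside fenced code blocks are not treated as headings.
    Text before the first heading is dropped in this sketch.
    """
    chapters: list[tuple[str, list[str]]] = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            chapters.append((match.group(1), [line]))
        elif chapters:
            chapters[-1][1].append(line)
    return [(title, "\n".join(lines) + "\n") for title, lines in chapters]


def toc_markdown(titles: list[str]) -> str:
    """Markdown bullet list linking each chapter title to <slug>.html."""
    return "".join(f"- [{t}]({slugify(t)}.html)\n" for t in titles)
```

Each chunk could then be written to its own .md file and converted with pandoc, with the TOC (converted to HTML) added to every page, for example through pandoc's --include-before-body option. Links between chapters would still need rewriting from #id to file.html#id.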

Related

Asciidoc: how to get page headers & footers?

Is there a correct way to get Asciidoc to include headers and footers?
I am trying to work out whether AsciiDoc is a serious contender for printed material. I know that it is supposed to be DocBook-compatible, but I can’t find out how to create chapters, headers and footers.
I am trying to create instructional material. Currently I am using Atom with the asciidoc plugin to create the text, and Marked 2 on the Mac to get a better look and to export it to PDF.
Running page headers and footers are not part of the AsciiDoc language but of the tool you use for PDF conversion. As I see it, you have (at least) two options:
Follow the instructions in "Exporting Print/PDF" in the Marked 2 user manual to create page headers and footers (this might turn out to be difficult when using the AsciiDoc processor instead of MultiMarkdown):
"You can specify headers and footers on a per-document basis using MultiMarkdown metadata at the very top of the document"
Since you have probably installed Asciidoctor anyway to support AsciiDoc in Marked 2, you could use Asciidoctor PDF with a theme that defines headers and footers. You would have to find an appropriate theme or create one yourself, though.
The most frequently used way to generate PDF output, however, seems to be generating DocBook output first and converting that to PDF using dblatex with the DocBook XSL stylesheets (see the AsciiDoc homepage). Maybe someone else can say more about that.

Printable PDF output with links as footnotes

I'm using Sphinx to generate the documentation of a Python project and I make heavy use of external links. I'd like to build HTML and latexpdf outputs with these as clickable links (which is the default), but also a PDF version for printing, with the links shown as footnotes.
In short: is there a way to write external links in a .rst file like this:
Ask a question on `my favorite Q&A website <http://stackoverflow.com/>`_.
and have a special output that will interpret this as if it was a footnote written like this:
Ask a question on my favorite Q&A website [#SO]_.
.. [#SO] http://stackoverflow.com/
while keeping the normal behavior (clickable link without a footnote) in other outputs?
Jongware's comment made me look at parts of the Sphinx documentation I hadn't seen, and I realized there actually is a configuration variable that does what I want:
latex_show_urls = 'footnote'
As I wanted to be able to generate both the usual PDF and the one with footnotes without changing the conf.py file, I left the default value and added the following rule to Sphinx's Makefile:
.PHONY: printpdf
printpdf: SPHINXOPTS+=-Dlatex_show_urls=footnote
printpdf: latexpdf
This rule calls the regular latexpdf rule, adding -Dlatex_show_urls=footnote to the options given to sphinx-build.
With this, we can generate the PDF to be printed (with footnotes) with:
make printpdf
And if we want a regular PDF, without the potentially numerous and (here) useless footnotes, the regular rule still does it:
make latexpdf

How to maintain HTML internal links when converting with Pandoc

I am trying to convert from HTML to PDF with Pandoc. The output is pretty nice, but with the command pandoc index.html -o output.pdf I lose all my internal links (from the table of contents to chapters, from text to footnotes, etc.).
In my HTML, this is the outgoing link:
<p class="calibre18"><span class="calibre8">CHAPTER ONE</span><br class="calibre19"></br>The Ever Expanding Domain of Computation</p>
which then lands here
Chapter 1 makes the case that because of...
and here
<p class="calibre18"><span class="calibre8">CHAPTER ONE</span><br class="calibre19"></br>The Ever Expanding Domain of Computation</p>...
Is there any way to keep all the links also in the output?
The Pandoc User's Guide section on Internal Links says
Internal links are currently supported for HTML formats (including HTML slide shows and EPUB), LaTeX, and ConTeXt.
This suggests that internal links aren't currently supported for PDF output, even though the PDF output is generated via LaTeX.
Internal links should work straightforwardly in PDF. However, for printing purposes, the default is not to color them. Have you tried clicking on the text that should be a link?
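To make the links visible, pandoc's default LaTeX template has a colorlinks variable; without it, the template sets up hyperref so that links work but are not highlighted. On the command line this amounts to adding -V colorlinks=true. The Python wrapper below is only a sketch of that invocation; the function names are hypothetical.

```python
import shutil
import subprocess


def pandoc_pdf_args(src: str, dest: str, colorlinks: bool = True) -> list[str]:
    """Build the pandoc command line for an HTML-to-PDF conversion.

    "-V colorlinks=true" sets the colorlinks variable of pandoc's default
    LaTeX template, so links are colored instead of being left unhighlighted.
    """
    args = ["pandoc", src, "-o", dest]
    if colorlinks:
        args += ["-V", "colorlinks=true"]
    return args


def convert(src: str, dest: str) -> None:
    """Run the conversion; requires pandoc (and a LaTeX engine) on PATH."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    subprocess.run(pandoc_pdf_args(src, dest), check=True)
```

For the file in the question, convert("index.html", "output.pdf") runs the same command as before plus the colorlinks variable.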

Convert HTML form to PDF

How do I convert an HTML form to PDF? I would like to use Prawn for this.
Pointers to any relevant links or examples would be very helpful.
Why would you want to limit yourself to a technology (Prawn) that isn't suited to the task (it's not geared towards generating PDFs from HTML)?
You might want to check out PDFKit instead, as it seems specifically designed to create PDFs from HTML, using powerful existing libraries.
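Super short version (the conversion itself is two lines):
require 'pdfkit' # PDFKit drives the wkhtmltopdf binary, which must be installed separately
kit = PDFKit.new("http://google.com")
kit.to_file('/path/to/save/google.pdf')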
Read more about it here:
http://thinkrelevance.com/blog/2010/06/15/rethinking-pdf-creation-in-ruby.html
Check out the RailsCast: http://railscasts.com/episodes/220-pdfkit

batch html file editing

I have a collection of a thousand HTML files that I need to trim. I need to delete all the tags inside the <body></body> area except for one, <div class="pg">, so that the pages print cleanly. The excess elements are navigation links, which make the printouts messy and use more paper. The contents differ from file to file, so I can't find and replace a fixed code excerpt, but the tags are the same: for example, there are 3 <table> tags to be deleted, each with a specific class. How can I manipulate specific tags in a batch of HTML files?
Is there any batch processing technique or software to do this job?
What is an easy solution on Windows?
I would use an XSLT transform on each HTML page. Batch is not the right tool for manipulating HTML files, but you can use a batch file as a "manager" that passes each file to the XSL transform. Windows also has a rudimentary msxml utility, which you can download and install on your machine: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=21714
That's how I would do it. I am sure there are more options.
If it is XHTML, you could use XSLT to transform your HTML into "another" format. For examples, look here: http://www.w3schools.com/xsl/ or here: http://help.hannonhill.com/discussions/how-do-i/269-strip-specific-html-tag-in-xslt
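The answers suggest XSLT. As an alternative that also copes with HTML that is not well-formed XHTML, here is a sketch using only Python's standard-library html.parser (a swapped-in technique, not XSLT). It assumes that "<div.pg>" means a div with class pg, that the files are UTF-8, and that only the first such div per page matters; everything else, including the <head> with its stylesheets and title, is dropped. The class and function names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser
from pathlib import Path


class DivExtractor(HTMLParser):
    """Collect the raw markup of the first <div> whose class list contains `cls`."""

    def __init__(self, cls: str = "pg") -> None:
        # Keep entities as-is so the re-emitted text stays valid HTML.
        super().__init__(convert_charrefs=False)
        self.cls = cls
        self.depth = 0  # <div> nesting depth inside the target; 0 = outside
        self.done = False
        self.parts: list[str] = []

    def _capturing(self) -> bool:
        return self.depth > 0 and not self.done

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth == 0:
            classes = (dict(attrs).get("class") or "").split()
            if tag == "div" and self.cls in classes:
                self.depth = 1
                self.parts.append(self.get_starttag_text())
            return
        if tag == "div":
            self.depth += 1
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self._capturing():
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if not self._capturing():
            return
        self.parts.append(f"</{tag}>")
        if tag == "div":
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self._capturing():
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._capturing():
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._capturing():
            self.parts.append(f"&#{name};")


def extract_div(html: str, cls: str = "pg") -> str | None:
    """Return the markup of the first matching div, or None if there is none."""
    parser = DivExtractor(cls)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts) if parser.done else None


def clean_directory(src_dir: Path, out_dir: Path, cls: str = "pg") -> list[Path]:
    """Write a trimmed copy of every *.html file; return the files with no match."""
    out_dir.mkdir(parents=True, exist_ok=True)
    skipped = []
    for page in sorted(src_dir.glob("*.html")):
        body = extract_div(page.read_text(encoding="utf-8"), cls)
        if body is None:
            skipped.append(page)
            continue
        (out_dir / page.name).write_text(
            '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head>'
            f"<body>\n{body}\n</body></html>\n",
            encoding="utf-8",
        )
    return skipped
```

For example, clean_directory(Path("pages"), Path("printable")) would write the trimmed copies to printable/ and return the pages in which no such div was found, so they can be checked by hand.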
