I'm using wkhtmltopdf to generate PDFs from HTML pages.
My question is how to set the position of the table of contents page. It seems to be generated automatically at the beginning, as the first page. In addition, how do I set the CSS of the table of contents?
There's an --xsl-style-sheet (file) parameter to wkhtmltopdf, detailed thusly in the extended command line --help (or -H):
A table of content can be added to the document by adding a toc object to
the command line. For example:
wkhtmltopdf toc http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The table of content is generated based on the H tags in the input
documents. First a XML document is generated, then it is converted to
HTML using XSLT.
The generated XML document can be viewed by dumping it to a file using
the --dump-outline switch. For example:
wkhtmltopdf --dump-outline toc.xml http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The XSLT document can be specified using the --xsl-style-sheet switch.
For example:
wkhtmltopdf toc --xsl-style-sheet my.xsl http://doc.trolltech.com/4.6/qstring.html qstring.pdf
The --dump-default-toc-xsl switch can be used to dump the default
XSLT style sheet to stdout. This is a good start for writing your
own style sheet
wkhtmltopdf --dump-default-toc-xsl
The XML document is in the namespace
http://code.google.com/p/wkhtmltopdf/outline
it has a root node called "outline" which contains a number of
"item" nodes. An item can contain any number of item. These are the
outline subsections to the section the item represents. A item node
has the following attributes:
- "title" the name of the section
- "page" the page number the section occurs on
- "link" a URL that links to the section.
- "backLink" the name of the anchor the the section will link back to.
The remaining TOC options only affect the default style sheet
so they will not work when specifying a custom style sheet.
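To make the outline structure described above concrete, here is a minimal Python sketch that walks such an XML document with the standard library. The sample XML is hand-written for illustration (it is not real --dump-outline output), and the helper name flatten_outline is hypothetical; only the namespace, the "outline"/"item" node names, and the attribute names come from the help text.

```python
import xml.etree.ElementTree as ET

# Namespace quoted in the help text above.
NS = {"o": "http://code.google.com/p/wkhtmltopdf/outline"}

# Hand-written sample (an assumption, not real wkhtmltopdf output).
SAMPLE_OUTLINE = """
<outline xmlns="http://code.google.com/p/wkhtmltopdf/outline">
  <item title="Introduction" page="1" link="#intro" backLink="intro-toc">
    <item title="Scope" page="2" link="#scope" backLink="scope-toc"/>
  </item>
  <item title="Usage" page="3" link="#usage" backLink="usage-toc"/>
</outline>
""".strip()


def flatten_outline(node, depth=0):
    """Return (depth, title, page) tuples for every nested item node."""
    rows = []
    for item in node.findall("o:item", NS):
        rows.append((depth, item.get("title"), item.get("page")))
        # Items can nest to any depth, so recurse into subsections.
        rows.extend(flatten_outline(item, depth + 1))
    return rows


rows = flatten_outline(ET.fromstring(SAMPLE_OUTLINE))
for depth, title, page in rows:
    print("  " * depth + f"{title} (page {page})")
```

Inspecting the dumped outline this way can help when deciding which attributes a custom XSLT style sheet needs to render.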
So you define your own XSLT, possibly based on their default, and pass it in. No problemo.
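As for the position of the TOC page: as far as I know, wkhtmltopdf places objects in the output in the order they appear on the command line, so the toc object can be moved between page objects. The following Python sketch only builds the argument list; the function name, the file names, and the toc_position parameter are assumptions made for illustration.

```python
def build_wkhtmltopdf_args(pages, output, toc_position=0, xsl_path=None,
                           binary="wkhtmltopdf"):
    """Build an argv list with the toc object inserted before pages[toc_position].

    Assumes wkhtmltopdf emits objects in command-line order.
    """
    if not 0 <= toc_position <= len(pages):
        raise ValueError("toc_position must be between 0 and len(pages)")

    # The toc object, optionally followed by its own style sheet option.
    toc = ["toc"]
    if xsl_path:
        toc += ["--xsl-style-sheet", xsl_path]

    objects = list(pages)
    objects[toc_position:toc_position] = toc  # splice toc in at the chosen slot
    return [binary, *objects, output]


# Hypothetical file names: TOC after a cover page, styled by my.xsl.
args = build_wkhtmltopdf_args(["cover.html", "chapter1.html"], "book.pdf",
                              toc_position=1, xsl_path="my.xsl")
print(" ".join(args))
# To actually run it: subprocess.run(args, check=True)
```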
If you want, you can even build a customized TOC from an HTML file. For example, to create a TOC from the names of the HTML files that will go into the PDF (note that you need to know the names in advance), pass an HTML file, say user_toc.html. In this file you can put all your captions, CSS, etc. and a placeholder for the file name. Server-side code then parses the file and fills the file name into the placeholder, and the modified file is used as the TOC.
Example code in Perl:
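use strict;
use warnings;
my $TOCPage = "user_toc.html";
my $file1 = "Testfile1.html";
my $file2 = "Testfile2.html";
# Assumed values: the original snippet used these two variables without defining them
my $file_name_placeholder = 'FILE_NAME_PLACEHOLDER';
my $pdf_manual_name = 'manual.pdf';
# Open the user TOC for parsing and read it into a buffer
open(my $in, '<', $TOCPage) or die "Cannot read $TOCPage: $!";
my $toc_file_lines = do { local $/; <$in> };
close($in);
# Substitute the placeholder with the actual file name
$toc_file_lines =~ s/\Q$file_name_placeholder\E/$file1/;
# Open the same file again and write back the buffer
open(my $out, '>', $TOCPage) or die "Cannot write $TOCPage: $!";
print {$out} $toc_file_lines;
close($out);
# Linux path for wkhtmltopdf installation
my $wkhtmltopdf_path = '/usr/bin/wkhtmltopdf-i386';
# Build the command on one line; a newline inside the string would end the shell command early
my $command = "$wkhtmltopdf_path --margin-top 20mm --margin-bottom 15mm "
    . "--margin-left 15mm --margin-right 15mm "
    . "$TOCPage $file1 $file2 $pdf_manual_name";
my $output = `$command 2>&1`;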
Moreover, you can add a lot of other information to the TOC, such as the chapter number or the total number of pages in a chapter.
Hope this helps.
I have a book made of several ordered Markdown files. I am using Pandoc to convert those into an epub file, and things are mostly okay. I can embed the font I like and provide my own CSS, etc. The problem is that the output file contains an element that is not present in the Markdown (as a "#" header element). This element is then being picked up by the ToC function and inserted into the Table of Contents. I didn't ask for the element to be present, and I can't find an option to turn it off.
Here's how to reproduce, with a much simpler case than my actual one, but it's sufficient to demonstrate the problem. I have the following file structure:
- pandoctest/
  - src/
    - file1.md
    - file2.md
  - epub.yml
The contents are as follows:
file1.md:
Here is some text.
file2.md:
# Chapter one
The chapter goes here.
epub.yml:
---
title:
- type: main
  text: A Book
creator:
- role: author
  text: Some Dude
---
And the pandoc command I'm running is:
pandoc -o output.epub epub.yml --toc src/*
The end result is something like this:
Page 1: An appropriate title page using the title and author elements from epub.yml
Page 2: The table of contents page. At the top, the title from epub.yml. Beneath that are two ToC entries. The first is the title of the book and refers to the element I don't want present on the next page. The second is "Chapter One" which refers to the # Chapter One element from my Markdown (this is appropriate).
Page 3: First, the undesired element, which, in the raw XML looks like this:
<h1 class="unnumbered" data-number="">A Book</h1>
Then, "Here is some text", a paragraph that I did indeed tell it to put there.
Page 4: A correctly rendered "Chapter One" page.
The question here is how to get pandoc to not render the "unnumbered" header element that is not present in the Markdown. It screws up the Table of Contents and I never asked for it to be there.
For reference, here is the epub that is rendered from my little test here: https://www.dropbox.com/s/dj4jo08g7q4f9i2/output.epub?dl=0
Using Sphinx documentation generator (with pdflatex), I am creating pdf files and have added links to some of the internal files using label and ref markups like this:
In the called file (xyz.rst)
.. _called-file-label:
In the calling file(abc.rst) I am adding a reference to the label like this:
:ref:`Get Info <called-file-label>`
With the above arrangement, I am able to generate pdf file using pdflatex. However, I find that the called file is also added to the pdf file's bookmarks section which feels somewhat clumsy.
I understand I need to add both the source files in the .. toctree:: section for the hyperlink to appear in the pdf file (I have added the called file using :hidden: directive to prevent the file from showing up in the html document's ToC tree).
My question is: What do I need to do in order that the called file (xyz.rst) does not figure in the bookmarks section of the generated pdf file?
If the .. _called-file-label: label is followed by a section:
.. _called-file-label:
Foo Bar
=======
Then, the section title "Foo Bar" will always become a bookmark in PDF.
The :hidden: option of toctree does not hide documents; it only suppresses the ToC listing at the place where the toctree appears. That is, it hides the toctree itself, not its documents. Documents in a hidden toctree will still be visible in HTML sidebars, PDF bookmarks, etc.
It looks like you need the rubric directive. A rubric is like a section heading, but it does not become part of the table of contents.
I have just recently (yesterday) started using Sphinx and Read the Docs for my project.
So far I am happy with the HTML documentation, but the PDF version includes all the articles that appear in the index under the Contents heading, and the document's original content/index consists simply of two links.
Please help.
The documentation is at http://todx.rtfd.io and the pdf is here.
When generating the PDF, Sphinx is always adding the content that is referenced via a .. toctree:: directive exactly where the directive is placed.
In contrast, when generating HTML, Sphinx is placing the content after the file that contains the toctree.
In this aspect, PDF generation and HTML generation are simply inconsistent.
Thus, you should place all content of the index page before the table of contents.
In case you want to provide an overview of the most important sections inline, you can use a list of references. Then, you might want to hide the toctree using the hidden property, for example like this:
Contents
--------

- :ref:`quickstart`
- :ref:`userguide`

Features
--------

- Fast
- Simple
- Intuitive
- Easy to Use
- Offline
- Open Source

.. toctree::
   :hidden:

   quickstart
   userguide
I would like to have a two-page InDesign document. The first page has text and an image, and the second page has two images. The images should come from a CSV file that is data-merged with the InDesign document. I have only been able to do a data merge when I have one page, but then all pages have the same layout. Is this achievable, and if so, how do I do it? Thanks.
The solution is to work with XML files (File -> Import XML) instead of CSV. You can make any type of document and put XML objects in text fields, images, ... XML is much more flexible than CSV.
A CSV field whose name starts with @, such as @photo1, links to a file location. For instance, column F could contain:
@photo1
c:\foldername\filename.jpg
c:\folder2name\subfolder\otherfile.png
f:\file3.tiff
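A data-merge source file with such an image column could be generated along these lines. This is only a sketch: InDesign's Data Merge marks image fields with a leading @ in the header, while the "title" column, the helper name write_merge_csv, and the row values other than the paths above are made up for illustration.

```python
import csv
import io


def write_merge_csv(rows, fieldnames, stream):
    """Write a header row plus one record per merged document to stream."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# "title" is a hypothetical text field; "@photo1" is the image field.
records = [
    {"title": "First", "@photo1": r"c:\foldername\filename.jpg"},
    {"title": "Second", "@photo1": r"c:\folder2name\subfolder\otherfile.png"},
]

buffer = io.StringIO()
write_merge_csv(records, ["title", "@photo1"], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```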
I'm trying to change an indl file. An indl file is created by Adobe InDesign to keep the structure of a document and is basically XML. I want to use Nokogiri to find selected XML nodes, replace their text with my own, and then save the XML to another file.
The XML is of course unusual: I found some documentation on retrieving HTML tags and changing text with Nokogiri, but I don't know how to handle a piece of XML like this:
<cflo>
<txsr prst="o_u5084" crst="o_u5085" trak="D_10">
<pcnt>c_tEST</pcnt>
</txsr>
<txsr prst="o_u5086" crst="o_u5c" trak="D_20">
<pcnt>c_Titolo titolo titolo</pcnt>
</txsr>
</cflo>
Basically I need to look for a combination of prst and crst attribute and replace the content inside the pcnt node.
I tried this:
@doc.xpath("//txsr[prst='o_u5086' and crst='o_u5085']")
but I don't know how to change the text inside the pcnt node.
That's not the correct XPath. The correct XPath will look like this:
@doc.xpath("//txsr[@prst='o_u5086'][@crst='o_u5085']")
You should just take the first node from a set and use the inner_html= method to replace the text value.
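For comparison, the same idea (attribute predicates written with @, then replacing the text of the pcnt child) can be sketched with Python's standard xml.etree.ElementTree instead of Nokogiri. The function name replace_pcnt and the replacement text are hypothetical, and the sample uses the second txsr node from the question (crst "o_u5c") so that the query actually matches something.

```python
import xml.etree.ElementTree as ET

# Sample taken from the question, with the root element properly closed.
SAMPLE = """<cflo>
  <txsr prst="o_u5084" crst="o_u5085" trak="D_10">
    <pcnt>c_tEST</pcnt>
  </txsr>
  <txsr prst="o_u5086" crst="o_u5c" trak="D_20">
    <pcnt>c_Titolo titolo titolo</pcnt>
  </txsr>
</cflo>"""


def replace_pcnt(xml_text, prst, crst, new_text):
    """Replace the pcnt text of every txsr whose prst and crst attributes match."""
    root = ET.fromstring(xml_text)
    # Attributes are addressed with @ inside the predicate.
    matches = root.findall(f".//txsr[@prst='{prst}'][@crst='{crst}']")
    for txsr in matches:
        pcnt = txsr.find("pcnt")
        if pcnt is not None:
            pcnt.text = new_text
    return ET.tostring(root, encoding="unicode"), len(matches)


updated, count = replace_pcnt(SAMPLE, "o_u5086", "o_u5c", "c_New title")
print(count)
print(updated)
```

The returned string can then be written to a new file, leaving the original untouched.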
Full code may be found here: https://gist.github.com/kaineer/7673698