How to exclude some entries from TOC - wkhtmltopdf

I'm using wkhtmltopdf to create some documents that require a TOC; this is done using Hx tags and it works properly (including a modified XSL file). The input HTML is generated by my own code, so I have full control over it.
Now I need to exclude some, but not all, entries at a certain level, as in the sample below:
<h1>First</h1>
<h2>First of first</h2> <- exclude
<h2>Second of first</h2> <- exclude
<h1>Second</h1>
<h2>First of second</h2>
<h2>Second of second</h2>
The documentation explains how to customize the XSL; so I have generated the outline for the document and looked at the XML file.
It contains, as described in the manual, elements with four attributes : title, page, link and backLink.
<outline xmlns="http://wkhtmltopdf.org/outline">
<item title="First" page="0" link="__WKANCHOR_0" backLink="__WKANCHOR_1">
<item title="First of first" page="1" link="__WKANCHOR_2" backLink="__WKANCHOR_3"/>
<item title="Second of first" page="1" link="__WKANCHOR_4" backLink="__WKANCHOR_5">
... and so on
I guess that in order to get the desired result there could be two ways :
alter the way the outline is created, or
alter the way the outline is processed
I could not find a way to achieve the first option, and the outline file does not contain enough information to use in the XSL file.
A couple of notes :
I know I can exclude by title attribute, but the documents may have a lot of them, so excluding by title isn't really an option.
another reason it can't be done is because I may exclude an entry at a certain location but need to include one with the same title elsewhere.
obviously I cannot exclude by page number, since I cannot possibly know in advance where the pages will break.
it would be nice to have those entries in the document outline but not in the TOC, so probably the second way to achieve this would be the proper way.
... so any help is appreciated. Thanks.

After a bit of struggling, here's a solution.
Since I cannot alter the outline file or its creation method without tapping into wkhtmltopdf's source files, I tried to find a way to tell apart the TOC entries I want from those I don't want, based on the only available properties; I chose the title since it's the easiest to control.
Enter the "zero-width non-joiner" (U+200C, written &zwnj; in HTML).
I added this special character as the first element of the TOC entries I don't want, then used the following test within the TOC XSL file.
<xsl:if test="not(starts-with(@title, '&#8204;'))">
This simple trick seems to do the job properly.
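A minimal sketch of the marking step, assuming the HTML is produced by a Python generator. The helper name `heading` and its `in_toc` parameter are hypothetical and not part of wkhtmltopdf; the idea is only to prefix the excluded titles with U+200C so the XSL test can skip them.

```python
from html import escape

ZWNJ = "\u200c"  # zero-width non-joiner, invisible in the rendered heading


def heading(level: int, title: str, in_toc: bool = True) -> str:
    """Return an <hN> tag; titles excluded from the TOC start with ZWNJ."""
    marker = "" if in_toc else ZWNJ
    return f"<h{level}>{marker}{escape(title)}</h{level}>"


# The sample structure from the question, with two entries marked for exclusion.
sample = "\n".join([
    heading(1, "First"),
    heading(2, "First of first", in_toc=False),   # hidden from the TOC
    heading(2, "Second of first", in_toc=False),  # hidden from the TOC
    heading(1, "Second"),
    heading(2, "First of second"),
    heading(2, "Second of second"),
])
```

Because only the TOC stylesheet filters on the marker, the marked headings should still show up in the document outline.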

Related

Create Index page for ASCII doc

I have a lot of ASCII docs at different locations and I want to create an index page that renders these documents. The condition is that the index page should list all the document links, and a document should be displayed only when the user clicks its link. I don't want to display the documents below the table of contents. I just want to display the table of contents on the index page.
Is there any way to do this?
If I understand you correctly, you wish to generate a multi-document website, but you want an index page that displays just the TOC, with the other documents served elsewhere. I believe the best way to get this effect would be to generate chunked XHTML output using the DocBook toolchain. I believe this should be possible with Asciidoctor tools, but I have only implemented this particular post-rendering toolchain with the original (Python-based) AsciiDoc rendering tool, as documented here. This setup is configurable to generate a TOC index page that links to chunked output (you can configure the level of chunking).
As you have already figured out, AsciiDoc's automated TOC generation only works on the present document, which requires including the subordinate documents to get their headings for the TOC. I can think of ways to sort of game this, such as including just the heading of each included document (include::path/to/document.adoc[lines=1]) and then hiding even those headings with CSS or something. The problem is that the links in the TOC will point internally, so you'd need to handle that somehow.
Another way is to use any of the static-site generators that support or can be readily extended to support AsciiDoc. What you're talking about is not an out-of-the-box feature that I'm aware of, but they all at least make it possible to generate an organized TOC-type navigation.
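For a small set of files, a hand-rolled script is another option. This is not one of the approaches described above, only a rough sketch under stated assumptions: each document starts its title line with `= `, each one is rendered to an `.html` file next to its source, and paths and titles contain no spaces or `]`. The function names are hypothetical.

```python
from pathlib import Path
from typing import List


def doc_title(path: Path) -> str:
    """Return the document title (first line starting with '= '), else the file stem."""
    for line in path.read_text(encoding="utf-8").splitlines():
        if line.startswith("= "):
            return line[2:].strip()
    return path.stem


def build_index(docs: List[Path], title: str = "Index") -> str:
    """Build the AsciiDoc source of an index page that only links to the documents."""
    lines = [f"= {title}", ""]
    for doc in docs:
        target = doc.with_suffix(".html").as_posix()  # assumed output location
        lines.append(f"* link:{target}[{doc_title(doc)}]")
    return "\n".join(lines) + "\n"
```

The resulting text can be written to an `index.adoc` file and rendered like any other document; it contains only the list of links, not the documents themselves.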

How do I insert front matter in latexpdf output in Sphinx

We are considering using Sphinx where I work and it appears to do everything we need. However, I am having issues getting it to match the required corporate template, which requires there to be some front matter pages inserted between the title page and table of contents.
If text is placed above the master table of contents in the .rst file, then it is placed above the TOC in the HTML output, but it is moved to below the TOC in the PDF output. I've also tried adding a hidden toctree, but that didn't work either: the content still gets placed after the non-hidden TOC.
.. toctree::
   :hidden:

   frontmatter

.. toctree::
   :maxdepth: 2

   contents_of_document
I know this has to be possible since people have published books using this tool, but I can't figure out how to do it.
I've tried this with sphinx 1.4.0 and 1.4.1. Is this something I need to add a latex sty or cls file to make it work? I would prefer not to since we would like to use both the HTML and PDF outputs.
Thanks
It looks like I need to RTFM. It is in chapter 10 of the Sphinx manual:
’tableofcontents’ “tableofcontents” call, default ’\tableofcontents’. Override if you want to generate a different table of contents or put content between the title page and the TOC.
So in order to do this, you need to learn some LaTeX, as you will have to manually (or programmatically) write the front matter separately from the reST documentation.
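As a rough sketch of that override in `conf.py`: the file name `frontmatter.tex` is hypothetical, and the default TOC macro may differ between Sphinx versions (`\tableofcontents` is the value quoted from the manual above), so check what your version uses.

```python
# conf.py (fragment): put hand-written LaTeX between the title page and the TOC.

# Copy the hypothetical front-matter file into the LaTeX build directory.
latex_additional_files = ["frontmatter.tex"]

latex_elements = {
    # Replaces the default TOC call; the front matter is emitted first.
    "tableofcontents": r"""
\input{frontmatter.tex}
\tableofcontents
""",
}
```

The HTML builder does not read `latex_elements`, so the HTML output stays as it is.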

CKEDITOR How to find and wrap text in span

I am writing a CKEDITOR plugin that needs to wrap certain pieces of text in a tag. From a webservice, I have an array of items that need to be wrapped. The array is just the plain text strings, such as:
["best buy", "horrible migraine", "eat cake"]
I need to find the instances of this text in the editor and wrap them in a span tag.
This is further complicated because the text may be marked up. So the HTML for "best buy" might be
"<strong>best</strong> buy"
but the text returned from the web service is stripped of any markup.
I started trying to use a CKEDITOR.htmlParser() object, and that seems like it is moderately successful. I am able to catch the parser.onText event and check if the text contains anything in my array.
But then I cannot modify that text. Modifications are not persisted back to the source html. So I think using the htmlParser() is a dead-end.
What is the best way to accomplish this task?
Oh, and as a bonus, I also do not want to lose my user's current cursor position when the changes are displayed.
Here is what I wound up doing and it seems to be working so far.
I created a text filter rule that searches through my array of items for any item that is contained (or partially contained) in the text. If so, it wraps the element in my span.
A drawback here is that I wind up with two spans for items with markup. But in my use case, this is tolerable.
Then I set the results using:
editor.document.getBody().setHtml(results);
Because of this, I also have to strip this markup back out when this text gets read. I do this using an elements filter on editor.dataProcessor.htmlFilter.
This seems to be working well for my (so far limited) test cases.

Best Way to Modify Another Programmer's Navigation

"Change an item in the navigation? Sure I can do that in 15 minutes."
So I am trying to update the navigation on a site that I inherited only to find out that the previous programmer was a college student and was using this site as a project of some sort. Needless to say there are zero comments and the code calls function after function and I just can't follow the logic.
I am looking for a roundabout way to update the navigation. I tried using Dreamweaver to search through all of the files in the site and look for any files that contain the name of the page or the URL (hoping to find some sort of included file). There were none. I did find text files that control the main navigation, but none for the subnavigation.
There is no database.
If it helps here is the site. http://bit.ly/jbs639
And if you want to look at the interesting text file that is parsed to create the main navigation you can find it here: http://bit.ly/m3erna
Hmmm.... Interesting indeed. You have my sympathy.
One thing that I would look at... The file that gets parsed for the main navigation appears to be a simple delimited file. Sure, the delimiter is a rather unusual +++, but that choice means it avoids conflict with things like commas that might be desirable in the link text. It looks as if the last element indicates what type of resource is being accessed (file or directory, although I don't know what - if any - effect that has on the final output). It also appears that there are similar text files (in the framework/cfg/nav/ folder... which should probably not be generally accessible BTW) for the sub-menus. (E.g. the file stores.txt appears to contain the additional navigation items associated with the stores sub-navigation).
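To illustrate how such a delimited file can be read, here is a minimal sketch. The site's actual language and field order are not known here, so the function names and the sample lines are made up; only the `+++` delimiter comes from the file described above.

```python
from typing import List


def parse_nav_line(line: str) -> List[str]:
    """Split one navigation entry on the '+++' delimiter and trim each field."""
    return [field.strip() for field in line.split("+++")]


def parse_nav(text: str) -> List[List[str]]:
    """Parse every non-blank line of a navigation file."""
    return [parse_nav_line(line) for line in text.splitlines() if line.strip()]


# Made-up sample; note that a comma in the link text causes no conflict.
sample = "Stores+++/stores/+++dir\nContact, Hours+++/contact.html+++file\n"
entries = parse_nav(sample)
```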
You don't mention which sub-menu you're trying to change. I suspect it is the "About TTO" one, which I can't find an entry for... but I'd look to see if there are any similar navigation text files in the /content/about/ folder.
Good Luck!
Of course it was as simple as a function that reads all of the files in the directory and uses the name of each file. I guess that in this case there was no shortcut.

extract xpath

I want to retrieve the xpath of an attribute (example "brand" of a product from a retailer website).
One way of doing it is to use Firefox add-ons like XPather or XPath Checker: open the website in Firefox and right-click the attribute I am interested in. This is OK, but I want to capture this information for many attributes, and right-clicking each and every attribute may be time consuming. The other problem is that some of the attributes I am interested in may be present for one product while others are present only for some other product, so I would have to go to that product and do it manually again.
Is there an automated or programmatic way of retrieving the XPath of the desired attributes from a website rather than having to do this manually?
Note that not all websites serve valid XML that you can use XPath on...
That said, you should check out some HTML parsers that will allow you to use XPath on HTML even if it is not valid XML.
Since you did not specify the technology you are working with - I'll suggest the .NET HTML Agility Pack, if you need others, search for questions dealing with this here on SO.
The solution I use for this kind of thing is to write an xpath something like this:
//*[text()="Brand"]/following-sibling::*
//*[text()="Color"]/following-sibling::*
//*[text()="Size"]/following-sibling::*
//*[text()="Material"]/following-sibling::*
It works by finding every element (the label) whose text matches and then selecting the sibling elements that follow it; append [1] to keep only the next sibling. Without a specific URL to look at, I can't help any further.
This is a generalised version. You can make it more specific by replacing the asterisks with tag names, and you can navigate differently by replacing the following-sibling axis with another one.
I use XPaths in import.io to build APIs for this kind of thing all the time. It's just a matter of finding an XPath that's generic enough to find the HTML no matter where it is on the page, yet specific enough to get the right data.
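The label-then-sibling idea can be tried out with a short script. This is a sketch under assumptions: the markup below is made-up sample data, it must be well-formed for the standard-library parser (real pages usually need an HTML parser, as noted earlier), and because `xml.etree.ElementTree` supports only a subset of XPath without the following-sibling axis, the lookup is emulated with a plain loop rather than an XPath call.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up, well-formed sample markup standing in for a product page.
PAGE = """
<table>
  <tr><td>Brand</td><td>Acme</td></tr>
  <tr><td>Color</td><td>Red</td></tr>
  <tr><td>Size</td><td>M</td></tr>
</table>
"""


def value_after_label(root: ET.Element, label: str) -> Optional[str]:
    """Emulate //*[text()="label"]/following-sibling::*[1] and return its text.

    Unlike the XPath, surrounding whitespace is stripped before comparing.
    """
    for parent in root.iter():
        children = list(parent)
        # The last child has no following sibling, so it is skipped as a label.
        for position, child in enumerate(children[:-1]):
            if (child.text or "").strip() == label:
                return (children[position + 1].text or "").strip()
    return None


root = ET.fromstring(PAGE)
attributes = {
    name: value_after_label(root, name)
    for name in ("Brand", "Color", "Size", "Material")
}
```

Labels that are missing from a page, such as "Material" in this sample, come back as `None`, which makes it easy to run the same list of labels across products that expose different attributes.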

Resources