Additional settings for wkhtmltopdf?

I am converting some docs to PDF using wkhtmltopdf (currently via Perl and the command-line version). Is it possible to change the "PDF Producer", "PDF Version" and "Fast Web View" fields? The current defaults are "wkhtmltopdf", "1.4 (Acrobat 5.x)", and "No", respectively. I didn't see anything about this on the wiki page.

Run wkhtmltopdf with --extended-help on the command line to see the supported options.
I'm not sure whether those specific parameters are supported.

I patched wkhtmltopdf to support an additional flag recently, and it would be quite easy to add parameters to change those. I don't believe they are supported currently, though.

PDF Producer: Nope. Most apps want folks to know that particular app generated the PDF.
PDF Version: Nope, but trivial. The version number at the beginning of the file is just a courtesy really. What exactly are you after with this? Chances are you don't really need it. The PDF generated isn't going to acquire any features automagically just because the PDF claims to be this version or that. It's only really used so a viewer opening a newer PDF can say something like "I don't support this version, some stuff might not work". Because everything will work regardless (unless someone happens to have a VERY old version of Acrobat/Reader), I don't see the issue.
Fast Web View: Nope, and decidedly non-trivial. "Fast Web View" means everything needed to display the first page of the PDF is sorted to the front of the file, and there are various "hints" on where an app downloading the PDF can find this or that. It's not just a flag, not by a long stretch.
Zero for three. Sorry.
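To make the point about the version number concrete: it sits in the header at the very start of the file (for example %PDF-1.4). Here is a minimal Python sketch, assuming all you want is to inspect that header. The helper name pdf_header_version and the sample bytes are made up for illustration; this is not anything wkhtmltopdf provides.

```python
import re


def pdf_header_version(data: bytes):
    """Return the version from a PDF header such as b'%PDF-1.4', or None if absent."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


# Illustrative bytes only: this is just a header, not a complete PDF file.
sample = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
print(pdf_header_version(sample))
```

For a real file you would pass something like open("out.pdf", "rb").read(16), where out.pdf is a placeholder name.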

Related

Qt application: UTF-8 strings in translations are displayed incorrectly

We are using Qt 5.5 successfully throughout our VC++ projects in VS2015.
Now I am adding i18n to them, using Qt's Linguist tools to create the strings to be translated and the resulting .qm files. I load the files through a QTranslator object; the translation itself seems to work, but the strings are displayed wrongly.
As German is my mother tongue, I have to type several umlauts, besides any other special Unicode characters I definitely want to support.
As an example, I use Linguist to translate over to über, and the resulting text in my application shows the ü garbled (something like Ã¼ber), which I can clearly recognize as an encoding mismatch.
I already had a look at the i18n example, which displays correctly for all of the provided languages, so after checking all the file encodings I still do not know what is wrong.
Does anyone have any ideas, or have the same problem, or had it and solved it? Any suggestions would be greatly appreciated!
This seems to be a Windows-specific problem.
Instead of using QString::toStdString() (which breaks the otherwise correct string), use QString::toLatin1(), at least for the languages supported so far.
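For anyone wondering what this kind of mismatch looks like at the byte level, here is a small sketch in Python (not Qt/C++), purely as an illustration. It assumes the string was encoded as UTF-8 and then interpreted as Latin-1 somewhere along the way, which is a guess about the application and not something confirmed above.

```python
original = "über"

# UTF-8 encodes "ü" as the two bytes 0xC3 0xBC.
utf8_bytes = original.encode("utf-8")

# Interpreting those two bytes as Latin-1 yields two separate characters.
misread = utf8_bytes.decode("latin-1")
print(misread)

# Reversing the wrong step recovers the original text.
repaired = misread.encode("latin-1").decode("utf-8")
print(repaired)
```

If the garbled text in your application has this two-characters-per-umlaut shape, the bytes are probably fine and only the decoding step is wrong.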

Wkhtmltopdf version, first page and TOC

Some questions for this very nifty tool, unfortunately lacking many usage examples.
The manual speaks of a possible “Reduced Functionality” for wkhtmltopdf. I have version wkhtmltox-0.11.0_rc1-installer.exe; when I run wkhtmltopdf --version, what should I look for to tell whether my version is the reduced one or not?
Currently I like wkhtmltopdf for webpages I want to read later and/or store. To mirror webpages I use httrack, then I generate the PDF with wkhtmltopdf *.html offline.pdf. How can I set/specify the first PDF page from the *.html list? Currently they seem to be converted in alphabetical order.
If I run wkhtmltopdf toc http://qt-project.org/doc/qt-4.8/qstring.html qstring.pdf I simply get a leading blank page, no TOC. What’s wrong?
Thanks for helping
EDIT:
@Nenotlep:
Your TOC trick works perfectly.
As for the first page, I don’t need an actual cover.
What I need is a way to download/convert a given page www.site.com/foo.html and all the linked pages (A.html, B.html ...) up to a certain depth level. Then I want a single PDF starting with foo.html and containing also the pages A.html, B.html ... (with relative links).
I don’t think there is an option to download and insert the linked pages in the final PDF (please correct me if I am wrong). So I use httrack.com to download and wkhtmltopdf to convert. Given the alphabetical behaviour of wkhtmltopdf, the best option for now seems to be renaming the target page, downloaded with httrack, to something like !foo.html.
Please, let me know of possible alternatives.
As for part 3 of the question (the blank TOC), the latest stable version, 0.12.5, does not generate it either. The pre-release version 0.12.6-dev has fixed this problem on Mac.
I think all available precompiled wkhtmltopdf builds are compiled with the patched Qt, so they are not reduced. "Reduced functionality" means that the binary was compiled without the special patched version of Qt. I use the Windows version and it isn't reduced.
I think the cover command line argument would work for you. I can't test at the moment, but try a command like wkhtmltopdf cover derpy.html toc --xsl-style-sheet default.xsl rarity.html twilight.html spike.html equestriadaily.pdf
At least on Linux, I think the asterisk in *.html is simply expanded by the shell into all the HTML files before the command runs, so if you select one HTML file for the cover and then use *.html in the same folder, you will get that file twice. Getting around this issue might need some command-line sorcery, a batch file, or some other trickery.
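One way to avoid both the rename trick and the duplicate-file problem is to build the file list yourself, not with *.html. This is a rough Python sketch: order_pages is a made-up helper, and the file names are the ones from the question. It assumes wkhtmltopdf takes the input pages in the order given, followed by the output file, as in the wkhtmltopdf *.html offline.pdf command above.

```python
def order_pages(html_files, first_page):
    """Put first_page at the front and keep the remaining files in alphabetical order."""
    rest = sorted(name for name in html_files if name != first_page)
    return [first_page] + rest


# Hypothetical folder contents; in practice this could come from os.listdir().
files = ["A.html", "B.html", "foo.html"]

# Argument list: input pages in the desired order, then the output PDF.
command = ["wkhtmltopdf", *order_pages(files, "foo.html"), "offline.pdf"]
print(command)
```

The resulting list can be handed to subprocess.run(command, check=True) from the folder that holds the HTML files.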
This is a bug in wkhtmltopdf. The workaround is to manually set a tocfile. You can get the default tocfile with wkhtmltopdf.exe --dump-default-toc-xsl. Then you can save the output as a file and use it like wkhtmltopdf.exe toc --xsl-style-sheet default.xsl www.stackoverflow.com so.pdf.
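If you want to script those two steps, something along these lines should work. It is only a sketch: the function names are invented here, the flags are the ones quoted above, and it assumes a wkhtmltopdf binary on the PATH. Only the command lists are built and printed when the snippet runs; nothing is executed until you call run_toc_workaround yourself.

```python
import subprocess


def toc_workaround_commands(url, output_pdf, xsl_path="default.xsl", binary="wkhtmltopdf"):
    """Return the two command lists: dump the default TOC XSL, then convert using it."""
    dump_cmd = [binary, "--dump-default-toc-xsl"]
    convert_cmd = [binary, "toc", "--xsl-style-sheet", xsl_path, url, output_pdf]
    return dump_cmd, convert_cmd


def run_toc_workaround(url, output_pdf, xsl_path="default.xsl", binary="wkhtmltopdf"):
    """Save the default TOC stylesheet to xsl_path, then run the conversion with it."""
    dump_cmd, convert_cmd = toc_workaround_commands(url, output_pdf, xsl_path, binary)
    with open(xsl_path, "wb") as xsl_file:
        # The dumped stylesheet arrives on stdout, so redirect it into the file.
        subprocess.run(dump_cmd, stdout=xsl_file, check=True)
    subprocess.run(convert_cmd, check=True)


dump_cmd, convert_cmd = toc_workaround_commands("www.stackoverflow.com", "so.pdf")
print(dump_cmd)
print(convert_cmd)
```

On Windows you would pass binary="wkhtmltopdf.exe" or a full path to it.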

Generate a business letter for 100+ leads using wkhtmltopdf

We need to print a business letter for each entry in a given list, with mail-merge facilities.
My client is not willing to spend $$ on a paid ASP.NET control to make PDFs, so I opted for wkhtmltopdf. It worked fine for us until one day the client tried to get a PDF for 100+ leads, and PDF generation failed completely. It works just fine with a 10-20 page PDF, but not for 100.
Are there any tips & tricks to improve performance? We are using Cloud-hosted IIS 7 with ASP.NET 4 if that matters.
The PDFSharp library is a really nice one!
I have used it for quite a while now, and I find it flexible enough to fulfill your needs.
However, there are some drawbacks to using it as a "standalone library": for example, creating tables is a headache and there aren't many text formatting options. It is much better to combine it with MigraDoc (an extension library for PDFSharp).
If you're looking for a really free (as in "free of worries") library, choose iTextPDF versions prior to version 4.1.7, as they state in the ByteScout blog.
From the ByteScout blog:
iTextSharp 4.1.6 DLL only: itextsharp-4.1.6-dll.zip
iTextSharp 4.1.6 Source Code (C#): itextsharp-4.1.6.zip
I'm not sure I understand your problem but couldn't you generate docx documents and get the same results?
I use http://wkhtmltopdf.org/ to convert HTML to PDF: my ASP.NET code generates the HTML file, then I convert that HTML to PDF, and it is done. That is much easier than using itextpdf's table and td structure to get things laid out properly. I found it easy and fast once you get your content aligned properly.
The library has improved since the original question was asked, and it performs better now.
Here is a good tutorial: http://www.codeproject.com/Articles/20640/Creating-PDF-Documents-in-ASP-NET

Ghostscript prints question mark when converting digitally signed PDF to image

Ghostscript 9.0 doesn't support validation of digital signatures in a PDF document when doing PDF-to-image conversion. Instead, there's a question mark on the digital signature, and Ghostscript reports "Sig is not yet implemented". I'm thinking of modifying the source code to get rid of the question mark, but I have no idea where in the source code to make the change. Could anyone give me some hints? Any response will be highly appreciated, thanks.
Have you already tested the very latest release (which is v9.02)? If so, have you also tested the current 'HEAD' revision of their source code?
If your problem persists with these versions, the first step is to download the current (v9.02) Ghostscript source code from here, or check it out from their Git repository.
What you are trying to do can only be located in one of the following two modules of the Ghostscript source code:
the (PDF) interpreter
the (image) output devices
So I would first recursively grep the sources for "not yet implemented" or similar expressions, taking into account that there may even be line breaks within the string. (I doubt the quote you gave in the initial version of your question is accurate, because it contained at least one typo.)
If I didn't find anything in that first step, I'd get in touch with the Ghostscript developers themselves. They usually hang around in IRC on Freenode, channel #ghostscript. In general they are a very friendly and helpful bunch, and they'll surely be able to give you some hints about how to solve your problem if you know how to ask...
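A plain grep -r will miss the phrase if it is wrapped across lines, so here is a rough Python sketch of the search step that tolerates that. The names phrase_pattern and find_phrase are made up for this example, and it only handles whitespace or newlines between the words; a C string split into adjacent quoted literals would need a looser pattern.

```python
import re
from pathlib import Path


def phrase_pattern(phrase):
    """Compile a case-insensitive regex that allows any whitespace, including newlines, between words."""
    words = [re.escape(word) for word in phrase.split()]
    return re.compile(r"\s+".join(words), re.IGNORECASE)


def find_phrase(root, phrase):
    """Walk root recursively and return (path, line number) for every match of phrase."""
    pattern = phrase_pattern(phrase)
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Latin-1 maps every byte to a character, so decoding never fails on odd files.
        text = path.read_text(encoding="latin-1")
        for match in pattern.finditer(text):
            line_no = text.count("\n", 0, match.start()) + 1
            hits.append((str(path), line_no))
    return hits


# Quick check on an in-memory string with a line break inside the phrase.
print(bool(phrase_pattern("not yet implemented").search("Sig is not yet\n  implemented")))
```

Calling find_phrase("ghostscript-src", "not yet implemented") would then list candidate files and line numbers, where ghostscript-src is a placeholder for wherever you unpacked the sources.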

Saving PDF files with Chickenfoot

I'm writing a web-crawler using Chickenfoot and need to save PDF files. I can either click the link on the page or grab the PDF's URL and use
go("http://www.whatever.com/file.pdf")
and I get the Firefox "Opening file.pdf" dialog box, but can't click the "OK" button to actually save the file.
I've tried using other means to download the files (wget, Python's urllib2, twill), but the PDF files are gated so none of those will work.
Any help is appreciated.
This example of how to save a target in the Mozilla developer documents looks like it should do exactly what you want. I've tested a Chickenfoot example that is very similar that gets the temp environment variable, and that worked well for me in Chickenfoot.
https://developer.mozilla.org/en/XPCOM_Interface_Reference/nsIWebBrowserPersist#Example
You might have to play with the application associations in Tools, Options, Applications to make sure the action is set to Save File, but those settings might not apply to these functions.
End Answer, begin related grumblings...
I sure wish someone would fix the many bugs in Chickenfoot, and write a nice Cookbook programming guide. I've been using it for years, and there are still many basic things I've not been able to figure out how to do. I finally broke down and subscribed to the mailing list, as the archives have some decent script examples. It takes a lot of searching through the pdf references, blogs, etc. as the web API reference is very sparse.
I love how simple Chickenfoot can make automating some tasks, but it takes me days of searching javascript, DOM, and Firefox documents to find ways to do some of the things it can't, since I'm not really a web programmer. The goal of Chickenfoot seems to be that I shouldn't have to be, but unfortunately few are refining the proof of concept, as MIT has dropped the project.
I tried to do this several ways using only Chickenfoot commands and confirmed they don't work with the latest Firefox 3 and Chickenfoot 1.0.7.
I hope this helps! Good luck. Sorry I only ran across your question yesterday, but found it too interesting to leave alone.
You won't be able to click on Firefox dialogs for the sake of security.
The best way to download the content of a URL is to read then write the content of the URL.
// Chickenfoot 1.0.7 Javascript Code to download the content of a url.
include( "fileio.js" ); // enables the write function.
var url = "http://google.com",
saveFileTo = "c://chickenfoot-google.com";
write( saveFileTo, read( url ) );
You might find it helpful to use jQuery with Chickenfoot.
http://groups.csail.mit.edu/uid/chickenfoot/scripts/index.php?title=Using_jQuery,_jQuery_UI_and_similar_libraries
This has worked for me to save Excel files from NCES portal.
http://muaz-khan.blogspot.com/2012/10/save-files-on-disk-using-javascript-or.html
I was using Firefox 3.0 and the "old syntax" version of the code. I also stripped code intended for IE and "(window.URL || window.webkitURL).revokeObjectURL(save.href);" which generated an error.

Resources