How do I create a PDF from HTML (DOM) content while respecting the page's CSS styles?
The HTML contains formulae rendered by MathJax (JavaScript). I would like the PDF to show the formulae exactly as they appear in a browser.
The HTML also contains a few images.
Which open-source Java library can create such a PDF from HTML content?
I looked into Apache PDFBox but did not find a solution.
The solution has to be an open-source library that can be used in a commercial application.
EDIT
The DOM content is generated dynamically on the server side and is not pushed to a browser.
Flying Saucer is close to what I need. However, it does not support JavaScript, so HTML with MathJax cannot be (properly) converted to PDF.
There are various PDF renderers based on WebKit that support JavaScript. The best I've found is the screen capture feature of PhantomJS.
http://phantomjs.org/screen-capture.html
You'll have to write a bit of code though, and make sure you don't take the screen grab until the JS has finished doing its thing.
Update
Here's a really simple example that reads HTML from stdin and saves a PDF file to disk:
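// render.js: read HTML from stdin, give the page's scripts time to run, save a PDF
var page = require('webpage').create();
page.content = require('system').stdin.read();
setTimeout(function () {
    page.render('export.pdf'); // the .pdf extension selects PDF output
    phantom.exit(0);
}, 100); // <- wait for JS; MathJax may need considerably longer than 100 ms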
You can execute this from whatever server-side runtime you're using, as long as it can execute a shell. To test it from the command line with a static file, you can do:
~/bin/phantomjs render.js < sample.html
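A fixed 100 ms timeout is a guess, and MathJax typesetting (especially when MathJax itself is loaded from a CDN) can easily take longer. The sketch below polls until MathJax reports it is done instead. It assumes the page uses MathJax 2.x, whose `MathJax.Hub.Register.StartupHook('End', callback)` signal fires after the initial typesetting pass, and PhantomJS's `webpage` and `system` modules. The `waitFor` helper, the `__mathJaxHooked`/`__mathJaxDone` flags, the A4 paper size and the 10-second limit are illustrative choices, not part of the answer above.

```javascript
// render-mathjax.js: a sketch that waits for MathJax (assumed v2.x) before rendering.
// waitFor() polls isReady() every intervalMs ms; it calls done(true) once
// isReady() returns true, or done(false) after timeoutMs ms. ES5 only, so it
// also runs in PhantomJS's older JavaScript engine.
function waitFor(isReady, done, timeoutMs, intervalMs) {
    var start = Date.now();
    var timer = setInterval(function () {
        if (isReady()) {
            clearInterval(timer);
            done(true);
        } else if (Date.now() - start >= timeoutMs) {
            clearInterval(timer);
            done(false);
        }
    }, intervalMs);
}

// Only run the PhantomJS part when the script is executed by PhantomJS.
if (typeof phantom !== 'undefined') {
    var page = require('webpage').create();
    page.paperSize = { format: 'A4', orientation: 'portrait', margin: '1cm' };
    page.content = require('system').stdin.read();

    waitFor(function () {
        // Runs inside the page: hook MathJax's "End" startup signal
        // (MathJax v2 API; assumed behavior) and report whether it has fired.
        return page.evaluate(function () {
            if (typeof MathJax === 'undefined' || !MathJax.Hub) {
                return false;
            }
            if (!window.__mathJaxHooked) {
                window.__mathJaxHooked = true;
                MathJax.Hub.Register.StartupHook('End', function () {
                    window.__mathJaxDone = true;
                });
            }
            return window.__mathJaxDone === true;
        });
    }, function (ready) {
        if (!ready) {
            console.log('MathJax did not finish in time; rendering anyway');
        }
        page.render('export.pdf');
        phantom.exit(ready ? 0 : 1);
    }, 10000, 100);
}
```

It is run the same way (`~/bin/phantomjs render-mathjax.js < sample.html`). If the HTML contains no MathJax at all, this version waits for the full timeout, still writes export.pdf, and exits with status 1 so the calling server code can notice.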
I've tried Webbrowser1.Document.body.OuterHTML and Webbrowser1.Document.body.innerHTML, but both are missing the JS links and CSS styles. Is there any way to get the full HTML? It seems to grab only the HTML inside the body, not the full source.
Found out...
WebBrowser1.Document.documentElement.outerHTML
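One detail worth knowing: `documentElement.outerHTML` covers the `<html>...</html>` element only, so the `<!DOCTYPE ...>` line (a separate DocumentType node) is not included, and the markup reflects the current DOM after scripts have run rather than the original server response. The sketch below is JavaScript for a standard browser DOM, assuming the document exposes `document.doctype` with `name`, `publicId` and `systemId`, as modern browsers do; `doctypeToString` and `fullPageSource` are hypothetical helper names.

```javascript
// Rebuild the <!DOCTYPE ...> line from a DocumentType node (or return '' if
// the document has none).
function doctypeToString(dt) {
    if (!dt) {
        return '';
    }
    var s = '<!DOCTYPE ' + dt.name;
    if (dt.publicId) {
        s += ' PUBLIC "' + dt.publicId + '"';
    }
    if (dt.systemId) {
        s += (dt.publicId ? '' : ' SYSTEM') + ' "' + dt.systemId + '"';
    }
    return s + '>';
}

// Serialize the whole current document: the doctype (if any) followed by the
// <html> element, including <head> with its <script> and <link> tags.
function fullPageSource(doc) {
    var doctype = doctypeToString(doc.doctype);
    return (doctype ? doctype + '\n' : '') + doc.documentElement.outerHTML;
}
```

In a browser this would be called as `fullPageSource(document)`.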
I have a site - www.jcrocetta.com.
On this site I have two PDF files. One has blurred data and the other is clear; both were created with pdftk.
To blur out some personal data in the PDF I used Inkscape, but Inkscape only opens/edits one PDF page at a time. After editing, I saved each page from Inkscape as a PDF, which left me with three separate PDF files (pages 1 through 3). I then used pdftk to concatenate the three files into one.
The final pdftk-produced files are on www.jcrocetta.com. Just click the public information button.
In Chrome, viewing inline works fine.
Downloading the file from Firefox works fine too.
But when the file is viewed inline in Firefox, it renders as blank pages. How can I fix this?
Also, PDF files that were not produced with pdftk render correctly in both Chrome and Firefox.
Thanks for your help.
Firefox has a lovely new feature: it now uses the PDF.js library to render PDF files instead of calling out to an Adobe Reader plugin or forcing you to save the file to disk. Unfortunately, it seems that PDF.js isn't quite perfect yet. A quick search shows that other people have the same issue, but the only "solution" I've seen offered boils down to "file a bug report at https://github.com/mozilla/pdf.js/issues or https://bugzilla.mozilla.org/enter_bug.cgi?product=Firefox&component=PDF+Viewer".
Also: do the three individual PDF files render in Firefox before you use pdftk to concatenate them?
Has anyone found a way to test PDFs with Ruby within the browser? I have tried a few different approaches, and the only way I have gotten any PDF testing to work is to save the PDF and read it with the pdf-reader gem. That only works for PDFs whose link, when clicked, opens a dialog box with the option to open or save the file. Unfortunately, I have not found a way to do anything similar with PDFs that open directly in the browser, with no dialog offering to save them. Any ideas?
Maybe testing it in the browser isn't the best way. When you say "test the PDF", what are you trying to do? I wouldn't test the PDF in the browser if I were you.
Try Docsplit if you want to verify its contents.
Docsplit is a command-line utility and Ruby library for splitting apart documents into their component parts: searchable UTF-8 plain text via OCR if necessary, page images or thumbnails in any format, PDFs, single pages, and document metadata (title, author, number of pages...)
You are not inventing a browser or a PDF generator.
Use unit tests to check that your back-end modules can take data in and write a PDF out, then serve the PDF from a website and let the browser do its thing. Test (with what Rails calls a "functional test") that the MVC produces a web page containing a link to the PDF, and you are done.
You can use the mechanize gem to download an online PDF (the one shown in the browser) to your computer and then read it with the pdf-reader gem.
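Whether a browser shows a save dialog or displays a PDF inline is normally decided by the server's `Content-Disposition` header (`attachment` vs. `inline` or absent) together with the user's browser settings; the bytes are the same either way, which is why fetching the URL directly, as with mechanize above, works for both cases. As a JavaScript analogue of that approach (not the Ruby gems themselves), this sketch assumes Node.js, a response body in a `Buffer`, and lower-cased header names as Node's `http` module provides; `inspectPdfResponse` is a hypothetical helper.

```javascript
// Checks a test could run on a fetched PDF response instead of driving the browser.
function inspectPdfResponse(headers, body) {
    var type = (headers['content-type'] || '').toLowerCase();
    var disposition = (headers['content-disposition'] || '').toLowerCase();
    return {
        // Did the server declare the response as a PDF?
        declaredPdf: type.indexOf('application/pdf') === 0,
        // A conforming PDF file begins with the "%PDF-" header.
        looksLikePdf: body.length >= 5 && body.toString('latin1', 0, 5) === '%PDF-',
        // "attachment" usually triggers a download dialog; otherwise the
        // browser is free to show the file inline.
        disposition: /^\s*attachment/.test(disposition) ? 'attachment' : 'inline'
    };
}
```

Once these checks pass, the saved bytes can be handed to a PDF parser for content assertions, which is the role pdf-reader plays in the Ruby setup.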
I want to write a viewer that converts InDesign output to HTML5, so that everything a user designs in Adobe InDesign can be displayed in a browser, but I do not know which output format is suitable. I think I can retrieve all the information about the InDesign document from the IDML export, but the problem is parsing that XML and displaying the tags as HTML5. Is there a simple way to convert the output format into HTML5?
Is it possible to download the Adobe InDesign SDK and use its methods for this purpose?
You can use in5 to export HTML5 (layout intact) from InDesign.
Full disclosure: I am the creator of in5.
Exporting to EPUB would result in XHTML 1.1. The EPUB file that InDesign generates is a zip file containing a number of files, (at least) one of which is an XHTML file.
XHTML 1.1 would surely be an easier source to work from than the IDML; however, you will have to make sure that the EPUB export is good enough to start with (the pages won't come out exactly the same as in InDesign).
Would that be a solution?
EPUB export is supported from InDesign CS4 onward: in CS4 as a JavaScript-based export option outside the object model (as I understand it), and from CS5 as a built-in export option that is part of the object model.
You don't mention which version of InDesign you are using. CS5, CS5.5 and CS6 all let you export to HTML. The problem is that the HTML is version 4 and the CSS it creates is badly written. What I like to do is use XML to build my own HTML: just create the set of HTML5 tags you want to use and then map the existing paragraph and character styles to those XML tags.
When you're done you will have a basic content structure. Then I use the Structure pane to add elements as needed; you can add parents or children right there and then export to XML. When you save the file, just change its extension to .html and edit the code to remove the one reference to "xml".
It takes a little time, but it is very doable.
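The final clean-up step can be scripted. This sketch assumes that the "one reference to xml" is the XML declaration at the top of the export (for example `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>`) and that the structure has already been tagged so the root element is `html`; `xmlExportToHtml5` is a hypothetical helper name.

```javascript
// Turn an InDesign XML export (already tagged with HTML5 element names)
// into an HTML5 document string.
function xmlExportToHtml5(xml) {
    // Remove an optional byte-order mark, the XML declaration, and the
    // whitespace that follows it; input without a declaration is unchanged.
    var markup = xml.replace(/^\uFEFF?\s*<\?xml[^?]*\?>\s*/, '');
    // Prepend the HTML5 doctype so browsers render in standards mode.
    return '<!DOCTYPE html>\n' + markup;
}
```

The result can then be saved with a .html extension, which replaces the manual rename-and-edit step.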
I have an old program that shows an embedded browser using the HTML Rendering Library from Carbon. I am migrating it from CodeWarrior to Xcode, using the 10.4 SDK. While the HTML is displayed correctly, including links, the images just don't show up. I can see the alt text, and the image dimensions are correctly taken from the width and height attributes.
I am doing the initialization with:
OSErr err = HRNewReference(m_HRRef, kHRRendererHTML32Type,
GetWindowPort((WindowRef) m_pWindow));
And then I open my local HTML file with:
err = HRGoToFSRef(m_HRRef, &f, false, false);
My images are also stored locally but simply do not appear. This worked fine previously in my PPC-only CodeWarrior build.
I tried web pages on the Internet with HRGoToURL, and I tried replacing my images' src attributes with http:// or file:// links to JPG, GIF and PNG images, always with the same result.
Are you aware of any issue like this? I know I could, and probably should, migrate to WebKit, but that would be more involved.
Sylvain
This is a shot in the dark (I am completely unfamiliar with HTMLRenderingLib), but it reminds me of this. Maybe here, too, images (even local ones) are loaded asynchronously. Have you tried letting the run loop run (whichever way is most appropriate for your app: WaitNextEvent, returning to the main run loop, spinning a sub event loop, …) to see whether the images then load?