Convert HTML form to PDF - ruby

How do I convert an HTML form to PDF? I would like to use Prawn for this.
Pointing to any relevant links or examples would be very helpful.

Why would you want to limit yourself to a technology (Prawn) not appropriate to the task (it's not geared towards using HTML to generate the PDF)?
You might want to check out PDFKit instead, as it seems specifically designed to create PDFs from HTML, using powerful existing libraries.
Super short version (two lines!):
kit = PDFKit.new("http://google.com")
kit.to_file('/path/to/save/google.pdf')
Read more about it here:
http://thinkrelevance.com/blog/2010/06/15/rethinking-pdf-creation-in-ruby.html
Check out the RailsCast: http://railscasts.com/episodes/220-pdfkit
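If the form lives in your own application rather than at a public URL, PDFKit.new also accepts an HTML string. A minimal sketch, assuming the submitted values arrive as a plain hash; the FORM_TEMPLATE constant, the helper names form_to_html and html_to_pdf, and the sample field names are made up for illustration, and html_to_pdf needs the pdfkit gem plus a wkhtmltopdf binary:

```ruby
require "erb"

# Hypothetical template: renders the submitted values as a simple table.
FORM_TEMPLATE = ERB.new(<<~HTML)
  <html>
    <body>
      <h1>Submitted form</h1>
      <table>
        <% fields.each do |label, value| %>
        <tr><th><%= ERB::Util.html_escape(label) %></th><td><%= ERB::Util.html_escape(value) %></td></tr>
        <% end %>
      </table>
    </body>
  </html>
HTML

# Builds an HTML document from a hash of field labels and values.
# Labels and values are HTML-escaped so user input cannot break the markup.
def form_to_html(fields)
  FORM_TEMPLATE.result_with_hash(fields: fields)
end

# Hands the HTML string to PDFKit and writes the PDF to the given path.
# Assumes the pdfkit gem and wkhtmltopdf are installed.
def html_to_pdf(html, path)
  require "pdfkit"
  PDFKit.new(html).to_file(path)
end

# Usage (the field names and output path are placeholders):
#   html = form_to_html("Name" => "Ada Lovelace", "Email" => "ada@example.com")
#   html_to_pdf(html, "/path/to/save/form.pdf")
```

Keeping the HTML step separate from the PDF step means you can inspect the markup in a browser before worrying about the PDF output.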


Migrate from bookdown to pure Pandoc: split the HTML output in one page per section

I have a book project in RMarkdown, but since I do not use Knitr or other RMarkdown specific features I am considering switching to pure Pandoc to remove the R burden from the dependencies.
PDF and ePub output seem straightforward to me, but I have some trouble with the HTML output: Pandoc generates a single HTML file containing the entire book.
With Bookdown I used the gitbook HTML output, which generates a page for each section; each page has the complete TOC in the left sidebar and its footnotes and partial bibliography at the bottom.
To achieve this I thought of writing one md file per section and converting them one by one with Pandoc (for the HTML output, and merging them into a single file for converting to PDF and ePub), but that way I cannot have references across sections, a full bibliography at the end, or an easily generated TOC.
So my question is if there is an easy way (e.g. a Pandoc filter or a script) to generate an HTML book (similar to gitbook in behavior, the style doesn't matter) without installing R and Bookdown?
Pandoc follows the philosophy of only writing files that have been explicitly specified on the command line. This is why no such feature is built in.
It would be possible to do what you want with the help of a custom writer. The basics would be doable in a few lines of Lua code, but it's likely that you'd have to implement all the bookdown features yourself.
The best (IMHO) alternative is to use Quarto, a standalone tool built on top of pandoc, created in part by the authors of bookdown. That way you can remove R from your dependencies but retain the features of bookdown -- and more.

Asciidoc: how to get page headers & footers?

Is there a correct way to get Asciidoc to include headers and footers?
I am trying to work out whether Asciidoc is a serious contender for printed material. I know that it is supposed to be docbook compatible, but I can’t find out how to create chapters, headers and footers.
I am trying to create instructional material. Currently I am using Atom with the asciidoc plugin to create the text, and Marked 2 on the Mac to get a better look and to export it to PDF.
Running page headers and footers are not part of the AsciiDoc language but of the tool you use for PDF conversion. As I see it, you have (at least) 2 options:
1. Follow the instructions in Exporting Print/PDF of your Marked 2 user manual to create page headers and footers (this might turn out to be difficult when using the AsciiDoc processor instead of MultiMarkdown):
"You can specify headers and footers on a per-document basis using MultiMarkdown metadata at the very top of the document"
2. Since you probably have Asciidoctor installed anyway to support AsciiDoc in Marked 2, you could use an Asciidoctor PDF theme to generate a PDF with headers/footers using Asciidoctor PDF. You would have to find an appropriate theme or create one yourself, though.
The most frequently used way to generate PDF output, however, seems to be generating DocBook output first and converting that to PDF using dblatex with DocBook XSL stylesheets (see AsciiDoc homepage). Maybe someone else can say more about that.

Convert HTML to PDF using TCPDF

I need to convert HTML into PDF on the fly. I've been using the old HTMLDoc library for a while, but now I need to print SVG graphics, and HTMLDoc supports neither SVG nor base64-encoded images.
So far it seems to me that TCPDF (or a tool based on it) is a good way. The only problem is that I don't want to "build" the PDF document in PHP, as the HTML is dynamic.
Is there any way to write a script that simply takes a portion of HTML page and returns a PDF? That would imply some sort of way to specify start/stop markers in the HTML page (just like in HTMLDoc).
Thanks for any advice,
Thomas
TCPDF has very limited CSS support and some bugs with " quotes, so unfortunately it is not an option for converting arbitrary HTML to PDF.
TCPDF does not support all CSS functionality. mPDF 6.0 is the best PDF library I know of for converting HTML to PDF, and it supports almost all of the CSS you are likely to need. I prefer htmlcanvas jquery for converting HTML to an image.

Passing Markdown Content to Ruby Function With Jekyll/Liquid

I am trying to write a jekyll plugin that will take a normal markdown file and provide some extra functionality on top of it. In particular, I need to do some (not actually) fancy things with tables. I know you can write straight HTML into a markdown file, but there is a requirement that the content folks don't want to / can't edit HTML.
As an extra wrench in the works, the mobile layout has a UX requirement that I essentially have to render the table as a group of divs as opposed to a table.
My initial thought was to pass the {{page.content}} variable to a ruby function extending Liquid::Tag. From there I was planning on parsing the markdown file and either:
1. If normal non-table markdown, use as normal
2. If table markdown, look for custom identifier in markdown, do what needs to be done (e.g. add class, etc)
If I do something like this:
def render(context)
  content = Liquid::Template.parse(@markup).render(context)
end
It renders the context as a normal markdown file. However, I want to break up the context variable and work with the pieces before rendering. I've tried a few different approaches that I've gotten from the jekyll docs and Stack Overflow and gotten nowhere.
Does anyone have any experience with this? Am I heading down the right path? For what it's worth, Ruby/Jekyll/Liquid is fairly new to me, so if you think I may have missed something fairly basic and obvious then please let me know.
A Markdown table tool for editors!
markdownify your table in http://www.tablesgenerator.com/markdown_tables
paste the markdown result in http://prose.io/
done
I don't know of any other way to simplify an editor's work on Jekyll, but I'll be very interested in hearing about your project. Good luck.
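For the plugin route described in the question, the "break up the content and work with the pieces" step can be sketched in plain Ruby. This is only a rough illustration, assuming pipe-style Markdown tables with a separator row; the names split_tables, table_to_divs and the CSS classes are hypothetical, and how you obtain the raw Markdown inside the plugin is left open:

```ruby
require "cgi"

# Matches a Markdown table separator row such as "| --- | :---: |".
SEPARATOR_ROW = /\A\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)*\|?\s*\z/

# Splits one table row into its trimmed cell strings.
def split_row(line)
  line.strip.sub(/\A\|/, "").sub(/\|\z/, "").split("|").map(&:strip)
end

# Splits Markdown text into [:text, lines] and [:table, lines] chunks.
# A table starts at a line containing "|" that is followed by a separator row.
def split_tables(markdown)
  lines = markdown.lines.map(&:chomp)
  chunks = []
  i = 0
  while i < lines.length
    if lines[i].include?("|") && lines[i + 1] && lines[i + 1] =~ SEPARATOR_ROW
      table = [lines[i], lines[i + 1]]
      i += 2
      while i < lines.length && lines[i].include?("|")
        table << lines[i]
        i += 1
      end
      chunks << [:table, table]
    else
      if chunks.last && chunks.last[0] == :text
        chunks.last[1] << lines[i]
      else
        chunks << [:text, [lines[i]]]
      end
      i += 1
    end
  end
  chunks
end

# Renders a table chunk as nested divs, repeating the column label in each
# cell so a mobile layout can stack the cells vertically.
def table_to_divs(table_lines, css_class: "md-table")
  header = split_row(table_lines[0])
  rows = table_lines.drop(2).map { |line| split_row(line) }
  body = rows.map do |cells|
    pairs = header.zip(cells).map do |label, cell|
      %(<div class="cell"><span class="label">#{CGI.escapeHTML(label)}</span> #{CGI.escapeHTML(cell.to_s)}</div>)
    end
    %(<div class="row">#{pairs.join}</div>)
  end
  %(<div class="#{css_class}">#{body.join}</div>)
end
```

The :text chunks can then go through the normal Markdown converter unchanged, while each :table chunk is replaced by the output of table_to_divs. Cell contents are escaped rather than rendered, so inline Markdown inside cells would need extra handling.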

What algorithms could I use to identify content on a web page

I have a web page loaded up in the browser (i.e. its DOM and element positioning are both accessible to me) and I want to find the block element (or a sorted list of these elements), which likely contains the most content (as in a continuous block of text). The goal is to exclude things like menus, headers, footers and such.
This is my personal favorite: VIPS: a Vision-based Page Segmentation Algorithm
First, if you need to parse a web page, I would use HTMLAgilityPack to transform it into XML. That will speed everything up and let you go directly to the BODY with a simple XPath.
After that, you have to iterate over all the divs (you can get all the DIV elements as a list from the Agility Pack) and pick out whatever you want.
There's a simple technique for this, based on analysing how "noisy" the HTML is, i.e., the ratio of markup to displayed text across an HTML page. The Easy Way to Extract Useful Text from Arbitrary HTML describes this technique and gives some Python code to illustrate it.
Cf. also the HTML::ContentExtractor Perl module, which implements this idea. If you want to use it, it would make sense to clean the HTML first, using BeautifulSoup.
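To give a feel for the markup-to-text ratio idea, here is a rough sketch in Ruby. It is a simplification and not the code from the linked article or the Perl module: it scores each line of HTML by the share of characters left after stripping tags with a crude regex, and the names text_density and content_lines and both thresholds are assumptions chosen for illustration:

```ruby
# Crude tag matcher; good enough for a sketch, not a real HTML parser.
TAG = /<[^>]*>/

# Share of a line's characters that remain once the tags are removed.
def text_density(line)
  stripped = line.strip
  return 0.0 if stripped.empty?
  text = stripped.gsub(TAG, "").strip
  text.length.to_f / stripped.length
end

# Returns the text of every line that is both dense in text and long enough
# to look like content rather than a menu entry or footer link.
def content_lines(html, min_density: 0.5, min_length: 40)
  html.lines.each_with_object([]) do |line, kept|
    text = line.gsub(TAG, "").strip
    next if text.length < min_length
    next if text_density(line) < min_density
    kept << text
  end
end
```

Since the DOM is available in the browser case from the question, the same score could be computed per block element instead of per line, and the elements sorted by it.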
I would recommend Vit Baisa's thesis on Web Content Cleaning. I think he has some code too, but I can't find a link to it. There is also a discussion of the very same problem on the natural language processing LingPipe blog.
