Save desired pdf page from pdf file using ruby


I have a PDF file. I want to save a given page (page #5, for instance) as a separate PDF file. How do I accomplish this?
So far I've used the pdf-reader gem, but it is only suitable for reading PDFs (though I can get to the page I want), and the prawn gem, which is only for writing PDFs (I can only create an empty PDF file).

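This was answered here: "Statically compile pdftk for Heroku. Need to split PDF into single page files".
Try something like:
require 'prawn'
require 'prawn/templates' # newer Prawn versions moved template support into the prawn-templates gem
# skip_page_creation avoids an empty first page in front of the copied page
Prawn::Document.generate('new.pdf', skip_page_creation: true) do |pdf|
  # Import page 5 of input.pdf as the new document's page
  pdf.start_new_page(template: 'input.pdf', template_page: 5)
end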

I don't know what system you are on, but on my Ubuntu 12.04 box the pdftk program worked:
pdftk A=your_beautiful.pdf cat A3 output page3.pdf
So you could call it from Ruby with backticks. For more on extracting pages with pdftk, read up on http://www.linuxjournal.com/content/tech-tip-extract-pages-pdf
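A minimal sketch of the backtick idea, using `Kernel#system` with separate arguments instead of backticks so that file names containing spaces are not split by a shell. It assumes the `pdftk` binary is installed and on the `PATH`; the helper names `pdftk_extract_args` and `extract_page_with_pdftk` are hypothetical.

```ruby
# Hypothetical helpers wrapping the pdftk command shown above.
# Assumes the pdftk binary is installed and on the PATH.

# Build the argument list for: pdftk A=<input> cat A<page> output <output>
def pdftk_extract_args(input, page, output)
  unless page.is_a?(Integer) && page.positive?
    raise ArgumentError, "page must be a positive integer, got #{page.inspect}"
  end

  ['pdftk', "A=#{input}", 'cat', "A#{page}", 'output', output]
end

# Run pdftk without a shell. system returns nil if pdftk cannot be started
# and false if it exits with a non-zero status; both cases raise here.
def extract_page_with_pdftk(input, page, output)
  args = pdftk_extract_args(input, page, output)
  return output if system(*args)

  raise "pdftk failed: #{args.join(' ')}"
end

# Usage: save page 5 of input.pdf as page5.pdf
# extract_page_with_pdftk('input.pdf', 5, 'page5.pdf')
```

Passing the arguments as an array means the file names are handed to pdftk verbatim, with no shell quoting to get wrong.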

Related

Migrate from bookdown to pure Pandoc: split the HTML output in one page per section

I have a book project in RMarkdown, but since I do not use Knitr or other RMarkdown-specific features, I am considering switching to pure Pandoc to remove R from the dependencies.
For PDF and ePub output everything seems straightforward to me, but I have some trouble with the HTML output: Pandoc generates a single HTML file with the entire book.
With Bookdown I used the gitbook HTML output, which generates a page for each section; each page has the complete TOC in the left sidebar and its footnotes and partial bibliography at the bottom.
To achieve this, I thought of writing an md file for each section and converting them one by one with Pandoc for the HTML output (and merging them into one file for the PDF and ePub conversion), but that way I cannot have references across sections, a full bibliography at the end, or an easily generated TOC.
So my question is whether there is an easy way (e.g. a Pandoc filter or a script) to generate an HTML book (similar to gitbook in behavior; the style doesn't matter) without installing R and Bookdown.

Pandoc follows the philosophy of only writing files that have been explicitly specified on the command line. This is why no such feature is built in.
It would be possible to do what you want with the help of a custom writer. The basics would be doable in a few lines of Lua code, but you would likely have to reimplement all the bookdown features yourself.
The best alternative (IMHO) is Quarto, a standalone tool built on top of pandoc and created in part by the authors of bookdown. That way you can remove R from your dependencies but retain the features of bookdown, and more.

fenced_divs pandoc extension in RMarkdown

Is there a way, either in YAML or within an R script/Rmd, to turn on the fenced_divs pandoc extension?
If possible, I would prefer being able to turn on fenced_divs without having to specify it inside each individual output format in the YAML block but rather once, globally.
The reason is that I want to have within-document links to items that are not headers using the same code for .docx and .html.
Thanks.

How can I make an image start on a new line when using the DITA-OT 3.0 PDF plugin to output PDF?

I hope someone can help me, thanks!
When I use DITA-OT 3.0 to output PDF, I find that in many places the images are displayed on the same line as the text.
I try two plugins:
- default pdf2 plugin
- a customized PDF plugin created with the PDF Plugin Generator at https://github.com/jelovirt/dita-generator (I did not find any place to set an image attribute)
Here is my question. I write XML like this:
[Screenshot: source XML file]
When I output the PDF, I get this:
[Screenshot: the image is shown on the same line as the text]
Many other images are shown like that. How can I make an image start on a new line? Am I misusing the element? Is it a matter of inline versus block elements?

You should be able to do this by setting the attribute placement="break" to the <image> element. This is explained in the spec.

How to avoid img size tags on markdown when converting docx to markdown?

I'm converting docx files using pandoc 1.16.0.2 and everything works great, except that right after each image the size attributes show up as text in the output:
![](./media/media/image4.png){width="3.266949912510936in"
height="2.141852580927384in"}
So it shows the image fine in the md but also the size tag as plain text right behind/after/below each image. The command I'm using is:
pandoc --extract-media ./media2 -s word.docx markdown -o exm_word2.md
I've read the manual as best I can but don’t see any flags to control this. Also, most search results are about people who want to keep the attributes and control them.
Any suggestions to kill the size attributes or is my markdown app (MarkdownPad2 - v-2.5.x) reading this md wrong?

Pass -w gfm on the command line to omit the image dimensions.

You could write a filter to do this. You'll need to install panflute. Save this as remove_img_size.py:
import panflute as pf

# Remove the width/height attributes pandoc attaches to images from .docx
def change_md_link(elem, doc):
    if isinstance(elem, pf.Image):
        elem.attributes.pop('width', None)
        elem.attributes.pop('height', None)
        return elem

if __name__ == "__main__":
    pf.run_filter(change_md_link)
Then compile with
pandoc word.docx -F remove_img_size.py -o exm_word2.md

There are two ways to do this: either remove all image attributes with a Lua filter or choose an output format that doesn't support attributes on images.
Output format
The easiest (and most standard-compliant) method is to convert to commonmark. However, CommonMark allows raw HTML snippets, so pandoc tries to be helpful and creates an HTML <img> element for images with attributes. We can prevent that by disabling the raw_html format extension:
pandoc --to=commonmark-raw_html ...
If you intend to publish the document on GitHub, then GitHub Flavored Markdown (gfm) is a good choice.
pandoc --to=gfm-raw_html ...
For pandoc's Markdown, we have to also disable the link_attributes extension:
pandoc --to=markdown-raw_html-link_attributes ...
This last method is the only one that works with older (pre-2.0) pandoc versions; all other suggestions here require newer versions.
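When pandoc is driven from a Ruby script, the `FORMAT-extension` syntax above can be assembled programmatically. A minimal sketch; the helper name `pandoc_format` is hypothetical, and it only builds the string (a `+ext` suffix enables an extension, a `-ext` suffix disables one):

```ruby
# Hypothetical helper: build a pandoc --to value such as
# "markdown-raw_html-link_attributes".
def pandoc_format(base, disable: [], enable: [])
  base + enable.map { |ext| "+#{ext}" }.join + disable.map { |ext| "-#{ext}" }.join
end

# Usage (requires pandoc on the PATH):
# fmt = pandoc_format('markdown', disable: %w[raw_html link_attributes])
# system('pandoc', 'word.docx', "--to=#{fmt}", '--extract-media=./media2', '-o', 'exm_word2.md')
```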
Lua filter
The filter is straightforward: it simply removes all attributes from all images.
function Image (img)
  -- replace the image's attributes with an empty Attr
  img.attr = pandoc.Attr()
  return img
end
To apply the filter, we need to save the above into a file no-img-attr.lua and pass that file to pandoc with
pandoc --lua-filter=no-img-attr.lua ...
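If running a filter is not an option, a cruder alternative (a text post-processing step, not a pandoc feature) is to strip the `{...}` block that follows each image in the generated Markdown. This sketch assumes the attribute block immediately follows the image's closing parenthesis, as in the question's output, and that alt texts and URLs contain no nested brackets or parentheses; the filter approaches above are more robust. The names `IMAGE_WITH_ATTRS` and `strip_image_attributes` are hypothetical.

```ruby
# Matches a Markdown image ![alt](url) followed directly by a {...} attribute
# block. [^}]* also matches newlines, so attributes wrapped across lines are caught.
IMAGE_WITH_ATTRS = /(!\[[^\]]*\]\([^)]*\))\{[^}]*\}/

# Hypothetical helper: drop the attribute block but keep the image itself.
def strip_image_attributes(markdown)
  markdown.gsub(IMAGE_WITH_ATTRS, '\1')
end

# Usage:
# File.write('exm_word2.md', strip_image_attributes(File.read('exm_word2.md')))
```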

Prevent asciidoc from converting a file path into a link

I'm manually converting a MS Word document to asciidoc format.
By doing so I ran into an issue that I can't work around yet.
In one place I want to show the reader what the syntax of a file link should look like.
So I used this as an example:
file:///<Path>/<to>/<Keytab>
Asciidoc now renders this pseudo link into an actual link and warns me about this while converting my asciidoc document into HTML and PDF.
Usually, I would simply use the [source] element to prevent the link rendering. But the file link is part of a table.
[options="header,footer",cols="15%,85%"]
|=======================
|parameter|usage
|keyTabLocation |file:///<Path>/<to>/<Keytab>
|=======================
Is there a way to prevent the rendering/conversion of the file link?

Okay, I found the solution. I had to escape the whole macro using a \ at the beginning.
So this did the trick:
[options="header,footer",cols="15%,85%"]
|=======================
|parameter|usage
|keyTabLocation |\file:///<Path>/<to>/<Keytab>
|=======================
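If the AsciiDoc is generated from a script rather than written by hand, the same backslash escape can be applied automatically. A minimal sketch covering only `file://` links (the case confirmed above); the helper names `escape_file_links` and `table_row` are hypothetical:

```ruby
# Hypothetical helper: prefix file:// URLs with a backslash so AsciiDoc shows
# them literally instead of turning them into links. URLs that are already
# escaped (preceded by "\") are left alone.
def escape_file_links(text)
  text.gsub(%r{(?<!\\)file://}) { "\\file://" }
end

# Hypothetical helper: build one row of the two-column table from the question.
def table_row(parameter, usage)
  "|#{parameter} |#{escape_file_links(usage)}"
end

# Usage:
# puts table_row('keyTabLocation', 'file:///<Path>/<to>/<Keytab>')
```

The block form of `gsub` is used so the replacement's backslash is emitted literally rather than being interpreted as a back-reference escape.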
