wkhtmltopdf with '?' in URL - wkhtmltopdf

My question is about using '?' in a URL passed to wkhtmltopdf.
Following Stack Overflow advice, I have wkhtmltopdf working when invoked from a
PHP page. For example, this works as expected:
$exec_string = "xvfb-run -a -s \"-screen 0, 1024x768x24\" "
             . "wkhtmltopdf http://example.com temp.pdf";
exec($exec_string);
However, if I add to the URL like this:
http://example.com/?page=clients
wkhtmltopdf ignores the page=clients and produces a PDF identical to the result above. I even tried surrounding the URL with escaped double quotes, as in
...\"http://example.com/?page=clients \"...
but still no good.
How can I force wkhtmltopdf to pick up the ?page=clients piece?

I've discovered my problem is more complicated than simply wkhtmltopdf not recognizing ?page=clients.
The website that the page belongs to does 'security' checks prior to displaying a page like this. An experiment done outside this framework shows me that wkhtmltopdf does indeed pick up the page=clients specification.
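Although the query string turned out not to be the culprit here, escaping the URL before handing it to a shell is still worthwhile, because characters such as '?' and '&' are special to the shell. The original code is PHP, where escapeshellarg plays this role. The following is a minimal Ruby sketch of the same idea; the helper name wkhtmltopdf_command and the extra &id=1 parameter are illustrative assumptions, not part of the original setup.

```ruby
require 'shellwords'

# Build the same command line as in the question, but escape every argument
# so the shell passes '?', '=' and '&' through to wkhtmltopdf untouched.
# (wkhtmltopdf_command is a hypothetical helper name.)
def wkhtmltopdf_command(url, output)
  [
    'xvfb-run', '-a', '-s', '-screen 0, 1024x768x24',
    'wkhtmltopdf', url, output
  ].shelljoin
end

cmd = wkhtmltopdf_command('http://example.com/?page=clients&id=1', 'temp.pdf')
puts cmd
# system(cmd)  # uncomment to actually run it
```

Splitting the resulting string with Shellwords.split gives back the original arguments, which is a quick way to check that nothing was lost to shell interpretation.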

Related

Combination of --window-status and --javascript_delay in wkhtmltopdf

I want to use wkhtmltopdf to render both pages I control (in which case I can set window.status when rendering is done) and, occasionally, pages I don't control. According to this thread on the mailing list, I should be able to set --window-status to some value and --javascript-delay as well, and rendering starts as soon as either condition is met. That's not my experience: the command wkhtmltopdf --javascript-delay 10000 --window-status imdone http://www.google.com/ /tmp/google.pdf waits forever (version 0.12.3, on both OS X and Linux). How can I get the behaviour described on the mailing list?
One workaround is to use the --run-script option to set window.status manually after some time. This works both on the version that uses the patched Qt and on the one that uses the unpatched Qt. Note, however, that --run-script seems to have a small bug in escaping its parameter. The following line therefore gives you the requested behaviour:
wkhtmltopdf --window-status imdone --run-script \
'window.setTimeout(function(){window.status="imdone";},1000);' \
http://google.com/ /tmp/google.pdf
Note that because of the aforementioned bug, this does not work if the --run-script argument contains spaces, so the following will not work:
wkhtmltopdf --window-status imdone --run-script \
'window.setTimeout(function (){window.status = "imdone";}, 1000);' \
http://google.com/ /tmp/google.pdf
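When the command is launched from a script, the space-free JavaScript can be generated rather than typed by hand. The sketch below is an illustration in Ruby under a few assumptions: the helper name window_status_args is hypothetical, the status value is a single token without spaces or quotes, and only the --window-status and --run-script options shown above are used.

```ruby
# Build an argument list for wkhtmltopdf that sets window.status after a
# timeout. The JavaScript is generated without any spaces because of the
# --run-script escaping problem described above.
# (window_status_args is a hypothetical helper name.)
def window_status_args(url, output, status: 'imdone', timeout_ms: 1000)
  script = "window.setTimeout(function(){window.status=\"#{status}\";},#{timeout_ms});"
  ['wkhtmltopdf', '--window-status', status, '--run-script', script, url, output]
end

args = window_status_args('http://google.com/', '/tmp/google.pdf')
puts args.inspect
# system(*args)  # multi-argument form: no shell, so no extra quoting layer
```

Passing the arguments as a list avoids a second round of shell quoting on top of the escaping issue in --run-script itself.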

How to reproduce the PDF output of AsciidocFX using a Gradle build script?

I have the following Asciidoc-document:
= Test
:doctype: article
:notitle:
:!toc:
AsciidocFX shows links in PDFs as footnotes http://stackoverflow.com[SO].
.Asciidoc in PDF does not work in Asciidoctor, but works in AsciidocFX.
[cols="2,5a"]
|===
|Line with Asciidoc code
|here comes a list:
* item 1
* item 2
* item 3
http://stackoverflow.com[Get Answers]!
|Line
|with a footnotefootnote:[footnotes do work in AsciidocFX's PDF output (but not in the preview).]
|===
When generating a PDF using asciidoctor, the output is as follows:
The problems are:
footnotes are shown inline (see: https://github.com/asciidoctor/asciidoctor-pdf/issues/73)
Asciidoc-content in tables cells is not interpreted: https://github.com/asciidoctor/asciidoctor-pdf/issues/6
Link targets are not shown as Footnotes (this would be nice to have)
Using https://github.com/asciidocfx/AsciidocFX shows everything correctly:
Now, I'd like to have the same output that AsciidocFX produces, but still like to use my Gradle build-script.
From https://github.com/asciidoctor/asciidoctor-pdf/issues/73#issuecomment-224327058 I learned that AsciidocFX uses https://github.com/asciidoctor/asciidoctor-fopub[asciidoctor-fopub] under the hood. But how can I reproduce this pipeline in my build.gradle? Do I have to generate epub in a first task and use the output in another task? Or is there a direct way?
Sorry that I am a tad late (almost 7 years!!) to answer your question, but perhaps it will help others.
Perhaps you need to upgrade. When I run your .adoc verbatim, the footnotes come out perfectly. In fact, the output matches the correct version you posted. Here is the syntax that I use:
asciidoctor-pdf -a pdf-themesdir=/path/to/themes -a pdf-theme=your-pdf-theme-file.yml -a pdf-fontsdir=/path/to/your/fonts/directory/ your_test_file.adoc
I put this syntax in a bash script with the adoc file as an argument.
I am using:
linux Pop!_OS 22.04 LTS (close derivative of ubuntu)
ruby 3.1.2p20
asciidoctor-pdf-2.3.0b
Ironically, what amazes me is your AsciidocFX output. AsciidocFX PDF output looks horrible for me, and there is no simple way of changing the output style, unlike editing the asciidoctor-pdf YAML theme.
Cheers, Joe

How can I determine what the current stable version of Ruby is?

I want to write a Ruby method that does two things:
Determine what the current stable version of Ruby is. My first thought is to get the response from https://www.ruby-lang.org/en/downloads/ and use a regex to isolate the phrase "The current stable version is [x]". Is there an API I'm not aware of?
Get the URL to download the .tar.gz of that release. For this I was thinking the same thing, get it from the output of the site URL.
I'm looking for advice about the best way to go about it, or direction if there's something in place I might use to determine my desired results.
Ruby code to fetch the download page, then parse the current version and the link URL:
require 'net/http'
html = Net::HTTP.get(URI("https://www.ruby-lang.org/en/downloads/"))
# Dots before "tar" and "gz" are escaped so they match literally.
vers = html[/http.*ruby-(.*)\.tar\.gz/, 1]
link = html[/http.*ruby-.*\.tar\.gz/]
GitHub code: ruby-stable-version.rb
Shell code:
ruby-stable-version
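The scraping approach is easier to test when the parsing is separated from the HTTP request. Here is a minimal sketch of that split. The helper name parse_stable_release and the sample HTML are assumptions for illustration, and it assumes the first tarball link on the page is the current stable release, which the real page layout may not guarantee.

```ruby
# Extract the first ruby-X.Y.Z.tar.gz link and its version from a page body.
# Returns [version, link], or nil when no such link is present.
# (parse_stable_release is a hypothetical helper name.)
def parse_stable_release(html)
  match = html.match(%r{https?://[^"'\s<>]*/ruby-(\d+(?:\.\d+)+(?:-p\d+)?)\.tar\.gz})
  match && [match[1], match[0]]
end

# Hypothetical sample input standing in for the downloaded page.
sample = '<a href="https://cache.ruby-lang.org/pub/ruby/3.3/ruby-3.3.0.tar.gz">Ruby 3.3.0</a>'
version, link = parse_stable_release(sample)
puts version, link
```

In real use, the html argument would be the body returned by Net::HTTP.get as in the snippet above.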
If you are using rbenv you can use ruby-build to get a list of ruby versions and then grep against that.
ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9].[0-9]\-*[p0-9*]*'
You can then use that within your code like so:
version = `ruby-build --definitions | tail -r | grep -x -G -m 1 '[0-9]\.[0-9].[0-9]\-*[p0-9*]*'`.strip
You can then use this value to get the download URL.
url = "http://cache.ruby-lang.org/pub/ruby/#{version[0..2]}/ruby-#{version}.tar.gz"
And then download the file:
require 'open-uri'
File.open("ruby-#{version}.tar.gz", 'wb') do |file|
  # URI.open is required on Ruby 3.0+, where Kernel#open no longer opens URLs.
  file << URI.open(url).read
end
Learn more about rbenv here and ruby-build here.
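The grep pipeline above relies on the order of the definitions list and on tail -r, which is a BSD option that GNU tail does not provide. As an alternative, the selection can be done in Ruby itself. This is a sketch under assumptions: latest_stable is a hypothetical helper, the sample list is made up for illustration, and it assumes ruby-build --definitions prints one definition name per line.

```ruby
# Pick the highest plain MRI release (X.Y.Z or X.Y.Z-pN) from a list of
# definition names, ignoring -dev, -preview, -rc and other implementations.
# (latest_stable is a hypothetical helper name.)
def latest_stable(definitions)
  definitions
    .map { |name| name.match(/\A(\d+)\.(\d+)\.(\d+)(?:-p(\d+))?\z/) }
    .compact
    .max_by { |m| m.captures.map(&:to_i) } # compare numerically, not as strings
    &.to_s
end

# Hypothetical sample input; in real use something like:
#   definitions = `ruby-build --definitions`.lines.map(&:strip)
sample = %w[1.9.3-p545 2.0.0-p247 2.1.0-dev 2.1.0 2.1.1 jruby-1.7.9 rbx-2.2.5]
puts latest_stable(sample)
```

Comparing the version components as integers avoids the string-ordering pitfall where "2.10.0" would sort before "2.9.0".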
Another possibility would be to use the Ruby source repository. Check version.h in every branch, filter by RUBY_PATCHLEVEL > -1 (-1 is used for -dev versions), sort by RUBY_VERSION and take the latest one.
You can use:
Ruby's built-in OpenURI, and Nokogiri, to read a page, parse it, search for certain tags, extract a parameter such as a "src" or "href".
OpenURI to read the URL, or curl or wget at the command-line to retrieve the file.
Nokogiri's tutorials, which show how to use OpenURI to retrieve the page and hand it off to Nokogiri.
OpenURI's docs show how to "open" URLs and retrieve their content using read. Once you've done that, the data will be easy to save to disk using something like this for text files (URI.open replaces the bare open on Ruby 3.0+):
File.write('some_file', URI.open('http://www.example.com/').read)
or for binary:
File.open('some_file', 'wb') { |fo| fo.write(URI.open('http://www.example.com/').read) }
There are examples of using both Nokogiri and OpenURI for this all over Stack Overflow.

Calling system from a post in sinatra

I made a very small app for the raspberry pi, that uses Sinatra:
https://github.com/khebbie/SpeakPi
The app lets the user input some text in a textarea and asks Google to create an mp3 file for it.
In there I have a shell script called speech2.sh which calls Google and plays the mp3 file:
#!/bin/bash
say() {
wget -q -U Mozilla -O out.mp3 "http://translate.google.com/translate_tts?tl=da&q=$*";
local IFS=+;omxplayer out.mp3;
}
say $*
When I call speech2.sh from the command line like so:
./speech2.sh %C3%A6sel
It pronounces %C3%A6 like the danish letter 'æ', which is correct!
I call speech2.sh from a Sinatra route like so:
post '/say' do
message = params[:body]
system('/home/pi/speech2.sh '+ message)
haml :index
end
And when I do so Google pronounces some very weird chars like 'a broken pipe...' which is wrong!
All chars a-z are pronounced correctly
I have tried some URL encoding and decoding, nothing worked.
I tried outputting the message to the command-line and it was exactly "%C3%A6" that certainly did not make sense.
Do you have any idea what I am doing wrong?
EDIT
To sum up and simplify: if I type this in bash:
./speech2.sh %C3%A6sel
It works
If I start an irb session and type:
system('/home/pi/speech2.sh', '%C3%A6sel')
It does not work!
Since it is handling UTF-8, make sure that the encoding remains right the way through the process by adding the # encoding: UTF-8 magic comment at the top of the Ruby script and passing the ie=UTF-8 parameter in the query string when calling Google Translate.
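One way to apply this advice is to let Ruby build the query string, so the text is percent-encoded exactly once and the ie=UTF-8 parameter is always present. The sketch below is illustrative only: tts_url is a hypothetical helper, the endpoint and the tl=da parameter are taken from speech2.sh above, and whether the service still accepts these parameters is not verified here.

```ruby
require 'uri'

# Build the translate_tts URL used by speech2.sh, letting Ruby do the
# percent-encoding and stating the input encoding explicitly via ie=UTF-8.
# (tts_url is a hypothetical helper name.)
def tts_url(text, lang: 'da')
  query = URI.encode_www_form(tl: lang, ie: 'UTF-8', q: text)
  "http://translate.google.com/translate_tts?#{query}"
end

# "\u00E6" is the Danish letter 'æ', written as an escape so the example
# does not depend on the source file's encoding.
puts tts_url("\u00E6sel")
# The download could then bypass the shell entirely, for example:
#   system('wget', '-q', '-U', 'Mozilla', '-O', 'out.mp3', tts_url(message))
```

With this approach the Sinatra route would pass the raw, decoded text from params[:body] rather than a pre-encoded string, which avoids encoding it twice.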

ruby + save web page

Saving the HTML of a web page using Ruby is very easy.
One way to do it is with rio:
require 'rubygems'
require 'rio'
rio('http://www.google.com') > rio('google.html')
Is it possible to do the same by parsing the HTML, requesting the different images, JavaScript and CSS files again, and then saving each of them?
I think it is not very efficient.
So, is there a way to save a web page + all the images, css, and javascript that are related to that page, and all this automatically?
What about system("wget -r -l 1 http://google.com")?
Most of the time we can use the system's tools. As dimus said, you can use wget to download the page.
There are also many useful APIs for network tasks, such as net/ftp, net/http or net/https.
See the Net::HTTP documentation for details.
These methods only fetch the response, though; you still have to parse the HTML document yourself. Using Mozilla's library is another good option.
url = "docs.zillabyte.com"
output_dir = "/tmp/crawl"
# -E = adjust extensions (e.g. an HTML page served at /some_page is saved as some_page.html)
# -H = span hosts (e.g. include assets from other domains)
# -p = download all assets associated with the page
# -P = output prefix (a.k.a the directory to dump the assets)
system("wget -E -H -p '#{url}' -P '#{output_dir}'")
# read files from 'output_dir'
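If wget is not available, the manual route mentioned in the question (parse the HTML, then fetch each asset) starts with collecting the asset URLs. The following is a rough, stdlib-only sketch. The helper name asset_urls and the sample page are assumptions, and the regular expression is a deliberate simplification; a real HTML parser such as Nokogiri is more robust.

```ruby
require 'uri'

# Collect absolute URLs of images, scripts and stylesheets referenced by a
# page. A regex is used to keep the sketch stdlib-only; a real HTML parser
# handles unusual markup far better.
# (asset_urls is a hypothetical helper name.)
def asset_urls(html, base_url)
  html.scan(/<(?:img|script|link)\b[^>]*?\b(?:src|href)\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |ref| URI.join(base_url, ref).to_s } # resolve relative references
      .uniq
end

# Hypothetical sample page.
page = <<~HTML
  <link rel="stylesheet" href="/css/site.css">
  <script src="js/app.js"></script>
  <img src="http://cdn.example.com/logo.png">
HTML

puts asset_urls(page, 'http://www.example.com/docs/')
```

Each returned URL could then be downloaded with URI.open(...).read and written to disk in binary mode.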

Resources