How can I access the raw request body in ruby CGI scripts? - ruby

In a Ruby script that I run as a CGI program, I need to access the body of an HTTP POST request. The request body contains JSON data:
{"data":"a"}
I want to take the whole body and parse it with JSON.parse to process it. What's the canonical way to do this? The Ruby docs don't mention the request body.
I only found a hint in a blog post that
CGI tries to parse the request body as form parameters so a blob of JSON awkwardly ends up as the one and only parameter key.
This approach seems to work
puts cgi.params.keys.first # prints {"data":"a"}
but fails as soon as the value for data is a base64 encoded string that contains an = for padding: Using this body
{"data":"a="}
results in the following output (characters missing at the end):
puts cgi.params.keys.first # prints {"data":"a
What's the correct approach to solve this?

As you might already know, when parameters and their values are urlencoded they are delimited with an =: name=Theo&language=ruby and so on.
This is why the name of the first parameter stops at the character before the =. The approach of using the first key, as described in that blog post, isn't really reliable.
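The truncation can be reproduced without a web server. The sketch below only mimics the key/value split applied to each urlencoded pair (it is an illustration, not the library code itself), using the body from the question:

```ruby
# Mimics how a urlencoded pair is split into key and value:
# everything before the first "=" becomes the key.
body = '{"data":"a="}'
key, value = body.split('=', 2)

puts key   # prints {"data":"a
puts value # prints "}
```

The rest of the JSON ends up in the value, so the first key alone can never give back the whole body.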
Instead, in a CGI script you can read the request body directly from stdin e.g.
request_body = $stdin.read
Note that when you instantiate a CGI object, it reads everything from stdin and attempts to parse it into the params hash.
This means that if you'd still like to use the cgi library for building your response, you'll need to read from stdin earlier in the code, before creating the CGI object, e.g.
# minimal example that just outputs the request body
require 'cgi'
request_body = $stdin.read
cgi = CGI.new
cgi.out("status" => "OK", "type" => "text/plain", "connection" => "close") do
request_body
end
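To get from the raw body to the parsed JSON the question asks about, something along these lines might work. It is a minimal sketch: read_json_body is a made-up helper name, and the StringIO and the hash standing in for ENV only simulate a request so the code runs outside a web server.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper: reads the raw request body from +io+ and parses
# it as JSON. Returns nil when the body is not valid JSON.
def read_json_body(io = $stdin, env = ENV)
  length = env['CONTENT_LENGTH'].to_i
  raw = length.positive? ? io.read(length) : io.read
  JSON.parse(raw.to_s)
rescue JSON::ParserError
  nil
end

# Simulated request so the sketch runs outside a web server.
raw = '{"data":"a="}'
payload = read_json_body(StringIO.new(raw), 'CONTENT_LENGTH' => raw.bytesize.to_s)
puts payload['data'] # prints a=
```

In a real CGI script you would presumably call read_json_body with no arguments, and do so before CGI.new gets a chance to consume stdin.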

Apparently there is no easy solution for this in Ruby.
But there are two ways you can achieve this.
Redefine the CGI::parse(query) method.
This method in the CGI module is responsible for parsing both POST and GET parameters into the params hash. You can redefine it in your code so that it adds an extra parameter called RAW_DATA to the params hash.
def CGI::parse(query)
  params = {}
  query.split(/[&;]/).each do |pairs|
    key, value = pairs.split('=', 2).collect { |v| CGI::unescape(v) }
    next unless key
    params[key] ||= []
    params[key].push(value) if value
  end
  # Add RAW_DATA to params
  params[:RAW_DATA] = query
  params.default = [].freeze
  params
end
Use $stdin.read() before creating the CGI instance.
But this may prevent you from making use of other CGI features.
So you may replace $stdin temporarily with a StringIO object.
require 'cgi'
require 'stringio'
raw_data = $stdin.read
real_stdin = $stdin
# CGI reads from $stdin, so swapping the global is enough;
# reassigning the STDIN constant would only trigger a warning.
$stdin = StringIO.new(raw_data)
cgi = CGI.new
# Your CGI code here
# ........
$stdin = real_stdin

Related

Ruby's Faraday - Send optional parameters in get method

I have an endpoint with multiple optional parameters.
def get_customers(params=nil)
if params.nil?
customer_url = "#{@url}/customers"
# call api
response = connection.get(customer_url)
else
# I do not know how to write this part
end
end
Could you please help me write a call to an endpoint with optional parameters? The params argument is a hash (key/value pairs). The query can have 8 parameters. I do not know how to append the params to the URL, and I am stuck on this part. I am a rookie at Ruby and Faraday.
Thanks in advance
You don't have to concatenate params with the url on your own. Faraday can accept a hash of params (or nil). You can pass them to the get method like so:
response = connection.get(customer_url, params)
Have a look at the "GET, HEAD, DELETE, TRACE" section of the documentation for more examples.
Side note: you don't even have to concatenate the URL and the path. You can pass them as separate arguments.
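For a sense of what passing the hash saves you, here is roughly the same thing done by hand with only the standard library. This is a sketch, not Faraday's implementation; customers_uri, the base URL and the parameter names are made up for illustration:

```ruby
require 'uri'

# Hypothetical helper: builds the customers URL, appending a query
# string only when params are given.
def customers_uri(base_url, params = nil)
  uri = URI("#{base_url}/customers")
  uri.query = URI.encode_www_form(params) if params && !params.empty?
  uri
end

puts customers_uri('https://api.example.com')
puts customers_uri('https://api.example.com', city: 'Oslo', limit: 10)
```

A nil or empty hash leaves the URL untouched, which is the same reason the nil check in the question becomes unnecessary.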

Using parsed response in separate GET call

I'm new to Ruby and API, so my apologies if this is super simple...
I need to have a script that will first POST to initiate the creation of an export file, and then make a GET call to retrieve the file. The GET call needs to use part of the JSON response from the POST.
I'm using the httparty gem.
I think I need to create a variable that equals the parsed json, and then make that variable part of the GET call, but I'm not clear on how to do that.
Help is appreciated.
require 'httparty'
url = 'https://api.somewhere.org'
response = HTTParty.post(url)
puts response.parsed_response
json response:
{"export_files"=>
{"id"=> #####,
"export_id"=> #####,
"status"=>"Queued"}}
In my GET call I need to use the export_id number in the url.
HTTParty.get('https://api.somewhere.org/export_id/####')
As described in the comments, but a bit more verbose and with a skeleton for error handling:
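require 'httparty'
require 'json'
url = 'https://api.somewhere.org'
response = HTTParty.post(url)
begin
  # JSON.parse returns string keys, not symbols
  hash = JSON.parse(response.body)
  if (export_id = hash.dig('export_files', 'export_id'))
    # retrieve the file with a GET call
    export = HTTParty.get("https://api.somewhere.org/export_id/#{export_id}")
  else
    # handle missing export_id
  end
rescue JSON::ParserError
  # handle error
end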
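One detail that is easy to trip over here: JSON.parse returns string keys by default, so looking values up with symbols gives nil. A small standard-library sketch, with a made-up body shaped like the response in the question (the id and export_id values are invented):

```ruby
require 'json'

# Made-up response body shaped like the one in the question.
body = '{"export_files":{"id":1,"export_id":42,"status":"Queued"}}'
hash = JSON.parse(body)

p hash[:export_files] # prints nil, symbol keys are not present
export_id = hash.dig('export_files', 'export_id')
puts "https://api.somewhere.org/export_id/#{export_id}"
```

Passing symbolize_names: true to JSON.parse is the other option if you prefer symbol keys.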

How to get Mechanize to auto-convert body to UTF8?

I found some solutions using post_connect_hook and pre_connect_hook, but it seems like they don't work. I'm using the latest Mechanize version (2.1). There are no [:response] fields in the new version, and I don't know where to get them in the new version.
https://gist.github.com/search?q=pre_connect_hooks
https://gist.github.com/search?q=post_connect_hooks
Is it possible to make Mechanize return a UTF8 encoded version, instead of having to convert it manually using iconv?
Since Mechanize 2.0, arguments of pre_connect_hooks() and post_connect_hooks() were changed.
See the Mechanize documentation:
pre_connect_hooks()
A list of hooks to call before retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
 
post_connect_hooks()
A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
Now you can't change the internal response body value, because the argument is no longer an array. So the next best way is to replace the internal parser with your own:
class MyParser
def self.parse(thing, url = nil, encoding = nil, options = Nokogiri::XML::ParseOptions::DEFAULT_HTML, &block)
# insert your conversion code here. For example:
# thing = NKF.nkf("-wm0X", thing).sub(/Shift_JIS/,"utf-8") # you need to rewrite content charset if it exists.
Nokogiri::HTML::Document.parse(thing, url, encoding, options, &block)
end
end
agent = Mechanize.new
agent.html_parser = MyParser
page = agent.get('http://somewhere.com/')
...
I found a solution that works pretty well:
class HtmlParser
def self.parse(body, url, encoding)
body.encode!('UTF-8', encoding, invalid: :replace, undef: :replace, replace: '')
Nokogiri::HTML::Document.parse(body, url, 'UTF-8')
end
end
Mechanize.new.tap do |web|
web.html_parser = HtmlParser
end
I haven't found any issues with it so far.
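The transcoding step can be tried on its own, without Mechanize or Nokogiri. A minimal sketch, assuming the source encoding is known; to_utf8 is a made-up helper name and the sample bytes are invented:

```ruby
# Hypothetical helper: transcodes +body+ from +source_encoding+ to UTF-8,
# dropping anything that cannot be converted.
def to_utf8(body, source_encoding)
  body.encode('UTF-8', source_encoding, invalid: :replace, undef: :replace, replace: '')
end

latin1 = "caf\xE9".b # "café" as ISO-8859-1 bytes
utf8 = to_utf8(latin1, 'ISO-8859-1')

puts utf8.encoding        # prints UTF-8
puts utf8.valid_encoding? # prints true
```

This version returns a new string instead of modifying the body in place, which avoids surprises if the caller still needs the original bytes.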
In your script, just enter: page.encoding = 'utf-8'
However, depending on your scenario, you may alternatively need to enter the reverse (the encoding of the website Mechanize is working with) instead. For that, open Firefox, open the website you want Mechanize to work with, select Tools in the menubar, and then open Page Info. Determine what the page is encoded in from there.
Using that info, you would instead enter what the page is encoded in (such as page.encoding = 'windows-1252').
How about something like this:
class Mechanize
  alias_method :original_get, :get
  def get(*args)
    doc = original_get(*args)
    doc.encoding = 'utf-8'
    doc
  end
end

In Ruby/Rails, how can I encode/escape special characters in URLs?

How do I encode or 'escape' the URL before I use OpenURI to open(url)?
We're using OpenURI to open a remote url and return the xml:
getresult = open(url).read
The problem is the URL contains some user-input text that contains spaces and other characters, including "+", "&", "?", etc. potentially, so we need to safely escape the URL. I saw lots of examples when using Net::HTTP, but have not found any for OpenURI.
We also need to be able to un-escape a similar string we receive in a session variable, so we need the reciprocal function.
Don't use URI.escape: it was deprecated in Ruby 1.9 and removed in Ruby 3.0.
Rails' Active Support adds Hash#to_query:
{foo: 'asd asdf', bar: '"<#$dfs'}.to_query
# => "bar=%22%3C%23%24dfs&foo=asd+asdf"
Also, as you can see, it sorts the query parameters so they always come out in the same order, which is good for HTTP caching.
Ruby Standard Library to the rescue:
require 'uri'
user_text = URI.escape(user_text)
url = "http://example.com/#{user_text}"
result = open(url).read
See the docs for the URI::Escape module for more. It also has a method to do the inverse (unescape).
The main thing you have to consider is that you have to escape the keys and values separately before you compose the full URL.
All the methods which get the full URL and try to escape it afterwards are broken, because they cannot tell whether any & or = character was supposed to be a separator, or maybe a part of the value (or part of the key).
The CGI library seems to do a good job, except for the space character, which was traditionally encoded as +, and nowadays should be encoded as %20. But this is an easy fix.
Please, consider the following:
require 'cgi'
def encode_component(s)
# The space-encoding is a problem:
CGI.escape(s).gsub('+','%20')
end
def url_with_params(path, args = {})
return path if args.empty?
path + "?" + args.map do |k,v|
"#{encode_component(k.to_s)}=#{encode_component(v.to_s)}"
end.join("&")
end
def params_from_url(url)
path,query = url.split('?',2)
return [path,{}] unless query
q = query.split('&').inject({}) do |memo,p|
k,v = p.split('=',2)
memo[CGI.unescape(k)] = CGI.unescape(v)
memo
end
return [path, q]
end
u = url_with_params( "http://example.com",
"x[1]" => "& ?=/",
"2+2=4" => "true" )
# "http://example.com?x%5B1%5D=%26%20%3F%3D%2F&2%2B2%3D4=true"
params_from_url(u)
# ["http://example.com", {"x[1]"=>"& ?=/", "2+2=4"=>"true"}]
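If the form-style encoding is acceptable (spaces become + rather than %20), the standard library's URI module can do the same round trip. A short sketch using the same sample parameters as above:

```ruby
require 'uri'

params = { 'x[1]' => '& ?=/', '2+2=4' => 'true' }

query = URI.encode_www_form(params)       # spaces become "+"
decoded = URI.decode_www_form(query).to_h # back to a Hash of strings

puts query
p decoded == params # prints true

# Single components can be escaped on their own as well.
puts URI.encode_www_form_component('a b&c') # prints a+b%26c
```

Keys and values are still escaped separately, which is the point made above; only the treatment of the space character differs.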
Ruby has the built-in URI library, and there is also the Addressable gem, in particular Addressable::URI.
I prefer Addressable::URI. It's very full featured and handles the encoding for you when you use the query_values= method.
I've seen some discussions about URI going through some growing pains so I tend to leave it alone for handling encoding/escaping until these things get sorted out:
http://osdir.com/ml/ruby-core/2010-06/msg00324.html
http://osdir.com/ml/lang-ruby-core/2009-06/msg00350.html
http://osdir.com/ml/ruby-core/2011-06/msg00748.html

How to access html request parameters for a .rhtml page served by webrick?

I'm using webrick (the built-in ruby webserver) to serve .rhtml
files (html with ruby code embedded --like jsp).
It works fine, but I can't figure out how to access parameters
(e.g. http://localhost/mypage.rhtml?foo=bar)
from within the ruby code in the .rhtml file.
(Note that I'm not using the rails framework, only webrick + .rhtml files)
Thanks
According to the source code of erbhandler it runs the rhtml files this way:
Module.new.module_eval{
meta_vars = servlet_request.meta_vars
query = servlet_request.query
erb.result(binding)
}
So the binding should contain a query (which contains a hash of the query string) and a meta_vars variable (which contains a hash of the environment, like SERVER_NAME) that you can access inside the rhtml files (and the servlet_request and servlet_response might be available too, but I'm not sure about them).
If that is not the case, you can also try reading the CGI environment variable ENV["QUERY_STRING"] and parsing it yourself, but this should only be a last resort (and it might only work with CGI files).
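For that last-resort route, parsing the query string by hand could look something like this. It is a sketch using only the standard library; parse_query_string is a made-up helper name:

```ruby
require 'uri'

# Hypothetical helper: turns a raw query string into a Hash.
# With repeated keys, the last value wins.
def parse_query_string(query_string)
  URI.decode_www_form(query_string.to_s).to_h
end

params = parse_query_string(ENV['QUERY_STRING'])
p params['foo'] # nil unless QUERY_STRING contains foo

p parse_query_string('foo=bar&name=a+b')
```

The to_s call keeps the helper from failing when QUERY_STRING is not set at all.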
This is the solution:
(suppose the request is http://your.server.com/mypage.rhtml?foo=bar)
<html>
<body>
This is my page (mypage.rhtml, served by webrick)
<%
# embedded ruby code
puts servlet_request.query["foo"] # this simply prints bar on the console
%>
</body>
</html>
You don't give many details, but I imagine you have a servlet to serve the files you will process with erb, and by default the web server serves any static file in a public directory.
require 'webrick'
include WEBrick
require 'erb'
s = HTTPServer.new( :Port => 8080,:DocumentRoot => Dir::pwd + "/public" )
class MyServlet < HTTPServlet::AbstractServlet
def do_GET(req, response)
File.open('public/my.rhtml','r') do |f|
@template = ERB.new(f.read)
end
response.body = @template.result(binding)
response['Content-Type'] = "text/html"
end
end
s.mount("/my", MyServlet)
trap("INT"){
s.shutdown
}
s.start
This example is limited: when you go to /my, the same file is always processed. You should instead construct the file path based on the request path. I just used an important word, "request": everything you need is there.
To get the HTTP header parameters, use req[header_name]. To get the parameters in the query string, use req.query[param_name]. req is the HTTPRequest object passed to the servlet.
Once you have the parameter you need, you have to bind it to the template. In the example we pass the binding object from self (binding is defined in Kernel, and it represents the context where the code is executing), so every local variable defined in the do_GET method is available in the template. However, you can also create your own binding, for example from a Proc object, and pass it to the ERB processor when calling result.
Everything together, your solution would look like:
def do_GET(req, response)
File.open('public/my.rhtml','r') do |f|
@template = ERB.new(f.read)
end
foo = req.query["foo"]
response.body = @template.result(binding)
response['Content-Type'] = "text/html"
end
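The binding part can be tried without WEBrick at all. A minimal sketch with only ERB; render_with_foo and the inline template string are made up for illustration:

```ruby
require 'erb'

# Hypothetical helper: the binding is taken inside this method, so the
# template sees this method's locals (+template_source+ and +foo+)
# rather than the caller's.
def render_with_foo(template_source, foo)
  ERB.new(template_source).result(binding)
end

puts render_with_foo('foo is <%= foo %>', 'bar') # prints foo is bar
```

Keeping the binding inside a small method like this is one way to control exactly which variables a template can reach.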
Browsing the documentation, it looks like you should have an HTTPRequest from which you can get the query string. You can then use parse_query to get a name/value hash.
Alternatively, it's possible that just calling query() will give you the hash directly... my Ruby-fu isn't quite up to it, but you might want to at least give it a try.
