OpenURI fails to follow URLs that have %20 [duplicate] - ruby

I am having some issues with Ruby's OpenURI follow redirect functionality.
When going to a URL that contains %20 in it, and that redirects with a 30x, Ruby's OpenURI fails.
The exact same URL, with a + instead of %20 works.
Both the %20 and + versions work properly with curl -L (follow).
Code
require 'open-uri'
base = "http://software-engineering-handbook.com/Handbook"
puts "===> PASS: URI Open +"
result = open "#{base}/Video+Series"
p result.status
puts "===> PASS: Curl +"
puts `curl -LIsS "#{base}/Video+Series" | grep HTTP`
puts "===> PASS: Curl %20"
puts `curl -LIsS "#{base}/Video%20Series" | grep HTTP`
puts "===> FAIL: URI Open %20"
begin
  result = open "#{base}/Video%20Series"
  p result.status
rescue => e
  puts "#{e.class} #{e.message}"
end
Output
===> PASS: URI Open +
["200", "OK"]
===> PASS: Curl +
HTTP/1.1 200 OK
===> PASS: Curl %20
HTTP/1.1 303 See Other
HTTP/1.1 200 OK
===> FAIL: URI Open %20
OpenURI::HTTPError 302 Found (Invalid Location URI)
I am not sure what is going on here. Tried HTTParty (although I know it is just a wrapper), hoping to see a different behavior, but it also fails.

The server is responding with a redirect to an invalid URI. curl is being lax about it, but Ruby is being strict.
If we print out the e.cause we get more information.
#<URI::InvalidURIError: bad URI(is not URI?): "http://software-engineering-handbook.com/Handbook/Video Series/">
The headers from curl -I 'http://software-engineering-handbook.com/Handbook/Video%20Series' show the same thing:
HTTP/1.1 303 See Other
Server: Cowboy
Date: Sat, 28 Dec 2019 21:41:28 GMT
Connection: keep-alive
Content-Type: text/html;charset=utf-8
Location: http://software-engineering-handbook.com/Handbook/Video Series/
And, indeed, the server is returning an invalid URI. Spaces are not allowed in a URI path. Ruby's URI class will not parse it.
> URI("http://software-engineering-handbook.com/Handbook/Video Series/")
URI::InvalidURIError: bad URI(is not URI?): "http://software-engineering-handbook.com/Handbook/Video Series/"
from /Users/schwern/.rvm/rubies/ruby-2.6.5/lib/ruby/2.6.0/uri/rfc3986_parser.rb:67:in `split'
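One possible workaround, sketched here under assumptions rather than taken from the answer, is to follow the redirect by hand and percent-encode the raw spaces in the Location value before parsing it. The helper names sanitize_location and fetch_following are hypothetical, and the sketch assumes the server sends absolute URLs, as in the headers above.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: percent-encode raw spaces in a Location header value
# so that Ruby's strict URI parser accepts it.
def sanitize_location(location)
  location.gsub(' ', '%20')
end

# Hypothetical helper: follow redirects by hand, repairing each Location.
# Assumes absolute redirect URLs; relative ones are not handled here.
def fetch_following(url, limit = 5)
  raise ArgumentError, 'too many redirects' if limit.zero?

  response = Net::HTTP.get_response(URI(url))
  return response unless response.is_a?(Net::HTTPRedirection)

  fetch_following(sanitize_location(response['location']), limit - 1)
end

# The Location value from the answer, repaired and parsed without a request.
bad = "http://software-engineering-handbook.com/Handbook/Video Series/"
fixed = URI(sanitize_location(bad))
puts fixed.path
```

Calling fetch_following("#{base}/Video%20Series") would issue real requests, so it is left out of the runnable part. Only spaces are repaired; other invalid characters would still raise URI::InvalidURIError.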

Related

Cannot extract data from an HTTP PUT request in Ruby

I am trying to implement a simple server in Ruby, but I can't get the data from a PUT request.
The curl request that I am making:
curl -v -X PUT localhost:2016/api/kill -d {"connId" : 1}
The server seems to be reading the request alright.
The code:
while line = socket.gets
  puts line.chomp
  request << line.chomp
  break if line =~ /^\s*$/
end
produces the output:
PUT /api/kill HTTP/1.1
User-Agent: curl/7.35.0
Host: localhost:2016
Accept: */*
Content-Length: 7
Content-Type: application/x-www-form-urlencoded
But I don't see the data anywhere?
Am I supposed to see it?
Is something wrong with the curl request?
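You need single quotes around the body. Without them, the shell strips the double quotes and splits the argument at the spaces, so curl sends only {connId as the data (hence Content-Length: 7).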
curl -v -X PUT localhost:2016/api/kill -d '{"connId" : 1}'
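The read loop in the question stops at the blank line that ends the headers, so the body is never printed even when curl sends it correctly. A minimal sketch of reading the body as well, assuming the client sends a Content-Length header (as curl does in the output above): read_request is a hypothetical helper, and a StringIO stands in for the socket so the example runs without a server.

```ruby
require 'stringio'

# Hypothetical helper: read one HTTP request from any IO-like object.
def read_request(io)
  request_line = io.gets.chomp
  headers = {}
  while (line = io.gets)
    line = line.chomp
    break if line.empty? # the blank line ends the header section

    name, value = line.split(':', 2)
    headers[name.downcase] = value.strip
  end
  # The body is not line-terminated, so read exactly Content-Length bytes.
  length = headers.fetch('content-length', '0').to_i
  body = length.positive? ? io.read(length) : ''
  [request_line, headers, body]
end

payload = '{"connId" : 1}'
raw = "PUT /api/kill HTTP/1.1\r\n" \
      "Host: localhost:2016\r\n" \
      "Content-Length: #{payload.bytesize}\r\n" \
      "\r\n" \
      "#{payload}"
request_line, headers, body = read_request(StringIO.new(raw))
puts request_line
puts body
```

With a real connection, the same helper could be passed the accepted socket in place of the StringIO. Requests using chunked transfer encoding are not covered by this sketch.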

cURL: Malformed encoding found in chunked-encoding, why?

I am experimenting with CGI and the chunked encoding ("Transfer-Encoding: chunked" HTTP header field.) This way files can be sent without a content-length header. I wrote a minimalistic CGI application in Ruby, to try it out. My code is the following (chunked.rb):
#!/usr/bin/ruby
puts "Date: Fri, 28 Nov 2015 09:59:59 GMT"
puts "Content-Type: application/octet-stream; charset=\"ASCII-8BIT\""
puts "Content-Disposition: attachment; filename=image.jpg"
puts "Transfer-Encoding: chunked"
puts
File.open("image.jpg","rb"){|f|
  while data=f.read(32)
    STDOUT.puts data.size.to_s(16)
    STDOUT.puts data
  end
  STDOUT.puts "0"
  STDOUT.puts
}
I took the idea and chunked format example from here: https://www.jmarshall.com/easy/http/
HTTP/1.1 200 OK
Date: Fri, 31 Dec 1999 23:59:59 GMT
Content-Type: text/plain
Transfer-Encoding: chunked
1a; ignore-stuff-here
abcdefghijklmnopqrstuvwxyz
10
1234567890abcdef
0
some-footer: some-value
another-footer: another-value
[blank line here]
As my CGI app resides in Apache cgi-bin directory, I can issue cURL:
curl http://example.com/cgi-bin/chunked.rb -O -J
cURL should reassemble the original image.jpg file from the chunks, but the saved file is incomplete (it is smaller than the original), and cURL also reports an error:
curl: (56) Malformed encoding found in chunked-encoding
However, when I change the line data=f.read(32) to something like data=f.read(1024*50), the file is saved correctly. With another, bigger file from the server, the CGI app fails again with the same error message. What can I do to make my CGI app work and send the file correctly?
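So the working example writes the chunk data with print followed by a separate puts. puts adds a newline only when the string does not already end with one, so a chunk whose last byte was a newline lost its terminator: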
puts "Date: Fri, 28 Nov 2015 09:59:59 GMT"
puts "Content-Type: application/octet-stream; charset=\"ASCII-8BIT\""
puts "Content-Disposition: attachment; filename=image.jpg"
puts "Transfer-Encoding: chunked"
puts
File.open("image.jpg","rb"){|f|
  while data=f.read(32)
    STDOUT.puts data.size.to_s(16)
    STDOUT.print data
    STDOUT.puts
  end
  STDOUT.puts "0"
  STDOUT.puts
}
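The difference between the two versions can be reproduced without Apache or cURL. The sketch below first shows the puts behaviour on a string that already ends with a newline, then frames data into chunks with explicit terminators. write_chunked is a hypothetical helper. It writes CRLF, which is what the HTTP/1.1 chunked format specifies, whereas the example above relies on the plain newlines emitted by puts.

```ruby
require 'stringio'

# puts appends a newline only when the string does not already end with one.
demo = StringIO.new
demo.puts "abc\n"
demo.puts "abc"
# demo.string is now "abc\nabc\n": the first call added nothing.

# Hypothetical helper: frame everything readable from input as chunks.
def write_chunked(out, input, chunk_size = 32)
  while (data = input.read(chunk_size))
    out.print data.bytesize.to_s(16), "\r\n" # chunk size in hex
    out.print data, "\r\n"                   # chunk data plus explicit terminator
  end
  out.print "0\r\n\r\n"                      # zero-size chunk ends the body
end

# Both chunks end with a newline, the case that broke the original script.
source = StringIO.new("line one\nline two\n")
encoded = StringIO.new
write_chunked(encoded, source, 9)
p encoded.string
```

Because the terminator is written separately from the data, the framing does not depend on what the last byte of a chunk happens to be.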

Replace BASH curl with Ruby equivalent

I am trying to replace a shell call to curl in my Ruby code with something more native like 'open-uri', but it fails with '401 Authorization Required'.
I'm trying to replace this:
status = system("curl -Is -w '%{http_code}\\n' --digest -u #{usr}:#{psw} https://#{source}/ -o /dev/null")
With this:
require 'open-uri'
status = open("https://#{source}/", :http_basic_authentication=>[usr, psw])
But still getting 401. Any idea?
Thank you
If you hit any redirects, this could be your problem:
if redirect
  ...
  if options.include? :http_basic_authentication
    # send authentication only for the URI directly specified.
    options = options.dup
    options.delete :http_basic_authentication
  end
end
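If redirects are the cause, one way around it is to make the request with Net::HTTP, which does not follow redirects on its own, and attach the credentials to each request explicitly. The following is a minimal sketch rather than a drop-in replacement: build_head_request and head_status are hypothetical helper names, and the sketch sends Basic credentials only. The curl command in the question passes --digest, so if the server accepts only Digest authentication, Basic credentials may be rejected regardless of redirects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a HEAD request carrying Basic credentials.
def build_head_request(url, user, password)
  uri = URI(url)
  request = Net::HTTP::Head.new(uri)
  request.basic_auth(user, password) # sets the Authorization header
  [uri, request]
end

# Hypothetical helper: return the status code as a string, e.g. "200".
# Performs a real network request when called.
def head_status(url, user, password)
  uri, request = build_head_request(url, user, password)
  Net::HTTP.start(uri.hostname, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request).code
  end
end
```

With the variables from the question, the call would be status = head_status("https://#{source}/", usr, psw). A 3xx status code then shows that a redirect is involved, and the Location header can be requested again with the same credentials if that is appropriate for the target host.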

WEBrick: log POST data

I'm running a simple WEBrick server to debug POST data. I'd like to output the POST data to the log.
My code is:
server.mount_proc '/' do |req, res|
  res.body = "Web server response:\n"
  # Output POST data here...
end
where server is simply a WEBrick server.
Any suggestions?
Access raw post data using req.body.
server.mount_proc '/' do |req, res|
  res.body = "Web server response:\n"
  p req.body # <---
end
If you want parsed data (as hash), use req.query instead.
UPDATE
Customize :AccessLog:
require 'webrick'
log = [[ $stderr, WEBrick::AccessLog::COMMON_LOG_FORMAT + ' POST=%{body}n']]
server = WEBrick::HTTPServer.new :Port => 9000, :AccessLog => log
server.mount_proc '/' do |req, res|
  req.attributes['body'] = req.body
  res.body = "Web server response:\n"
end
server.start
Have you ever tried netcat? To see if you have it do:
$ man nc
Then you can start a server like this:
$ nc -l 8080 (-l act as a server, listening on port 8080)
(hangs)
If I send a POST request with the data 'a=10&b=20' to http://localhost:8080, netcat outputs:
$ nc -l 8080
POST / HTTP/1.1
Host: localhost:8080
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:23.0) Gecko/20100101 Firefox/23.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: null
Accept-Encoding: gzip, deflate
DNT: 1
Content-Length: 9
Content-Type: text/plain; charset=UTF-8
Connection: keep-alive
Pragma: no-cache
Cache-Control: no-cache
a=10&b=20

Simple TCPSocket server in Ruby exhibits a HTTP header issue

I'm benchmarking some simple HTTP server implementations in Ruby (no threads, threaded, fibers and eventmachine) but this simple piece of code fails using threads:
#!/usr/bin/env ruby
require 'socket'
server = TCPServer.new("127.0.0.1", 8080)
puts "Listening on 127.0.0.1:8080"
while true
  Thread.new(server.accept) do |client|
    msg = client.readline
    headers = [
      "",
      "HTTP/1.1 200 OK",
      "Date: Fri, 30 Sep 2011 08:11:27 GMT",
      "Server: TCP socket test",
      "Content-Type: text/html; charset=iso-8859-1",
      "Content-Length: #{msg.length}\r\n\r\n"].join("\r\n")
    client.write headers
    client.write ">>> Data sent:\n #{msg}"
    client.close
  end
end
A simple curl http://localhost:8080/ works fine when the first element in the array is "" or some other string, but not when the response starts directly with "HTTP/1.1 200 OK". Why is this?
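No answer to this question is included here. One detail of the snippet can be checked offline, though: the Content-Length header carries msg.length, while the body actually written is the longer string ">>> Data sent:\n #{msg}". The sketch below builds a response whose declared length matches its body. build_response is a hypothetical helper, and whether the mismatch explains the behaviour described in the question is not established here.

```ruby
# Hypothetical helper: build a complete response for the request line msg.
def build_response(msg)
  body = ">>> Data sent:\n #{msg}"
  head = [
    "HTTP/1.1 200 OK", # the status line comes first, with nothing before it
    "Server: TCP socket test",
    "Content-Type: text/html; charset=iso-8859-1",
    "Content-Length: #{body.bytesize}", # length of the body actually sent
    "Connection: close"
  ].join("\r\n")
  "#{head}\r\n\r\n#{body}"
end

response = build_response("GET / HTTP/1.1\r\n")
puts response
```

In the server loop, client.write build_response(msg) would replace the two separate write calls.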
