Ruby: decode_www_form_component fails with invalid %-encoding


The following URL is valid, but the decode_www_form_component method fails to decode it:
irb(main):001:0> url = "https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D0%B4%D0%BE%D0%BC%D1%8B%D1%81%D0%BB%D0%B5%D0%BD%D1%81%D0%BA%D0%B8%D0%B9,_%D0%95%D0%B2%D0%B3%D0%B5%D0%BD%D0%B8%D0%B9_%D0%92%D0%B5%D0%BD%D0%B8%D0%B0%D0%BC%D0%B"
irb(main):002:0> URI.decode_www_form_component(url)
Traceback (most recent call last):
1: from (irb):13
ArgumentError (invalid %-encoding (https://ru.wikipedia.org/wiki/%D0%A0%D0%B0%D0%B4%D0%BE%D0%BC%D1%8B%D1%81%D0%BB%D0%B5%D0%BD%D1%81%D0%BA%D0%B8%D0%B9,_%D0%95%D0%B2%D0%B3%D0%B5%D0%BD%D0%B8%D0%B9_%D0%92%D0%B5%D0%BD%D0%B8%D0%B0%D0%BC%D0%B))
Any idea how to avoid this error?

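My mistake: I didn't notice that the URL is invalid. It was truncated to 200 characters, which cut the last escape sequence in half (the URL ends in %D0%B instead of a complete %XX pair). With a complete URL, decoding works: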
irb(main):001:0> url = "https://ru.wikipedia.org/wiki/%D0%93%D0%BE%D0%BB%D1%83%D0%B1%D0%B5%D0%B2,_%D0%90%D0%BB%D0%B5%D0%BA%D1%81%D0%B5%D0%B9_%D0%9A%D0%BE%D0%BD%D1%81%D1%82%D0%B0%D0%BD%D1%82%D0%B8%D0%BD%D0%BE%D0%B2%D0%B8%D1%87"
irb(main):002:0> URI.decode_www_form_component(url)
=> "https://ru.wikipedia.org/wiki/Голубев,_Алексей_Константинович"
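If the untruncated URL is not available, one option is to drop the dangling escape before decoding. The sketch below assumes the only damage is truncation at the very end of the string; lenient_decode is a hypothetical helper, not part of the uri library. It removes a trailing "%" or "%X", decodes, and then uses String#scrub to discard the incomplete UTF-8 byte sequence that the cut leaves behind.

```ruby
require 'uri'

# Hypothetical helper: decode a %-encoded string whose tail may have been cut off.
def lenient_decode(str)
  # Remove a dangling "%" or "%X" at the very end (e.g. the "%B" in "...%D0%B").
  trimmed = str.sub(/%\h?\z/, '')
  # Malformed escapes elsewhere in the string still raise ArgumentError.
  decoded = URI.decode_www_form_component(trimmed)
  # The cut may also split a multi-byte UTF-8 character; drop its leftover bytes.
  decoded.scrub('')
end

puts lenient_decode("%D0%9F%D1%80%D0%B") # prints Пр
```

This is lossy by design: the partially transmitted last character is dropped rather than guessed.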

Related

How do I properly ask a user for a URL and check to see if that site is up?

What I'm trying to do is ask a user for a URL, query that URL to see if it's up, and respond with "Response OK!" if it is.
require 'net/http'
require 'uri'
Here I'm asking a user for a url and attempting to define that string as a variable:
puts "\n\nWhat website would you like to check?\n\n"
userinput = gets.chomp
Here I'm checking for an HTTP response with that variable $userinput
def main
  while true
    uri = URI.parse($userinput)
    response = Net::HTTP::get_response(uri)
    if response.code == "200"
      puts "Response OK!"
    else
      puts "Received #{response.code} code. Probing again in 15s..."
    end
    sleep(15)
  end
end
# Exit on CTRL-C SIGINT
Signal.trap("INT") {
  puts "\nUser exited."
  exit
}
main ()
Here is the code in action. I don't know how else to paste this:
What website would you like to check?
http://www.reddit.com
Traceback (most recent call last):
1: from prober.rb:33:in `<main>'
prober.rb:14:in `main': wrong number of arguments (given 1, expected 0) (ArgumentError)
Removing the space in main () gives this instead:
Traceback (most recent call last):
5: from prober.rb:33:in `<main>'
4: from prober.rb:16:in `main'
3: from /usr/local/Cellar/ruby/2.5.1/lib/ruby/2.5.0/uri/common.rb:237:in `parse'
2: from /usr/local/Cellar/ruby/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:73:in `parse'
1: from /usr/local/Cellar/ruby/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:15:in `split'
/usr/local/Cellar/ruby/2.5.1/lib/ruby/2.5.0/uri/rfc3986_parser.rb:18:in `rescue in split': bad URI(is not URI?): (URI::InvalidURIError)

For this ArgumentError problem:
Traceback (most recent call last):
1: from prober.rb:33:in `<main>'
prober.rb:14:in `main': wrong number of arguments (given 1, expected 0)
remove the space between main and the parentheses. Because Ruby lets you call methods without parentheses, main () is parsed as a call to main with one argument (the empty parentheses), so Ruby reports "given 1, expected 0".
The second error has a different cause: $userinput is a global variable that is never assigned (the input was stored in the local variable userinput), so it is nil when URI.parse runs. If you want to pass the user input to your main method, declare a parameter for it; that is better than using global variables.
Read the input with gets.chomp and pass it as an argument when you call main.
So that can be like:
def main(user_input)
  while true
    uri = URI.parse(user_input)
    response = Net::HTTP.get_response(uri)
    ...
  end
end
puts "\n\nWhat website would you like to check?\n\n"
userinput = gets.chomp
main(userinput)
Notice that an empty pair of parentheses, (), in "void context" evaluates to nil. So main () passes nil as an argument to a method that takes none, which is why it raises an ArgumentError.
p ()
# nil
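Putting the answer's pieces together, a complete prober might look like the sketch below. It keeps the question's logic (check for "200", otherwise report the code, then wait 15 seconds); status_message and probe are hypothetical names, and the interval keyword is an illustrative addition, not something from the original post.

```ruby
require 'net/http'
require 'uri'

# Builds the line the prober prints for a status code.
# Net::HTTPResponse#code returns a String such as "200".
def status_message(code, interval)
  if code == "200"
    "Response OK!"
  else
    "Received #{code} code. Probing again in #{interval}s..."
  end
end

# Probes `url` forever. The URL arrives as a parameter instead of
# being read from a global variable.
def probe(url, interval: 15)
  uri = URI.parse(url)
  loop do
    response = Net::HTTP.get_response(uri)
    puts status_message(response.code, interval)
    sleep(interval)
  end
end
```

In the script you would prompt as before and then call probe(gets.chomp), keeping the Signal.trap("INT") handler from the question for a clean exit on Ctrl-C.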

Net::HTTP vs REST Client gem: How do they handle bad websites / 404

I was trying to access some websites using the rest-client gem and found a behavior that puzzled me. It has to do with using rest-client with a page that doesn't exist, in this case www.google.com/this_does_not_exist.
What I expected: that the code would run and the response object would have a 404 response code.
What actually happened: an exception was raised and the code terminated prematurely.
When I tried the same thing with the Net::HTTP library, I did get the expected result.
The question is: is this behavior expected in rest-client? If so, how would you get back an object with a 404 response code when requesting a page that doesn't exist?
Here is the code from my irb:
2.2.1 :045 > uri = URI('http://www.google.com')
=> #<URI::HTTP http://www.google.com>
2.2.1 :046 > response = Net::HTTP.get_response(uri)
=> #<Net::HTTPOK 200 OK readbody=true>
2.2.1 :047 > response.code
=> "200"
2.2.1 :048 > uri = URI('http://www.google.com/this_does_not_exist')
=> #<URI::HTTP http://www.google.com/this_does_not_exist>
2.2.1 :049 > response = Net::HTTP.get_response(uri)
=> #<Net::HTTPNotFound 404 Not Found readbody=true>
2.2.1 :050 > response.code
=> "404"
2.2.1 :051 > uri = URI('http://www.google.com')
=> #<URI::HTTP http://www.google.com>
2.2.1 :052 > response = RestClient.get('http://www.google.com')
=> <RestClient::Response 200 "<!doctype h...">
2.2.1 :053 > response.code
=> 200
2.2.1 :054 > response = RestClient.get('http://www.google.com/this_does_not_exist')
RestClient::NotFound: 404 Not Found
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/abstract_response.rb:223:in `exception_with_response'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/abstract_response.rb:103:in `return!'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/request.rb:860:in `process_result'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/request.rb:776:in `block in transmit'
from /Users/piperwarrior/.rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/net/http.rb:853:in `start'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/request.rb:766:in `transmit'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/request.rb:215:in `execute'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient/request.rb:52:in `execute'
from /Users/piperwarrior/.rvm/gems/ruby-2.2.1/gems/rest-client-2.0.0/lib/restclient.rb:67:in `get'
from (irb):54
from /Users/piperwarrior/.rvm/rubies/ruby-2.2.1/bin/irb:11:in `<main>'
2.2.1 :055 >

From the GitHub README:
for result codes between 200 and 207, a RestClient::Response will be returned
for result codes 301, 302 or 307, the redirection will be followed if the request is a GET or a HEAD
for result code 303, the redirection will be followed and the request transformed into a GET
for other cases, a RestClient::Exception holding the Response will be raised; a specific exception class will be thrown for known error codes
call .response on the exception to get the server's response
So yes, this is expected behavior. Rescue the exception and retrieve the response object with e.response.

RestClient bug on response redirect encoding

When trying to get this page:
resp = RestClient.get("http://www.radios.com.br/aovivo/XXXX/24924")
I get this error:
URI::InvalidURIError: bad URI(is not URI?): http://www.radios.com.br/aovivo/Radio-Gospel-Ajduk?s/24924
from /Users/danicuki/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/uri/common.rb:176:in `split'
from /Users/danicuki/.rvm/rubies/ruby-2.0.0-p353/lib/ruby/2.0.0/uri/common.rb:211:in `parse'
I think this is happening because the redirect URL in the response has an encoding problem. How can I fix it?

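Non-ASCII characters in URIs must be percent-encoded:
url = "http://www.radios.com.br/aovivo/XXXX/24924"
# URI::encode was removed in Ruby 3.0; URI::RFC2396_Parser#escape performs the same escaping
resp = RestClient.get(URI::RFC2396_Parser.new.escape(url))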
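To see what that escaping does, the sketch below percent-encodes a URL containing a non-ASCII character. The URL is a hypothetical stand-in for the redirect target in the question, and ascii_uri is a hypothetical helper wrapping URI::RFC2396_Parser#escape.

```ruby
require 'uri'

# Hypothetical helper: percent-encode every character outside the
# RFC 2396 unreserved/reserved sets (non-ASCII bytes become %XX).
def ascii_uri(str)
  URI::RFC2396_Parser.new.escape(str)
end

raw = "http://example.com/aovivo/Rádio/24924" # hypothetical redirect target
fixed = ascii_uri(raw)
puts fixed                  # prints http://example.com/aovivo/R%C3%A1dio/24924
puts URI.parse(fixed).path  # prints /aovivo/R%C3%A1dio/24924
```

Note that "%" itself is escaped, so applying this to a URL that is already percent-encoded would turn %C3 into %25C3; only use it on raw, unencoded URLs.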

You need to patch RestClient (as of version 2.1.0 this is not fixed yet):
RestClient::AbstractResponse.module_eval do
  alias _origin_follow_redirection _follow_redirection
  def _follow_redirection(new_args, &block)
    # cannot follow redirection if there is no location header
    raise exception_with_response unless headers[:location]
    # Fix URI::InvalidURIError: URI must be ascii only
    # (URI::encode was removed in Ruby 3.0; RFC2396_Parser#escape does the same escaping)
    headers[:location] = URI::RFC2396_Parser.new.escape(headers[:location])
    _origin_follow_redirection new_args, &block
  end
end

Ruby - Microsoft Translator unexpected token error

I'm using the following method to translate a simple word from English to Russian by calling:
translate("hello")
This is my method:
def translate(text)
  begin
    uri = "http://api.microsofttranslator.com/V2/Ajax.svc/GetTranslations?appId=#{@appid}&text=#{text.strip}&from=en&to=ru&maxTranslations=1"
    page = HTTParty.get(uri).body
    show_info = JSON.parse(page) # this line throws the error
  rescue
    puts $!
  end
end
The JSON output:
{"From":"en","Translations":[{"Count":0,"MatchDegree":100,"MatchedOriginalText":"","Rating":5,"TranslatedText":"Привет"}]}
The error:
unexpected token at '{"From":"en","Translations":[{"Count":0,"MatchDegree":100,"MatchedOriginalText":"","Rating":5,"TranslatedText":"Привет"}]}'
Not sure what it means by unexpected token. It's the only error I'm receiving. Unfortunately I can't modify the JSON output as it's returned by the API itself.
UPDATE:
Looks like the API is returning some illegal characters (bad Microsoft):
'´╗┐{"From":"en","Translations":[{"Count":0,"MatchDegree":0,"MatchedOriginalText":"","Rating":5,"TranslatedText":"Hello"}]}'
Full error:
C:/Ruby193/lib/ruby/1.9.1/json/common.rb:148:in `parse': 743: unexpected token at '´╗┐{"From":"en","Translations":[{"Count":0,"MatchDegree":0,"MatchedOriginalText":"","Rating":5,"TranslatedText":"Hello"}]}' (JSON::ParserError)
from C:/Ruby193/lib/ruby/1.9.1/json/common.rb:148:in `parse'
from trans.rb:13:in `translate'
from trans.rb:17:in `<main>'

Try ensuring UTF-8 encoding and stripping any leading byte order mark (BOM) from the string:
# encoding: UTF-8
# ^-- Make sure this is on the first line!
require 'httparty'
require 'json'

def translate(text)
  begin
    uri = "http://api.microsofttranslator.com/V2/Ajax.svc/GetTranslations?appId=#{@appid}&text=#{text.strip}&from=en&to=ru&maxTranslations=1"
    page = HTTParty.get(uri).body
    # Treat the body as UTF-8 and remove the BOM bytes EF BB BF
    page.force_encoding("UTF-8").gsub!("\xEF\xBB\xBF", '')
    show_info = JSON.parse(page) # parses once the BOM is gone
  rescue
    puts $!
  end
end
Sources:
Ruby 1.9's String
Wikipedia: Byte order mark
Using awk to remove the Byte-order mark
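The same idea as a small helper that can be tried without calling the API. parse_json_body is a hypothetical name; unlike the gsub! above, this sketch removes the BOM only when it appears at the very start of the body, using String#delete_prefix (Ruby 2.5+).

```ruby
require 'json'

UTF8_BOM = "\uFEFF" # encoded as the bytes EF BB BF in UTF-8

# Hypothetical helper: parse a JSON body that may start with a UTF-8 BOM.
def parse_json_body(body)
  text = body.dup.force_encoding(Encoding::UTF_8) # dup: don't mutate the caller's string
  JSON.parse(text.delete_prefix(UTF8_BOM))
end

raw = "\xEF\xBB\xBF{\"From\":\"en\"}".b # BOM followed by JSON, as raw bytes
p parse_json_body(raw)["From"]         # prints "en"
```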

Ruby: JSON.parse returns undefined method `bytesize' for

response = Typhoeus::Request.get("http://localhost:3000/api/api_email/#{@api_id}.json")
JSON.parse(response.body)
The response is a JSON object but I get an error when trying to parse it.
undefined method `bytesize' for
I want to get access to the JSON object.
Error:
NoMethodError at /api/v1/a71040739d6cc50e89aff56601af67/2011-10-1
undefined method `bytesize' for {"xpto"=>{"email"=>"test@gmail.com"}}:Hash
file: utils.rb location: bytesize line: 239
Backtrace:
/Users/donald/.rvm/rubies/ruby-1.9.2-rc2/lib/ruby/1.9.1/webrick/httpserver.rb in service
si.service(req, res)
/Users/donald/.rvm/rubies/ruby-1.9.2-rc2/lib/ruby/1.9.1/webrick/httpserver.rb in run
server.service(req, res)
/Users/donald/.rvm/rubies/ruby-1.9.2-rc2/lib/ruby/1.9.1/webrick/server.rb in block in start_thread
block ? block.call(sock) : run(sock)
This is how it is being generated:
@api_id = params[:api_id]
@bucket = Bucket.where(:api => @api_id)
respond_with(@bucket, :only => [:email])
The .json file being returned contains:
[{"xpto":{"email":"test@gmail.com"}}]

It's weird: it seems that response.body is already a Hash (i.e. an already parsed JSON string). Or maybe you're seeing this in your WEBrick log, in which case the problem is in generating the JSON response rather than in parsing it. The backtrace doesn't make sense :(
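The error message itself says bytesize was called on a Hash, and Hash has no bytesize method. The sketch below illustrates the client-side distinction: JSON.parse expects a String, so a guard (ensure_parsed, a hypothetical helper) can skip parsing when the body has already been turned into a Hash or Array.

```ruby
require 'json'

# Hypothetical guard: only parse when the body is still a JSON string.
def ensure_parsed(body)
  body.is_a?(String) ? JSON.parse(body) : body
end

json = '[{"xpto":{"email":"test@gmail.com"}}]'
data = ensure_parsed(json)
p data.first["xpto"]["email"]     # prints "test@gmail.com"
p ensure_parsed(data).equal?(data) # prints true: already parsed, returned as is
p "".respond_to?(:bytesize), {}.respond_to?(:bytesize) # prints true, then false
```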

This is probably a compatibility issue. You will probably not have that problem on Ruby 1.9, but on other versions or implementations, such as Ruby 1.8 or IronRuby, String#bytesize might not be defined.
