Ruby unescape HTML string - ruby

Any idea how I can unescape the following string in Ruby?
C&#58;&#92;inetpub&#92;wwwroot&#92;adminWeb
to
C:\inetpub\wwwroot\adminWeb
or to
C%3A%5Cinetpub%5Cwwwroot%5CadminWeb
Tried with URI.decode with no success.

The CGI library is one option:
require 'cgi'
CGI.unescapeHTML('C&#58;&#92;inetpub&#92;wwwroot&#92;adminWeb')
# => "C:\\inetpub\\wwwroot\\adminWeb"

One more variant is the HTMLEntities gem:
require 'htmlentities'
HTMLEntities.new.decode "C&#58;&#92;inetpub&#92;wwwroot&#92;adminWeb"
# => "C:\\inetpub\\wwwroot\\adminWeb"
I prefer to use it because it deals with rarer named entities such as &aring; and &mdash;, which CGI.unescapeHTML does not.
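As a small illustration of that difference, here is a sketch using only the standard library (the sample string is made up for this example). CGI.unescapeHTML decodes numeric references and the handful of basic named entities, and leaves other named entities untouched:

```ruby
require 'cgi'

# Hypothetical sample: one basic entity, one numeric reference,
# and one named entity that CGI.unescapeHTML does not know about.
sample = '&amp; &#229; &aring;'

decoded = CGI.unescapeHTML(sample)
puts decoded
# prints: & å &aring;
```

If the input may contain named entities like the last one, a dedicated decoder such as HTMLEntities is the safer choice.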

An alternative is using the standard lib's URI module:
require 'uri'
URI.unescape "C%3A%5Cinetpub%5Cwwwroot%5CadminWeb" # => "C:\\inetpub\\wwwroot\\adminWeb"
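Note that URI.unescape was removed in Ruby 3.0. On current Rubies, one possible replacement for this particular string is URI.decode_www_form_component, sketched below. It follows form-encoding rules, so a literal + in the input is turned into a space, which may or may not be what you want:

```ruby
require 'uri'

encoded = 'C%3A%5Cinetpub%5Cwwwroot%5CadminWeb'

# Decode the percent-escapes (%3A => ":", %5C => "\").
decoded = URI.decode_www_form_component(encoded)
puts decoded
# prints: C:\inetpub\wwwroot\adminWeb
```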

Related

How to cgi escape ruby credentials

I am running the command bundle install and keep getting the following error
Please CGI escape your usernames and passwords before setting them for authentication.
I am unsure how I could go about CGI escaping my credentials- any ideas? Thanks
You can do this in irb with Ruby's CGI::Util module:
$ irb
irb(main):001:0> require "cgi"
=> true
irb(main):002:0> CGI.escape "foo@example.com"
=> "foo%40example.com"

Is there a way to combine multiple regular expressions for a substring command?

Is there a way to combine these two regular expressions I am using to convert multi-platform file paths to a URL?
@image_file = "#{request.protocol}#{request.host}/#{@image_file.path.sub(/^([a-z]):\//,"")}".sub(/^\//,"")
This handles both my Windows and *IX platforms for file path conversion to a URL. For example, both of the following file path strings are handled properly:
- "c:\users\docs\pictures\image.jpg" goes to "http://localhost/users/docs/pictures/image.jpg"
- "\home\usr_name\pictures\image.jpg" goes to "http://localhost/usr_name/pictures/image.jpg"
I would prefer not to have to use two sub calls on a string if there is a way to combine them properly.
Suggestions and feedback from the community welcome!
The regex you are looking for is /^([a-z]:)?\//:
"c:/users/docs/pictures/image.jpg".sub(/^([a-z]:)?\//, '')
=> "users/docs/pictures/image.jpg"
"/home/usr_name/pictures/image.jpg".sub(/^([a-z]:)?\//, '')
=> "home/usr_name/pictures/image.jpg"
As some background on working with filenames and URLs...
First, Ruby doesn't require you to use backslashes in Windows filenames, so if you're generating them, don't bother. Instead, rely on the fact that the IO class knows what OS you're on and will auto-sense the path separator and convert things for you on the fly. This is from the IO documentation:
Ruby will convert pathnames between different operating system conventions if possible. For instance, on a Windows system the filename "/gumby/ruby/test.rb" will be opened as "\gumby\ruby\test.rb". When specifying a Windows-style filename in a Ruby string, remember to escape the backslashes:
"c:\\gumby\\ruby\\test.rb"
Our examples here will use the Unix-style forward slashes; File::ALT_SEPARATOR can be used to get the platform-specific separator character.
If you're receiving the paths from another source, this makes it easy to normalize them into something Ruby likes:
path = "c:\\users\\docs\\pictures\\image.jpg" # => "c:\\users\\docs\\pictures\\image.jpg"
puts path
# >> c:\users\docs\pictures\image.jpg
path.gsub!(/\\/, '/') if path['\\']
path # => "c:/users/docs/pictures/image.jpg"
puts path
# >> c:/users/docs/pictures/image.jpg
For convenience, write a little helper method:
def normalize_path(p)
  p.gsub(/\\/, '/')
end
normalize_path("c:\\users\\docs\\pictures\\image.jpg") # => "c:/users/docs/pictures/image.jpg"
normalize_path("/users/docs/pictures/image.jpg") # => "/users/docs/pictures/image.jpg"
Ruby's File and Pathname classes are very helpful when dealing with paths:
foo = normalize_path(path) # => "c:/users/docs/pictures/image.jpg"
File.dirname(foo) # => "c:/users/docs/pictures"
File.basename(foo) # => "image.jpg"
and:
File.split(foo) # => ["c:/users/docs/pictures", "image.jpg"]
path_to_file, filename = File.split(foo)
path_to_file # => "c:/users/docs/pictures"
filename # => "image.jpg"
Alternately there's the Pathname class:
require 'pathname'
bar = Pathname.new(foo)
bar.dirname # => #<Pathname:c:/users/docs/pictures>
bar.basename # => #<Pathname:image.jpg>
Pathname is an experimental class in Ruby's standard library that wraps up all the convenience methods from File, FileUtils and Dir into one umbrella class. It's worth getting to know:
The goal of this class is to manipulate file path information in a neater way than standard Ruby provides. The examples below demonstrate the difference.
All functionality from File, FileTest, and some from Dir and FileUtils is included, in an unsurprising way. It is essentially a facade for all of these, and more.
Back to your question...
Ruby's standard library also contains the URI class. It's well tested and is a better way to build URLs than simple string concatenation due to idiosyncrasies that can occur when characters need to be encoded.
require 'uri'
url = URI::HTTP.build({:host => 'www.foo.com', :path => foo[/^(?:[a-z]:)?(.+)/, 1]})
url # => #<URI::HTTP:0x007fe91117a438 URL:http://www.foo.com/users/docs/pictures/image.jpg>
The build method applies syntax rules to make sure the URL is valid.
If you need it, at this point you can tack on to_s to get the stringified version:
url.to_s # => "http://www.foo.com/users/docs/pictures/image.jpg"

Ruby/Rails: how to handle incoming URLs with ruby 1.8 UTF-8 encodings (like \xc3\xa1)

We're cleaning up some errors on our site after migration from ruby 1.8.7 to 1.9.3, Rails 3.2.12. We have one encoding error left -- Bing is sending requests for URLs in the form
/search?q=author:\"Andr\xc3\xa1s%20Guttman\"
(This reads /search?q=author:"András Guttman", where the á is escaped).
In fairness to Bing, we were the ones that gave them those bogus URLs, but ruby 1.9.3 isn't happy with them any more.
Our server is currently returning a 500. Rails is returning the error "Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT"
I am unable to reproduce this error in a browser, or via curl or wget from OS X or Linux command line.
I want to send a 301 redirect back with a properly encoded URL.
I am guessing that I want to:
detect that the URL has old-style UTF-8 and then, only if it is malformed:
use String#encode to get from old to new UTF-8
use CGI.escape() to %-encode the URL
301 redirect to the corrected URL
So I have read a lot and am not sure how (or if) I can detect this bogus URL. I need to detect because otherwise I would have to 301 everything!
When I try in irb I get these results:
1.9.3p392 :015 > foo = "/search?q=author:\"Andr\xc3\xa1s%20Guttman\""
=> "/search?q=author:\"András%20Guttman\""
1.9.3p392 :016 > "/search?q=author:\"Andr\xc3\xa1s%20Guttman\"".encoding
=> #<Encoding:UTF-8>
1.9.3p392 :017 > foo.encoding
=> #<Encoding:UTF-8>
I have read this SO post but I am not sure if I have to go this far or even if this applies.
[Update: since posting, we have added a call to the code in the SO post linked above prior to all requests.]
So the question is: how can I detect the old-style encoding so that I can do the other steps.
First, let's look at the string manipulation side of things. It looks like using the URI module to unescape and then re-escape will just work:
2.0.0p0 :007 > foo = "/search?q=author:\"Andr\xc3\xa1s%20Guttman\""
=> "/search?q=author:\"András%20Guttman\""
2.0.0p0 :008 > URI.unescape foo
=> "/search?q=author:\"András Guttman\""
2.0.0p0 :009 > URI.escape URI.unescape foo
=> "/search?q=author:%22Andr%C3%A1s%20Guttman%22"
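URI.escape and URI.unescape were removed in Ruby 3.0. A rough equivalent on newer Rubies, assuming you want the same legacy RFC 2396 escaping rules, is to call the parser class directly. This is only a sketch of the same round trip:

```ruby
require 'uri'

# URI::RFC2396_Parser implements the legacy escaping rules that
# URI.escape and URI.unescape used to delegate to.
parser = URI::RFC2396_Parser.new

foo = "/search?q=author:\"Andr\xc3\xa1s%20Guttman\""

# Unescape first so existing %XX sequences are not double-encoded,
# then escape everything that is unsafe in a URL.
reescaped = parser.escape(parser.unescape(foo))
puts reescaped
# prints: /search?q=author:%22Andr%C3%A1s%20Guttman%22
```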
So the next question is where to do that? I'd say the problem with trying to detect string with the \x escape character is that you can't GUARANTEE those strings were not supposed to be slash-x versus escaped (although, in practice, maybe that is an okay assumption).
You might consider just adding a small rack middleware that does this. See this Railscast for more on rack. Assuming you only get these in the parameters (i.e., after the ? in the URL), then your middleware would look something like (untested, just for illustration; place in your /lib folder as reescape_parameters.rb):
require 'uri' # possibly not needed?
class ReescapeParameters
  def initialize(app)
    @app = app
  end

  def call(env)
    # Unescape, then re-escape, so raw bytes end up %-encoded
    env['QUERY_STRING'] = URI.escape URI.unescape env['QUERY_STRING']
    status, headers, body = @app.call(env)
    [status, headers, body]
  end
end
Then you use the middleware by adding a line to your application config or an initializer. For example, in /config/application.rb (or, alternatively, in an initializer):
config.middleware.use "ReescapeParameters"
Note that you will probably need to catch these parameters before any parameter handling by Rails. I'm not sure where in the Rack stack you'll need to put it, but you will more likely need:
config.middleware.insert_before ActionDispatch::ParamsParser, ReescapeParameters
Which would put it in the stack before ActionDispatch::ParamsParser. You'll need to figure out the correct module to put it after. This is just a guess. (FYI: There is an insert_after as well.)
UPDATE (REVISED)
If you MUST detect these and then send a 301, you could try:
def call(env)
  if env['QUERY_STRING'].encoding.name == 'ASCII-8BIT' # could be 'ASCII_8BIT' ?
    location = URI.escape URI.unescape env['QUERY_STRING']
    # Rack expects the body to respond to each, so use an array
    [301, {'Content-Type' => 'text','Location' => location}, []]
  else
    status, headers, body = @app.call(env)
    [status, headers, body]
  end
end
This is a trial -- it might match everything. But hopefully, "regular" strings are being encoded as something else (and hence you only get the error for the ASCII-8BIT encoding).
Per one of the comments, you could also convert instead of unescape and escape:
location = env['QUERY_STRING'].encode('UTF-8')
but you might still need to URI escape the resulting string anyway (not sure, depends on your circumstances).
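Another way to approach the detection step, sketched here under some assumptions, is to look for raw bytes above 0x7F in the query string: a correctly %-encoded query is pure ASCII, so any high byte means the URL was sent unescaped. The class name RedirectRawUtf8Queries is hypothetical, the sketch only percent-encodes high bytes (it leaves characters such as the double quote alone), and it assumes PATH_INFO holds the request path:

```ruby
# Hypothetical Rack-style middleware; the class name is made up for this sketch.
class RedirectRawUtf8Queries
  # Any byte outside the ASCII range, matched on binary data.
  RAW_HIGH_BYTES = /[\x80-\xFF]/n

  def initialize(app)
    @app = app
  end

  def call(env)
    query = env['QUERY_STRING'].to_s.b # work on a binary copy of the raw bytes
    return @app.call(env) unless query =~ RAW_HIGH_BYTES

    # Percent-encode each raw high byte, e.g. "\xC3" becomes "%C3".
    fixed = query.gsub(RAW_HIGH_BYTES) { |byte| format('%%%02X', byte.ord) }
    location = "#{env['PATH_INFO']}?#{fixed}"
    [301, { 'Location' => location, 'Content-Type' => 'text/plain' }, []]
  end
end

# Usage with a stand-in app, no Rack gem required:
app = lambda { |_env| [200, {}, ['ok']] }
middleware = RedirectRawUtf8Queries.new(app)
status, headers, _body = middleware.call(
  'PATH_INFO' => '/search',
  'QUERY_STRING' => "q=Andr\xC3\xA1s%20Guttman"
)
puts status              # prints: 301
puts headers['Location'] # prints: /search?q=Andr%C3%A1s%20Guttman
```

Requests whose query string is already pure ASCII pass straight through to the app, so only the malformed URLs get a 301.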
Please use CGI::unescapeHTML(string)

Match regex works for one search, but scan does not

The following gets me one match:
query = "http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar"
find = /(?<=(\?|\&)).*?(?=(\&|\z))/.match(query)
When I examine 'find' I get:
first_name=aoeu
I want to match everything between a '?' and a '&', so I tried
find = query.scan(/(?<=(\?|\&)).*?(?=(\&|\z))/)
But yet when I examine 'find' I now get:
[["?", "&"], ["&", ""]]
What do I need to do to get:
[first_name=aoeu][last_name=rar]
or
["first_name=aoeu","last_name=rar"]
?
Use String#split.
query.split(/[&?]/).drop(1)
or
query[/(?<=\?).*/].split("&")
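As for why scan behaved differently from match: when a pattern contains capture groups, String#scan returns the captured groups for each match rather than the whole match. A sketch of the original approach without capture groups (a character class in the lookbehind, a plain alternation in the lookahead):

```ruby
query = "http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar"

# No capture groups, so scan returns the full text of each match.
pairs = query.scan(/(?<=[?&]).*?(?=&|\z)/)
p pairs
# prints: ["first_name=aoeu", "last_name=rar"]
```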
But if your real purpose is to extract the parameters from a URL, see the related question on that topic and its answer.
Using a module provided by Ruby or Rails will make your code more maintainable and readable.
require 'uri'
uri = 'http://0.0.0.0:9393/review?first_name=aoeu&last_name=rar'
require 'rack'
require 'rack/utils'
Rack::Utils.parse_query(URI.parse(uri).query)
# => {"first_name"=>"aoeu", "last_name"=>"rar"}
# or CGI
require 'cgi'
CGI::parse(URI.parse(uri).query)
# => {"first_name"=>["aoeu"], "last_name"=>["rar"]}
If you need to extract query params from a URI, check the thread "How to extract URL parameters from a URL with Ruby or Rails?". It contains a lot of solutions that do not use regexps.

Equivalent of cURL for Ruby?

Is there a cURL library for Ruby?
Curb and Curl::Multi provide cURL bindings for Ruby.
If you like it less low-level, there is also Typhoeus, which is built on top of Curl::Multi.
Use OpenURI and
URI.open("http://...", :http_basic_authentication=>[user, password])
for accessing sites/pages/resources that require HTTP authentication (on Ruby older than 2.5, call plain open instead).
Curb-fu is a wrapper around Curb which in turn uses libcurl. What does Curb-fu offer over Curb? Just a lot of syntactic sugar - but that can be often what you need.
HTTP clients is a good page to help you make decisions about the various clients.
You might also have a look at Rest-Client
If you know how to write your request as a curl command, there is an online tool that can turn it into ruby (2.0+) code: curl-to-ruby
Currently, it knows the following options: -d/--data, -H/--header, -I/--head, -u/--user, --url, and -X/--request. It is open to contributions.
The eat gem is a "replacement" for OpenURI, so you need to install the gem eat in the first place
$ gem install eat
Now you can use it
require 'eat'
eat('http://yahoo.com') #=> String
eat('/home/seamus/foo.txt') #=> String
eat('file:///home/seamus/foo.txt') #=> String
It uses HTTPClient under the hood. It also has some options:
eat('http://yahoo.com', :timeout => 10) # timeout after 10 seconds
eat('http://yahoo.com', :limit => 1024) # only read the first 1024 chars
eat('https://yahoo.com', :openssl_verify_mode => 'none') # don't bother verifying SSL certificate
Here's a little program I wrote to get some files with.
base = "http://media.pragprog.com/titles/ruby3/code/samples/tutthreads_"
for i in 1..50
  url = "#{ base }#{ i }.rb"
  file = "tutthreads_#{i}.rb"
  File.open(file, 'w') do |f|
    system "curl -o #{f.path} #{url}"
  end
end
I know it could be a little more elegant, but it serves its purpose. Check it out. I just cobbled it together today because I got tired of going to each URL to get the code for the book that was not included in the source download.
There's also Mechanize, which is a very high-level web scraping client that uses Nokogiri for HTML parsing.
Adding a more recent answer: HTTPClient is another Ruby HTTP library that supports parallel threads and lots of the curl goodies. I use HTTPClient and Typhoeus for any non-trivial apps.
To state the maybe-too-obvious, backticks execute shell code in Ruby as well. Provided your Ruby code is running in a shell that has curl:
puts `curl http://www.google.com?q=hello`
or
result = `
curl -X POST https://www.myurl.com/users \
-d "name=pat" \
-d "age=21"
`
puts result
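If you would rather not shell out, roughly the same POST can be built with Net::HTTP from the standard library. This is a sketch rather than a drop-in replacement: build_form_post and send_request are hypothetical helper names, and the URL is the placeholder from the curl example above. The request is only built and inspected here; sending it is left commented out.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a form-encoded POST, like curl's repeated -d options.
def build_form_post(url, fields)
  uri = URI.parse(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request.set_form_data(fields) # sets the body and the form Content-Type
  [uri, request]
end

# Hypothetical helper: actually send the request (needs network access).
def send_request(uri, request)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end

uri, request = build_form_post('https://www.myurl.com/users',
                               'name' => 'pat', 'age' => '21')
puts request.body
# prints: name=pat&age=21

# response = send_request(uri, request)
# puts response.body
```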
A nice minimal reproducible example to copy/paste into your rails console:
require 'open-uri'
require 'nokogiri'
url = "https://www.example.com"
html_file = URI.open(url)
doc = Nokogiri::HTML(html_file)
doc.css("h1").text
# => "Example Domain"
