Ruby: How to escape url with square brackets [ and ]? - ruby

This url:
http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-[nsfw]
should be:
http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-%5Bnsfw%5D
But when I pass the first one into URI.encode, it doesn't escape the square brackets. I also tried CGI.escape, but that escapes all the '/' as well.
What should I use to escape URLs properly? Why doesn't URI.encode escape square brackets?

You can escape [ with %5B and ] with %5D.
Applied to your URL (held here in a variable named url):
url.gsub("[", "%5B").gsub("]", "%5D")
I don't like this solution, but it works.

encode doesn't escape brackets because it treats them as reserved characters rather than "unsafe" ones -- like / and ?, they can legitimately appear in a URI (around an IPv6 host literal), so encode leaves them alone.
If you want to escape chars other than just the "unsafe" ones, pass a second arg to the encode method. That arg should be a regex matching, or a string containing, every char you want encoded (including chars the function would otherwise already match!).
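As a rough sketch of the second-argument approach: URI.encode and URI.escape were removed in Ruby 3.0, so this example assumes the escape method of URI::RFC2396_Parser, which accepts the same optional second argument. The URL is a shortened, illustrative version of the one in the question.

```ruby
require 'uri'

# Assumption: URI.encode is unavailable (Ruby 3.0+), so the RFC 2396
# parser's escape method is used in its place.
parser = URI::RFC2396_Parser.new

# Shortened, illustrative URL.
url = 'http://gawker.com/5953728/nothing-will-[nsfw]'

# The second argument names what to encode. A regex replaces the default
# "unsafe" set entirely, so here only the square brackets are touched.
escaped = parser.escape(url, /[\[\]]/)
puts escaped
```

Because the regex replaces the default set rather than extending it, characters such as spaces would need to be added to the character class if they should be encoded too.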

If using a third-party gem is an option, try addressable.
require "addressable/uri"
url = Addressable::URI.parse("http://[::1]/path[]").normalize!.to_s
#=> "http://[::1]/path%5B%5D"
Note that the normalize! method will not only escape invalid characters but also perform casefolding on the hostname part, unescaping on unnecessarily escaped characters and the like:
uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F").normalize!
url = uri.to_s #=> "http://example.org/path%5B%5D?query%5B%5D=/"
So, if you just want to normalize the path part, do as follows:
uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F")
uri.path = uri.normalized_path
url = uri.to_s #=> "http://Example.ORG/path%5B%5D?query[]=%2F"

The IPv6 literal syntax allows URLs like this:
http://[1080:0:0:0:8:800:200C:417A]/index.html
Because of this, we should escape [ and ] only after the host part of the URL:
if url =~ %r{\[|\]}
  protocol, host, path = url.split(%r{/+}, 3)
  path = path.gsub('[', '%5B').gsub(']', '%5D') # Or URI.escape(path, /[^\-_.!~*'()a-zA-Z\d;\/?:#&%=+$,]/)
  url = "#{protocol}//#{host}/#{path}"
end

Related

Serialize hash as string similar to json with single quotes in Ruby

Is there a way to convert a hash, possibly nested:
{:event=>"subscribe", :channel=>"data_channel", :parameters=>{:api_key=>"XXX", :sign=>"YYY"}}
into a string in specified format as below?
"{'event':'subscribe', 'channel':'data_channel', 'parameters': {'api_key':'XXX', 'sign':'YYY'}}"
EDIT
The format resembles JSON, but it is not valid JSON because of the single quotes.
Make JSON, then fix it up:
require 'json'
hash = {:event=>"subscribe", :channel=>"data_channel",
        :parameters=>{:api_key=>"XXX", :sign=>%q{Miles "Chief" O'Brien}}}
# Match whole JSON strings: escaped quotes/backslashes, or any other non-quote, non-backslash character
puts hash.to_json.gsub(/"((?:\\[\\"]|[^\\"])*)"/) { |x|
  %Q{'#{$1.gsub(/'|\\"/, ?' => %q{\'}, %q{\\"} => ?")}'}
}
# => {'event':'subscribe','channel':'data_channel',
# 'parameters':{'api_key':'XXX','sign':'Miles "Chief" O\'Brien'}}
EDIT: The first regex says: match a double quote, then a sequence of either escaped double quotes/backslashes, or non-double-quote/backslash characters, then a double quote again. This makes sure we only find strings, and not accidental half-strings like "Miles \". For each such string, we surround the bit that was inside the double quotes ($1) with single quotes, and run a sub-replacement on it that will find escaped double quotes and unescaped single quotes, unescape the former and escape the latter.
Also, sorry about wonky highlighting, seems StackOverflow syntax highlighter can't deal with alternate forms of Ruby quoting, but they're so convenient when you're working with quote characters...
Your desired output looks like JSON. Try:
require 'json'
JSON.dump(hash)
=> "{\"event\":\"subscribe\",\"channel\":\"data_channel\",\"parameters\":{\"api_key\":\"XXX\",\"sign\":\"YYY\"}}"
To have single quotes you can try something like:
JSON.dump(hash).gsub('"', '\'')
It returns:
{'event':'subscribe','channel':'data_channel','parameters':{'api_key':'XXX','sign':'YYY'}}
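One caveat about the plain gsub approach, shown as a small sketch with a hypothetical value: if any string in the hash contains an apostrophe or a double quote, a blanket quote swap leaves the result ambiguous.

```ruby
require 'json'

# Hypothetical value containing an apostrophe.
hash = { sign: "O'Brien" }

# Blanket replacement of every double quote with a single quote.
naive = JSON.dump(hash).gsub('"', "'")
puts naive
# The apostrophe inside the value is now indistinguishable from the
# closing quote, so a consumer cannot tell where the string ends.
```

For input that is known to contain no quote characters, as in the question, the simple replacement is probably sufficient.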

How to replace the last "/" with a dash in a URL

I have this URL in a Sinatra-based application:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>/<%= blog.slug %>"
method="get">Show</a></li>
When I click on it, the URL looks like this:
http://127.0.0.1:9292/blog/A-lovely-day/654790
I am trying to make the last / also a - too, so it will be:
http://127.0.0.1:9292/blog/A-lovely-day-654790
How do I replace it after the URL has been rendered?
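Given that you started with:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>/<%= blog.slug %>"
method="get">Show</a></li>
The slash is not part of the title; it is simply the literal / between the two ERB tags. Replace it with - in the template:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>-<%= blog.slug %>"
method="get">Show</a></li>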
You can specify more than one character to translate:
blog.title.tr(" /", "-")
r = /
    .*  # match any character zero or more times (greedily)
    \K  # discard everything matched so far from the reported match
    \/  # match a forward slash
    /x  # free-spacing regex definition mode
To return a new string with the replacement:
blog.title.sub(r, '-')
To make the replacement in the existing string:
blog.title.sub!(r, '-')
One could use capture groups in place of \K:
blog.title.sub(/(.*)\/(.*)/, '\1-\2')
Another way to make the replacement in the existing string:
blog.title[blog.title.rindex('/')] = '-'
Here's how I'd go about this:
require 'uri'
title = 'A lovely day'
slug = '654790'
uri = URI.parse('http://127.0.0.1:9292/blog/')
[*title.split, slug].join('-') # => "A-lovely-day-654790"
uri.path += [*title.split, slug].join('-')
uri.to_s # => "http://127.0.0.1:9292/blog/A-lovely-day-654790"
Generate the URL in the controller and only output the variable in the view.
It's always good to use the built-in tools. URI helps when manipulating URLs/URIs, and understands appropriate encoding if necessary.
Also, it's useful to remember that a URL path looks like a file pathname, so the File class can sometimes be very useful for splitting and joining it. This question isn't the best example of that, but the technique has come in very handy.
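To illustrate the File remark, here is a minimal sketch that treats the URL path as a pathname. It assumes the rendered URL from the question as input; the variable names are only illustrative.

```ruby
require 'uri'

uri = URI.parse('http://127.0.0.1:9292/blog/A-lovely-day/654790')

# File.split separates the last path component from everything before it.
dir, base = File.split(uri.path)

# Rejoin the two halves with a dash instead of a slash.
uri.path = "#{dir}-#{base}"
puts uri.to_s
```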
'http://127.0.0.1:9292/blog/A-lovely-day/654790'.
sub(/\/(?!.*\/)/,'-') # match a / that is not followed by another /
#=> "http://127.0.0.1:9292/blog/A-lovely-day-654790"

How to extract href from a tag using ruby regex?

I have this link, which I declare like this:
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
The question is how could I use regex to extract only the href value?
Thanks!
If you want to parse HTML, you can use the Nokogiri gem instead of using regular expressions. It's much easier.
Example:
require "nokogiri"
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
link_data = Nokogiri::HTML(link)
href_value = link_data.at_css("a")[:href]
puts href_value # => https://www.congress.gov/bill/93rd-congress/house-bill/11461
You should be able to use a regular expression like this:
href\s*=\s*"([^"]*)"
See this Rubular example of that expression.
The capture group will give you the URL, e.g.:
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
match = /href\s*=\s*"([^"]*)"/.match(link)
if match
url = match[1]
end
Explanation of the expression:
href matches the href attribute
\s* matches 0 or more whitespace characters (this is optional -- you only need it if the HTML might not be in canonical form).
= matches the equal sign
\s* again allows for optional whitespace
" matches the opening quote of the href URL
( begins a capture group for extraction of whatever is matched within
[^"]* matches 0 or more non-quote characters. Since quotes inside HTML attribute values must be escaped, this matches every character up to the end of the URL.
) ends the capture group
" matches the closing quote of the href attribute's value
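The pieces explained above can be tried together in a short sketch. The markup here is hypothetical (the second anchor and its example.com URL are made up), and it includes a tag with whitespace around the equals sign to show what the \s* parts allow.

```ruby
# Hypothetical markup: two anchors, one with extra whitespace around "=".
html = <<~HTML
  <a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>
  <a href = "https://example.com/other">Other</a>
HTML

pattern = /href\s*=\s*"([^"]*)"/

# scan returns one array of captures per match; flatten yields the URLs.
hrefs = html.scan(pattern).flatten
puts hrefs
```

This only covers double-quoted attribute values; for arbitrary HTML, a parser such as Nokogiri remains the more robust choice.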
To capture just the URL, you can do this:
/(href\s*\=\s*\\\")(.*)(?=\\)/
and use the second capture group.
http://rubular.com/r/qcqyPv3Ww3

Why does my regex capture the surrounding characters?

I'm trying to separate out path elements from a URL in ruby. My URL is:
/cm/api/m_deploymentzones_getDeploymentZones.htm
And using a regex:
/\/([^\/]*)\//
The match is /cm/. I expected to see just cm, given that the / characters are explicitly excluded from the capturing group. What am I doing wrong?
I am also trying to get the next path element with:
/\/[^\/]*\/([^\/]*)\//
And this returns: /cm/api/
Any ideas?
It would be easier to use URI.parse():
require 'uri'
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'
parts = URI.parse(url).path.split('/') # ["", "cm", "api", "m_deploymentzones_getDeploymentZones.htm"]
Here's a regex matching the path components:
(?<=/).+?(?=/|$)
But you'd be better off just splitting the string on / characters. You don't need regex for this.
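As a small sketch of both points: the whole match (index 0) always includes the slashes the pattern consumed, while the capture group (index 1) holds only the part in parentheses. Splitting avoids the question entirely.

```ruby
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'

m = url.match(/\/([^\/]*)\//)
whole = m[0]   # the entire matched text, slashes included
first = m[1]   # only what the capture group matched

# Splitting on "/" yields every path element; the leading empty string
# (from the initial slash) is dropped.
parts = url.split('/').reject(&:empty?)

puts whole, first
p parts
```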

regex replace [ with \[

I want to write a regex in Ruby that will add a backslash prior to any open square brackets.
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
# desired out = "my.name\[0].hello.line\[2]"
I've tried multiple combinations of backslashes in the substitution string and can't get it to leave a single backslash.
You don't need a regular expression here.
str = "my.name[0].hello.line[2]"
puts str.gsub('[', '\[')
# my.name\[0].hello.line\[2]
I tried your code and it worked correctly:
str = "my.name[0].hello.line[2]"
out = str.gsub(/\[/,"\\[")
puts out #my.name\[0].hello.line\[2]
If you replace puts with p, you get the inspect version of the string:
p out #"my.name\\[0].hello.line\\[2]"
Note the surrounding " and the escaped \. Maybe this is the result you saw (irb, for example, shows the inspect version).
As Daniel already answered, you can also define the string with ' and then you don't need to escape the backslash.
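A quick way to confirm that the string really holds a single backslash per bracket is to compare puts, p, and the length. This sketch uses the block form of gsub, so the replacement is inserted verbatim.

```ruby
str = 'my.name[0].hello.line[2]'

# Block form: the block's return value is inserted as-is, so no
# replacement-string escaping rules apply.
out = str.gsub('[') { '\[' }

puts out          # prints the string itself: one backslash per bracket
p out             # prints out.inspect: quoted, backslashes shown doubled
puts out.length   # 26: the original 24 characters plus two backslashes
```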
