I need to send an email as a parameter in the query string.
None of the standard functions I have tried encodes the '.' (dot).
CGI.escape('my.fake@email.com')
=> "my.fake%40email.com"
URI.escape('my.fake@email.com')
=> "my.fake@email.com"
URI.encode('my.fake@email.com')
=> "my.fake@email.com"
ERB::Util.url_encode('my.fake@email.com')
=> "my.fake%40email.com"
I can easily write a function myself, but I wanted to know whether a method for this already exists.
As tadman pointed out here, you will have a problem with dots if the route is something like get "users/:email", because Rails treats dots in a dynamic segment as separators. As the docs point out, you then need to add a constraint like:
get "users/:email", to: "users#show", constraints: { email: /.*/ }
And with that you don't need to escape dots in the url.
You actually don't need to encode the dot. After the ? in the url, / and . don't have any specific meaning.
You can escape characters that wouldn't normally be escaped by providing a regex as the 2nd argument to URI.escape
URI.escape('a@b.c', /\./)
=> "a@b%2Ec"
(keep in mind that providing your own regex overrides the default, so now nothing but . will be encoded)
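URI.escape and URI.encode were removed in Ruby 3.0, so on current Rubies the regex argument is no longer an option. A minimal sketch of one alternative, assuming CGI.escape is acceptable for the rest of the string; the helper name escape_with_dots is made up for illustration:

```ruby
require 'cgi'

# Hypothetical helper: percent-encode a query value with CGI.escape,
# then encode the dots, which CGI.escape leaves untouched.
def escape_with_dots(value)
  CGI.escape(value).gsub('.', '%2E')
end

puts escape_with_dots('my.fake@email.com')
# prints my%2Efake%40email%2Ecom
```

CGI.unescape decodes %2E like any other escape, so the original value comes back unchanged on the receiving side.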
XPath has the function encode-for-uri() that makes a string safe for use in a URI path segment:
encode-for-uri('AC/DC') => AC%2FDC
But it also %-encodes international characters:
encode-for-uri('汉/语') => %E6%B1%89%2F%E8%AF%AD
This is indeed necessary for URIs, but it is not necessary for IRIs, which are allowed to include these characters.
Is there a way to achieve the effect of encode-for-uri() in XPath while keeping i18n characters unencoded? Like this:
???('汉/语') => 汉%2F语
Perhaps the iri-to-uri() function does what you are looking for.
However, it doesn't escape "/" - it's designed to operate on an entire IRI, not on a segment of the path.
Using regex, how could I remove everything before the first path / in a URL?
Example URL: https://www.example.com/some/page?user=1&email=joe@schmoe.org
From that, I just want /some/page?user=1&email=joe@schmoe.org
In the case that it's just the root domain (i.e. https://www.example.com/), then I just want the / to be returned.
The domain may or may not have a subdomain, and it may or may not use a secure protocol. Ultimately I just want to strip out anything before that first path slash.
In the event that it matters, I'm running Ruby 1.9.3.
Don't use regex for this. Use the URI class. You can write:
require 'uri'
u = URI.parse('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
u.path #=> "/some/page"
u.query #=> "user=1&email=joe@schmoe.org"
# All together - this will only return path if query is empty (no ?)
u.request_uri #=> "/some/page?user=1&email=joe@schmoe.org"
require 'uri'
uri = URI.parse("https://www.example.com/some/page?user=1&email=joe@schmoe.org")
uri.path + '?' + uri.query
=> "/some/page?user=1&email=joe@schmoe.org"
As Gavin also mentioned, it's not a good idea to use a regular expression for this, although it's tempting.
URLs can contain special characters, even Unicode characters, that you did not anticipate when you wrote the regular expression. This is particularly likely in the query string. Using the URI library is the safer approach.
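One detail worth guarding against: uri.query is nil when the URL has no "?", so uri.path + '?' + uri.query raises a TypeError for such input, and uri.path is an empty string for a bare domain without a trailing slash. A small sketch that covers both cases; the helper name path_and_query is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: return everything from the first path slash onward.
def path_and_query(url)
  uri = URI.parse(url)
  path = uri.path.empty? ? '/' : uri.path   # "http://test.com" parses to an empty path
  uri.query ? "#{path}?#{uri.query}" : path # query is nil when there is no "?"
end

puts path_and_query('https://www.example.com/some/page?user=1&email=joe@schmoe.org')
# prints /some/page?user=1&email=joe@schmoe.org
puts path_and_query('https://www.example.com/')
# prints /
puts path_and_query('http://test.com')
# prints /
```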
The same can be done using String#index
index(substring[, offset])
str = "https://www.example.com/some/page?user=1&email=joe@schmoe.org"
offset = str.index("//") # => 6
str[str.index('/',offset + 2)..-1]
# => "/some/page?user=1&email=joe@schmoe.org"
I strongly agree with the advice to use the URI module in this case, and I don't consider myself great with regular expressions. Still, it seems worthwhile to demonstrate one possible way to do what you ask.
test_url1 = 'https://www.example.com/some/page?user=1&email=joe@schmoe.org'
test_url2 = 'http://test.com/'
test_url3 = 'http://test.com'
regex = /^https?:\/\/[^\/]+(.*)/
regex.match(test_url1)[1]
# => "/some/page?user=1&email=joe@schmoe.org"
regex.match(test_url2)[1]
# => "/"
regex.match(test_url3)[1]
# => ""
Note that in the last case, the URL had no trailing '/' so the result is the empty string.
The regular expression (/^https?:\/\/[^\/]+(.*)/) says the string starts with (^) http (http), optionally followed by s (s?), followed by :// (:\/\/) followed by at least one non-slash character ([^\/]+), followed by zero or more characters, and we want to capture those characters ((.*)).
I hope that you find that example and explanation educational, and I again recommend against actually using a regular expression in this case. The URI module is simpler to use and far more robust.
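If the regex route is taken anyway, two gaps in the demonstration are worth closing: regex.match returns nil for input that does not match (so [1] raises NoMethodError), and the question asked for "/" rather than an empty string for a bare domain. A hedged sketch, with the helper name strip_origin and the constant name invented for illustration:

```ruby
# The pattern from the answer, anchored with \A and written with %r{}
# so the slashes need no escaping.
ORIGIN_PREFIX = %r{\Ahttps?://[^/]+(.*)}

# Hypothetical helper: nil for non-matching input, "/" for a bare domain.
def strip_origin(url)
  rest = url[ORIGIN_PREFIX, 1] # String#[] with a regex returns capture 1, or nil
  return nil if rest.nil?
  rest.empty? ? '/' : rest
end

p strip_origin('http://test.com/')  # prints "/"
p strip_origin('http://test.com')   # prints "/"
p strip_origin('not a url')         # prints nil
```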
I am using Sinatra and get parameters from the URL using the get '/foo/:bar' {} method. Unfortunately, the value in :bar can contain nasty things like /, which leads to a 404, since no route matches /foo/:bar/baz/. I use URI.escape to escape the URL parameter, but it considers / a valid character. As mentioned here, this is because the default Regexp it checks against does not differentiate between unsafe and reserved characters. I would like to change this, so I tried:
URI.escape("foo_<_>_&_3_#_/_+_%_bar", Regexp.union(URI::REGEXP::UNSAFE, '/'))
just to test it.
URI::REGEXP::UNSAFE is the default regexp to match against, according to the Ruby 1.9.3 documentation:
escape(*arg)
Synopsis
URI.escape(str [, unsafe])
Args
str
String to replaces in.
unsafe
Regexp that matches all symbols that must be replaced with
codes. By default uses REGEXP::UNSAFE. When this argument is
a String, it represents a character set.
Description
Escapes the string, replacing all unsafe characters with codes.
Unfortunately I get this error:
uninitialized constant URI::REGEXP::UNSAFE
And as this GitHub Issue suggests, this Regexp was removed from Ruby with 1.9.3. The URI module's documentation is generally not very good, and I really cannot figure this out. Any hints?
Thanks in advance!
URI.escape is not what you are looking for. You want CGI.escape:
require 'cgi'
CGI.escape("foo_<_>_&_3_#_/_+_%_bar")
# => "foo_%3C_%3E_%26_3_%23_%2F_%2B_%25_bar"
This will properly encode it to allow Sinatra to retrieve it.
Perhaps you would have better luck with CGI.escape?
>> require 'uri'; URI.escape("foo_<_>_&_3_#_/_+_%_bar")
=> "foo_%3C_%3E_&_3_%23_/_+_%25_bar"
>> require 'cgi'; CGI.escape("foo_<_>_&_3_#_/_+_%_bar")
=> "foo_%3C_%3E_%26_3_%23_%2F_%2B_%25_bar"
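A short sketch to check the round trip and to show one caveat: CGI.escape follows form-encoding rules and turns a space into "+", whereas ERB::Util.url_encode (already mentioned earlier on this page) emits "%20". Which of the two suits a path segment depends on how the receiving framework decodes it, so treat the choice as an assumption to verify against your Sinatra version.

```ruby
require 'cgi'
require 'erb'

value = "foo_<_>_&_3_#_/_+_%_bar"

escaped = CGI.escape(value)
puts escaped
# prints foo_%3C_%3E_%26_3_%23_%2F_%2B_%25_bar

# CGI.unescape restores the original string.
puts CGI.unescape(escaped) == value
# prints true

# The two encoders differ on spaces; both encode the slash.
puts CGI.escape('a b/c')            # prints a+b%2Fc
puts ERB::Util.url_encode('a b/c')  # prints a%20b%2Fc
```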
I want to extract #hashtags from a string, also those that have special characters such as #1+1.
Currently I'm using:
@hashtags ||= string.scan(/#\w+/)
But it doesn't work with those special characters. Also, I want it to be UTF-8 compatible.
How do I do this?
EDIT:
If the last character is a special character it should be removed, such as #hashtag, #hashtag. #hashtag! #hashtag? etc...
Also, the hash sign at the beginning should be removed.
The Solution
You probably want something like:
'#hash+tag'.encode('UTF-8').scan /\b(?<=#)[^#[:punct:]]+\b/
=> ["hash+tag"]
Note that the zero-width assertion at the beginning is required to avoid capturing the pound sign as part of the match.
References
String#encode
Ruby's POSIX Character Classes
This should work:
@hashtags = str.scan(/#([[:graph:]]*[[:alnum:]])/).flatten
Or if you don't want your hashtag to start with a special character:
@hashtags = str.scan(/#((?:[[:alnum:]][[:graph:]]*)?[[:alnum:]])/).flatten
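To see how the first pattern behaves on the cases from the question, here is a small sketch with a made-up sample sentence. Because the match must end in [[:alnum:]], trailing punctuation is dropped, and the capture group leaves out the leading hash sign:

```ruby
# Sample text is invented for illustration.
text = 'Solve #1+1, then tag it #math! or #héllo?'

# [[:graph:]]* is greedy, then backtracks until the last character is alphanumeric.
hashtags = text.scan(/#([[:graph:]]*[[:alnum:]])/).flatten

p hashtags
# prints ["1+1", "math", "héllo"]
```

Note that hashtags written back to back without whitespace (for example "#a,#b") would be captured as a single tag by this pattern, since [[:graph:]] also matches "#" and ",".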
How about this:
@hashtags ||= string.match(/(#[[:alpha:]]+)|#[\d\+-]+\d+/).to_s[1..-1]
Takes care of #alphabets or #2323+2323 #2323-2323 #2323+65656-67676
Also removes # at beginning
Or if you want it in array form:
@hashtags ||= string.scan(/#[[:alpha:]]+|#[\d\+-]+\d+/).collect{|x| x[1..-1]}
This took a long time, and I still don't understand why scan(/#[[:alpha:]]+|#[\d\+-]+\d+/) works on my computer but scan(/(#[[:alpha:]]+)|#[\d\+-]+\d+/) does not. The only difference is the () in the second scan statement. The parentheses make no difference when I use the match method, as expected.
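The difference comes from how String#scan treats capture groups: without groups it returns the whole matches, but as soon as the pattern contains a group it returns only the captures, one array per match. When the second alternative matches, group 1 did not take part, so its capture is nil. A short sketch with an invented sample string:

```ruby
text = '#ruby and #12+34'

# No groups: scan returns each full match.
p text.scan(/#[[:alpha:]]+|#[\d\+-]+\d+/)
# prints ["#ruby", "#12+34"]

# One group: scan returns only the captures, so the numeric tag becomes [nil].
p text.scan(/(#[[:alpha:]]+)|#[\d\+-]+\d+/)
# prints [["#ruby"], [nil]]

# A non-capturing group (?:...) groups without changing what scan returns.
p text.scan(/(?:#[[:alpha:]]+)|#[\d\+-]+\d+/)
# prints ["#ruby", "#12+34"]
```

match(...).to_s is unaffected because MatchData#to_s always returns the entire matched text, regardless of groups.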
In Ruby I have an arbitrary string, and I'd like to convert it to something that is a valid Unix/Linux filename. It doesn't matter what it looks like in its final form, as long as it is visually recognizable as the string it started as. Some possible examples:
"Here's my string!" => "Heres_my_string"
"* is an asterisk, you see" => "is_an_asterisk_you_see"
Is there anything built-in (maybe in the file libraries) that will accomplish this (or close to this)?
By your specifications, you could accomplish this with a regex replacement. This regex matches every character other than word characters (letters, digits, underscore), whitespace, and hyphens, so those characters can be removed:
s/[^\w\s_-]+//g
This will remove any extra whitespace in between words, as shown in your examples:
s/(^|\b\s)\s+($|\s?\b)/\\1\\2/g
And lastly, replace the remaining spaces with underscores:
s/\s+/_/g
Here it is in Ruby:
def friendly_filename(filename)
  filename.gsub(/[^\w\s_-]+/, '')
          .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
          .gsub(/\s+/, '_')
end
I realize the question was asked about plain Ruby, and that the purpose is not quite the same (*nix-compatible filenames), but if you are using Rails, there is a method called parameterize that should help.
In rails console:
"Here's my string!".parameterize => "here-s-my-string"
"* is an asterisk, you see".parameterize => "is-an-asterisk-you-see"
I think that parameterize, since its output is safe for URLs, may work just as well for filenames :)
You can read more about it here:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize
There are also a whole lot of other helpful methods there.