I have this URL in a Sinatra-based application:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>/<%= blog.slug %>"
method="get">Show</a></li>
When I click on it, the URL looks like this:
http://127.0.0.1:9292/blog/A-lovely-day/654790
I am trying to turn the last / into a - as well, so it will be:
http://127.0.0.1:9292/blog/A-lovely-day-654790
How do I replace it after the URL has been rendered?
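Given that you started with:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>/<%= blog.slug %>" method="get">Show</a></li>
the slash is not part of the title; it is simply the literal / between the two ERB tags. Replace it with - in the template:
<li><a href="/blog/<%= blog.title.tr(' ', '-') %>-<%= blog.slug %>" method="get">Show</a></li>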
You can specify more than one character to transform; this turns both spaces and slashes into dashes:
blog.title.tr(" /", "-")
To replace the last slash in the rendered URL string, match it with a regex:
url = 'http://127.0.0.1:9292/blog/A-lovely-day/654790'
r = /
  .*  # match any character zero or more times (greedily)
  \K  # forget everything matched so far
  \/  # match a forward slash (the last one, since .* is greedy)
  /x  # free-spacing regex definition mode
To return a new string with the replacement:
url.sub(r, '-') # => "http://127.0.0.1:9292/blog/A-lovely-day-654790"
To make the replacement in the existing string:
url.sub!(r, '-')
One could use capture groups in place of \K:
url.sub(/(.*)\/(.*)/, '\1-\2')
Another way to make the replacement in the existing string:
url[url.rindex('/')] = '-'
Here's how I'd go about this:
require 'uri'
title = 'A lovely day'
slug = '654790'
uri = URI.parse('http://127.0.0.1:9292/blog/')
[*title.split, slug].join('-') # => "A-lovely-day-654790"
uri.path += [*title.split, slug].join('-')
uri.to_s # => "http://127.0.0.1:9292/blog/A-lovely-day-654790"
Generate the URL in the controller and only output the variable in the view.
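A minimal sketch of that advice, using a hypothetical blog_path helper (not part of the original answer) that builds the path in plain Ruby. It reuses the split/join idea above and percent-encodes the segment with URI.encode_www_form_component in case a title contains characters such as ?.

```ruby
require 'uri'

# Hypothetical helper: builds "/blog/<title-with-dashes>-<slug>".
def blog_path(title, slug)
  segment = [*title.split, slug].join('-')
  # title.split has already removed the spaces, so the "+" this method
  # uses for spaces never appears; characters like "?" become %3F.
  "/blog/#{URI.encode_www_form_component(segment)}"
end

puts blog_path('A lovely day', '654790') # prints /blog/A-lovely-day-654790
```

In a Sinatra app you could define it inside a helpers do ... end block and write <a href="<%= blog_path(blog.title, blog.slug) %>">Show</a> in the view.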
It's always good to use the built-in tools. URI helps when manipulating URLs/URIs, and understands appropriate encoding if necessary.
Also, it's useful to remember that a URL path is structured like a file pathname, so sometimes the File class (File.join, File.basename, File.dirname) can be very useful for splitting and joining it. This question wasn't a good example, but it has come in very handy.
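As a rough illustration of the File point, applied to this question's URL (a sketch, assuming the path has no trailing slash; taking .path first keeps any query string out of the way):

```ruby
require 'uri'

# A URL path is "/"-separated like a Unix pathname, so File's helpers apply.
path = URI.parse('http://127.0.0.1:9292/blog/A-lovely-day/654790').path

dir  = File.dirname(path)  # "/blog/A-lovely-day"
base = File.basename(path) # "654790"

# Join the last two segments with "-" instead of "/".
joined = "#{dir}-#{base}"
puts joined # prints /blog/A-lovely-day-654790
```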
'http://127.0.0.1:9292/blog/A-lovely-day/654790'.
sub(/\/(?!.*\/)/,'-') # match a / that is not followed by another /
#=> "http://127.0.0.1:9292/blog/A-lovely-day-654790"
I have this link, which I declare like this:
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
How can I use a regex to extract only the href value?
Thanks!
If you want to parse HTML, you can use the Nokogiri gem instead of using regular expressions. It's much easier.
Example:
require "nokogiri"
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
link_data = Nokogiri::HTML(link)
href_value = link_data.at_css("a")[:href]
puts href_value # => https://www.congress.gov/bill/93rd-congress/house-bill/11461
You should be able to use a regular expression like this:
href\s*=\s*"([^"]*)"
You can test the expression interactively on Rubular.
The capture group will give you the URL, e.g.:
link = '<a href="https://www.congress.gov/bill/93rd-congress/house-bill/11461">H.R.11461</a>'
match = /href\s*=\s*"([^"]*)"/.match(link)
if match
  url = match[1]
end
Explanation of the expression:
href matches the href attribute
\s* matches 0 or more whitespace characters (this is optional; you only need it if the HTML might not be in canonical form).
= matches the equal sign
\s* again allows for optional whitespace
" matches the opening quote of the href URL
( begins a capture group for extraction of whatever is matched within
[^"]* matches 0 or more non-quote characters. Since a double quote inside a double-quoted attribute value must be written as an entity (&quot;), this matches all characters up to the end of the URL.
) ends the capture group
" matches the closing quote of the href attribute's value
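If the string literally contains backslash-escaped quotes (href=\"...\", as in the Rubular test below), you can capture just the URL with:
/(href\s*\=\s*\\\")(.*)(?=\\)/
and use the second capture group (match[2]).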
http://rubular.com/r/qcqyPv3Ww3
I need to extract a string such as 'MT/23232' from messages like these:
'Policy created with MT/1212'
'Policy created with MT/121212'
'Policy created with MT/21212121212'
I have written this code
msg="MT/33235"
id = msg.scan(/MT/\d+/\d+/)[0]
But it's not working for me. Can anyone help me extract this string?
You need to escape the forward slash next to MT in your regex, and you don't need the forward slash after \d+. I also suggest adding a lookbehind so that you get a clean result: (?<=\s) is a positive lookbehind that asserts the match must be preceded by a whitespace character.
msg.scan(/(?<=\s)MT\/\d+/)[0]
If you don't care about the preceding character then the below regex would be fine.
msg.scan(/MT\/\d+/)[0]
Example:
> msg = 'Policy created with MT/21212121212'
=> "Policy created with MT/21212121212"
> msg.scan(/(?<=\s)MT\/\d+/)[0]
=> "MT/21212121212"
> msg.match(/(?<=\s)MT\/\d+/)[0]
=> "MT/21212121212"
If the ID always runs to the end of the line:
your_string.scan(/\sMT.*$/).last.strip
If your required substring can be anywhere in the string, then:
your_string.scan(/\bMT\/\d+\b/).last.strip # "\b" is for word boundaries
Or you can specify the acceptable digits this way:
your_string.scan(/\bMT\/[0-9]+\b/).last.strip
Lastly, if the string format is going to remain as you specified, then:
your_string.split.last
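A sketch that uses String#[] instead of scan(...).last.strip: it returns the first match or nil, so a message without an ID doesn't raise NoMethodError on nil. The fourth message is a hypothetical input added to show that case.

```ruby
messages = [
  'Policy created with MT/1212',
  'Policy created with MT/121212',
  'Policy created with MT/21212121212',
  'Policy created without an id' # hypothetical message with no MT/ id
]

# %r{} avoids escaping the slash; \b keeps "XMT/1" from matching.
ids = messages.map { |msg| msg[%r{\bMT/\d+}] }
p ids # prints ["MT/1212", "MT/121212", "MT/21212121212", nil]
```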
I'm trying to separate out path elements from a URL in Ruby. My URL is:
/cm/api/m_deploymentzones_getDeploymentZones.htm
And using a regex:
/\/([^\/]*)\//
The match is /cm/. I expected to see just cm, given that the / characters are explicitly excluded from the capturing group. What am I doing wrong?
I am also trying to get the next path element with:
/\/[^\/]*\/([^\/]*)\//
And this returns: /cm/api/
Any ideas?
It would be easier to use URI.parse():
require 'uri'
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'
parts = URI.parse(url).path.split('/') # ["", "cm", "api", "m_deploymentzones_getDeploymentZones.htm"]
Here's a regex matching the path components:
(?<=/).+?(?=/|$)
But you'd be better off just splitting the string on / characters. You don't need regex for this.
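For the original question, a sketch of why /cm/ shows up: the question's regex already captures cm, but MatchData#[0] is the whole match while MatchData#[1] is the first capture group. The lookaround regex and the split approach from the answers are shown alongside.

```ruby
url = '/cm/api/m_deploymentzones_getDeploymentZones.htm'

m = url.match(%r{/([^/]*)/})
whole = m[0] # "/cm/" (the entire match, slashes included)
first = m[1] # "cm"   (capture group 1)

# Same idea for the second path element.
second = url.match(%r{/[^/]*/([^/]*)/})[1] # "api"

# The lookaround regex, used with String#scan:
segments = url.scan(%r{(?<=/).+?(?=/|$)})

# The split-based alternative, dropping the empty leading element:
parts = url.split('/').reject(&:empty?)

p whole, first, second, segments, parts
```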
This url:
http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-[nsfw]
should be:
http://gawker.com/5953728/if-alison-brie-and-gillian-jacobs-pin-up-special-doesnt-get-community-back-on-the-air-nothing-will-%5Bnsfw%5D
But when I pass the first one into URI.encode, it doesn't escape the square brackets. I also tried CGI.escape, but that escapes all the '/' as well.
What should I use to escape URLs properly? Why doesn't URI.encode escape square brackets?
You can escape [ with %5B and ] with %5D.
Apply both replacements to your URL:
url.gsub("[", "%5B").gsub("]", "%5D")
I don't like that solution but it's working.
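encode doesn't escape brackets because it treats them as reserved characters rather than unsafe ones (they delimit IPv6 addresses in the host, as in http://[::1]/), and it leaves reserved characters alone. RFC 3986 does not allow them unescaped in the path, though, so there you have to ask for them explicitly.
If you want to escape characters other than just the "unsafe" ones, pass a second argument to encode: a regex matching, or a string containing, every character you want encoded (including characters the method would otherwise already match!). Note that URI.encode (an alias of URI.escape) was deprecated and removed in Ruby 3.0, so this only works on older Rubies.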
If using a third-party gem is an option, try addressable.
require "addressable/uri"
url = Addressable::URI.parse("http://[::1]/path[]").normalize!.to_s
#=> "http://[::1]/path%5B%5D"
Note that normalize! not only escapes invalid characters but also lowercases the hostname, unescapes unnecessarily escaped characters, and so on:
uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F").normalize!
url = uri.to_s #=> "http://example.org/path%5B%5D?query%5B%5D=/"
So, if you just want to normalize the path part, do as follows:
uri = Addressable::URI.parse("http://Example.ORG/path[]?query[]=%2F")
uri.path = uri.normalized_path
url = uri.to_s #=> "http://Example.ORG/path%5B%5D?query[]=%2F"
Under the IPv6 literal syntax (RFC 2732, now part of RFC 3986), a URL can look like this:
http://[1080:0:0:0:8:800:200C:417A]/index.html
Because of this, [ and ] should be escaped only after the host part of the URL:
if url =~ %r{\[|\]}
  protocol, host, path = url.split(%r{/+}, 3)
  path = path.gsub('[', '%5B').gsub(']', '%5D') # or, on Ruby < 3.0: URI.escape(path, /[^\-_.!~*'()a-zA-Z\d;\/?:#&%=+$,]/)
  url = "#{protocol}//#{host}/#{path}"
end
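A variant of the same idea, sketched with a hypothetical escape_brackets_after_host helper: it captures the scheme://authority prefix with a regex instead of split, so a URL with no path such as http://[::1] passes through instead of hitting nil.gsub, and it does both replacements in one gsub using a Hash.

```ruby
# scheme://authority prefix (may contain an IPv6 literal), then the rest.
AUTHORITY = %r{\A([a-z][a-z\d+.\-]*://[^/?#]*)(.*)\z}i

def escape_brackets_after_host(url)
  m = AUTHORITY.match(url)
  return url unless m # not an absolute URL; leave it untouched

  prefix, rest = m[1], m[2]
  # gsub with a Hash replaces each matched character with its mapped value.
  prefix + rest.gsub(/[\[\]]/, '[' => '%5B', ']' => '%5D')
end

puts escape_brackets_after_host('http://[::1]/path[]') # prints http://[::1]/path%5B%5D
```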
I am trying to write a method that is the same as mysqli_real_escape_string in PHP. It takes a string and escapes any 'dangerous' characters. I have looked for a method that will do this for me but I cannot find one. So I am trying to write one on my own.
This is what I have so far (I tested the pattern at Rubular.com and it worked):
# Finds the following characters and escapes them by preceding them with a backslash. Characters: ' " . * / \ -
def escape_characters_in_string(string)
pattern = %r{ (\'|\"|\.|\*|\/|\-|\\) }
string.gsub(pattern, '\\\0') # <-- Trying to take the currently found match and add a \ before it I have no idea how to do that).
end
And I am using start_string as the string I want to change, and correct_string as what I want start_string to turn into:
start_string = %("My" 'name' *is* -john- .doe. /ok?/ C:\\Drive)
correct_string = %(\"My\" \'name\' \*is\* \-john\- \.doe\. \/ok?\/ C:\\\\Drive)
Can somebody try and help me determine why I am not getting my desired output (correct_string) or tell me where I can find a method that does this, or even better tell me both? Thanks a lot!
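Your pattern isn't defined correctly: the spaces inside %r{ ... } are part of the pattern (there is no x flag), so it only matches one of those characters when it has a space on each side. With the spaces removed, this is as close as I can get to your desired output.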
Output
"\\\"My\\\" \\'name\\' \\*is\\* \\-john\\- \\.doe\\. \\/ok?\\/ C:\\\\Drive"
It's going to take some tweaking on your part to get it 100% but at least you can see your pattern in action now.
def self.escape_characters_in_string(string)
  pattern = /(\'|\"|\.|\*|\/|\-|\\)/
  string.gsub(pattern) { |match| "\\" + match } # prepend a backslash to each match
end
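The question was also asking how to refer to the current match inside a replacement string. A sketch of that form, equivalent to the block above (the PATTERN constant name is illustrative):

```ruby
# Characters to escape: ' " . / * \ -
# (no spaces inside %r{}, unlike the question's pattern).
PATTERN = %r{['"./*\\-]}

def escape_characters_in_string(string)
  # The single-quoted literal '\\\\\0' holds the characters \\\0;
  # gsub reads \\ as one literal backslash and \0 as the whole match.
  string.gsub(PATTERN, '\\\\\0')
end

start_string = %("My" 'name' *is* -john- .doe. /ok?/ C:\\Drive)
puts escape_characters_in_string(start_string)
# prints \"My\" \'name\' \*is\* \-john\- \.doe\. \/ok?\/ C:\\Drive
```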
I extended the function above like this:
def self.escape_characters_in_string(string)
  pattern = /(\'|\"|\.|\*|\/|\-|\\|\)|\$|\+|\(|\^|\?|\!|\~|\`)/
  string.gsub(pattern) { |match| "\\" + match }
end
This works well for escaping most regex metacharacters (it does not cover [, ], {, } or |).
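For the regex case specifically, Ruby already ships Regexp.escape (alias Regexp.quote), which escapes every regex metacharacter, including the brackets, braces and pipe. It is not a tool for SQL escaping. A small sketch with a made-up input string:

```ruby
input = 'price: $5.00 (approx) [USD] {x|y}' # made-up example text
escaped = Regexp.escape(input)

# The escaped string, turned back into a regex, matches the text literally.
puts Regexp.new(escaped).match?(input) # prints true
```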
This should get you started:
print %("'*-.).gsub(/["'*.-]/){ |s| '\\' + s }
\"\'\*\-\.
Take a look at the ActiveRecord sanitization methods: http://api.rubyonrails.org/classes/ActiveRecord/Base.html#method-c-sanitize_sql_array
Also take a look at the escape_string / quote method in the Mysql class.