Ruby: replace a given URL in an HTML string

In Ruby, I want to replace a given URL in an HTML string.
Here is my unsuccessful attempt:
escaped_url = url.gsub(/\//,"\/").gsub(/\./,"\.").gsub(/\?/,"\?")
path_regexp = Regexp.new(escaped_url)
html.gsub!(path_regexp, new_url)
Note: url is actually a Google Chart request URL I wrote, which will not have more special characters than /?|.=%:

The gsub method can take either a string or a Regexp as its first argument, and the same goes for gsub!. For example:
>> 'here is some ..text.. xxtextxx'.gsub('..text..', 'pancakes')
=> "here is some pancakes xxtextxx"
So you don't need to bother with a regex or escaping at all, just do a straight string replacement:
html.gsub!(url, new_url)
Or better, use an HTML parser to find the particular node you're looking for and do a simple attribute assignment.
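A minimal sketch of the plain-string replacement, assuming a made-up chart URL and HTML snippet (neither comes from the question, and replace_url is a hypothetical helper name). It shows that characters such as ?, . and | need no escaping when the pattern is a String:

```ruby
# Hypothetical helper wrapping the plain-string replacement.
def replace_url(html, url, new_url)
  # A String pattern is matched character for character,
  # so regex metacharacters in url have no special meaning.
  html.gsub(url, new_url)
end

# Assumed sample data, for illustration only.
url  = 'http://chart.example.com/chart?cht=p3&chd=t:60,40&chl=a|b'
html = %(<img src="#{url}">)

puts replace_url(html, url, 'http://example.com/new.png')
```

gsub returns a new string here; gsub! would modify html in place instead.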

I think you're looking for something like:
path_regexp = Regexp.new(Regexp.escape(url))
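If a Regexp is really needed, a sketch along these lines should work. The helper name replace_url_with_regexp is an assumption, and the block form of gsub is a design choice of this sketch rather than something the answer requires:

```ruby
# Hypothetical helper: build a Regexp that matches url literally.
def replace_url_with_regexp(html, url, new_url)
  # Regexp.escape backslash-escapes regex metacharacters such as . ? and |
  pattern = Regexp.new(Regexp.escape(url))
  # With the block form, the block's return value is inserted as-is,
  # so backslashes in new_url are not read as backreferences.
  html.gsub(pattern) { new_url }
end

puts Regexp.escape('a.b/c?d')
puts replace_url_with_regexp('aXb/c?d a.b/c?d', 'a.b/c?d', 'N')
```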

Related

Ruby string interpolation with substitution

I have a given method that adds keys to urls with:
url % {:key => key}
But for one url I need the key to be escaped with CGI.escape. I cannot change the method, I can only change the url, but substitution does not work:
"https://www.example.com?search=#{CGI.escape(%{key})}"
Is there a way to achieve this only by changing the url string? I cannot use additional variables or change the method, thus I cannot do the escaping in the method and send the escaped key to the url string.
It isn't clear how your given method is supposed to work. Can you give an example where the method works, and one where it doesn't? Ignoring the method part of your question, and focusing on the URL bit,
>> key = "Baby Yoda"
=> "Baby Yoda"
>> %{key}
=> "key"
is the expected result, regardless of whether you have a variable named key, set to any value. See: https://en.wikibooks.org/wiki/Ruby_Programming/Syntax/Literals#The_.25_Notation
Note that Ruby's built-in String#% does fill %{key} placeholders inside an ordinary string when it is given a hash, which may be what your method relies on, but that isn't clear in your question.
If you just want to CGI escape the value of 'key' within your URL string, don't use the percent notation:
>> key = 'Baby Yoda'
=> "Baby Yoda"
>> "https://www.example.com?search=#{CGI.escape(key)}"
=> "https://www.example.com?search=Baby+Yoda"
It seems this is not possible. I worked around it by defining a ${...} syntax:
"https://www.example.com?search=${CGI.escape(%{key})}"
Then I first do the substitution of %{key} and then use eval to run CGI.escape (or any other method, for that matter) with
gsub(/\${(.+?)}/) { |e| eval($1) }
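For reference, here is a sketch of how the url % {:key => key} call from the question behaves, and where the escaping would normally go if the call site could be changed. The add_key method is a hypothetical stand-in for the method in the question, whose real body is not shown there:

```ruby
require 'cgi'

# Hypothetical stand-in for the method described in the question.
def add_key(url, key)
  # String#% with a Hash fills %{key} references found inside the string.
  url % { key: key }
end

template = 'https://www.example.com?search=%{key}'

puts add_key(template, 'Baby Yoda')             # value inserted unescaped
puts add_key(template, CGI.escape('Baby Yoda')) # value escaped before insertion
```

The placeholder is only a marker inside the string, so no Ruby code written around it in the template is evaluated; that is why the workaround above needs a second pass.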

Replacing scan by gsub in Ruby: how to allow code in gsub block?

I am parsing a Wiki text from an XML dump, for a string named 'section' which includes templates in double braces, including some arguments, which I want to reorganize.
This has an example named TextTerm:
section="Sample of a text with a first template {{TextTerm|arg1a|arg2a|arg3a...}} and then a second {{TextTerm|arg1b|arg2b|arg3b...}} etc."
I can use scan and a regex to get each template and work on it on a loop using:
section.scan(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/i).each { |item| puts "1=" + item[1] } # item[1] is arg1a, etc.
And, I have been able to extract the database of the first argument of the template.
Now I also want to replace the name of the template with "NewTextTerm" and reorganize its arguments by placing the second argument in place of the first.
Can I do it in the same loop? For example, by replacing scan with gsub(regexp) { block }:
section.gsub!(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| '{{NewTextTerm|\2|\1}}'}
I get:
"Sample of a text with a first template {{NewTextTerm|\\2|\\1}} and then a second {{NewTextTerm|\\2|\\1}} etc."
meaning that the arguments of the regexp are not recognized. Even if it worked, I would like to have some place within the gsub block to work on the arguments. For example, I can't have a puts in the gsub block similar to the scan().each block but only a string to be substituted.
Any ideas are welcome.
PS: Some editing: braces and "section= added", code is complete.
When you have the replacement as a string argument, you can use '\1', etc. like this:
string.gsub!(regex, '...\1...\2...')
When you have the replacement as a block, you can use "#$1", etc. like this:
string.gsub!(regex){"...#$1...#$2..."}
You are mixing the uses. Stick to either one.
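A small sketch of the two forms side by side, using a made-up name-swapping example (the method names and the sample text are assumptions for illustration):

```ruby
# String replacement: backreferences are written '\1', '\2' (single quotes
# keep the backslash so gsub sees it).
def swap_with_string(text)
  text.gsub(/(\w+) (\w+)/, '\2, \1')
end

# Block replacement: the match globals $1, $2 are set for each match,
# and the block's return value becomes the replacement.
def swap_with_block(text)
  text.gsub(/(\w+) (\w+)/) { "#{$2}, #{$1}" }
end

puts swap_with_string('John Smith')
puts swap_with_block('John Smith')
```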
Yes, changing the quote by a double quote isn't enough, #$1 is the answer. Here is the complete code:
section="Sample of a text with a first template {{TextTerm|arg1a|arg2a|arg3a...}} and then a second {{TextTerm|arg1b|arg2b|arg3b...}} etc."
section.gsub(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| "{{New#$1|#$3|#$2}}"}
"Sample of a text with a first template {{NewTextTerm|arg2a|arg3a...|arg1a}} and then a second {{NewTextTerm|arg2b|arg3b...|arg1b}} etc."
Thus, it works. Thanks.
But now I have to replace the string, by a "function" returning the changed string:
def stringreturn(arg1,arg2,arg3) strr = "{{New"+arg1 + arg3 +arg2 + "}}"; return strr ; end
and
section.gsub(/\{\{(TextTerm)\|(.*?)\|(.*?)\}\}/) { |item| stringreturn("#$1","|#$2","|#$3") }
will return:
"Sample of a text with a first template {{NewTextTerm|arg2a|arg3a...|arg1a}} and then a second {{NewTextTerm|arg2b|arg3b...|arg1b}} etc."
Thanks to all!
There is probably a better way to manipulate arguments in MediaWiki templates using Ruby.
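On the point about doing other work inside the block: a gsub block can hold any code, and only its last expression is used as the replacement. A minimal sketch, assuming a hypothetical helper rewrite_templates that both rewrites the templates and collects the first arguments:

```ruby
# Hypothetical helper: rename TextTerm to NewTextTerm, move the first
# argument to the end, and record every first argument along the way.
def rewrite_templates(section)
  first_args = []
  rewritten = section.gsub(/\{\{TextTerm\|(.*?)\|(.*?)\}\}/) do
    first, rest = $1, $2
    first_args << first                 # side work, as in the scan loop
    "{{NewTextTerm|#{rest}|#{first}}}"  # last expression is the replacement
  end
  [rewritten, first_args]
end

section = "First {{TextTerm|arg1a|arg2a|arg3a}} then {{TextTerm|arg1b|arg2b|arg3b}}."
text, args = rewrite_templates(section)
puts text
puts args.inspect
```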

Proper gsub regular expression for this URL?

Say I have a string representing a URL:
http://www.mysite.com/somepage.aspx?id=33
..I'd like to escape the forward slashes and the question mark:
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How can I do this via gsub? I've been playing with some regular expressions in there but haven't hit on the winning formula yet.
I suggest you use
url = url.gsub(/(?=[\/?])/, '\\')
As shown here
url = 'http://www.mysite.com/somepage.aspx?id=33'
url = url.gsub(/(?=[\/?])/, '\\')
puts url
output
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How about this one (note that a Ruby replacement string uses \1, not $1, to refer to the first group): result = searchText.gsub(/(\/|\?)/, '\\\\\1')
I will suggest using a block to make it more readable:
url.gsub(/[\/?]/) { |c| "\\#{c}" }
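The three approaches can be compared in one sketch. The method names are assumptions; the backreference variant uses \1, which is what Ruby replacement strings understand:

```ruby
# Zero-width lookahead: insert a backslash before every / or ?
def escape_with_lookahead(url)
  url.gsub(/(?=[\/?])/, '\\')
end

# Backreference: '\\\\' yields one literal backslash, '\1' is the captured char.
def escape_with_backreference(url)
  url.gsub(/([\/?])/, '\\\\\1')
end

# Block: the matched character is passed in as c.
def escape_with_block(url)
  url.gsub(/[\/?]/) { |c| "\\#{c}" }
end

url = 'http://www.mysite.com/somepage.aspx?id=33'
puts escape_with_lookahead(url)
puts escape_with_backreference(url)
puts escape_with_block(url)
```

The block form avoids counting backslashes in the replacement string, which is the main reason it tends to read more clearly.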

Extract URLs from text using Ruby while handling matched parens

URI.extract claims to do this, but it doesn't handle matched parens:
>> URI.extract("text here (http://foo.example.org/bla) and here")
=> ["http://foo.example.org/bla)"]
What's the best way to extract URLs from text without breaking parenthesized URLs (which users like to use)?
If the URLs are always enclosed in parentheses, a regular expression might be a better solution.
text = "text here (http://foo.example.org/bla) and here and here is (http://yet.another.url/with/parens) and some more text"
text.scan(/\(([^)]*)\)/)
Before using this
>> URI.extract("text here (http://foo.example.org/bla) and here")
=> ["http://foo.example.org/bla)"]
You need to add this
require 'uri'
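One possible way to handle the parenthesized case from the question is to scan with a loose pattern and then trim closing parentheses that have no partner inside the URL. This is only a sketch: extract_urls is a hypothetical helper, and its pattern is a simplification that does not validate URLs:

```ruby
# Hypothetical helper: find http(s) URLs and drop unbalanced trailing ")".
def extract_urls(text)
  text.scan(%r{https?://[^\s<>"]+}).map do |url|
    # Remove trailing ")" while the URL has more ")" than "(".
    url = url.chomp(")") while url.end_with?(")") && url.count("(") < url.count(")")
    url
  end
end

puts extract_urls("text here (http://foo.example.org/bla) and here").inspect
puts extract_urls("see http://en.wikipedia.org/wiki/Ruby_(programming_language) too").inspect
```

Counting parentheses keeps URLs that legitimately end in a balanced ")" intact while still trimming the one that belongs to the surrounding sentence.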
You could use this regexp to extract URLs from a string:
"some thing http://abcd.com/ and http://google.com are great".scan(/(?:http|https):\/\/[a-z0-9]+(?:[\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(?:(?::[0-9]{1,5})?\/[^\s]*)?/ix)

What is Ruby doing with gsub here?

I'm working on converting code from Ruby to Node.js. I came across these lines at the end of a function and I'm curious what the original developers were trying to accomplish:
url = url.gsub "member_id", "member_id__hashed"
url = url.gsub member_id, member_id_hashed
url
I'm assuming that url at the end is Ruby's equivalent to return url;
as for the lines with gsub, from what I've found online that's the wrong syntax, right? Shouldn't it be:
url = url.gsub(var1, var2)?
If it is correct, why are they calling it twice, once with quotes and once without?
gsub does a global substitute on a string. If I had to guess, the URL might be in the form of
http://somewebsite.com?member_id=123
If so, the code has the following effect:
url.gsub "member_id", "member_id__hashed"
# => "http://somewebsite.com?member_id__hashed=123"
Assuming member_id = "123", and member_id_hashed is some hashed version of the id, then the second line would replace "123" with the hashed version.
url.gsub member_id, member_id_hashed
# => "http://somewebsite.com?member_id__hashed=abc"
So you're going from http://somewebsite.com?member_id=123 to http://somewebsite.com?member_id__hashed=abc
Documentation: https://ruby-doc.org/core-2.6/String.html#method-i-gsub
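Putting the two calls together, a runnable sketch of the guess above. The method name hash_member_url and the sample values are assumptions; the original function is not shown in the question:

```ruby
# Hypothetical reconstruction of the end of the function from the question.
def hash_member_url(url, member_id, member_id_hashed)
  # First call: string literals, renames the parameter.
  url = url.gsub "member_id", "member_id__hashed"
  # Second call: variables, replaces the id value with its hashed form.
  url = url.gsub member_id, member_id_hashed
  url
end

puts hash_member_url("http://somewebsite.com?member_id=123", "123", "abc")
```

Because gsub replaces every occurrence, the second call would also rewrite any other "123" in the URL, which is worth checking when porting this to Node.js.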
I'm assuming that the url at the end is Ruby's equivalent to return url;
If that code is part of a method or block, indeed, the line url is the value returned by the method. This is because by default a method in Ruby returns the value of the last expression that was evaluated in the method. The keyword return can be used (as in many other languages) to produce an early return of a method, with or without a return value.
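A short illustration of both behaviours, using a made-up method (describe is a hypothetical name, not from the question):

```ruby
def describe(n)
  # Explicit early return, with a value.
  return "negative" if n < 0
  # No return keyword: the value of the last evaluated expression is returned.
  n.zero? ? "zero" : "positive"
end

puts describe(-5)
puts describe(0)
puts describe(7)
```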
that's the wrong syntax, right? shouldn't it be
url = url.gsub(var1, var2)?
The arguments used to invoke a method in Ruby may be enclosed in parentheses, but they may also be listed after the method name without parentheses.
Both:
url = url.gsub var1, var2
and
url = url.gsub(var1, var2)
are correct and they produce the same result.
Omitting the parentheses is common in Ruby, particularly for short or DSL-style calls, but it is not always possible. One such case is when one of the arguments is itself a method call with arguments.
The parentheses are then used to make everything clear both for the interpreter and the readers of the code.
If it is correct, why are they calling it twice, once with quotes and once without?
There are two calls of the same method, with different arguments:
url = url.gsub "member_id", "member_id__hashed"
The arguments of url.gsub are the literal strings "member_id" and "member_id__hashed".
url = url.gsub member_id, member_id_hashed
This time the arguments are the variables member_id and member_id_hashed.
This works the same in JavaScript and many other languages that use double quotes to enclose the string literals.
String#gsub is a method of class String that does search and replace in a string and returns a new string. Its name is short for "global substitute" (it replaces all occurrences). To replace only the first occurrence, use String#sub.
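A minimal sketch of the difference, with hypothetical method names and a made-up input:

```ruby
# sub replaces only the first occurrence.
def replace_first(text)
  text.sub("a", "o")
end

# gsub replaces every occurrence.
def replace_all(text)
  text.gsub("a", "o")
end

word = "banana"
puts replace_first(word)
puts replace_all(word)
puts word # unchanged: both methods return new strings
```

The bang variants sub! and gsub! modify the receiver in place instead of returning a copy.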

Resources