I am reading Metaprogramming Ruby book, and there is method, which I cant understant:
def to_alphanumeric(s)
s.gsub /[^\w\s]/, ''
end
I see there is Argument Variable (s), which is called lately and is converted to some weird expression?
What exactly can I do with this method, is he useful?
Following method works just fine:
def to_alphanumeric(s)
s.gsub %r([aeiou]), '<\1>'
end
p = to_alphanumeric("hello")
p p
>> "h<>ll<>"
But if I upgrade method to class, simply calling the method + argv to_alphanumeric, no longer work:
class String
def to_alphanumeric(s)
s.gsub %r([aeiou]), '<\1>'
end
end
p = to_alphanumeric("hello")
p p
undefined method `to_alphanumeric' for String:Class (NoMethodError)
Would it hurt to check the documentation?
http://www.ruby-doc.org/core-2.0/String.html#method-i-gsub
Returns a copy of str with the all occurrences of pattern substituted for the second argument.
The /[^\w\s]/ pattern means "everything that is not a word or whitespace"
Take a look at Rubular, the regular expression /[^\w\s]/ matches special characters like ^, /, or $ which are neither word characters (\w) or whitespace (\s). Therefore the function removes special characters like ^, / or $.
>> "^/$%hel1241lo".gsub /[^\w\s]/, ''
=> "hel1241lo"
call it simple like a function:
>> to_alphanumeric("U.S.A!")
=> "USA"
Related
I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.
You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp removes the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"
Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") reurns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts [/^\{\{(.+)\}\}$/, 1]
# => "text"
Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub but if the replacement is not needed as pattern, then tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be suprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar
I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ādā, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].
I see in the documentation I'm able to do:
/\$(?<dollars>\d+)\.(?<cents>\d+)/ =~ "$3.67" #=> 0
puts dollars #=> prints 3
I was wondering if this would be possible:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
/#{Regexp.escape(string)}/ =~ "$3.67"
I get:
`<main>': undefined local variable or method `dlr' for main:Object (NameError)
There are a few mistakes in your approach. First of all, let's look at your string:
string = "\$(\?<dlr>\d+)\.(\?<cts>\d+)"
You escape the dollar sign with "\$", but that is the same as just writing "$", consider:
"\$" == "$"
#=> true
To actually end up with the string "backslash followed by dollar" you would need to write "\\$". The same thing applies to the decimal character classes, you would have to write "\\d" to end up with the correct string.
The question marks on the other hand are actually part of the regex syntax, so you do not want to escape these at all. I recommend using single quotes for your original string, because that makes the input much easier:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
#=> "\\$(?<dlr>\\d+)\\.(?<cts>\\d+)"
The next issue is with Regexp.escape. Take a look at what regular expression it produces with the above string:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
Regexp.escape(string)
#=> "\\\\\\$\\(\\?<dlr>\\\\d\\+\\)\\\\\\.\\(\\?<cts>\\\\d\\+\\)"
That's one level too much escaping. Regexp.escape can be used when you want to match the literal characters that are contained in the string. For example, the escaped regex above will match the source string itself:
/#{Regexp.escape(string)}/ =~ string
#=> 0 # matches at offset 0
Instead, you can use Regexp.new to treat the source as an actual regular expression.
The last issue is then how you access the match result. Obviously, you are getting a NoMethodError. You might think that the match result is stored in local variables called dlr and cts, but that is not the case. You have two options to access the match data:
Use Regexp.match, it will return a MatchData object as result
Use regexp =~ string and then access the last match data with the global variable $~
I prefer the former, because it is easier to read. The full code would then look like this:
string = '\$(?<dlr>\d+)\.(?<cts>\d+)'
regexp = Regexp.new(string)
result = regexp.match("$3.67")
#=> #<MatchData "$3.67" dlr:"3" cts:"67">
result[:dlr]
#=> "3"
result[:cts]
#=> "67"
Given the following example in Ruby 2.0.0:
class Regexp
def self.build
NumRegexp.new("-?[\d_]+")
end
end
class NumRegexp < Regexp
def match(value)
'hi two'
end
def =~(value)
'hi there'
end
end
var_ex = Regexp.build
var_ex =~ '12' # => "hi there" , as expected
'12' =~ var_ex # => nil , why? It was expected "hi there" or "hi two"
According to the documentation of Ruby of the =~ operator for the class String:
str =~ obj ā fixnum or nil
"If obj is a Regexp, use it as a pattern to match against str,and returns the position the match starts, or nil if there is no match. Otherwise, invokes obj.=~, passing str as an argument. The default =~ in Object returns nil."
http://www.ruby-doc.org/core-2.0.0/String.html#method-i-3D-7E
It is a fact that the variable var_ex is an object of class NumRegexp, hence, it is not a Regexp object. Therefore, it should invoke the method obj.=~ passing the string as an argument, as indicated in the documentation and returning "hi there".
In another case, maybe as NumRegexp is a subclass of Regexp it could be considered a Regexp type. Then, "If obj is a Regexp use it as a pattern to match against str". It should return "hi two" in that case.
What is wrong in my reasoning? What do I have to do to achieve the desired functionality?
I've found that the record:
var_ex =~ '12'
isn't the same of:
'12' =~ var_ex
It seems that there are no string method that calls to regexp class #~= method back, this is a bug already reported, and expected to be solved in 2.2.0. So you have to declare it explicitly:
class String
alias :__system_match :=~
def =~ regex
regex.is_a?( Regexp ) && ( regex =~ self ) || __system_match( regex )
end
end
'12' =~ /-?[\d_]+/
# => 0
This is a possible and acceptable solution using monkey patching but it presents some problems to take into account:
The problem with this is that we have now polluted the namespace with a superfluous __system_match method. This method will show up in our documentation, it will show up in code completion in our IDEs, it will show up during reflection. Also, it still can be called, but presumably we monkey patched it, because we didn't like its behavior in the first place, so we might not want other people to call it.
The reason is that you are calling =~ method on a string, not on your NumRegexp object. You need to tell String how to behave:
class String
def =~(reg)
return reg=~self if reg.is_a? NumRegexp
super
end
end
I am having trouble with named captures in regular expressions in Ruby 2.0. I have a string variable and an interpolated regular expression:
str = "hello world"
re = /\w+/
/(?<greeting>#{re})/ =~ str
greeting
It raises the following exception:
prova.rb:4:in <main>': undefined local variable or methodgreeting' for main:Object (NameError)
shell returned 1
However, the interpolated expression works without named captures. For example:
/(#{re})/ =~ str
$1
# => "hello"
Named Captures Must Use Literals
You are encountering some limitations of Ruby's regular expression library. The Regexp#=~ method limits named captures as follows:
The assignment does not occur if the regexp is not a literal.
A regexp interpolation, #{}, also disables the assignment.
The assignment does not occur if the regexp is placed on the right hand side.
You'll need to decide whether you want named captures or interpolation in your regular expressions. You currently cannot have both.
Assign the result of #match; this will be accessible as a hash that allows you to look up your named capture groups:
> matches = "hello world".match(/(?<greeting>\w+)/)
=> #<MatchData "hello" greeting:"hello">
> matches[:greeting]
=> "hello"
Alternately, give #match a block, which will receive the match results:
> "hello world".match(/(?<greeting>\w+)/) {|matches| matches[:greeting] }
=> "hello"
As an addendum to both answers in order to make it crystal clear:
str = "hello world"
# => "hello world"
re = /\w+/
# => /\w+/
re2 = /(?<greeting>#{re})/
# => /(?<greeting>(?-mix:\w+))/
md = re2.match str
# => #<MatchData "hello" greeting:"hello">
md[:greeting]
# => "hello"
Interpolation is fine with named captures, just use the MatchData object, most easily returned via match.