What does the o modifier for a regexp mean? - ruby

Ruby regexp has some options (e.g. i, x, m, o). i means ignore case, for instance.
What does the o option mean? In ri Regexp, it says o means to perform #{} interpolation only once. But when I do this:
a = 'one'
b = /#{a}/
a = 'two'
b does not change (it stays /one/). What am I missing?

Straight from the go-to source for regular expressions:
/o causes any #{...} substitutions in a particular regex literal to be performed just once, the first time it is evaluated. Otherwise, the substitutions will be performed every time the literal generates a Regexp object.
I could also turn up this usage example:
# avoid interpolating patterns like this if the pattern
# isn't going to change:
pattern = ARGV.shift
ARGF.each do |line|
print line if line =~ /#{pattern}/
end
# the above creates a new regex each iteration. Instead,
# use the /o modifier so the regex is compiled only once
pattern = ARGV.shift
ARGF.each do |line|
print line if line =~ /#{pattern}/o
end
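So /o is really a hint for the compiler. It only matters for a literal that is evaluated multiple times, such as one inside a loop or a method. In the question's example the literal is evaluated just once, and reassigning a afterwards can never affect a Regexp that has already been built, so b stays /one/ with or without /o.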
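The difference only shows up when the same literal is evaluated more than once, for example inside a method that is called repeatedly. Here is a minimal sketch. The method names plain and once are made up for illustration:

```ruby
# Without /o, #{word} is interpolated every time this literal is evaluated.
def plain(word)
  /#{word}/
end

# With /o, interpolation happens only the first time this literal is
# evaluated; later calls reuse that Regexp whatever the argument is.
def once(word)
  /#{word}/o
end

p plain("one")  # prints /one/
p plain("two")  # prints /two/
p once("one")   # prints /one/
p once("two")   # prints /one/
```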

Related

How do I get Ruby's String Split Method to Include Newlines in the Output?

There's this code right here:
thing = "Stuff.\nHello!"
result = thing.split(" ")
# ^Is equal to ["Stuff.", "Hello!"] despite the delimiter being a single space instead of a newline
How do I make it so that the newline is included, making the result var equal to ["Stuff.\n", "Hello!"] instead?
I'm using Ruby 1.9.2. For those who are curious, I need to know this for the sake of a word-wrapping algorithm that replaces line-break tags with newlines.
You can use a regexp with a positive look-behind assertion:
thing = "Stuff.\nHello!"
thing.split(/(?<=\s)/)
#=> ["Stuff.\n", "Hello!"]
The positive look-behind assertion (?<=pat) ensures that the preceding characters match pat, but doesn’t include those characters in the matched text.
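Note that \s also matches spaces and tabs, so the split above breaks after every whitespace character, not only after newlines. The sketch below uses a made-up sample string to compare that with looking behind for \n only. It also shows the single-space special case of split that the question ran into:

```ruby
thing = "Stuff. More\nHello!"

# (?<=\s) splits after every whitespace character, spaces included.
p thing.split(/(?<=\s)/)  # prints ["Stuff. ", "More\n", "Hello!"]

# (?<=\n) splits only after newlines.
p thing.split(/(?<=\n)/)  # prints ["Stuff. More\n", "Hello!"]

# A single-space string is special-cased: it splits on runs of any
# whitespace and drops the separators, which is why "\n" disappeared.
p thing.split(" ")        # prints ["Stuff.", "More", "Hello!"]
```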
One could simply use String#lines.
"Stuff.\nHello!".lines #=> ["Stuff.\n", "Hello!"]
"Stuff.\nHello!\n".lines #=> ["Stuff.\n", "Hello!\n"]
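One caveat for the Ruby 1.9.2 mentioned in the question: before Ruby 2.0, String#lines returned an Enumerator rather than an Array. Adding to_a therefore gives the same Array on old and new versions. lines also accepts a separator argument, and that separator stays attached to each piece. The sample strings below are made up:

```ruby
text = "Stuff.\nHello!"

# to_a is a no-op on Ruby 2.0+ (lines already returns an Array) and
# converts the Enumerator that Ruby 1.9.x returns.
p text.lines.to_a            # prints ["Stuff.\n", "Hello!"]

# A custom separator is kept at the end of each piece, just like "\n" above.
p "a b c".lines(" ").to_a    # prints ["a ", "b ", "c"]
```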

Convert string to camel case in Ruby

Working on a Ruby challenge to convert dash/underscore delimited words into camel casing. The first word within the output should be capitalized only if the original word was capitalized (known as Upper Camel Case).
My solution so far:
def to_camel_case(str)
str.split('_,-').collect.camelize(:lower).join
end
However, .camelize(:lower) is a Rails method, I believe, and doesn't work in plain Ruby. Is there an alternative method that is equally simple? I can't seem to find one. Or do I need to approach the challenge from a completely different angle?
main.rb:4:in `to_camel_case': undefined method `camelize' for #<Enumerator: []:collect> (NoMethodError)
from main.rb:7:in `<main>'
I assume that:
Each "word" is made up of one or more "parts".
Each part is made up of characters other than spaces, hyphens and underscores.
The first character of each part is a letter.
Each successive pair of parts is separated by a hyphen or underscore.
It is desired to return a string obtained by modifying each part and removing the hyphen or underscore that separates each successive pair of parts.
For each part all letters but the first are to be converted to lowercase.
All characters in each part of a word that are not letters are to remain unchanged.
The first letter of the first part is to remain unchanged.
The first letter of each part other than the first is to be capitalized (if not already capitalized).
Words are separated by spaces.
If this describes the problem correctly, the following method could be used.
R = /(?:(?<=^| )|[_-])[A-Za-z][^ _-]*/
def to_camel_case(str)
str.gsub(R) do |s|
c1 = s[0]
case c1
when /[A-Za-z]/
c1 + s[1..-1].downcase
else
s[1].upcase + s[2..-1].downcase
end
end
end
to_camel_case "Little Miss-muffet sat_on_HE$R Tuffett eating-her_cURDS And_whey"
# => "Little MissMuffet satOnHe$r Tuffett eatingHerCurds AndWhey"
The regular expression can be written in free-spacing mode to make it self-documenting.
R = /
(?: # begin non-capture group
(?<=^|[ ]) # use a positive lookbehind to assert that the next character
# is preceded by the beginning of the string or a space
# (the space is written [ ] because /x ignores bare spaces)
| # or
[_-] # match '_' or '-'
) # end non-capture group
[A-Za-z] # match a letter
[^ _-]* # match 0+ characters other than ' ', '_' and '-'
/x # free-spacing regex definition mode
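A pitfall when converting a regex to free-spacing mode: unescaped whitespace outside a character class is ignored under /x. The literal space in the compact lookbehind (?<=^| ) therefore has to be written as [ ] or as an escaped space to keep its meaning. Otherwise the lookbehind collapses to (?<=^|), which always succeeds. The quick check below uses made-up strings and relies on Regexp#match? (Ruby 2.4+):

```ruby
# In /x mode, unescaped whitespace outside a character class is ignored.
p(/a b/x.match?("ab"))     # prints true: the space was dropped
p(/a b/x.match?("a b"))    # prints false
# Writing the space as [ ] keeps it significant.
p(/a[ ]b/x.match?("a b"))  # prints true
p(/a[ ]b/x.match?("ab"))   # prints false
```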
Most Rails methods can be added into basic Ruby projects without having to pull in the whole Rails source.
The trick is to figure out the minimal set of files to require in order to define the method you need. If we go to APIDock, we can see that camelize is defined in active_support/inflector/methods.rb.
Therefore active_support/inflector seems like a good candidate to try. Let's test it:
irb(main)> require 'active_support/inflector'
=> true
irb(main)> 'foo_bar'.camelize
=> "FooBar"
Seems to work. Note that this assumes you already ran gem install activesupport earlier. If not, then do it first (or add it to your Gemfile).
In pure Ruby, no Rails, given str = 'my-var_name' you could do:
delimiters = Regexp.union(['-', '_'])
str.split(delimiters).then { |first, *rest| [first, rest.map(&:capitalize)].join }
#=> "myVarName"
With str = 'My-var_name' the result is "MyVarName", since the first element of the split result is left untouched while the rest are capitalized.
This works only on a single dash/underscore-delimited word with no spaces. For text containing spaces, split on spaces first and then apply the method to each word.
It combines splitting by multiple delimiters, as explained in "Split string by multiple delimiters",
with Object#then (Ruby 2.6+).
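For text that contains several space-separated words, the same split/then idea can be applied word by word. The sketch below assumes single spaces between words (split(' ') collapses runs of whitespace). camel_word and camel_sentence are hypothetical helper names:

```ruby
DELIMITERS = Regexp.union(['-', '_'])

# Camel-cases one dash/underscore-delimited word; the first part is left as-is.
def camel_word(word)
  word.split(DELIMITERS).then { |first, *rest| [first, rest.map(&:capitalize)].join }
end

# Hypothetical helper: apply camel_word to each space-separated word.
def camel_sentence(str)
  str.split(' ').map { |w| camel_word(w) }.join(' ')
end

p camel_sentence('my-var_name Other-thing')  # prints "myVarName OtherThing"
```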

Ruby Return the content of a text(.txt) file after replacing variables

Imagine I have 2 files:
A.txt
This is a sentence with a #{variable}.
and a ruby script.
Iamascript.rb
...
variable = "period"
...
Is there any way I can read the content of the .txt file and insert the variable before puts'ing it?
This means my output when running the rb-script should be
This is a sentence with a period.
The .txt file is dynamic.
What you are looking for is commonly called templating, and you have basically defined a template language. Ruby actually ships with a template language called ERb in the standard library, so, if you are willing to change the syntax of your template language a bit, you can just use that instead of having to invent your own:
A.txt:
This is a sentence with a <%=variable%>.
Iamascript.rb
require 'erb'
variable = 'period'
puts ERB.new(File.read('A.txt')).result(binding)
# This is a sentence with a period.
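Passing binding exposes every local variable in scope to the template. If you only want to hand over specific values, ERB#result_with_hash (Ruby 2.5+) is an alternative. In the sketch below, the template string is an inline stand-in for File.read('A.txt'):

```ruby
require 'erb'

# Inline stand-in for File.read('A.txt'), using the ERB syntax from above.
template = 'This is a sentence with a <%= variable %>.'

# result_with_hash exposes only the given keys to the template.
puts ERB.new(template).result_with_hash(variable: 'period')
# prints: This is a sentence with a period.
```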
There's one "obvious" (but bad) solution: eval, which runs whatever code you give it.
This is a security risk, but it may be what you're looking for if you need complex expressions inside #{...}.
The more correct way to do it if you care even a tiny little bit about security is to use Ruby's formatting operator: % (similar to Python's).
template = "the variable's value is %{var}"
puts(template % { var: "some value" }) # prints "the variable's value is some value"
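Applied to the question, this approach assumes the placeholder in A.txt is rewritten from #{variable} to %{variable}. In the sketch below, an inline string stands in for the file contents. A placeholder with no matching key raises KeyError instead of silently producing a wrong sentence:

```ruby
# Inline stand-in for File.read("A.txt"), with %{variable} as the placeholder.
template = "This is a sentence with a %{variable}."
variable = "period"

puts(template % { variable: variable })
# prints: This is a sentence with a period.

# A placeholder with no matching key raises KeyError.
begin
  "Hello %{name}" % { other: 1 }
rescue KeyError => e
  puts "missing key: #{e.message}"
end
```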
Suppose the file "A.txt" contains the single line of text (or this line is extracted from the file):
s1 = 'This is a sentence with a #{my_var}'
and the second file, "Iamascript.rb", contains:
s2 =<<_
line of code
line of code
my_var = "period"
line of code
line of code
_
#=> "line of code\nline of code\nmy_var = \"period\"\nline of code\nline of code\n"
Let's create those files:
File.write("A.txt", s1)
#=> 35
File.write("Iamascript.rb", s2)
#=> 70
Now read the first line of "A.txt" and extract the string beginning "\#{" and ending "}", then extract the variable name from that string.
r1 = /
\#\{ # match a hash sign followed by a left brace
[_a-z]+ # match one or more underscores or lowercase letters
\} # match a right brace
/x # free-spacing regex definition mode
s1 = File.read("A.txt")
#=> "This is a sentence with a \#{my_var}"
match = s1[r1]
#=> "\#{my_var}"
var_name = match[2..-2]
#=> "my_var"
Now read "Iamascript.rb" and look for a line that matches the following regex.
r2 = /
\A # match beginning of string
#{var_name} # value of var_name
\s*=\s* # match '=' and surrounding whitespace
([\"']) # match a single or double quote in capture group 1
([^\"']+) # match other than single or double quote in capture group 2
([\"']) # match a single or double quote in capture group 3
\z # match end of string
/x # free-spacing regex definition mode
#=> /
# \A # match beginning of string
# my_var # value of var_name
# \s*=\s* # match '=' and surrounding whitespace
# ([\"']) # match a single or double quote in capture group 1
# ([^\"']+) # match other than single or double quote in capture group 2
# ([\"']) # match a single or double quote in capture group 3
# \z # match end of string
# /x
If a match is found return the line from "A.txt" with the text substitution, else return nil.
if File.foreach("Iamascript.rb").find { |line| line.strip =~ r2 && $1==$3 }
s1.sub(match, $2)
else
nil
end
#=> "This is a sentence with a period"
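If the values are already available in Ruby rather than parsed out of another script, the original #{...} syntax can be kept without eval by using a gsub with a block. The sketch below assumes a hypothetical values hash. The template is single-quoted so that Ruby does not interpolate it, and unknown names are left unchanged:

```ruby
template = 'This is a sentence with a #{my_var}.'
values = { "my_var" => "period" }   # hypothetical lookup table

# Replace each #{name} with values[name]; unknown names keep the original text.
result = template.gsub(/\#\{(\w+)\}/) { values.fetch($1, $&) }
puts result
# prints: This is a sentence with a period.
```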

What's different about this ruby regex?

I was trying to substitute either a comma or a percent sign, and it kept failing, so I opened up IRB and tried some things out. Can anyone explain why the first regex below doesn't work but the flipped version (the third one) does? I've looked it up and down and don't see any typos, so it must be something about how the rule works, but I can't see what.
b = "245,324"
b.gsub(/[%]*|[,]*/,"")
# => "245,324"
b.gsub(/[,]*/,"")
# => "245324"
b.gsub(/[,]*|[%]*/,"")
# => "245324"
b
# => "245,324"
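Because the first alternative, [%]*, can match zero characters, it succeeds with an empty match at every position, including the position of the comma. Ruby takes the first alternative that succeeds, so [,]* is never tried there and the comma survives. Check out this result: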
b = '232,000'
puts b.gsub(/[%]*/,"-")
--output:--
-2-3-2-,-0-0-0-
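Ruby's regex engine tries alternatives from left to right and accepts the first one that succeeds at a given position, even when that success is an empty match. String#scan makes this visible on the question's string:

```ruby
b = "245,324"

# With [%]* first, the empty match wins everywhere, so no comma is consumed.
p b.scan(/[%]*|[,]*/).all?(&:empty?)   # prints true

# With [,]* first, the comma position produces a real one-character match.
p b.scan(/[,]*|[%]*/).include?(",")    # prints true

# Requiring at least one character avoids the empty matches entirely.
p b.scan(/[%,]+/)                      # prints [","]
```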
If you put all the characters that you want to erase into the same character class, then you will get the result you want:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '')
--output:--
245324000
Even then, there are a lot of needless substitutions going on:
b = "%245,324,000%"
puts b.gsub(/[%,]*/, '-')
--output:--
--2-4-5--3-2-4--0-0-0--
It's the zero-or-more quantifier that gets you into trouble, because ruby can find lots of places where there are 0 percent signs or 0 commas. You don't actually want substitutions where ruby finds zero of your characters; you want substitutions where at least one of your characters occurs:
b = '%232,000,000%'
puts b.gsub(/%+|,+/,"")
--output:--
232000000
Or, equivalently:
puts b.gsub(/[%,]+/, '')
Also note that regex literals behave like double-quoted strings, so you can interpolate into them (it's as if the // delimiters were double quotes):
one_or_more_percents = '%+'
one_or_more_commas = ',+'
b = '%232,000,000%'
puts b.gsub(/#{one_or_more_percents}|#{one_or_more_commas}/,"")
--output:--
232000000
But when your regexes consist of single characters, just use a character class: [%,]+
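Since only single characters are being removed, String#delete can also do the job without any regex. This is an alternative to gsub and is not used in the answers above. A sketch:

```ruby
b = '%232,000,000%'

# String#delete removes every occurrence of each character in the set "%,".
puts b.delete('%,')        # prints 232000000

# delete returns a new string; delete! modifies b in place instead.
p b                        # prints "%232,000,000%"
b.delete!('%,')
p b                        # prints "232000000"
```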

Looking to clean up a small ruby script

I'm looking for a much more idiomatic way to write the following little ruby script.
File.open("channels.xml").each do |line|
if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
puts line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
end
Thanks in advance for any suggestions.
The original code can be changed into this:
m = nil
open("channels.xml").each do |line|
puts m if m = line.match(%r|(mms://{1}[\w\./-]+)|)
end
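File.open can be changed to just open (Kernel#open). Be aware that Kernel#open runs a command when the name starts with |, so keep File.open for file names you don't control.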
if XYZ
puts XYZ
end
can be changed to puts x if x = XYZ, as long as x has already been assigned somewhere earlier in the current scope. Otherwise, the x in puts x is parsed as a method call and raises NameError.
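The reason for the earlier assignment is that Ruby decides whether a bare name is a local variable while parsing, left to right. In puts x if x = ..., the x in puts x is read before the assignment. The two methods below are hypothetical and only show the difference:

```ruby
# Without an earlier assignment, the x in `puts x` is parsed as a method
# call, because the parser reads it before it reaches `x = ...`.
def without_prior_assignment
  puts x if x = "mms://a.example/x".match(%r{mms://\S+})
  :printed
rescue NameError => e
  e.name
end

# With `x = nil` first, x is already a local variable when `puts x` is parsed.
def with_prior_assignment
  x = nil
  puts x if x = "mms://a.example/x".match(%r{mms://\S+})  # prints mms://a.example/x
  :printed
end

p without_prior_assignment   # prints :x
p with_prior_assignment      # prints :printed (after the URL line)
```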
The Regexp '(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)' can be refactored a little bit. With %r notation (%r followed by a pair of delimiters, such as %r(...) or, as in the example above, %r|...|), you can create regular expressions without escaping every forward slash.
This character class [a-zA-Z\.\d\/\w-] (read: A to Z in either case, the period character, 0 to 9, a forward slash, any word character, or a dash) is a little redundant. \w denotes "word characters", i.e. A-Za-z0-9 and underscore. Since you specify \w as a positive match, A-Za-z and \d are redundant.
Using those two cleanups, the Regexp can be changed into this: %r|(mms://{1}[\w\./-]+)|. The {1} ("exactly one of the preceding /") is a no-op and could be dropped as well.
If you'd like to avoid the weird m = nil scoping sorcery, this will also work, but is less idiomatic:
open("channels.xml").each do |line|
m = line.match(%r|(mms://{1}[\w\./-]+)|) and puts m
end
or the longer, but more readable version:
open("channels.xml").each do |line|
if m = line.match(%r|(mms://{1}[\w\./-]+)|)
puts m
end
end
One very easy-to-read approach is to store the result of the match and print it only if there was a match:
File.open("channels.xml").each do |line|
m = line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
puts m if m
end
If you want to start getting clever (at the cost of less-readable code), use $&, the global variable that holds the text matched by the last successful match:
File.open("channels.xml").each do |line|
puts $& if line.match('(mms:\/\/{1}[a-zA-Z\.\d\/\w-]+)')
end
Personally, I would probably just use the POSIX grep command. But there is Enumerable#grep in Ruby, too:
puts File.readlines('channels.xml').grep(%r|mms://{1}[\w\./-]+|)
Alternatively, you could use some of Ruby's file and line processing magic that it inherited from Perl. If you pass the -p flag to the Ruby interpreter, it will assume that the script you pass in is wrapped with while gets; ...; end and at the end of each loop it will print the current line. You can then use the $_ special variable to access the current line and use the next keyword to skip iteration of the loop if you don't want the line printed:
ruby -pe 'next unless $_ =~ %r|mms://{1}[\w\./-]+|' channels.xml
Basically,
ruby -pe 'next unless $_ =~ /re/' file
is roughly equivalent to
grep -E re file
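Note that Enumerable#grep (like the grep command) returns whole matching lines, whereas the original script printed only the matched URL. String#scan extracts just the matches, even several per line. In the sketch below, a made-up heredoc stands in for channels.xml. With the real file you would iterate with File.foreach("channels.xml") and call line.scan on each line:

```ruby
# Hypothetical stand-in for the contents of channels.xml.
xml = <<~XML
  <channel url="mms://example.com/live-1"/>
  <channel url="http://example.com/other"/>
  <channel a="mms://a.example/x" b="mms://b.example/y"/>
XML

MMS = %r{mms://[\w./-]+}

# grep keeps whole matching lines...
p xml.lines.grep(MMS).size   # prints 2
# ...while scan extracts every matched URL.
p xml.scan(MMS)
# prints ["mms://example.com/live-1", "mms://a.example/x", "mms://b.example/y"]
```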
