When using the $1-$9 regular expression form, why does calling gsub(...) on one nil the others? - ruby

First, a working example:
string = "foo-bar-25-baz"
if string =~ /(.+)-(10|25)(?:-(baz))?/
puts $1
puts $2
puts $3
end
This produces 'foo-bar', '25' and 'baz' on three lines, as expected. But if we do this:
string = "foo-bar-25-baz"
if string =~ /(.+)-(10|25)(?:-(baz))?/
puts $1.gsub('-', ' ') # Here be the problem
puts $2 # nil
puts $3 # nil
end
the values of $2 and $3 are now nil. I have to puts $2 and puts $3 and then $1.gsub(...), and it will work. As far as I can tell this only applies to gsub and gsub!
This causes the same problem:
string = "foo-bar-25-baz"
if string =~ /(.+)-(10|25)(?:-(baz))?/
puts $3.gsub('hard', 'h')
puts $1 # nil
puts $2 # nil
end
I spent about 15 minutes debugging this and I'm wondering why.

gsub is most likely re-assigning those variables (as would pretty much any other function that uses the regexp engine). If you need to call gsub before using all your original match results, store them to a local variable first with something like match_results = [$1, $2, $3].

Related

How can I get a Ruby variable from a line read from a file?

I need to reuse a textfile that is filled with one-liners such:
export NODE_CODE="mio12"
How can I do that in my Ruby program the var is created and assign as it is in the text file?
If the file were a Ruby file, you could require it and be able to access the variables after that:
# variables.rb
VAR1 = "variable 1"
VAR2 = 2
# ruby.rb
require "variables"
puts VAR1
If you're not so lucky, you could read the file and then loop through the lines, looking for lines that match your criteria (Rubular is great here) and making use of Ruby's instance_variable_set method. The gsub is to deal with extra quotes when the matcher grabs a variable set as a string.
# variables.txt
export VAR1="variable 1"
export VAR2=2
# ruby.rb
variable_line = Regexp.new('export\s(\w*)=(.*)')
File.readlines("variables.txt").each do |line|
if match = variable_line.match(line)
instance_variable_set("##{match[1].downcase}", match[2].gsub("\"", ""))
end
end
puts #var1
puts #var2
Creating a hash from this file can be a fairly simple thing.
For var.txt:
export BLAH=42
export WOOBLE=67
File.readlines("var.txt").each_with_object({}) { |line, h|
h[$1] = $2 if line =~ /^ export \s+ (.+?) \s* \= \s* (.+) $/x
}
# => {"BLAH"=>"42", "WOOBLE"=>"67"}

Evaluate a string with indexed array as values

I would like to take a string that contains positional argument markers (not named), supply it with an array (not hash) of values, and have it evaluated.
The use case as an example would be somewhat like ARGV.
For example,
# given:
string = "echo $1 ; echo $#"
values = ["hello", "world"]
# expected result:
"echo hello ; echo hello world"
The below function is the best I could come up with:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
# Create a variable that can replace $# with all arguments, and quote
# arguments that had "more than one word" originally
all_arguments = arguments.map{|a| a =~ /\s/ ? "\"#{a}\"" : a}.join ' '
# Replace all $1 - $9 with their respective argument ($1 ==> arguments[0])
string.gsub!(/\$(\d)/) { arguments[$1.to_i - 1] }
# Replace $# or $* with all arguments
string.gsub!(/\$[*|#]/, all_arguments)
return string
end
And it seems to me like it can and should be simpler.
I was hoping to find something that is closer to the Kernel.sprintf method of doing things - like "string with %{marker}" % {marker: 'value'}
So, although this issue is almost solved for me (I think), I would love to know if there is something I missed that can make it more elegant.
It seems like you're trying to reproduce Bash-style variable expansion, which is an extremely complex problem. At the very least, though, you can simplify your code in two ways:
Use Kernel.sprintf's built in positional argument feature. The below code does this by substituting e.g. $1 with the sprintf equivalent %1$s.
Use Shellwords from the standard library to escape arguments with spaces etc.
require 'shellwords'
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
tmpl = string.gsub(/\$(\d+)/, '%\1$s')
(tmpl % arguments).gsub(/\$[*#]/, arguments.shelljoin)
end
string = "echo $1 ; echo $#"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
If you didn't have the $* requirement I'd suggest just dropping the Bash-like format and just using sprintf, since it covers everything else you mentioned. Even so, you could further simplify things by using sprintf formatting for everything else:
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string.gsub('%#', arguments.shelljoin) % arguments
end
string = "echo %1$s ; echo %#"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
Edit
If you want to use %{1} with sprintf you could turn the input array into a hash where the integer indexes are turned into symbol keys, e.g. ["hello", "world"] becomes { :"1" => "hello", :"2" => "world" }:
require "shellwords"
def evaluate_args(string, arguments)
return string unless arguments.is_a? Array and !arguments.empty?
string % {
:* => arguments.shelljoin,
**arguments.map.with_index {|val,idx| [ :"#{idx + 1}", val ] }.to_h
}
end
string = "echo %{1} ; echo %{*}"
values = ["hello", "world"]
puts evaluate_args(string, values)
# => echo hello ; echo hello world
string = "echo $1 ; echo $# ; echo $2 ; echo $cat"
values = ["hello", "World War II"]
vals = values.map { |s| s.include?(' ') ? "\"#{s}\"" : s }
#=> ["hello", "\"World War II\""]
all_vals = vals.join(' ')
#=> "hello \"World War II\""
string.gsub(/\$\d+|\$[#*]/) { |s| s[/\$\d/] ? vals[s[1..-1].to_i-1] : all_vals }
#=> "echo hello ; echo hello \"World War II\" ; echo \"World War II\" ; echo $cat" $cat"

Printing $0 in different ways

Among the following ways to print $0, the first one works, yet the second one doesn't. Why?
Directly
puts "current $0 is #{$0}"
Constructing $0 variable name (motivated by javascript)
1.times {|i| puts "current $#{i} is #{$i}"}
Because the second one is looking for a variable called "$i" not "$0"
If you want to build the variable name dynamically you'd need to do something like ...
1.times {|i| puts "current $#{i} is #{eval '$'+i.to_s}"}

Variables inside Regexp

Lets say I have the code:
str = "foobar"
print "Enter in the letters you would like to match: "
match = gets
# Pseudocode:
str =~ /[match]/
I don't want to match the whole string: match, I just want to match each of the letters, like:
str =~ /[aeiou]/
would yield the vowels.
How do I make it so I can match the letters the user inputs?
Try this:
match = gets.chomp # cut off that trailing \n
str =~ /[#{match}]/

$1 and \1 in Ruby

When using regular expressions in Ruby, what is the difference between $1 and \1?
\1 is a backreference which will only work in the same sub or gsub method call, e.g.:
"foobar".sub(/foo(.*)/, '\1\1') # => "barbar"
$1 is a global variable which can be used in later code:
if "foobar" =~ /foo(.*)/ then
puts "The matching word was #{$1}"
end
Output:
"The matching word was bar"
# => nil
Keep in mind there's a third option, the block form of sub. Sometimes you need it. Say you want to replace some text with the reverse of that text. You can't use $1 because it's not bound quickly enough:
"foobar".sub(/(.*)/, $1.reverse) # WRONG: either uses a PREVIOUS value of $1,
# or gives an error if $1 is unbound
You also can't use \1, because the sub method just does a simple text-substitution of \1 with the appropriate captured text, there's no magic taking place here:
"foobar".sub(/(.*)/, '\1'.reverse) # WRONG: returns '1\'
So if you want to do anything fancy, you should use the block form of sub ($1, $2, $`, $' etc. will be available):
"foobar".sub(/.*/){|m| m.reverse} # => returns 'raboof'
"foobar".sub(/(...)(...)/){$1.reverse + $2.reverse} # => returns 'oofrab'

Resources