I ran the following from a bash shell:
echo 'hello world' | ruby -ne 'puts $_ if /hello/'
I thought it was a typo at first, but it outputted hello world surprisingly.
I meant to type:
echo 'hello world' | ruby -ne 'puts $_ if /hello/ === $_'
Can anyone give an explanation, or point to documentation, to why we get this implicit comparison to $_?
I'd also like to note:
echo 'hello world' | ruby -ne 'puts $_ if /test/'
Won't output anything.
The Ruby parser has a special case for regular expression literals in conditionals. Normally (i.e. without using the e, n or p command line options) this code:
if /foo/
puts "TRUE!"
end
produces:
$ ruby regex-in-conditional1.rb
regex-in-conditional1.rb:1: warning: regex literal in condition
Assigning something that matches the regex to $_ first, like this:
$_ = 'foo'
if /foo/
puts "TRUE!"
end
produces:
$ ruby regex-in-conditional2.rb
regex-in-conditional2.rb:2: warning: regex literal in condition
TRUE!
This is a (poorly documented) exception to the normal rules for Ruby conditionals, where anything that’s not false or nil evaluates as truthy.
This only applies to regex literals, the following behaves as you might expect for a conditional:
regex = /foo/
if regex
puts "TRUE!"
end
output:
$ ruby regex-in-conditional3.rb
TRUE!
This is handled in the parser. Searching the MRI code for the text of the warning produces a single match in parse.y:
case NODE_DREGX:
case NODE_DREGX_ONCE:
warning_unless_e_option(parser, node, "regex literal in condition");
return NEW_MATCH2(node, NEW_GVAR(rb_intern("$_")));
I don’t know Bison, so I can’t explain exactly what is going on here, but there are some clues you can deduce. The warning_unless_e_option function simply suppresses the warning if the -e option has been set, as this feature is discouraged in normal code but can be useful in expressions from the command line (this explains why you don’t see the warning in your code). The next line seems to be constructing a parse subtree which is a regular expression match between the regex and the $_ global variable, which contains “[t]he last input line of string by gets or readline”. These nodes will then be compiled into the usually regular expression method call.
That shows what is happening, I’ll just finish with a quote from the Kernel#gets documentation which may explain why this is such an obscure feature
The style of programming using $_ as an implicit parameter is gradually losing favor in the Ruby community.
After digging through the Ruby source (MRI), I think I found an explanation.
The code:
pp RubyVM::InstructionSequence.compile('puts "hello world" if /hello/').to_a
produces the following output:
...
[:trace, 1],
[:putobject, /hello/],
[:getspecial, 0, 0],
[:opt_regexpmatch2],
...
The instructions seem to be calling opt_regexpmatch2 with two arguments, the first argument being the regex /hello/ and the second being a return value from getspecial
getspecial can be found in insns.def
/**
#c variable
#e Get value of special local variable ($~, $_, ..).
#j 特殊なローカル変数($~, $_, ...)の値を得る。
*/
DEFINE_INSN
getspecial
(rb_num_t key, rb_num_t type)
()
(VALUE val)
{
val = vm_getspecial(th, GET_LEP(), key, type);
}
Note that our instructions are most likely telling the VM to bring back the value of $_. $_ is automatically set for us when we run ruby with the correct options, e.g., -n
Now that we have our two arguments, we call opt_regexpmatch2
/**
#c optimize
#e optimized regexp match 2
#j 最適化された正規表現マッチ 2
*/
DEFINE_INSN
opt_regexpmatch2
(CALL_INFO ci)
(VALUE obj2, VALUE obj1)
(VALUE val)
{
if (CLASS_OF(obj2) == rb_cString &&
BASIC_OP_UNREDEFINED_P(BOP_MATCH, STRING_REDEFINED_OP_FLAG)) {
val = rb_reg_match(obj1, obj2);
}
else {
PUSH(obj2);
PUSH(obj1);
CALL_SIMPLE_METHOD(obj2);
}
}
At the end of the day
if /hello/' is equivalent to if $_ =~ /hello/ -- $_ will be nil unless we run ruby with the correct options.
Related
I'm writing a simple method to detect and strip tags from text strings. Given this input string:
{{foobar}}
The function has to return
foobar
I thought I could just chain multiple chomp! methods, like so:
"{{foobar}}".chomp!("{{").chomp!("}}")
but this won't work, because the first chomp! returns a NilClass. I can do it with regular chomp statements, but I'm really looking for a one-line solution.
The String class documentation says that chomp! returns a Str if modifications have been made - therefore, the second chomp! should work. It doesn't, however. I'm at a loss at what's happening here.
For the purposes of this question, you can assume that the input string is always a tag which begins and ends with double curly braces.
You can definitely chain multiple chomp statements (the non-bang version), still having a one-line solution as you wanted:
"{{foobar}}".chomp("{{").chomp("}}")
However, it will not work as expected because both chomp! and chomp removes the separator only from the end of the string, not from the beginning.
You can use sub
"{{foobar}}".sub(/{{(.+)}}/, '\1')
# => "foobar"
"alfa {{foobar}} beta".sub(/{{(.+)}}/, '\1')
# => "alfa foobar beta"
# more restrictive
"{{foobar}}".sub(/^{{(.+)}}$/, '\1')
# => "foobar"
Testing this out, it's clear that chomp! will return nil if the separator it's provided as an argument is not present at the end of the string.
So "{{text}}".chomp!("}}") returns a string, but "{{text}}".chomp!("{{") reurns nil.
See here for an answer of how to chomp at the beginning of a string. But recognize that chomp only looks at the end of the string. So you can call str.reverse.chomp!("{{").reverse to remove the opening brackets.
You could also use a regex:
string = "{{text}}"
puts [/^\{\{(.+)\}\}$/, 1]
# => "text"
Try tr:
'{{foobar}}'.tr('{{', '').tr('}}', '')
You can also use gsub or sub but if the replacement is not needed as pattern, then tr should be faster.
If there are always curly braces, then you can just slice the string:
'{{foobar}}'[2...-2]
If you plan to make a method which returns the string without curly braces then DO NOT use bang versions. Modifying the input parameter of a method will be suprising!
def strip(string)
string.tr!('{{', '').tr!('}}', '')
end
a = '{{foobar}}'
b = strip(a)
puts b #=> foobar
puts a #=> foobar
I am trying to use gsub or sub on a regex passed through terminal to ARGV[].
Query in terminal: $ruby script.rb input.json "\[\{\"src\"\:\"
Input file first 2 lines:
[{
"src":"http://something.com",
"label":"FOO.jpg","name":"FOO",
"srcName":"FOO.jpg"
}]
[{
"src":"http://something123.com",
"label":"FOO123.jpg",
"name":"FOO123",
"srcName":"FOO123.jpg"
}]
script.rb:
dir = File.dirname(ARGV[0])
output = File.new(dir + "/output_" + Time.now.strftime("%H_%M_%S") + ".json", "w")
open(ARGV[0]).each do |x|
x = x.sub(ARGV[1]),'')
output.puts(x) if !x.nil?
end
output.close
This is very basic stuff really, but I am not quite sure on how to do this. I tried:
Regexp.escape with this pattern: [{"src":".
Escaping the characters and not escaping.
Wrapping the pattern between quotes and not wrapping.
Meditate on this:
I wrote a little script containing:
puts ARGV[0].class
puts ARGV[1].class
and saved it to disk, then ran it using:
ruby ~/Desktop/tests/test.rb foo /abc/
which returned:
String
String
The documentation says:
The pattern is typically a Regexp; if given as a String, any regular expression metacharacters it contains will be interpreted literally, e.g. '\d' will match a backlash followed by ‘d’, instead of a digit.
That means that the regular expression, though it appears to be a regex, it isn't, it's a string because ARGV only can return strings because the command-line can only contain strings.
When we pass a string into sub, Ruby recognizes it's not a regular expression, so it treats it as a literal string. Here's the difference in action:
'foo'.sub('/o/', '') # => "foo"
'foo'.sub(/o/, '') # => "fo"
The first can't find "/o/" in "foo" so nothing changes. It can find /o/ though and returns the result after replacing the two "o".
Another way of looking at it is:
'foo'.match('/o/') # => nil
'foo'.match(/o/) # => #<MatchData "o">
where match finds nothing for the string but can find a hit for /o/.
And all that leads to what's happening in your code. Because sub is being passed a string, it's trying to do a literal match for the regex, and won't be able to find it. You need to change the code to:
sub(Regexp.new(ARGV[1]), '')
but that's not all that has to change. Regexp.new(...) will convert what's passed in into a regular expression, but if you're passing in '/o/' the resulting regular expression will be:
Regexp.new('/o/') # => /\/o\//
which is probably not what you want:
'foo'.match(/\/o\//) # => nil
Instead you want:
Regexp.new('o') # => /o/
'foo'.match(/o/) # => #<MatchData "o">
So, besides changing your code, you'll need to make sure that what you pass in is a valid expression, minus any leading and trailing /.
Based on this answer in the thread Convert a string to regular expression ruby, you should use
x = x.sub(/#{ARGV[1]}/,'')
I tested it with this file (test.rb):
puts "You should not see any number [0123456789].".gsub(/#{ARGV[0]}/,'')
I called the file like so:
ruby test.rb "\d+"
# => You should not see any number [].
I have the following block in a ruby script:
for line in allLines
line.match(/aPattern/) { |matchData|
# Do something with matchData
}
end
If /aPattern/ does not match anything in line, will the block still run? And if not, is there a way I can force it to run?
The answer is no, the match block will not be run if the match does not suceed. However, for is generally not used in Ruby anyways, each is more idiomatic, like:
allLines.each do |line|
if line =~ /aPattern/
do_thing_with_last_match($~) ## $~ is last match
else
do_non_match_thing_with_line
end
end
Note, =~ is a regex match operator.
I have a command line tool which generates a flat json doc. Imagine this:
$ ruby gen-json.rb --foo bar --baz qux
{
"foo": "bar"
"baz": "qux"
}
What I want is this to work:
$ ruby gen-json.rb --foo $'\0' --baz qux
{
"foo": null,
"baz": "qux"
}
Instead of null I get an empty string. To simplify the problem even further consider this:
$ cat nil-args.rb
puts "argv[0] is nil" if ARGV[0].nil?
puts "argv[0] is an empty string" if ARGV[0] == ""
puts "argv[1] is nil" if ARGV[1].nil?
I want to run it like this and get this output:
$ ruby nil-args.rb $'\0' foo
argv[0] is nil
But instead I get
argv[0] is an empty string
I suspect this is (arguably) a bug in the ruby interpreter. It is treating argv[0] as a C string which null terminates.
Command line arguments are always strings. You will need to use a sentinel to indicate arguments you want to treat otherwise.
I'm pretty sure you literally cannot do what you are proposing. It's a fundamental limitation of the shell you are using. You can only ever pass string arguments into a script.
It has already been mentioned in a comment but the output you get with the \0 method you tried makes perfect sense. The null terminator technically is an empty string.
Also consider that accessing any element of an array that has not yet been defined will always be nil.
a = [1, 2, 3]
a[10].nil?
#=> true
A possible solution, however, would be for your program to work like this:
$ ruby gen-json.rb --foo --baz qux
{
"foo": null,
"baz": "qux"
}
So when you have a double minus sign argument followed by another double minus sign argument, you infer that the first one was null. You will need to write your own command line option parser to achieve this, though.
Here is a very very simple example script that seems to work (but likely has edge cases and other problems):
require 'pp'
json = {}
ARGV.each_cons(2) do |key, value|
next unless key.start_with? '--'
json[key] = value.start_with?('--') ? nil : value
end
pp json
Would that work for your purposes? :)
Ruby treats EVERYTHING except nil and false as true. Therefore, any empty string will evaluate as true (as well as 0 or 0.0).
You can force the nil behaviour by having something like this:
ARGV.map! do |arg|
if arg.empty?
nil
else
arg
end
end
this way, any empty strings will be transformed in a reference to nil.
You're going to have to decide on a special character to mean nil. I would suggest "\0" (double-quotes ensures the literal string backslash zero is passed in). You can then use a special type converter in OptionParser:
require 'optparse'
require 'json'
values = {}
parser = OptionParser.new do |opts|
opts.on("--foo VAL",:nulldetect) { |val| values[:foo] = val }
opts.on("--bar VAL",:nulldetect) { |val| values[:foo] = val }
opts.accept(:nulldetect) do |string|
if string == '\0'
nil
else
string
end
end
end
parser.parse!
puts values.to_json
Obviously, this requires you to be explicit about the flags you accept, but if you want to accept any flag, you can certainly just hand-jam the if statements for checking for "\0"
As a side note, you can have flags be optional, but you get the empty string passed to the block, e.g.
opts.on("--foo [VAL]") do |val|
# if --foo is passed w/out args, val is empty string
end
So, you'd still need a stand-in
I am trying to see if the string s contains any of the symbols in a regex. The regex below works fine on rubular.
s = "asd#d"
s =~ /[~!##$%^&*()]+/
But in Ruby 1.9.2, it gives this error message:
syntax error, unexpected ']', expecting tCOLON2 or '[' or '.'
s = "asd#d"; s =~ /[~!##$%^&*()]/
What is wrong?
This is actually a special case of string interpolation with global and instance variables that most seem not to know about. Since string interpolation also occurs within regex in Ruby, I'll illustrate below with strings (since they provide for an easier example):
#foo = "instancefoo"
$foo = "globalfoo"
"##foo" # => "instancefoo"
"#$foo" # => "globalfoo"
Thus you need to escape the # to prevent it from being interpolated:
/[~!#\#$%^&*()]+/
The only way that I know of to create a non-interpolated regex in Ruby is from a string (note single quotes):
Regexp.new('[~!##$%^&*()]+')
I was able to replicate this behavior in 1.9.3p0. Apparently there is a problem with the '#$' combination. If you escape either it works. If you reverse them it works:
s =~ /[~!#$#%^&*()]+/
Edit: in Ruby 1.9 #$ invokes variable interpolation, even when followed by a % which is not a valid variable name.
I disagree, you need to escape the $, its the end of string character.
s =~ /[~!##\$%^&*()]/ => 3
That is correct.