Pulling Ruby version from regular expression - ruby

I am trying to return the version of Ruby (such as 2.1.0) from a regular expression. Here's the string as it should be evaluated by a regular expression:
ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n
How would I go about extracting 2.1.0 from this? It seems to me that the best way to do this would be to extract the numbers around two periods, but no spaces or characters around them. So basically, it would pull just 2.1.0 instead of anything else.
Any ideas?

How about:
str = "ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n"
str[/[\d.]+/] # => "2.1.0"
[\d.]+ means "find a string of characters that are digits or '.'".
str[/[\d.]+/] will find the first such string that matches and return it. See String#[] for more information.
The question is, do all versions and Ruby interpreters return their version information consistently? If your code could end up running on something besides the stock Ruby you might have a problem if the -v output changes in a way that puts the version farther into the string and something else matches first.
TinMan, I think you need a more rigorous regex; e.g., "1..0"[/[\d.]+/] => "1..0", "2.0.0.1."[/[\d.]+/] => "2.0.0.1.", "2.0.0.0.1"[/[\d.]+/] => "2.0.0.0.1"
Ruby uses a similar style to Semantic Versioning, so the actual format of the string shouldn't vary, allowing a simple pattern. Where the version number occurs might not be defined though.
If the format did go crazy, something like /[\d.]{3,5}/ should herd things back into some semblance of order and normalize the returned value:
[
'foo 1.0 bar',
'foo 1.1.1 bar',
'foo 1.1.1.1 bar'
].map{ |s| s[/[\d.]{3,5}/] }
# => ["1.0", "1.1.1", "1.1.1"]
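A stricter pattern, along the lines the comment above asks for, might look like the following. This is only a sketch: STRICT_VERSION and extract_version are illustrative names, and the rule "two or three dot-separated numeric parts, not touching another digit or dot" is an assumption about what counts as a valid version.

```ruby
# Hypothetical stricter pattern: two or three dot-separated numeric parts,
# neither preceded nor followed by another digit or dot.
STRICT_VERSION = /(?<![\d.])\d+(?:\.\d+){1,2}(?![\d.])/

# extract_version is an illustrative helper name, not a built-in.
def extract_version(text)
  text[STRICT_VERSION]
end

str = "ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n"
p extract_version(str)         # => "2.1.0"
p extract_version("1..0")      # => nil
p extract_version("2.0.0.1.")  # => nil
```

The lookarounds are what reject the malformed inputs; without them the pattern would happily match a fragment such as "0.0.1" inside a longer run.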
If you're trying to do this with running code, why not use the predefined constant RUBY_VERSION:
RUBY_VERSION # => "2.1.2"
Version numbers are notoriously difficult to grab, because there are so many different ways that people use to format them. Over the last several years we've seen some attempts to create some order and commonality.

Edit: I misread the question. I assumed the given string might be embedded in other text, but on re-reading I see that evidently is not the case. The regex given by @theTinMan is sufficient and preferred. tidE
This is one way:
str = "ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n"
str[/[Rr]uby\s+(\d\.\d\.\d)/,1]
#=> "2.1.0"
This could instead be written:
str[/[Rr]uby\s+(\d(\.\d){2})/,1]
If matching "Ruby 2.1" or "Ruby 2" were desired, one could use
str[/[Rr]uby\s+(\d(\.\d){,2})/,1] # match "2", "2.1" or "2.1.1"
or
str[/[Rr]uby\s+(\d(\.\d){1,2})/,1] # "2.1" or "2.1.1", but not "2"
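Note that \d matches a single digit, so the patterns above would miss a component such as "10". A variant using \d+ is sketched below; VERSION_AFTER_RUBY and version_after_ruby are illustrative names, and the sample inputs are made-up strings rather than real ruby -v output.

```ruby
# Variant of the patterns above using \d+ so multi-digit components match.
VERSION_AFTER_RUBY = /[Rr]uby\s+(\d+(?:\.\d+){0,2})/

# Illustrative helper: returns capture group 1, or nil when nothing matches.
def version_after_ruby(text)
  text[VERSION_AFTER_RUBY, 1]
end

samples = ["Ruby 2", "ruby 2.1", "ruby 2.1.0p0", "ruby 2.10.3p1"]
p samples.map { |s| version_after_ruby(s) }
# => ["2", "2.1", "2.1.0", "2.10.3"]
```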

Just Inspect RUBY_VERSION
Rather than parsing the output of whatever you're trying to parse, just inspect the RUBY_VERSION constant. On any recent Ruby, you should get output similar to the following in a REPL like irb or pry:
RUBY_VERSION
# => "2.1.0"
Or ask Ruby on the command line with:
$ ruby -e 'puts RUBY_VERSION'
2.1.0
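If the goal is to compare the running version against a minimum, one possible approach is Gem::Version from RubyGems, which compares component by component rather than as plain strings. This is a sketch under assumptions: MINIMUM_RUBY and ruby_at_least? are names invented for illustration, and the threshold is arbitrary.

```ruby
require "rubygems" # normally loaded already; harmless to require again

# MINIMUM_RUBY is a hypothetical threshold chosen for illustration.
MINIMUM_RUBY = Gem::Version.new("2.1.0")

# Illustrative helper: true when `current` is at least `minimum`.
def ruby_at_least?(minimum, current = RUBY_VERSION)
  Gem::Version.new(current) >= minimum
end

puts "Running Ruby #{RUBY_VERSION}"
puts "Meets minimum? #{ruby_at_least?(MINIMUM_RUBY)}"

# Plain string comparison gets multi-digit components wrong:
p "2.10.0" < "2.9.0"                                    # => true
p ruby_at_least?(Gem::Version.new("2.9.0"), "2.10.0")   # => true
```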

Try this:
str = "ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n"
pieces = str.split(" ", 3)
version, patch_num = pieces[1].split('p')
puts version
--output:--
2.1.0
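The same version/patch split can also be sketched with named captures instead of split. RUBY_V_LINE and parse_ruby_v are illustrative names, and the pattern assumes the line starts with "ruby" followed by a three-part version and an optional "p" suffix.

```ruby
# Illustrative pattern; the optional "p<digits>" suffix is the patchlevel.
RUBY_V_LINE = /\Aruby\s+(?<version>\d+(?:\.\d+){2})(?:p(?<patch>\d+))?/

# parse_ruby_v is a hypothetical helper name. Returns nil when the line
# does not look like `ruby -v` output.
def parse_ruby_v(line)
  m = RUBY_V_LINE.match(line)
  m && { :version => m[:version], :patch => m[:patch] }
end

info = parse_ruby_v("ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-darwin12.0]\n")
puts info[:version]  # 2.1.0
puts info[:patch]    # 0
```

When the "p" suffix is absent, as in older output such as "ruby 1.8.7 (2008-08-11 patchlevel 72)", the :patch entry is nil.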

Related

Escaping in %q notation won't work in irb

Here is a sample code called test.rb:
s = %Q_abc\_def\_ghi_
puts s
s = %q_abc\_def\_ghi_
puts s
It works fine as expected:
➜ Desktop ruby test.rb
abc_def_ghi
abc_def_ghi
However, when I run it in irb, nothing happened after s = %q_abc\_def\_ghi_:
➜ Desktop irb
irb(main):001:0> s = %Q_abc\_def\_ghi_
=> "abc_def_ghi"
irb(main):002:0> puts s
abc_def_ghi
=> nil
irb(main):003:0>
irb(main):004:0* s = %q_abc\_def\_ghi_
irb(main):005:1> puts s
irb(main):006:1>
irb(main):007:1*
irb(main):008:1*
Why won't it work? And how can I escape '_' (or other delimiters) in %q notation?
My Ruby version is:
ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]
IRB has its own Ruby lexer/parser, which it uses to keep track of the state of the code entered so that it can do things like display different prompts depending on whether you are in the middle of a string or defining a method or class. The code is then passed to Ruby to be evaluated “properly”.
It looks like this lexer has a bug in how it handles escaping in single-quoted-style strings that aren’t actually using single quotes.
Ruby itself handles the escaping just fine, so normally I don’t think this bug would have much effect, but in your example you happen to have used the string def right after the second _, and def is a keyword that IRB also looks for.
This combination appears to put IRB into a strange state where its understanding of what is going on differs from what’s actually happening. This is the odd behaviour you are seeing.
A little playing around with a checked out version of the IRB code seems to support this. The snippet I think is to blame looks like this:
elsif ch == '\\' and @ltype == "'" #'
case ch = getc
when "\\", "\n", "'"
else
ungetc
end
Changing the when line to also look for the actual character being used:
when "\\", "\n", "'", quoted
(quoted is a parameter passed to the function) appears to fix it, and your examples all work fine with this modified version. I don’t know whether that is a sufficient fix, though; I don’t know the code, and this is just a quick hack.
It might be worth opening a bug about this.
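I'm not sure why this behaves differently in your Ruby file and in IRB, but lowercase percent strings (%q) behave like single-quoted strings: they do not process escape sequences such as \n, and essentially only a backslash before the delimiter or before another backslash is treated specially. See Difference between '%{}', '%Q{}', '%q{}' in ruby string delimiters
Since %q supports so little escaping, tools other than the Ruby parser itself may behave inconsistently when you combine unusual delimiters with escape characters.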
This probably isn't the answer you were looking for but I think it should help a bit.
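For reference, the difference between the two notations can be checked outside IRB with a short script. This is a sketch only; the constant names are invented for illustration, and the first two lines reuse the strings from the question.

```ruby
# Both forms accept a backslash-escaped delimiter, as test.rb in the
# question shows.
SINGLE_STYLE = %q_abc\_def\_ghi_
DOUBLE_STYLE = %Q_abc\_def\_ghi_

p SINGLE_STYLE                  # => "abc_def_ghi"
p SINGLE_STYLE == DOUBLE_STYLE  # => true

# They differ on other escapes: %q keeps \n as two characters,
# while %Q turns it into a single newline.
RAW_LENGTH     = %q(a\nb).length
COOKED_LENGTH  = %Q(a\nb).length
p RAW_LENGTH     # => 4
p COOKED_LENGTH  # => 3
```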

ruby incorrect method behavior (possible depending charset)

I got weird behavior from ruby (in irb):
irb(main):002:0> pp "    LS 600"
"\302\240\302\240\302\240\302\240LS 600"
irb(main):003:0> pp "    LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"
That means (for those who don't understand) that the strip method does not affect this string at all; the same goes for gsub('/\s+/', '')
How can I strip that string (I got it while parsing an Internet page)?
The string "\302\240" is the UTF-8 encoding (bytes C2 A0) of Unicode code point U+00A0, which represents a non-breaking space character. There are many other Unicode space characters. Unfortunately the String#strip method removes none of these.
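To confirm what the string actually contains, its code points and bytes can be inspected directly. The sketch below assumes Ruby 1.9 or later; NBSP, SAMPLE, and leading_codepoints are illustrative names, and SAMPLE is built to resemble the string in the question.

```ruby
NBSP   = "\u00A0"              # non-breaking space
SAMPLE = "#{NBSP * 4}LS 600"   # stand-in for the scraped string

# Illustrative helper: format the first few code points as U+XXXX.
def leading_codepoints(text, count = 4)
  text.codepoints.first(count).map { |cp| format("U+%04X", cp) }
end

p leading_codepoints(SAMPLE)         # => ["U+00A0", "U+00A0", "U+00A0", "U+00A0"]
p NBSP.bytes.map { |b| b.to_s(8) }   # => ["302", "240"]
p SAMPLE.strip == SAMPLE             # => true (strip leaves U+00A0 alone)
p SAMPLE =~ /\A[[:space:]]+/         # => 0 (the POSIX bracket class does match it)
```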
If you use Ruby 1.9.2 or later, then you can solve this in the following way:
# Ruby 1.9.2 or later.
# Remove any whitespace-like characters from beginning/end.
# \A and \z anchor to the whole string; ^ and $ would match at every line.
"\302\240\302\240LS 600".gsub(/\A\p{Space}+|\p{Space}+\z/, "")
In Ruby 1.8.7 support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of getting a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:
# Ruby 1.8.7/1.9.2.
$KCODE = "u"
require "rubygems"
require "active_support/core_ext/string/multibyte"
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s

Ruby: hexadecimal in regular expressions

I need to match an md5 checksum in a regular expression in a Ruby (actually Rails) program. I found out somewhere that I can match hexadecimal strings with \h sequence, but I can't find the link anymore.
I'm using that sequence and my code works in Ruby 1.9.2. I can make it work even under plain IRB (so it's not a Rails extension).
ruby-1.9.2-p180 :007 > "123abcdf" =~ /^\h+$/; $~
=> #<MatchData "123abcdf">
ruby-1.9.2-p180 :008 > "123abcdfg" =~ /^\h+$/; $~
=> nil
However, my IDE marks that expression as wrong, and I can't find any reference that cites that sequence.
Is the \h sequence legal in Ruby regexes under any environment/version, or should I trust my IDE and replace it with something like [abcdef\d]?
Yes it is. Check the official doc for the complete documentation for regex in Ruby.
Note that \h will match uppercase letters too, so it's actually equivalent to [a-fA-F\d].
According to this, \h is part of Oniguruma, which I believe is standard in Ruby 1.9.
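For the MD5 case in the question, a minimal sketch might look like this. It assumes Ruby 1.9 or later (where \h is available) and that the checksum is the usual 32-character hex digest; MD5_HEX and md5_hex? are illustrative names.

```ruby
require "digest/md5"

# Exactly 32 hex digits, anchored to the whole string.
MD5_HEX = /\A\h{32}\z/

# Illustrative helper: true when `text` looks like an MD5 hex digest.
def md5_hex?(text)
  !(text =~ MD5_HEX).nil?
end

p md5_hex?(Digest::MD5.hexdigest("hello"))  # => true
p md5_hex?("123abcdfg")                     # => false
```

Using \A and \z rather than ^ and $ keeps a multi-line string with one valid line from passing the check.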

Have a ruby script output what version of ruby is running it

How do I have my ruby script output what version of ruby is running it?
The RUBY_VERSION constant contains the version number of the ruby interpreter and RUBY_PATCHLEVEL contains the patchlevel, so this:
puts RUBY_VERSION
outputs e.g. 2.2.3, while this:
puts RUBY_PATCHLEVEL
outputs e.g. 173. Together it can be used like this:
ruby -e 'print "ruby #{ RUBY_VERSION }p#{ RUBY_PATCHLEVEL }"'
to output e.g. ruby 2.2.3p173
For reference, here's how variables and constants work, along with a list of Ruby's built-in variables and constants: Ruby Programming/Syntax/Variables and Constants and Pre-defined Variables.
You can get a list of all the global constants here, including RUBY_VERSION and friends, in the official Ruby language documentation.
For the bonus round, this will tell you some more useful info about your Ruby environment using RbConfig:
require 'rbconfig'
puts RbConfig::CONFIG.sort_by{ |n,v| n.downcase }.map{ |n,v| "#{n} => '#{v}'" }
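Putting the constants together inside a script could look like the sketch below. ruby_summary is an illustrative helper name; RUBY_RELEASE_DATE, RUBY_PLATFORM, and RUBY_DESCRIPTION are further predefined constants, and the exact text they produce depends on the interpreter.

```ruby
require "rbconfig"

# Illustrative helper: assemble a `ruby -v`-like line from the constants.
def ruby_summary
  "ruby #{RUBY_VERSION}p#{RUBY_PATCHLEVEL} (#{RUBY_RELEASE_DATE}) [#{RUBY_PLATFORM}]"
end

puts ruby_summary
puts RUBY_DESCRIPTION               # the interpreter's own description string
puts RbConfig::CONFIG["host_os"]    # one entry from the RbConfig table
```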

Why does "?b" mean 'b' in Ruby?

"foo"[0] = ?b # "boo"
I was looking at the above example and trying to figure out:
How "?b" implies the character 'b'?
Why is it necessary? - Couldn't I just write this:
"foo"[0] = 'b' # "boo"
Ed Swangren: ? returns the character code of a character.
Not in Ruby 1.9. As of 1.9, ?a returns 'a'. See here: Son of 10 things to be aware of in Ruby 1.9!
telemachus ~ $ ~/bin/ruby -v
ruby 1.9.1p0 (2009-01-30 revision 21907) [i686-linux]
telemachus ~ $ ~/bin/ruby -e 'char = ?a; puts char'
a
telemachus ~ $ /usr/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
telemachus ~ $ /usr/bin/ruby -e 'char = ?a; puts char'
97
Edit: A very full description of changes in Ruby 1.9.
Another edit: note that you can now use 'a'.ord if you want the string to number conversion you get in 1.8 via ?a.
The change is related to Ruby 1.9's UTF-8 updates.
The Ruby 1.8 version of ? only worked with single-byte characters. In 1.9, they updated everything to work with multi-byte characters. The trouble is, it's not clear what integer ?€ should return.
They solved it by changing what it returns. In 1.9, all of the following are single-element strings and are equivalent:
?€
'€'
"€"
"\u20AC"
?\u20AC
They should have dropped the notation, IMO, rather than (somewhat randomly) changing the behavior. It's not even officially deprecated, though.
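The 1.9+ behaviour described above can be checked with a few lines. This sketch assumes Ruby 1.9 or later; the constant names are illustrative.

```ruby
LETTER = ?a         # a one-character String on Ruby 1.9+
EURO   = ?\u20AC    # character literal written with a Unicode escape

p LETTER == "a"      # => true
p LETTER.ord         # => 97 (what ?a itself returned on 1.8)
p 97.chr             # => "a"
p EURO == "\u20AC"   # => true
p EURO.length        # => 1
```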
? returns the character code of a character. Here is a relevant post on this.
In some languages (Pascal, Python), chars don't exist: they're just length-1 strings.
In other languages (C, Lisp), chars exist and have distinct syntax, like 'x' or #\x.
Ruby has mostly been on the side of "chars don't exist", but at times has seemed to not be entirely sure of this choice. If you do want chars as a data type, Ruby already assigns meaning to '' and "", so ?x seems about as reasonable as any other option for char literals.
To me, it's simply a matter of saying what you mean. You could just as well say foo[0]=98, but you're using an integer when you really mean a character. Using a string when you mean a character looks equally strange to me: the set of operations they support is almost completely different. One is a sequence of the other. You wouldn't make Math.sqrt take a list of numbers, and just happen to only look at the first one. You wouldn't omit "integer" from a language just because you already support "list of integer".
(Actually, Lisp 1.0 did just that -- Church numerals for everything! -- but performance was abysmal, so this was one of the huge advances of Lisp 1.5 that made it usable as a real language, back in 1962.)
