Supporting Ruby 1.9's hash syntax in Ruby 1.8 - ruby

I'm writing a Ruby gem that uses the {key: 'value'} syntax for hashes throughout my code. My tests all pass in 1.9.x, but in 1.8.7 I (understandably) get "syntax error, unexpected ':', expecting ')'".
Is there a best practice for supporting 1.8.x? Do I need to rewrite the code using our old friend =>, or is there a better strategy?

I think you're out of luck: if you want to support 1.8, then you have to use =>. As usual, I will mention that you must still use => in certain cases in 1.9:
If the key is not a symbol. Remember that any object (symbols, strings, classes, floats, ...) can be a key in a Ruby Hash.
If you need a symbol that you'd quote: :'this.that'.
If you use MongoDB for pretty much anything, you'll be using things like :$set => hash, but $set: hash is a syntax error.
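The three cases above can be sketched as follows. This is only an illustration: the variable names and values are made up, and everything is written with fat arrows so the same file should parse on 1.8.7 as well as 1.9.

```ruby
# Keys that are not symbols: any object can be a Hash key.
string_key = { "name" => "value" }
class_key  = { String => "a class as a key" }
float_key  = { 1.5 => "a float as a key" }

# A symbol that has to be quoted because of the dot.
quoted_symbol = { :'this.that' => 1 }

# MongoDB-style operator key (hypothetical update document).
mongo_update = { :$set => { :name => "x" } }
```

Each of these keys has to be written with =>; none of them fits the key: value form described above.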
Back to our regularly scheduled programming.
Why do I say that you're out of luck? The Hash literal syntaxes (both of them) are hard-wired in the parser and I don't think you're going to have much luck patching the parser from your gem. Ruby 1.8.7's parse.y has this to say:
assoc : arg_value tASSOC arg_value
        {
            $$ = list_append(NEW_LIST($1), $3);
        }
      ;
and tASSOC is => so hash literals are hard-wired to use =>. 1.9.3's says this:
assoc : arg_value tASSOC arg_value
        {
        /*%%%*/
            $$ = list_append(NEW_LIST($1), $3);
        /*%
            $$ = dispatch2(assoc_new, $1, $3);
        %*/
        }
      | tLABEL arg_value
        {
        /*%%%*/
            $$ = list_append(NEW_LIST(NEW_LIT(ID2SYM($1))), $2);
        /*%
            $$ = dispatch2(assoc_new, $1, $2);
        %*/
        }
      ;
We have the fat-arrow syntax again (arg_value tASSOC arg_value) and the JavaScript style (tLABEL arg_value); AFAIK, tLABEL is also the source of the restrictions on what sorts of symbols (no :$set, no :'this.that', ...) can be used with the JavaScript-style syntax. The current trunk parse.y matches 1.9.3 for Hash literals.
So the Hash literal syntax is hard-wired into the parser and you're stuck with fat arrows if you want to support 1.8.
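As a rough illustration of what the rewrite looks like, here is a minimal sketch. The hash contents are hypothetical, and the 1.9-style literal is kept inside a string so that a 1.8 parser never has to read it.

```ruby
# Fat-arrow literal: this form parses on both 1.8.7 and 1.9+.
portable = { :key => 'value', :other => 42 }

# Only evaluate the 1.9-style literal when the interpreter can parse it.
# (Comparing RUBY_VERSION as a string is crude, but enough for a sketch.)
same = nil
if RUBY_VERSION >= "1.9"
  new_style = eval("({key: 'value', other: 42})")
  same = (new_style == portable)
end
```

On 1.9 and later both spellings build the same Hash, so switching a gem to fat arrows should change nothing except the syntax.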

Ruby 1.8.7 does not support the new hash syntax.
If you desperately need the new hash syntax on the non-YARV, C-based implementation of Ruby, there is a totally unsupported 1.8 head branch, so you can do
rvm install ruby-head --branch ruby_1_8 ; rvm ruby-head
ruby -v
ruby 1.8.8dev (2011-05-25) [i386-darwin10.7.0]
but upgrading to 1.9 is the way to go.

Related

Ruby: magic comments "frozen_string_literal: true" vs "immutable: string"

In ruby one can freeze all constant strings in a file via two different magic comments at the beginning of a file:
# frozen_string_literal: true
and
# -*- immutable: string -*-
I have no idea what the differences are.
Are there any?
The first syntax is the magic comment that freezes string literals in Ruby 2.3+. Without it, you have to call String#freeze explicitly, like this:
'hello world!'.freeze
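To make the effect of freezing concrete, here is a small sketch using the explicit freeze call. The variable names are made up for illustration, and it assumes Ruby 1.9 or later, where modifying a frozen string raises a RuntimeError (FrozenError, a subclass, from 2.5 on).

```ruby
greeting = 'hello world!'.freeze
frozen = greeting.frozen?

# Trying to modify a frozen string raises instead of changing it.
mutation_failed = false
begin
  greeting << ' again'
rescue RuntimeError
  mutation_failed = true
end

# dup returns an unfrozen copy that can be modified freely.
copy = greeting.dup
copy << ' again'
```

With the magic comment at the top of a file, string literals in that file behave as if .freeze had been called on each of them.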
The 2nd syntax is not implemented in Ruby, however it is the way that variables are specified for files in the Emacs text editor.
For example, the following comment in Emacs would declare that the file is a Ruby file and needs Ruby syntax highlighting, and that the variable immutable is set to the value string.
# -*- mode: ruby; immutable: string -*-
After searching around, it looks like that does nothing and is not used by any Ruby syntax highlighting mode.
So you do not need the 2nd syntax.
Digging for anything on the 2nd version, it looks like they had the same intention, but the 2nd magic comment syntax does not appear to have been adopted as of Ruby 2.1.0.
See https://github.com/ruby/ruby/pull/487
The first version # frozen_string_literal: true was adopted in Ruby 2.3.0
I tried the latter version in a few versions of Ruby, but it didn't work. I would guess it should not be used or trusted to work in any version >= 2.3, and probably no version supports it. In fact, I was not able to find any reference to that syntax when searching the open-source code on GitHub:
https://github.com/ruby/ruby/search?q=immutable%3A+string&unscoped_q=immutable%3A+string

Regex error in Ruby 1.8.7 but not 2.0?

In Ruby 1.8.7 the following regex produces the warning "nested repeat operator + and * was replaced with '*'":
^(\w+\.\w+)\|(\w+\.\w+)\n+*$
It does work in Ruby 2.0 though?
http://rubular.com/r/nRUSP5LNZA
A nested operator works, but it triggers a warning because it is useless. \n+* means:
Zero or more repetitions of
One or more repetitions of
\n
which is equivalent to the simpler expression \n*, which means:
Zero or more repetitions of
\n
There is no reason to use \n+*. Ruby's regex engine was replaced in Ruby 1.9 and again in Ruby 2.0; if there is any difference, it is simply that the newer engine does not check for this warning as the older one did.
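The equivalence can be checked with a quick sketch like the one below. The sample strings are made up, and the patterns are built with Regexp.new from single-quoted strings so the backslashes reach the regex engine unchanged. Depending on the Ruby version and warning level, compiling the nested form may still print the warning.

```ruby
nested     = Regexp.new('^(\w+\.\w+)\|(\w+\.\w+)\n+*$')
simplified = Regexp.new('^(\w+\.\w+)\|(\w+\.\w+)\n*$')

samples = ["a.b|c.d", "a.b|c.d\n", "a.b|c.d\n\n", "a.b|c"]

# true/false per sample, for each pattern
nested_results     = samples.map { |s| !!(s =~ nested) }
simplified_results = samples.map { |s| !!(s =~ simplified) }

patterns_agree = (nested_results == simplified_results)
```

Both patterns accept the first three samples (zero, one, or two trailing newlines) and reject the last one, which has no dot in its second field.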

ruby incorrect method behavior (possible depending charset)

I got weird behavior from ruby (in irb):
irb(main):002:0> pp "    LS 600"
"\302\240\302\240\302\240\302\240LS 600"
irb(main):003:0> pp "    LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"
That means (for those who don't understand) that the strip method does not affect this string at all; same with gsub('/\s+/', '').
How can I strip that string (I got it while parsing a web page)?
The string "\302\240" is the UTF-8 encoding (bytes C2 A0) of Unicode code point U+00A0, the non-breaking space character. There are many other Unicode space characters. Unfortunately, the String#strip method removes none of these.
If you use Ruby 1.9.2, then you can solve this in the following way:
# Ruby 1.9.2 only.
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".gsub(/^\p{Space}+|\p{Space}+$/, "")
In Ruby 1.8.7 support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of getting a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:
# Ruby 1.8.7/1.9.2.
$KCODE = "u"
require "rubygems"
require "active_support/core_ext/string/multibyte"
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s

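For Ruby 1.9 and later, the same idea can be sketched without ActiveSupport. This is a minimal sketch under a few assumptions: the string is UTF-8 (written here with \u escapes), the POSIX bracket class [[:space:]] is swapped in for \p{Space}, and \A and \z are used in place of ^ and $ so that only the very start and end of the string are trimmed, not each line. The variable names are hypothetical.

```ruby
# Two non-breaking spaces in front, one behind.
raw = "\u00A0\u00A0LS 600\u00A0"

# String#strip only removes ASCII whitespace (and null), so nothing changes.
plain_strip = raw.strip

# [[:space:]] is Unicode-aware for UTF-8 strings and covers U+00A0.
cleaned = raw.gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
```

The inner space in "LS 600" is left alone because the pattern is anchored to the two ends of the string.
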
Ruby: hexadecimal in regular expressions

I need to match an md5 checksum with a regular expression in a Ruby (actually Rails) program. I found out somewhere that I can match hexadecimal strings with the \h sequence, but I can't find the link anymore.
I'm using that sequence and my code is working in Ruby 1.9.2. I can make it work even under plain IRB (so it's not a Rails extension).
ruby-1.9.2-p180 :007 > "123abcdf" =~ /^\h+$/; $~
=> #<MatchData "123abcdf">
ruby-1.9.2-p180 :008 > "123abcdfg" =~ /^\h+$/; $~
=> nil
However, my IDE marks that expression as wrong and I can't find any reference that cites that sequence.
Is the \h sequence legal in Ruby regexes under any environment/version, or should I trust my IDE and replace it with something like [abcdef\d]?
Yes, it is. Check the official doc for the complete documentation for regex in Ruby.
Note that \h will match uppercase letters too, so it's actually equivalent to [a-fA-F\d]
According to this, \h is part of Oniguruma, which I believe is the standard regex engine in Ruby 1.9.
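A short sketch comparing \h with an explicit character class, assuming Ruby 1.9 or later. The variable names and sample strings are made up, and the 32-character check is just one plausible way to validate an md5 hex digest.

```ruby
hex_only     = /\A\h+\z/
hex_portable = /\A[0-9a-fA-F]+\z/   # same idea without \h
md5_hex      = /\A\h{32}\z/

samples = ["123abcdf", "123ABCDF", "123abcdfg", ""]

with_h        = samples.map { |s| !!(s =~ hex_only) }
with_brackets = samples.map { |s| !!(s =~ hex_portable) }

# A sample 32-character hex digest and a string that is too short.
digest_ok    = !!("5d41402abc4b2a76b9719d911017c592" =~ md5_hex)
too_short_ok = !!("5d41" =~ md5_hex)
```

If the code also has to run on 1.8, or the IDE warning is a nuisance, the bracket form is the more portable choice.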

Why does "?b" mean 'b' in Ruby?

"foo"[0] = ?b # "boo"
I was looking at the above example and trying to figure out:
How does "?b" imply the character 'b'?
Why is it necessary? - Couldn't I just write this:
"foo"[0] = 'b' # "boo"
Ed Swangren: ? returns the character code of a character.
Not in Ruby 1.9. As of 1.9, ?a returns 'a'. See here: Son of 10 things to be aware of in Ruby 1.9!
telemachus ~ $ ~/bin/ruby -v
ruby 1.9.1p0 (2009-01-30 revision 21907) [i686-linux]
telemachus ~ $ ~/bin/ruby -e 'char = ?a; puts char'
a
telemachus ~ $ /usr/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
telemachus ~ $ /usr/bin/ruby -e 'char = ?a; puts char'
97
Edit: A very full description of changes in Ruby 1.9.
Another edit: note that you can now use 'a'.ord if you want the string to number conversion you get in 1.8 via ?a.
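The 1.9 behavior can be sketched like this. It assumes Ruby 1.9 or later, and the variable names are only for illustration.

```ruby
char = ?a

# In 1.9+ the literal is a one-character String, not an Integer.
is_string = (char == "a")

# ord gives the number that ?a used to return in 1.8; chr goes back.
code = char.ord
back = code.chr

# The example from the question: both spellings assign the same character.
word = "foo".dup
word[0] = ?b
```

So in 1.9+, ?b and 'b' are interchangeable in the question's example.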
The change is related to Ruby 1.9's UTF-8 updates.
The Ruby 1.8 version of ? only worked with single-byte characters. In 1.9, they updated everything to work with multi-byte characters. The trouble is, it's not clear what integer ?€ should return.
They solved it by changing what it returns. In 1.9, all of the following are single-element strings and are equivalent:
?€
'€'
"€"
"\u20AC"
?\u20AC
They should have dropped the notation, IMO, rather than (somewhat randomly) changing the behavior. It's not even officially deprecated, though.
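The equivalence listed above can be checked with a small sketch. It assumes Ruby 1.9 or later and uses only \u escapes so that it does not depend on the encoding of the source file. The variable names are hypothetical.

```ruby
euro = ?\u20AC

# The character literal is just a one-character UTF-8 string.
same_as_string = (euro == "\u20AC")

# One character and one code point, but three bytes in UTF-8.
code_point = euro.ord
byte_count = euro.bytes.to_a.length

# Building the same string from the code point.
rebuilt = [0x20AC].pack("U")
```

Here ord returns the Unicode code point, which is one possible answer to the "what integer?" question, but it is not what the byte-oriented 1.8 version of ? could have produced.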
? returns the character code of a character. Here is a relevant post on this.
In some languages (Pascal, Python), chars don't exist: they're just length-1 strings.
In other languages (C, Lisp), chars exist and have distinct syntax, like 'x' or #\x.
Ruby has mostly been on the side of "chars don't exist", but at times has seemed to not be entirely sure of this choice. If you do want chars as a data type, Ruby already assigns meaning to '' and "", so ?x seems about as reasonable as any other option for char literals.
To me, it's simply a matter of saying what you mean. You could just as well say foo[0]=98, but you're using an integer when you really mean a character. Using a string when you mean a character looks equally strange to me: the set of operations they support is almost completely different. One is a sequence of the other. You wouldn't make Math.sqrt take a list of numbers, and just happen to only look at the first one. You wouldn't omit "integer" from a language just because you already support "list of integer".
(Actually, Lisp 1.0 did just that -- Church numerals for everything! -- but performance was abysmal, so this was one of the huge advances of Lisp 1.5 that made it usable as a real language, back in 1962.)
