Why does "?b" mean 'b' in Ruby? - ruby

"foo"[0] = ?b # "boo"
I was looking at the above example and trying to figure out:
How "?b" implies the character 'b'?
Why is it necessary? - Couldn't I just write this:
"foo"[0] = 'b' # "boo"

Ed Swangren: ? returns the character code of a
character.
Not in Ruby 1.9. As of 1.9, ?a returns 'a'. See here: Son of 10 things to be aware of in Ruby 1.9!
telemachus ~ $ ~/bin/ruby -v
ruby 1.9.1p0 (2009-01-30 revision 21907) [i686-linux]
telemachus ~ $ ~/bin/ruby -e 'char = ?a; puts char'
a
telemachus ~ $ /usr/bin/ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
telemachus ~ $ /usr/bin/ruby -e 'char = ?a; puts char'
97
Edit: A very full description of changes in Ruby 1.9.
Another edit: note that you can now use 'a'.ord if you want the string to number conversion you get in 1.8 via ?a.

The change is related to Ruby 1.9's UTF-8 updates.
The Ruby 1.8 version of ? only worked with single-byte characters. In 1.9, they updated everything to work with multi-byte characters. The trouble is, it's not clear what integer should return from ?€.
They solved it by changing what it returns. In 1.9, all of the following are single-element strings and are equivalent:
?€
'€'
"€"
"\u20AC"
?\u20AC
They should have dropped the notation, IMO, rather than (somewhat randomly) changing the behavior. It's not even officially deprecated, though.

? returns the character code of a character. Here is a relevant post on this.

In some languages (Pascal, Python), chars don't exist: they're just length-1 strings.
In other languages (C, Lisp), chars exist and have distinct syntax, like 'x' or #\x.
Ruby has mostly been on the side of "chars don't exist", but at times has seemed to not be entirely sure of this choice. If you do want chars as a data type, Ruby already assigns meaning to '' and "", so ?x seems about as reasonable as any other option for char literals.
To me, it's simply a matter of saying what you mean. You could just as well say foo[0]=98, but you're using an integer when you really mean a character. Using a string when you mean a character looks equally strange to me: the set of operations they support is almost completely different. One is a sequence of the other. You wouldn't make Math.sqrt take a list of numbers, and just happen to only look at the first one. You wouldn't omit "integer" from a language just because you already support "list of integer".
(Actually, Lisp 1.0 did just that -- Church numerals for everything! -- but performance was abysmal, so this was one of the huge advances of Lisp 1.5 that made it usable as a real language, back in 1962.)

Related

Supporting Ruby 1.9's hash syntax in Ruby 1.8

I'm writing a Ruby gem using the {key: 'value'} syntax for hashes throughout my code. My tests all pass in 1.9.x, but I (understandably) get syntax error, unexpected ':', expecting ')' in 1.8.7.
Is there a best practice for supporting the 1.8.x? Do I need to rewrite the code using our old friend =>, or is there a better strategy?
I think you're out of luck, if you want to support 1.8 then you have to use =>. As usual, I will mention that you must use => in certain cases in 1.9:
If the key is not a symbol. Remember that any object (symbols, strings, classes, floats, ...) can be a key in a Ruby Hash.
If you need a symbol that you'd quote: :'this.that'.
If you use MongoDB for pretty much anything you'll be using things like :$set => hash but $set: hash is a syntax error.
Back to our regularly scheduled programming.
Why do I say that you're out of luck? The Hash literal syntaxes (both of them) are hard-wired in the parser and I don't think you're going to have much luck patching the parser from your gem. Ruby 1.8.7's parse.y has this to say:
assoc : arg_value tASSOC arg_value
{
$$ = list_append(NEW_LIST($1), $3);
}
;
and tASSOC is => so hash literals are hard-wired to use =>. 1.9.3's says this:
assoc : arg_value tASSOC arg_value
{
/*%%%*/
$$ = list_append(NEW_LIST($1), $3);
/*%
$$ = dispatch2(assoc_new, $1, $3);
%*/
}
| tLABEL arg_value
{
/*%%%*/
$$ = list_append(NEW_LIST(NEW_LIT(ID2SYM($1))), $2);
/*%
$$ = dispatch2(assoc_new, $1, $2);
%*/
}
;
We have the fat-arrow syntax again (arg_value tASSOC arg_value) and the JavaScript style (tLABEL arg_value); AFAIK, tLABEL is also the source of the restrictions on what sorts of symbols (no :$set, no :'this.that', ...) can be used with the JavaScript-style syntax. The current trunk parse.y matches 1.9.3 for Hash literals.
So the Hash literal syntax is hard-wired into the parser and you're stuck with fat arrows if you want to support 1.8.
Ruby 1.8.7 does not support the new hash syntax.
If you desperately need hash syntax on the non-YARV c-based implementation of Ruby, there is a totally unsupported 1.8 head branch, so you can do
rvm install ruby-head --branch ruby_1_8 ; rvm ruby-head
ruby -v
ruby 1.8.8dev (2011-05-25) [i386-darwin10.7.0]
but upgrading to 1.9 is the way to go.

ruby incorrect method behavior (possible depending charset)

I got weird behavior from ruby (in irb):
irb(main):002:0> pp "    LS 600"
"\302\240\302\240\302\240\302\240LS 600"
irb(main):003:0> pp "    LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"
That means (for those, who don't understand) that strip method does not affect this string at all, same with gsub('/\s+/', '')
How can I strip that string (I got it while parsing Internet page)?
The string "\302\240" is a UTF-8 encoded string (C2 A0) for Unicode code point A0, which represents a non breaking space character. There are many other Unicode space characters. Unfortunately the String#strip method removes none of these.
If you use Ruby 1.9.2, then you can solve this in the following way:
# Ruby 1.9.2 only.
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".gsub(/^\p{Space}+|\p{Space}+$/, "")
In Ruby 1.8.7 support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of getting a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:
# Ruby 1.8.7/1.9.2.
$KCODE = "u"
require "rubygems"
require "active_support/core_ext/string/multibyte"
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s

Ruby: hexadecimal in regular expressions

I need to match an md5 checksum in a regular expression in a Ruby (actually Rails) program. I found out somewhere that I can match hexadecimal strings with \h sequence, but I can't find the link anymore.
I'm using that sequence and my code is working in Ruby 1.9.2. I can make it working even under plain IRB (so it's not a Rails extension).
ruby-1.9.2-p180 :007 > "123abcdf" =~ /^\h+$/; $~
=> #<MatchData "123abcdf">
ruby-1.9.2-p180 :008 > "123abcdfg" =~ /^\h+$/; $~
=> nil
However my IDE mark that expression as wrong and I can't find any reference which cites that sequence.
Is the \h sequence legal in Ruby Regex under any environment/version or should I trust my ide and replace it with something like [abcdef\d]?
Yes it is. Check the official doc for the complete documentation for regex in Ruby.
Note that \h will match uppercase letters too, so it's actually equivalent to [a-fA-F\d]
According to this \h is part of oniguruma, which I believe is standard in ruby 1.9.

Ruby: how to check if an UTF-8 string contains only letters and numbers?

I have an UTF-8 string, which might be in any language.
How do I check, if it does not contain any non-alphanumeric characters?
I could not find such method in UnicodeUtils Ruby gem.
Examples:
ėččę91 - valid
$120D - invalid
You can use the POSIX notation for alpha-numerics:
#!/usr/bin/env ruby -w
# encoding: UTF-8
puts RUBY_VERSION
valid = "ėččę91"
invalid = "$120D"
puts valid[/[[:alnum:]]+/]
puts invalid[/[^[:alnum:]]+/]
Which outputs:
1.9.2
ėččę91
$
In ruby regex \p{L} means any letter (in any glyph)
so if s represents your string:
s.match /^[\p{L}\p{N}]+$/
This will filter out non numbers and letters.
The pattern for one alphanumeric code point is
/[\p{Alphabetic}\p{Number}]/
From there it’s easy to extrapolate something like this for has a negative:
/[^\p{Alphabetic}\p{Number}]/
or this for is all positive:
/^[\p{Alphabetic}\p{Number}]+$/
or sometimes this, depending:
/\A[\p{Alphabetic}\p{Number}]+\z/
Pick the one that best suits your needs.

Why does the ?a command in ruby 1.9.2 does not return ASCII code

When you do ?a in ruby 1.8.7 you used to get the ASCII character of 'a'
in ruby 1.9.2 this code returns
> ?a
> "a"
What is the significance of this, and what does the output in 1.9.2 mean
In Ruby 1.8 "foo"[0] returned the character code at index 0 instead of the character string at index 0. As part of support for international strings with various encodings (instead of as an array of bytes), Ruby 1.9 changed this behavior to return the single-character string at the specified index.
Along with this change, ?a was changed to evaluate as a single-character string as well. Presumably this was so that libraries with code like this...
if my_string[0] == ?a
...would continue to work. If you want character code value for the first character of a string, use String#ord:
puts "It's a boy!" if my_string[0].ord == 89
For a lot more answer, see this stackoverflow question.

Resources