Confusion with the output of %r/..../option.to_s - ruby

I was experimenting to see how Regexp#to_s reports disabled options for patterns written as %r/../, but I am confused by its output:
irb(main):005:0> %r/ab+c/x.to_s
=> "(?x-mi:ab+c)" #why here -m option has been disabled?
irb(main):006:0> %r/ab+c/i.to_s
=> "(?i-mx:ab+c)" #why here -m option has been disabled?
irb(main):007:0> %r/ab+c/m.to_s
=> "(?m-ix:ab+c)" #why here -i option has been disabled?
irb(main):008:0> %r/ab+c/o.to_s
=> "(?-mix:ab+c)" #why here o option not get into the (...) as the above?
irb(main):009:0> %r/ab+c/.to_s
=> "(?-mix:ab+c)" #why always m,i,x option get into the (...) as the above?
Could anyone help me understand the logic by which the options are turned on or off?
How do the Regexp#hash and Regexp.quote methods work in Ruby 1.9.3 (any small code sample would help)?

I think your understanding is inverted; the options on the left of the dash are on, while the options to the right of the dash are off.
/ab+c/x => "x-mi"
/ab+c/i => "i-mx"
/ab+c/m => "m-ix"
Each of the three options appears in each regex string, but their presence to the left or right of the dash indicates whether the option is on or off.
Regarding your second question, Regexp#hash returns an integer (a Fixnum in Ruby 1.9.3) computed from the pattern's source and options, so two Regexp objects with the same source and options produce the same hash. This lets you compare two different Regexp objects for effective equality. See Object#hash for more details.
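A short sketch of the points above, runnable in irb or as a script. The literals come from the question; the variable names are only for illustration. Note that the o flag only controls how often interpolation is evaluated, so it is not one of the three options that to_s reports. Regexp.quote is an alias of Regexp.escape.

```ruby
extended   = /ab+c/x
ignorecase = /ab+c/i
plain      = /ab+c/

# Options left of the dash are on; options right of the dash are off.
puts extended.to_s    # (?x-mi:ab+c)
puts ignorecase.to_s  # (?i-mx:ab+c)
puts plain.to_s       # (?-mix:ab+c)

# Because to_s spells out every option, a regexp can be embedded in a
# larger pattern and still keep its own settings.
combined = /#{ignorecase}d/
puts combined.source                                  # (?i-mx:ab+c)d
puts combined.match("ABBCd") ? "match" : "no match"   # match

# Regexps with the same source and options are equal and share a hash.
same = Regexp.new("ab+c")
puts plain == same             # true
puts plain.hash == same.hash   # true

# Regexp.quote escapes characters that are special inside a pattern.
puts Regexp.quote("a.b*c?")    # a\.b\*c\?
```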

Related

How can I comment out my Ruby return values with something like "# =>"?

Having just started with Ruby and while following a tutorial, the result was shown as:
a + b # => 3
I have never seen such a possibility; that seems so handy! Could you please tell me what it is? Is it proprietary or is it for everyone?
This is Josh Cheek's seeing_is_believing. Apparently you can run it over your code, or it can be integrated into several editors.
Reconfigure Your REPL
The # symbol is a comment in Ruby. By default, most Ruby REPLs (e.g. irb or pry) will use => to prefix the return value of your last expression.
In IRB, you can modify this prefix so that each return value is prefixed by a different string. You can do this through the IRB::Context#return_format method on your conf instance. For example:
$ irb
irb(main):001:0> conf.return_format = "#=> %s\n"
#=> "#=> %s\n"
irb(main):002:0> 1 + 2
#=> 3
For a permanent change, edit your IRB configuration file: define a custom prompt in the IRB.conf[:PROMPT] Hash and set IRB.conf[:PROMPT_MODE] to it. In my opinion the solution above is simpler, even though it applies only to the current REPL session rather than being saved as a default.
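For reference, a minimal sketch of what such a configuration might look like in ~/.irbrc. The prompt name :HASH_ARROW is hypothetical, and the PROMPT_I, PROMPT_S and PROMPT_C strings simply mirror IRB's usual default prompt; only the :RETURN entry changes how return values are printed. The first two lines are there only so the snippet also runs as a standalone script; inside ~/.irbrc IRB is already loaded and IRB.conf[:PROMPT] is already populated.

```ruby
require 'irb'
IRB.conf[:PROMPT] ||= {}   # only needed outside of ~/.irbrc

# Hypothetical custom prompt: same look as the default, different return prefix.
IRB.conf[:PROMPT][:HASH_ARROW] = {
  :PROMPT_I => "%N(%m):%03n:%i> ",   # normal prompt
  :PROMPT_S => "%N(%m):%03n:%i%l ",  # prompt while inside a string
  :PROMPT_C => "%N(%m):%03n:%i* ",   # prompt for a continued statement
  :RETURN   => "#=> %s\n"            # format used for return values
}

# Make the custom prompt the active one.
IRB.conf[:PROMPT_MODE] = :HASH_ARROW
```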

Documentation for Psych to_yaml options?

Ruby 1.9.3 defaults to using Psych for YAML. While the ruby-doc documentation for it is completely lacking, I was able to find one external piece of documentation that hinted that the indentation option is supported. This was borne out in testing:
irb(main):001:0> RUBY_VERSION
#=> "1.9.3"
irb(main):002:0> require 'yaml'
#=> true
irb(main):003:0> [[[1]]].to_yaml
#=> "---\n- - - 1\n"
irb(main):009:0> [[[1]]].to_yaml indentation:9
#=> "---\n-        -        - 1\n"
There are presumably more options supported. Specifically, I want to know how to change the line wrap width or disable it altogether.
What are the options available?
Deep in the guts of ruby-1.9.3-p125/ext/psych/emitter.c I found three options:
indentation - The level must be less than 10 and greater than 1.
line_width - Set the preferred line width.
canonical - Set the output style to canonical, or not (true/false).
And they work!
When you want to disable line wrap, use this option:
line_width: -1
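A small sketch of these options in use. The sample data and variable names are made up for illustration, and the comments about wrapping assume Psych's default line width has not been changed elsewhere.

```ruby
require 'yaml'

long_text = (["word"] * 40).join(" ")   # 199 characters, no newlines
data = { "text" => long_text, "nested" => { "key" => 1 } }

wrapped   = data.to_yaml                   # default width folds the long scalar
unwrapped = data.to_yaml(line_width: -1)   # -1 disables folding
indented  = data.to_yaml(indentation: 4)   # nested keys indented by 4 spaces

puts unwrapped
puts unwrapped.include?(long_text)   # true: the text stays on one line
puts wrapped.include?(long_text)     # false: the text was split across lines
puts indented
```

Folding does not change the data: loading either the wrapped or the unwrapped output gives back the original string.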

Ruby: hexadecimal in regular expressions

I need to match an md5 checksum in a regular expression in a Ruby (actually Rails) program. I found out somewhere that I can match hexadecimal strings with \h sequence, but I can't find the link anymore.
I'm using that sequence and my code works in Ruby 1.9.2. I can make it work even under plain IRB (so it's not a Rails extension).
ruby-1.9.2-p180 :007 > "123abcdf" =~ /^\h+$/; $~
=> #<MatchData "123abcdf">
ruby-1.9.2-p180 :008 > "123abcdfg" =~ /^\h+$/; $~
=> nil
However, my IDE marks that expression as wrong, and I can't find any reference that mentions that sequence.
Is the \h sequence legal in Ruby Regexp under any environment/version, or should I trust my IDE and replace it with something like [abcdef\d]?
Yes it is. Check the official doc for the complete documentation for regex in Ruby.
Note that \h will match uppercase letters too, so it's actually equivalent to [a-fA-F\d]
According to this, \h is part of Oniguruma, which I believe is standard in Ruby 1.9.
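Applied to the MD5 case from the question, a minimal sketch could look like this. The helper name md5_hex? is hypothetical. It anchors with \A and \z rather than ^ and $, because in Ruby ^ and $ match at line boundaries inside a multi-line string.

```ruby
require 'digest'

# Hypothetical helper: true if str is exactly 32 hexadecimal digits.
def md5_hex?(str)
  !(str =~ /\A\h{32}\z/).nil?
end

checksum = Digest::MD5.hexdigest("hello")
puts md5_hex?(checksum)            # true
puts md5_hex?(checksum.upcase)     # true, \h accepts A-F as well
puts md5_hex?("123abcdfg")         # false, wrong length and contains "g"
puts md5_hex?(checksum + "\nxyz")  # false, trailing line is rejected
```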

Working around unexpected behavior in yaml for Ruby -- interned unicode strings

(1.9 on Windows)
Reproducing:
require 'yaml'
s = YAML::load("\xEC\x86\x8C\xEB\x85\x80\xEC\x8B\x9C\xEB\x8C\x80")
# => "∞åîδàÇ∞ï£δîÇ" or "소녀시대", depending on your terminal's unicode support
s_interned = s.intern
s_interned.class # => Symbol
s_yamld = s_interned.to_yaml
# => "--- \":\\xEC\\x86\\x8C\\xEB\\x85\\x80\\xEC\\x8B\\x9C\\xEB\\x8C\\x80\"\n"
unyamld = YAML::load(s_yamld)
# => ":∞åîδàÇ∞ï£δîÇ" or ":소녀시대"
unyamld.class # => String
# => expected: Symbol
And once again:
YAML::load(s_interned.to_yaml).class # => String
Here's how a "normal" symbol behaves:
YAML::load(:foo.to_yaml).class # => Symbol
Normal symbols behave fine, but symbols with unicode characters don't seem to. They get interpreted as strings with a colon as their first character.
I'm pretty sure this script was working last night. But I woke up this morning and everything has gone wrong.
Does anyone know how I can resolve this or get around this?
I've tried using some clever regular expression/sub hacks to get around this and "reconvert", but they've all proven inelegant or have made the situation worse.
I'm new to 1.9 as well but it seems you have to add the encoding to the top of the file sometimes. Something like:
# encoding: utf-8
Again... no idea when or why. Still have to learn how it works in 1.9. I found some more background information here: "Ruby 1.9 Common Problems Pt. 1: Encoding".
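As a rough illustration of what the encoding tag changes, the sketch below takes the byte sequence from the question and looks at it first as raw bytes and then as UTF-8. It only shows how Ruby 1.9+ labels the string; whether tagging the data this way fixes the YAML round trip of the symbol is an assumption that is not verified here.

```ruby
# encoding: utf-8

# The bytes from the question, treated as raw binary data.
raw = "\xEC\x86\x8C\xEB\x85\x80\xEC\x8B\x9C\xEB\x8C\x80".dup.force_encoding("BINARY")
puts raw.encoding    # ASCII-8BIT
puts raw.length      # 12 (counted as bytes)

# The same bytes, relabelled as UTF-8 (no bytes are changed).
text = raw.dup.force_encoding("UTF-8")
puts text.valid_encoding?   # true
puts text.length            # 4 (counted as characters)
puts text == "소녀시대"      # true

# A symbol made from the string keeps the UTF-8 label.
puts text.to_sym.to_s.encoding   # UTF-8
```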

Ruby's String#gsub, unicode, and non-word characters

As part of a larger series of operations, I'm trying to take tokenized chunks of a larger string and get rid of punctuation, non-word gobbledygook, etc. My initial attempt used String#gsub and the \W regexp character class, like so:
my_str = "Hello,"
processed = my_str.gsub(/\W/,'')
puts processed # => Hello
Super, super, super simple. Of course, now I'm extending my program to deal with non-Latin characters, and all heck's broken loose. Ruby's \W seems to be something like [^A-Za-z0-9_], which, of course, excludes stuff with diacritics (ü, í, etc.). So, now my formerly-simple code crashes and burns in unpleasant ways:
my_str = "Quística."
processed = my_str.gsub(/\W/,'')
puts processed # => Qustica
Notice that gsub() obligingly removed the accented "í" character. One way I've thought of to fix this would be to extend Ruby's \W whitelist to include higher Unicode code points, but there are an awful lot of them, and I know I'd miss some and cause problems down the line (and let's not even start thinking about non-Latin languages...). Another solution would be to blacklist all the stuff I want to get rid of (punctuation, $/%/&/™, etc.), but, again, there's an awful lot of that and I really don't want to start playing blacklist-whack-a-mole.
Has anybody out there found a principled solution to this problem? Is there some hidden, Unicode-friendly version of \W that I haven't discovered yet? Thanks!
You need to run ruby with the "-Ku" option to make it use UTF-8. See the documentation for command-line options. This is what happens when I do this with irb:
% irb -Ku
irb(main):001:0> my_str = "Quística."
=> "Quística."
irb(main):002:0> processed = my_str.gsub(/\W/,'')
=> "Quística"
irb(main):003:0>
You can also put it on the #! line in your ruby script:
#!/usr/bin/ruby -Ku
I would just like to add that in 1.9.1 it works by default.
$ irb
ruby-1.9.1-p243 > my_str = "Quística."
=> "Quística."
ruby-1.9.1-p243 > processed = my_str.gsub(/\W/,'')
=> "Quística"
ruby-1.9.1-p243 > processed.encoding
=> #<Encoding:UTF-8>
PS. Nothing beats rvm for trying out different versions of Ruby. DS.
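In later Ruby versions \W is documented as ASCII-only ([^a-zA-Z0-9_]) even for UTF-8 strings, so the results above may not carry over. A sketch that does not depend on how \W is defined uses Unicode property classes instead; the helper name strip_non_word is hypothetical, and the choice to keep letters, combining marks, digits and underscores is an assumption about what counts as a "word character" here.

```ruby
# encoding: utf-8

# Hypothetical helper: remove everything that is not a Unicode letter (\p{L}),
# combining mark (\p{M}), number (\p{N}) or underscore.
def strip_non_word(str)
  str.gsub(/[^\p{L}\p{M}\p{N}_]/, '')
end

puts strip_non_word("Hello,")      # Hello
puts strip_non_word("Quística.")   # Quística
puts strip_non_word("$100™ über")  # 100über
```

Including \p{M} keeps accents that are stored as separate combining characters instead of being stripped from their base letter.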
