Escaping in %q notation won't work in irb

Here is a sample script called test.rb:
s = %Q_abc\_def\_ghi_
puts s
s = %q_abc\_def\_ghi_
puts s
It works fine as expected:
➜ Desktop ruby test.rb
abc_def_ghi
abc_def_ghi
However, when I run it in irb, nothing happened after s = %q_abc\_def\_ghi_:
➜ Desktop irb
irb(main):001:0> s = %Q_abc\_def\_ghi_
=> "abc_def_ghi"
irb(main):002:0> puts s
abc_def_ghi
=> nil
irb(main):003:0>
irb(main):004:0* s = %q_abc\_def\_ghi_
irb(main):005:1> puts s
irb(main):006:1>
irb(main):007:1*
irb(main):008:1*
Why doesn't it work? And how can I escape '_' (or other delimiters) in %q notation?
My Ruby version is:
ruby -v
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-darwin15]

IRB has its own Ruby lexer/parser, which it uses to keep track of the state of the code entered so that it can do things like display different prompts depending on whether you are in the middle of a string or defining a method or class. The code is then passed to Ruby to be evaluated “properly”.
It looks like this lexer has a bug in how it handles escaping in single-quoted-style strings that aren’t actually using single quotes.
Ruby itself handles the escaping just fine, so normally I don’t think this bug would have much effect, but in your example you happen to have used the string def right after the second _, and def is a keyword that IRB also looks for.
This combination appears to put IRB into a strange state where its understanding of what is going on differs from what’s actually happening. This is the odd behaviour you are seeing.
A little playing around with a checked out version of the IRB code seems to support this. The snippet I think is to blame looks like this:
elsif ch == '\\' and @ltype == "'" #'
  case ch = getc
  when "\\", "\n", "'"
  else
    ungetc
  end
Changing the when line to also look for the actual character being used:
when "\\", "\n", "'", quoted
(quoted is a parameter passed to the function) appears to fix it, and your examples all work fine with this modified version. I don’t know whether that is a sufficient fix, though. I don’t know the code, and this is just a quick hack.
It might be worth opening a bug about this.
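For reference, here is a minimal sketch of how %q escaping behaves when the code is run by Ruby itself rather than typed into IRB. The variable names are only illustrative. The second and third forms sidestep the IRB problem by choosing a delimiter that does not occur in the string.

```ruby
# %q behaves like a single-quoted string: only the delimiter and the
# backslash itself can be escaped.
escaped   = %q_abc\_def\_ghi_   # delimiter escaped with a backslash
bracketed = %q{abc_def_ghi}     # different delimiter, no escaping needed
plain     = 'abc_def_ghi'       # ordinary single quotes

puts escaped
puts escaped == bracketed && bracketed == plain

# Other backslash sequences are kept literally, unlike in %Q.
literal = %q{a\nb}
puts literal.length
```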

I'm not sure why this behaves differently in your Ruby file and in IRB, but lowercase percent strings (%q) behave like single-quoted strings and do not process escape sequences such as \n. See Difference between '%{}', '%Q{}', '%q{}' in ruby string delimiters
Since %q supports only minimal escaping (the backslash and the delimiter itself), IRB's handling may well be unreliable when you combine unusual delimiters with escape characters.
This probably isn't the answer you were looking for but I think it should help a bit.

Related

Ruby: Why do I get warning "regex literal in condition" here?

A simple Ruby program, which works well (using Ruby 2.0.0):
#!/usr/bin/ruby
while gets
  print if /foo/../bar/
end
However, Ruby also outputs the warning warning: regex literal in condition. It seems that Ruby considers my flip-flop-expression /foo/../bar/ as dangerous.
My question: Where lies the danger in this program? And: Can I turn off this warning (ideally only for this statement, keeping other warnings active)?
BTW, I found on the net several discussions of this kind of code, also mentioning the warning, but never found a good explanation why we get warned.

You can avoid the warning by using an explicit match:
while gets
  print if ~/foo/..~/bar/
end
Regexp#~ matches against $_.
I don't know why the warning is shown (to be honest, I wasn't even aware that Ruby matches regexp literals against $_ implicitly), but according to the parser's source code, it is shown unless you provide the -e option when invoking Ruby, i.e. passing the script as an argument:
$ ruby -e "while gets; print if /foo/../bar/ end"
I would avoid using $_ as an implicit parameter and instead use something like:
while line = gets
  print line if line =~ /foo/..line =~ /bar/
end
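The same idea can be tried without reading from standard input. This is only a sketch: the sample lines array and the selected variable are made up for illustration. Because both ends of the flip-flop are explicit =~ matches, there is no bare regex literal in the condition for Ruby to warn about.

```ruby
# Hypothetical sample input standing in for the lines read by gets.
lines = ["alpha", "foo starts here", "middle", "bar ends here", "omega"]

selected = []
lines.each do |line|
  # Flip-flop: becomes true when the left match succeeds and stays true
  # up to and including the line where the right match succeeds.
  selected << line if (line =~ /foo/)..(line =~ /bar/)
end

puts selected
```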

I think Neil Slater is right: It looks like a bug in a parser. If I change the code to
#!/usr/bin/ruby
while gets
  print if $_ =~ /foo/..$_ =~ /bar/
end
the warning disappears.
I'll file a bug report.

How can I comment out my Ruby return values with something like "# =>"?

Having just started with Ruby and while following a tutorial, the result was shown as:
a + b # => 3
I have never seen such a possibility; that seems so handy! Could you please tell me what it is? Is it proprietary, or is it available to everyone?

Josh Cheek's seeing_is_believing. Apparently you can run it over your code, or it can be integrated into several editors.

Reconfigure Your REPL
The # symbol is a comment in Ruby. By default, most Ruby REPLs (e.g. irb or pry) will use => to prefix the return value of your last expression.
In IRB, you can modify this prefix so that each return value is prefixed by a different string. You can do this through the IRB::Context#return_format method on your conf instance. For example:
$ irb
irb(main):001:0> conf.return_format = "#=> %s\n"
#=> "#=> %s\n"
irb(main):002:0> 1 + 2
#=> 3
More permanent changes would have to be made in your IRB configuration file by customizing the prompt through the IRB.conf[:PROMPT] Hash and then setting IRB.conf[:PROMPT_MODE] to your custom prompt. In my opinion, though, the solution above is simpler, even though you have to run it within the current REPL session rather than saving it as a default.
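As a rough sketch of that more permanent setup, something like the following could go in ~/.irbrc. The prompt name :HASH_ARROW is made up for this example, and the prompt strings are only an approximation of IRB's default prompt, so adjust them to taste.

```ruby
# Sketch for ~/.irbrc. The require and the "||= {}" guard only matter when
# this file is loaded outside a running IRB session.
require "irb"

IRB.conf[:PROMPT] ||= {}
IRB.conf[:PROMPT][:HASH_ARROW] = {   # :HASH_ARROW is a made-up name
  PROMPT_I: "%N(%m):%03n> ",          # normal prompt
  PROMPT_S: "%N(%m):%03n%l ",         # prompt while inside a string
  PROMPT_C: "%N(%m):%03n* ",          # prompt for a continued statement
  RETURN:   "#=> %s\n"                # format used for return values
}
IRB.conf[:PROMPT_MODE] = :HASH_ARROW
```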

Can I programmatically convert "Iâ€™d" to "I’d" using Ruby?

I can't seem to find the right combination of String#encode shenanigans.

I think I'd got confused on this one so I'll post this here to hopefully help anyone else who is similarly confused.
I was trying to do my encoding in an irb session, which gives you
irb(main):002:0> 'Iâ€™d'.force_encoding('UTF-8')
=> "Iâ€™d"
And if you try using encode instead of force_encoding then you get
irb(main):001:0> 'Iâ€™d'.encode('UTF-8')
=> "Iâ€™d"
This is with irb set to use an input and output encoding of UTF-8. In my case, converting that string the way I want involves telling Ruby that the source string is in Windows-1252 encoding. You can do this with the -E argument, which takes the form external:internal (the encoding the input is read as, and the encoding it is converted to), and then you get this
$ irb -EWindows-1252:UTF-8
irb(main):001:0> 'Iâ€™d'
=> "I\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2d"
That looks wrong unless you pipe it out, which gives this
$ ruby -E Windows-1252:UTF-8 -e "puts 'Iâ€™d'"
I’d
Hurrah. I'm not sure about why Ruby showed it as "I\xC3\xA2\xE2\x82\xAC\xE2\x84\xA2d" (something to do with the code page of the terminal?) so if anyone can comment with further insight that would be great.
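Within a single script, the same repair can be sketched without command-line switches. This assumes the garbled text arose from UTF-8 bytes being misread as Windows-1252, which matches the string in the question. The variable names are illustrative.

```ruby
# "Iâ€™d" written with escapes so the example does not depend on how this
# file itself is saved: â = U+00E2, € = U+20AC, ™ = U+2122.
garbled = "I\u00E2\u20AC\u2122d"

# Step 1: transcode to Windows-1252, recovering the original bytes
#         (0xE2 0x80 0x99 for the three garbled characters).
# Step 2: relabel those bytes as UTF-8 without changing them.
repaired = garbled.encode("Windows-1252").force_encoding("UTF-8")

puts repaired
puts repaired.valid_encoding?
```

If the text contains characters that Windows-1252 cannot represent, encode raises Encoding::UndefinedConversionError, so this approach only suits strings that are garbled throughout.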

I expect your script is using the encoding cp1251 and you have ruby >= 1.9.
Then you can use force_encoding:
#encoding: cp1251
#works also with encoding: binary
source = 'Iâ€™d'
puts source.force_encoding('utf-8') #-> I’d
If my expectations are wrong: Which encoding do you use and which ruby version?
A little background:
Problems with encoding are difficult to analyse. There may be conflicts between:
Encoding of the source code (That's defined by the editor).
Expected encoding of the source code (that's defined with #encoding on the first line). This is used by ruby.
Encoding of the string (see e.g. section String encodings in http://nuclearsquid.com/writings/ruby-1-9-encodings/ )
Encoding of the output shell

Ruby Output Unicode Character

I'm not a Ruby dev by trade, but am using Capistrano for PHP deployments. I'm trying to cleanup the output of my script and am trying to add a unicode check mark as discussed in this blog.
The problem is if I do:
checkmark = "\u2713"
puts checkmark
It outputs "\u2713" instead of ✓
I've googled around and I just can't find anywhere that discusses this.
TLDR: How do I puts or print the unicode checkmark U+2713?
EDIT
I am running Ruby 1.8.7 on my Mac (OSX Lion) so cannot use the encode method. My shell is Bash in iTerm2.

In Ruby 1.9.x+
Use String#encode:
checkmark = "\u2713"
puts checkmark.encode('utf-8')
prints
✓
In Ruby 1.8.7
puts '\u2713'.gsub(/\\u[\da-f]{4}/i) { |m| [m[-4..-1].to_i(16)].pack('U') }
✓

In newer versions of Ruby, you don't need to enforce encoding. Here is an example with 2.1.2:
2.1.2 :002 > "\u00BD"
=> "½"
Just make sure you use double quotes!
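The difference the quotes make can be seen in a short sketch (variable names are illustrative). Single quotes give the opposite effect, printing the escape sequence as written.

```ruby
double = "\u2713"   # escape is interpreted: one character, the check mark
single = '\u2713'   # escape is not interpreted: six literal characters

puts double
puts single
puts double.length
puts single.length
```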

falsetru's answer is incorrect.
checkmark = "\u2713"
puts checkmark.encode('utf-8')
This transcodes the checkmark from the current system encoding to UTF-8 encoding.
(That works only on a system whose default is already UTF-8.)
The correct answer is:
puts checkmark.force_encoding('utf-8')
This modifies the string's encoding, without modifying any character sequence.
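A small sketch of the distinction between the two methods, assuming a UTF-8 source file (the default on current Rubies). The variable names are made up for illustration.

```ruby
checkmark = "\u2713"

# encode transcodes: the characters stay the same, the bytes change.
utf16 = checkmark.encode("UTF-16LE")
p checkmark.bytes
p utf16.bytes

# force_encoding relabels: the bytes stay the same, only the tag changes.
raw = "\xE2\x9C\x93".b                       # same three bytes, tagged binary
relabelled = raw.dup.force_encoding("UTF-8")
p raw.encoding
p relabelled == checkmark
```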

As an additional note, if you want to print an emoji (or any other code point that needs more than four hex digits), you have to surround the code point with braces.
irb(main):001:0> "\u{1F600}"
=> "😀"
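To illustrate the brace form (a sketch; variable names are illustrative): the plain \u escape takes exactly four hex digits, while \u{...} accepts from one to six hex digits per code point, and several code points at once when separated by spaces.

```ruby
half  = "\u00BD"       # four hex digits: braces optional
emoji = "\u{1F600}"    # five hex digits: braces required
greet = "\u{48 69}"    # several code points in one escape

p half.codepoints
p emoji.codepoints
puts greet
```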

Same goes as above in ERB, no forced encoding required, works perfectly, tested at Ruby 2.3.0
<%= "\u00BD" %>
Much appreciation

How can we print it as is in newer versions of Ruby?
checkmark = "\u2713"
puts checkmark
This should print \u2713 as is.

Ruby's String#gsub, unicode, and non-word characters

As part of a larger series of operations, I'm trying to take tokenized chunks of a larger string and get rid of punctuation, non-word gobbledygook, etc. My initial attempt used String#gsub and the \W regexp character class, like so:
my_str = "Hello,"
processed = my_str.gsub(/\W/,'')
puts processed # => Hello
Super, super, super simple. Of course, now I'm extending my program to deal with non-Latin characters, and all heck's broken loose. Ruby's \W seems to be something like [^A-Za-z0-9_], which, of course, excludes stuff with diacritics (ü, í, etc.). So, now my formerly-simple code crashes and burns in unpleasant ways:
my_str = "Quística."
processed = my_str.gsub(/\W/,'')
puts processed # => Qustica
Notice that gsub() obligingly removed the accented "í" character. One way I've thought of to fix this would be to extend Ruby's \W whitelist to include higher Unicode code points, but there are an awful lot of them, and I know I'd miss some and cause problems down the line (and let's not even start thinking about non-Latin languages...). Another solution would be to blacklist all the stuff I want to get rid of (punctuation, $/%/&/™, etc.), but, again, there's an awful lot of that and I really don't want to start playing blacklist-whack-a-mole.
Has anybody out there found a principled solution to this problem? Is there some hidden, Unicode-friendly version of \W that I haven't discovered yet? Thanks!

You need to run ruby with the "-Ku" option to make it use UTF-8. See the documentation for command-line options. This is what happens when I do this with irb:
% irb -Ku
irb(main):001:0> my_str = "Quística."
=> "Quística."
irb(main):002:0> processed = my_str.gsub(/\W/,'')
=> "Quística"
irb(main):003:0>
You can also put it on the #! line in your ruby script:
#!/usr/bin/ruby -Ku

I would just like to add that in 1.9.1 it works by default.
$ irb
ruby-1.9.1-p243 > my_str = "Quística."
=> "Quística."
ruby-1.9.1-p243 > processed = my_str.gsub(/\W/,'')
=> "Quística"
ruby-1.9.1-p243 > processed.encoding
=> #<Encoding:UTF-8>
PS. Nothing beats rvm for trying out different versions of Ruby. DS.
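On later Ruby versions (1.9.2 onwards, as far as I am aware), \w and \W match ASCII only again, even for UTF-8 strings, so the output above may not be reproducible there. A sketch of a Unicode-aware alternative uses the POSIX bracket class [[:word:]]. The variable my_str follows the question, and the other names are illustrative.

```ruby
my_str = "Qu\u00EDstica."   # "Quística." written with an escape for the í

ascii_only    = my_str.gsub(/\W/, '')            # \W treats í as a non-word character
unicode_aware = my_str.gsub(/[^[:word:]]/, '')   # [[:word:]] is Unicode-aware

puts ascii_only
puts unicode_aware
```

The property class \p{L} (any letter) is another option when digits and underscores should be stripped as well.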
