Ruby 1.8.7 vs 1.9* String[Fixnum] differences

Ruby 1.8.7:
"abc"[0]
=> 97
Ruby 1.9*
"abc"[0]
=> "a"
Is there a way I can safely write the code above to produce the second result in both 1.8.7 and 1.9*? My solution so far is: "abc".split('').first but that doesn't seem very clever.

"abc"[0].chr
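produces the second result in both versions. In 1.8.7 the index returns a Fixnum and Integer#chr turns it into a one-character string; in 1.9 the index already returns a String and String#chr returns its first character.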
1.8: http://ruby-doc.org/core-1.8.7/Integer.html#method-i-chr
1.9: http://ruby-doc.org/core-1.9.3/String.html#method-i-chr

If you want the first character of a string, as a string, then add a length in the brackets:
"abc"[0,1]
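As a quick sanity check, here is a minimal sketch comparing the forms suggested in this thread. The helper name first_char is hypothetical and only for illustration; the expected values are those of Ruby 1.9 and later for an ASCII string.

```ruby
# Hypothetical helper: returns the first character as a String.
# The start-and-length form returns a String on both 1.8.7 and 1.9+.
def first_char(str)
  str[0, 1]
end

s = "abc"
by_length = first_char(s) # "a"
by_range  = s[0..0]       # "a"
by_chr    = s[0].chr      # "a"

puts [by_length, by_range, by_chr].inspect
```

The start-and-length form is arguably the least surprising of the three, because it does not depend on what a single-index lookup returns.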

Note that in 1.8, most of these answers will only work for characters in the ASCII range:
irb(main):001:0> "ā"[0].chr
=> "\304"
irb(main):002:0> "ā"[0,1]
=> "\304"
irb(main):003:0> "ā"[0..0]
=> "\304"
Though of course it depends on your encoding.
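To see where the "\304" comes from, the following sketch (assuming Ruby 1.9 or later and a UTF-8 string) looks at the bytes behind "ā". The character is written as an escape so the example does not depend on the source file's encoding.

```ruby
s = "\u0101" # "ā", LATIN SMALL LETTER A WITH MACRON

bytes = s.bytes.to_a                   # the two UTF-8 bytes of the character
first_byte_octal = bytes.first.to_s(8) # octal form of the first byte
first_char = s[0, 1]                   # on 1.9+ this is the whole character

puts bytes.inspect
puts first_byte_octal
puts first_char == s
```

On 1.8 the same index operations work on bytes, which is why only the first byte (octal 304) comes back.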

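What about the following? Note that in 1.9 it returns the integer 97, which is the first result rather than the second:
"abc"[0].ord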
http://ruby-doc.org/core-1.9.3/String.html#method-i-ord

Related

how to substring input with various languages in ruby?

Given a string that may contain English, Japanese (wide characters), or other languages, how can I get the first character or a substring of it?
For example: "Give" => "G"
"日本" => "日"
Thanks!
This is built in to Ruby, so long as the string has the correct encoding set:
$ ruby -ve 'p "日本".encoding, "日本"[0]'
ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-darwin11.3.0]
#<Encoding:UTF-8>
"日"
There is no need to use mb_chars or ActiveSupport.
You can use ActiveSupport's Chars class
string = "日本"
string.mb_chars[0]
=> "日"
If you have ActiveSupport loaded (for example, in a Rails application), you can use mb_chars.
Or you can use the core String methods:
str = '日本'
str.chars.take(1)
# => ["日"]
'chars' walks the string character by character according to its encoding, and 'take' returns as many characters as you ask for, as an array. Or you can use
str.chars.to_a[0]
# => "日"
Note that 'codepoints' yields integer code points (26085 for 日), not strings. Calling to_a builds an array of every character in the string, which is fine for short strings but wasteful for big ones.
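The difference between characters and code points is easy to check. This is a small sketch assuming Ruby 1.9 or later and a UTF-8 string; the characters are written as escapes so the result does not depend on the file's encoding.

```ruby
str = "\u65E5\u672C" # "日本"

first_char      = str.chars.first        # one-character String
first_two       = str.chars.take(2).join # substring built from characters
first_codepoint = str.codepoints.first   # Integer, not a String
back_to_string  = [first_codepoint].pack("U") # code point back to UTF-8 text

puts first_char
puts first_codepoint
```

If all you need is a prefix, str[0, n] does the same job as chars.take(n).join without building an intermediate array.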

How to get Ruby 1.9 regexp supports \p{Nonspacing_Mark}?

Shouldn't the regex remove the diacritical mark above the "a"?
"hǎo".gsub(/\p{Nonspacing_Mark}/, '')
=> "hǎo"
"hǎo".gsub(/\p{Mn}/, '')
=> "hǎo"
Update:
I think I get it now, from how it works in Java:
Normalizer.normalize("hǎo", Form.NFD).replaceAll("\\p{Mn}+", "")
I need to normalize the string first, to split the "ǎ" into "a" and the diacritical mark.
puts UnicodeUtils.nfkd("ﻺ (hǎo)").gsub(/[\p{Nonspacing_Mark}]/, '')
See How to replace the Unicode gem on Ruby 1.9?
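For readers on a newer Ruby, the same decompose-then-strip idea can be sketched without a gem. This assumes Ruby 2.2 or later, where String#unicode_normalize is available; it is not an option on the 1.9 discussed in this thread, where UnicodeUtils remains the way to go.

```ruby
precomposed = "h\u01CEo" # "hǎo" with a single precomposed "ǎ"

# The precomposed letter is not a nonspacing mark, so nothing is removed yet.
untouched = precomposed.gsub(/\p{Mn}/, "")

# NFD splits "ǎ" into "a" followed by U+030C COMBINING CARON.
decomposed = precomposed.unicode_normalize(:nfd)

# Now the combining mark is a separate character and \p{Mn} matches it.
stripped = decomposed.gsub(/\p{Mn}/, "")

puts stripped
```

NFD is used here rather than NFKD because it only decomposes canonical equivalents; NFKD, as in the UnicodeUtils example, also folds compatibility forms such as ligatures.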

How do I remove a non-breaking space in Ruby

I have a string that looks like this:
d = "foo\u00A0\bar"
When I check the length, it says that the string is 7 characters long. I looked up \u00A0 and found out that it is a non-breaking space. Could someone show me how to remove all the non-breaking spaces in a string?
In case you do not care about the non-breaking space specifically, but about any "special" unicode whitespace character that might appear in your string, you can replace it using the POSIX bracket expression for whitespace:
s.gsub(/[[:space:]]/, '')
These bracket expressions (as opposed to matchers like \s) match not only ASCII characters, but all Unicode characters of a class.
For more details see the Ruby documentation.
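The contrast between strip, \s and the POSIX bracket expression can be sketched as follows, assuming Ruby 1.9 or later and a UTF-8 string. The sample string is made up for illustration.

```ruby
nbsp = "\u00A0"            # NO-BREAK SPACE
d = "foo#{nbsp}bar#{nbsp}" # made-up sample with an inner and a trailing NBSP

stripped       = d.strip                  # strip leaves the NBSP alone
by_backslash_s = d.gsub(/\s/, "")         # \s matches ASCII whitespace only
by_posix       = d.gsub(/[[:space:]]/, "") # Unicode-aware whitespace class
by_literal     = d.delete(nbsp)           # removes only this one character

puts by_posix
```

Use the literal form when ordinary spaces must survive, and the bracket expression when every kind of whitespace should go.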
irb(main):001:0> d = "foo\u00A0\bar"
=> "foo \bar"
irb(main):002:0> d.gsub("\u00A0", "")
=> "foo\bar"
It's an old thread, but maybe this helps somebody.
I found myself looking for a solution to the same problem when I discovered that strip doesn't do the job. I used ord to check what the character was, and chr to represent it in gsub:
2.2.3 :010 > 160.chr("UTF-8")
=> " "
2.2.3 :011 > 160.chr("UTF-8").strip
=> " "
2.2.3 :012 > nbsp = 160.chr("UTF-8")
=> " "
2.2.3 :013 > nbsp.gsub(160.chr("UTF-8"),"")
=> ""
I couldn't understand why strip doesn't remove something that looked like a space to me, so I looked up what character 160 actually is: it is U+00A0, the non-breaking space, which lies outside the ASCII range.
d.gsub("\u00A0", "") does not work in Ruby 1.8, which has no \u escape. Instead, match the character's two UTF-8 bytes: d.gsub(/\302\240/,"")
See http://blog.grayproductions.net/articles/understanding_m17n for lots more on the character encoding differences between 1.8 and 1.9.

Beginner Ruby question - what does the ${...} notation in String literals do?

I'm reading some Ruby code and I don't understand this snippet:
thing = '${other-thing}/etc/'
It appears to substitute a value for the ${other-thing} and use that to build the String thing but I haven't been able to recreate this myself.
EDIT: Sorry to all, it turns out there was some preprocessing going on by Maven (a Java build tool). The accepted answer shows how one could do the substitution in straight Ruby.
$ irb
irb(main):001:0> a = "Hello"
=> "Hello"
irb(main):002:0> b = "world"
=> "world"
irb(main):003:0> puts "${a}, ${b}!" # Doesn't work.
${a}, ${b}!
=> nil
irb(main):004:0> puts "#{a}, #{b}!" # Works fine.
Hello, world!
=> nil
irb(main):005:0> puts '#{a}, #{b}!' # Doesn't work.
#{a}, #{b}!
=> nil
I believe you wanted #{...}, not ${...}. Also, you don't get substitutions inside single-quoted strings, only inside double-quoted ones (or their equivalents; there are dozens of ways to delimit strings in Ruby).
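If you really do need to expand ${...} placeholders in plain Ruby, one possible approach is a gsub with a block. This is only a sketch: the values hash and its contents are made up for illustration, and it is not how Maven performs its own filtering.

```ruby
# Hypothetical lookup table for the placeholders.
values = { "other-thing" => "/opt/app" }

# Single quotes keep the text literal, so Ruby does no interpolation here.
template = '${other-thing}/etc/'

# Replace each ${name} with its value; fetch raises KeyError for unknown names.
expanded = template.gsub(/\$\{([^}]+)\}/) { values.fetch($1) }

puts expanded
```

Using fetch rather than [] means a misspelled placeholder fails loudly instead of silently becoming an empty string.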

How to extract a single character (as a string) from a larger string in Ruby?

What is the Ruby idiomatic way for retrieving a single character from a string as a one-character string? There is the str[n] method of course, but (as of Ruby 1.8) it returns a character code as a fixnum, not a string. How do you get to a single-character string?
In Ruby 1.9, it's easy. Strings are encoding-aware sequences of characters, so you can just index into one and you will get a single-character string out of it:
'µsec'[0] # => 'µ'
However, in Ruby 1.8, Strings are sequences of bytes and thus completely unaware of the encoding. If you index into a string and that string uses a multibyte encoding, you risk indexing right into the middle of a multibyte character (in this example, the 'µ' is encoded in UTF-8):
'µsec'[0] # => 194
'µsec'[0].chr # => Garbage
'µsec'[0,1] # => Garbage
However, Regexps and some specialized string methods support at least a small subset of popular encodings, among them some Japanese encodings (e.g. Shift-JIS) and (in this example) UTF-8:
'µsec'.split('')[0] # => 'µ'
'µsec'.split(//u)[0] # => 'µ'
Before Ruby 1.9:
'Hello'[1].chr # => "e"
Ruby 1.9+:
'Hello'[1] # => "e"
A lot has changed in Ruby 1.9 including string semantics.
Should work for Ruby before and after 1.9:
'Hello'[2,1] # => "l"
Please see Jörg Mittag's comment: this is correct only for single-byte character sets.
'abc'[1..1] # => "b"
'abc'[1].chr # => "b"
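One caveat when switching between the single-index form and the start-and-length form: they behave differently at the end of the string. A small sketch, assuming Ruby 1.9 or later:

```ruby
s = "Hello"

in_range_index = s[2]    # "l"
in_range_slice = s[2, 1] # "l"

at_end_index   = s[5]    # nil: there is no character at index 5
at_end_slice   = s[5, 1] # "": a start equal to the length gives an empty string
past_end_slice = s[6, 1] # nil: the start is beyond the end

puts [at_end_index, at_end_slice, past_end_slice].inspect
```

Code that tests the result for nil to detect the end of the string will therefore need an extra check for the empty string when it uses the start-and-length form.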
