Getting ASCII value of ith character instead of character in Ruby

I'm a beginner with Ruby and I'm trying to get a certain character in a string in Ruby like so:
string1 = "ohhideraaaa"
puts string1[0]
and it's returning 111 rather than "o". I'm sure I'm doing something really stupid; does anyone have any idea what it is?

I think the best fix is to upgrade to Ruby 1.9.3, the current stable release. You are apparently using 1.8.x, where an expression like yours returns the code of the character at that position; in 1.9.x, it returns a substring of one character at that position, which is what you want.
If upgrading is not an option, or if you would prefer to stick with Ruby 1.8.x, you can persuade it to give you a substring rather than a character code by specifying a length as well (in your case, 1):
puts string1[0,1]
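As a small sketch of the start-and-length form, generalized to any index: the helper name char_at is hypothetical (it is not part of Ruby), and on 1.8 the length counts bytes, so this assumes ASCII text there.

```ruby
# Hypothetical helper: returns the character at index i as a String.
# str[i, 1] means "start at index i, take 1 character" (1 byte on Ruby 1.8).
def char_at(str, i)
  str[i, 1]
end

string1 = "ohhideraaaa"
puts char_at(string1, 0)   # o
puts char_at(string1, 3)   # i
```

An index past the end of the string gives nil rather than raising, so callers may want to check for that.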

Use .chr:
puts string1[0].chr
The .chr method converts an ASCII integer value back to a character.
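A minimal sketch of why this call should be safe on either side of the version change: Integer#chr turns a character code into a one-character string, and on Ruby 1.9 or later String#chr returns the first character of a string, so the extra .chr does no harm there.

```ruby
code = 111            # what string1[0] returns on Ruby 1.8
letter = code.chr     # Integer#chr: character code -> one-character string
puts letter           # o

# On Ruby 1.9+, the index already yields a String; String#chr returns
# its first character, so the same expression still prints the letter.
puts "ohhideraaaa"[0].chr   # o
```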

Related

printing the first letter of a variable returns a number?

I'm fairly new to Ruby and was mucking around with the basics when I came across a problem: when I tried to print the first letter of a variable, it printed a number instead.
The code was:
name = "Max"
print name[0]
Instead of printing the letter M, it printed 77.
Could someone please tell me what I did wrong?
The behaviour of this operator is different across various versions of Ruby. You're probably using an older one, in which case this is to be expected.
Here's an excerpt from the docs for Ruby 1.8.7's String class:
If passed a single Fixnum, returns the code of the character at that position.
This changed in newer versions of Ruby (1.9.x and above), which return the character as a one-character String. From the docs for Ruby 2.1.0:
If passed a single index, returns a substring of one character at that index.
Ruby 1.9.3, which I happen to have installed on the machine I'm using, displays exactly the same behavior:
"Mwada"[0]
=> "M"
"Mwada"[0].class
=> String
If it's Ruby 1.8.x, call #chr on the number representing the character:
"Mwada"[0].chr # => "M"
If it's Ruby 1.9.x or above, everything will work as you would expect:
"Mwada"[0] # => "M"
Humm, I ran your question through irb (interactive ruby console) and got 'M' when looking for name[0]. You can open irb by simply typing irb from command line and test this for yourself.
irb > name = "Max"
=> "Max"
irb > print name[0]
M => nil
Can you tell me more about the context in which you requested name[0]? Could name have been reassigned to something else? Are you calling .to_i (convert to integer) anywhere in your code?
I just checked this in a Ruby book that covers both Ruby 1.8 and Ruby 1.9: The Ruby Programming Language by David Flanagan & Yukihiro Matsumoto.
The book says: "In Ruby 1.8, a string is like an array of bytes or 8-bit character codes
s = 'hello' # Ruby 1.8
s[0] # 104: the ASCII character code for the first character 'h'
Ruby 1.9 returns single-character strings rather than character codes
s = 'hello' # Ruby 1.9
s[0] # 'h': the first character of the string, as a string"
(please note some text was left out in the quote above)
As for your question directly: I also tested the String#bytes method (with to_a) on 'Max' in my Ruby 1.9 environment.
print name.bytes.to_a
[77, 97, 120] => nil
and it printed the ASCII codes for 'Max'; 77 is the ASCII code for 'M'.
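To make the byte/character distinction concrete, here is a short sketch for Ruby 1.9 or later. The accented example is my own addition; it is written with a "\u" escape so the snippet does not depend on the source file's encoding.

```ruby
name = "Max"
p name.bytes.to_a      # [77, 97, 120]
p name.chars.to_a      # ["M", "a", "x"]

# Bytes and characters stop lining up once a character needs more
# than one byte. "\u00E9" (e with acute accent) takes two bytes in UTF-8.
accented = "\u00E9"
p accented.bytes.to_a  # [195, 169]
p accented.length      # 1
```

For plain ASCII text such as "Max", each character is exactly one byte, which is why the byte values and the ASCII codes coincide.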
I am a fairly new Ruby programmer as well and still learning the language, so I have found the book above worthwhile. So far I have only managed to read the first 70 pages or so, but I'll definitely try to finish it :-)

Split utf8 string regardless of ruby version

str = "é-du-Marché"
I get the first char via
str.split(//).first
How can I get the rest of the string regardless of my Ruby version?
String does not have a method first, so you need a split in addition. When you do the split in Unicode mode (specifically UTF-8), you have access to the first (and the other) characters.
My solution:
puts RUBY_VERSION
str = "é-du-Marché"
p str.split(//u, 2)
Test with ruby 1.9.2:
1.9.2
["\u00E9", "-du-March\u00E9"]
Test with ruby 1.8.6:
1.8.6
["\303\251", "-du-March\303\251"]
With first and last you get your results:
str.split(//u, 2).first is the first character
str.split(//u, 2).last is the string after the first character.
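The two calls can be wrapped into one step. This is only a sketch: head_and_rest is a hypothetical helper name, and the string is written with "\u" escapes (Ruby 1.9+) to keep the snippet ASCII-only; on 1.8 you would type the characters literally, as in the answer above.

```ruby
# Hypothetical helper: splits a UTF-8 string into [first_character, rest].
# The limit of 2 stops split after the first character boundary.
def head_and_rest(str)
  head, rest = str.split(//u, 2)
  [head || "", rest || ""]     # split returns fewer parts for short strings
end

str = "\u00E9-du-March\u00E9"
head, rest = head_and_rest(str)
puts head.length   # 1
puts rest.length   # 10
```

The `|| ""` fallbacks are a design choice of this sketch so that empty and one-character strings return strings instead of nil.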
str[1..-1] should normally return everything after the first character.
1..-1 is a range: the first number is the starting index, which is set to 1 to skip the first character, and the second is the ending index, which is set to -1 so Ruby counts from the back and stops at the last character.
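A quick sketch contrasting the range form with the start-and-length form, since the two are easy to mix up (plain ASCII input, so it behaves the same on 1.8 and 1.9+):

```ruby
s = "hello"
p s[1..-1]   # "ello"  range: from index 1 through the last character
p s[1, 3]    # "ell"   start index 1, length 3
p s[-1, 1]   # "o"     a negative start counts from the end
```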
Note that multibyte characters only work this way in Ruby 1.9. If you wish to mimic this behavior in older versions, you'll have to loop over the bytes yourself and figure out what needs to be removed from the data, because Ruby 1.8 does not support this.
UPDATE:
You could try this as well, but I can't guarantee that it will work for every multibyte char:
str = "é-du-Marché"
substring = str.mb_chars[1..-1]
mb_chars returns a proxy object that directs the call to the appropriate implementation when dealing with UTF-8, UTF-32 or UTF-16 encoded (i.e. multibyte) characters.
More detailed info can be found here: http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html
But I do not know whether this exists in older Rails versions.
UPDATE2:
Ruby 1.8 treats any string as just a bunch of bytes; calling size() on it returns the number of bytes used to store the data. To split the string into characters regardless of the encoding, try this:
char_array = str.scan(/./m)
substring = char_array[1..-1].join
This should normally do the trick. The article at http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18 explains how to treat multibyte data in older Ruby versions.
UPDATE3:
Playing around with the scan and join operations brings me closer to your problem and its solution. I honestly don't have the time to get the full solution working, but if you use scan(/./mu), the u option makes the regex treat the string as UTF-8, which is supported by all Ruby versions.
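Putting the scan and join steps together, a sketch might look like the following. The name rest_of is hypothetical, the input is assumed to be UTF-8, and I have only reasoned about it against 1.9+ semantics.

```ruby
# Hypothetical helper: everything after the first character of a UTF-8 string.
def rest_of(str)
  chars = str.scan(/./mu)       # one array element per character; m lets . match newlines
  (chars[1..-1] || []).join     # drop the first element, glue the remainder back together
end

puts rest_of("hello")                          # ello
puts rest_of("\u00E9-du-March\u00E9").length   # 10
```

The `|| []` guard covers the empty string, where slicing from index 1 returns nil instead of an empty array.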

Is this the best way to unescape unicode escape sequences in Ruby?

I have some text that contains Unicode escape sequences like \u003C. This is what I came up with to unescape it:
string.gsub(/\u(....)/) {|m| [$1].pack("H*").unpack("n*").pack("U*")}
Is it correct? (i.e. it seems to work with my tests, but can someone more knowledgeable find a problem with it?)
Your regex, /\u(....)/, has some problems.
First of all, \u doesn't work the way you think it does. In 1.9 you'll get an error, and in 1.8 it will just match a single u rather than the \u pair that you're looking for. You should use /\\u/ to find the literal \u that you want.
Secondly, your (....) group is much too permissive: it will allow any four characters through, and that's not what you want. In 1.9, you want (\h{4}) (four hexadecimal digits), but in 1.8 you'd need ([\da-fA-F]{4}) because \h is new in 1.9.
So if you want your regex to work in both 1.8 and 1.9, you should use /\\u([\da-fA-F]{4})/. This gives you the following in 1.8 and 1.9:
>> s = 'Where is \u03bc pancakes \u03BD house? And u1123!'
=> "Where is \\u03bc pancakes \\u03BD house? And u1123!"
>> s.gsub(/\\u([\da-fA-F]{4})/) {|m| [$1].pack("H*").unpack("n*").pack("U*")}
=> "Where is μ pancakes ν house? And u1123!"
Using pack and unpack to mangle the hex number into a Unicode character is probably good enough but there may be better ways.
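To show what the pack/unpack chain is doing, here is a step-by-step sketch, followed by one possible shorter variant that parses the hex digits with String#hex. The helper name unescape_unicode is hypothetical, and the sketch only handles four-digit escapes; it makes no attempt to combine surrogate pairs.

```ruby
hex   = "03bc"
bytes = [hex].pack("H*")      # two raw bytes: 0x03 0xBC
codes = bytes.unpack("n*")    # [956]: the bytes read as a big-endian 16-bit integer
char  = codes.pack("U*")      # UTF-8 string for code point 956 (Greek small mu)

# A shorter route for a single escape: parse the hex digits directly.
short = [hex.hex].pack("U")
p short == char               # true

# Hypothetical helper applying the shorter route to a whole string.
def unescape_unicode(text)
  text.gsub(/\\u([\da-fA-F]{4})/) { [$1.hex].pack("U") }
end

p unescape_unicode('pancakes \u03BD house') == "pancakes \u03BD house"   # true
```

In the last line the single-quoted argument contains a literal backslash and u, while the double-quoted string on the right is the already-decoded character.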

Ruby hexacode to unicode conversion

I crawled a website that contains Unicode, and the results look something like this when written as code:
a = "\\u2665 \\uc624 \\ube60! \\uc8fd \\uae30 \\uc804 \\uc5d0"
How do I convert it in Ruby back to the original Unicode text, in UTF-8?
If you have ruby 1.9, you can try:
a.force_encoding('UTF-8')
Otherwise, if you have a version older than 1.9, I'd suggest reading up on converting to UTF-8 in Ruby 1.8.
Short answer: you should be able to 'puts a' and see the string printed out. For me, at least, that string prints in both 1.8.7 and 1.9.2.
Long answer:
First thing: it depends on whether you're using Ruby 1.8.7 or 1.9.2, since the way strings and encodings are handled changed.
In 1.8.7:
Strings are just lists of bytes. When you print them out, if your OS can handle it, you can just 'puts a' and it should work correctly. If you do a[0], you'll get the first byte. If you want to get each character, things are pretty darn tricky.
In 1.9.2:
Strings are lists of bytes with an encoding. If the webpage was sent with the correct encoding, your string should already be encoded correctly. If not, you'll have to set it (as per Mike Lewis's answer). If you do a[0], you'll get the first character (the heart). If you want each byte, you can do a.bytes.
If your OS, for whatever reason, is giving you those literal ASCII characters, my previous answer is obviously invalid; disregard it. :P
Here's what you can do:
a.gsub(/\\u([0-9a-fA-F]{4})/){|p| [$1.to_i(16)].pack("U")}
This will scan for the literal ASCII text '\u' followed by four hexadecimal digits, and replace it with the corresponding Unicode character.
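A sketch of the difference between relabeling and decoding, for Ruby 1.9 or later. The helper name decode_escapes is hypothetical, and the strict four-hex-digit pattern is my assumption about what the crawled escapes look like.

```ruby
# Hypothetical helper: turns literal \uXXXX text into the characters it names.
def decode_escapes(text)
  text.gsub(/\\u([0-9a-fA-F]{4})/) { [$1.to_i(16)].pack("U") }
end

a = "\\u2665 \\uc624"     # literal backslash-u text, as it came from the crawl

# force_encoding only changes the encoding label; the backslashes stay put.
relabeled = a.dup.force_encoding("UTF-8")
p relabeled == a          # true

decoded = decode_escapes(a)
p decoded.length          # 3 (heart, space, one Hangul syllable)
```

So when the data really contains the six ASCII characters of each escape, setting the encoding alone cannot help; the escapes have to be rewritten.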
You can also specify the encoding when you open a new IO object: http://www.ruby-doc.org/core/classes/IO.html#M000889
Compared to Mike's solution, this may prevent troubles if you forget to force the encoding before exposing the string to the rest of your application, if there are multiple mechanisms for retrieving strings from your module or class. However, if you begin crawling SJIS or KOI-8 encoded websites, then Mike's solution will be easier to adapt for the character encoding name returned by the web server in its headers.
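For the IO route, a minimal sketch on Ruby 1.9 or later could look like this. It uses the "r:ENCODING" mode string of File.open; the helper name read_as and the temporary file are made up for the illustration.

```ruby
require "tempfile"

# Hypothetical helper: reads a file and tags its contents with the given encoding.
def read_as(path, encoding_name)
  File.open(path, "r:#{encoding_name}") { |io| io.read }
end

tmp = Tempfile.new("crawl")
tmp.binmode
tmp.write([0xE2, 0x99, 0xA5].pack("C*"))   # raw UTF-8 bytes for U+2665
tmp.close

text = read_as(tmp.path, "UTF-8")
puts text.encoding.name    # UTF-8
p text == "\u2665"         # true
tmp.unlink
```

Passing the encoding name as a parameter leaves room for the SJIS or KOI-8 cases mentioned above, where the name would come from the server's response headers.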

Getting an ASCII character code in Ruby using `?` (question mark) fails

I'm in a situation where I need the ASCII value of a character (for Project Euler question #22, if you want to get specific) and I'm running into an issue.
Being new to ruby, I googled it, and found that ? was the way to go: ?A or whatever. But when I incorporate it into my code, the result of that statement is the string "A"—no character code. Same issue with [0] and slice(0), both of which should theoretically return the ASCII code.
The only thing I can think of is that this is a Ruby version issue. I'm using 1.9.1-p0, having upgraded from 1.8.6 this afternoon. I cheated a little: since I was upgrading from a working version of Ruby in the same directory, I figured I probably already had the files that don't come bundled with the .zip file, so I didn't download them.
So why exactly are all my ASCII codes being turned into actual characters?
Ruby before 1.9 treated characters somewhat inconsistently. ?a and "a"[0] would return an integer representing the character's ASCII value (which was usually not the behavior people were looking for), but in practical use characters would normally be represented by a one-character string. In Ruby 1.9, characters are never mysteriously turned into integers. If you want to get a character's ASCII value, you can use the ord method, like ?a.ord (which returns 97).
How about
"a"[0].ord
for 1.8/1.9 portability.
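One caveat with that one-liner: as far as I know, Integer#ord only exists from Ruby 1.8.7 on, so on 1.8.6 the call would fail. A slightly more defensive sketch (char_code is a hypothetical name) checks for the method first:

```ruby
# Hypothetical helper: character code of the first character of str.
def char_code(str)
  c = str[0]                        # Integer on 1.8, String on 1.9+
  c.respond_to?(:ord) ? c.ord : c   # fall back to the value itself if ord is missing
end

puts char_code("a")   # 97
puts char_code("A")   # 65
```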
Ruby Programming/ASCII
In Ruby versions before 1.9, you can use the question-mark syntax:
?a
From 1.9 on, use ord instead:
'a'.ord
For both 1.8 and 1.9:
?a.class == String ? ?a.ord : ?a
or
"a".class == String ? "a".ord : "a"[0]
Found the solution: "string".ord returns the ASCII code of its first character, s.
Looks like the methods I had found no longer behave that way in the 1.9 series of Ruby.
If you read question 22 from Project Euler again, you'll find that you are not looking for the ASCII values of the characters. What the question wants for the character "A", for example, is 1, its position in the alphabet, whereas "A" has an ASCII value of 65.
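The alphabet position can still be derived from the character code by subtracting the code of "A". A rough sketch for Ruby 1.9 or later, assuming uppercase A to Z only (both helper names are hypothetical):

```ruby
# Hypothetical helper: 1-based alphabet position of an uppercase ASCII letter.
def alphabet_position(letter)
  letter.ord - "A".ord + 1
end

# Hypothetical helper: sum of the alphabet positions of every letter in a name.
def name_score(name)
  name.each_char.map { |ch| alphabet_position(ch) }.inject(0) { |sum, n| sum + n }
end

puts alphabet_position("A")   # 1
puts name_score("COLIN")      # 53
```

For "COLIN" the sum is 3 + 15 + 12 + 9 + 14 = 53.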
