Getting an ASCII character code in Ruby using `?` (question mark) fails - ruby

I'm in a situation where I need the ASCII value of a character (for Project Euler question #22, if you want to get specific) and I'm running into an issue.
Being new to ruby, I googled it, and found that ? was the way to go: ?A or whatever. But when I incorporate it into my code, the result of that statement is the string "A"—no character code. Same issue with [0] and slice(0), both of which should theoretically return the ASCII code.
The only thing I can think of is that this is a ruby version issue. I'm using 1.9.1-p0, having upgraded from 1.8.6 this afternoon. I cheated a little going from a working version of Ruby, in the same directory, I figured I probably already had the files that don't come bundled with the .zip file, so I didn't download them.
So why exactly are all my ASCII codes being turned into actual characters?

Ruby before 1.9 treated characters somewhat inconsistently. ?a and "a"[0] would return an integer representing the character's ASCII value (which was usually not the behavior people were looking for), but in practical use characters would normally be represented by a one-character string. In Ruby 1.9, characters are never mysteriously turned into integers. If you want to get a character's ASCII value, you can use the ord method, like ?a.ord (which returns 97).

How about
"a"[0].ord
for 1.8/1.9 portability.

Ruby Programming/ASCII
In previous ruby version before 1.9, you can use question-mark syntax.
?a
After 1.9, we use ord instead.
'a'.ord

For 1.8 and 1.9
?a.class == String ? ?a.ord : ?a
or
"a".class == String ? "a".ord : "a"[0]

Found the solution. "string".ord returns the ascii code of s.
Looks like the methods I had found are broken in the 1.9 series of ruby.

If you read question 22 from project Euler again you'll find you you are not looking for the ASCII values of the characters. What the question is looking for, for the character "A" for example is 1, its position in the alphabet where as "A" has an ASCII value of 65.

Related

printing the first letter of a variable returns a number?

I'm fairly new to Ruby and was mucking around with the basics and came across a problem.
Which was that when I tried to print the first letter of a variable it printed a number instead.
the code was.
name = "Max"
print name[0]
but instead of printing the letter M, it would print 77?
could someone please tell me what I did wrong?
The behaviour of this operator is different across various versions of Ruby. You're probably using an older one, in which case this is to be expected.
Here's an excerpt from the docs for Ruby 1.8.7's String class
If passed a single Fixnum, returns the code of the character at that
position.
This has been changed and the newer versions of Ruby (1.9.x and above, according to this site ) simply print the character as a String. See the docs for Ruby 2.1.0.
If passed a single index, returns a substring of one character at that
index.
Ruby 1.9.3, which I happen to have installed on the machine I'm using displays exactly the same behavior:
"Mwada"[0]
=> "M"
"Mwada"[0].class
=> String
If it's ruby 1.8.x, run #chr on a number representing the character:
"Mwada"[0].chr # => "M"
If it's ruby 1.9.x and above, everything will work as you would expect it to:
"Mwada"[0] # => "M"
Humm, I ran your question through irb (interactive ruby console) and got 'M' when looking for name[0]. You can open irb by simply typing irb from command line and test this for yourself.
irb > name = "Max"
=> "Max"
irb > print name[0]
M => nil
Can you tell me more about the context in which you requested name[0]? Could name have been reassigned to something else? Are you calling .to_i (convert to integer) anywhere in your code?
I just checked this issue from a Ruby book that includes information on Ruby 1.8 and Ruby 1.9. The book is called The Ruby Programming Language Book by David Flanagan & Yukihiro Matsumoto.
Well the book says: "In Ruby 1.8, a string is like an array of bytes or 8-bit character codes
s = 'hello' # Ruby 1.8
s[0] # 104: the ASCII character code for the first character 'h'
Ruby 1.9 returns single-character strings rather than character
s = 'hello' # Ruby 1.9
s[0] # 'h': the first character of the string, as a string
(please note some text was left out in the quote above)
In relation to your question directly. I also tested String.bytes.to_a method on
'Max', in my Ruby 1.9 environment.
print name.bytes.to_a
[77, 97, 120] => nil
and it printed the ASCII codes for 'Max', 77 is the ASCII code for 'M'
ASCII Codes
I am quite new Ruby programmer as well. I am also learning Ruby so I have found the book above worthwhile, although I have so far managed to read only the first 70 pages or so, I'll definitely try to finish the book :-)

Getting ASCII value of ith character instead of character in Ruby

I'm a beginner with Ruby and I'm trying to get a certain character in a string in Ruby like so:
string1 = "ohhideraaaa"
puts string1[0]
and it's returning 111 rather than "o". I'm sure I'm doing something really stupid, does anyone have any idea what it is?
I think the best fix is to upgrade to Ruby 1.9.3, the current stable release; you are apparently using 1.8.x, where an expression like yours returns the code of the character at that position, but in 1.9.x, it returns a substring of one character at that position, which is what you want.
If upgrading is not an option, or if you would prefer to stick with Ruby 1.8.x, you can persuade it to give you a substring rather than a character-code by specifying a length as well (in your case, 1):
puts string1[0,1]
Use .chr:
puts string1[0].chr
The .chr method converts an ASCII integer value back to a character.

Split utf8 string regardless of ruby version

str = "é-du-Marché"
I get the first char via
str.split(//).first
How I can get the rest of the string regardless of my ruby version ?
String does not have a method first. So you need in addition a split. When you do the split in unicode-mode (exactly utf-8) you have acces to the first (and other characters).
My solution:
puts RUBY_VERSION
str = "é-du-Marché"
p str.split(//u, 2)
Test with ruby 1.9.2:
1.9.2
["\u00E9", "-du-March\u00E9"]
Test with ruby 1.8.6:
1.8.6
["\303\251", "-du-March\303\251"]
With first and last you get your results:
str.split(//u, 2).first is the first character
str.split(//u, 2).last is the string after the first character.
str[1..-1] should return you everything after the first digit normally.
The first number is the starting index, which is set to 1 to skip the first digit, the second is the length, which is set to -1, so ruby counts from the back
Note: that multibyte characters only work in Ruby 1.9. If you wish to mimic this behavior downwards, you'll have to loop over the bytes yourself and figure out what needs to be removed from the data, cause Ruby 1.8 does not support this.
UPDATE:
You could try this as well, but I can't guarantee that it will work for every multibyte char:
str = "é-du-Marché"
substring = str.mb_chars[1..-1]
the mb_chars is a proxy class that directs the call to the appropiate implementation when dealing with UTF-8, UTF-32 or UTF-16 encoding of characters (e.g. multibyte chars).
More detailed info can be found here : http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html
But I do not know if this exists in older rails versions
UPDATE2:
Ruby 1.8 treats any string just as a bunch of bytes, calling size() on it will return the amount of bytes that is used to store the data. To determine the characters regardless of the encoding try this:
char_array = str.scan(/./m)
substring = char_array[1..-1].join
This should do the trick normally. Try looking at http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18 who explains how to treat multibyte data in older ruby versions.
EDIT3:
Playing around with the scan & join operations brings me closer to your problem & solution. I honestly don't have the time at to get the full solution working but if you play with the scan(/./mu) options you convert it to utf-8, which is supported by all ruby versions.

Ruby hexacode to unicode conversion

I crawled a website which contains unicode, an the results look something like, if in code
a = "\\u2665 \\uc624 \\ube60! \\uc8fd \\uae30 \\uc804 \\uc5d0"
May I know how do I do it in Ruby to convert it back to the original Unicode text which is in UTF-8 format?
If you have ruby 1.9, you can try:
a.force_encoding('UTF-8')
Otherwise if you have < 1.9, I'd suggest reading this article on converting to UTF-8 in Ruby 1.8.
short answer: you should be able to 'puts a', and see the string printed out. for me, at least, I can print out that string in both 1.8.7 and 1.9.2
long answer:
First thing: it depends on if you're using ruby 1.8.7, or 1.9.2, since the way strings and encodings were handled changed.
in 1.8.7:
strings are just lists of bytes. when you print them out, if your OS can handle it, you can just 'puts a' and it should work correctly. if you do a[0], you'll get the first byte. if you want to get each character, things are pretty darn tricky.
in 1.9.2
strings are lists of bytes, with an encoding. If the webpage was sent with the correct encoding, your string should already be encoded correctly. if not, you'll have to set it (as per Mike Lewis's answer). if you do a[0], you'll get the first character (the heart). if you want each byte, you can do a.bytes.
If your OS, for whatever reason, is giving you those literal ascii characters,my previous answer is obviously invalid, disregard it. :P
here's what you can do:
a.gsub(/\\u([a-z0-9]+)/){|p| [$1.to_i(16)].pack("U")}
this will scan for the ascii string '\u' followed by a hexadecimal number, and replace it with the correct unicode character.
You can also specify the encoding when you open a new IO object: http://www.ruby-doc.org/core/classes/IO.html#M000889
Compared to Mike's solution, this may prevent troubles if you forget to force the encoding before exposing the string to the rest of your application, if there are multiple mechanisms for retrieving strings from your module or class. However, if you begin crawling SJIS or KOI-8 encoded websites, then Mike's solution will be easier to adapt for the character encoding name returned by the web server in its headers.

What options do exist now to implement UTF8 in Ruby and RoR?

Following the development of Ruby very closely I learned that detailed character encoding is implemented in Ruby 1.9. My question for now is: How may Ruby be used at the moment to talk to a database that stores all data in UTF8?
Background: I am involved in a new project where Ruby/RoR is at least an option. But the project needs to rely on an internationalized character set (it's spread over many countries), preferably UTF8.
So how do you deal with that? Thanks in advance.
Ruby 1.8 works fine with UTF-8 strings for basic operations with the strings. Depending on your application's need, some operations will either not work or not work as expected.
Eg:
1) The size of strings will give you bytes, not characters since the mult-byte support is not there yet. But do you need to know the size of your strings in characters?
2) No splitting a string at a character boundary. But do you need this? Etc.
3) Sorting order will be funky if sorted in Ruby. The suggestion of using the db to sort is a good idea.
etc.
Re poster's comment about sorting data after reading from db: As noted, results will probably not match users' expectations. So the solution is to sort on the db. And it will usually be faster, anyhow--databases are designed to sort data.
Summary: My Ruby 1.8.6 RoR app works fine with international Unicode characters processed and stored as UTF-8 on modern browsers. Right to left languages work fine too. Main issues: be sure that your db and all web pages are set to use UTF-8. If you already have some data in your db, then you'll need to go through a conversion process to change it to UTF-8.
Regards,
Larry
"Unicode ahoy! While Rails has always been able to store and display unicode with no beef, it’s been a little more complicated to truncate, reverse, or get the exact length of a UTF-8 string. You needed to fool around with KCODE yourself and while plenty of people made it work, it wasn’t as plug’n’play easy as you could have hoped (or perhaps even expected).
So since Ruby won’t be multibyte-aware until this time next year, Rails 1.2 introduces ActiveSupport::Multibyte for working with Unicode strings. Call the chars method on your string to start working with characters instead of bytes." Click Here for more
Although I haven't tested it, the character-encodings library (currently in alpha) adds methods to the String class to handle UTF-8 and others. Its page on RubyForge is here. It is designed for Ruby 1.8.
It is my experience, however, that, using Ruby 1.8, if you store data in your database as UTF-8, Ruby will not get in the way as long as your character encoding in the HTTP header is UTF-8. It may not be able to operate on the strings, but it won't break anything. Example:
file.txt:
¡Hola! ¿Como estás? Leí el artículo. ¡Fue muy excellente!
Pardon my poor Spanish; it was the best example of Unicode I could come up with.
in irb:
str = File.read("file.txt")
=> "\302\241Hola! \302\277Como est\303\241s? Le\303\255 el art\303\255culo. \302\241Fue muy excellente!\n"
str += "Foo is equal to bar."
=> "\302\241Hola! \302\277Como est\303\241s? Le\303\255 el art\303\255culo. \302\241Fue muy excellente!\nFoo is equal to bar."
str = " " + str + " "
=> " \302\241Hola! \302\277Como est\303\241s? Le\303\255 el art\303\255culo. \302\241Fue muy excellente!\nFoo is equal to bar. "
str.strip
=> "\302\241Hola! \302\277Como est\303\241s? Le\303\255 el art\303\255culo. \302\241Fue muy excellente!\nFoo is equal to bar."
Basically, it will just treat the UTF-8 as ASCII with odd characters in it. It will not sort lexigraphically if the code points are out of order; however, it will sort by code point. Example:
"\302" <=> "\301"
=> -1
How much are you planning on operating on the data in the Rails app, anyway? Most sorting etc. is usually done by your database engine.

Resources