I'm a bit confused with the '£' symbol in Ruby.
In JRuby, if I do:
puts '£40'
in a .rb file and run it, I get:
£40
In JRuby IRB I get:
>> pung = 'h40'
=> "h40"
>> pung.gsub!('h', '£')
pung.gsub!('h', '£')
=> "\24340"
The pound symbol is output as \243.
In pure Ruby IRB, I can't even enter the £ symbol. The cursor jumps three spaces to the left when I hit the £ key!
Trying .toutf8 or .toutf16 brings up even stranger characters!
What's going on? Why can't I just output a simple £?
Sometimes this is a problem with the way your console passes the character along. For example, the byte sequence for a Unicode character may include a byte the console uses for backspace or arrow left. This is probably why the IRB console is not receiving your character correctly.
For the script, it looks like JRuby's doing what it's supposed to. The issue with the console should probably be reported as a bug, however, since we do want IRB to support entering Unicode characters. Pop over to JRuby's bug tracker at http://bugs.jruby.org and show a simple session or provide steps to reproduce (which should be easy).
Most likely, the symbol is a Unicode symbol and you are converting it (perhaps unintentionally). If you can't enter the pound sterling symbol, make sure your console supports Unicode.
What do you get when you do £.class ? String? Unicode::String? Perhaps explicitly declaring the character as a Unicode::String or Unicode::Character will give different results.
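'\243' is the octal escape sequence for the byte 0xA3 (decimal 163), which is '£' in ISO-8859-1 (Latin-1). So IRB is most likely showing the raw byte rather than a decoded character.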
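To make the octal escape concrete, here is a minimal Ruby sketch. It assumes the console handed IRB a single Latin-1 byte, which the original session does not confirm; the variable names are illustrative only.

```ruby
# "\243" is an octal escape: 2*64 + 4*8 + 3 = 163 = 0xA3.
raw = "\243".b            # a copy tagged as binary (ASCII-8BIT), one byte long
byte_values = raw.bytes   # the numeric value of that byte

# Assumption: the byte came from an ISO-8859-1 (Latin-1) console.
# Relabel it as Latin-1, then transcode to UTF-8.
pound = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts byte_values.inspect
puts pound.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

If the console really is Latin-1, the transcoded string holds the single code point for the pound sign, which suggests the data is intact and only its display form differs.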
I'm having a couple of issues with the most simple code I ever wrote in Ruby.
Let's get straight to the example:
# encode: utf-8
`chcp 65001`
puts "Write something with accents such as àòèùì, or €"
asd = gets
puts asd
If I run this code I get two issues, and both of them are totally incomprehensible to me at the moment.
The first one is that if I enter something such as à, the program will write a new line and not give anything as output. If I press enter again, then it'll output a couple of unreadable characters.
See the pic here as example:
Second issue: if I type € the terminal will just crash, even before pressing enter. Of course I can't attach a screenshot for this one.
Note that the terminal does not crash if I type € while not executing this program. It only does so if I type it while the program is running.
I tried to change the default terminal encoding from CP850 to UTF-8, but that didn't change anything. So now I'm back to CP850, since UTF-8 support in the terminal is beta. And the chcp 65001 successfully changes the encoding to UTF-8 anyway.
How would you proceed?
EDIT 1:
It appears to me that the chcp 65001 is successfully changing the terminal's encoding. In these screenshots, the first one being taken before executing the program and the second being taken after, you'll see that the "Tabella codici corrente", aka "Current code table", has actually changed from CP850 to UTF-8.
EDIT 2:
This simplified MWE will still give the same result:
puts "Write something with accents such as àòèùì, or €"
asd = gets
puts asd
In using the Page Object gem, I'm trying to pull text from a page to verify error messages. One of these error messages contains double-quotes, but when the page object pulls the text from the page, it pulls some other characters.
expected ["Please select a category other than the Default â?oEMSâ?? before saving."]
to include "Please select a category other than the Default \"EMS\" before saving."
(RSpec::Expectations::ExpectationNotMetError)
I'm not quite sure how to escape these, or where I could use regexes to match these odd characters.
Honestly, you are overcomplicating your validation.
I would recommend simplifying what you are trying to do. Start by asking yourself: is the part in quotes a critical part of your validation?
If it is, isolate it with a substring check (in Ruby, String#include?("EMS")).
If it is not, then you are probably doing too much work. Check only for exactly what you need in validation (in Ruby, String#start_with?):
start_with?("Please select a category other than the Default")
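A minimal Ruby sketch of that narrower check follows. The sample text is a hypothetical stand-in for whatever the page object returns, and the variable names are illustrative only.

```ruby
# Hypothetical page text, with curly quotes around EMS as the browser might render it.
page_text = "Please select a category other than the Default \u201CEMS\u201D before saving."

expected_prefix = "Please select a category other than the Default"

# Check only the parts that matter, so the quote style cannot break the test.
prefix_ok    = page_text.start_with?(expected_prefix)
mentions_ems = page_text.include?("EMS")

puts "prefix ok: #{prefix_ok}, mentions EMS: #{mentions_ems}"
```

This sidesteps the quote characters rather than fixing them, so it is worth pairing with an encoding check if the garbled text also reaches users.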
With respect to the actual issue you are having, on a technical level you have an encoding issue. Encode your result string with utf-8 before you pass it to your validation and you will be fine.
Good luck
It's pretty likely that something along the line encoded the string improperly. (A tipoff is the accented characters followed by ?.) It seems pretty likely that the quotes were converted to "smart quotes" somewhere. This table compares Windows-1252 to UTF-8:
Code Point             Characters         UTF-8 Bytes
Unicode  Windows-1252  Expected  Actual
-------  ------------  --------  ------   -----------
U+201C   0x93          “         â€œ      %E2 %80 %9C
U+201D   0x94          ”         â€       %E2 %80 %9D
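The "Actual" column can be reproduced in Ruby by reading the UTF-8 bytes as if they were Windows-1252. This is a sketch for the left quote only; the variable names are illustrative, and it assumes Windows-1252 was the wrong encoding applied, which is a guess about the page in question.

```ruby
left_quote = "\u201C"  # LEFT DOUBLE QUOTATION MARK

# The UTF-8 bytes, formatted like the last column of the table.
utf8_bytes = left_quote.bytes.map { |b| format("%%%02X", b) }.join(" ")

# Misread the same bytes as Windows-1252, then transcode to UTF-8 for display.
mojibake = left_quote.dup.force_encoding("Windows-1252").encode("UTF-8")

# Reverse the mistake: back to the Windows-1252 bytes, then relabel them as UTF-8.
repaired = mojibake.encode("Windows-1252").force_encoding("UTF-8")

puts utf8_bytes
puts mojibake.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
puts repaired == left_quote
```

The right quote is messier because its last byte, 0x9D, has no character assigned in Windows-1252, so the same round trip may raise an error or need a replacement option. That may also explain the ? characters in the failing expectation.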
What you'll want to do is spot check various places in the code to find the first place the string is encoded in something other than UTF-8:
puts error_str.encoding
(For clarity, error_str is the variable that holds the string you are testing. I'm using puts, but you might want to use another way to log diagnostic messages.)
Once you find the string that's not encoded UTF-8, you can convert it:
error_str.encode('UTF-8')
Or, if the string is hardcoded somewhere, just replace the string.
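As a rough sketch of such a spot check, the helper below reports a string's encoding before and after conversion. The helper name describe_encoding and the sample bytes are hypothetical, and the sketch assumes the text arrived as Windows-1252, which may not match your situation.

```ruby
# Hypothetical helper for spot checks; the name is illustrative only.
def describe_encoding(label, str)
  "#{label}: #{str.encoding.name}, valid=#{str.valid_encoding?}, bytes=#{str.bytesize}"
end

# Assumption: the text arrived as Windows-1252 bytes (0x93 and 0x94 are its curly quotes).
error_str = "Default \x93EMS\x94".b.force_encoding("Windows-1252")

# encode transcodes the bytes; force_encoding above only changed the label.
converted = error_str.encode("UTF-8")

puts describe_encoding("raw", error_str)
puts describe_encoding("converted", converted)
```

Note the difference between the two calls: force_encoding tells Ruby what the bytes already are, while encode rewrites the bytes into the target encoding.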
For more debugging advice, see: "3 Steps to Fix Encoding Problems in Ruby" and "How to Get From Theyâ€™re to They’re".
Ruby Version: ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-darwin14]
Readline Version: 6.2
I'm working with some emojis and most of them behave correctly, with two exceptions: the 🌭 and 🍾 emojis. Here is some terminal output:
(byebug) "🌭"
"\u{1F32D}"
(byebug) "🛍"
"🛍"
(byebug) "🍾"
"\u{1F37E}"
Can someone tell me what's going on here? Is it just some encoding screwiness with irb? I might be snow-blind since I've been wrestling with this for so long so if there's any more information required to answer this please let me know.
Ruby may show a string with various backslash encodings for various reasons, one of which is irregular characters. For example:
"
"
# => "\n"
'"'
# => "\""
This doesn't mean the string contains an actual backslash, but rather that the version shown by inspect contains one. This is a long tradition dating back at least to the era of C in the 1970s where \n and such have been understood to mean "newline character".
In the case of emoji you might find that some are displayed and others aren't. This may be an interaction between the version of Ruby you're using and the terminal settings. New emoji are introduced constantly, so older ones may display properly while inspect escapes newer ones that Ruby's Unicode tables don't know yet, perhaps out of concern that they are invalid or unprintable characters. That would fit here: 🛍 (U+1F6CD) was added in Unicode 7.0, while 🌭 (U+1F32D) and 🍾 (U+1F37E) arrived in Unicode 8.0, which is newer than the Unicode data Ruby 2.2 ships with. Rather than showing something blank or the infamous question mark character, it shows the literal code for the character.
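A short Ruby sketch can confirm that the escaped form is only a display choice. It avoids relying on what inspect prints for the emoji, since that varies by Ruby version; the variable names are illustrative.

```ruby
hot_dog = "\u{1F32D}"

# However inspect chooses to render it, the string holds exactly one character.
char_count    = hot_dog.length
code_points   = hot_dog.codepoints.map { |cp| format("U+%X", cp) }
byte_count    = hot_dog.bytesize
has_backslash = hot_dog.include?("\\")

# The same idea with a newline: inspect shows a backslash the string does not contain.
newline_inspect = "\n".inspect

puts "#{char_count} char, #{byte_count} bytes, #{code_points.join(' ')}"
puts "contains a backslash: #{has_backslash}"
puts newline_inspect
```

Writing the string with puts, or to a file or database, sends the real character bytes, not the escape sequence.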
I have a string containing a special character such as "\u2012", i.e. FIGURE DASH. When I try to print it on the console, I get a '?' mark instead of the symbol. I have an editor where I can insert the symbol using alt+numpad, like alt+2012. In the editor I can see the symbol, but when I save it in an XML file and read the value back using nodevalue, I get a '?' mark.
To summarize, I am having trouble reading characters from the Latin Extended-A range. What I need is that when I insert such symbols and read them back, I get something like &#xXXXX;.
Please help!
TIA :)
Put simply: I have a String inpath = "À"; and I want to get its Unicode value, like &#xXXXX;
The default console encoding in Windows is an MS-DOS code page, and those code pages don't support the character. You can try running chcp 65001 before running the program, but you might also need to change the console font.
You don't need to do anything you wouldn't do with any other character, as long as you use UTF-8. You aren't doing that in many places. You need to explicitly write in your code to save and read the file in UTF-8, and not rely on the platform default encoding.
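The question above is about Java, but to stay consistent with the other examples here, this is a Ruby sketch of the &#xXXXX; form being asked for. The helper name to_numeric_refs is hypothetical; the same idea (walk the code points, escape anything non-ASCII) carries over to other languages.

```ruby
# Hypothetical helper: replace each non-ASCII character with a hex
# numeric character reference, leaving ASCII characters untouched.
def to_numeric_refs(str)
  str.encode("UTF-8").each_char.map do |ch|
    ch.ord < 128 ? ch : format("&#x%04X;", ch.ord)
  end.join
end

inpath      = "\u00C0"    # LATIN CAPITAL LETTER A WITH GRAVE
figure_dash = "a\u2012b"  # FIGURE DASH between two ASCII letters

puts to_numeric_refs(inpath)
puts to_numeric_refs(figure_dash)
```

This only works if the string was decoded correctly in the first place. If the file was read with the platform default encoding, the character may already have been replaced by '?' before it reaches a helper like this.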
I have a string in UTF-8 (according to the .encoding.name & .valid_encoding?) and there's an escaped unicode character in it (\u009A)
"Hammarskj\u009Ald"
This SHOULD print out as "Hammarskjšld", but it just drops the grapheme, e.g.:
puts "Hammarskj\u009Ald"
p "Hammarskj\u009Ald"
Results in the text:
Hammarskjld
"Hammarskj\u009Ald"
It also drops it when it's saved to the database. I've searched for a while, but I can't quite figure out how to unescape it (which is what I THINK I need to do). A lot of the info out there is for 1.8.7, and some of the stuff for 1.9.2 isn't quite what I need.
Anyone have any idea on how to do what I want? I seem to have a valid UTF-8 string, that all I want to do is save in the database (intact), but it always drops the escaped unicode.
Are you sure it's dropped, and not just not displayed? Maybe it is just the problem of your font having a non-displaying zero-width character in that code point.
When you take it out of the database and p or inspect it, seeing the escaped character means it's there, not dropped. It's the printing that's the issue.
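One way to check this from Ruby is to look at the code points instead of the printed text. The sketch below also shows a possible repair, under the loud assumption that the C1 control code point is really a Windows-1252 byte that was copied through unchanged; the asker's expected "š" hints at that, but it is a guess.

```ruby
name = "Hammarskj\u009Ald"

# U+009A is a C1 control character: present in the string, but usually invisible.
present = name.codepoints.include?(0x9A)
lengths = [name.length, name.bytesize]

# Assumption: each C1 code point is a Windows-1252 byte that was never decoded.
# Turn it back into a single byte, label it Windows-1252, and transcode to UTF-8.
fixed = name.gsub(/[\u0080-\u009F]/) do |ch|
  [ch.ord].pack("C").force_encoding("Windows-1252").encode("UTF-8")
end

puts "control character present: #{present}"
puts "characters and bytes: #{lengths.inspect}"
puts fixed.codepoints.map { |cp| format("U+%04X", cp) }.join(" ")
```

A few bytes in that range have no Windows-1252 character, so a production version would need to handle conversion errors; this sketch does not.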