Suppose I have a string "this\n\tis \"helpful\"" and I'd like it to be displayed in the terminal, unescaped, for copy/paste reasons, i.e.
this
is "helpful"
Is this possible in a terminal, either in IRB or otherwise?
11:15:14lasto1.9.3 ~/clients 🐤 irb
1.9.3-p448 :001 > s = "this\n\tis \"helpful\""
=> "this\n\tis \"helpful\""
1.9.3-p448 :002 > puts s
this
is "helpful"
=> nil
1.9.3-p448 :003 >
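The transcript above comes down to the difference between the echoed return value and what puts writes. IRB shows return values through inspect, which is where the escapes come from, while puts writes the characters themselves. A minimal sketch (the variable names are only for illustration):

```ruby
s = "this\n\tis \"helpful\""

# IRB echoes return values via #inspect, which re-escapes control
# characters and double quotes.
escaped = s.inspect

# puts writes the string's characters as they are, so the newline and
# tab are rendered instead of being shown as \n and \t.
puts s
```

Outside IRB, ruby -e with the same puts call should behave the same way, since puts does not depend on the interactive shell.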
I'm running the Ruby irb on a DOS environment.
I've defined a dictionary.
irb(main):001:0> stuff = {'name'=> 'Zed', 'age'=>36, 'height'=>6*12+2}
I've made a mistake in calling it
irb(main):004:0> puts stuff['age]
The Ruby prompt changes to an apostrophe ' instead of the usual >:
irb(main):006:1'
irb(main):007:1'
IRB doesn't work anymore.
What has happened here and how do I get the shell to function again without quitting the program?
IRB is waiting for the closing ' that you left out in puts stuff['age]. Press Ctrl+C to cancel the unfinished input and get back the prompt you are expecting.
See below:
2.0.0p0 :001 > stuff = {'name'=> 'Zed', 'age'=>36, 'height'=>6*12+2}
=> {"name"=>"Zed", "age"=>36, "height"=>74}
2.0.0p0 :002 > puts stuff['age]
2.0.0p0 :003'> ^C
2.0.0p0 :003 >
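To see why the line is treated as unfinished, you can ask Ruby's parser directly. This is only a sketch using the Ripper standard library, not a description of how IRB itself decides to keep reading input; parses_cleanly? is a hypothetical helper name.

```ruby
require 'ripper'

# Hypothetical helper: true when the source parses as a complete Ruby
# program. Ripper.sexp returns nil when the parser reports an error,
# for example an unterminated string literal.
def parses_cleanly?(source)
  !Ripper.sexp(source).nil?
end

puts parses_cleanly?("puts stuff['age']")  # closing quote present
puts parses_cleanly?("puts stuff['age]")   # closing quote missing
```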
I got the error JSON::GeneratorError: source sequence is illegal/malformed utf-8 when trying to convert a hash into a JSON string. Does this have anything to do with encoding, and how can I make to_json just treat \xAE as it is?
$ irb
2.0.0-p247 :001 > require 'json'
=> true
2.0.0-p247 :002 > a = {"description"=> "iPhone\xAE"}
=> {"description"=>"iPhone\xAE"}
2.0.0-p247 :003 > a.to_json
JSON::GeneratorError: source sequence is illegal/malformed utf-8
from (irb):3:in `to_json'
from (irb):3
from /Users/cchen21/.rvm/rubies/ruby-2.0.0-p247/bin/irb:16:in `<main>'
\xAE on its own is not a valid byte sequence in UTF-8. Use \u00AE instead:
"iPhone\u00AE"
#=> "iPhone®"
Or convert it accordingly:
"iPhone\xAE".force_encoding("ISO-8859-1").encode("UTF-8")
#=> "iPhone®"
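A quick way to confirm both points is String#valid_encoding?. The sketch below assumes the script is saved as UTF-8 (Ruby's default source encoding since 2.0) and that the stray byte really is ISO-8859-1:

```ruby
raw = "iPhone\xAE"   # tagged UTF-8, but \xAE alone is not valid UTF-8

puts raw.valid_encoding?

# Re-tag a copy of the bytes as ISO-8859-1, then transcode to UTF-8.
fixed = raw.dup.force_encoding("ISO-8859-1").encode("UTF-8")

puts fixed.valid_encoding?
puts fixed == "iPhone\u00AE"
```

force_encoding only changes the encoding tag, while encode rewrites the bytes, which is why both calls are needed here.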
Every string in Ruby has an underlying encoding. Depending on your LANG and LC_ALL environment variables, the interactive shell might be executing and interpreting your strings in a given encoding.
$ irb
1.9.3p392 :008 > __ENCODING__
=> #<Encoding:UTF-8>
(Ignore that I'm using Ruby 1.9 instead of 2.0; the ideas are still the same.)
__ENCODING__ returns the current source encoding. Yours will probably also say UTF-8.
When you create literal strings and use byte escapes (such as \xAE) in your code, Ruby interprets them according to the string's encoding:
1.9.3p392 :003 > a = {"description" => "iPhone\xAE"}
=> {"description"=>"iPhone\xAE"}
1.9.3p392 :004 > a["description"].encoding
=> #<Encoding:UTF-8>
So Ruby tries to treat the byte \xAE at the end of your literal string as part of a UTF-8 sequence, but on its own it is invalid. See what happens when I try to print it:
1.9.3-p392 :001 > puts "iPhone\xAE"
iPhone�
=> nil
You need to either provide the registered mark character in valid UTF-8 (using the real character or its two UTF-8 bytes):
1.9.3-p392 :002 > a = {"description1" => "iPhone®", "description2" => "iPhone\xc2\xae"}
=> {"description1"=>"iPhone®", "description2"=>"iPhone®"}
1.9.3-p392 :005 > a.to_json
=> "{\"description1\":\"iPhone®\",\"description2\":\"iPhone®\"}"
Or, if your input is ISO-8859-1 (Latin 1) and you know it for sure, you can tell Ruby to interpret your string as another encoding:
1.9.3-p392 :006 > a = {"description1" => "iPhone\xAE".force_encoding('ISO-8859-1') }
=> {"description1"=>"iPhone\xAE"}
1.9.3-p392 :007 > a.to_json
=> "{\"description1\":\"iPhone®\"}"
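If the hash has many values, the same idea can be applied before calling to_json. This is a minimal sketch, assuming the broken strings are tagged UTF-8 but actually hold ISO-8859-1 bytes; to_valid_utf8 is a hypothetical helper, not part of the json library.

```ruby
require 'json'

# Hypothetical helper: returns a valid UTF-8 version of +str+.
# +assumed+ is the encoding we believe mis-tagged bytes are really in.
def to_valid_utf8(str, assumed = Encoding::ISO_8859_1)
  return str if str.encoding == Encoding::UTF_8 && str.valid_encoding?
  # A UTF-8 tag on invalid bytes is treated as wrong; any other tag is trusted.
  source = str.encoding == Encoding::UTF_8 ? assumed : str.encoding
  str.encode(Encoding::UTF_8, source)
end

a = { "description" => "iPhone\xAE" }
clean = a.each_with_object({}) { |(key, value), out| out[key] = to_valid_utf8(value) }

json = clean.to_json
puts json
```

Guessing the source encoding is only safe when you know where the data came from; with the wrong guess the conversion succeeds but produces the wrong characters.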
Hope it helps.
I'm using kakasi-ruby, a library written for Ruby 1.8, and it seems it can only be compiled against Ruby 1.8 (https://github.com/hogelog/kakasi-ruby/issues/2).
My application runs on Ruby 1.9.3, so I need to call kakasi-ruby from Ruby 1.9.3.
How should I do this?
Do I have to spawn a subprocess running Ruby 1.8 and wait for it to finish to get its return value?
Edit:
https://github.com/hogelog/kakasi-ruby
Found 3 possible paths:
There seems to be a branch for 1.9 in the repo. Maybe try to compile that instead?
Otherwise your fastest option is probably to go back to 1.8 depending on what kind of app it is.
Calling it with 1.8 may work, but since the library seems to be a binding to some C code, you could probably call that code directly just as well.
BTW, here is the usage in Ruby 1.9:
plee#sos:~/Japanese$ irb
1.9.3p194 :001 > require 'kakasi'
=> true
1.9.3p194 :002 > src="前原誠司経済財政相は4日、朝日新聞などのインタビューに対し"
=> "前原誠司経済財政相は4日、朝日新聞などのインタビューに対し"
1.9.3p194 :003 > src=src.encode("EUC-JP", "UTF-8")
=> "\x{C1B0}\x{B8B6}\x{C0BF}\x{BBCA}\x{B7D0}\x{BAD1}\x{BAE2}\x{C0AF}\x{C1EA}\x{A4CF}\x{A3B4}\x{C6FC}\x{A1A2}\x{C4AB}\x{C6FC}\x{BFB7}\x{CAB9}\x{A4CA}\x{A4C9}\x{A4CE}\x{A5A4}\x{A5F3}\x{A5BF}\x{A5D3}\x{A5E5}\x{A1BC}\x{A4CB}\x{C2D0}\x{A4B7}"
1.9.3p194 :004 > dst=Kakasi.kakasi("-w", src)
=> "\xC1\xB0\xB8\xB6 \xC0\xBF\xBB\xCA \xB7\xD0\xBA\xD1 \xBA\xE2\xC0\xAF \xC1\xEA \xA4\xCF \xA3\xB4 \xC6\xFC \xA1\xA2 \xC4\xAB\xC6\xFC\xBF\xB7\xCA\xB9 \xA4\xCA\xA4\xC9\xA4\xCE \xA5\xA4\xA5\xF3\xA5\xBF\xA5\xD3\xA5\xE5\xA1\xBC \xA4\xCB \xC2\xD0\xA4\xB7"
1.9.3p194 :005 > dst.force_encoding("EUC-JP")
=> "\x{C1B0}\x{B8B6} \x{C0BF}\x{BBCA} \x{B7D0}\x{BAD1} \x{BAE2}\x{C0AF} \x{C1EA} \x{A4CF} \x{A3B4} \x{C6FC} \x{A1A2} \x{C4AB}\x{C6FC}\x{BFB7}\x{CAB9} \x{A4CA}\x{A4C9}\x{A4CE} \x{A5A4}\x{A5F3}\x{A5BF}\x{A5D3}\x{A5E5}\x{A1BC} \x{A4CB} \x{C2D0}\x{A4B7}"
1.9.3p194 :006 > dst=dst.encode("UTF-8", "EUC-JP")
=> "前原 誠司 経済 財政 相 は 4 日 、 朝日新聞 などの インタビュー に 対し"
1.9.3p194 :007 >
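The encode / force_encoding / encode steps in that session can be wrapped once. The sketch below is an illustration only: via_euc_jp is a hypothetical helper, and the block stands in for Kakasi.kakasi("-w", euc) so that it runs without the kakasi gem installed.

```ruby
# Hypothetical wrapper: hands the block an EUC-JP copy of +utf8_text+
# and converts whatever the block returns back to UTF-8.
def via_euc_jp(utf8_text)
  euc = utf8_text.encode("EUC-JP", "UTF-8")
  result = yield euc
  # The session above shows the result coming back as raw bytes, so a
  # copy is re-tagged as EUC-JP before transcoding to UTF-8.
  result.dup.force_encoding("EUC-JP").encode("UTF-8")
end

# Stand-in for the real call: returns the same bytes, untagged (binary).
echoed = via_euc_jp("\u524D\u539F \u8AA0\u53F8") { |euc| euc.b }
puts echoed
```

With the gem available, the block body would presumably be the Kakasi.kakasi call from the transcript.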
I'm trying to parse some JSON containing escaped unicode characters using JSON.parse. But on one machine, using json/ext, it gives back incorrect values. For example, \u2030 should return E2 80 B0 in UTF-8, but instead I'm getting 01 00 00. It fails with either the escaped "\\u2030" or the unescaped "\u2030".
1.9.2p180 :001 > require 'json/ext'
=> true
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
=> {"f"=>"\u0001\u0000\u0000"}
1.9.2p180 :003 > s["f"].encoding
=> #<Encoding:UTF-8>
1.9.2p180 :004 > s["f"].valid_encoding?
=> true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
=> [1, 0, 0]
It works on my other machine with the same version of ruby and similar environment variables. The Gemfile.lock on both machines is identical, including json (= 1.6.3). It does work with json/pure on both machines.
1.9.2p180 :001 > require 'json/pure'
=> true
1.9.2p180 :002 > s = JSON.parse '{"f":"\\u2030"}'
=> {"f"=>"‰"}
1.9.2p180 :003 > s["f"].encoding
=> #<Encoding:UTF-8>
1.9.2p180 :004 > s["f"].valid_encoding?
=> true
1.9.2p180 :005 > s["f"].bytes.map do |x| x; end
=> [226, 128, 176]
So is there something else in my environment or setup that could be causing it to parse incorrectly?
I recently ran into this same problem and tracked it down to a Ruby bug caused by how a buffer is declared in Ruby 1.9.2 and how GCC optimizes it. It has since been fixed upstream.
You can recompile Ruby with -O0 or use a newer version of Ruby (1.9.3 or later) to fix it.
Try upgrading your JSON gem, at least to 1.6.6 or to the newest release, 1.7.1.
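After rebuilding Ruby or upgrading the gem, a small check like the following can confirm whether a given machine is still affected. It is a sketch; unicode_escape_ok? is a hypothetical name, and the expected bytes are the UTF-8 encoding of U+2030 quoted in the question.

```ruby
require 'json'

# Hypothetical check: does JSON.parse decode the \u2030 escape into
# the UTF-8 bytes E2 80 B0?
def unicode_escape_ok?
  parsed = JSON.parse('{"f":"\\u2030"}')["f"]
  parsed.bytes.to_a == [0xE2, 0x80, 0xB0]
end

if unicode_escape_ok?
  puts "JSON.parse decodes \\u2030 correctly"
else
  puts "JSON.parse returned unexpected bytes for \\u2030"
end
```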
I am using the JSON implementation for Ruby in my Rails project to parse a JSON string sent via Ajax. Although the JSON string is in UTF-8, the parsed result comes out as ASCII-8BIT by default. See below:
jruby-1.6.7 :068 > json_text = '["に到着を待っている"]'
=> "[\"に到着を待っている\"]"
jruby-1.6.7 :069 > json_text.encoding
=> #<Encoding:UTF-8>
jruby-1.6.7 :070 > json_parsed = JSON.parse(json_text)
=> ["\u00E3\u0081\u00AB\u00E5\u0088\u00B0\u00E7\u009D\u0080\u00E3\u0082\u0092\u00E5\u00BE\u0085\u00E3\u0081\u00A3\u00E3\u0081\u00A6\u00E3\u0081\u0084\u00E3\u0082\u008B"]
jruby-1.6.7 :071 > json_parsed.first.encoding
=> #<Encoding:ASCII-8BIT>
I don't want it to be escaped; I would like a UTF-8 result. Is there a way to set that? I checked the documentation of the JSON project and found no encoding options for JSON.parse. Maybe I missed something. How can I do that?
UPDATE:
As pointed out by @fl00r, this example works fine in MRI, but not in JRuby.
This looks like a bug, as this actually works when using the pure version:
jruby-1.6-head :001 > require 'json/pure'
=> true
jruby-1.6-head :002 > json_text = '["に到着を待っている"]'
=> "[\"に到着を待っている\"]"
jruby-1.6-head :003 > json_parsed = JSON.parse(json_text)
=> ["に到着を待っている"]
jruby-1.6-head :004 > json_parsed.first.encoding
=> #<Encoding:UTF-8>
jruby-1.6-head :005 >
Edit: Just saw you opened a ticket for this...
Edit 2: This actually seems to have been fixed already in the json repository. To install the latest code from there:
$ git clone https://github.com/flori/json.git
$ cd json
$ rake jruby_gem
$ jruby -S gem install pkg/json-1.6.6-java.gem
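If upgrading is not an option, one possible stopgap is to re-tag the parsed strings. This is only a sketch under a strong assumption: that the bytes themselves are intact and only the ASCII-8BIT tag is wrong, which I have not verified for the affected JRuby build. retag_utf8 is a hypothetical helper.

```ruby
# Hypothetical helper: walks a parsed JSON structure and re-tags
# ASCII-8BIT strings as UTF-8 when their bytes form valid UTF-8.
def retag_utf8(obj)
  case obj
  when String
    return obj unless obj.encoding == Encoding::ASCII_8BIT
    candidate = obj.dup.force_encoding(Encoding::UTF_8)
    # Leave the string alone if the bytes are not valid UTF-8.
    candidate.valid_encoding? ? candidate : obj
  when Array
    obj.map { |value| retag_utf8(value) }
  when Hash
    obj.each_with_object({}) { |(key, value), out| out[retag_utf8(key)] = retag_utf8(value) }
  else
    obj
  end
end

# Simulated parser output: correct bytes carrying the wrong tag.
binary = "\u306B\u5230\u7740".b
fixed = retag_utf8([binary])
puts fixed.first.encoding
```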