Ruby - Win32Console - using colors changes encoding? - ruby

I recently installed the Win32Console gem for my program. The program has a Polish “interface”, which includes Polish special characters. These work fine with a plain puts:
puts "Ciekawym polskim słowem jest: żółć"
However, using escape characters to colorize the text (which works) seems to change the encoding, and the Windows 7 CMD displays such diacritic marks incorrectly:
green = "\e[1;32;40m"
puts "#{green}Ciekawym polskim słowem jest: żółć"
Honestly, with my limited knowledge of how Ruby treats different encodings, I don't even know where to start. Is this a problem with Ruby, Win32Console, or the Command Prompt itself?

The Windows console does not support ANSI escape sequences (\e[...) at all. (ANSI escape code - Wikipedia).

Turns out it was the gem I installed. I later found out that Ruby 2.0 and higher have built-in support for escape codes, and that works just fine with UTF-8.
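A minimal sketch of that last point, assuming Ruby 2.0 or later and a console that interprets ANSI sequences. The colorize helper and the RESET constant are hypothetical names added for illustration; the green sequence is the one from the question.

```ruby
# encoding: utf-8

GREEN = "\e[1;32;40m" # bold green on black, as in the question
RESET = "\e[0m"       # assumption: reset attributes after the text

# Hypothetical helper: wraps text in a color sequence and a reset.
def colorize(text, color = GREEN)
  "#{color}#{text}#{RESET}"
end

line = colorize("Ciekawym polskim słowem jest: żółć")
puts line
# The escape bytes are plain ASCII, so the string keeps its UTF-8 label.
puts line.encoding
puts line.valid_encoding?
```

If the diacritics still come out wrong, the string itself is probably fine and the problem is more likely in whatever sits between Ruby and the console.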

Related

macOS Automator's Ruby defaults to ASCII despite being >= 1.9

I am trying to get access to the text in the macOS clipboard from within Automator using a Ruby script. This script calls macOS's internal Ruby (/usr/bin/ruby). After running into much trouble with unidentified character sequence errors, I noticed that Automator's Ruby defaults to ASCII instead of UTF-8, which has not been the default behaviour of modern Ruby for years.
So, running the following:
require 'clipboard'
puts(Clipboard.paste.encoding)
always yields "ASCII", while running the same Ruby interpreter from the command line to run the same script and to paste the same pieces of text always yields "UTF-8".
This becomes an issue when I copy multibyte characters like the accented characters (e.g. ê). For instance if I copy the following text:
Bourdieu, P., & Passeron, J.-C. (1970). La reproduction: éléments pour une théorie du système d’enseignement. Ed. de Minuit.
And then run:
require 'clipboard'
puts(Clipboard.paste)
I get nothing in Automator while I get a copy of the original text on the command line.
If I try to transform the text in any way, I get an error. Let's say I run the following:
require 'clipboard'
puts(Clipboard.paste.gsub(/\r/,""))
In response, I will receive:
-e:2:in `gsub': invalid byte sequence in US-ASCII (ArgumentError)
from -e:2:in `<main>'
How can I avoid this and make sure what I get from the clipboard is already converted into proper UTF-8?
I have tried encode and force_encoding methods, as well as a variety of combinations of # encoding: UTF-8, Encoding.default_external='utf-8' and Encoding.default_internal='utf-8', but it seems there are corrupt characters that hinder the conversion, so no success in the end.
Is there anything I am ignoring here, or any combination I haven't tried?
Notes:
It is Automator that calls the interpreter, and not me. So, I can't modify Automator's call to add switches and modify options.
string.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') works, but the sanitization comes at the cost of chopping off the multibyte characters, which is obviously not the intended behaviour here.
I found that in macOS Mojave 10.14.6, starting the Automator 'Run Shell Script' with # coding: UTF-8 solved the problem. Not sure the #!/usr/bin/ruby is useful or necessary, but I include it. You can test by using this code with and without the # coding: UTF-8:
#!/usr/bin/ruby
# coding: UTF-8
test_s = "will print ✪"
puts test_s
Credit for the answer goes to: discussions.apple.com
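The gsub error in the question is what Ruby raises when UTF-8 bytes carry a US-ASCII label. The sketch below only illustrates that mechanism: the clipboard is simulated with a byte string (neither the clipboard gem nor Automator is assumed), and retag_as_utf8 is a hypothetical helper. The asker reports that force_encoding alone did not help in Automator, so treat this as an explanation of the error rather than a confirmed fix.

```ruby
# Hypothetical helper: relabels the bytes as UTF-8 without changing them.
def retag_as_utf8(text)
  text.dup.force_encoding(Encoding::UTF_8)
end

# Simulated clipboard text: the UTF-8 bytes of "théorie\r\n" tagged as US-ASCII.
pasted = "th\xC3\xA9orie\r\n".b.force_encoding(Encoding::US_ASCII)
puts pasted.valid_encoding? # bytes above 0x7F are not valid US-ASCII

fixed = retag_as_utf8(pasted)
puts fixed.valid_encoding?
puts fixed.gsub(/\r/, "") # regexp methods need a validly encoded string
```

Unlike encode with invalid: :replace, relabelling does not drop any bytes, so multibyte characters survive as long as the bytes really are UTF-8.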

Ruby UTF-8 Encoding doesn't work in Windows even with Magic Comment

I'm trying to run a file (ruby anyfile.rb in cmd prompt) with the following contents:
# encoding: utf-8
puts 'áá'
The following error occurs:
invalid multibyte char (UTF-8)
It seems that Ruby does not understand the magic comment...
EDIT: If I remove the "# encoding: utf-8" and run the command prompt like this:
ruby -E:UTF-8 encoding.rb
then it works - any ideas?
EDIT2: When I run:
ruby -e 'p [Encoding.default_external, Encoding.default_internal]'
I get [#<Encoding:CP850>, nil]. Maybe my Encoding.default_external is wrong?
Environment:
Windows XP (yes, I also hate windows + ruby)
ruby 1.9.2p180 (2011-02-18) [i386-mingw32]
I believe this is a classic case of "if you hear hooves, think horses, not zebras".
The error message is telling you that you have a byte sequence in your file that is not a valid UTF-8 multibyte sequence.
It is definitely possible that "it seems that Ruby does not understand the magic comment...", as you say, and that up until now nobody noticed that magic comments don't actually work because you are the first person in the history of humankind to actually try to use magic comments. (Actually, this is not possible. If Ruby didn't understand magic comments, it would complain about an invalid ASCII character, since ASCII is the default encoding if no magic comment is present.)
Or, there actually is an invalid multibyte UTF-8 sequence in your file.
Which do you think is more likely? If I were you, I would check my file.
I've encountered similar issues from time to time with files that were not saved as UTF-8, even when the magic comment states so.
I've found that Ruby 1.9.2 had trouble properly converting UTF-8 to codepages 850 and 437, the defaults for the command prompt on Windows.
I recommend you upgrade to Ruby 1.9.3 (latest is patchlevel 125), which solves a lot of encoding issues, especially on Windows.
Also, verify that your saved file does not contain a Unicode BOM (so it is plain UTF-8) and is properly saved.
To check that, you can switch the codepage in the console to Unicode (chcp 65001) and run type myscript.rb
You should see the accented letters correctly.
Last but not least, ensure your command prompt uses a TrueType font so extended characters are properly displayed.
Hope that helps.
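The "check your file" advice can also be scripted. This is a rough sketch, not part of the original answers: inspect_source_bytes and UTF8_BOM are hypothetical names, and the sample byte strings stand in for the contents of a saved script.

```ruby
UTF8_BOM = "\xEF\xBB\xBF".b # the three-byte UTF-8 byte order mark

# Hypothetical helper: reports whether raw file bytes start with a BOM
# and whether the remaining bytes are well-formed UTF-8.
def inspect_source_bytes(bytes)
  data = bytes.b
  has_bom = data.start_with?(UTF8_BOM)
  body = has_bom ? data.byteslice(3..-1) : data
  { bom: has_bom, valid_utf8: body.force_encoding(Encoding::UTF_8).valid_encoding? }
end

p inspect_source_bytes("puts '\xC3\xA1\xC3\xA1'")   # "áá" saved as UTF-8
p inspect_source_bytes("puts '\xE1\xE1'")           # "áá" saved as ISO-8859-1
p inspect_source_bytes("\xEF\xBB\xBFputs 'ok'")     # UTF-8 with a BOM
```

For a real file, the bytes could come from File.binread("anyfile.rb"), which reads without applying any encoding conversion.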
Try
# encoding: iso-8859-1
Not everything that's text is utf8.
Are you sure you selected 'UTF-8' from the Encoding dropdown when you saved the file in Notepad? I've just tried this on an XP machine and your code example worked for me.

Ruby 1.9 -Ku, mem_cache_store and invalid multibyte escape error

Originally this bug was posted here: https://rails.lighthouseapp.com/projects/8994/tickets/5713-ruby-19-ku-incompatible-with-mem_cache_store
And now, as we've run into the same issue, I'll copy here a question from that issue, hoping someone has an answer already:
When Ruby 1.9 is started in unicode mode (-Ku), mem_cache_store.rb fails to parse:
/usr/local/ruby19/bin/ruby -Ku /usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/activesupport-3.0.0/lib/active_support/cache/mem_cache_store.rb
/usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/activesupport-3.0.0/lib/active_support/cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Our case is practically identical: when you set config.action_controller.cache_store to :mem_cache_store and try to run tests, the console, or the server, you receive this in return:
/Users/%username%/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.1/lib/active_support/cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Any ideas how this can be avoided?..
Ruby 1.9 in unicode mode will attempt to interpret the regular expression as unicode. To avoid this you need to pass the regular expression option "n" for "no encoding":
ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n
Now we have our raw 8-bit encoding (the only thing Ruby 1.8 speaks) as intended:
ruby-1.9.2-p136 :001 > ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n.encoding
=> #<Encoding:ASCII-8BIT>
Hopefully the Rails team fixes this; for now you have to edit the file.
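To see what the /n flag buys you, here is a small sketch using the pattern from the answer. The escape_key helper is hypothetical and is not claimed to be what ActiveSupport does; it only shows a byte-oriented pattern being applied to a byte-oriented copy of the key.

```ruby
# The /n flag keeps the pattern byte-oriented instead of Unicode-aware.
ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n

# Hypothetical helper: percent-escapes every byte matched by the pattern.
def escape_key(key)
  key.b.gsub(ESCAPE_KEY_CHARS) { |byte| format("%%%02X", byte.getbyte(0)) }
end

puts ESCAPE_KEY_CHARS.encoding
puts escape_key("caf\u00E9 key")
```

Calling String#b first gives the binary pattern a binary string to match against, which avoids mixing encodings at match time.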

Reading ASCII-encoded files with Ruby 1.9 in a UTF-8 environment

I just upgraded from Ruby 1.8 to 1.9, and most of my text processing scripts now fail with the error invalid byte sequence in UTF-8. I need to either strip out the invalid characters or specify that Ruby should use ASCII encoding instead (or whatever encoding the C stdio functions write, which is how the files were produced) -- how would I go about doing either of those things?
Preferably the latter, because (as near as I can tell) there's nothing wrong with the files on disk -- if there are weird, invalid characters they don't appear in my editor...
What's your locale set to in the shell? In Linux-based systems you can check this by running the locale command and change it by e.g.
$ export LANG=en_US
My guess is that you are using locale settings which have UTF-8 encoding and this is causing Ruby to assume that the text files were created according to utf-8 encoding rules. You can see this by trying
$ LANG=en_GB ruby -e 'warn "foo".encoding.name'
US-ASCII
$ LANG=en_GB.UTF-8 ruby -e 'warn "foo".encoding.name'
UTF-8
For a more general treatment of how string encoding has changed in Ruby 1.9 I thoroughly recommend
http://blog.grayproductions.net/articles/ruby_19s_string
(code examples assume bash or similar shell - C-shell derivatives are different)
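Both options from the question can also be handled inside the script rather than through the locale. This is a hedged sketch: ISO-8859-1 is only an assumed stand-in for whatever single-byte encoding the files really use, read_as and read_scrubbed are hypothetical helpers, and String#scrub needs Ruby 2.1 or later, so it is not available on 1.9 itself.

```ruby
require "tempfile"

# Hypothetical helper: declares the file's encoding and transcodes to UTF-8.
def read_as(path, source_encoding)
  File.read(path, encoding: "#{source_encoding}:UTF-8")
end

# Hypothetical helper: reads as UTF-8 and replaces bytes that are not valid.
def read_scrubbed(path, replacement = "?")
  File.read(path, encoding: "UTF-8").scrub(replacement)
end

# Sample file containing the ISO-8859-1 byte for "é", which is invalid UTF-8.
sample = Tempfile.new("legacy")
sample.binmode
sample.write("caf\xE9\n".b)
sample.close

p read_as(sample.path, "ISO-8859-1")
p read_scrubbed(sample.path)
sample.unlink
```

Declaring the encoding keeps the character, while scrubbing discards it, which matches the preference stated in the question for the former.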

File.open with ruby on windows with a unicode filename

I have a script running on Ruby 1.9.1 on Windows 7
I've distilled my script down to
File.open("翻譯測試.txt")
and still can't get it to work. I know there are issues with Ruby 1.9 filename handling on Windows (it uses the Windows ANSI library), but I would be happy enough with a workaround that is callable from Ruby.
Most of the Unicode changes like file and directory operations have been improved in 1.9.2 (trunk) and other bigger changes will be merged pretty soon.
As bobince pointed out, this was already asked:
Unicode filenames on Windows in Ruby
This should help you
string = "翻譯測試" # by default, string is encoded as "ASCII"
string.force_encoding("Shift_JIS") # retags the String as Shift_JIS, or whatever character set it is actually in
Here's a nice read about character encodings in 1.9.1:
http://yehudakatz.com/2010/05/17/encodings-unabridged/
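One thing worth keeping in mind with the snippet above is that force_encoding only changes the label on the bytes, while encode converts the bytes themselves. The sketch below contrasts the two; Big5 is my own assumption for the target (chosen because the filename is Traditional Chinese), and retag and transcode are hypothetical helpers. It does not touch the Windows filename APIs at all.

```ruby
# encoding: utf-8

# Hypothetical helper: same bytes, new encoding label.
def retag(text, encoding)
  text.dup.force_encoding(encoding)
end

# Hypothetical helper: same characters, converted bytes.
def transcode(text, encoding)
  text.encode(encoding)
end

name = "翻譯測試"
big5 = transcode(name, "Big5")

puts name.bytesize
puts big5.bytesize
puts big5.encode("UTF-8") == name            # transcoding round-trips
puts retag(name, "Big5").bytes == name.bytes # retagging leaves bytes alone
```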
