Ruby error invalid multibyte char (US-ASCII) - ruby

I am trying to run the ruby script found here
but I am getting the error
invalid multibyte char (US-ASCII)
for line 12 which is
http = Net::HTTP.new("twitter.com", Net::HTTP.https_default_port())
Can someone please explain what this means and how I can fix it? Thanks.

When you run the script with Ruby 1.9, change the first two lines of the script to:
#!/usr/bin/env ruby
# encoding: utf-8
require 'net/http'
This tells Ruby to run the script with support for the UTF-8 character set. Without that line Ruby 1.9 would default to the US-ASCII character set.
Just for the record: This will not work in Ruby 1.8, because 1.8 doesn't know anything about string encodings. And the line is not needed anymore in Ruby 2.0, because Ruby 2.0 uses UTF-8 as the default anyway.
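A minimal sketch of what the magic comment controls, assuming the snippet is saved as its own UTF-8 file so the comment sits on the first line. The variable names are only illustrative. `__ENCODING__` reports the encoding Ruby assumed for the source file, and string literals pick up that encoding.

```ruby
# encoding: utf-8
# The comment above tells Ruby which encoding to assume for this file.
source_encoding = __ENCODING__

# A literal with a multibyte character; it inherits the source encoding.
word = "naïve"

puts source_encoding
puts word.encoding
puts "#{word.length} characters, #{word.bytesize} bytes"
```

On Ruby 2.0 and later the result should be the same without the comment, since UTF-8 is already the default source encoding there.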

It means that a multibyte character is used and Ruby is not set to handle it. If you are using an old version of Ruby, then put the following magic comment at the beginning of the file:
# coding: utf-8
If you use a modern version of Ruby, then that problem would not arise in the first place.
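If the offending line looks like plain ASCII, it can help to find out where the multibyte character actually is (it might be, for example, a typographic quote picked up while copying the script). The following is a rough sketch; `non_ascii_lines` is a hypothetical helper name, not part of Ruby.

```ruby
# Hypothetical helper: return [line_number, text] pairs for every line
# that contains at least one byte outside 7-bit ASCII.
def non_ascii_lines(source)
  source.each_line.with_index(1)
        .reject { |line, _number| line.ascii_only? }
        .map { |line, number| [number, line.chomp] }
end

# Example input: line 2 uses curly quotes instead of straight ones.
sample = "a = 1\nb = \u201Chi\u201D\nc = 3\n"
non_ascii_lines(sample).each do |number, text|
  puts "line #{number}: #{text}"
end
```

For a real script, the source could be read with `File.read("script.rb", encoding: "UTF-8")`, where the file name is a placeholder.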

Related

Ruby: magic comments "frozen_string_literal: true" vs "immutable: string"

In ruby one can freeze all constant strings in a file via two different magic comments at the beginning of a file:
# frozen_string_literal: true
and
# -*- immutable: string -*-
I have no idea what the differences are.
Are there any?
The 1st syntax is the magic comment for Ruby 2.3+ versions to freeze string literals, otherwise you have to use the String method like this:
'hello world!'.freeze
The 2nd syntax is not implemented in Ruby, however it is the way that variables are specified for files in the Emacs text editor.
For example, the following comment in Emacs would declare that the file is a Ruby file and needs Ruby syntax highlighting, and that the variable immutable is set to the value string.
# -*- mode: ruby; immutable: string -*-
After searching around, it looks like that does nothing and is not used by any Ruby syntax highlighting mode.
So you do not need the 2nd syntax.
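A short sketch of the first syntax in action, assuming Ruby 2.5 or later (for `FrozenError`) and that the snippet is saved as its own file so the magic comment comes before any code. The variable names are illustrative.

```ruby
# frozen_string_literal: true

# With the comment above, every string literal in this file is frozen.
literal = 'hello world!'
puts literal.frozen?

# dup returns an unfrozen copy that can be modified.
copy = literal.dup
copy << ' again'
puts copy

# Modifying the frozen literal itself raises an error.
begin
  literal << ' again'
rescue FrozenError => e
  puts "cannot modify literal: #{e.class}"
end
```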
Digging for anything on the 2nd version, it looks like they had the same intention, but the 2nd magic comment syntax does not appear to have been adopted as of Ruby 2.1.0.
See https://github.com/ruby/ruby/pull/487
The first version # frozen_string_literal: true was adopted in Ruby 2.3.0
I tried the latter version in a few versions of Ruby, but it didn't work. I would guess it should not be used or trusted to work in any version >= 2.3, and probably no version supports it. In fact, I was not able to find any reference to that syntax when searching the open source code on GitHub:
https://github.com/ruby/ruby/search?q=immutable%3A+string&unscoped_q=immutable%3A+string

macOS Automator's Ruby defaults to ASCII despite being >= 1.9

I am trying to get access to the text in the macOS clipboard from within Automator using a Ruby script. This script calls macOS's internal Ruby (/usr/bin/ruby). After running into much trouble with unidentified character sequence errors, I noticed that Automator's Ruby defaults to ASCII instead of UTF-8, even though UTF-8 has been the default in Ruby for years.
So, running the following:
require 'clipboard'
puts(Clipboard.paste.encoding)
always yields "ASCII", while running the same Ruby interpreter from the command line to run the same script and to paste the same pieces of text always yields "UTF-8".
This becomes an issue when I copy multibyte characters like the accented characters (e.g. ê). For instance if I copy the following text:
Bourdieu, P., & Passeron, J.-C. (1970). La reproduction: éléments pour une théorie du système d’enseignement. Ed. de Minuit.
And then run:
require 'clipboard'
puts(Clipboard.paste)
I get nothing in Automator while I get a copy of the original text on the command line.
If I try to transform the text in any way, I get an error. Let's say I run the following:
require 'clipboard'
puts(Clipboard.paste.gsub(/\r/,""))
In response, I will receive:
-e:2:in `gsub': invalid byte sequence in US-ASCII (ArgumentError)
from -e:2:in `<main>'
How can I avoid this and make sure what I get from the clipboard is already converted into proper UTF-8?
I have tried encode and force_encoding methods, as well as a variety of combinations of # encoding: UTF-8, Encoding.default_external='utf-8' and Encoding.default_internal='utf-8', but it seems there are corrupt characters that hinder the conversion, so no success in the end.
Is there anything I am ignoring here, or any combination I haven't tried?
Notes:
It is Automator that calls the interpreter, and not me. So, I can't modify Automator's call to add switches and modify options.
string.encode!('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '') works, but the sanitization comes at the cost of chopping off the multibyte characters, which is obviously not the intended behaviour here.
I found that in macOS Mojave 10.14.6, starting the Automator 'Run Shell Script' with # coding: UTF-8 solved the problem. Not sure the #!/usr/bin/ruby is useful or necessary, but I include it. You can test by using this code with and without the # coding: UTF-8:
#!/usr/bin/ruby
# coding: UTF-8
test_s = "will print ✪"
puts test_s
Credit for the answer is from here: discussions.apple.com
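The error from the question can be reproduced without Automator or the clipboard gem, which may make the moving parts easier to see. This is only a sketch: the `raw` string stands in for clipboard text whose bytes are UTF-8 but which Ruby has tagged as US-ASCII, and the relabelling step assumes the bytes really are valid UTF-8 (the asker reports that `force_encoding` alone did not help in their setup).

```ruby
# Stand-in for the clipboard text: UTF-8 bytes carrying a US-ASCII label.
raw = "th\u00E9orie\r\n".dup.force_encoding(Encoding::US_ASCII)

# Regexp operations refuse strings whose bytes are invalid for their label.
error_message =
  begin
    raw.gsub(/\r/, "")
    nil
  rescue ArgumentError => e
    e.message
  end

# force_encoding only changes the label; the bytes stay untouched.
relabelled = raw.dup.force_encoding(Encoding::UTF_8)
cleaned = relabelled.valid_encoding? ? relabelled.gsub(/\r/, "") : relabelled.scrub("?")

# Transcoding from binary with replace: "" silently drops every non-ASCII byte,
# which is why the accented characters vanish in the question's workaround.
stripped = raw.encode("UTF-8", "binary", invalid: :replace, undef: :replace, replace: "")

puts error_message
puts cleaned
puts stripped
```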

unable to convert array data to json when '¿' is there

this is my ruby code
require 'json'
a=Array.new
value="¿value"
data=value.gsub('¿','-')
a[0]=data
puts a
puts "json is"
puts jsondata=a.to_json
getting following error
C:\Ruby193>new.rb
C:/Ruby193/New.rb:3: invalid multibyte char (US-ASCII)
C:/Ruby193/New.rb:3: syntax error, unexpected tIDENTIFIER, expecting $end
value="┐value"
^
That's not a JSON problem: Ruby can't decode your source because it contains a multibyte character. By default, Ruby 1.9 tries to decode source files as US-ASCII, but ¿ isn't representable in US-ASCII, so it fails. The solution is to provide a magic comment as described in the documentation. Assuming your source file's encoding is UTF-8, you can tell Ruby that like so:
# encoding: UTF-8
# ...
value = "¿value"
# ...
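Put together with the script from the question, a complete version might look like the sketch below. It assumes the file is actually saved as UTF-8; the variable names are changed slightly for readability.

```ruby
# encoding: UTF-8
require 'json'

value = "¿value"

# Replace the inverted question mark, then wrap the result in an array.
items = [value.gsub('¿', '-')]

json_data = items.to_json
puts "json is"
puts json_data
```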
With an editor or an IDE, icktoofay's solution (# encoding: UTF-8 in the first line) is perfect.
In a shell with IRB or PRY it is difficult to find a working configuration. But there is a workaround that at least worked for my encoding problem which was to enter German umlaut characters.
Workaround for PRY:
In PRY I use the edit command to edit the contents of the input buffer
as described in this pry wiki page.
This opens an external editor (you can configure which editor you want). And the editor accepts special characters that can not be entered in PRY directly.

Ruby UTF-8 Encoding doesn't work in Windows even with Magic Comment

I'm trying to run a file (ruby anyfile.rb in cmd prompt) with the following contents:
# encoding: utf-8
puts 'áá'
The following error occurs:
invalid multibyte char (UTF-8)
It seems that Ruby does not understand the magic comment...
EDIT: If I remove the "# encoding: utf-8" and run the command prompt like this:
ruby -E:UTF-8 encoding.rb
then it works - any ideas?
EDIT2: When I run:
ruby -e 'p [Encoding.default_external, Encoding.default_internal]'
I got [#<Encoding:CP850>, nil]. Maybe my Encoding.default_external is wrong?!
Environment:
Windows XP (yes, I also hate windows + ruby)
ruby 1.9.2p180 (2011-02-18) [i386-mingw32]
I believe this is a classic case of "if you hear hooves, think horses, not zebras".
The error message is telling you that you have a byte sequence in your file that is not a valid UTF-8 multibyte sequence.
It is definitely possible that
It seems that Ruby does not understand the magic comment...
as you say, and that up until now nobody noticed that magic comments don't actually work because you are the first person in the history of humankind to actually try to use magic comments. (Actually, this is not possible. If Ruby didn't understand magic comments, it would complain about an invalid ASCII character, since ASCII is the default encoding if no magic comment is present.)
Or, there actually is an invalid multibyte UTF-8 sequence in your file.
Which do you think is more likely? If I were you, I would check my file.
I've encountered similar issues from time to time with files that were not saved as UTF-8, even when the magic comment states so.
I've found that Ruby 1.9.2 had problems properly converting UTF-8 to codepages 850 and 437, the defaults for the command prompt on Windows.
I recommend you upgrade to Ruby 1.9.3 (latest is patchlevel 125), which solves a lot of encoding issues, especially on Windows.
Also, verify that your saved file does not contain a Unicode BOM (so it is plain UTF-8) and is properly saved.
To verify that, you can switch the codepage in the console to Unicode (chcp 65001) and try type myscript.rb
You should see the accented letters correctly.
Last but not least, ensure your command prompt uses a TrueType font so extended characters are properly displayed.
Hope that helps.
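The same check can also be done from Ruby itself. The sketch below uses a hypothetical helper, `inspect_utf8`, which reports whether a file's raw bytes start with a UTF-8 BOM and whether the remaining bytes form valid UTF-8. It says nothing about how the console displays them.

```ruby
# The three bytes of a UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: inspect raw file contents for a BOM and UTF-8 validity.
def inspect_utf8(bytes)
  data = bytes.b
  has_bom = data.start_with?(UTF8_BOM)
  body = has_bom ? data.byteslice(3..-1) : data
  { bom: has_bom, valid_utf8: body.force_encoding(Encoding::UTF_8).valid_encoding? }
end

# "áá" saved as UTF-8 versus the same text saved in codepage 850 (byte 0xA0).
puts inspect_utf8("puts '\u00E1\u00E1'")
puts inspect_utf8("puts '".b + [0xA0, 0xA0].pack("C*") + "'".b)
```

For a file on disk, the bytes could be obtained with `File.binread("encoding.rb")`, using the file name from the question.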
Try
# encoding: iso-8859-1
Not everything that's text is utf8.
Are you sure you selected 'UTF-8' from the Encoding dropdown when you saved the file in Notepad? I've just tried this on an XP machine and your code example worked for me.

Ruby 1.9 -Ku, mem_cache_store and invalid multibyte escape error

Originally this bug was posted here: https://rails.lighthouseapp.com/projects/8994/tickets/5713-ruby-19-ku-incompatible-with-mem_cache_store
And now, as we've run into the same issue, I'll copy a question from that ticket here, hoping someone already has an answer:
When Ruby 1.9 is started in unicode mode (-Ku), mem_cache_store.rb fails to parse:
/usr/local/ruby19/bin/ruby -Ku /usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/
activesupport-3.0.0/lib/active_support/cache/mem_cache_store.rb
/usr/local/ruby-1.9.2-p0/lib/ruby/gems/1.9.1/gems/activesupport-3.0.0/lib/active_support/
cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Our case is practically identical: when you set config.action_controller.cache_store to :mem_cache_store and try to run tests, console, or server, you receive this in return:
/Users/%username%/.rvm/gems/ruby-1.9.2-p0/gems/activesupport-3.0.1/lib/active_support/
cache/mem_cache_store.rb:32: invalid multibyte escape: /[\x00-\x20%\x7F-\xFF]/
Any ideas how this can be avoided?..
Ruby 1.9 in unicode mode will attempt to interpret the regular expression as unicode. To avoid this you need to pass the regular expression option "n" for "no encoding":
ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n
Now we have our raw 8-bit encoding (the only thing Ruby 1.8 speaks) as intended:
ruby-1.9.2-p136 :001 > ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n.encoding
=> #<Encoding:ASCII-8BIT>
Hopefully the Rails team fixes this; for now you have to edit the file.
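As an illustration of how a pattern with the "n" option behaves, here is a small sketch. `escape_key` is a hypothetical helper written for this example and is not ActiveSupport's implementation. It converts the key to a binary string first, because a binary regular expression cannot be matched against a UTF-8 string that contains non-ASCII characters.

```ruby
# Same pattern as in the answer; the "n" option makes it a binary regexp.
ESCAPE_KEY_CHARS = /[\x00-\x20%\x7F-\xFF]/n

# Hypothetical helper: percent-escape every byte matched by the pattern.
def escape_key(key)
  key.b.gsub(ESCAPE_KEY_CHARS) { |byte| format("%%%02X", byte.ord) }
end

puts ESCAPE_KEY_CHARS.encoding
puts escape_key("user profile")
puts escape_key("caf\u00E9")
```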

Resources