I'm trying to migrate a Sinatra application to Ruby 1.9.
I'm using Sinatra 1.0, Rack 1.2.0, and ERB templates.
Sinatra starts fine, but when I request a page from the browser I get this error:
Encoding::CompatibilityError at /
incompatible character encodings: ASCII-8BIT and UTF-8
All my .rb files have this header:
#!/usr/bin/env ruby
# encoding: utf-8
I think the problem is in the ERB files, even though the file command reports them as UTF-8 encoded:
[user@localhost views]$ file home.erb
home.erb: UTF-8 Unicode text
Has anyone had this problem before? Is Sinatra not fully compatible with Ruby 1.9?
I'm not familiar with the specifics of your situation, but this kind of error has come up in Ruby 1.9 when there's an attempt to concatenate a string in the source code (typically encoded in UTF-8) with a string from outside of the system, e.g., input from an HTML form or data from a database.
ASCII-8BIT is basically a synonym for binary. It suggests that the input string was not tagged with the actual encoding that has been used (for example, UTF-8 or ISO-8859-1).
My understanding is that these exceptions do not occur in Ruby 1.8 because it treats strings as binary and silently concatenates strings of different encodings. For subtle reasons, this often isn't a problem.
I ran into a similar error yesterday and found this excellent overview.
http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
One option to get your error message to go away is to use force_encoding('UTF-8') (or some other encoding) on the string coming from the external source. This is not to be done lightly, and you'll want to have a sense of the implications.
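To make the behaviour concrete, here is a minimal sketch (the strings are invented stand-ins, not the asker's actual data) that reproduces the error and then fixes it with force_encoding:

```ruby
page = "Résumé: "          # a UTF-8 literal from the source, containing non-ASCII
data = "caf\xC3\xA9".b     # bytes from an external source, tagged ASCII-8BIT

# Concatenating a non-ASCII UTF-8 string with untagged binary raises the error:
begin
  page + data
rescue Encoding::CompatibilityError => e
  puts e.message           # incompatible character encodings: UTF-8 and ASCII-8BIT
end

# If you know the bytes really are UTF-8, retag them before concatenating:
fixed = page + data.force_encoding('UTF-8')
puts fixed                 # => Résumé: café
```

Note that force_encoding only relabels the bytes; it does not convert them, which is why you need to be sure the bytes actually are in the encoding you claim.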
I had the same issue. The problem was a UTF-8-encoded file that should have been US-ASCII.
I checked using the file command (on OSX):
$ file --mime-encoding somefile
somefile: utf-8
After removing the weird characters from the file:
$ file --mime-encoding somefile
somefile: us-ascii
This fixed the issue for me.
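If the file is large, hunting for the "weird characters" by eye is tedious. A small sketch for locating them (the string here is a stand-in for File.read of the file in question):

```ruby
# Stand-in for: text = File.read('somefile')
text = "plain text with a stray \u00C5 character\n"

# Report every character outside the ASCII range, with its position:
text.each_char.with_index(1) do |ch, pos|
  puts "non-ASCII #{ch.inspect} at character #{pos}" unless ch.ascii_only?
end
```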
Related
Hannibal episodes in tvdb have weird characters in them.
For example:
Œuf
So ruby spits out:
./manifesto.rb:19:in `encode': "\xC3" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
from ./manifesto.rb:19:in `to_json'
from ./manifesto.rb:19:in `<main>'
Line 19 is:
puts @tree.to_json
Is there a way to deal with these non-UTF-8 characters? I'd rather convert them than strip them, but ignoring them could also work. Any help appreciated.
The weird part is that the script works fine via cron; running it manually produces the error.
File.open(yml_file, 'w') should be changed to File.open(yml_file, 'wb')
It seems you should use another encoding for the object. Set the proper encoding on the variable @tree, for instance ISO-8859-1 instead of ASCII-8BIT, using @tree.force_encoding('ISO-8859-1'). ASCII-8BIT is meant for binary data only.
To find the current default external encoding in Ruby, run:
Encoding.default_external
If running under sudo solves the problem, the cause was the default encoding, so fix it by setting the proper default encoding in one of these ways:
In Ruby, to change the default external encoding to UTF-8 (or another appropriate one):
Encoding.default_external = Encoding::UTF_8
In Bash, check the current locale settings:
$ sudo env|grep UTF-8
LC_ALL=ru_RU.UTF-8
LANG=ru_RU.UTF-8
Then set them properly in .bashrc, substituting your own locale for ru_RU:
export LC_ALL=ru_RU.UTF-8
export LANG=ru_RU.UTF-8
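Pulling the Ruby side of this together, a minimal sketch of inspecting and overriding the process-wide default external encoding (overriding it is a blunt instrument; fixing the locale is usually the cleaner solution):

```ruby
# The default external encoding is derived from your locale at startup:
puts Encoding.default_external

# Force UTF-8 for this process if the locale is misconfigured:
Encoding.default_external = Encoding::UTF_8
puts Encoding.default_external      # => UTF-8
```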
I had the same problem when saving to the database. I'll offer one thing that I use (perhaps it will help someone): if you know that your text sometimes contains strange characters, you can encode it into some other format before saving, and decode it again after it is returned from the database.
Example:
string = "Œuf"
Before saving, we encode the string:
text_to_save = CGI.escape(string)
(the character "Œ" is encoded as "%C5%92"; the other characters remain the same)
=> "%C5%92uf"
Load from the database and decode:
CGI.unescape("%C5%92uf")
=> "Œuf"
I just suffered through a number of hours trying to fix a similar problem. I'd checked my locales, database encoding, everything I could think of and was still getting ASCII-8BIT encoded data from the database.
Well, it turns out that if you store text in a binary field, it is automatically returned as ASCII-8BIT-encoded text, which makes sense; however, this can (obviously) cause problems in your application.
It can be fixed by changing the column type back to :text in your migrations.
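Changing the column type is the real fix, but as a stopgap you can retag the bytes you read back, provided you know they are really UTF-8. A minimal sketch (the string is a stand-in for a value read from a binary column):

```ruby
# Simulate text read back from a binary column: it arrives as ASCII-8BIT bytes.
raw = "caf\xC3\xA9".b
puts raw.encoding        # => ASCII-8BIT

# If the bytes are known to be UTF-8, retag them, and verify the result:
text = raw.force_encoding('UTF-8')
raise "not valid UTF-8" unless text.valid_encoding?
puts text                # => café
```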
This is my Ruby code:
require 'json'
a = Array.new
value = "¿value"
data = value.gsub('¿', '-')
a[0] = data
puts a
puts "json is"
puts jsondata = a.to_json
I'm getting the following error:
C:\Ruby193>new.rb
C:/Ruby193/New.rb:3: invalid multibyte char (US-ASCII)
C:/Ruby193/New.rb:3: syntax error, unexpected tIDENTIFIER, expecting $end
value="┐value"
^
That's not a JSON problem — Ruby can't decode your source because it contains a multibyte character. By default, Ruby tries to decode files as US-ASCII, but ¿ isn't representable in US-ASCII, so it fails. The solution is to provide a magic comment as described in the documentation. Assuming your source file's encoding is UTF-8, you can tell Ruby that like so:
# encoding: UTF-8
# ...
value = "¿value"
# ...
With an editor or an IDE, icktoofay's solution (# encoding: UTF-8 in the first line) is perfect.
In a shell with IRB or Pry it is difficult to find a working configuration, but there is a workaround that at least worked for my encoding problem, which was entering German umlaut characters.
Workaround for PRY:
In PRY I use the edit command to edit the contents of the input buffer
as described in this pry wiki page.
This opens an external editor (you can configure which editor you want), and the editor accepts special characters that cannot be entered in Pry directly.
Possible Duplicate:
invalid multibyte char (US-ASCII) with Rails and Ruby 1.9
How can I put French characters in a Ruby file? Here is an error:
SyntaxError in ArticlesController#show
/.../app/controllers/articles_controller.rb:47: invalid multibyte char (US-ASCII)
/.../app/controllers/articles_controller.rb:47: invalid multibyte char (US-ASCII)
/.../app/controllers/articles_controller.rb:47: syntax error, unexpected $end, expecting '}'
...#article, notice: 'Article a été créé avec succes.' }
In an HTML file I put this in the head and the accents work:
<!DOCTYPE html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="Content-Script-Type" content="text/javascript">
<meta http-equiv="Content-Style-Type" content="text/css">
<!-- ... autres mentions de l'entête de fichier ... -->
</head>
Ruby has a special syntax for declaring the charset of a file: if you are using multibyte characters, you can use this line at the very top of your file, with no preceding whitespace:
# encoding: utf-8
Since Ruby 1.9, Strings always have an encoding attached. So Ruby can properly handle multi-byte characters and is able to convert between different encodings. Prior versions of Ruby basically handled strings as byte arrays which made it nearly impossible to properly handle multiple encodings.
By default, Ruby 1.9 uses US-ASCII encoding everywhere, while Ruby 2.0 and later use UTF-8 by default.
Generally, you only have to change anything if you are running Ruby 1.9. If your editor saves UTF-8 files and you are running Ruby >= 2.0, everything will be fine by default.
Still, in all Ruby versions since 1.9, you can change the encoding used. There are three different default encodings you can set (all of which fall back to the respective Ruby's default encoding, i.e. US-ASCII on 1.9, UTF-8 on Ruby 2.0 and newer):
internal encoding: The default encoding all strings are converted to; this is the encoding in which strings are stored internally.
external encoding: When reading files, assume them to be in that encoding.
source encoding: Assume the Ruby source code to be written in this encoding.
The former two encodings can be set like this:
Encoding.default_internal = 'UTF-8'
Encoding.default_external = 'UTF-8'
They are then used during all operations for the current Ruby process's lifetime.
The source encoding can be set using a "magic comment" on the first line of your Ruby file (or below the shebang), like so:
# encoding: UTF-8
or by starting your script using ruby -KU which also sets the default encoding to UTF-8. You can also set this in your shebang. In your specific case, you have to at least set the source encoding using one of the provided mechanisms.
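The three defaults described above can be inspected from within a script; a small sketch (the printed values depend on your Ruby version and locale, so the comments are only typical examples):

```ruby
# The three default encodings Ruby tracks:
puts Encoding.default_external.name     # e.g. "UTF-8" on Ruby >= 2.0
puts Encoding.default_internal.inspect  # usually nil unless set explicitly
puts __ENCODING__.name                  # source encoding of this very file
```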
See http://graysoftinc.com/character-encodings and especially http://graysoftinc.com/character-encodings/ruby-19s-three-default-encodings for some more information and background on String encodings in Ruby 1.9.
Which encoding do you use?
You can define the encoding of your source file in the header; in fact, if you use characters beyond ASCII, you must define it.
Alex already mentioned
#encoding: utf-8
If you don't use UTF-8 but your local French codepage, you may use this header in the first line of your source code:
#encoding: cp1252
You may get other encoding errors when you read and save files. Details can be found in http://blog.grayproductions.net/articles/ruby_19s_three_default_encodings
I tried to search, and Google, but with no luck.
OS: Windows XP
Ruby version: 1.9.3p0
Error:
`gsub': incompatible character encodings: UTF-8 and IBM437
Code:
require 'rubygems'
require 'hpricot'
require 'net/http'
source = Net::HTTP.get('host', '/' + ARGV[0] + '.asp')
doc = Hpricot(source)
doc.search("p.MsoNormal/a").each do |a|
  puts a.to_plain_text
end
The program outputs a few strings, but when the text is "NOŻYCE" I get the error above.
Could somebody help?
You could try converting your HTML to UTF-8 since it appears the original is in vintage-retro DOS format:
source.encode!('UTF-8')
That should flip it from 8-bit ASCII to UTF-8 as expected by the Hpricot parser.
The inner encoding of the source variable is UTF-8 but that is not what you want.
As tadman wrote, you must first tell Ruby that the actual characters in the string are in the IBM437 encoding. Then you can convert that string to your favourite encoding, but only if such a conversion is possible.
source.force_encoding('IBM437').encode('UTF-8')
In your case, you cannot convert your string to ISO-8859-2 because not all IBM437 characters can be converted to that charset. Sticking to UTF-8 is probably your best option.
Anyway, are you sure that the file is actually transmitted in IBM437? Maybe it is stored as such on the HTTP server but sent over the wire with another encoding. Or it may not even be exactly IBM437; it may be CP852, also called MS-DOS Latin 2 (different from ISO Latin 2).
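A small round trip illustrating the force_encoding/encode chain above (the byte 0x82 is chosen because it is the well-known "é" code point in IBM437/CP437; it is not from the asker's actual page):

```ruby
# Bytes arrive over HTTP untagged, i.e. as ASCII-8BIT:
raw = "caf\x82".b

# First tell Ruby what the bytes mean (IBM437), then convert to UTF-8:
utf8 = raw.force_encoding('IBM437').encode('UTF-8')
puts utf8                 # => café
```

Note the difference between the two calls: force_encoding relabels bytes without touching them, while encode actually transcodes them into the target encoding.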
I just upgraded from Ruby 1.8 to 1.9, and most of my text processing scripts now fail with the error invalid byte sequence in UTF-8. I need to either strip out the invalid characters or specify that Ruby should use ASCII encoding instead (or whatever encoding the C stdio functions write, which is how the files were produced) -- how would I go about doing either of those things?
Preferably the latter, because (as near as I can tell) there's nothing wrong with the files on disk -- if there are weird, invalid characters they don't appear in my editor...
What's your locale set to in the shell? In Linux-based systems you can check this by running the locale command and change it by e.g.
$ export LANG=en_US
My guess is that you are using locale settings which have UTF-8 encoding and this is causing Ruby to assume that the text files were created according to utf-8 encoding rules. You can see this by trying
$ LANG=en_GB ruby -e 'warn "foo".encoding.name'
US-ASCII
$ LANG=en_GB.UTF-8 ruby -e 'warn "foo".encoding.name'
UTF-8
For a more general treatment of how string encoding has changed in Ruby 1.9 I thoroughly recommend
http://blog.grayproductions.net/articles/ruby_19s_string
(code examples assume bash or similar shell - C-shell derivatives are different)