incompatible character encodings: UTF-8 and ASCII-8BIT in render action - ruby

ActionView::Template::Error (incompatible character encodings: UTF-8
and ASCII-8BIT): app/controllers/posts_controller.rb:27:in `new'
# GET /posts/new
def new
  if params[:post]
    @post = Post.new(post_params).dup
    if @post.valid?
      render :action => "confirm"
    else
      respond_to do |format|
        format.html { render action: 'new' }
        format.json { render json: @post.errors, status: :unprocessable_entity }
      end
    end
  else
    @post = Post.new
    @document = Document.new
    @documents = @post.documents.all
    @document = @post.documents.build
  end
end
I don't know why it is happening.

Make sure config.encoding = "utf-8" is set in your application.rb file.
Make sure you are using the 'mysql2' gem instead of the 'mysql' gem.
Put # encoding: utf-8 at the top of the rake file.
Above the Rails.application.initialize! line in your environment.rb file, add the following two lines:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
Solution from here: http://rorguide.blogspot.in/2011/06/incompatible-character-encodings-ascii.html
If the above solution doesn't help, then I think you either copy/pasted part of your Haml template into the file, or you're working with an editor that isn't Unicode/UTF-8 friendly.
If you can, recreate that file from scratch in a UTF-8-friendly editor (there are plenty for every platform) and see whether this fixes your problem.
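To check whether a view was saved in something other than UTF-8, a short script can validate each file's bytes before you recreate anything. This is a sketch, not part of Rails: non_utf8_files is a hypothetical helper, and the app/views directory and the haml/erb glob are assumptions to adjust to your app.

```ruby
# Hypothetical diagnostic, not part of Rails: list template files whose
# bytes are not valid UTF-8 (for example, files saved as Windows-1252).
def non_utf8_files(dir, pattern = "**/*.{haml,erb}")
  Dir.glob(File.join(dir, pattern)).sort.reject do |path|
    # binread returns raw bytes (ASCII-8BIT); re-tag them as UTF-8 and validate.
    File.binread(path).force_encoding(Encoding::UTF_8).valid_encoding?
  end
end

# Usage from the Rails root, e.g.:
#   puts non_utf8_files("app/views")
```

Note that a file in a legacy encoding that happens to contain only ASCII characters will pass this check, which is fine, since such a file cannot cause the error either.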
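Sometimes you may get this error instead:
incompatible character encodings: ASCII-8BIT and UTF-8
That typically happens because you are trying to concatenate two strings whose encodings Ruby cannot reconcile. ASCII-8BIT is Ruby's binary encoding: its bytes above 0x7F have no defined character meaning, so when both strings contain non-ASCII data Ruby refuses to guess and raises Encoding::CompatibilityError (if either string is pure ASCII, the concatenation works). The same happens with other pairs of encodings, for example UTF-8 and ISO-8859-1 strings that both contain non-ASCII characters, and resolving it requires the programmer to step in, usually by telling Ruby what the bytes really are (force_encoding) or by converting them (encode).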
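A minimal reproduction of the ASCII-8BIT case, under the assumption that the binary data really is UTF-8 (the usual situation when it came from a file read in 'rb' mode or from a network socket). concat_utf8 is a hypothetical helper, not a Ruby or Rails method, and the exact wording of the error message differs slightly between Ruby versions.

```ruby
# Hypothetical helper: join strings after re-tagging any ASCII-8BIT (binary)
# part as UTF-8. Only correct if those bytes really are UTF-8.
def concat_utf8(*parts)
  parts.map { |s|
    s.encoding == Encoding::ASCII_8BIT ? s.dup.force_encoding(Encoding::UTF_8) : s
  }.join
end

utf8   = "caf\u00E9" # "café", tagged UTF-8
binary = "\u00A3".b  # the two bytes of "£" (C2 A3), tagged ASCII-8BIT

# Both strings contain non-ASCII data, so Ruby finds no common encoding.
p Encoding.compatible?(utf8, binary) # => nil

begin
  utf8 + binary
rescue Encoding::CompatibilityError => e
  puts e.message # e.g. "incompatible character encodings: UTF-8 and ASCII-8BIT"
end

# A binary string that is pure ASCII combines without complaint.
combined = utf8 + "abc".b
puts combined.encoding # UTF-8

puts concat_utf8(utf8, binary) # café£
```

force_encoding only changes the label on the bytes; if the data is actually in another encoding, encode (with the correct source encoding) is the right tool instead.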

I was upgrading Rails and Spree, and the error was actually coming from the cache.
Deleting the cache solved the problem for me:
rm -rf tmp/cache

Related

Encoding::UndefinedConversionError "\xC2" from ASCII-8BIT to UTF-8 with redcarpet

I'm using the redcarpet gem to render some markdown text to HTML. A portion of the markdown was user-inserted, and the user typed a perfectly valid special character (£), but now when rendering it I get: Encoding::UndefinedConversionError "\xC2" from ASCII-8BIT to UTF-8
I know it's the £ sign because if I remove it from the text to render, it all works. But users might insert other special characters too.
I'm not sure how to deal with this; here's my code building the HTML:
def generate_document
  temp_file_service = TempFileService.new
  path = temp_file_service.path
  template_url = TenantConfig.get('DEPOSIT_GUIDE_TEMPLATE') || DEFAULT_DOC
  template = open(template_url, 'rb', &:read)
  html = ERB.new(template).result(binding)
  File.open(path, 'w') do |f|
    f.write html
  end
  File.new(path, 'r')
end
The error is raised on the f.write line.
Here's my .html.erb:
<%= markdown(clause.text) %>
And here's the helper:
def markdown(text)
  Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(text)
end
Note that the encoding problem happens only when saving the html to a file, somewhere else I correctly use the same markdown helper to render the text to the browser, and no problems there.
It would also work the other way: cleaning the markdown before saving it to the DB and replacing any special characters with the corresponding HTML entity (e.g. £ becomes &pound;).
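The second approach, replacing special characters with HTML character references before saving, could be sketched as follows. escape_non_ascii is a hypothetical helper (not part of Redcarpet or Rails); it uses numeric references such as &#163; rather than named ones such as &pound; so that no lookup table is needed, and it assumes the input is a valid UTF-8 string.

```ruby
# Hypothetical helper: replace every non-ASCII character with a numeric
# HTML character reference, so the stored text is pure ASCII.
def escape_non_ascii(text)
  text.gsub(/[^\x00-\x7F]/) { |char| "&##{char.ord};" }
end

puts escape_non_ascii("Deposit: \u00A3500") # Deposit: &#163;500
```

Because the result is ASCII-only, it is compatible with any ASCII-compatible encoding, so it cannot by itself trigger the ASCII-8BIT to UTF-8 conversion error (the template may still contain non-ASCII bytes of its own).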
I tried adding a before_save callback (as suggested here: Encoding::UndefinedConversionError: "\xC2" from ASCII-8BIT to UTF-8):
before_save :convert_text
private
def convert_text
  self.text = self.text.force_encoding("utf-8")
end
That didn't work.
I also tried (as recommended here: Using ERB in Markdown with Redcarpet):
<%= markdown(extra_clause.text).html_safe %>
That didn't work either.
How would I fix it either way?
In the end I solved this by adding force_encoding("UTF-8") to the HTML, like this:
f.write html.force_encoding("UTF-8")
That fixed it.
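A likely root cause in generate_document (an inference from the code shown, not something the asker confirmed) is open(template_url, 'rb', &:read): binary mode tags the template as ASCII-8BIT, ERB's output presumably inherits that encoding, and when the file is written with UTF-8 as its external encoding (Rails sets Encoding.default_internal, which has that effect on a plain 'w' file), IO#write tries to transcode the binary bytes and fails on "\xC2". The sketch below makes the external encoding explicit with "w:UTF-8", uses a temporary file instead of DEPOSIT_GUIDE_TEMPLATE, and skips ERB; read_template_utf8 and write_utf8 are hypothetical helpers.

```ruby
require "tempfile"

# Hypothetical helper: read a template as UTF-8 text instead of raw bytes.
# "rb:UTF-8" keeps binary mode (no newline translation) but tags the
# resulting string as UTF-8 rather than ASCII-8BIT.
def read_template_utf8(path)
  File.open(path, "rb:UTF-8", &:read)
end

# Hypothetical helper: write with an explicit UTF-8 external encoding, so
# IO#write transcodes any string tagged with a different encoding.
def write_utf8(path, html)
  File.open(path, "w:UTF-8") { |f| f.write(html) }
end

Tempfile.create(["guide", ".html"]) do |template|
  template.write("<p>Deposit: \u00A35</p>") # stand-in template containing "£"
  template.flush

  raw  = File.open(template.path, "rb", &:read) # tagged ASCII-8BIT, as in the question
  text = read_template_utf8(template.path)       # same bytes, tagged UTF-8

  Tempfile.create(["output", ".html"]) do |out|
    begin
      write_utf8(out.path, raw)
    rescue Encoding::UndefinedConversionError => e
      puts "binary string: #{e.message}" # e.g. "\xC2" from ASCII-8BIT to UTF-8
    end

    write_utf8(out.path, text) # UTF-8 string: no conversion needed
    puts File.read(out.path, encoding: "UTF-8")
  end
end
```

Applied to the question, the equivalent change would be open(template_url, 'rb:UTF-8', &:read), assuming template_url is a local path and the template is actually saved as UTF-8; this fixes the encoding where the string is created instead of relabelling it just before the write.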

Jekyll encoding name of category special characters

My Jekyll installation used to work. Since an update, I get an error on URLs containing tag names that have special characters.
I now get an error message when trying to reach a URL with special characters in it, like http://127.0.0.1:4000/tag/Actualit%C3%A9%20europ%C3%A9enne/, where Actualité européenne is the name of a category.
The error message is incompatible character encodings: UTF-8 and ASCII-8BIT. All the files in the _posts directory are UTF-8.
Here is the stack trace:
[2017-01-30 17:39:09] ERROR Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:313:in 'set_filename'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:282:in 'exec_handler'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:217:in 'do_GET'
/var/lib/gems/2.1.0/gems/jekyll-3.4.0/lib/jekyll/commands/serve/servlet.rb:30:in 'do_GET'
/usr/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:213:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpserver.rb:138:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpserver.rb:94:in 'run'
/usr/lib/ruby/2.1.0/webrick/server.rb:295:in 'block in start_thread'
[2017-01-30 17:41:59] ERROR Encoding::CompatibilityError: incompatible character encodings: UTF-8 and ASCII-8BIT
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:313:in 'set_filename'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:282:in 'exec_handler'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:217:in 'do_GET'
/var/lib/gems/2.1.0/gems/jekyll-3.4.0/lib/jekyll/commands/serve/servlet.rb:30:in 'do_GET'
/usr/lib/ruby/2.1.0/webrick/httpservlet/abstract.rb:106:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpservlet/filehandler.rb:213:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpserver.rb:138:in 'service'
/usr/lib/ruby/2.1.0/webrick/httpserver.rb:94:in 'run'
/usr/lib/ruby/2.1.0/webrick/server.rb:295:in 'block in start_thread'
I've renamed all the files in _posts to remove special characters from the filenames, but it still does not work. I don't want to rename the tags.
All pages are read as UTF-8 by default, but you can override this in _config.yml:
encoding: ENCODING
But it seems that Jekyll doesn't work well (as of January 2017) with Unicode, non-English characters; see this similar issue: Slugify a string doesn't seem to work on Unicode/Swedish letters #4623. Spaces may also cause a small problem if you don't put the category inside quotes (' ').
A fix would be to slugify your "Catégories" explicitly before putting them in the URL, using a generator, with:
slug = category.strip.downcase.gsub(' ', '-').gsub(/[^\w-]/, '') # category slugifier
# use this slug as the category id
The slugifier above just downcases, replaces spaces with -, and removes every character that is not an ASCII letter, digit, underscore or hyphen, so you'll need to add other gsub substitutions before the last one, .gsub(/[^\w-]/, ''), to replace:
é è ê -> e
à â -> a
...
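Rather than listing every accented letter, one alternative (a swapped-in technique, not the one from this answer or from Jekyll itself) is Unicode decomposition: String#unicode_normalize(:nfd) splits "é" into "e" plus a combining accent, which a \p{Mn} regex can then strip. slugify_category is a hypothetical helper; note that unicode_normalize needs Ruby 2.2 or newer, while the stack trace above shows Ruby 2.1.

```ruby
# Hypothetical helper, not part of Jekyll's API.
# Requires Ruby 2.2+ for String#unicode_normalize.
def slugify_category(category)
  category
    .unicode_normalize(:nfd) # "é" -> "e" followed by U+0301 (combining acute accent)
    .gsub(/\p{Mn}/, "")      # drop combining marks, keeping the base letters
    .strip
    .downcase
    .gsub(/\s+/, "-")        # runs of whitespace -> a single hyphen
    .gsub(/[^\w-]/, "")      # drop anything that is not an ASCII word character or hyphen
end

puts slugify_category("Actualit\u00E9 europ\u00E9enne") # actualite-europeenne
```

This only covers letters that decompose into a base letter plus accents; characters such as "ß" or "œ" have no such decomposition and are still removed by the final gsub, so they would need explicit substitutions.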
Update
While reading old Jekyll issues on GitHub to implement a "fix" for this one, I found this detailed solution posted by @david-jacquel in 2014:
This needs to change the way Jekyll generates urls for posts. This can
be done with a plugin.
# _plugins/post.rb
module Jekyll
  class Post
    # override post method in order to return categories names as slug
    # instead of strings
    #
    # A URL for a post with category "category with space" will be in
    # slugified form : /category-with-space
    # instead of url encoded form : /category%20with%20space
    #
    # @see Utils.slugify
    def url_placeholders
      {
        :year => date.strftime("%Y"),
        :month => date.strftime("%m"),
        :day => date.strftime("%d"),
        :title => slug,
        :i_day => date.strftime("%-d"),
        :i_month => date.strftime("%-m"),
        :categories => (categories || []).map { |c| Utils.slugify(c) }.join('/'),
        :short_month => date.strftime("%b"),
        :short_year => date.strftime("%y"),
        :y_day => date.strftime("%j"),
        :output_ext => output_ext
      }
    end
  end
end
-- David Jacquel on Jekyll/jekyll-help/issues/129
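That resolves the space issue and gives a starting point for handling the encoded names. Note that the plugin overrides Jekyll::Post, a Jekyll 2.x class; Jekyll 3 replaced it with Jekyll::Document, so on jekyll-3.4.0 (the version in the stack trace) the plugin would need adapting.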

incompatible character encodings: ASCII-8BIT and UTF-8 in Oga gem

I am using an XML/HTML parser called Oga.
I am attempting to crawl this URL: http://www.johnvanderlyn.com and parse the body for text, like so:
def get_page
  body = Net::HTTP.get(URI.parse(@url))
  document = Oga.parse_html(body)
end
document = get_page
words = document.css('body').text
When I get this error:
/gems/oga-2.7/lib/oga/xml/node_set.rb:276:in `block in text': incompatible character encodings: ASCII-8BIT and UTF-8 (Encoding::CompatibilityError)
That is related to this bit of code here.
What could be causing this and how can I fix it? Is there a way for me to fix it locally, or do I have to fork the gem, fix that method and then use my fork?
Thoughts?
The bit of code you linked has nothing to do with the glitch; the problem is that body is being interpreted in the wrong encoding. Net::HTTP returns the response body as ASCII-8BIT (raw bytes), and joining its non-ASCII text with UTF-8 strings fails. Try adding body = body.force_encoding 'UTF-8' before parsing the document:
def get_page
  body = Net::HTTP.get(URI.parse(@url)).force_encoding 'UTF-8'
  document = Oga.parse_html(body)
end
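force_encoding 'UTF-8' assumes the page really is UTF-8. A slightly more defensive variant, sketched below, takes the charset from the Content-Type header when there is one, falls back to UTF-8 otherwise, and replaces any bytes that are still invalid. decode_body is a hypothetical helper, and its charset regex is a simplification that ignores <meta charset> tags inside the HTML.

```ruby
# Hypothetical helper: turn a raw (ASCII-8BIT) HTTP body into a valid UTF-8 string.
def decode_body(raw_body, content_type)
  # Pull "charset=..." out of a header such as "text/html; charset=ISO-8859-1".
  charset = content_type.to_s[/charset=["']?([^\s;"']+)/i, 1] || "UTF-8"
  encoding =
    begin
      Encoding.find(charset)
    rescue ArgumentError
      Encoding::UTF_8 # unknown charset name: assume UTF-8
    end

  text = raw_body.dup.force_encoding(encoding)
  # Transcode non-UTF-8 bodies, substituting characters that cannot be converted.
  unless encoding == Encoding::UTF_8
    text = text.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
  text.scrub # replace any remaining invalid UTF-8 bytes with U+FFFD
end
```

With Net::HTTP it would be called as decode_body(response.body, response['Content-Type']), where response comes from Net::HTTP.get_response(URI.parse(@url)), and the result passed to Oga.parse_html.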

`scan': invalid byte sequence in UTF-8 (ArgumentError)

I'm trying to read a .txt file in ruby and split the text line-by-line.
Here is my code:
def file_read(filename)
  File.open(filename, 'r').read
end
puts f = file_read('alice_in_wonderland.txt')
This works perfectly. But when I add the method line_cutter like this:
def file_read(filename)
  File.open(filename, 'r').read
end
def line_cutter(file)
  file.scan(/\w/)
end
puts f = line_cutter(file_read('alice_in_wonderland.txt'))
I get an error:
`scan': invalid byte sequence in UTF-8 (ArgumentError)
I found this online on an untrusted website and tried to use it in my own code, but it's not working. How can I get rid of this error?
Link to the file: File
The linked text file contains the following line:
Character set encoding: ISO-8859-1
If converting it isn't desired or possible, then you have to tell Ruby that this file is ISO-8859-1 encoded. Otherwise the default external encoding is used (UTF-8 in your case). One possible way to do that is:
s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1')
s.encoding # => #<Encoding:ISO-8859-1>
Or even like this if you prefer your string UTF-8 encoded (see utf8everywhere.org):
s = File.read('alice_in_wonderland.txt', encoding: 'ISO-8859-1:UTF-8')
s.encoding # => #<Encoding:UTF-8>
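Putting this fix back into the question's helpers gives something like the sketch below. It keeps the question's method names but changes their bodies: file_read uses File.read (which also closes the file, unlike File.open(...).read), and line_cutter uses String#lines, on the assumption that line-by-line splitting is what the question is after, since scan(/\w/) returns single word characters rather than lines. The ISO-8859-1 default comes from the file header quoted above.

```ruby
# Sketch of the question's helpers with the encoding made explicit.
# ISO-8859-1 is taken from the file's own header; change it for other files.
def file_read(filename, encoding: "ISO-8859-1:UTF-8")
  # "external:internal": decode the bytes as ISO-8859-1, then transcode to UTF-8.
  File.read(filename, encoding: encoding)
end

def line_cutter(text)
  # One array element per line; chomp: true (Ruby 2.4+) drops each trailing "\n".
  text.lines(chomp: true)
end

# Usage:
#   lines = line_cutter(file_read('alice_in_wonderland.txt'))
```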
It seems to work if you read the file directly from the web page; maybe there's something funny about the local copy you have. Try this:
require 'net/http'
uri = 'http://www.ccs.neu.edu/home/vip/teach/Algorithms/7_hash_RBtree_simpleDS/hw_hash_RBtree/alice_in_wonderland.txt'
scanned = Net::HTTP.get_response(URI.parse(uri)).body.scan(/\w/)
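A plausible explanation for why the downloaded copy works (an inference, not verified against that server): Net::HTTP returns the body tagged as ASCII-8BIT, and a regex scan over binary data never checks it for valid UTF-8, whereas File.read tags the local copy with the default external encoding (UTF-8 here), so its ISO-8859-1 bytes are invalid. The sketch reproduces this with a hard-coded byte string instead of the real file; latin1_to_utf8 is a hypothetical helper.

```ruby
# Hypothetical helper: decode ISO-8859-1 bytes into a UTF-8 string.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1 = "caf\xE9".b # "café" in ISO-8859-1: é is the single byte 0xE9

# Tagged as binary (like a Net::HTTP body), the scan does not validate
# the bytes, so it runs without error.
p latin1.scan(/\w/)

# Tagged as UTF-8 (like File.read with a UTF-8 default external
# encoding), the lone 0xE9 byte is invalid and scan raises.
begin
  latin1.dup.force_encoding(Encoding::UTF_8).scan(/\w/)
rescue ArgumentError => e
  puts e.message # invalid byte sequence in UTF-8
end

puts latin1_to_utf8(latin1) # café
```

The binary route avoids the exception but treats é as an opaque byte, so decoding the text properly is still the more reliable fix.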

Ruby extracting links from html

Here is my script:
require 'nokogiri'
ARGV.each do |input_filename|
  doc = Nokogiri::HTML(File.read(input_filename))
  title, body = doc.title.gsub(/\s+/, " ").downcase.strip, doc.xpath('//body').inner_text.tr('"', '').gsub("\n", '').downcase.strip
  link = doc.search("a[@href]") # Adding this part generates errors
  filename = File.basename(input_filename, ".*")
  puts %Q("#{title}", "#{body}", "#{filename}", "#{link}").downcase
end
I am having trouble extracting links from a list of HTML files. I believe the issue is due to unusual character encodings in some of the HTML files. Here is the error I am getting:
extractor.rb:9:in `block in <main>': incompatible character encodings: UTF-8 and CP850 (Encoding::CompatibilityError)
from extractor.rb:4:in `each'
from extractor.rb:4:in `<main>'
You can go about it a different way using the CSS selector:
doc.css('a').map { |link| link['href'] }
This would search the doc for all anchors and return their href text in an array.
Nokogiri stores Strings always as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings.
You have a conflict between UTF-8 and CP850 (are you working on Windows?).
You may need to adapt your File.read(input_filename) call. If your HTML files are Windows files, try:
File.read(input_filename, :encoding => 'cp850:utf-8')
If your HTML files are already UTF-8, then try:
File.read(input_filename, :encoding => 'utf-8')
Another solution may be to put Encoding.default_external = 'utf-8' at the beginning of your code. (I wouldn't recommend it; use it only for small scripts.)
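If the folder mixes both kinds of file, one option is to pick the encoding per file: use the bytes as UTF-8 when they are valid UTF-8, otherwise fall back to CP850 as in the 'cp850:utf-8' suggestion. This is a heuristic sketch; read_html is a hypothetical helper, CP850 as the fallback is an assumption about where the files came from, and a legacy-encoded file can occasionally happen to be valid UTF-8.

```ruby
# Hypothetical helper: read an HTML file as UTF-8 when its bytes are valid
# UTF-8, otherwise assume a legacy encoding (CP850 by default) and transcode.
def read_html(path, fallback: "CP850")
  bytes = File.binread(path)
  as_utf8 = bytes.dup.force_encoding(Encoding::UTF_8)
  return as_utf8 if as_utf8.valid_encoding?

  bytes.force_encoding(fallback).encode(Encoding::UTF_8)
end

# Usage in the script's loop, e.g.:
#   doc = Nokogiri::HTML(read_html(input_filename))
```

Either way, every string that reaches the puts line is then UTF-8, which is what avoids mixing UTF-8 and CP850 in one interpolation.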

Resources