rails omniauth and UTF-8 errors - ruby

I recently hit an error using OmniAuth while trying to populate some fields from Google's login:
Encoding::CompatibilityError: incompatible character encodings:
ASCII-8BIT and UTF-8
"omniauth"=>
{"user_info"=>
{"name"=>"Joe McÙisnean",
"last_name"=>"McÙisnean",
"first_name"=>"Joe",
"email"=>"someemail@gmail.com"},
"uid"=>
"https://www.google.com/accounts/o8/id?id=AItOawnQmfdfsdfsdfdsfsdhGWmuLTiX2Id40k",
"provider"=>"google_apps"}
In my user model
def apply_omniauth(omniauth)
#add some info about the user
self.email = omniauth['user_info']['email'] if email.blank?
self.name = omniauth['user_info']['name'] if name.blank?
self.name = omniauth['user_info'][:name] if name.blank?
self.nickname = omniauth['user_info']['nickname'] if nickname.blank?
self.nickname = name.gsub(' ','').downcase if nickname.blank?
unless omniauth['credentials'].blank?
user_tokens.build(:provider => omniauth['provider'],
:uid => omniauth['uid'],
:token => omniauth['credentials']['token'],
:secret => omniauth['credentials']['secret'])
else
user_tokens.build(:provider => omniauth['provider'], :uid => omniauth['uid'])
end
end
I'm not hugely knowledgeable about UTF encoding, so I'm not sure where I should be specifying the encoding. I'm guessing it's here, before it gets put into the user model and created, but I'm unsure what to do about it.
UPDATE:
Rails 3.0.10
Omniauth 0.2.6
Ruby 1.9.2
PG 0.11.0
Default encoding is UTF-8
That didn't seem to be it, so I dug further and found this in the view:
Showing /Users/holden/Code/someapp/app/views/users/registrations/_signup.html.erb where line #5 raised:
incompatible character encodings: ASCII-8BIT and UTF-8
Extracted source (around line #5):
2: <%= f.error_messages %>
3:
4: <%= f.input :name, :hint => 'your real name' %>
5: <%= f.input :nickname, :hint => 'Username of your choosing' %>
6:
7: <% unless @user.errors[:email].present? or @user.email %>
8: <%= f.input :email, :as => :hidden %>
UPDATE UPDATE:
It seems to be the omniauth gem which returns the ASCII-8BIT chars, so my next question is: how can I parse the hash and convert it back into UTF-8 so my app doesn't explode?
session[:omniauth] = omniauth.to_utf8
Another part to this crazy ride is when I type this into the console
d={"user_info"=>{"email"=>"someemail@gmail.com", "first_name"=>"Joe", "last_name"=>"Mc\xC3\x99isnean", "name"=>"Joe Mc\xC3\x99isnean"}}
It automatically converts it to UTF-8, but it explodes when shoved into a session
=> {"user_info"=>{"email"=>"someemail@gmail.com", "first_name"=>"Joe", "last_name"=>"McÙisnean", "name"=>"Joe McÙisnean"}}
This is a painful nightmare if there ever was one.

OmniAuth proved to be the problem: it was producing the ASCII-8BIT strings.
I ended up forcing the OmniAuth hash into submission using:
omniauth_controller.rb
session[:omniauth] = omniauth.to_utf8
I added a recursive method to force-convert the rogue ASCII-8BIT strings to UTF-8:
some_initializer.rb
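class Hash
  # Returns a copy of the hash with every string value relabeled as UTF-8.
  # Nested hashes are handled through the recursive to_utf8 call.
  def to_utf8
    Hash[
      self.collect do |k, v|
        if (v.respond_to?(:to_utf8))
          [ k, v.to_utf8 ]
        elsif (v.respond_to?(:encoding))
          # force_encoding changes only the label, not the bytes
          [ k, v.dup.force_encoding('UTF-8') ]
        else
          [ k, v ]
        end
      end
    ]
  end
end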
Special thanks to tadman
recursively convert hash containing non-UTF chars to UTF
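The Hash#to_utf8 patch above recurses into nested hashes but not into arrays, and it reopens a core class. As a rough sketch of the same idea without monkey-patching, the hypothetical helper below (deep_force_utf8 is an illustrative name, not part of OmniAuth) walks hashes and arrays and relabels every string as UTF-8. It assumes the bytes really are UTF-8 and only carry the wrong label.

```ruby
# Hypothetical helper, not part of OmniAuth: walks nested hashes and arrays
# and relabels every string as UTF-8 without changing its bytes.
def deep_force_utf8(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), out| out[k] = deep_force_utf8(v) }
  when Array
    value.map { |v| deep_force_utf8(v) }
  when String
    # dup so the caller's string keeps its original label
    value.dup.force_encoding(Encoding::UTF_8)
  else
    value
  end
end

# Sample data shaped like the hash in the question, with binary-labeled strings.
raw = {
  "user_info" => {
    "name" => "Joe Mc\xC3\x99isnean".b,
    "tags" => ["caf\xC3\xA9".b]
  }
}
clean = deep_force_utf8(raw)
```

In a controller this would be used as session[:omniauth] = deep_force_utf8(omniauth), in place of the to_utf8 call.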

Related

Encoding::UndefinedConversionError "\xC2" from ASCII-8BIT to UTF-8 with redcarpet

I'm using redcarpet gem to render some markdown text to html, a portion of the markdown was user inserted, and they typed in a totally valid special character (£), but now when rendering it I get a: Encoding::UndefinedConversionError "\xC2" from ASCII-8BIT to UTF-8
I know it's the £ sign because if I replace it in the text to render, it all works. But users might insert other special characters.
I'm not sure how to deal with this, here's my code building the html:
def generate_document
temp_file_service = TempFileService.new
path = temp_file_service.path
template_url = TenantConfig.get('DEPOSIT_GUIDE_TEMPLATE') || DEFAULT_DOC
template = open(template_url, 'rb', &:read)
html = ERB.new(template).result(binding)
File.open( path, 'w') do |f|
f.write html
end
File.new(path, 'r')
end
The error is raised on the f.write line.
here's my html.erb:
<%= markdown(clause.text) %>
and here's the helper:
def markdown(text)
Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(text)
end
Note that the encoding problem happens only when saving the html to a file, somewhere else I correctly use the same markdown helper to render the text to the browser, and no problems there.
It would also work the other way: cleaning the markdown before saving it to the DB and replacing any special characters with the corresponding HTML entity (e.g. £ becomes &pound;).
I tried having a before_save callback (as suggested here: Encoding::UndefinedConversionError: "\xC2" from ASCII-8BIT to UTF-8) :
before_save :convert_text
private
def convert_text
self.text = self.text.force_encoding("utf-8")
end
which didn't work
I also tried (as recommended here: Using ERB in Markdown with Redcarpet):
<%= markdown(extra_clause.text).html_safe %>
which didn't work either.
How would I fix either way?
In the end I solved this by adding force_encoding("UTF-8") to the html,
like this:
f.write html.force_encoding("UTF-8")
it fixed it.
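A small sketch of why force_encoding helps here while encode fails. It assumes, as seems likely from the 'rb' read mode in the question, that the rendered html ends up labeled ASCII-8BIT even though its bytes are valid UTF-8.

```ruby
# The two bytes of "£" in UTF-8, but labeled as binary (ASCII-8BIT),
# as they would be after reading a file with mode 'rb'.
binary = "\xC2\xA3".b

begin
  # encode tries to transcode byte by byte and has no mapping for \xC2
  binary.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error_message = e.message
end

# force_encoding keeps the bytes and only changes the label
relabeled = binary.dup.force_encoding("UTF-8")
relabeled.valid_encoding?
```

Relabeling is only safe when the bytes really are UTF-8. If they came from another charset, the result would be an invalid or garbled string.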

incompatible character encodings: UTF-8 and ASCII-8BIT in render action

ActionView::Template::Error (incompatible character encodings: UTF-8
and ASCII-8BIT): app/controllers/posts_controller.rb:27:in `new'
# GET /posts/new
def new
  if params[:post]
    @post = Post.new(post_params).dup
    if @post.valid?
      render :action => "confirm"
    else
      format.html { render action: 'new' }
      format.json { render json: @post.errors, status: :unprocessable_entity }
    end
  else
    @post = Post.new
    @document = Document.new
    @documents = @post.documents.all
    @document = @post.documents.build
  end
end
I don't know why it is happening.
Make sure config.encoding = "utf-8" is present in the application.rb file.
Make sure you are using the 'mysql2' gem instead of the mysql gem.
Put # encoding: utf-8 at the top of the Rake file.
Above the Rails.application.initialize! line in the environment.rb file, add the following two lines:
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
solution from here: http://rorguide.blogspot.in/2011/06/incompatible-character-encodings-ascii.html
If the above solution did not help, then I think you either copy/pasted part of your Haml template into the file, or you're working with an editor that is not Unicode/UTF-8 friendly.
If you can, recreate that file from scratch in a UTF-8 friendly editor (there are plenty for any platform) and see whether this fixes your problem.
Sometimes you may get this error:
incompatible character encodings: ASCII-8BIT and UTF-8
That typically happens because you are trying to concatenate two strings that carry different encoding labels, and both contain non-ASCII bytes. An ASCII-8BIT (binary) string holding non-ASCII bytes cannot be joined with a UTF-8 string holding non-ASCII characters, because Ruby will not guess how the binary bytes should be interpreted. Handling string joining with those incompatibilities requires the programmer to step in.
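The following sketch reproduces the error in isolation and shows one way to resolve it. The variable names are illustrative, and the fix assumes the binary string actually holds UTF-8 bytes.

```ruby
utf8   = "caf\u00E9"       # UTF-8 string with a non-ASCII character
binary = "caf\xC3\xA9".b   # the same bytes, labeled ASCII-8BIT

begin
  utf8 + binary
rescue Encoding::CompatibilityError => e
  concat_error = e.class
end

# ASCII-only strings can be mixed freely, whatever their labels
ascii_ok = "abc" + "def".b

# Relabel the binary string (assuming its bytes are UTF-8) before joining
fixed = utf8 + binary.dup.force_encoding(Encoding::UTF_8)
```

This also explains why such bugs tend to appear only once a user enters an accented character: until then every string is ASCII-only and the labels do not matter.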
I was upgrading my rails and spree and the error was actually coming from cache
Deleting the cache solved the problem for me
rm -rf tmp/cache

UndefinedConversionError trying to parse Arabic from email body

using mail for ruby I am getting this message:
mail.rb:22:in `encode': "\xC7" from ASCII-8BIT to UTF-8 (Encoding::UndefinedConversionError)
from mail.rb:22:in `<main>'
If I remove encode I get this message:
/var/lib/gems/1.9.1/gems/bson-1.7.0/lib/bson/bson_ruby.rb:63:in `rescue in to_utf8_binary': String not valid utf-8: "<div dir=\"ltr\"><div class=\"gmail_quote\">l<br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div class=\"gmail_quote\"><br><br><br><div dir=\"ltr\"><div dir=\"rtl\">\xC7\xE1\xE4\xD5 \xC8\xC7\xE1\xE1\xDB\xC9 \xC7\xE1\xDA\xD1\xC8\xED\xC9</div></div>\r\n</div><br></div>\r\n</div><br></div>\r\n</div><br></div>" (BSON::InvalidStringEncoding)
This is my code:
require 'mail'
require 'mongo'
connection = Mongo::Connection.new
db = connection.db("DB")
db = Mongo::Connection.new.db("DB")
newsCollection = db["news"]
Mail.defaults do
retriever_method :pop3, :address => "pop.gmail.com",
:port => 995,
:user_name => 'my_username',
:password => '*****',
:enable_ssl => true
end
emails = Mail.last
# Checks if the email is multipart and decodes accordingly, in order to extract UTF-8 from the body
plain_part = emails.multipart? ? (emails.text_part ? emails.text_part.body.decoded : nil) : emails.body.decoded
html_part = emails.html_part ? emails.html_part.body.decoded : nil
mongoMessage = {"date" => emails.date.to_s , "subject" => emails.subject , "body" => plain_part.encode('UTF-8') }
msgID = newsCollection.insert(mongoMessage) # adds the document to the database and returns its ID
puts msgID
For English and Hebrew it works perfectly but it seems gmail is sending arabic with different encoding. Replacing UTF-8 with ASCII-8BIT gives a similar error.
I get the same result when using plain_part for plain email messages. I am handling emails from one specific source so I can put html_part with confidence it's not causing the error.
To make it extra weird Subject in Arabic is rendered perfectly.
What encoding should I use?
If you use encode without options, it raises this error when your string is labeled with one encoding but contains bytes from another.
Try it this way:
plain_part.encode('UTF-8', :invalid => :replace, :undef => :replace, :replace => '?')
This replaces invalid and undefined characters for the given encoding with a "?". If this is not sufficient for your needs, you need to find a way to check whether your plain_part string is valid.
For example, you can use valid_encoding? for this.
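A short sketch of both calls on the first few bytes from the error message above. It only illustrates the behaviour and is not a fix in itself, since the replacement is lossy.

```ruby
# The first Arabic word from the error message, as a binary string.
raw = "\xC7\xE1\xE4\xD5".b

# Lossy conversion: every byte that cannot be converted becomes "?".
lossy = raw.encode("UTF-8", invalid: :replace, undef: :replace, replace: "?")

# valid_encoding? reports whether a label fits the bytes.
as_utf8 = raw.dup.force_encoding("UTF-8")
utf8_ok = as_utf8.valid_encoding?
```

Here utf8_ok is false, which shows that the body is not UTF-8 at all, so the real fix is to find out which charset the sender used.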
I recently stumbled across a similar problem, where I couldn't be sure what the encoding really was, so I wrote this (maybe a little humble) method. Maybe it helps you find a way to fix your problem.
def self.encode!(str)
  return nil if str.nil?
  known_encodings = %w(
    UTF-8
    ISO-8859-1
  )
  begin
    str.encode(Encoding.find('UTF-8'))
  rescue Encoding::UndefinedConversionError
    known_encodings.each do |encoding|
      # work on a copy so the caller's string is not relabeled
      fixed_str = str.dup
      if fixed_str.force_encoding(encoding).valid_encoding?
        return fixed_str.encode(Encoding.find('UTF-8'))
      end
    end
    # last resort: replace everything that cannot be converted
    return str.encode(Encoding.find('UTF-8'), :invalid => :replace, :undef => :replace, :replace => '?')
  end
end
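One caveat with a candidate list like this: ISO-8859-1 accepts any byte sequence, so Arabic text sent in a legacy charset would come out as Latin mojibake. The bytes in the question look like Windows-1256, a common legacy Arabic charset. That is an assumption read off the byte pattern, and the charset declared in the part's Content-Type header would be the authoritative source. Under that assumption, a variant of the idea with a caller-supplied candidate list (to_utf8_with_candidates is a hypothetical name) might look like this:

```ruby
# Hypothetical helper: relabel the bytes with each candidate encoding in turn
# and transcode with the first one that fits.
def to_utf8_with_candidates(bytes, candidates)
  candidates.each do |name|
    labeled = bytes.dup.force_encoding(name)
    next unless labeled.valid_encoding?
    begin
      return labeled.encode(Encoding::UTF_8)
    rescue EncodingError
      next
    end
  end
  # Nothing fit: fall back to a lossy conversion.
  bytes.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
end

# Body bytes copied from the BSON error message in the question.
raw = "\xC7\xE1\xE4\xD5 \xC8\xC7\xE1\xE1\xDB\xC9 \xC7\xE1\xDA\xD1\xC8\xED\xC9".b

# Assumption: Windows-1256 is the sender's charset.
arabic = to_utf8_with_candidates(raw, %w[UTF-8 Windows-1256])
```

Order matters: strict encodings such as UTF-8 should come first, and single-byte encodings that accept almost anything should come last.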
I found a workaround.
Since only specific emails will be sent to this account, just for use by this application, I have full control over the formatting. For some reason mail decodes a text/plain attachment perfectly,
so:
# declare body outside the block so it is still visible afterwards
body = nil
emails.attachments.each do |attachment|
  next unless attachment.content_type.start_with?('text/plain')
  # extract the contents of the text file
  begin
    body = attachment.body.decoded
  rescue StandardError => e
    puts "Unable to save data for #{attachment.filename} because #{e.message}"
  end
end
mongoMessage = {"date" => emails.date.to_s, "subject" => emails.subject, "body" => body}

net ldap - Encoding::UndefinedConversionError

Folks! I get the following error message and I have no idea what to do. Is this an already known net-ldap bug? I tried updating my gems and I already looked for further information on the internet. The first part is OK: I get all my data from my LDAP database, but this error occurs at the end.
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ber/core_ext/string.rb:23:in
encode': "\x8E" from ASCII-8BIT to UTF-8
(Encoding::UndefinedConversionError) from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ber/core_ext/string.rb:23:in
raw_utf8_encoded' from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ber/core_ext/string.rb:15:in
to_ber' from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ldap.rb:1396:in
block in search' from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ldap.rb:1367:in
loop' from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ldap.rb:1367:in
search' from
/usr/local/lib/ruby/gems/1.9.1/gems/net-ldap-0.3.1/lib/net/ldap.rb:637:in
`search'
and here my code:
require 'rubygems'
require 'net/ldap'
ldap = Net::LDAP.new
ldap.host = 'xxxxxx'
ldap.authenticate "cn=admin, dc=xxxx, dc=xxxxx, dc=de", "xxxxx!"
#puts ldap.bind
if ldap.bind
# authentication succeeded
else
# authentication failed
# p ldap.get_operation_result
end
filter = Net::LDAP::Filter.eq("uid", "*")
treebase = "xxxxx, dc=xxxxxx, dc=de"
ldap.search(:base => treebase, :filter => filter) do |entry|
puts "DN: #{entry.dn}"
entry.each do |attribute, values|
puts " #{attribute}:"
values.each do |value|
puts " --->#{value}"
end
end
end
There are many encoding issues in v0.3.1 of net-ldap [1],[2],[3],[4]. Several patches are already merged, but sadly, this great project seems semi-abandoned and the changes haven't been pushed out to rubygems. Using it directly from GitHub has been working well for me, and if you're using Bundler, it is as easy as sticking something like this in your Gemfile:
gem "net-ldap", :git => "git://github.com/ruby-ldap/ruby-net-ldap.git", :ref => '8a182675f4'
1 - https://github.com/ruby-ldap/ruby-net-ldap/pull/41
2 - https://github.com/ruby-ldap/ruby-net-ldap/pull/44
3 - https://github.com/ruby-ldap/ruby-net-ldap/pull/64
4 - https://github.com/ruby-ldap/ruby-net-ldap/pull/55

Ruby on Rails - Truncate to a specific string

Clarification: The creator of the post should be able to decide when the truncation should happen.
I implemented a Wordpress like [---MORE---] functionality in my blog with following helper function:
# application_helper.rb
def more_split(content)
split = content.split("[---MORE---]")
split.first
end
def remove_more_tag(content)
content.sub("[---MORE---]", '')
end
In the index view the post body will display everything up to (but without) the [---MORE---] tag.
# index.html.erb
<%= raw more_split(post.rendered_body) %>
And in the show view everything from the post body will be displayed except the [---MORE---] tag.
# show.html.erb
<%=raw remove_more_tag(@post.rendered_body) %>
This solution currently works for me without any problems.
Since I am still a beginner in programming I am constantly wondering if there is a more elegant way to accomplish this.
How would you do this?
Thanks for your time.
This is the updated version:
# index.html.erb
<%=raw truncate(post.rendered_body,
:length => 0,
:separator => '[---MORE---]',
:omission => link_to( "Continued...",post)) %>
...and in the show view:
# show.html.erb
<%=raw (@post.rendered_body).gsub("[---MORE---]", '') %>
I would simply use truncate; it has all of the options you need.
truncate("And they found that many people were sleeping better.", :length => 25, :omission => '... (continued)')
# => "And they f... (continued)"
Update
After seeing the comments and digging a bit into the documentation, it seems that :separator does the job.
From the doc:
Pass a :separator to truncate text at a natural break.
For reference, see the docs.
truncate(post.rendered_body, :separator => '[---MORE---]')
On the show page you have to use gsub
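For comparison, the split-at-marker behaviour can also be sketched in plain Ruby with String#partition, which does not depend on any length setting. The helper names teaser and full_body are hypothetical, and the sketch assumes the marker appears at most once per post.

```ruby
# Returns the text before the marker, or the whole text if there is no marker.
def teaser(content, marker = "[---MORE---]")
  content.partition(marker).first
end

# Returns the full text with the first occurrence of the marker removed.
def full_body(content, marker = "[---MORE---]")
  content.sub(marker, "")
end

body = "<p>Intro</p>[---MORE---]<p>Rest</p>"
index_html = teaser(body)
show_html  = full_body(body)
```

In the views these would replace more_split and remove_more_tag, still wrapped in raw as before.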
You could use a helper function on the index page that only grabs the first X characters in your string. So, it would look more like:
<%= raw summarize(post.rendered_body, 250) %>
to get the first 250 characters in your post. So, then you don't have to deal w/ splitting on the [---MORE---] string. And, on the show page for your post, you won't need to do anything at all... just render the post.body.
Here's an example summarize helper (that you would put in application_helper.rb):
def summarize(body, length)
return simple_format(truncate(body.gsub(/<\/?.*?>/, ""), :length => length)).gsub(/<\/?.*?>/, "")
end
I tried and found this one is the best and easiest
def summarize(body, length)
body[0..length] + '...'
end
s = summarize("to get the first n characters in your post. So, then you don't have to deal w/ splitting on the [---MORE---] post.body.",20)
ruby-1.9.2-p290 :017 > s
=> "to get the first n ..."

Resources