UTF-8 Encoding Character set - ruby

I'm working on an e-mail app for fun and practice in Ruby and one of the mails has this subject:
=?UTF-8?B?4p22IEFuZHJvaWQgc3RpY2sgbWsgODA5aXYgKyB1c2IyZXRoZXJuZXQgYWRh?=\r\n
=?UTF-8?B?cHRlciAtNDYlIOKdtyBKb3NlcGggSm9zZXBoIGtldWtlbmNhcnJvdXNlbCAt?=\r\n
=?UTF-8?B?NTUlIOKduCA0IENlcnJ1dGkgYm94ZXJzaG9ydHMgLTcxJSDinbkgQXJub3Zh?=\r\n
=?UTF-8?B?IDkwIEc0IHRhYmxldCAtNDIl?=
I found out that I am looking at Base64: the parts between =?UTF-8?B? and ?= need to be decoded from Base64 into UTF-8 text.
Can someone explain how to decode a string like this in Ruby?

Try the Base64 module from the Ruby 1.9 standard library. For example:
require "base64"
enc = Base64.encode64('Send reinforcements')
# -> "U2VuZCByZWluZm9yY2VtZW50cw==\n"
plain = Base64.decode64(enc)
# -> "Send reinforcements"
The =?UTF-8?B? prefix declares the charset (UTF-8) and the transfer encoding (B, meaning Base64) of the original string, so it has to be present in the e-mail header whenever the text is encoded this way. Under the original mail standards, header text without such a marker is plain US-ASCII.
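As a rough sketch of how this could be applied to a subject like the one in the question: the helper below is hypothetical (decode_b_words is not a library method). It assumes every encoded-word uses the B (Base64) encoding, that the charset name is one Ruby recognizes, and that no multi-byte character is split across two encoded-words.

```ruby
require "base64"

# Hypothetical helper: decodes only Base64 ("B") encoded-words and
# leaves any other text in the header untouched.
def decode_b_words(header)
  # Whitespace (including folded line breaks) between two adjacent
  # encoded-words is not part of the text, so drop it first.
  unfolded = header.gsub(/\?=\s+=\?/, "?==?")

  unfolded.gsub(/=\?([^?]+)\?[Bb]\?([^?]*)\?=/) do
    charset = Regexp.last_match(1)
    bytes = Base64.decode64(Regexp.last_match(2))
    # Label the raw bytes with the declared charset, then convert to UTF-8.
    bytes.force_encoding(charset).encode("UTF-8")
  end
end

# Last line of the subject from the question:
p decode_b_words("=?UTF-8?B?IDkwIEc0IHRhYmxldCAtNDIl?=")
# prints " 90 G4 tablet -42%"
```

For real mail, a tested library is a safer choice than a hand-written pattern like this one, since senders do not always follow the rules this sketch assumes.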

Related

Print a UTF-8-encoded smiley

I am writing a ReactionRoles Discord bot in Python (discord.py).
This bot saves the ReactionRoles smileys UTF-8-encoded.
The encoded value has type bytes, but it is converted to str before it is saved.
The string looks something like "b'\\xf0\\x9f\\x98\\x82'".
I am using EMOJI_ENCODED = str(EMOJI.encode('utf8')) to encode it, but bytes(EMOJI_ENCODED).decode('utf8') isn't working.
Do you know how to decode it or how to save it in a better way?
The output of str() is a Unicode string. EMOJI is a Unicode string. str(EMOJI.encode('utf8')) just makes a mangled Unicode string.
The purpose of encoding is to make a byte string that can be saved to a file/database/socket. Simply do b = EMOJI.encode() (default is UTF-8) to get a byte string and s = b.decode() to get the Unicode string back.

Ruby: How to decode strings which are partially encoded or fully encoded?

I am getting encoded strings while parsing text files. I have no idea how to decode them into English or their original language.
"info#cloudag.com"
is the encoded string that needs to be decoded.
I want to decode using Ruby.
Here is a link for your reference and I am expecting the same.
This looks like HTML encoding, not URL encoding.
require 'cgi'
CGI.unescapeHTML("info#cloudag.com")
#=> "info#cloudag.com"
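To make the effect of CGI.unescapeHTML visible, here is a minimal sketch that assumes the input contains a numeric character reference. The sample string is hypothetical and not necessarily the asker's exact data.

```ruby
require "cgi"

# Hypothetical sample: "&#64;" is the numeric character reference for "@".
encoded = "info&#64;cloudag.com"
decoded = CGI.unescapeHTML(encoded)
puts decoded  # prints info@cloudag.com

# Named entities such as &amp; are handled as well.
puts CGI.unescapeHTML("Tom &amp; Jerry")  # prints Tom & Jerry
```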

How to decode a string in Ruby

I am working with the Mandrill Inbound Email API, and when an email has an attachment with one or more spaces in its file name, then the file name is encoded in a format that I do not know how to decode.
Here is an example string I receive for the file name: =?UTF-8?B?TWlzc2lvbmFyecKgRmFpdGjCoFByb21pc2XCoGFuZMKgQ2FzaMKgUmVjZWlwdHPCoFlURMKgMjUzNQ==?= =?UTF-8?B?OTnCoEp1bHktMjAxNS5jc3Y=?=
I tried Base64.decode64(#{encoded_value}) but that didn't return a readable text.
How do I decode that value into a readable string?
This is MIME encoded-word syntax as defined in RFC 2047. From Wikipedia:
The form is: "=?charset?encoding?encoded text?=".
charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.
encoded text is the Q-encoded or base64-encoded text.
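To make the three fields concrete, here is a minimal sketch that splits a single encoded-word into charset, encoding and encoded text, then decodes it. The constant and method names are hypothetical, the second sample string is made up for illustration, and the code assumes the charset name is one Ruby recognizes.

```ruby
# Hypothetical names for illustration; not part of any library.
ENCODED_WORD = /\A=\?([^?]+)\?([BbQq])\?([^?]*)\?=\z/

def decode_encoded_word(word)
  match = ENCODED_WORD.match(word)
  return word unless match # not an encoded-word: return it unchanged

  charset, encoding, text = match.captures
  bytes =
    if encoding.upcase == "B"
      text.unpack1("m") # Base64, the same operation as Base64.decode64
    else
      # Q-encoding: "_" stands for a space, "=XX" is a hex-encoded byte.
      text.tr("_", " ").unpack1("M")
    end
  # Label the raw bytes with the declared charset, then convert to UTF-8.
  bytes.force_encoding(charset).encode("UTF-8")
end

# Second encoded-word of the file name in the question; the result
# contains a non-breaking space (U+00A0) after "99".
p decode_encoded_word("=?UTF-8?B?OTnCoEp1bHktMjAxNS5jc3Y=?=")

# Made-up Q-encoded sample in ISO-8859-1.
puts decode_encoded_word("=?ISO-8859-1?Q?caf=E9_menu?=") # prints café menu
```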
Fortunately you don't need to write a decoder for this. The Mail gem comes with a Mail::Encodings.value_decode method that works perfectly and is very well-tested:
subject = "=?UTF-8?B?TWlzc2lvbmFyecKgRmFpdGjCoFByb21pc2XCoGFuZMKgQ2FzaMKgUmVjZWlwdHPCoFlURMKgMjUzNQ==?= =?UTF-8?B?OTnCoEp1bHktMjAxNS5jc3Y=?="
Mail::Encodings.value_decode(subject)
# => "Missionary Faith Promise and Cash Receipts YTD 253599 July-2015.csv"
It gracefully handles lots of edge cases you probably wouldn't think of (until your app tries to handle them and falls over):
subject = "Re:[=?iso-2022-jp?B?GyRCJTAlayE8JV0lcyEmJTglYyVRJXMzdDwwMnEbKEI=?=\n =?iso-2022-jp?B?GyRCPFIbKEI=?=] =?iso-2022-jp?B?GyRCSlY/LiEnGyhC?=\n =?iso-2022-jp?B?GyRCIVolMCVrITwlXSVzIVskKkxkJCQ5ZyRvJDsbKEI=?=\n =?iso-2022-jp?B?GyRCJE43byRLJEQkJCRGIUolaiUvJSglOSVIGyhC?=#1056273\n =?iso-2022-jp?B?GyRCIUsbKEI=?="
Mail::Encodings.value_decode(subject)
# => "Re:[グルーポン・ジャパン株式会社] 返信:【グルーポン】お問い合わせの件について(リクエスト#1056273\n )"
If you're using Rails you already have the Mail gem. Otherwise just add gem "mail" to your Gemfile, then bundle install and, in your script, require "mail".
Thanks to the comment from @Yevgeniy-Anfilofyev, who pointed me in the right direction, I was able to write the following method that correctly parsed the encoded value and returned an ASCII string.
require "base64"

def self.decode(value)
  # It turns out the value is made up of multiple encoded parts,
  # so we first need to split them so we can decode each part separately.
  encoded_parts = value.split('=?UTF-8?B?').
    map { |x| x.sub(/\?.*$/, '') }.
    reject { |x| x.strip.empty? } # plain-Ruby equivalent of Rails' blank?
  encoded_parts.map { |x| Base64.decode64(x) }. # decode each part
    join('').                 # join the parts together
    force_encoding('utf-8').  # mark the decoded bytes as UTF-8
    gsub("\xC2\xA0", " ")     # replace UTF-8 non-breaking spaces with ASCII spaces
end

How do I get a UTF-8 string out of an MD5 digest?

I am trying to use an API that requires an MD5 hash to be sent in UTF-8 format.
Problem is, I can't find any way to actually make that happen.
require 'digest/md5'
api_sig = Digest::MD5.digest "api_key=blahblahblah"
puts api_sig
>> Decode error: not UTF-8
So I try force_encoding(Encoding::UTF_8). Same error. inspect, to_s, nothing gives me what I want.
How can I get a UTF-8 string representing an MD5 digest of another string?
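Call Digest::MD5.hexdigest "api_key=blahblahblah" instead. digest returns the 16 raw bytes of the hash, which are generally not valid UTF-8; hexdigest returns the same hash as a 32-character hexadecimal string, which is plain ASCII and therefore also valid UTF-8.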
The documentation of this is very poor, but you can find a lackluster explanation here: http://www.ruby-doc.org/stdlib-2.0/libdoc/digest/rdoc/Digest/Class.html#method-c-hexdigest
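A short sketch of the difference between the two calls. The input string is a hypothetical stand-in for the API parameters.

```ruby
require "digest/md5"

input = "hello" # hypothetical input, standing in for the API parameters

raw = Digest::MD5.digest(input)    # 16 raw bytes in a binary string
hex = Digest::MD5.hexdigest(input) # the same hash as 32 hex characters

puts raw.bytesize                        # prints 16
puts hex                                 # prints 5d41402abc4b2a76b9719d911017c592
puts hex.encode("UTF-8").valid_encoding? # prints true
puts raw.unpack1("H*") == hex            # prints true: hex is just raw, hex-encoded
```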

How to decode a subject fetched via Net::IMAP which is in UTF-8? (ruby)

I'm using Net::IMAP.fetch to fetch some messages from Gmail. However, when I fetch a message which has a UTF8 subject (i.e., in cyrillic) I get something like this:
=?UTF-8?B?0KHRgNC/0YHQutC4INGE0L7RgNGD0Lwg0YLRgNCw?= =?UTF-8?B?0LbQuCDQuNC30LHQvtGA0L3QuCDQvNCw0YLQtdGA0Lg=?= =?UTF-8?B?0ZjQsNC7INC4INC90LAg0ZvQuNGA0LjQu9C40YY=?= =?UTF-8?B?0LggLSBjaXJpbGFjZSB0ZXN0?=
How can I convert the above string into UTF8?
NOTE: this is for ruby 1.8.7
The answer is:
Mail::Encodings.unquote_and_convert_to( string, 'utf-8' )
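The point is that e-mail subjects containing non-ASCII text are sent as MIME encoded-words. In the example above, the B after the charset marks Base64; a Q in that position would mark the Q encoding, which is similar to quoted-printable.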
