Reading a Gmail Message with ruby-gmail - ruby

I am looking for an instance method from the ruby-gmail gem that would allow me to read either:
the body
or
subject
of a Gmail message.
After reviewing the documentation, found here, I couldn't find anything!?
There is a .message instance method found in the Gmail::Message class section; but it only returns, for lack of a better term, email "mumbo-jumbo," for the body.
My attempt:
#!/usr/local/bin/ruby
require 'gmail'
gmail = Gmail.connect('username', 'password')
emails = gmail.inbox.emails(:from => 'someone@mail.com')
emails.each do |email|
  email.read
  email.message
end
Now:
email.read does not work
email.message returns that, "mumbo-jumbo," mentioned above
Somebody else asked this question on SO but didn't get an answer.

This probably isn't exactly the answer to your question, but I will tell you what I have done in the past. I tried using the ruby-gmail gem but it didn't do what I wanted it to do in terms of reading a message. Or, at least, I couldn't get it to work. Instead I use the built-in Net::IMAP class to log in and get a message.
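require 'net/imap'
require 'mail' # Mail.read_from_string comes from the mail gem
# search_mail is not part of Net::IMAP; a minimal version could look like this
def search_mail(imap, key, value)
  imap.search([key, value]).first # sequence number of the first match
end
imap = Net::IMAP.new('imap.gmail.com', port: 993, ssl: true)
imap.login('<username>', '<password>')
imap.select('INBOX')
subject_id = search_mail(imap, 'SUBJECT', '<mail_subject>')
subject_message = imap.fetch(subject_id, 'RFC822')[0].attr['RFC822']
mail = Mail.read_from_string subject_message
body_message = mail.html_part.body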
From here your message is stored in body_message and is HTML. If you want the entire email body you will probably need to learn how to use Nokogiri to parse it. If you just want a small bit of the message where you know some of the surrounding characters you can use a regex to find the part you are interested in.
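As a rough sketch of the regex approach, assuming the HTML contains a known marker around the value you want (the sample HTML and the "Order number" marker are made up for illustration):

```ruby
# Hypothetical HTML body; in practice this would be body_message.to_s
html = '<html><body><p>Order number: <b>A-10293</b></p></body></html>'

# String#[] with a regex and a capture index returns just the captured group,
# or nil when the pattern does not match.
order_number = html[/Order number:\s*<b>([^<]+)<\/b>/, 1]

puts order_number
```

This only holds up while the surrounding markup stays the same, which is why a parser such as Nokogiri is the safer choice for anything larger.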
I did find one page associated with the ruby-gmail gem that talks about using ruby-gmail to read a Gmail message. I made a cursory attempt at testing it tonight but apparently Google upped the security on my account and I couldn't get in using irb without tinkering with my Gmail configuration (according to the warning email I received). So I was unable to verify what is stated on that page, but as I mentioned my past attempts were unfruitful whereas Net::IMAP works for me.
EDIT:
I found this, which is pretty cool. You will need to add in
require 'cgi'
to your class.
I was able to implement it in this way. After I have my body_message, call the html2text method from that linked page (which I modified slightly and included below since you have to convert body_message to a string):
plain_text = html2text(body_message)
puts plain_text #Prints nicely formatted plain text to the terminal
Here is the slightly modified method:
def html2text(html)
  text = html.to_s.
    gsub(/(&nbsp;|\n|\s)+/im, ' ').squeeze(' ').strip.
    gsub(/<([^\s]+)[^>]*(src|href)=\s*(.?)([^>\s]*)\3[^>]*>\4<\/\1>/i,
         '\4')
  links = []
  linkregex = /<[^>]*(src|href)=\s*(.?)([^>\s]*)\2[^>]*>\s*/i
  while linkregex.match(text)
    links << $~[3]
    text.sub!(linkregex, "[#{links.size}]")
  end
  text = CGI.unescapeHTML(
    text.
      gsub(/<(script|style)[^>]*>.*<\/\1>/im, '').
      gsub(/<!--.*-->/m, '').
      gsub(/<hr(| [^>]*)>/i, "___\n").
      gsub(/<li(| [^>]*)>/i, "\n* ").
      gsub(/<blockquote(| [^>]*)>/i, '> ').
      gsub(/<(br)(| [^>]*)>/i, "\n").
      gsub(/<(\/h[\d]+|p)(| [^>]*)>/i, "\n\n").
      gsub(/<[^>]*>/, '')
  ).lstrip.gsub(/\n[ ]+/, "\n") + "\n"
  for i in (0...links.size).to_a
    text = text + "\n [#{i+1}] <#{CGI.unescapeHTML(links[i])}>" unless
      links[i].nil?
  end
  links = nil
  text
end
You also mentioned in your original question that you got mumbo-jumbo with this step:
email.message *returns mumbo-jumbo*
If the mumbo-jumbo is HTML, you can probably just use your existing code with this html2text method instead of switching over to Net::IMAP as I had discussed when I posted my original answer.

Nevermind, it's:
email.subject
email.body
silly me
ok, so how do I get the body in "readable" text? without all the encoding stuff and html?

Subject, text body and HTML body:
email.subject
if email.message.multipart?
  text_body = email.message.text_part.body.decoded
  html_body = email.message.html_part.body.decoded
else
  # Only multipart messages contain a HTML body
  text_body = email.message.body.decoded
  html_body = text_body
end
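The "encoding stuff" that .decoded removes is usually a transfer encoding such as quoted-printable or base64. A small sketch of what that decoding amounts to, using only core Ruby (the sample strings are made up, and UTF-8 is assumed as the charset):

```ruby
# Two transfer encodings commonly seen in raw mail bodies
quoted_printable = "Caf=C3=A9 au lait"
base64           = "SGVsbG8sIHdvcmxkIQ=="

# String#unpack1 with "M" decodes quoted-printable and "m" decodes base64.
# The result is binary, so tag it with the charset the message declares
# (UTF-8 is assumed here).
qp_text  = quoted_printable.unpack1('M').force_encoding('UTF-8')
b64_text = base64.unpack1('m').force_encoding('UTF-8')

puts qp_text
puts b64_text
```

In real code the Mail gem picks the right decoder from the part's headers, so calling .decoded as above is the simpler route.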
Attachments:
email.message.attachments.each do |attachment|
  path = "/tmp/#{attachment.filename}"
  File.binwrite(path, attachment.decoded) # binary-safe write
  # The MIME type might be useful
  content_type = attachment.mime_type
end

require 'gmail'
gmail = Gmail.connect('username', 'password')
emails = gmail.inbox.emails(:from => 'someone@mail.com')
emails.each do |email|
  puts email.subject
  puts email.text_part.body.decoded
end

Related

Can I test that a Sinatra post method successfully saves to a YAML store?

I can't find a basic explanation anywhere about how I can test, with Rack::Test, that a Ruby/Sinatra post method successfully saves data to a YAML store/file. (This explains testing get, which I can do(!), but not post; other mentions of testing post methods with rack/test seem irrelevant.) For self-study, I'm building a "to do" app in Ruby/Sinatra and I'm trying to use TDD everything and unit test like a good little boy. A requirement I have is: When a user posts a new task, it is saved in the YML store.
I was thinking of testing this either by seeing if a "Task saved" was shown in the response to the user (which of course isn't directly testing the thing itself...but is something I'd also like to test):
assert last_response.body.include?("Task saved")
or by somehow testing that a test task's description is now in the YML file. I guess I could open up the YML file and look, and then delete it from the YML file, but I'm pretty sure that's not what I'm supposed to do.
I've confirmed post does correctly save to a YML file:
get('/') do |*user_message|
  # prepare erb messages
  @user_message = session[:message] if session[:message]
  @overlong_description = session[:overlong_description] if
    session[:overlong_description]
  session[:message] = nil # clear message after being used
  session[:overlong_description] = nil # ditto
  @tasks = store.all
  erb :index #, user_message => {:user_message => params[:user_message]}
end
post('/newtask') do
  @task = Task.new(store, params)
  # decide whether to save & prepare user messages
  if @task.complete == true # task is complete!
    @task.message << " " + "Task saved!"
    session[:message] = @task.message # use session[:message] for user messages
    @task.message = ""
    store.save(@task)
  else
    @task.message << " " + "Not saved." # task incomplete
    session[:message] = @task.message # use session[:message] for user messages
    session[:overlong_description] = @task.overlong_description if
      @task.overlong_description
    @task.message = ""
    @task.overlong_description = nil
  end
  redirect '/'
end
As you can see, it ends in a redirect...one response I want to test is actually on the slash route, not on the /newtask route.
So of course the test doesn't work:
def test_post_newtask
  post('/newtask', params = {"description"=>"Test task 123"})
  # Test that "saved" message for user is in returned page
  assert last_response.body.include?("Task saved") # boooo
end
Github source here
If you can give me advice on a book (chapter, website, blog, etc.) that goes over this in a way accessible to a relative beginner, I'd be most grateful.
Be gentle...I'm very new to testing (and programming).
Nobody answered my question and, since I have figured out what the answer is, I thought I would share it here.
First of all, I gather that it shouldn't be necessary to check if the data is actually saved to the YAML store; the main thing is to see if the web page returns the correct result (we assume the database is groovy if so).
The test method I wrote above was correct; it was simply missing the single line follow_redirect!. I hadn't realized that I needed to instruct rack/test to follow the redirect.
Part of the problem was that I simply hadn't found the right documentation. This page does give the correct syntax, but doesn't give much detail. This page helped a lot, and this bit covers redirects.
Here's the updated test method:
def test_post_newtask
  post "/newtask", params = {"description" => "Write about quick brown foxes",
                             "categories" => "writing823"}
  follow_redirect!
  assert last_response.body.include?("Task saved")
  assert last_response.body.include?("Write about quick brown foxes")
end
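If you do want to check the store itself, one option is to point a store at a throwaway file so the real YML file is never touched. This is only a sketch: it assumes the app's store wraps the standard library's YAML::Store, and save_task, saved_tasks and task_round_trips? are hypothetical names, not methods from the app.

```ruby
require 'yaml/store'
require 'tmpdir'

# Hypothetical stand-ins for the app's own store methods
def save_task(store, description)
  store.transaction do
    store['tasks'] ||= []
    store['tasks'] << description
  end
end

def saved_tasks(store)
  # Read-only transaction; returns an empty list when nothing was saved yet
  store.transaction(true) { store['tasks'] || [] }
end

# Returns true when a task written to a throwaway store can be read back.
def task_round_trips?(description)
  Dir.mktmpdir do |dir|
    store = YAML::Store.new(File.join(dir, 'test_tasks.yml'))
    save_task(store, description)
    saved_tasks(store).include?(description)
  end
end

puts task_round_trips?('Test task 123')
```

Dir.mktmpdir removes the directory when the block ends, so there is nothing to clean up by hand after the test.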
(With thanks to the Columbus Ruby Brigade.)

How to detach an attachment for POP3 using ruby net/pop?

pop = Net::POP3.new mailhost
pop.start mailuser, mailpass
if pop.mails.empty?
  puts "Mailbox empty."
else
  pop.mails.each do |mail|
    if mail.pop.has_attachments?
      mail.pop.attachments.each do |attachment|
        puts attachment.original_filename
      end
    end
  end
end
gives undefined method 'has_attachments?' for #<String:0xb7cc4f7c>.
Is this example no longer working?
mail.pop returns a string representation of the email (see the corresponding docs). If you want to parse it and work with a mail object, you can do it like this:
email = Mail.new(mail.pop)
I really recommend taking a look at the docs: with big attachments you can run into memory issues, and the docs explain how to deal with that.
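One way to keep memory use down, sketched here under the assumption that the message object behaves like Net::POPMail (whose pop yields the message in chunks when called with a block), is to stream each message to disk. save_streamed and FakeMail are hypothetical names used only so the example runs without a mail server.

```ruby
require 'tmpdir'

# Writes a message to disk chunk by chunk instead of building one big string.
# `mail` is expected to behave like Net::POPMail, whose #pop yields the
# message in pieces when it is called with a block.
def save_streamed(mail, path)
  File.open(path, 'wb') do |file|
    mail.pop { |chunk| file.write(chunk) }
  end
  path
end

# Stand-in for a real POP3 message so the sketch runs without a server
class FakeMail
  def initialize(chunks)
    @chunks = chunks
  end

  def pop
    @chunks.each { |chunk| yield chunk }
  end
end

Dir.mktmpdir do |dir|
  path = save_streamed(FakeMail.new(["Subject: hi\r\n", "\r\n", "body\r\n"]),
                       File.join(dir, 'message.eml'))
  puts File.size(path)
end
```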

How to get Mechanize to auto-convert body to UTF8?

I found some solutions using post_connect_hook and pre_connect_hook, but it seems like they don't work. I'm using the latest Mechanize version (2.1). There are no [:response] fields in the new version, and I don't know where to get them in the new version.
https://gist.github.com/search?q=pre_connect_hooks
https://gist.github.com/search?q=post_connect_hooks
Is it possible to make Mechanize return a UTF8 encoded version, instead of having to convert it manually using iconv?
Since Mechanize 2.0, arguments of pre_connect_hooks() and post_connect_hooks() were changed.
See the Mechanize documentation:
pre_connect_hooks()
A list of hooks to call before retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
 
post_connect_hooks()
A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
Now you can't change the internal response-body value from a hook, because the argument is not an array. So the next best way is to replace the internal parser with your own:
class MyParser
  def self.parse(thing, url = nil, encoding = nil, options = Nokogiri::XML::ParseOptions::DEFAULT_HTML, &block)
    # insert your conversion code here. For example:
    # thing = NKF.nkf("-wm0X", thing).sub(/Shift_JIS/,"utf-8") # you need to rewrite content charset if it exists.
    Nokogiri::HTML::Document.parse(thing, url, encoding, options, &block)
  end
end
agent = Mechanize.new
agent.html_parser = MyParser
page = agent.get('http://somewhere.com/')
...
I found a solution that works pretty well:
class HtmlParser
  def self.parse(body, url, encoding)
    body.encode!('UTF-8', encoding, invalid: :replace, undef: :replace, replace: '')
    Nokogiri::HTML::Document.parse(body, url, 'UTF-8')
  end
end
Mechanize.new.tap do |web|
  web.html_parser = HtmlParser
end
No issues were found yet.
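To see what that encode call does on its own, here is a small sketch with core Ruby only. The sample bytes and the Windows-1252 source encoding are assumptions for illustration; in the parser above the source encoding is whatever Mechanize passes in.

```ruby
# Bytes as a server might send them: "café" in Windows-1252 (é is 0xE9)
raw = "caf\xE9".b

# Same call shape as in HtmlParser.parse: convert to UTF-8 from the source
# encoding, dropping anything that cannot be converted.
utf8 = raw.encode('UTF-8', 'Windows-1252', invalid: :replace, undef: :replace, replace: '')

puts utf8.encoding
puts utf8.valid_encoding?
```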
In your script, just enter: page.encoding = 'utf-8'
However, depending on your scenario, you may alternatively need to enter the reverse (the encoding of the website Mechanize is working with) instead. For that, open Firefox, open the website you want Mechanize to work with, select Tools in the menubar, and then open Page Info. Determine what the page is encoded in from there.
Using that info, you would instead enter what the page is encoded in (such as page.encoding = 'windows-1252').
How about something like this:
class Mechanize
  alias_method :original_get, :get
  def get(*args)
    doc = original_get(*args)
    doc.encoding = 'utf-8'
    doc
  end
end

Mail gem. Extract recipient display name and address as separate values

Using the Mail gem (i.e. Rails + ActionMailer), is there a clean way to get the display name of the recipient?
I can get the address with:
mail.to.first
And I can get the formatted display name + address with:
mail.header_fields.select{ |f| f.name == "To" }.first.to_s
But how can I get just the display name part (i.e. before the < and >). I know somebody is going to suggest a Regex, but that's not what I'm looking for, since I'd then have to parse out any encoding, which is something the Mail gem probably already does. I'm the author of a popular Mailer library in PHP and am aware of the pitfalls of just assuming the bit before < and > is human-readable, in the headers, when 8-bit characters come into play.
I can do this:
mail.header_fields.select{ |f| f.name == "To" }.first.parse.individual_recipients.first.display_name.text_value
But there must be a better way? :)
Figured it out, sorry. For anyone else who hits this thread looking for the solution:
mail[:to].display_names.first
The gotcha is that bracket access and dotted access are different for this gem.
From the doc:
mail = Mail.new
mail.to = 'Mikel Lindsaar <mikel@test.lindsaar.net>, ada@test.lindsaar.net'
mail.to #=> ['mikel@test.lindsaar.net', 'ada@test.lindsaar.net']
mail[:to] #=> '#<Mail::Field:0x180e5e8 @field=#<Mail::ToField:0x180e1c4
mail['to'] #=> '#<Mail::Field:0x180e5e8 @field=#<Mail::ToField:0x180e1c4
mail['To'] #=> '#<Mail::Field:0x180e5e8 @field=#<Mail::ToField:0x180e1c4
mail[:to].encoded #=> 'To: Mikel Lindsaar <mikel@test.lindsaar.net>, ada@test.lindsaar.net\r\n'
mail[:to].decoded #=> 'Mikel Lindsaar <mikel@test.lindsaar.net>, ada@test.lindsaar.net'
mail[:to].addresses #=> ['mikel@test.lindsaar.net', 'ada@test.lindsaar.net']
mail[:to].formatted #=> ['Mikel Lindsaar <mikel@test.lindsaar.net>', 'ada@test.lindsaar.net']
So to get the display name, you can use #display_name
mail[:to].addrs.first.display_name #=> Mikel Lindsaar
Use #address to get the email address
mail[:from].addrs.first.address #=> mikel@test.lindsaar.net

How to visit a URL with Ruby via http and read the output?

So far I have been able to stitch this together :)
begin
  URI.open("http://www.somemain.com/" + path + "/" + blah)
rescue OpenURI::HTTPError
  @failure += painting.permalink
else
  @success += painting.permalink
end
But how do I read the output of the service that I would be calling?
Open-URI adds URI.open (on Ruby before 3.0 it also extended Kernel#open), so you'll get a type of IO stream returned:
URI.open('http://www.example.com') #=> #<StringIO:0x00000100977420>
You have to read that to get content:
URI.open('http://www.example.com').read[0 .. 10] #=> "<!DOCTYPE h"
A lot of times a method will let you pass different types as a parameter. They check to see what it is and either use the contents directly, in the case of a string, or read the handle if it's a stream.
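A minimal sketch of that string-or-stream check, using only the standard library (content_of is a made-up name, not something Open-URI or Nokogiri provides):

```ruby
require 'stringio'

# Returns the text whether it is given a String or an IO-like object
def content_of(source)
  source.respond_to?(:read) ? source.read : source.to_s
end

puts content_of('<h1>from a string</h1>')
puts content_of(StringIO.new('<h1>from a stream</h1>'))
```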
For HTML and XML, such as RSS feeds, we'll typically pass the handle to a parser and let it grab the content, parse it, and return an object suitable for searching further:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML(URI.open('http://www.example.com'))
doc.class #=> Nokogiri::HTML::Document
doc.to_html[0 .. 10] #=> "<!DOCTYPE h"
doc.at('h1').text #=> "Example Domains"
doc = URI.open("http://etc..")
content = doc.read
More often people want to parse the returned document; for this, use something like hpricot or nokogiri.
I'm not sure whether you want to do this yourself for the hell of it, but if you don't, Mechanize is a really nice gem for doing this.
It will visit the page you want and automatically wrap the page with nokogiri so that you can access its elements with CSS selectors such as "div#header h1". Ryan Bates has a video tutorial on it which will teach you everything you need to know to use it.
Basically you can just
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.get("http://www.google.com")
agent.page.at("some css selector").text
It's that simple.

Resources