I'm trying to use Mail::Address to parse an email address, however the output is not as expected:
Mail::Address.new('Arnold, Roa <aroa@so.com>').address
=> "Arnold"
What is the problem and what alternatives do I have?
This works, not sure why the comma is there:
Mail::Address.new('Arnold, Roa <aroa@so.com>'.gsub(',','')).address
I've created an issue on the github project: https://github.com/mikel/mail/issues/1219
In the meantime, I created this monkey patch (monkey patching is not good practice and should be avoided where possible):
class Mail::Address
  class << self
    def new(value = nil)
      if value.is_a? String
        value = value.gsub(',', ' ')
      end
      super(value)
    end
  end
end
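As for why the comma matters: in RFC 5322 a comma separates the entries of an address list, so a display name that contains one has to be written as a quoted string ("Arnold, Roa" <aroa@so.com>). A less invasive alternative to the monkey patch is to quote the display name before parsing. The helper below is only a sketch; quote_display_name is a made-up name, and it handles only the simple "Name <address>" shape:

```ruby
# Hypothetical helper (not part of the mail gem): wraps an unquoted display
# name in double quotes so that a comma inside it is no longer read as an
# address-list separator. Anything that is not "Name <addr>" is returned as-is.
def quote_display_name(raw)
  m = raw.match(/\A\s*([^"<]*?)\s*<([^<>]+)>\s*\z/)
  return raw unless m && !m[1].empty?

  %("#{m[1]}" <#{m[2]}>)
end

puts quote_display_name('Arnold, Roa <aroa@so.com>')
```

The result can then be handed to Mail::Address.new, which should parse the quoted form without needing any patch.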
I'm using this code to list email addresses from an HTML page.
require 'nokogiri'
selector = "//a[starts-with(@href, \"mailto:\")]/@href"
doc = Nokogiri::HTML.parse File.read 'in.rb'
nodes = doc.xpath selector
addresses = nodes.collect {|n| n.value[7..-1]}
puts addresses
This is sample code I'm parsing:
<a href="mailto:joe@example.com?subject=My Business Is Dying">
But I'm getting more than just the email address. I'm getting this in my results:
joe@example.com?subject=My Business Is Dying
How do I drop off everything after the question mark so it's only the email address?
You could always chop off anything after the ? character:
addresses.map! do |address|
  address.sub(/\?.*/, '')
end
I'd probably use one of these two:
str = 'joe@example.com?subject=My Business Is Dying'
str.split('?').first # => "joe@example.com"
str[/^[^?]+/] # => "joe@example.com"
The second is a simple regular expression embedded in String's [] (slice) method. The pattern basically says "start at the beginning and grab everything up until a question mark."
They're equivalent as far as speed goes. I'd probably use the first because it's easier to read.
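If it helps, the prefix removal and the query removal can be folded into one small helper, which also avoids the hard-coded [7..-1] slice. This is just a sketch; mailto_address and MAILTO_PREFIX are names made up for the example:

```ruby
# Hypothetical helper: returns the address portion of a mailto: link,
# or nil when the string is not a mailto: link at all.
MAILTO_PREFIX = /\Amailto:/i

def mailto_address(href)
  return nil unless href.match?(MAILTO_PREFIX)

  # Drop the scheme, then keep only what precedes the first "?".
  href.sub(MAILTO_PREFIX, '').split('?', 2).first
end

puts mailto_address('mailto:joe@example.com?subject=My Business Is Dying')
```

With the Nokogiri code from the question it would be used as nodes.map { |n| mailto_address(n.value) }.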
I am looking for an instance method from the ruby-gmail gem that would allow me to read either the body or the subject of a Gmail message.
After reviewing the documentation, found here, I couldn't find anything!?
There is a .message instance method found in the Gmail::Message class section; but it only returns, for lack of a better term, email "mumbo-jumbo," for the body.
My attempt:
#!/usr/local/bin/ruby
require 'gmail'
gmail = Gmail.connect('username', 'password')
emails = gmail.inbox.emails(:from => 'someone@mail.com')
emails.each do |email|
  email.read
  email.message
end
Now:
email.read does not work
email.message returns the "mumbo-jumbo" mentioned above
Somebody else asked this question on SO but didn't get an answer.
This probably isn't exactly the answer to your question, but I will tell you what I have done in the past. I tried using the ruby-gmail gem but it didn't do what I wanted it to do in terms of reading a message. Or, at least, I couldn't get it to work. Instead I use the built-in Net::IMAP class to log in and get a message.
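require 'net/imap'
require 'mail' # provides Mail.read_from_string
imap = Net::IMAP.new('imap.gmail.com', port: 993, ssl: true)
imap.login('<username>', '<password>')
imap.select('INBOX')
# search returns an array of message sequence numbers; take the first match
subject_id = imap.search(['SUBJECT', '<mail_subject>']).first
# Fetch the complete raw message source for that sequence number
subject_message = imap.fetch(subject_id, 'RFC822')[0].attr['RFC822']
mail = Mail.read_from_string subject_message
body_message = mail.html_part.body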
From here your message is stored in body_message and is HTML. If you want the entire email body you will probably need to learn how to use Nokogiri to parse it. If you just want a small bit of the message where you know some of the surrounding characters you can use a regex to find the part you are interested in.
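To illustrate the regex route: when the text around the value is known, String#[] with a pattern and a capture group pulls it out directly. The HTML string below is invented sample data standing in for body_message.to_s, so treat it as a sketch only:

```ruby
# Invented sample standing in for body_message.to_s
body_html = '<p>Your verification code is <b>123456</b>.</p>'

# The second argument (1) asks for the first capture group
# instead of the whole match.
code = body_html[/verification code is <b>(\d+)<\/b>/, 1]

puts code
```

If the pattern does not match, the result is nil, so it is worth checking for that before using the value.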
I did find one page associated with the ruby-gmail gem that talks about using ruby-gmail to read a Gmail message. I made a cursory attempt at testing it tonight but apparently Google upped the security on my account and I couldn't get in using irb without tinkering with my Gmail configuration (according to the warning email I received). So I was unable to verify what is stated on that page, but as I mentioned my past attempts were unfruitful whereas Net::IMAP works for me.
EDIT:
I found this, which is pretty cool. You will need to add in
require 'cgi'
to your class.
I was able to implement it in this way. After I have my body_message, call the html2text method from that linked page (which I modified slightly and included below since you have to convert body_message to a string):
plain_text = html2text(body_message)
puts plain_text #Prints nicely formatted plain text to the terminal
Here is the slightly modified method:
def html2text(html)
  text = html.to_s.
    gsub(/(&nbsp;|\n|\s)+/im, ' ').squeeze(' ').strip.
    gsub(/<([^\s]+)[^>]*(src|href)=\s*(.?)([^>\s]*)\3[^>]*>\4<\/\1>/i,
         '\4')
  links = []
  linkregex = /<[^>]*(src|href)=\s*(.?)([^>\s]*)\2[^>]*>\s*/i
  while linkregex.match(text)
    links << $~[3]
    text.sub!(linkregex, "[#{links.size}]")
  end
  text = CGI.unescapeHTML(
    text.
      gsub(/<(script|style)[^>]*>.*<\/\1>/im, '').
      gsub(/<!--.*-->/m, '').
      gsub(/<hr(| [^>]*)>/i, "___\n").
      gsub(/<li(| [^>]*)>/i, "\n* ").
      gsub(/<blockquote(| [^>]*)>/i, '> ').
      gsub(/<(br)(| [^>]*)>/i, "\n").
      gsub(/<(\/h[\d]+|p)(| [^>]*)>/i, "\n\n").
      gsub(/<[^>]*>/, '')
  ).lstrip.gsub(/\n[ ]+/, "\n") + "\n"
  for i in (0...links.size).to_a
    text = text + "\n [#{i+1}] <#{CGI.unescapeHTML(links[i])}>" unless
      links[i].nil?
  end
  links = nil
  text
end
You also mentioned in your original question that you got mumbo-jumbo with this step:
email.message *returns mumbo-jumbo*
If the mumbo-jumbo is HTML, you can probably just use your existing code with this html2text method instead of switching over to Net::IMAP as I had discussed when I posted my original answer.
Nevermind, it's:
email.subject
email.body
silly me
OK, so how do I get the body as "readable" text, without all the encoding stuff and HTML?
Subject, text body and HTML body:
email.subject
if email.message.multipart?
  text_body = email.message.text_part.body.decoded
  html_body = email.message.html_part.body.decoded
else
  # Only multipart messages contain an HTML body, so fall back to the plain text
  text_body = email.message.body.decoded
  html_body = text_body
end
Attachments:
email.message.attachments.each do |attachment|
  path = "/tmp/#{attachment.filename}"
  # Write in binary mode so the decoded bytes are stored unchanged
  File.binwrite(path, attachment.decoded)
  # The MIME type might be useful
  content_type = attachment.mime_type
end
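One caveat about the multipart branch above: a multipart message is not guaranteed to carry both a text and an HTML part, and text_part or html_part returns nil when the part is missing. A nil-safe variant could look like the sketch below; readable_body is a made-up helper name, and it only assumes that its argument responds to multipart?, text_part, html_part and body the way email.message does:

```ruby
# Hypothetical helper: returns the most readable body available,
# or nil when the message has no usable part.
def readable_body(message)
  # Prefer the plain-text part, fall back to HTML; a non-multipart
  # message is used directly.
  part = message.multipart? ? (message.text_part || message.html_part) : message
  part&.body&.decoded
end

# Intended usage with ruby-gmail (not run here):
#   puts readable_body(email.message)
```

If only an HTML part exists, the result is still HTML, so it would need a text conversion step afterwards.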
require 'gmail'
gmail = Gmail.connect('username', 'password')
emails = gmail.inbox.emails(:from => 'someone@mail.com')
emails.each do |email|
  puts email.subject
  puts email.text_part.body.decoded
end
pop = Net::POP3.new mailhost
pop.start mailuser, mailpass
if pop.mails.empty?
  puts "Mailbox empty."
else
  pop.mails.each do |mail|
    if mail.pop.has_attachments?
      mail.pop.attachments.each do |attachment|
        puts attachment.original_filename
      end
    end
  end
end
This gives undefined method 'has_attachments?' for #<String:0xb7cc4f7c>.
Is this example no longer working?
mail.pop returns a string representation of the email (see the corresponding docs). If you want to parse it and work with a Mail object, you can do it like this:
email = Mail.new(mail.pop)
I really recommend taking a look at the docs: with large attachments you can run into memory issues, and the docs explain how to avoid them.
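On the memory point: when pop is called with a block, Net::POPMail yields the message in chunks instead of building one large string, so the chunks can be written straight to disk. The sketch below wraps that in a made-up helper, spool_message, which only assumes its argument has a pop method that yields chunks:

```ruby
require 'tempfile'

# Hypothetical helper: streams one message into a temp file chunk by chunk
# and returns the open file, rewound to the start.
# pop_mail is expected to behave like Net::POPMail.
def spool_message(pop_mail)
  file = Tempfile.new('message')
  file.binmode
  pop_mail.pop { |chunk| file.write(chunk) }
  file.rewind
  file
end

# Intended usage inside the loop from the question (not run here):
#   pop.mails.each { |mail| spooled = spool_message(mail) }
```

The caller is responsible for closing and removing the temp file (for example with close!) once it has been processed.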
The idea of having a single parser for any kind of feed is great, and I was hoping that it would work for me.
I have been trying to get Feedzirra to parse Atom feeds, specifically:
http://pindancing.blogspot.com/feeds/posts/default
http://adam.heroku.com/feed
Those are just two that I tried. The problem is that Feedzirra cannot parse the entry URL; it always comes out nil:
feed = Feedzirra::Feed.fetch_and_parse(search.rss_feed_url)
p feed.entries.first.title
p feed.entries.first.url #=> returns nil
Is there anything I need to do to get it working?
thanks for your help
Hate to say "works for me", but, well, works for me:
require 'feedzirra'
urls = %w{
  http://adam.heroku.com/feed
  http://pindancing.blogspot.com/feeds/posts/default
}
urls.each do |url|
  feed = Feedzirra::Feed.fetch_and_parse(url)
  puts feed.entries.first.title
  puts feed.entries.first.url
end
# => Memcached, a Database?
# => http://adam.heroku.com/past/2010/7/19/memcached_a_database/
# => The answer to "Will you mentor me?" is
# => http://pindancing.blogspot.com/2010/12/answer-to-will-you-mentor-me-is.html
It'd help to see the rest of your code, particularly the actual parameter you're using in the fetch_and_parse method.