AppleScript - extract an email message as a MIME object - macos

I have an AppleScript I wrote to do some parsing on Mail.app messages. However, it seems I will need more powerful processing than AppleScript provides (specifically, separating a replied message from the original message it quotes), for example with Python's email package. Is it possible to get an email message as a MIME string?

I'm not sure if this is what you meant, but here is how you can get the raw message text from a selection you've made in Mail.app, which can then be processed with MIME tools to extract all the parts.
tell application "Mail"
    set msgs to selection
    if length of msgs is not 0 then
        repeat with msg in msgs
            set messageSource to source of msg
            set textFile to "/Users/harley/Desktop/foo.txt"
            set myFile to open for access textFile with write permission
            write messageSource to myFile
            close access myFile
        end repeat
    end if
end tell
And here is an example script from the Python email documentation that unpacks the message and writes each MIME part to a separate file in a directory:
https://docs.python.org/3.4/library/email-examples.html
#!/usr/bin/env python3
"""Unpack a MIME message into a directory of files."""
import os
import sys
import email
import errno
import mimetypes
from argparse import ArgumentParser
def main():
    parser = ArgumentParser(description="""\
Unpack a MIME message into a directory of files.
""")
    parser.add_argument('-d', '--directory', required=True,
                        help="""Unpack the MIME message into the named
                        directory, which will be created if it doesn't already
                        exist.""")
    parser.add_argument('msgfile')
    args = parser.parse_args()
    with open(args.msgfile) as fp:
        msg = email.message_from_file(fp)
    try:
        os.mkdir(args.directory)
    except FileExistsError:
        pass
    counter = 1
    for part in msg.walk():
        # multipart/* are just containers
        if part.get_content_maintype() == 'multipart':
            continue
        # Applications should really sanitize the given filename so that an
        # email message can't be used to overwrite important files
        filename = part.get_filename()
        if not filename:
            ext = mimetypes.guess_extension(part.get_content_type())
            if not ext:
                # Use a generic bag-of-bits extension
                ext = '.bin'
            filename = 'part-%03d%s' % (counter, ext)
        counter += 1
        with open(os.path.join(args.directory, filename), 'wb') as fp:
            fp.write(part.get_payload(decode=True))
if __name__ == '__main__':
    main()
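The comment in the script about sanitizing filenames is worth taking seriously, since part.get_filename() returns whatever name the sender chose. A minimal sketch of one conservative approach follows. Note that safe_filename is a hypothetical helper, not part of the email package, and the allowed character set is an assumption you may want to loosen.

```python
import os
import re


def safe_filename(name, fallback="part.bin"):
    """Reduce an attachment name to a harmless basename (hypothetical helper)."""
    # Drop any directory components, including Windows-style ones.
    base = os.path.basename(name.replace("\\", "/"))
    # Keep a conservative set of characters and replace everything else.
    base = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    # Avoid hidden or relative names such as '.' or '..'.
    base = base.lstrip(".")
    return base or fallback


print(safe_filename("../../etc/passwd"))
print(safe_filename("report 2024.pdf"))
```

In the script, this would wrap the result of part.get_filename() before it is joined with the output directory.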
So then if the unpack.py script is run on the AppleScript output...
python unpack.py -d OUTPUT ./foo.txt
You get a directory with the MIME parts separated. When I run this on a message which quotes an original message then the original message shows up in a separate part.
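If the reply and the quoted original end up in the same text/plain part instead, the split can be done on the body text. The following is only a sketch under assumptions: it assumes a plain-text reply whose quoted lines start with ">", which is a common convention and not a guarantee. The sample message and the split_reply helper are made up for illustration.

```python
import email
from email import policy

# Hypothetical sample standing in for the text written out by the AppleScript.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Re: lunch
Content-Type: text/plain; charset="utf-8"

Sounds good, see you at noon.

On Monday, Bob wrote:
> Are you free for lunch?
> I was thinking noon.
"""


def split_reply(raw_message):
    """Return (new_text, quoted_text) for a plain-text reply.

    Assumes quoted lines are prefixed with '>'; real mail clients vary.
    """
    msg = email.message_from_string(raw_message, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""
    new_lines, quoted_lines = [], []
    for line in body.splitlines():
        if line.lstrip().startswith(">"):
            # Drop the '>' marker and the space after it.
            quoted_lines.append(line.lstrip()[1:].lstrip())
        else:
            new_lines.append(line)
    return "\n".join(new_lines).strip(), "\n".join(quoted_lines)


new_text, quoted_text = split_reply(RAW)
print(new_text)
print(quoted_text)
```

The attribution line ("On Monday, Bob wrote:") stays with the new text in this sketch. Removing it reliably would need client-specific rules.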

Related

Python: Opening auto-generated file

As part of my larger program, I want to create a logfile with the current time & date as part of the title. I can create it as follows:
malwareLog = open(datetime.datetime.now().strftime("%Y%m%d - %H.%M " + pcName + " Malware scan log.txt"), "w+")
Now, my app is going to call a number of other functions, so I'll need to open the file, write some output to it and close the file, several times. It doesn't seem to work if I simply go:
malwareLog.open(malwareLog, "a+")
or similar. So how should I open a dynamically created txt file that I don't know the actual filename for...?
When you create the malwareLog object, it has a name attribute that contains the file name.
Here's an example (my test is your malwareLog):
import random
test = open(str(random.randint(0,999999))+".txt", "w+")
test.write("hello ")
test.close()
test = open(test.name, "a+")
test.write("world!")
test.close()
with open(test.name, "r") as f: print(f.read())
You can also store the file name in a variable before or after creating the file.
# Before
file_name = "123"
malwareLog = open(file_name, "w")
# After (the name must be a string; an integer would be treated as a file descriptor)
malwareLog = open(str(random.randint(0, 999999)) + ".txt", "w")
file_name = malwareLog.name
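For the open/write/close cycle spread across several functions, one option is to build the filename once and pass the path around. The sketch below is only an illustration: make_log_name and append_line are hypothetical names, "MYPC" stands in for pcName, and it writes into a temporary directory so it can be run safely.

```python
import datetime
import os
import tempfile


def make_log_name(pc_name, now=None):
    """Build the log filename once so every function can reuse it."""
    now = now or datetime.datetime.now()
    # The PC name is appended after strftime so a '%' in it cannot be
    # misread as a format code.
    return now.strftime("%Y%m%d - %H.%M ") + pc_name + " Malware scan log.txt"


def append_line(path, text):
    """Open in append mode, write one line, and close again."""
    with open(path, "a") as log:
        log.write(text + "\n")


log_dir = tempfile.mkdtemp()  # stand-in for wherever the log should live
log_path = os.path.join(log_dir, make_log_name("MYPC"))

append_line(log_path, "scan started")
append_line(log_path, "scan finished")

with open(log_path) as log:
    contents = log.read()
print(contents)
```

Because each call reopens the file in append mode, no function needs to keep the file object alive between writes.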

Storing the Output while Executing ".exe" file using python is not as expected

I have an executable file to which I need to pass input and from which I need to get the output details. The output can be saved in three formats: HTML, XML, or text. By default it is saved as HTML if the user doesn't choose a specific format.
My script does what I described above, but the output is saved as HTML. Is there any way to select the desired output format?
I used subprocess to interact with the exe file.
cmd = (r'C:\Program Files\file.exe',k1)
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
out = proc.communicate()
print out[0]
s= str(out[0])
print type(s)
f2 = open(k1+'.xml','wb') #storing in html (as its taking it by default)
f2.write(s)
f2.close()

How to get sequence description from gi number through biopython?

I have a list of GI (genbank identifier) numbers. How can I get the Sequence description (as 'mus musculus hypothetical protein X') for each GI number so that I can store it in a variable and write it to a file?
Thanks for your help!
This is a script I wrote to pull the entire GenBank file for each genbank identifier in a file. It should be easy enough to change for your applications.
# This program opens a file containing NCBI sequence identifiers, fetches the
# associated records and writes the data to *.gb
import os
import sys
from Bio import Entrez
Entrez.email = "yourname@xxx.xxx"  # Always tell NCBI who you are
try:  # checks to make sure the input file is in the folder
    name = input("\nEnter file name with sequence identifications only: ")
    handle = open(name, 'r')
except IOError:
    print("File does not exist in folder! Check file name and extension.")
    sys.exit()
outfile = os.path.splitext(name)[0] + "_GB_Full.gb"
totalhand = open(outfile, 'w')
for line in handle:
    line = line.rstrip()  # strips \n from file
    print(line)
    fetch_handle = Entrez.efetch(db="nucleotide", rettype="gb", retmode="text", id=line)
    data = fetch_handle.read()
    fetch_handle.close()
    totalhand.write(data)
# Close both files so the output is flushed to disk
totalhand.close()
handle.close()
So, in case anybody else had that question, here is the solution:
handle=Entrez.esummary(db="nucleotide, protein, ...", id="gi or NCBI_ref number")
record=Entrez.read(handle)
handle.close()
description=record[0]["Title"]
print(description)
This will print the sequence description that corresponds to the identifier.

Tempfile.new vs. File.open on Heroku

I'm capturing/creating user entered text into files from my app, attempting to temporarily store them in my Heroku tmp directory, then upload them to a cloud service such as Google Drive.
In using Tempfile I can successfully upload, but when using File.open I get the following error when attempting to upload:
ArgumentError (wrong number of arguments (1 for 0))
The error is on the call:
@client.upload_file_by_folder_id(save_path, @folder_id)
Where @client is a session with the cloud service, save_path is the location of the attached file for upload, and @folder_id is the folder they should go into.
When I use Tempfile.new I am successful in doing so:
tempfile = Tempfile.new([final_filename, '.txt'], Rails.root.join('tmp','text-temp'))
tempfile.binmode
tempfile.write msgbody
tempfile.close
save_path = tempfile.path
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
tempfile.unlink
The File.open code is:
path = 'tmp/text-temp'
filename = "#{final_filename}.txt"
save_path = Rails.root.join(path, filename)
File.open(save_path, 'wb') do |file|
  file.write(msgbody)
  file.close
end
upload_file = @client.upload_file_by_folder_id(save_path, @folder_id)
File.delete(save_path)
Could it be that the File.path is a string, and Tempfile.path is the full path (but not as a string)? When I put out each, they look identical.
I'd like to use File as I don't want to change the filename of the existing attachments I'm uploading, whereas Tempfile appends to the filename.
Any and all assistance is greatly appreciated. Thanks!
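In order for it to work using File, I needed to convert save_path to a string. Rails.root.join returns a Pathname, whereas Tempfile#path returns a String:
save_path = Rails.root.join(path, filename).to_s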

Reading intended recipient from Undeliverable emails via Interop for Outlook

I've created an application, which is used to loop through the emails in an inbox and find all the undeliverable, mailbox full or delayed emails and generate a report.
The usual routine is to loop through all the emails in the inbox (up to a specified date).
If an email is undeliverable, I use a regex to find the email address. This works 95% of the time, as this information is contained in the body of the undelivered message (ReportItem).
My problem is that a few emails return blank addresses in the report, making it nigh on impossible to clean them up or easily report that we have a problem with someone's email.
I have found that the Internet headers contain the address the mail was intended for, but I cannot find anything on whether it is possible to use Interop or some other object to obtain this information.
If anyone else has come across this problem and knows of a work around I would be very grateful.
Cheers
I was looking to automate an Outlook mailbox to move all undelivered emails and store the email address of the recipient of each undeliverable message in a list, so that I can later check whether an entry of the list is present in an Excel column and then remove it from the Excel sheet. I hope this helps!
I've found a Python solution for this problem. A Python library used to connect to Outlook is win32com, so first we import all the libraries we will need:
import win32com.client
import re
import datetime as dt
from tqdm import tqdm
import time
import extract_msg
This is a good way to connect to a specific Outlook account:
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
accounts= win32com.client.Dispatch("Outlook.Application").Session.Accounts
Then create a loop that iterates over all the Outlook accounts until it reaches the specified mail account:
for account in accounts:
    inbox = outlook.Folders(account.DeliveryStore.DisplayName)
    if account.DeliveryStore.DisplayName == 'place_your_account_name_here':
        for folder in inbox.Folders:
Find the folder in Outlook you wish to check by folder name, so if you want to iterate through the Inbox, type "Inbox" instead of "Folder_name":
            if folder.__str__() == "Folder_name":
                messages = folder.Items
                messages.Sort('[ReceivedTime]', True)
                if folder.Folders.Item('Undeliverable'):
                    undeliverable = folder.Folders.Item('Undeliverable')
                    list_of_undelivered_email_addresses = my_super_function(messages,undeliverable)
After we have reached the mail items and declared the undeliverable subfolder as "undeliverable", we specify the time period that the function below should cover:
def my_super_function(messages,undeliverable):
    list_of_undelivered_email_addresses = []
    last_n_days = dt.datetime.now() - dt.timedelta(days = 25)
    messages = messages.Restrict("[ReceivedTime] >= '" +last_n_days.strftime('%m/%d/%Y %H:%M %p')+"'")
    rl= list()
I have found that the most common types of undeliverable emails show some sort of error message, and below the error is the original version of the email I sent. Most of them (with very few exceptions) have a line that says:
To: "Some_email_address" ....
This is why I used this regular expression to read the whole line after my pattern (which is 'To: "'):
    pattern = re.compile('To: ".*\n?',re.MULTILINE)
    for counter, message in enumerate(messages):
It is very important that you save the email somewhere on your PC, because otherwise, as soon as you read its body, the email gets encrypted.
        message.SaveAs("undeliverable_emails.msg")
        f = r'specify_the_absolute_path_where_you_want_it_saved'
        try:
            msg = extract_msg.Message(f)
            print(counter)
Search the saved msg body for the keyword Undeliverable:
            if msg.body.find("undeliverable")!= -1 or msg.body.find("Undeliverable")!= -1 or msg.subject.find("Undeliverable")!= -1 or msg.subject.find("undeliverable")!= -1 or msg.body.find("wasn't found at")!= -1:
Save the actual email to a list, so you can move it to the undeliverables subfolder later:
                rl.append(message)
                m = re.search(pattern, msg.body)
                m = m[0]
                mail_final = m.split('"')[1]
                list_of_undelivered_email_addresses.append(mail_final)
                list_of_undelivered_email_addresses=list(filter(None, list_of_undelivered_email_addresses))
            else:
                print('this email is not an undeliverable one')
        except:
            pass
Move all mails in the list to the undeliverables folder:
    if len(rl) ==0:
        pass
    else:
        for m in tqdm(rl):
            m.Move(undeliverable)
    return list_of_undelivered_email_addresses
Here is the full code:
import win32com.client
import re
import datetime as dt
from tqdm import tqdm #tqdm gives you the progress bar
import time
import extract_msg
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
accounts= win32com.client.Dispatch("Outlook.Application").Session.Accounts
def my_super_function(messages,undeliverable):
    list_of_undelivered_email_addresses = []
    last_n_days = dt.datetime.now() - dt.timedelta(days = 25)
    messages = messages.Restrict("[ReceivedTime] >= '" +last_n_days.strftime('%m/%d/%Y %H:%M %p')+"'")
    rl= list()
    pattern = re.compile('To: ".*\n?',re.MULTILINE)
    for counter, message in enumerate(messages):
        message.SaveAs("undeliverable_emails.msg")
        f = r'some_absolute_path'
        try:
            msg = extract_msg.Message(f)
            print(counter)
            if msg.body.find("undeliverable")!= -1 or msg.body.find("Undeliverable")!= -1 or msg.subject.find("Undeliverable")!= -1 or msg.subject.find("undeliverable")!= -1 or msg.body.find("wasn't found at")!= -1:
                rl.append(message)
                m = re.search(pattern, msg.body)
                m = m[0]
                mail_final = m.split('"')[1]
                list_of_undelivered_email_addresses.append(mail_final)
                list_of_undelivered_email_addresses=list(filter(None, list_of_undelivered_email_addresses))
            else:
                print('else')
        except:
            pass
    if len(rl) ==0:
        pass
    else:
        for m in tqdm(rl):
            m.Move(undeliverable)
    return list_of_undelivered_email_addresses
for account in accounts:
    inbox = outlook.Folders(account.DeliveryStore.DisplayName)
    if account.DeliveryStore.DisplayName == 'desired_email_address':
        for folder in inbox.Folders:
            if folder.__str__() == "Inbox":
                messages = folder.Items
                messages.Sort('[ReceivedTime]', True)
                if folder.Folders.Item('Undeliverable'):
                    undeliverable = folder.Folders.Item('Undeliverable')
                    list_of_undelivered_email_addresses = my_super_function(messages,undeliverable)
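The address-extraction step can be tried in isolation, without Outlook. The sketch below is a variation on the regex above, not the same code: it uses a capture group instead of splitting on quotes and returns None when no 'To: "..."' line is present, so a missing line does not have to be swallowed by a bare except. The extract_recipient helper and the sample body are made up for illustration.

```python
import re

# Matches a line such as:  To: "someone@example.com" ...
TO_LINE = re.compile(r'^To: "([^"]+)"', re.MULTILINE)


def extract_recipient(body):
    """Return the quoted address on the first 'To: "..."' line, or None."""
    match = TO_LINE.search(body)
    return match.group(1) if match else None


# Hypothetical bounce text, invented for this example.
sample_body = (
    "Your message wasn't delivered.\n"
    "\n"
    "From: me@example.com\n"
    'To: "someone@example.com" <someone@example.com>\n'
    "Subject: hello\n"
)

print(extract_recipient(sample_body))
print(extract_recipient("no recipient line here"))
```

Messages for which this returns None are the ones that would otherwise show up as blank entries, so they can be collected separately for manual review.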
It looks like what I want isn't part of the ReportItem properties.
The possible options are Extended MAPI, CDO, or Redemption:
http://www.tech-archive.net/Archive/Outlook/microsoft.public.outlook.program_vba/2004-11/0084.html
