Ruby - IMAP iterate multiple folders / select multiple folders - ruby

This is my IMAP code:
imap = Net::IMAP.new('outlook.office365.com', 993, true)
imap.login('USERNAME', 'PASSWORD')
imap.examine("FOLDER1")
imap.search(["SINCE", "17-Dec-2020"]).each do |message_id|
envelope = imap.fetch(message_id, "ENVELOPE")[0].attr["ENVELOPE"]
subject = Mail::Encodings.unquote_and_convert_to( envelope.subject, 'utf-8' )
next unless subject.include?("SOME TXT")
body = imap.fetch(message_id, "BODY[]")[0].attr["BODY[]"]
mail = Mail.new(body)
mail.attachments.each do |a|
next unless File.extname(a.filename) == ".pdf"
File.open("/path/to/file/pdfscript/#{a.filename}", 'wb') do |file|
file.write(a.body.decoded)
end
end
end
Right now I have it setup only for "FOLDER1", but I have a lot of folders that I need to get files from. Is it possible to add multiple folders? Something like
imap.examine("FOLDER1", "FOLDER2", "FOLDER3", "FOLDER4")

I solved my problem by making an array
array_of_folders = ["Folder1","Folder2","Folder3","Folder4"]
and made an .each
array_of_folders.each do |folders|
imap.examine(folders) # takes each folder
# Rest of the code
end

Related

RubyZip docx issues with write_buffer instead of open

I'm adapting the RubyZip recursive zipping example (found here) to work with write_buffer instead of open and am coming across a host of issues. I'm doing this because the zip archive I'm producing has word documents in it and I'm getting errors on opening those word documents. Therefore, I'm trying the work-around that RubyZip suggests, which is using write_buffer instead of open (example found here).
The problem is, I'm getting errors because I'm using an absolute path, but I'm not sure how to get around that. I'm getting the error "#//', name must not start with />"
Second, I'm not sure what to do to mitigate the issue with word documents. When I used my original code, which worked and created an actual zip file, any word document in that zip file had the following error upon opening: "Word found unreadable content in Do you want to recover the contents of this document? If you trust the source of this document, click Yes." The unreadable content error is the reason why I went down the road of attempting to use write_buffer.
Any help would be appreciated.
Here is the code that I'm currently using:
require 'zip'
require 'zip/zipfilesystem'
module AdvisoryBoard
class ZipService
def initialize(input_dir, output_file)
#input_dir = input_dir
#output_file = output_file
end
# Zip the input directory.
def write
entries = Dir.entries(#input_dir) - %w[. ..]
path = ""
buffer = Zip::ZipOutputStream.write_buffer do |zipfile|
entries.each do |e|
zipfile_path = path == '' ? e : File.join(path, e)
disk_file_path = File.join(#input_dir, zipfile_path)
#file = nil
#data = nil
if !File.directory?(disk_file_path)
#file = File.open(disk_file_path, "r+b")
#data = #file.read
unless [#output_file, #input_dir].include?(e)
zipfile.put_next_entry(e)
zipfile.write #data
end
#file.close
end
end
zipfile.put_next_entry(#output_file)
zipfile.put_next_entry(#input_dir)
end
File.open(#output_file, "wb") { |f| f.write(buffer.string) }
end
end
end
I was able to get word documents to open without any warnings or corruption! Here's what I ended up doing:
require 'nokogiri'
require 'zip'
require 'zip/zipfilesystem'
class ZipService
# Initialize with the directory to zip and the location of the output archive.
def initialize(input_dir, output_file)
#input_dir = input_dir
#output_file = output_file
end
# Zip the input directory.
def write
entries = Dir.entries(#input_dir) - %w[. ..]
::Zip::File.open(#output_file, ::Zip::File::CREATE) do |zipfile|
write_entries entries, '', zipfile
end
end
private
# A helper method to make the recursion work.
def write_entries(entries, path, zipfile)
entries.each do |e|
zipfile_path = path == '' ? e : File.join(path, e)
disk_file_path = File.join(#input_dir, zipfile_path)
if File.directory? disk_file_path
recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
else
put_into_archive(disk_file_path, zipfile, zipfile_path, e)
end
end
end
def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
zipfile.mkdir zipfile_path
subdir = Dir.entries(disk_file_path) - %w[. ..]
write_entries subdir, zipfile_path, zipfile
end
def put_into_archive(disk_file_path, zipfile, zipfile_path, entry)
if File.extname(zipfile_path) == ".docx"
Zip::File.open(disk_file_path) do |zip|
doc = zip.read("word/document.xml")
xml = Nokogiri::XML.parse(doc)
zip.get_output_stream("word/document.xml") {|f| f.write(xml.to_s)}
end
zipfile.add(zipfile_path, disk_file_path)
else
zipfile.add(zipfile_path, disk_file_path)
end
end
end

Select and use part of an AWS s3 object key using Ruby aws-sdk

I am trying to list only the objects from the s3 folder (not a real folder I know) called distribution but I want to remove the reference to the name and any slashes around the object. The output should just look like 021498cd-ca73-4675-a57a-c12b3c652aac whereas currently it looks like distribution/021498cd-ca73-4675-a57a-c12b3c652aac/
So far I have tried;
def files
s3 = Aws::S3::Resource.new
s3.client
bucket = s3.bucket('test')
files = []
bucket.objects.each do |obj|
if obj.key.include?('distribution/')
temp_files = puts "#{obj.key}"
files = temp_files.select do |file|
file.gsub("distribution/", "")
end
else
end
end
end
But this doesn't seem to be working at all.
Your explanation is pretty simple but your code is implying something else.
However, this should help with what you are trying to achieve.
def files
s3 = Aws::S3::Resource.new
s3.client
bucket = s3.bucket('test')
files = []
bucket.objects.each do |obj|
if obj.key.include?('distribution/')
files << "#{file.gsub(/(distribution)|\//, '')}"
end
end
end
The files array will contain all the file names with garbage stripped.

Download a file only if it exists with ruby

I'm doing a scraper to download all the issues of The Exile available at http://exile.ru/archive/list.php?IBLOCK_ID=35&PARAMS=ISSUE.
So far, my code is like this:
require 'rubygems'
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"
for number in 120..290
numero = BASE_exile_URL + number.to_s + ".pdf"
puts "Downloading issue #{number}"
open(numero) { |f|
File.open("#{DATA_DIR}/#{number}.pdf",'w') do |file|
file.puts f.read
end
}
end
puts "done"
The thing is, a lot of the issue links are down, and the code creates a PDF for every issue and, if it's empty, it will leave an empty PDF. How can I change the code so that it can only create and copy a file if the link exists?
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
url_template = "http://exile.ru/docs/pdf/issues/exile%d.pdf"
filename_template = "#{DATA_DIR}/%d.pdf"
(120..290).each do |number|
pdf_url = url_template % number
print "Downloading issue #{number}"
# Opening the URL downloads the remote file.
open(pdf_url) do |pdf_in|
if pdf_in.read(4) == '%PDF'
pdf_in.rewind
File.open(filename_template % number,'w') do |pdf_out|
pdf_out.write(pdf_in.read)
end
print " OK\n"
else
print " #{pdf_url} is not a PDF\n"
end
end
end
puts "done"
open(url) downloads the file and provides a handle to a local temp file. A PDF starts with '%PDF'. After reading the first 4 characters, if the file is a PDF, the file pointer has to be put back to the beginning to capture the whole file when writing a local copy.
you can use this code to check if exist the file:
require 'net/http'
def exist_the_pdf?(url_pdf)
url = URI.parse(url_pdf)
Net::HTTP.start(url.host, url.port) do |http|
puts http.request_head(url.path)['content-type'] == 'application/pdf'
end
end
Try this:
require 'rubygems'
require 'open-uri'
DATA_DIR = "exile"
Dir.mkdir(DATA_DIR) unless File.exists?(DATA_DIR)
BASE_exile_URL = "http://exile.ru/docs/pdf/issues/exile"
for number in 120..290
numero = BASE_exile_URL + number.to_s + ".pdf"
open(numero) { |f|
content = f.read
if content.include? "Link is missing"
puts "Issue #{number} doesnt exists"
else
puts "Issue #{number} exists"
File.open("./#{number}.pdf",'w') do |file|
file.write(content)
end
end
}
end
puts "done"
The main thing I added is a check to see if the string "Link is missing". I wanted to do it using HTTP status codes but they always give a 200 back, which is not the best practice.
The thing to note is that with my code you always download the whole site to look for that string, but I don't have any other idea to fix it at the moment.

How do you upload a zip file and unzip to s3?

I am working on an application where I have to upload a zip file. The zip file is basically a static website so it has many files and a couple subdirectories. I have been playing with the rubyzip gem for a while now and can not figure out how to simply extract the files from it. Any pointers on where I can read up on some examples? I am sure someone has ran in to this problem before. the documentation for rubyzip is not very good so I am hoping someone can give me some pointers.
Here you go, one super magical multithreaded zip-to-S3 uploader which I haven't tested at all - go nuts! Looks like I'm three years too late though.
class S3ZipUploader
require 'thread'
require 'thwait'
require 'find'
attr_reader *%i{ bucket s3 zip failed_uploads }
def initialize(zipfilepath, mys3creds)
# next 4 lines are important
#s3 = AWS::S3.new(access_key_id: mys3creds[Rails.env]['aws_access_key'],
secret_access_key: mys3creds[Rails.env]['aws_secret_access_key'],
region: 'us-west-2')
#bucket = #s3.buckets[ mys3creds[Rails.env]['bucket'] ]
#failed_uploads = []
#zip = Zip::File.open(zipfilepath)
end
def upload_zip_contents
rootpath = "mypath/"
desired_threads = 10
total_entries = zip.entries.count
slice_size = (total_entries / desired_threats).ceil
threads = []
zip.entries.each_slice(slice_size) do |e_arr|
threads << Thread.new do |et|
e_arr.each do |e|
result = upload_to_s3(rootpath + e.name, e.get_input_stream.read)
if !result
#failed_uploads << {name: e.name, entry: e, error: err}
end
end
end
end
ThreadsWait.all_waits(*threads)
end
def upload_file_to_s3(filedata,path, rewrite_basepath)
retries = 0
success = false
while !success && retries < 3
success = begin
obj = bucket.objects[path]
obj.write(Pathname.new(outputhtml))
obj.acl = :public_read
success = true
rescue
retries += 1
success = false
end
end
return success
end
end
uploader = S3ZipUploader.new("/path/to/myzip.zip", MYS3CREDS)
uploader.upload_zip_contents

send_file for a tempfile in Sinatra

I'm trying to use Sinatra's built-in send_file command but it doesn't seem to be working for tempfiles.
I basically do the following to zip an album of mp3s:
get '/example' do
songs = ...
file_name = "zip_test.zip"
t = Tempfile.new(['temp_zip', '.zip'])
# t = File.new("testfile.zip", "w")
Zip::ZipOutputStream.open(t.path) do |z|
songs.each do |song|
name = song.name
name += ".mp3" unless name.end_with?(".mp3")
z.put_next_entry(name)
z.print(open(song.url) {|f| f.read })
p song.name + ' added to file'
end
end
p t.path
p t.size
send_file t.path, :type => 'application/zip',
:disposition => 'attachment',
:filename => file_name,
:stream => false
t.close
t.unlink
end
When I use t = File.new(...) things work as expected, but I don't want to use File as it will have concurrency problems.
When I use t = Tempfile.new(...), I get:
!! Unexpected error while processing request: The file identified by body.to_path does not exist`
Edit: It looks like part of the problem is that I'm sending multiple files. If I just send one song, the Tempfile system works as well.
My guess is that you have a typo in one of your song-names, or maybe a slash in one of the last parts of song.url? I adopted your code and if all the songs exist, sending the zip as a tempfile works perfectly fine.

Resources