I'm new to both IMAP and multi-threaded programming, and I'd like to write a script to fetch incoming mails and process them in parallel.
Thread, Queue, Mutex and Monitor are new concepts to me, so I may misuse them in the following question and example.
The script's goal is to:
receive IDLE notifications from the IMAP server when a new mail arrives
fetch the corresponding mail
parse its body
generate a PDF from parsed data
send the report through SMTP
Another approach could be to fetch and identify mails based on their UID, though learning IMAP and multi-threaded programming is another goal.
Here's what I've got so far (IMAP part):
def self.idle(imap)
# imap = Net::IMAP.new 'mail.company.org', 143, false # Not using TLS for test purposes
# imap.login 'username', 'password'
@idler_listener = Thread.new do
loop do
begin
imap.examine 'INBOX'
imap.idle do |res|
if res.kind_of?(Net::IMAP::UntaggedResponse) and res.name == 'EXISTS'
imap.idle_done
Thread.new { process_email(imap) }
end
end
rescue => e
puts e.inspect
end
end
end.join
end
def self.process_email(imap)
begin
uid = imap.uid_search(['SUBJECT', 'MyFavoriteSubject']).last
mail = imap.uid_fetch(uid, 'BODY[TEXT]')[0].attr['BODY[TEXT]']
puts mail
rescue => e
puts e.inspect
end
end
This example successfully prints the body of an incoming mail. However, if several mails arrive at the same time, only one of them is processed.
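Part of the single-mail symptom may come from `process_email` itself: `uid_search(...).last` only ever picks the newest matching UID, and a single EXISTS response can account for several new messages at once, so older arrivals get skipped. A minimal sketch of remembering the highest UID already handled and processing everything above it. The IMAP search is replaced by a plain array so the logic runs on its own, and `UidTracker` is a hypothetical name:

```ruby
# Hypothetical helper: remembers the highest UID already processed and
# returns every newer UID, oldest first. In the real script `uids` would come
# from imap.uid_search(['SUBJECT', 'MyFavoriteSubject']).
class UidTracker
  def initialize(last_seen = 0)
    @last_seen = last_seen
    @lock = Mutex.new # one shared mutex, so concurrent callers can't claim the same UID twice
  end

  # Returns the UIDs newer than the last one handled and advances the marker.
  def claim_new(uids)
    @lock.synchronize do
      fresh = uids.select { |uid| uid > @last_seen }.sort
      @last_seen = fresh.last unless fresh.empty?
      fresh
    end
  end
end

tracker = UidTracker.new(100)
# Two mails (101, 102) arrived, but only one EXISTS notification was handled:
p tracker.claim_new([98, 100, 101, 102]) # => [101, 102]
p tracker.claim_new([98, 100, 101, 102]) # => []
```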
Q:
Is this behavior due to the fact that the loop is executed in a single thread?
Does it illustrate the need for a queue, a thread pool, or a producer/consumer pattern?
The chances that this script will access the same resource at the same time are low; however, would Mutex.new.synchronize protect me against race conditions and deadlocks?
(Even though I haven't fully understood it, I think it's worth sharing this great resource from Masatoshi Seki: https://www.druby.org/sidruby/)
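A minimal producer/consumer sketch of what the queue and thread-pool questions point at: the IDLE listener would act as the producer, pushing each new UID onto a `Queue`, while a fixed number of worker threads pop and process them. `handle_mail` is a hypothetical stand-in for the fetch/parse/PDF/SMTP steps, and the worker count is arbitrary. It also shows one point about the Mutex question: `Mutex.new.synchronize` creates a fresh mutex on every call, so no two threads ever contend for the same lock. The mutex has to be created once and shared, as `results_lock` is here. A mutex guards against data races on what it protects, but it does not prevent deadlocks.

```ruby
# Sketch: a bounded pool of worker threads consuming a shared Queue.
WORKERS = 3
STOP = :stop # sentinel telling a worker to exit

jobs = Queue.new
results = []
results_lock = Mutex.new # ONE mutex, shared by every worker

handle_mail = lambda do |uid|
  "processed #{uid}" # real work (uid_fetch, parse, PDF, SMTP) would go here
end

workers = Array.new(WORKERS) do
  Thread.new do
    while (uid = jobs.pop) != STOP # pop blocks until a job is available
      out = handle_mail.call(uid)
      results_lock.synchronize { results << out } # guard the shared array
    end
  end
end

# Producer side: the IDLE listener would push each new UID here.
[101, 102, 103, 104, 105].each { |uid| jobs << uid }

WORKERS.times { jobs << STOP } # one sentinel per worker
workers.each(&:join)
puts results.sort
```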
I've written a little UDP server in Ruby:
def listen
puts "Started UDP server on #{@port}..."
Socket.udp_server_loop(@port) do |message, message_source|
puts "Got \"#{message}\" from #{message_source}"
handle_incoming_message(message)
end
end
I start it in a separate thread:
thread = Thread.new { listen }
Is there a way to gracefully stop the udp_server_loop from outside the thread without just killing it (thread.kill)? I also don't want to stop it from the inside by receiving a UDP message. Is udp_server_loop maybe not the right tool for me?
I don’t think you can do this with udp_server_loop (although you might be able to use some of the methods it uses). You are going to have to call IO::select in a loop of your own with some way of signalling it to exit, and some way of waking the thread so you don’t have to send a packet to stop it.
A simple way would be to use the timeout option to select with a variable to set to indicate you want the thread to end, something like:
@halt_loop = false
def listen
puts "Started UDP server on #{@port}..."
sockets = Socket.udp_server_sockets(@port)
loop do
readable, _, _ = IO.select(sockets, nil, nil, 1) # timeout 1 sec
break if @halt_loop
next unless readable # select returns nil on timeout
Socket.udp_server_recv(readable) do |message, message_source|
puts "Got \"#{message}\" from #{message_source}"
handle_incoming_message(message)
end
end
end
You then set @halt_loop to true when you want to stop the thread.
The downside to this is that it is effectively polling. If you decrease the timeout then you potentially do more work on an empty loop, and if you increase it you have to wait longer when stopping the thread.
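A runnable sketch of the timeout-and-flag loop, assuming a single `UDPSocket` bound to 127.0.0.1 on an OS-chosen port (swapped in for `Socket.udp_server_sockets` so the example is self-contained). `PollingUdpServer` and the 0.1 s timeout are illustrative choices:

```ruby
require 'socket'

class PollingUdpServer
  attr_reader :received, :port

  def initialize(timeout = 0.1)
    @socket = UDPSocket.new
    @socket.bind('127.0.0.1', 0) # port 0: let the OS pick a free port
    @port = @socket.addr[1]
    @timeout = timeout
    @halt_loop = false
    @received = Queue.new
  end

  def listen
    loop do
      readable, = IO.select([@socket], nil, nil, @timeout) # nil on timeout
      break if @halt_loop
      next unless readable
      message, _sender = @socket.recvfrom(1024)
      @received << message
    end
  ensure
    @socket.close
  end

  def stop
    @halt_loop = true # noticed within at most one timeout period
  end
end

server = PollingUdpServer.new
thread = Thread.new { server.listen }
client = UDPSocket.new
client.send('hello', 0, '127.0.0.1', server.port)
client.close
msg = server.received.pop # blocks until the listener has read the datagram
puts msg
server.stop
thread.join
```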
Another, slightly more complex solution would be to use a pipe and have the select listen on it along with the sockets. You could then signal directly to finish the select and exit the thread.
@read, @write = IO.pipe
@halt_loop = false
def listen
puts "Started UDP server on #{@port}..."
sockets = Socket.udp_server_sockets(@port)
sockets << @read
loop do
readable, _, _ = IO.select(sockets)
break if @halt_loop
readable.delete @read
Socket.udp_server_recv(readable) do |message, message_source|
puts "Got \"#{message}\" from #{message_source}"
handle_incoming_message(message)
end
end
end
def end_loop
@halt_loop = true
@write.puts "STOP!"
end
To exit the thread you just call end_loop, which sets the @halt_loop flag and then writes to the pipe, making the other end readable and causing the other thread to return from select.
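The same self-pipe idea as a self-contained sketch, again with a plain `UDPSocket` on an OS-chosen port in place of `Socket.udp_server_sockets`, and `PipeStoppableUdpServer` as an illustrative name. `IO.select` has no timeout here, so the loop sleeps until either a datagram or the pipe write arrives:

```ruby
require 'socket'

class PipeStoppableUdpServer
  attr_reader :received, :port

  def initialize
    @socket = UDPSocket.new
    @socket.bind('127.0.0.1', 0) # port 0: let the OS pick a free port
    @port = @socket.addr[1]
    @read, @write = IO.pipe
    @halt_loop = false
    @received = Queue.new
  end

  def listen
    loop do
      readable, = IO.select([@socket, @read]) # no timeout: blocks until woken
      break if @halt_loop
      readable.delete(@read) # never hand the pipe to the socket-reading code
      readable.each do |sock|
        message, _sender = sock.recvfrom(1024)
        @received << message
      end
    end
  ensure
    @socket.close
    @read.close
  end

  def end_loop
    @halt_loop = true   # set the flag first...
    @write.puts 'STOP!' # ...then make @read readable so select returns at once
    @write.close
  end
end

server = PipeStoppableUdpServer.new
thread = Thread.new { server.listen }
client = UDPSocket.new
client.send('hello', 0, '127.0.0.1', server.port)
client.close
msg = server.received.pop
puts msg
server.end_loop
thread.join
```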
You could have this code check the readable IOs and exit if one of them is the read end of the pipe instead of using the variable, but at least on Linux there is a potential bug where a call to select might return a file descriptor as readable when it actually isn’t. I don’t know if Ruby deals with this, so better safe than sorry.
Also be sure to remove the pipe from the readable array before passing it to udp_server_recv. It’s not a socket so will cause an exception if you don’t.
A downside to this technique is that pipes are "[n]ot available on all platforms".
Although I don't quite see what would be wrong with Thread::kill and/or Thread#exit, you could use a thread-local variable for that.
def listen
Socket.udp_server_loop(@port) do |message, message_source|
break :interrupted if Thread.current[:break]
handle_incoming_message(message)
end
end
and do
thread[:break] = true
from the outside.
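One caveat with this approach: the flag is only checked when the block runs, and the loop only runs the block when a datagram arrives. So after `thread[:break] = true` the thread stays blocked in select until the next packet, which is what the question wanted to avoid. The sketch below uses `Socket.udp_server_sockets` plus `Socket.udp_server_loop_on` (instead of `udp_server_loop`) only so it can bind an OS-chosen port on 127.0.0.1 and learn its number:

```ruby
require 'socket'

sockets = Socket.udp_server_sockets('127.0.0.1', 0) # port 0: OS picks one
port = sockets.first.local_address.ip_port

thread = Thread.new do
  Socket.udp_server_loop_on(sockets) do |message, _source|
    break :interrupted if Thread.current[:break] # checked once per datagram
    puts "Got #{message.inspect}"
  end
end

thread[:break] = true
sleep 0.2
blocked = thread.alive? # still waiting in select: no packet has arrived yet
puts "still blocked after setting the flag: #{blocked}"

client = UDPSocket.new
client.send('wake up', 0, '127.0.0.1', port) # a packet is needed to exit
client.close
result = thread.value # the value returned by `break`
sockets.each(&:close)
puts result.inspect
```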
I have a TCPServer that I made in Ruby. The server seems to work: I can see that two or more clients can connect and be served by it, but they sometimes get stuck (as in, they need to wait for the other client to disconnect, or just become unresponsive), usually after the "pass_ok" bit. When connecting with only one client I don't see this issue.
Here is my code:
def self.main_server
begin
server = TCPServer.open(@port)
rescue Exception => e
CoreLogging.syslog_error("Cant start server: #{e}")
end
@main_pid = Process.pid
# Main Loop
Thread.abort_on_exception = true
while true
Thread.fork(server.accept) do |client|
@client = client
sock_domain, remote_port, remote_hostname, remote_ip = @client.peeraddr # Get some info on the incoming connection
CoreLogging.syslog_error("Got new connection from #{@client.peeraddr[3]} Handled by Thread: #{Thread.current}") # Log incoming connection
@client.puts "Please enter password: " # Password testing (later will be from a config file or DB)
action = @client.gets(4096).chomp # get client password response 'chomp' is super important
if action == @password
# what to do when password is right
pass_ok
Thread.exit
else
# what to do when password is wrong
pass_fail
Thread.exit
end
end
begin
CoreLogging.syslog_error("Thread Ended (SOFT)")
rescue Exception => e
CoreLogging.syslog_error("Thread was killed (HARD)")
end
end
end
I'll leave it here for future reference and hope someone in a close situation will find it useful.
The issue was the @client instance variable: it belongs to the class, so all threads share it like a global. Every new thread overwrote it, and the methods called inside the thread then picked up whichever client had connected last.
Using a local client variable (without the '@') got it to work as intended.
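A minimal sketch of that fix: each connection thread keeps its client in a block-local variable and passes it to the helpers as an argument, so no two threads can overwrite each other's connection. `PASSWORD`, the reply strings and the simplified `pass_ok`/`pass_fail` are placeholders, not the asker's real code. Two clients connect concurrently to check that each gets its own answer:

```ruby
require 'socket'

PASSWORD = 'secret' # placeholder; the asker plans to read this from config/DB

def pass_ok(client)
  client.puts 'Welcome'
end

def pass_fail(client)
  client.puts 'Wrong password'
end

server = TCPServer.new('127.0.0.1', 0) # port 0: let the OS pick a free port
port = server.addr[1]

acceptor = Thread.new do
  loop do
    Thread.start(server.accept) do |client| # `client` is local to this thread
      begin
        client.puts 'Please enter password: '
        action = client.gets.to_s.chomp
        action == PASSWORD ? pass_ok(client) : pass_fail(client)
      ensure
        client.close
      end
    end
  end
end

# Two clients talking to the server at the same time.
replies = %w[secret nope].map do |pw|
  Thread.new do
    sock = TCPSocket.new('127.0.0.1', port)
    sock.gets # read the password prompt
    sock.puts pw
    reply = sock.gets.chomp
    sock.close
    reply
  end
end.map(&:value)

p replies # => ["Welcome", "Wrong password"]
acceptor.kill
server.close
```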
I'm using Pubnub to publish live messages from a server to a client (browser page). When using Pubnub, one must abide by their message size constraints, which sometimes means chunking the message, sending it in pieces, and reconstructing it on the client side. Following Pubnub's advice, one can ensure delivery of each chunk of a message as long as the Pubnub.publish() function is not called too quickly (as happens if the message pieces are simply pumped through a for loop).
The Pubnub Ruby API specifies 3 required arguments in a Pubnub.publish(), a channel, a message, and a callback function. The callback function shown below is from Pubnub's Ruby examples:
@my_callback = lambda { |message| puts(message) }
pn.publish(:channel => :hello_world,
:message => "hi",
:callback => @my_callback)
The message in the callback (not the "hi" message in the publish) contains the status information of the publish() call, with values like "sent" and "message too large", both of which would be accompanied by a unique identifier.
So somewhere under the Pubnub hood, this callback is getting a .call(), and I'm wondering if there is a way for me to get in between this process. More specifically, say I have a message that needs to be broken up into three chunks: I would like chunk 0 to be sent, and upon receipt of a "sent" status in the callback, I would like to then send chunk 1, etc.
I'm not very familiar with lambda functions and their scopes, and this was my first attempt:
@my_callback = lambda { |message|
puts(message)
Rails.logger.debug("Setting pubnub_do_send to true from callback")
pubnub_do_send = true
}
pubnub_do_send = true
while !pubnub_message.nil?
if pubnub_do_send
#Send pubnub message
#Cut off first chunk of message (this is why I'm testing for nil)
#Set pubnub_do_send to false
Rails.logger.debug("Message #{message_id} chunk #{chunk_id} sent")
pubnub_do_send = false
end
end
This resulted in an utter failure: the server got completely locked into an infinite while loop because (if I had to guess) pubnub_do_send was never set to true again. Looking at the debug log, I see the first message print ("Message 1 chunk 0 sent") but never the output from the callback function (probably because of the infinite while loop it's found itself in).
There must be a clean way to do this. I'm OK with refactoring the code to some extent (chunking up the messages, storing them in an array first, and then simply looping through the array to send), but I feel like the solution can't be far off; I'm just not too handy with lambda functions and callbacks.
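One concrete reason the flag never flips, as the first attempt is written: the lambda appears before the outer `pubnub_do_send` is first assigned, and Ruby decides at parse time whether a name inside a block refers to an existing local variable. Because no outer local exists yet at that point, `pubnub_do_send = true` inside the lambda creates a variable private to the lambda. A minimal demonstration with hypothetical names `do_send_a` and `do_send_b`, independent of PubNub:

```ruby
# Case 1: lambda written BEFORE the outer variable exists (as in the attempt).
callback_before = lambda { |_status| do_send_a = true } # block-local variable
do_send_a = false
callback_before.call('sent')
puts do_send_a # => false: the lambda only set its own private variable

# Case 2: the outer variable exists first, so the lambda closes over it.
do_send_b = false
callback_after = lambda { |_status| do_send_b = true }
callback_after.call('sent')
puts do_send_b # => true
```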
I feel like the solution should look like:
@my_callback = lambda { |message|
puts(message)
send_pubnub_message(message_id, chunk_id, chunk)
}
def send_pubnub_message(message_id, chunk_id, chunk)
#Send pubnub message
#Send message_id, next chunk_id, and next chunk to my_callback
end
But the problem is my_callback is called by Pubnub when the Pubnub.publish() gets some status back about the message rather than me directly calling it! (Is there a way for me to insert message_id, chunk_id, and chunk into the callback while still letting pubnub attach its message to the mix? That sounds way wrong, but maybe with Ruby...)
Thanks in advance for all help.
You shouldn't be trying to handle message chunk state in your callback. It has a single responsibility: notifying you about the status of the publish. However, you can insert things into the message. I would create a message wrapper that knows its current state, and send that in the lambda, so you don't have to keep track of it. I haven't tested the following code, but here's an example of what I'm talking about:
class Message
attr_accessor :id, :message, :chunked_message, :chunk_id
def initialize(id, message)
@id, @message = id, message
@chunk_id = 0 # start at the first chunk
chunk_message
end
def current_chunk
@chunked_message[@chunk_id]
end
def next_chunk
@chunk_id += 1
self
end
def more?
# is there a chunk after the current one?
@chunk_id + 1 < @chunked_message.length
end
private
def chunk_message
# implement splitting message here, assigning an Array of pieces to @chunked_message
end
end
def send_pubnub_message(message)
pn.publish(:channel => :hello_world,
:message => message.current_chunk,
:callback => lambda { |status|
puts(status)
case status[0] # 1 = success, 0 = fail
when 1
send_pubnub_message(message.next_chunk) if message.more?
when 0
handle_pubnub_error(status[1], message)
end
})
end
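A self-contained sketch of the same callback-driven idea, runnable without PubNub. `FakePublisher` is a hypothetical stand-in for `pn.publish` that records the payload and immediately calls the callback with `[1, 'Sent']`, following the status layout the answer assumes. `ChunkedMessage` and `send_chunked` are illustrative names, and splitting by a fixed number of characters is an assumption. `more?` here asks whether a chunk exists after the current one, because the callback checks it before advancing:

```ruby
# Hypothetical stand-in for the PubNub client.
class FakePublisher
  attr_reader :sent

  def initialize
    @sent = []
  end

  def publish(channel:, message:, callback:)
    @sent << [channel, message]
    callback.call([1, 'Sent']) # status[0] == 1 means success (assumed shape)
  end
end

class ChunkedMessage
  attr_reader :id, :chunk_id

  def initialize(id, text, chunk_size)
    @id = id
    @chunks = text.scan(/.{1,#{chunk_size}}/m) # fixed-size pieces
    @chunk_id = 0
  end

  def current_chunk
    @chunks[@chunk_id]
  end

  def more?
    @chunk_id + 1 < @chunks.length
  end

  def next_chunk
    @chunk_id += 1
    self
  end
end

# Publishes the current chunk; the next one is sent only from the callback,
# i.e. after the previous publish reported success.
def send_chunked(pn, message, errors = [])
  pn.publish(channel: :hello_world,
             message: { id: message.id, chunk: message.chunk_id, data: message.current_chunk },
             callback: lambda { |status|
               if status[0] == 1
                 send_chunked(pn, message.next_chunk, errors) if message.more?
               else
                 errors << [message.chunk_id, status[1]]
               end
             })
end

pn = FakePublisher.new
send_chunked(pn, ChunkedMessage.new(42, 'abcdefghij', 4))
pn.sent.each { |_channel, payload| p payload }
```

With a real asynchronous client the callback would fire later rather than inline, but the ordering guarantee comes from the same place: nothing publishes the next chunk except a successful status for the previous one.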
I have an application that reacts to messages sent by clients. One message is reload_credentials, that the application receives any time a new client registers. This message will then connect to a PostgreSQL database, do a query for all the credentials, and then store them in a regular Ruby hash ( client_id => client_token ).
Some other messages that the application may receive are start, stop and pause, which are used to keep track of some session times. My point is that I envision the application functioning in the following way:
client sends a message
message gets queued
queue is being processed
However, I don't want to block the reactor. Furthermore, let's imagine I have a reload_credentials message that's next in the queue: I don't want any other message from the queue to be processed until the credentials are reloaded from the DB. Also, while I am processing a certain message (like waiting for the credentials query to finish), I want to allow other messages to be enqueued.
Could you please guide me towards solving such a problem? I'm thinking I may have to use em-synchrony, but I am not sure.
Use one of the Postgresql EM drivers, or EM.defer so that you won't block the reactor.
When you receive the 'reload_credentials' message just flip a flag that causes all subsequent messages to be enqueued. Once the 'reload_credentials' has finished, process all messages from the queue. After the queue is empty flip the flag that causes messages to be processed as they are received.
EM drivers for Postgresql are listed here: https://github.com/eventmachine/eventmachine/wiki/Protocol-Implementations
module Server
def post_init
@queue = []
@loading_credentials = false
end
def receive_message(type, data)
return @queue << [type, data] if @loading_credentials || !@queue.empty?
return process_msg(type, data) unless :reload_credentials == type
@loading_credentials = true
reload_credentials do
@loading_credentials = false
process_queue
end
end
def reload_credentials(&when_done)
EM.defer( proc { query_and_load_credentials }, when_done )
end
def process_queue
until @queue.empty?
type, data = @queue.shift
process_msg(type, data)
end
end
# lots of other methods
end
EM.start_server(HOST, PORT, Server)
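If you want all connections to queue messages whenever any connection receives a 'reload_credentials' message, you'll have to coordinate via the eigenclass, i.e. keep the flag and the queue in state shared by all connections rather than in each connection's instance variables.
The following is, I presume, something like your current implementation: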
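To make the ordering rules easy to check, here is an EventMachine-free sketch of the same gating logic, using a hypothetical `CredentialGate` class. The asynchronous reload is simulated: `start_reload` only sets the flag, and completion is driven by calling `finish_reload`, which stands in for the callback that `EM.defer` would run. Unlike the answer's `process_queue`, this version also treats a queued `reload_credentials` as a new reload rather than as an ordinary message:

```ruby
class CredentialGate
  attr_reader :processed

  def initialize
    @queue = []
    @loading_credentials = false
    @processed = []
  end

  def receive_message(type, data)
    # Hold messages while a reload runs, or while older ones are still queued.
    return @queue << [type, data] if @loading_credentials || !@queue.empty?
    return process_msg(type, data) unless type == :reload_credentials

    start_reload
  end

  # Stand-in for the completion callback passed to EM.defer.
  def finish_reload
    @processed << [:credentials_reloaded]
    @loading_credentials = false
    process_queue
  end

  private

  def start_reload
    @loading_credentials = true # real code would start the DB query here
  end

  def process_queue
    until @queue.empty?
      type, data = @queue.shift
      if type == :reload_credentials
        start_reload # the remaining messages wait for this reload too
        return
      end
      process_msg(type, data)
    end
  end

  def process_msg(type, data)
    @processed << [type, data]
  end
end

gate = CredentialGate.new
gate.receive_message(:start, 1)                # processed immediately
gate.receive_message(:reload_credentials, nil) # starts the (simulated) reload
gate.receive_message(:stop, 2)                 # queued behind the reload
p gate.processed.size # => 1
gate.finish_reload
p gate.processed # => [[:start, 1], [:credentials_reloaded], [:stop, 2]]
```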
class Worker
def initialize queue
@queue = queue
dequeue
end
def dequeue
@queue.pop do |item|
begin
work_on item
ensure
dequeue
end
end
end
def work_on item
case item.type
when :reload_credentials
# magic happens here
else
# more magic happens here
end
end
end
q = EM::Queue.new
workers = Array.new(10) { Worker.new q }
The problem above, if I understand you correctly, is that you don't want workers to start on jobs that arrived later in the producer timeline than a reload_credentials job until that job has finished. The following should handle this (with additional words of caution at the end).
class Worker
def initialize queue
@queue = queue
dequeue
end
def dequeue
@queue.pop do |item|
begin
work_on item
ensure
dequeue
end
end
end
def work_on item
case item.type
when :reload_credentials
# magic happens here
else
# more magic happens here
end
end
end
class LockingDispatcher
def initialize channel, queue
@channel = channel
@queue = queue
@backlog = []
@channel.subscribe method(:dispatch_with_locking)
@locked = false
end
def dispatch_with_locking item
if locked?
@backlog << item
else
# You probably want to move the specialization here out into a method or
# block that's passed into the constructor, to make the lockingdispatcher
# more of a generic processor
case item.type
when :reload_credentials
lock
deferrable = CredentialReloader.new(item).start
deferrable.callback { unlock }
deferrable.errback { unlock }
else
dispatch_without_locking item
end
end
end
def dispatch_without_locking item
@queue << item
end
def locked?
@locked
end
def lock
@locked = true
end
def unlock
@locked = false
bl = @backlog.dup
@backlog.clear
bl.each { |item| dispatch_with_locking item }
end
end
channel = EM::Channel.new
queue = EM::Queue.new
dispatcher = LockingDispatcher.new channel, queue
workers = Array.new(10) { Worker.new queue }
So, input to the first system comes in on q, but in this new system it comes in on channel. The queue is still used to distribute work among the workers, but it is not populated while a refresh credentials operation is going on. Unfortunately, as I didn't take more time, I have not generalized the LockingDispatcher so that it isn't coupled to the item type and the code for dispatching CredentialReloader. I'll leave that to you.
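One possible way to do that generalization, sketched without EventMachine: the caller passes a `locks_on` predicate and a block that runs a locking item and receives a `done` callable to invoke when it finishes. `GenericLockingDispatcher`, `Item` and the Array used as the work sink are illustrative assumptions; in the EventMachine version the sink would be the `EM::Queue`, and `done` would be called from the deferrable's callback and errback:

```ruby
class GenericLockingDispatcher
  def initialize(sink, locks_on:, &run_locking)
    @sink = sink               # anything responding to <<
    @locks_on = locks_on       # predicate: does this item require the lock?
    @run_locking = run_locking # called with (item, done)
    @backlog = []
    @locked = false
  end

  def dispatch(item)
    return @backlog << item if @locked

    if @locks_on.call(item)
      @locked = true
      @run_locking.call(item, method(:unlock)) # caller invokes done when finished
    else
      @sink << item
    end
  end

  private

  def unlock
    @locked = false
    pending = @backlog.dup
    @backlog.clear
    pending.each { |item| dispatch(item) } # may re-lock on another locking item
  end
end

Item = Struct.new(:type, :payload)
workers_queue = []
finish_reload = nil
dispatcher = GenericLockingDispatcher.new(
  workers_queue, locks_on: ->(item) { item.type == :reload_credentials }
) { |_item, done| finish_reload = done } # keep `done` to call later (simulated async)

dispatcher.dispatch(Item.new(:start, 1))
dispatcher.dispatch(Item.new(:reload_credentials, nil))
dispatcher.dispatch(Item.new(:stop, 2)) # backlogged while locked
p workers_queue.map(&:type) # => [:start]
finish_reload.call
p workers_queue.map(&:type) # => [:start, :stop]
```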
You should note here that whilst this services what I understand of your original request, it is generally better to relax this kind of requirement. There are several outstanding problems that essentially cannot be eradicated without alterations in that requirement:
The system does not wait for executing jobs to complete before starting credentials jobs
The system will handle bursts of credentials jobs very badly - other items that might be processable, won't be.
In the case of a bug in the credentials code, the backlog could fill up ram and cause failure. A simple timeout might be enough to avoid catastrophic effects, iff the code is abortable, and subsequent messages are sufficiently processable to avoid further deadlocks.
It actually sounds like you have some notion of a userid in the system. If you think through your requirements, it's likely that you only need to backlog items that pertain to a userid whose credentials are in a refresh state. This is a different problem that involves a different kind of dispatching. Try a hash of locked backlogs for those users, with a callback on credential completion to drain those backlogs into the workers, or some similar arrangement.
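A minimal sketch of that per-user arrangement, assuming each item carries a `user_id`. `PerUserDispatcher`, `UserJob` and the `refreshed` hook are hypothetical names; the hook is where a credential-completion callback would plug in:

```ruby
UserJob = Struct.new(:type, :user_id, :payload)

class PerUserDispatcher
  def initialize(sink)
    @sink = sink
    @backlogs = {} # user_id => held items; a key exists only while locked
  end

  def dispatch(item)
    if (held = @backlogs[item.user_id])
      held << item                       # this user is mid-refresh: wait
    elsif item.type == :reload_credentials
      @backlogs[item.user_id] = []       # lock just this user
    else
      @sink << item                      # everyone else flows straight through
    end
  end

  # Call when the credential refresh for `user_id` completes.
  def refreshed(user_id)
    held = @backlogs.delete(user_id) || []
    held.each { |item| dispatch(item) }  # may re-lock on a later reload
  end
end

out = []
d = PerUserDispatcher.new(out)
d.dispatch(UserJob.new(:reload_credentials, :alice))
d.dispatch(UserJob.new(:start, :alice, 1)) # held
d.dispatch(UserJob.new(:start, :bob, 2))   # not affected by alice's refresh
p out.map(&:user_id) # => [:bob]
d.refreshed(:alice)
p out.map(&:user_id) # => [:bob, :alice]
```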
Good luck!
I have a circumstance where my server may close TCPServer and restart, saving all the users to a file, and immediately reloading them; their connections do not sever.
The problem is I can't seem to reinitialize their streams.
When we restart (and attempt to maintain connections), I reinitialize TCPServer, and load my array of connected users – Since these each have an existing socket address, stored as <TCPSocket:0x00000000000000>, can I reinitialize these addresses with TCPServer?
Normally, each user connects and is accepted:
$nCS = TCPServer.new(HOST, PORT)
begin
while socket = $nCS.accept
Thread.new( socket ) do |sock|
begin
d = User.new(sock)
while sock.gets
szIn = $_.chomp
DBG( "Received '" + szIn + "' from Client " + sock.to_s )
d.parseInput( szIn )
end
rescue => e
$stdout.puts "ERROR: Caught error in Client Thread: #{e} \r\n #{e.backtrace.to_s.gsub(",", ",\r\n")}"
sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
d.closeConnection
end
end
end
rescue => e
$stdout.puts "ERROR: Caught error in Server Thread: #{e} \r\n #{e.backtrace.to_s.gsub(",", ",\r\n")}"
exit
end
To give it a command to hot reboot, we use exec('./main --copyover') to flag that a copy over is occurring.
If $connected holds an array of all users, and each user has a socket, how do I reinitialize the socket that was open before the restart (assuming the other end is still connected)?
I suspect that using exec("./main", "--copyover", *$nCS, *$connected) is getting me closer, since this simply replaces the process, and should maintain the files (not close them).
You can't. The socket is only valid for the lifetime of the process: it is closed by the OS when the process exits. That in turn invalidates the connection, so the other end is not still connected.
How to Hot-Reboot a TCPServer in Ruby
Hot-Rebooting (aka Copyover) is a process by which an administrator can reload the application (along with any new changes made since last boot) without losing the client connections. This is useful in managing customer expectations as the application does not need to suffer severe downtime and disruption if in use.
What I propose below may not be the best practice, but it's functioning and perhaps will guide others to a similar solution.
The Command
I use a particular style of coding that makes use of command tables to find functions and their accessibility. All command functions are prefixed with cmd. I'll clean up the miscellany to improve readability:
def cmdCopyover
#$nCS is the TCPServer object
#$connected holds an array of all users sockets
#--copyover flags that this is a hot reboot.
connected_args = $connected.map do |sock|
sock.close_on_exec = false if sock.respond_to?(:close_on_exec=)
sock.fileno.to_s
end.join(",")
exec('./main.rb', '--copyover', $nCS.fileno.to_s, connected_args)
end
What we're passing are strings: $nCS.fileno.to_s provides the file descriptor of the main TCPServer object, while connected_args is a comma-delimited list of file descriptors, one for each connected user. When we restart, ARGV will be an array holding each argument:
ARGV[0] == "--copyover"
ARGV[1] == "5" (Or whatever the file descriptor for TCPServer was)
ARGV[2] == "6,7,8,9" (Example, assuming 4 connected users)
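A sketch of the receiving side, under the argument layout above: `parse_copyover_args` (a hypothetical helper) turns the strings back into integers, and `TCPServer.for_fd` wraps a descriptor number in a socket object again. An exec can't be demonstrated inside one script, so the re-wrapping is shown within a single process; `autoclose = false` keeps the two Ruby objects from both closing the same descriptor:

```ruby
require 'socket'

def parse_copyover_args(argv)
  return nil unless argv[0] == '--copyover'

  server_fd = Integer(argv[1])
  client_fds = argv[2].to_s.split(',').map { |fd| Integer(fd) } # "" or nil -> []
  [server_fd, client_fds]
end

p parse_copyover_args(['--copyover', '5', '6,7,8,9']) # => [5, [6, 7, 8, 9]]
p parse_copyover_args(['--copyover', '5', ''])        # => [5, []]

# Re-wrapping a descriptor number in a new socket object (same process here;
# after exec the number would come from ARGV instead).
original = TCPServer.new('127.0.0.1', 0)
rewrapped = TCPServer.for_fd(original.fileno)
rewrapped.autoclose = false # don't close the fd a second time when GC'd
same_port = rewrapped.local_address.ip_port == original.local_address.ip_port
puts same_port
original.close
```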
What To Expect When You're Expecting (a Copyover)
Under normal circumstances, we may have a basic server (in main.rb) that looks something like this:
puts "Starting Server"
$connected = Array.new
$nCS = TCPServer.new("127.0.0.1",9999)
begin
while socket = $nCS.accept
# NB: Move this loop to its own function, threadLoop()
Thread.new( socket ) do |sock|
begin
while sock.gets
szIn = $_.chomp
#do something with input.
end
rescue => e
puts "ERROR: Caught error in Client Thread: #{e}"
puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
sock.close
end
end
end
rescue => e
puts "Error: Caught Error in Server Thread: #{e}"
puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
exit
end
We want to move that main loop to its own function to make it accessible -- our reconnecting users will need to be reinserted in the loop.
So let's get main.rb ready for accepting a hot reboot:
require 'socket'
def threadLoop( socket )
Thread.new( socket ) do |sock|
begin
while sock.gets
szIn = $_.chomp
#do something with input.
end
rescue => e
puts "ERROR: Caught error in Client Thread: #{e}"
puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
sock.write("Sorry, an error has occurred, and you have been disconnected."+EOL+"Please try again later."+EOL)
sock.close
end
end
end
puts "Starting Server"
$connected = Array.new
if ARGV[0] == '--copyover'
$nCS = TCPServer.for_fd( ARGV[1].to_i )
$nCS.close_on_exec = false if $nCS.respond_to?(:close_on_exec=)
connected_args = ARGV[2].to_s
connected_args.split(/,/).each do |sockfd|
sock = TCPSocket.for_fd( sockfd.to_i ) # wrap the inherited descriptor in a socket object again
$connected << sock
threadLoop( sock )
end
else
$nCS = TCPServer.new("127.0.0.1",9999)
$nCS.close_on_exec = false if $nCS.respond_to?(:close_on_exec=)
end
begin
while socket = $nCS.accept
$connected << socket # remember the socket so cmdCopyover can pass it on
threadLoop( socket )
end
rescue => e
puts "Error: Caught Error in Server Thread: #{e}"
puts "#{e.backtrace.to_s.gsub(",", ",\r\n")}"
exit
end
Caveat
My actual usage was a lot more ridiculously complicated, so I did my best to strip out all the garbage; however, I realized when I got to the end here that you could probably do without $connected (it's part of a larger system for me). There may be some errors, so please comment if you find them and I'll correct them.
Hope this helps anyone who finds it.