I am trying to make an interactive telnet client in Ruby. The current library is extremely lacking, so I have been adding to it by creating an interactive gem that lets a user stream data in real time over telnet. To do this I need multithreading:
t1 accepts user input. The user must always be able to enter data, for the entire life of the application. Once user data is sent, we receive data back right away, which is caught with our block { |c| print c }. The problem is that we want data streamed to us. In other words, right now we only get data that is sent back after we send something, but the server may keep sending output for up to a minute after a command, and we want to receive that too. We want data to be constantly flowing to us.
I made t2 for this purpose. t2 waits for data to be received and then displays it when its regex pattern is matched. The problem with t2 is that if no data is ever received, the user cannot enter anything into t1.
t3 simply joins t1 and t2.
My question is: how can I organize my threads so that the user can constantly type commands into the console while simultaneously and constantly receiving data back from the server?
t1 = Thread.new {
  while true
    input = gets.chomp
    localhost.cmd(input) { |c| print c }
  end
}

t2 = Thread.new {
  puts localhost.waitfor("Prompt" => /[$%#>:?.|](\e\[0?m\s*)* *\z/)
}

t3 = Thread.new {
  t1.join
  t2.join
}
t3.join
"The problem is that we want data streamed to us. In other words, right now we only get data that is sent back after we send something."
require 'thread'

user_data = Queue.new

# Producer: read lines from the user until a blank line is entered,
# pushing each onto the shared queue; a sentinel marks end of input.
t1 = Thread.new do
  loop do
    print "Enter data: "
    line = gets.chomp
    if line == ""
      user_data << "END_OF_DATA"
      break
    else
      user_data << line
    end
  end
end.join

# Consumer: pop lines off the queue until the sentinel arrives.
t2 = Thread.new do
  processed_data = []
  loop do
    line = user_data.shift
    break if line == "END_OF_DATA"
    processed_data << line
  end
  p processed_data
end.join
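Applied to the telnet case, the same split works as long as reading and writing never share a thread: a dedicated reader loops on waitfor with a match-anything prompt so output is printed the moment it arrives, while the main thread only sends. Below is a minimal sketch, assuming a Net::Telnet connection (the host options are placeholders) and assuming a match-anything prompt is acceptable for your server:

require 'net/telnet'

# Placeholder connection options; adjust for your server.
localhost = Net::Telnet.new("Host" => "localhost")

# Reader thread: "Prompt" => /.+/m matches any output at all, so waitfor
# yields data to the block as soon as it arrives. The short "Timeout"
# makes the call return periodically so the loop keeps spinning.
reader = Thread.new do
  loop do
    begin
      localhost.waitfor("Prompt" => /.+/m, "Timeout" => 1) { |c| print c }
    rescue StandardError
      # waitfor raises a timeout error when nothing arrives within the
      # window (TimeoutError or Net::ReadTimeout, depending on the
      # net/telnet version); ignore it and wait again.
    end
  end
end

# Main thread: only sends, so the user is never blocked waiting on output.
while (input = gets)
  localhost.puts(input.chomp) # send without waiting for a reply
end

reader.kill
localhost.close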
You might want to read this:
https://www.rfc-editor.org/rfc/rfc854
The following is the structure of my code: it processes data and prints results. Some content is printed during process_records, so I want the ensure part of every thread to print at the very end of the program run, in order: the first thread's ensure part should print first. How can I do that without moving print_report out of the Thread.new block? Thanks.
lock = Mutex.new
threads = []

thread_num.times do |i|
  threads << Thread.new do
    records = lock.synchronize { db_proxy.query(account_id) }
    result1 = process_records(records)
    result2 = process_records2(records)
    result3 = process_records3(records)
  ensure
    print_report(result1, result2, result3)
  end
end

threads.each(&:join)
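One way to keep print_report inside the block and still get ordered output is to hand each thread a handle to the previously created thread and have its ensure wait on it: a thread terminates only after its ensure block has run, so joining the previous thread guarantees its report has already printed. A sketch under that assumption, with db_proxy and the process_records calls as in the question (the bare ensure inside a do/end block needs Ruby 2.6+, as in your snippet):

lock = Mutex.new
threads = []

thread_num.times do |i|
  prev = threads.last # the thread created just before this one, if any
  threads << Thread.new do
    records = lock.synchronize { db_proxy.query(account_id) }
    result1 = process_records(records)
    result2 = process_records2(records)
    result3 = process_records3(records)
  ensure
    # Waiting for the previous thread here serializes the reports
    # in creation order without moving print_report out of the block.
    prev.join if prev
    print_report(result1, result2, result3)
  end
end

threads.each(&:join)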
I'm trying to multithread a loop in Ruby, following this example: http://t-a-w.blogspot.com/2010/05/very-simple-parallelization-with-ruby.html.
I copied that code and wrote this:
module Enumerable
  def ignore_exception
    begin
      yield
    rescue Exception => e
      STDERR.puts e.message
    end
  end

  def in_parallel(n)
    t_queue = Queue.new
    threads = (1..n).map {
      Thread.new {
        while x = t_queue.deq
          ignore_exception { yield(x[0]) }
        end
      }
    }
    each { |x| t_queue << [x] }
    n.times { t_queue << nil }
    threads.each { |t|
      t.join
      unless t[:out].nil?
        puts t[:out]
      end
    }
  end
end

ids.in_parallel(10) { |id|
  conn = open_conn(loc)
  out = conn.getData(id)
  Thread.current[:out] = out
}
The way I understand it, it dequeues up to 10 items at a time, processes the block per id, and joins the 10 threads at the end, repeating until finished. After running this code I sometimes get different results, especially if the size of my ids list is less than 10, and I am confused about why this occurs. Half the time it will not output anything for up to half the ids, even though I can check on the server side that output for those ids exists. For example, if the correct output is "Got id 1" and "Got id 2", it will print only {"Got id 1"}, only {"Got id 2"}, or {"Got id 1", "Got id 2"}. Is my understanding of this code correct?
The issue in my code was the open_conn() function call, which was not thread safe. I fixed the issue by synchronizing around getting the connection handle:
connLock = Mutex.new

ids.in_parallel(10) { |id|
  conn = nil
  connLock.synchronize {
    conn = open_conn(loc)
  }
  out = conn.getData(id)
  Thread.current[:out] = out
}
You could also use http://peach.rubyforge.org/ to parallelize the loop:
ids.peach(10) { |id| ... }
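One more caveat worth noting: Thread.current[:out] holds only the last value each worker wrote, so once a thread processes more than one id, earlier outputs are silently overwritten and never printed by the join loop. A sketch that instead collects one entry per id into a shared Queue, reusing connLock and the other names from above:

require 'thread'

results = Queue.new
connLock = Mutex.new

ids.in_parallel(10) { |id|
  conn = nil
  connLock.synchronize { conn = open_conn(loc) }
  results << conn.getData(id) # one entry per id; nothing is overwritten
}

# in_parallel has joined all workers by the time it returns,
# so the queue now holds every result.
puts results.pop until results.empty?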
I'm aware that there are great gems like Parallel, but I came up with the class below as an exercise.
It works fine, but when doing a lot of iterations it sometimes happens that Ruby gets "stuck". When pressing CTRL+C I can see from the backtrace that it's always in line 38 or 45 (the two Marshal lines).
Can you see anything that is wrong here? It seems that the pipes are "hanging", so I thought I might be using them in the wrong way.
My goal was to iterate through an array (which I pass as 'objects') with a limited number of forks (max_forks) and to return some values. Additionally, I wanted to guarantee that all children get killed when the parent is killed (even in the case of kill -9), which is why I introduced the "life_line" pipe (I've read here on Stack Overflow that this might do the trick).
class Parallel
  def self.do_fork(max_forks, objects)
    waiter_threads = []
    fork_counter = []
    life_line = {}
    comm_line = {}

    objects.each do |object|
      key = rand(24 ** 24).to_s(36)
      sleep(0.01) while fork_counter.size >= max_forks
      if fork_counter.size < max_forks
        fork_counter << true
        life_line[key] = {}
        life_line[key][:r], life_line[key][:w] = IO.pipe
        comm_line[key] = {}
        comm_line[key][:r], comm_line[key][:w] = IO.pipe
        pid = fork {
          life_line[key][:w].close
          comm_line[key][:r].close
          Thread.new {
            begin
              life_line[key][:r].read
            rescue SignalException, SystemExit => e
              raise e
            rescue Exception => e
              Kernel.exit
            end
          }
          Marshal.dump(yield(object), comm_line[key][:w]) # return yield
        }
        waiter_threads << Thread.new {
          Process.wait(pid)
          comm_line[key][:w].close
          reply = Marshal.load(comm_line[key][:r])
          # process reply here
          comm_line[key][:r].close
          life_line[key][:r].close
          life_line[key][:w].close
          life_line[key] = nil
          fork_counter.pop
        }
      end
    end
    waiter_threads.each { |k| k.join } # wait for all threads to finish
  end
end
The bug was this: a pipe can handle only a certain amount of data (e.g. 64 KB). Once you write more than that, the pipe will get "stuck" forever. An easy solution is to read the pipe in a thread before you start writing to it.
comm_line = IO.pipe

# Buffered pipe reading (in case the payload is bigger than 64 KB)
reply = ""
read_buffer = Thread.new {
  while !comm_line[0].eof?
    reply = Marshal.load(comm_line[0])
  end
}

child_pid = fork {
  comm_line[0].close # the child only writes
  Marshal.dump("HUGE DATA LARGER THAN 64 KB", comm_line[1])
  comm_line[1].close
}

Process.wait(child_pid)
comm_line[1].close # close the parent's write end so the reader sees EOF
read_buffer.join
comm_line[0].close

puts reply # outputs the "HUGE DATA"
I don't think the problem is with Marshal. The more obvious one seems to be that your fork may finish execution before the waiter thread gets to it (leaving the latter to wait forever).
Try changing Process.wait(pid) to Process.wait(pid, Process::WNOHANG). The Process::WNOHANG flag instructs Ruby to not hang if there are no children (matching the given PID, if any) available. Note that this may not be available on all platforms but at the very least should work on Linux.
There are a number of other potential problems with your code, but if you just came up with it "as an exercise", they probably don't matter. For example, Marshal.load does not like to encounter EOFs, so I'd guard against those by saying something like Marshal.load(comm_line[key][:r]) unless comm_line[key][:r].eof?, or by looping until comm_line[key][:r].eof? if you expect there to be several objects to read.
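For instance, the read in the waiter thread could be guarded like this (a sketch using the same handles as in the question, collecting however many objects the child wrote):

replies = []
until comm_line[key][:r].eof?
  replies << Marshal.load(comm_line[key][:r])
end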
I have an application that reacts to messages sent by clients. One message is reload_credentials, which the application receives any time a new client registers. Handling this message means connecting to a PostgreSQL database, querying for all the credentials, and storing them in a regular Ruby hash (client_id => client_token).
Some other messages that the application may receive are start, stop, and pause, which are used to keep track of session times. My point is that I envision the application functioning in the following way:
client sends a message
message gets queued
queue is being processed
However, I don't want to block the reactor, for example. Furthermore, imagine I have a reload_credentials message next in the queue: I don't want any other message from the queue to be processed until the credentials are reloaded from the DB. Also, while I am processing a certain message (like waiting for the credentials query to finish), I want to allow other messages to be enqueued.
Could you please guide me towards solving such a problem? I'm thinking I may have to use em-synchrony, but I am not sure.
Use one of the PostgreSQL EM drivers, or EM.defer, so that you won't block the reactor.
When you receive the 'reload_credentials' message, just flip a flag that causes all subsequent messages to be enqueued. Once the 'reload_credentials' has finished, process all messages from the queue. After the queue is empty, flip the flag back so that messages are processed as they are received.
EM drivers for PostgreSQL are listed here: https://github.com/eventmachine/eventmachine/wiki/Protocol-Implementations
module Server
  def post_init
    @queue = []
    @loading_credentials = false
  end

  def receive_message(type, data)
    return @queue << [type, data] if @loading_credentials || !@queue.empty?
    return process_msg(type, data) unless :reload_credentials == type
    @loading_credentials = true
    reload_credentials do
      @loading_credentials = false
      process_queue
    end
  end

  def reload_credentials(&when_done)
    EM.defer(proc { query_and_load_credentials }, when_done)
  end

  def process_queue
    while (type, data = @queue.shift)
      process_msg(type, data)
    end
  end

  # lots of other methods
end

EM.start_server(HOST, PORT, Server)
If you want all connections to queue messages whenever any connection receives a 'reload_credentials' message, you'll have to coordinate via the eigenclass.
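A minimal sketch of that coordination, assuming the same method names as the snippet above: the queue and the flag move onto the Server module's eigenclass, so every connection instance shares one backlog.

module Server
  # Shared across all connections: defined on the module's eigenclass.
  class << self
    attr_accessor :queue, :loading_credentials
  end
  self.queue = []
  self.loading_credentials = false

  def receive_message(type, data)
    return Server.queue << [type, data] if Server.loading_credentials || !Server.queue.empty?
    return process_msg(type, data) unless :reload_credentials == type

    Server.loading_credentials = true
    reload_credentials do
      Server.loading_credentials = false
      process_queue
    end
  end

  def process_queue
    while (msg = Server.queue.shift)
      process_msg(*msg)
    end
  end
end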
The following is, I presume, something like your current implementation:
class Worker
  def initialize queue
    @queue = queue
    dequeue
  end

  def dequeue
    @queue.pop do |item|
      begin
        work_on item
      ensure
        dequeue
      end
    end
  end

  def work_on item
    case item.type
    when :reload_credentials
      # magic happens here
    else
      # more magic happens here
    end
  end
end

q = EM::Queue.new
workers = Array.new(10) { Worker.new q }
The problem with the above, if I understand you correctly, is that you don't want workers picking up jobs that arrived after a reload_credentials job until that reload has completed. The following should service this (additional words of caution at the end).
class Worker
  def initialize queue
    @queue = queue
    dequeue
  end

  def dequeue
    @queue.pop do |item|
      begin
        work_on item
      ensure
        dequeue
      end
    end
  end

  def work_on item
    case item.type
    when :reload_credentials
      # magic happens here
    else
      # more magic happens here
    end
  end
end

class LockingDispatcher
  def initialize channel, queue
    @channel = channel
    @queue = queue
    @backlog = []
    @channel.subscribe method(:dispatch_with_locking)
    @locked = false
  end

  def dispatch_with_locking item
    if locked?
      @backlog << item
    else
      # You probably want to move the specialization here out into a method
      # or block that's passed into the constructor, to make the
      # LockingDispatcher more of a generic processor
      case item.type
      when :reload_credentials
        lock
        deferrable = CredentialReloader.new(item).start
        deferrable.callback { unlock }
        deferrable.errback { unlock }
      else
        dispatch_without_locking item
      end
    end
  end

  def dispatch_without_locking item
    @queue << item
  end

  def locked?
    @locked
  end

  def lock
    @locked = true
  end

  def unlock
    @locked = false
    bl = @backlog.dup
    @backlog.clear
    bl.each { |item| dispatch_with_locking item }
  end
end

channel = EM::Channel.new
queue = EM::Queue.new

dispatcher = LockingDispatcher.new channel, queue
workers = Array.new(10) { Worker.new queue }
So, input to the first system comes in on q, but in this new system it comes in on channel. The queue is still used for work distribution among workers, but it is not populated while a credentials refresh is going on. Unfortunately, as I didn't take more time, I have not generalized the LockingDispatcher so that it isn't coupled to the item type and the dispatch code for CredentialReloader. I'll leave that to you.
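Usage then changes only at the edge: producers publish into the channel (EM::Channel#push, aliased <<) instead of pushing onto the queue directly, and the dispatcher decides where each item goes. A one-line sketch, with incoming_item assumed to be whatever your protocol handler constructs:

# Somewhere in your message handler: publish instead of enqueueing.
channel << incoming_item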
You should note here that whilst this services what I understand of your original request, it is generally better to relax this kind of requirement. There are several outstanding problems that essentially cannot be eradicated without alterations in that requirement:
The system does not wait for executing jobs to complete before starting credentials jobs.
The system will handle bursts of credentials jobs very badly: other items that might be processable won't be.
In the case of a bug in the credentials code, the backlog could fill up RAM and cause failure. A simple timeout might be enough to avoid catastrophic effects, provided the code is abortable and subsequent messages are sufficiently processable to avoid further deadlocks.
It actually sounds like you have some notion of a user id in the system. If you think through your requirements, it's likely that you only need to backlog items that pertain to a user whose credentials are in a refresh state. This is a different problem, involving a different kind of dispatching: try a hash of locked backlogs for those users, with a callback on credential completion that drains those backlogs into the workers, as sketched below.
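A sketch of that arrangement, assuming a hypothetical user_id method on each item and the same CredentialReloader deferrable as above:

class PerUserDispatcher
  def initialize queue
    @queue = queue
    @backlogs = {} # user_id => items held while that user's refresh runs
  end

  def dispatch item
    if @backlogs.key?(item.user_id)
      # This user's credentials are mid-refresh; hold the item back.
      @backlogs[item.user_id] << item
    elsif item.type == :reload_credentials
      @backlogs[item.user_id] = []
      deferrable = CredentialReloader.new(item).start
      deferrable.callback { drain item.user_id }
      deferrable.errback { drain item.user_id }
    else
      @queue << item
    end
  end

  # On completion, release only this user's backlog into the workers.
  def drain user_id
    (@backlogs.delete(user_id) || []).each { |item| @queue << item }
  end
end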
Good luck!
I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want many worker threads combing the database at all times for URLs that haven't been tested recently, and testing any they find.
Of course, this could cause multiple threads to snag the same URL from the database. I don't want this. So, I'm trying to use Mutexes to prevent this from happening. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least prefer to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new

50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end

threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is that, at the end of the day, each URL ends up with times_tested equal to 150, right?
(I'm basically just trying to make sure the mutexes and the worker-thread approach are working.)
But each time I run it, a few odd URLs here and there end up with times_tested equal to a much lower number, say 37, and with locked_for_testing frozen on "true".
As far as I can tell from my code, if any URL gets locked, it must eventually be unlocked. So I don't understand how some URLs end up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue, and a master thread to pull the work: if you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember, if you aren't locking the database, a mutex doesn't really help you if something else accesses the DB.
code completely untested
require 'thread'

queue = Queue.new
keep_running = true
# trap cntrl_c or something to reset keep_running

master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
      # keep from spinning the queue
      sleep(0.1)
    end
  end
end

workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id
      id = queue.shift
      url = URL.get(id)
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end

workers.each do |w|
  w.join
end
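As a concrete filling for the worker body, the update step might look like the sketch below (assuming the same URL model as in the question). Since the master hands each id to exactly one worker, no mutex is needed around the update itself:

# Inside the worker loop, in place of "do something with the url":
id = queue.shift # blocks until the master enqueues an id
url = URL.get(id)
if url
  sleep 1 # simulate testing the url
  url.times_tested += 1
  url.save
end

One caveat remains: if the queue drains while a URL is still being tested, the master can re-enqueue an id a worker is already working on; deduplicating ids in the master before pushing would tighten that up.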