Please explain the logic behind this ruby fiber example - ruby

The example code is from here:
def http_get(url)
f = Fiber.current
http = EventMachine::HttpRequest.new(url).get
# resume fiber once http call is done
http.callback { f.resume(http) }
http.errback { f.resume(http) }
return Fiber.yield
end
EventMachine.run do
Fiber.new{
page = http_get('http://www.google.com/')
puts "Fetched page: #{page.response_header.status}"
if page
page = http_get('http://www.google.com/search?q=eventmachine')
puts "Fetched page 2: #{page.response_header.status}"
end
}.resume
end
So, in the context of the EM run block, the author's creating a fiber and running it immediately with resume. But, I don't understand why the http_get logic is structured in that way. I mean, it's taking the current fiber ( which in this case should be the one created in the EM run block ), it starts a http request which may fail or succeed, and it resumes the current fiber. Afterwards it just calls yield on the fiber. What exactly will be running since he is calling yield? Can someone please explain why http_get is written the way it is?

Fiber is created and triggered in EventMachine
the goal is (a) to fetch a page and (b) work on it
the Fiber should be paused until the page is fetched, this is the role of http_get
http = EventMachine::HttpRequest.new(url).get doesn't trigger anything: EventMachine needs to get the reins back, that's the role of Fiber.yield
Once EventMachine has done the job getting the page, it triggers the callback and resumes the Fiber which was stopped at puts ...
Clearer?

Related

Ruby TCPServer performance issue

I am encountering an interesting issue with Ruby TCPServer, where once a client connects, it continually uses more and more CPU processing power until it hits 100% and then the entire system starts to bog down and can't process incoming data.
The processing class that is having an issue is designed to be a TCP Client that receives data from an embedded system, processes it, then returns the processed data to be further used (either by other similar data processors, or output to a user).
In this particular case, there is an external piece of code that would like this processed data, but cannot access it from the main parent code (the thing that the original process class is returning it's data to). This external piece may or may not be connected at any point while it is running.
To solve this, I set up a Thread with a TCPServer, and the processing class continually adds to a queue, and the Thread pulls from the queue and sends it to the client.
It works great, except for the performance issues. I am curious if I have something funky going on in my code, or if it's just the nature of this methodology and it will never be performant enough to work.
Thanks in advance for any insight/suggestions with this problem!
Here is my code/setup, with some test helpers:
process_data.rb
require 'socket'
class ProcessData
def initialize
super
#queue = Queue.new
#client_active = false
Thread.new do
# Waiting for connection
#server = TCPServer.open('localhost', 5000)
loop do
Thread.start(#server.accept) do |client|
puts 'Client connected'
# Connection established
#client_active = true
begin
# Continually attempt to send data to client
loop do
unless #queue.empty?
# If data exists, send it to client
begin
until #queue.empty?
client.puts(#queue.pop)
end
rescue Errno::EPIPE => error
# Client disconnected
client.close
end
end
sleep(1)
end
rescue IOError => error
# Client disconnected
#client_active = false
end
end # Thread.start(#server.accept)
end # loop do
end # Thread.new do
end
def read(data)
# Data comes in from embedded system on this method
# Do some processing
processed_data = data.to_i + 5678
# Ready to send data to external client
if #client_active
#queue << processed_data
end
return processed_data
end
end
test_embedded_system.rb (source of the original data)
require 'socket'
#data = '1234'*100000 # Simulate lots of data coming ing
embedded_system = TCPServer.open('localhost', 5555)
client_connection = embedded_system.accept
loop do
client_connection.puts(#data)
sleep(0.1)
end
parent.rb (this is what will create/call the ProcessData class)
require_relative 'process_data'
processor = ProcessData.new
loop do
begin
s = TCPSocket.new('localhost', 5555)
while data = s.gets
processor.read(data)
end
rescue => e
sleep(1)
end
end
random_client.rb (wants data from ProcessData)
require 'socket'
loop do
begin
s = TCPSocket.new('localhost', 5000)
while processed_data = s.gets
puts processed_data
end
rescue => e
sleep(1)
end
end
To run the test in linux, open 3 terminal windows:
Window 1: ./test_embedded_system.rb
Window 2: ./parent.rb
\CPU usage is stable
Window 3: ./random_client.rb
\CPU usage continually grows
I ended up figuring out what the issue was, and unfortunately I lead folks astray with my example.
It turns out my example didn't quite have the issue I was having, and the main difference was the sleep(1) was not in my version of process_data.rb.
That sleep is actually incredibly important, because it is inside of a loop do, and without the sleep, the Thread won't yield the GVL, and will continually eat up CPU resources.
Essentially, it was unrelated to TCP stuff, and more related to Threads and loops.
If you stumble on this question later on, you can put a sleep(0) in your loops if you don't want it to wait, but you want it to yield the GVL.
Check out these answers as well for more info:
Ruby infinite loop causes 100% cpu load
sleep 0 has special meaning?

How to use fibers with ruby eventmachine in non-rack?

So basically my goal is get some sort of light-weight ruby daemon(or sidekiq/resque worker), that processes jobs and notifies other apps over http. The app itself does not need to receive http requests, so no rack to remain as light-weight as possible. Pretty much a bit of ruby code I can run in loop {}
So trying to not use EventMachine' reactor pattern and using fiber approach instead. Where would I put EM.run or EM.stop in this context Thread.new { EM.run } doesn't seem to be fiber aware so adding it gave no callbacks? Is there a em-synchrony alternative to this?
#slow=true injects a sleep 3, so page 2 callback should output faster
require 'em-http-request'
require 'fiber'
def http_get(url)
f = Fiber.current
http = EventMachine::HttpRequest.new(url).get
# resume fiber once http call is done
http.callback { f.resume(http) }
http.errback { f.resume(http) }
return Fiber.yield
end
puts "fetching some data from database for request params"
EventMachine.run do
Fiber.new{
page = http_get('http://localhost:3000/status?slow=true')
puts "notified external page it responded with: #{page.response_header.status}"
}.resume
Fiber.new{
page = http_get('http://localhost:4000/status')
puts "notified external page 2 it responded with: #{page.response_header.status}"
}.resume
puts "Finishised notification task"
end
puts "Moving on to next task as fast as possible"
Avoid reinventing the wheel, use EM::Synchrony or even better switch to celluloid or celluloid-io as EM seems to have fallen out of maintenance

How does thin asynch app example buffer response to the body?

Hi I have been going through the documentation on Thin and I am reasonably new to eventmachine but I am aware of how Deferrables work. My goal is to understand how Thin works when the body is deferred and streamed part by part.
The following is the example that I'm working with and trying to get my head around.
class DeferrableBody
include EventMachine::Deferrable
def call(body)
body.each do |chunk|
#body_callback.call(chunk)
end
# #body_callback.call()
end
def each &blk
#body_callback = blk
end
end
class AsyncApp
# This is a template async response. N.B. Can't use string for body on 1.9
AsyncResponse = [-1, {}, []].freeze
puts "Aysnc testing #{AsyncResponse.inspect}"
def call(env)
body = DeferrableBody.new
# Get the headers out there asap, let the client know we're alive...
EventMachine::next_tick do
puts "Next tick running....."
env['async.callback'].call [200, {'Content-Type' => 'text/plain'}, body]
end
# Semi-emulate a long db request, instead of a timer, in reality we'd be
# waiting for the response data. Whilst this happens, other connections
# can be serviced.
# This could be any callback based thing though, a deferrable waiting on
# IO data, a db request, an http request, an smtp send, whatever.
EventMachine::add_timer(2) do
puts "Timer started.."
body.call ["Woah, async!\n"]
EventMachine::add_timer(5) {
# This could actually happen any time, you could spawn off to new
# threads, pause as a good looking lady walks by, whatever.
# Just shows off how we can defer chunks of data in the body, you can
# even call this many times.
body.call ["Cheers then!"]
puts "Succeed Called."
body.succeed
}
end
# throw :async # Still works for supporting non-async frameworks...
puts "Async REsponse sent."
AsyncResponse # May end up in Rack :-)
end
end
# The additions to env for async.connection and async.callback absolutely
# destroy the speed of the request if Lint is doing it's checks on env.
# It is also important to note that an async response will not pass through
# any further middleware, as the async response notification has been passed
# right up to the webserver, and the callback goes directly there too.
# Middleware could possibly catch :async, and also provide a different
# async.connection and async.callback.
# use Rack::Lint
run AsyncApp.new
The part which I don't clearly understand is what happens within the DeferrableBody class in the call and the each methods.
I get that the each receives chunks of data once the timer fires as blocks stored in #body_callback and when succeed is called on the body it sends the body but when is yield or call called on those blocks how does it become a single message when sent.
I feel I don't understand closures enough to understand whats happening. Would appreciate any help on this.
Thank you.
Ok I think I figured out how the each blocks works.
Thin on post_init seems to be generating a #request and #response object when the connection comes in. The response object needs to respond to an each method. This is the method we override.
The env['async.callback'] is a closure that is assigned to a method called post_process in the connection.rb class method where the data is actually sent to connection which looks like this
#response.each do |chunk|
trace { chunk }
puts "-- [THIN] sending data #{chunk} ---"
send_data chunk
end
How the response object's each is defined
def each
yield head
if #body.is_a?(String)
yield #body
else
#body.each { |chunk| yield chunk }
end
end
So our env['async.callback'] is basically a method called post_process defined in the connection.rb class accessed via method(:post_process) allowing our method to be handled like a closure, which contains access to the #response object. When the reactor starts it first sends the header data in the next_tick where it yields the head, but the body is empty at this point so nothing gets yielded.
After this our each method overrides the old implementation owned by the #response object so when the add_timers fire the post_process which gets triggered sends the data that we supply using the body.call(["Wooah..."]) to the browser (or wherever)
Completely in awe of macournoyer and the team committing to thin. Please correct my understanding if you feel this is not how it works.

Making errors abort EventMachine process

I'm working on creating a background script that uses EventMachine to connect to a server with WebSockets. The script will be run using DelayedJob or Resque. I've been able to get it to talk to the WebSockets server and send messages, but whenever an error is raised within the EventMachine loop it doesn't crash the script - which is what should happen (and what I need to have happen). I don't have to use EventMachine as I'm only sending WebSocket messages and not receiving them - but I'd love any help on this :) thank you!
#!/usr/bin/env ruby
require 'rubygems'
require 'eventmachine'
require 'em-http'
class Job
include EventMachine::Deferrable
def self.perform
job = Job.new
EventMachine.run {
http = EventMachine::HttpRequest.new("ws://localhost:8080/").get :timeout => 0
http.errback { puts "oops" }
http.callback {
puts "WebSocket connected!"
http.send("Hello watcher")
}
http.stream { |msg| }
job.callback { puts "done" }
Thread.new {
job.execute(http)
http.close
EventMachine.stop
}
}
end
def execute(h)
sleep 1
puts "Job Runner!"
h.send("welcome!")
sleep 2
asdsadsa # here I am trying to simulate an error
sleep 1
h.send("we are all done!")
sleep 1
set_deferred_status :succeeded
end
end
Job.perform
Since you're causing an exception inside a thread, you should set Thread.abort_on_exception to true otherwise these errors will not be raised properly.
You don't need to use Thread.new here at all, in fact, it's not thread safe to do so (eventmachine itself is not thread safe, except for EM::Queue, EM::Channel and EM.schedule).
If you wanted to do synchronous things in execute, and you must have that thread, then, you'll want to call h.send via EM.schedule, for example:
EM.schedule { h.send("welcome!") }
If you must have that thread in this way, then, you want to catch exceptions from the thread you spawn yourself. You should then stop and shutdown on your own, or just raise back up in the main (eventmachine) thread:
EM.run do
thread = Thread.new do
raise 'boom'
end
EM.add_periodic_timer(0.1) { thread.join(0) }
end
The above pattern can easily just enumerate an array of threads in the periodic timer instead, if appropriate.
Finally, please note that exception bubbling (correct exception reporting) was only supported in EventMachine > 1.0, which is still in beta. To get usable backtraces when exceptions occur, either gem install eventmachine --pre, or better, use master from the Github repo.

Running a loop (such as one for a mock webserver) within a thread

I'm trying to run a mock webserver within a thread within a class. I've tried passing the class' #server property to the thread block but as soon as I try to do server.accept the thread stops. Is there some way to make this work? I want to basically be able to run a webserver off of this script while still taking user input via stdin.gets. Is this possible?
class Server
def initialize()
#server = TCPServer.new(8080)
end
def run()
#thread = Thread.new(#server) { |server|
while true
newsock = server.accept
puts "some stuff after accept!"
next if !newsock
# some other stuff
end
}
end
end
def processCommand()
# some user commands here
end
test = Server.new
while true do
processCommand(STDIN.gets)
end
In the above sample, the thread dies on server.accept
In the code you posted, you're not calling Server#run. That's probably just an oversight in making the post. Server.accept is supposed to block a thread, returning only when someone has connected.
Anyone who goes into writing an HTTP server with bright eyes soon learns that it's more fun to let someone else do that work. For quick and dirty HTTP servers, I've got good results enlisting the aid of WEBrick. It's a part of the Ruby library. Here's a WEBrick server that will serve up "Boo!" When you connect your browser to localhost:8080/:
#!/usr/bin/ruby1.8
require 'webrick'
class MiniServer
def initialize
Thread.new do
Thread::abort_on_exception = true
server = WEBrick::HTTPServer.new(:BindAddress=>'127.0.0.1',
:Port=>8080,
:Logger=>WEBrick::Log.new('/dev/stdout'))
server.mount('/', Servlet, self)
server.start
end
end
private
class Servlet < WEBrick::HTTPServlet::AbstractServlet
def initialize(webrick_server, mini_server)
end
def do_GET(req, resp)
resp.body = "<html><head></head><body>Boo!</body></html>"
end
alias :do_POST :do_GET
end
end
server = MiniServer.new
gets
I don't know ruby, but it looks like server.accept is blocking until you get a tcp connection... your thread will continue as soon as a connection is accepted.
You should start the server in your main thread and then spawn a new thread for each connection that you accept, that way your server will immediately go to accept another connection and your thread will service the one that was just accepted.

Resources