Cases when tarantool yields - tarantool

I'm a newbie in Tarantool and want to ask a general question about implicit yields. The documentation says:
"There are implicit yields: every data-change operation or network-access causes an implicit yield, and every statement that goes through the tarantool client causes an implicit yield."
1) What does "every statement that goes through the tarantool client" mean? It would help to have more detail on this case. Does it refer to connectors or to the regular client?
2) Will Tarantool yield on a data-change operation if the WAL is switched off by setting wal_mode to none?

1) That's poor wording in the documentation. Network and disk I/O yield. If you're using box.begin() and box.commit(), disk I/O does not yield control to another fiber until box.commit().
2) Tarantool won't yield on an insert/delete/update/upsert when wal_mode = 'none'.

Related

Raise exception when TCP connection broken

I'm building a server which accepts connections through TCP (using TCPServer). I mostly just read data (socket.gets.chomp) and write data (socket.print).
socket.gets will return nil if the connection has been closed by the client in the meantime, so .chomp will raise NoMethodError. This is hard to handle specifically since it's such an unspecific exception - I want to distinguish exceptions caused by the connection loss from other causes of NoMethodError, such as me typoing a method.
Ideally, I would receive something more specific such as SocketError whenever trying to interact with a closed socket, rather than just getting back nil. How could I accomplish that?
I have already considered these options:
Write a wrapper for TCPSocket or IO which checks on socket availability before every call (a lot of work to do cleanly considering how many methods there are in IO)
Check each return value for nil (even more effort and code redundancy as my application grows, also I would still .print to the socket when it's already closed)
Monkey patching NilClass for chomp (again only handles this specific use case, and monkey patching should be avoided for clean code)
Being at end of file is not intrinsically an error, nor is it normally understood to mean a "broken" connection like your title says.
For example, HTTP allows multiple requests to be sent over a single connection. After completely reading a request you can read again, and if the connection is closed you'd get nil, which tells you there are no more requests coming. This situation isn't considered an error condition by most/all HTTP software.
Most Ruby software handles nil return from read as an indication that the network conversation is over (successfully). I suggest you do something like that.
If you wish to consider EOF an error, you could create a wrapper class for IO that would "upgrade" nil return from read into an exception of some kind, but I would suggest rethinking whether this is really what you need.
See also https://ruby-doc.org/core-3.0.0/IO.html#method-i-read.
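If you do want a closed connection to surface as an exception instead of nil, a small wrapper class along the lines suggested above is enough. This is only a sketch under that assumption: the class name StrictSocket and the choice of EOFError are illustrative, not an established API.
require 'socket'
# Wraps an accepted socket and raises instead of returning nil once the
# peer has closed the connection.
class StrictSocket
  def initialize(socket)
    @socket = socket
  end

  def gets(*args)
    line = @socket.gets(*args)
    raise EOFError, "connection closed by peer" if line.nil?
    line
  end

  # Delegate everything else (print, puts, close, ...) to the real socket.
  def method_missing(name, *args, &block)
    @socket.public_send(name, *args, &block)
  end

  def respond_to_missing?(name, include_private = false)
    @socket.respond_to?(name, include_private) || super
  end
end

# Usage inside the accept loop:
#   client = StrictSocket.new(server.accept)
#   line = client.gets.chomp   # raises EOFError rather than NoMethodError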

How to keep a persistent connection to SQL Server using Ruby Sequel and Tiny_TDS while in a loop

I have a ruby script that needs to run continually on the server. I've daemonized it using the daemon gem, and in my script I have it running in an infinite loop, since the daemon gem handles starting and stopping of the process that kicks off my script. In my script, I start out by setting up my DB instance using the Sequel gem and tiny_tds. Like so:
DB = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase, user: MSSQLUser, password: MSSQLPassword)
Then I have a loop do that is my infinite loop. Inside that, I test to see if I have a connection using DB.test_connection and then I query the DB every second or so to check if there is new content using a query such as:
DB['SELECT * FROM dbo.[MyTable]'].all do |row|
# MY logic here
# As part of my logic I test to see if I need to delete this row in the table and if so I use
DB.run('DELETE FROM dbo.[MyTable] WHERE some condition')
end
Then at the end of my logic, just before I loop again, I do:
sleep 1
DB.disconnect
All of this works great for about an hour to an hour and a half, with everything checking the table, running the logic, deleting rows, etc. Then it dies and gives me this error message: TinyTds::Error: Adaptive Server connection timed out
My question: why is that happening? Do I need to restructure my code in a different way? Why doesn't DB.test_connection do what it is advertised to do? The documentation on that says it checks for a connection in the connection pool, uses it if it finds one, and creates a new one otherwise.
Any help would be much appreciated
DB.test_connection just acquires a connection from the connection pool; it doesn't check that the connection is still valid (it must have been valid at one point or it wouldn't be in the pool). There's no way to know that a connection is still valid without actually sending a query. You can use the connection_validator extension that ships with Sequel if you want to do that automatically.
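Loading the extension takes a couple of lines; here is a minimal sketch (the 300-second timeout is just an example value, and -1 would validate on every checkout):
DB = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase,
                    user: MSSQLUser, password: MSSQLPassword)

# Revalidate connections that have been idle longer than the timeout
# before handing them out from the pool.
DB.extension(:connection_validator)
DB.pool.connection_validation_timeout = 300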
If you are loading Sequel before forking, you need to make sure you call DB.disconnect before forking, otherwise you can end up with multiple forked processes sharing the same connection, which can cause many different issues.
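A minimal sketch of that ordering (the query in the child is just a placeholder):
# Drop pooled connections before forking so parent and child don't end up
# sharing the same TCP socket to SQL Server.
DB.disconnect

pid = Process.fork do
  # The child opens its own connection the first time it runs a query.
  DB['SELECT 1'].all
end
Process.wait(pid)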
I finally ended up just putting a rescue statement in there that caught this and re-ran my line of code to create the DB instance. Yes, it puts a warning in my log about the constant already being set, but I guess I could just make it not a constant and that would go away. Anyway, it appears to be working now, and on the occasions it does time out, I'm recovering gracefully. I just wish I could have figured out why it was/is disconnecting like it is.
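A rescue-and-reconnect loop along those lines might look roughly like this (a sketch: using a plain variable instead of a constant avoids the redefinition warning, and the exact exception you see depends on the driver; Sequel wraps disconnects in subclasses of Sequel::DatabaseError):
db = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase,
                    user: MSSQLUser, password: MSSQLPassword)

loop do
  begin
    db['SELECT * FROM dbo.[MyTable]'].all do |row|
      # row-processing logic here
    end
  rescue Sequel::DatabaseError => e
    warn "lost the connection (#{e.message}), reconnecting..."
    db.disconnect rescue nil
    db = Sequel.connect(adapter: 'tinytds', host: MSSQLHost, database: MSSQLDatabase,
                        user: MSSQLUser, password: MSSQLPassword)
  end
  sleep 1
end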

What's the correct way to check if a host-alive and handle timeouts efficiently?

I'm trying to check if a given host is up, running, and listening to a specific port, and to handle any errors correctly.
I found a number of references on Ruby socket programming, but none of them seems able to handle a socket timeout efficiently. I tried IO.select, which takes four parameters, the last of which is the timeout value:
IO.select([TCPSocket.new('example.com', 22)], [nil], [nil], 4)
The problem is that it gets stuck, especially if the port number is wrong or the server is not listening on it. So I finally ended up with this, which I don't like much but which does the job:
require 'socket'
require 'timeout'
dns = "example.com"
begin
Timeout::timeout(3) { TCPSocket.new(dns, 22) }
puts "Responded!!"
# do some stuff here...
rescue SocketError
puts "No connection!!"
# do some more stuff here...
rescue Timeout::Error
puts "No connection, timed out!!"
# do some other stuff here...
end
Is there a better way doing this?
The best test for availability of any resource is to try to use it. Adding extra code to try to predict ahead of time whether the use will work is bound to fail:
You test the wrong thing and get a different answer.
You test the right thing but at the wrong time, and the answer changes between the test and the use; your application performs double the work for nothing, and you write redundant code.
The code you have to write to handle the test failure is identical to the code you should write to handle the use-failure. Why write that twice?
We make extensive use of Net::SSH in one of our systems, and ran into timeout issues.
Probably the biggest fix was to implement use of the select method, to set a low-level timeout, and not try to use the Timeout class, which is thread based.
"How do I set the socket timeout in Ruby?" and "Set socket timeout in Ruby via SO_RCVTIMEO socket option" have code to investigate for that. Also, one of those links to "Socket Timeouts in Ruby" which has useful code, however be aware that it was written for Ruby 1.8.6.
The version of Ruby can make a difference too. Pre-1.9 the threading wasn't capable of stopping a blocking IP session so the code would hang until the socket timed out, then the Timeout would fire. Both the above questions go over that.
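For reference, a connect check built on connect_nonblock and select, with the timeout handled at the socket level rather than by the Timeout class, might look roughly like this (a sketch; the helper name, host, port, and timeout are all placeholders):
require 'socket'

# Returns true if +host+ accepts a TCP connection on +port+ within
# +timeout+ seconds, false otherwise.
def port_open?(host, port, timeout = 3)
  addr     = Socket.getaddrinfo(host, nil).first
  sockaddr = Socket.pack_sockaddr_in(port, addr[3])
  socket   = Socket.new(Socket.const_get(addr[0]), Socket::SOCK_STREAM, 0)
  begin
    socket.connect_nonblock(sockaddr)
    true                                   # connected immediately
  rescue IO::WaitWritable
    if IO.select(nil, [socket], nil, timeout)
      begin
        socket.connect_nonblock(sockaddr)  # check the result of the connect
        true
      rescue Errno::EISCONN
        true                               # already connected
      rescue SystemCallError
        false                              # refused, unreachable, ...
      end
    else
      false                                # select timed out
    end
  rescue SystemCallError
    false                                  # immediate failure, e.g. ECONNREFUSED
  ensure
    socket.close
  end
end

puts port_open?('example.com', 22) ? "Responded!!" : "No connection!!"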

Blocking findAndModify in Ruby MongoDB Driver

I'm trying to achieve something like this in MongoDB:
require 'base64'
require 'mongo'
class MongoDBQueue
def enq(thing)
collection.insert({ payload: Base64.encode64(Marshal.dump(thing))})
end
alias :<< :enq
def deq
until _r = collection.find_and_modify({ sort: {_id: Mongo::ASCENDING}, remove: true})
Thread.pass
end
return Marshal.load(Base64.decode64(_r["payload"]))
end
alias :pop :deq
private
def collection
# database, collection & mongodb index semantics here
end
end
Naturally enough I want a disk-backed queue in Ruby that doesn't destroy my available memory. I'm using this with the Anemone web spider framework, which by default uses the Queue class. There's a fork which can use the SizedQueue class; however, when using a SizedQueue for both the "page queue" and the "links queue", it often deadlocks, presumably because it's trying to dequeue a page and process it while it has found new links, and that situation cannot be reconciled.
There's also an existing implementation of a Redis queue; however, that also exhausts all my available memory on this machine (available memory is 16 GB, so it's not trivial).
Because of that I want to use this MongoDB backend, but I think the implementation is insane. The Thread.pass feels like a horrible solution, but Anemone is multi-threaded, and MongoDB doesn't support blocking reads, so it's a tricky situation.
Here's my references:
Redis queue implementation for anemone: https://github.com/chriskite/anemone/blob/queueadapter/lib/anemone/queue/redis.rb
MongoDB findAndModify: http://www.mongodb.org/display/DOCS/findAndModify+Command
Questions:
Can anyone comment on how sane this is, compared to sleep (which should trigger the VM to pass control to the next thread anyway, but sleep feels dirtier)?
Should I perhaps Thread.pass and sleep? (I guess not, see above.)
Can I make that read from MongoDB block? There was talk of that here, but it never came to anything: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/rqnHNFXaZ0w
1) Reads in MongoDB are blocking. If you do a findOne() or a findAndModify(), the call will not return until the data is present on the client side. If you do a find(), the call will not return until you get a cursor; you can then iterate on the cursor as much as you need.
2) By default, writes to MongoDB are "fire and forget". If you care about data integrity, you need to use safe writes by setting :safe => true on your connection, database, or collection object.
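With the legacy 1.x Ruby driver that the question's code uses, that might look like this (a sketch; the host, port, and database/collection names are placeholders):
require 'mongo'

# Acknowledged ("safe") writes for every operation on this connection...
connection = Mongo::Connection.new('localhost', 27017, :safe => true)
collection = connection.db('spider')['queue']

# ...or enable it per operation instead:
collection.insert({ 'payload' => 'example' }, :safe => true)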
Kernel.sleep is actually a better solution, as otherwise you'll spin there (albeit passing control to other threads after each query).
As findAndModify is atomic, only one thread (even on JRuby) will take the job, so I don't quite understand what the "blocking" issue is here.
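In terms of the deq method from the question, that just means swapping Thread.pass for a short sleep; a minimal sketch (the 0.1-second interval is an arbitrary choice):
def deq
  loop do
    doc = collection.find_and_modify({ sort: { _id: Mongo::ASCENDING }, remove: true })
    return Marshal.load(Base64.decode64(doc["payload"])) if doc
    sleep 0.1   # back off instead of busy-spinning with Thread.pass
  end
end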

How to use ruby fibers to avoid blocking IO

I need to upload a bunch of files in a directory to S3. Since more than 90% of the time required to upload is spent waiting for the http request to finish, I want to execute several of them at once somehow.
Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't think of any way I can do any work while an http call blocks.
Any way I can solve this problem without threads?
I'm not up on fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html
Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
If you want to upload multiple files at once, simply launch a new Thread for each file:
t = Thread.new do
upload_file(param1, param2)
end
@all_threads << t
Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):
@all_threads.each do |t|
t.join if t.alive?
end
The Queue can either be a @member_variable or a $global.
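Putting those pieces together, a minimal producer/consumer sketch might look like this (upload_file, the worker count, and the directory glob are placeholders for your own upload logic):
require 'thread'

queue   = Queue.new
workers = 5.times.map do
  Thread.new do
    while (file = queue.pop)    # pop blocks until something is available
      upload_file(file)         # hypothetical upload helper
    end
  end
end

# Producer: find files and hand them to the workers.
Dir.glob('to_upload/*').each { |path| queue << path }

# Tell each worker there is no more work, then wait for them all.
workers.size.times { queue << nil }
workers.each(&:join)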
To answer your actual questions:
Can Fibers help me with this at all?
No they can't. Jörg W Mittag explains why best.
No, you cannot do concurrency with Fibers. Fibers simply aren't a concurrency construct, they are a control-flow construct, like Exceptions. That's the whole point of Fibers: they never run in parallel, they are cooperative and they are deterministic. Fibers are coroutines. (In fact, I never understood why they aren't simply called Coroutines.)
The only concurrency construct in Ruby is Thread.
When he says that the only concurrency construct in Ruby is Thread, remember that there are many different implementations of Ruby and that they vary in their threading implementations. Jörg once again provides a great answer covering these differences, and correctly concludes that only something like JRuby (which uses JVM threads mapped to native threads) or forking your process is how you can achieve true parallelism.
Any way I can solve this problem without threads?
Other than forking your process, I would also suggest that you look at EventMachine and something like em-http-request. It's an event driven, non-blocking, reactor pattern based HTTP client that is asynchronous and does not incur the overhead of threads.
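A rough sketch of what that looks like with em-http-request (the upload URL and request body are placeholders; a real S3 upload also needs authentication, e.g. a pre-signed URL):
require 'em-http'   # from the em-http-request gem

EM.run do
  multi = EventMachine::MultiRequest.new

  Dir.glob('to_upload/*').each do |path|
    request = EventMachine::HttpRequest.new('http://example.com/upload').put(:body => File.read(path))
    multi.add(path, request)
  end

  # Fires once every request has finished, successfully or not.
  multi.callback { EM.stop }
end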
Aaron Patterson (@tenderlove) uses an example almost exactly like yours to describe why you can and should use threads to achieve concurrency in your situation.
Most I/O libraries are now smart enough to release the GVL (Global VM Lock, or most people know it as the GIL or Global Interpreter Lock) when doing IO. There is a simple function call in C to do this. You don't need to worry about the C code, but for you this means that most IO libraries worth their salt are going to release the GVL and allow other threads to execute while the thread that is doing the IO waits for the data to return.
If what I just said was confusing, you don't need to worry about it too much. The main thing that you need to know is that if you are using a decent library to do your HTTP requests (or any other I/O operation for that matter... database, interprocess communication, whatever), the Ruby interpreter (MRI) is smart enough to be able to release the lock on the interpreter and allow other threads to execute while one thread awaits IO to return. If the next thread has its own IO to grab, the Ruby interpreter will do the same thing (assuming that the IO library is built to utilize this feature of Ruby, which I believe most are these days).
So, to sum up what I am saying, use threads! You should see the performance benefit. If not, check to see whether your http library is using the rb_thread_blocking_region() function in C and, if not, find out why not. Maybe there is a good reason, maybe you need to consider using a better library.
The link to the Aaron Patterson video is here: http://www.youtube.com/watch?v=kufXhNkm5WU
It is worth a watch, even if just for the laughs, as Aaron Patterson is one of the funniest people on the internet.
You could use separate processes for this instead of threads:
#!/usr/bin/env ruby
$stderr.sync = true
# Number of children to use for uploading
MAX_CHILDREN = 5
# Hash of PIDs for children that are working along with which file
# they're working on.
@child_pids = {}
# Keep track of uploads that failed
@failed_files = []
# Get the list of files to upload as arguments to the program
@files = ARGV
### Wait for a child to finish, adding the file to the list of those
### that failed if the child indicates there was a problem.
def wait_for_child
$stderr.puts " waiting for a child to finish..."
pid, status = Process.waitpid2( 0 )
file = @child_pids.delete( pid )
@failed_files << file unless status.success?
end
### Here's where you'd put the particulars of what gets uploaded and
### how. I'm just sleeping for the file size in bytes * milliseconds
### to simulate the upload, then returning either +true+ or +false+
### based on a random factor.
def upload( file )
bytes = File.size( file )
sleep( bytes * 0.00001 )
return rand( 100 ) > 5
end
### Start a child uploading the specified +file+.
def start_child( file )
if pid = Process.fork
$stderr.puts "%s: uploaded started by child %d" % [ file, pid ]
#child_pids[ pid ] = file
else
if upload( file )
$stderr.puts "%s: done." % [ file ]
exit 0 # success
else
$stderr.puts "%s: failed." % [ file ]
exit 255
end
end
end
until @files.empty?
# If there are already the maximum number of children running, wait
# for one to finish
wait_for_child() if @child_pids.length >= MAX_CHILDREN
# Start a new child working on the next file
start_child( @files.shift )
end
# Now we're just waiting on the final few uploads to finish
wait_for_child() until @child_pids.empty?
if @failed_files.empty?
exit 0
else
$stderr.puts "Some files failed to upload:",
@failed_files.collect {|file| " #{file}" }
exit 255
end
