Multi-threading with Sequel gem - ruby

I have a program that is storing JSON request data into a Postgres DB using the Sequel gem (it's basically a price aggregator). However, the data is being pulled from multiple locations rapidly using threads.
Once I get the appropriate data, I currently have a mutex.synchronize with the following:
dbItem = Item.where(:sku => sku).first
dbItem = Item.create(:sku => sku, :name => itemName) if dbItem == nil
dbPrice = Price.create(:price => foundPrice, :quantity => quantity)
dbItem.add_price(dbPrice)
store.add_price(dbPrice)
If I run this without a mutex, I get threading issues - for example, the code will try to create an item in the DB, but the item will have just been created by another thread.
However, with the mutex, this code gets slowed down significantly - I'm seeing my program take ~four-six times longer.
I'm new to the whole database thing honestly, so I'm just trying to figure out the best way to handle threading. Am I doing this wrong? The documentation for Sequel actually states that almost everything is thread-safe... except for model instances, which I believe my item situation falls under. It states I should freeze models first, but I don't know how to apply that here.

You aren't dealing with shared model instances, and this isn't a thread-safety issue, it is a race condition (it would happen using multiple processes, not just multiple threads). You need to use database-specific support to handle this in a race-condition free way. On PostgreSQL 9.5+, you would need to use Dataset#insert_conflict.
Sequel documentation: http://sequel.jeremyevans.net/rdoc-adapters/classes/Sequel/Postgres/DatasetMethods.html#method-i-insert_conflict
PostgreSQL documentation: https://www.postgresql.org/docs/9.5/static/sql-insert.html
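A minimal sketch of how that could look for the item creation in the question, assuming the items table has a unique index on sku (ON CONFLICT needs one to target); the insert becomes a silent no-op when another thread or process has already created the row, so no mutex is needed:
# ON CONFLICT (sku) DO NOTHING - safe to run concurrently
Item.dataset.insert_conflict(target: :sku).insert(:sku => sku, :name => itemName)
dbItem = Item.first(:sku => sku) # fetch whichever row won the race
dbPrice = Price.create(:price => foundPrice, :quantity => quantity)
dbItem.add_price(dbPrice)
store.add_price(dbPrice)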

Related

When to use transaction in laravel

I am currently making a turn-based strategy game with Laravel (MySQL DB with the InnoDB engine) and want to make sure that I don't have bugs due to race conditions, duplicate requests, bad actors etc...
Because these kinds of bugs are hard to test, I wanted to get some clarification.
Many actions in the game can only occur once per turn, like buying a new unit. Here is a simplified bit of code for purchasing a unit.
$player = Player::find($player_id);
if($player->gold >= $unit_price && $player->has_purchased == false){
    $player->has_purchased = true;
    $player->gold -= $unit_price;
    $player->save();

    $unit = new Unit();
    $unit->player_id = $player->id;
    $unit->save();
}
So my concern would be if two threads both made it past the if statement and then executed the block of code at the same time.
Is this a valid concern?
And would the solution be to wrap everything in a database transaction like https://betterprogramming.pub/using-database-transactions-in-laravel-8b62cd2f06a5 ?
This means that a good portion of my code will be wrapped around database transactions because I have a lot of instances that are variations of the above code for different actions.
Also there is a situation where multiple users will be able to update a value in the database so I want to avoid a situation where 2 users increment the value at the same time and it only gets incremented once.
Since you are using Laravel to presumably develop a web-based game, you can expect multiple concurrent connections to occur. A transaction is just one part of the equation. Transactions ensure operations are performed atomically, in your case it ensures that both the player and unit save are successful or both fail together, so you won't have the situation where the money is deducted but the unit is not granted.
However there is another facet to this, if there is a real possibility you have two separate requests for the same player coming in concurrently then you may also encounter a race condition. This is because a transaction is not a lock so two transactions can happen at the same time. The implication of this is (in your case) two checks happen on the same player instance to ensure enough gold is available, both succeed, and both deduct the same gold, however two distinct units are granted at the end (i.e. item duplication). To avoid this you'd use a lock to prevent other threads from obtaining the same player row/model, so your full code would be:
DB::transaction(function () use ($unit_price, $player_id) {
    $player = Player::where('id', $player_id)->lockForUpdate()->first();
    if($player->gold >= $unit_price && $player->has_purchased == false){
        $player->has_purchased = true;
        $player->gold -= $unit_price;
        $player->save();

        $unit = new Unit();
        $unit->player_id = $player->id;
        $unit->save();
    }
});
This will ensure any other threads trying to retrieve the same player will need to wait until the lock is released (which will happen at the end of the first request).
There's more nuances to deal with here as well like a player sending a duplicate request from double-clicking for example, and that can get a bit more complex.
For your purchase system, it's advisable to use DB::transaction since it protects you from partially saved records. Check out the Laravel docs for more information: https://laravel.com/docs/9.x/database#database-transactions As for reactive data you need to keep track of, simply bind a variable to that data in your front end, then use the variable to update your DB records.
Use a transaction when you need to bail out if any exception or error occurs: if an exception is thrown, the data will not be saved and all the changes in the transaction are rolled back. I recommend using transactions as much as you can. The basic format is:
DB::beginTransaction();

try {
    // database actions like create, update etc.

    DB::commit(); // finally commit to database
} catch (\Exception $e) {
    DB::rollback(); // roll back if any error occurs
    // something went wrong
}
See the laravel docs here

In these caching scenarios, where is the code executed?

I'm reading about caching strategies such as cache-aside, write-through, write-back, ... In the specific cases of write-through and write-back, it is implied that the cache itself is responsible for writing to the database and the event queue, respectively (For full context, here is the article - https://github.com/donnemartin/system-design-primer#when-to-update-the-cache)
For example, write-through is illustrated as
Application code:
set_user(12345, {"foo":"bar"})
Cache code:
def set_user(user_id, values):
    user = db.query("UPDATE Users WHERE id = {0}", user_id, values)
    cache.set(user_id, user)
For now, let's assume we're using Redis.
In the concrete example above, is the hypothetical set_user function invoked on the Redis client's machine, or on the Redis server?
Now, there seems to be ways to invoke custom logic on the Redis server, e.g., by writing Lua scripts, but I'm skeptical that that's done in practice in order to implement this caching strategy, partly because I've never heard of anyone doing it.
I've seen other articles showing this strategy is implemented solely on the Redis client's machine, but I'm not sure what resources to believe at this point.
Thanks for any help!
It's part of the application. In fact, it would be more appropriate to call the example "data store code", instead of "cache code". The set_user method belongs to a base UserStore class, with different implementations based on data store type, write policy etc. For "write-through", it would be:
class WriteThroughUserStore(UserStore):
    def __init__(self, cache_user_store, db_user_store):
        self.cache_user_store = cache_user_store
        self.db_user_store = db_user_store

    def get_user(self, user_id):
        return self.cache_user_store.get_user(user_id)

    def set_user(self, user):
        self.db_user_store.set_user(user)
        self.cache_user_store.set_user(user)
The key point of "write-through" is that the write operation is confirmed complete only after writing data to both cache and database synchronously. The order does not matter: you could update cache first, or update database first, or even do them in parallel.

How to memoize MySQL connection client cleverly in external module used from e.g. sinatra?

I think the question does not pinpoint the real problem; I have difficulties nailing it down precisely and concisely.
I have a gem that implements, e.g., MySQL database "queries" (also inserts, updates, ...):
module DBGEM::Query
  def self.client settings=DBGEM.settings
    @@client ||= Mysql2::Client.new settings
  end

  def query_this
    client.query(...)
  end

  def process_insert_that list_of_things
    list_of_things.each do |thing|
      # process
      client.query(...)
    end
  end
end
Furthermore, this gem is used by a Sinatra app sitting on a forking webserver like Puma.
Within the Sinatra app I can now
get '/path' do
  happy = DBGEM::Query.query_this
  # process happy
  great = DBGEM::Query.process_insert_that 1..20
  # go on
end
I like that API and this code should open only one database connection.
But as far as I understood, because the code within the 'get' definition is not guaranteed to be the only one accessing the DBGEM::Query stuff at that time, weird things could happen (through race-conditions, shared internal state?).
Is there a clever way to keep the nice syntax and the connection sharing without boilerplate object creation (query = DBGEM::Query.new() #...) or wrapping the stuff in a block (DBGEM::Query.process do |query| #...)?
The example above is obviously simplified. The Sinatra handling might be more involved, the queries might actually be done in a service object, etc. Also, as far as I understand, in a forking webserver environment the GC would destroy the client (closing the connection - that's how mysql2 is implemented).
I think that the connection will not be closed every time.
@@client is shared between the DBGEM::Query object itself (in Ruby modules and classes are also objects) and all the instances of that object (to be precise: all the instances of classes to which that object is mixed in).
So, this variable will live as long as the DBGEM::Query object will live.
You can check when the DBGEM::Query object will be garbage collected by defining a finalizer that logs a message and observing the server console.
module DBGEM::Query
  ObjectSpace.define_finalizer(self, proc { print 'garbage collected' })
  ..
end
I'm not sure, but I guess that the DBGEM::Query object will be garbage collected only when you stop the server.
As for "weird things could happen", I believe you mean potential conflicts, race conditions, situations where you create duplicate records or update the same record at nearly the same time, overwriting something, etc. When that happens you lose data integrity.
IMHO you can't prevent it by allowing only one client instance. I'd suggest aiming for solid database design (unique constraints, indexes, foreign keys, validations) which can raise errors when a race condition occurs, and then handling those errors in your application.
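Not part of the answer above, just an illustrative sketch: one way to keep the module-level API from the question while avoiding a single shared connection across threads is to memoize one Mysql2 client per thread (the Thread.current key name is arbitrary):
module DBGEM::Query
  # one client per thread, so concurrent requests never interleave
  # queries on the same connection
  def self.client settings=DBGEM.settings
    Thread.current[:dbgem_mysql_client] ||= Mysql2::Client.new settings
  end
end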

Mutex for ActiveRecord Model

My User model has a nasty method that should not be called simultaneously for two instances of the same record. I need to execute two http requests in a row and at the same time make sure that any other thread does not execute the same method for the same record at the same time.
class User
  ...
  def nasty_long_running_method
    # something nasty will happen if this method is called simultaneously
    # for two instances of the same record and the later one finishes http_request_1
    # before the first one finishes http_request_2.
    http_request_1 # Takes 1-3 seconds.
    http_request_2 # Takes 1-3 seconds.
    update_model
  end
end
For example this would break everything:
user = User.first
Thread.new { user.nasty_long_running_method }
Thread.new { user.nasty_long_running_method }
But this would be ok and it should be allowed:
user1 = User.find(1)
user2 = User.find(2)
Thread.new { user1.nasty_long_running_method }
Thread.new { user2.nasty_long_running_method }
What would be the best way to make sure the method is not called simultaneously for two instances of the same record?
I found a gem called RemoteLock when searching for a solution to my problem. It is a mutex solution that uses Redis as the backend.
It:
is accessible for all processes
does not lock the database
is in memory -> fast and no IO
The method looks like this now
def nasty
  $lock = RemoteLock.new(RemoteLock::Adapters::Redis.new(REDIS))
  $lock.synchronize("capi_lock_#{user_id}") do
    http_request_1
    http_request_2
    update_user
  end
end
I would start with adding a mutex or semaphore. Read about mutex: http://www.ruby-doc.org/core-2.1.2/Mutex.html
class User
  ...
  def nasty
    @semaphore ||= Mutex.new
    @semaphore.synchronize {
      # only one thread at a time can enter this block...
    }
  end
end
If your class is an ActiveRecord object you might want to use Rails' locking and database transactions. See: http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html
def nasty
  User.transaction do
    lock!
    ...
    save!
  end
end
Update: You updated your question with more details, and it seems like my solutions no longer really fit. The first solution does not work if you have multiple instances running. The second locks only the database row; it does not prevent multiple threads from entering the code block at the same time.
Therefore I would think about building a database-based semaphore.
class Semaphore < ActiveRecord::Base
  belongs_to :item, :polymorphic => true

  def self.get_lock(item, identifier = nil)
    # may raise an invalid key exception from unique key constraints in the db
    create(:item => item) rescue false
  end

  def release
    destroy
  end
end
The database should have a unique index covering the columns for the polymorphic association to item. That should protect multiple threads from getting a lock for the same item at the same time. Your method would look like this:
def nasty
  semaphore = nil
  until semaphore
    semaphore = Semaphore.get_lock(user)
  end
  ...
  semaphore.release
end
There are a couple of problems to solve around this: How long do you want to wait to get the semaphore? What happens if the external HTTP requests take ages? Do you need to store additional pieces of information (hostname, pid) to identify which thread locked an item? You will need some kind of cleanup task that removes locks that still exist after a certain period of time or after restarting the server.
Furthermore, I think it is a terrible idea to have something like this in a web server. At the very least you should move all that stuff into background jobs. That might solve your problem if your app is small and needs just one background job to get everything done.
You state that this is an ActiveRecord model, in which case the usual approach would be to use a database lock on that record. No need for additional locking mechanisms as far as I can see.
Take a look at the short (one page) Rails Guides section on pessimistic locking - http://guides.rubyonrails.org/active_record_querying.html#pessimistic-locking
Basically you can get a lock on a single record or a whole table (if you were updating a lot of things)
In your case something like this should do the trick...
class User < ActiveRecord::Base
  ...
  def nasty_long_running_method
    with_lock do
      # something nasty will happen if this method is called simultaneously
      # for two instances of the same record and the later one finishes http_request_1
      # before the first one finishes http_request_2.
      http_request_1 # Takes 1-3 seconds.
      http_request_2 # Takes 1-3 seconds.
      update_model
    end
  end
end
I recently created a gem called szymanskis_mutex. It is a module that you can include in the class User, and it provides the method mutual_exclusion(concern) to give you the functionality you want.
It doesn't rely on databases and doesn't depend on how many processes want to enter the critical section at any given moment.
Note that if the class is initialized on different servers it will not work.
It may suit your needs if your app is small enough. Your code would look like this:
class User
  include SzymanskisMutex
  ...
  def nasty_long_running_method
    mutual_exclusion(:nasty_long) do
      http_request_1 # Takes 1-3 seconds.
      http_request_2 # Takes 1-3 seconds.
    end
    update_model
  end
end
I suggest rethinking your architecture, as this is not going to be scalable - imagine having multiple Ruby processes, failing processes, timeouts, etc. Also, in-process locking and spawning threads is quite dangerous for application servers.
If you want to sleep well in production, try an asynchronous background-processing framework for long-running tasks with a serial queue, which will ensure the order in which tasks run. Something as simple as RabbitMQ will do, or check this Q&A: Best practice for Rails App to run a long task in the background? Alternatively, stay with the DB but use optimistic locking.
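For reference, optimistic locking in ActiveRecord only requires an integer lock_version column on the users table; a minimal sketch of handling the conflict could look like this (update_model is the method from the question, the recovery strategy is up to you):
def nasty_long_running_method
  http_request_1
  http_request_2
  update_model # saves the user; raises if lock_version changed in the meantime
rescue ActiveRecord::StaleObjectError
  # another process updated this user first - reload and decide whether
  # to retry, discard, or merge
  reload
end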

Blocking findAndModify in Ruby MongoDB Driver

I'm trying to achieve something like this in MongoDB:
require 'base64'
require 'mongo'

class MongoDBQueue
  def enq(thing)
    collection.insert({ payload: Base64.encode64(Marshal.dump(thing)) })
  end
  alias :<< :enq

  def deq
    until _r = collection.find_and_modify({ sort: { _id: Mongo::ASCENDING }, remove: true })
      Thread.pass
    end
    return Marshal.load(Base64.decode64(_r["payload"]))
  end
  alias :pop :deq

  private

  def collection
    # database, collection & mongodb index semantics here
  end
end
Naturally enough I want a disk-backed queue in Ruby that doesn't destroy my available memory. I'm using this with the Anemone web spider framework, which by default uses the Queue class. There's a fork which can use the SizedQueue class; however, when using a SizedQueue for both the "page queue" and the "links queue", it often deadlocks, presumably because it's trying to dequeue a page and process it, it has found new links, and that situation cannot be reconciled.
There's also an existing implementation of a Redis queue, however that also exhausts all my available memory on this machine (available memory is 16 GB, so it's not trivial).
Because of that I want to use this MongoDB backend, but I think the implementation is insane. The Thread.pass feels like a horrible solution, but Anemone is multi-threaded, and MongoDB doesn't support blocking reads, so it's a tricky situation.
Here's my references:
Redis queue implementation for anemone: https://github.com/chriskite/anemone/blob/queueadapter/lib/anemone/queue/redis.rb
MongoDB findAndModify: http://www.mongodb.org/display/DOCS/findAndModify+Command
Questions:
Can anyone comment about how sane this is, compared to sleep (which should trigger the VM to pass control to the next thread, anyway, but sleep feels dirtier)
Should I perhaps Thread.pass and sleep? ( I guess not, see above)
Can I make that read from MongoDB block? There was talk of that here, but never came to anything: https://groups.google.com/forum/?fromgroups=#!topic/mongodb-user/rqnHNFXaZ0w
1) Reads in MongoDB are blocking. If you do a findOne() or a findAndModify(), the call will not return until the data is present on the client side. If you do a find(), the call will not return until you get a cursor: you can then iterate on the cursor as much as you need.
2) By default, writes to MongoDB are "fire and forget". If you care about data integrity, you need to do safe writes by setting :safe => true on your connection, database, or collection object.
Kernel.sleep is actually a better solution, as otherwise you'll spin there (albeit passing control to other threads after each query).
As findAndModify is atomic, only one thread (even on JRuby) will take the job, so I don't quite understand what the "blocking" issue is here.
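For illustration, the deq loop from the question with a short sleep instead of Thread.pass could look like this (the interval is arbitrary):
def deq
  until _r = collection.find_and_modify({ sort: { _id: Mongo::ASCENDING }, remove: true })
    sleep 0.1 # back off briefly so an empty queue doesn't busy-spin a core
  end
  Marshal.load(Base64.decode64(_r["payload"]))
end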
