Ruby Thread & Mutex : why does my code failed to fetch JSON in sequence? - ruby

I wrote a crawler which uses 8 threads to download JSON from the Internet:
#encoding: utf-8
require 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'
$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads
$cnt = 0 # number of lines in this COMMIT to database
db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000
def fetch(http, url, timeout = 10)
# ...
end
def parsePrice( i, db)
ss = fetch(Net::HTTP.start('p.3.cn',80), 'http://p.3.cn/prices/get?skuid=J_'+i.to_s)
doc = JSON.parse(ss)[0]
puts "processing "+i.to_s
STDOUT.flush
begin
$mutex.synchronize {
$cnt = $cnt+1
db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
if $cnt > 20
db.execute('COMMIT')
db.execute('BEGIN')
$cnt = 0
end
}
rescue SQLite3::ConstraintException
warn("duplicate id: "+i.to_s)
$cntMutex.synchronize {
$threadCnt -= 1;
}
Thread.terminate
rescue NoMethodError
warn("Matching failed")
rescue
raise
ensure
end
$cntMutex.synchronize {
$threadCnt -= 1;
}
end
puts "will now start from " + start.to_s()
db.execute("BEGIN")
Thread.new {
for ii in start..12000000 do
sleep 0.1 while $threadCnt > 7
$cntMutex.synchronize {
$threadCnt += 1;
}
Thread.new {
parsePrice( ii, db)
}
end
db.execute('COMMIT')
} . join
Then I created a database named price.db:
sqlite3 > create table prices (id INT PRIMATY KEY, price REAL);
To make my code thread-safe, db, $cnt, $threadCnt are all protected by $mutex or $cntMutex.
However, when I tried to run this script, the following messages were printed:
[lz#lz crawl]$ ruby priceCrawler.rb
will now start from 10000000
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000008
http://p.3.cn/prices/get?skuid=J_10000002http://p.3.cn/prices/get?skuid=J_10000002
processing 10000002
processing 10000002processing 10000008processing 10000008processing 10000002
duplicate id: 10000002
duplicate id: 10000002processing 10000008
processing 10000008duplicate id: 10000008
duplicate id: 10000008processing 10000008
duplicate id: 10000008
It seems that this script skipped some id and called parsePrice with the same id more than once.
So why did this error occur? Any help would be appreciated.

It seems to me that your thread scheduling is wrong. I have modified your code to illustrates some possible race conditions you were triggering.
re 'net/http'
require 'sqlite3'
require 'zlib'
require 'json'
require 'thread'
$mutex = Mutex.new # Lock of database and $cnt
$cntMutex = Mutex.new # Lock of $threadCnt
$threadCnt = 0 # number of running threads
$cnt = 0 # number of lines in this COMMIT to database
db = SQLite3::Database.new "price.db"
db.results_as_hash = true
STDOUT.sync = true
start = 10000000
def fetch(http, url, timeout = 10)
# ...
end
def parsePrice(i, db)
must_terminate = false
ss = fetch(Net::HTTP.start('p.3.cn',80), "http://p.3.cn/prices/get?skuid=J_#{i}")
doc = JSON.parse(ss)[0]
puts "processing #{i}"
STDOUT.flush
begin
$mutex.synchronize {
$cnt = $cnt+1
db.execute("insert into prices (id, price) VALUES (?,?)", [i,doc["p"].to_f])
if $cnt > 20
db.execute('COMMIT')
db.execute('BEGIN')
$cnt = 0
end
}
rescue SQLite3::ConstraintException
warn("duplicate id: #{i}")
must_terminate = true
rescue NoMethodError
warn("Matching failed")
rescue
# Raising here does not prevent ensure from running.
# It will raise after we decrement $threadCnt on
# ensure clause.
raise
ensure
$cntMutex.synchronize {
$threadCnt -= 1;
}
end
Thread.terminate if must_terminate
end
puts "will now start from #{start}"
# This begin makes no sense for me.
db.execute("BEGIN")
for ii in start..12000000 do
should_redo = false
# Instead of sleeping, we acquire the lock and check
# if we can create another thread. If we can't, we just
# release the lock and retry latter (using for-redo).
$cntMutex.synchronize{
if $threadCnt <= 7
$threadCnt += 1;
Thread.new { parsePrice(ii, db) }
else
# We use this flag since we don't know for sure redo's
# behavior inside a lock.
should_redo = true
end
}
# Will redo this iteration if we can't create the thread.
if should_redo
# Mitigate busy waiting a bit.
sleep(0.1)
redo
end
end
# This commit makes no sense to me.
db.execute('COMMIT')
Thread.list.each { |t| t.join }
Also, most databases already implement locks themselves. You can probably remove the mutex that locks the database. And another advice is that you be more consistent with your commits. You have a lot of scattered begins and commits in the code. I suggest that you either make the operation and then commit or use a commit buffer and then commit everything in a single place.
The race condition, it seems you were not being careful enough when dealing with $threadCnt. The implementation I gave you makes more sense to me, but I have not tested it.
The redo in the main loop is a form of busy waiting, which is bad for performance. You can and you should put a sleep clause there. But it is essential that you maintain the $threadCnt checking and updating inside the lock. The way you implemented it before did not ensure the check and updating was an atomic operation.

Related

Redis semaphore locks can't be released

I am using the redis-semaphore gem, version 0.3.1.
For some reason, I occasionally can't release a stale Redis lock. From my analysis it seems to happen if my Docker process crashed after the lock was created.
I have described my debugging process below and would like to know if anyone can suggest how to further debug.
Assume that we want to create a redis lock with this name:
name = "test"
We insert this variable in two different terminal windows. In the first, we run:
def lock_for_15_secs(name)
job = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
if job.lock(-1) == "0"
puts "Locked and starting"
sleep(15)
puts "Now it's stale, try to release in another process"
sleep(15)
puts "Now trying to unlock"
unlock = job.unlock
puts unlock == false ? "Wuhuu, already unlocked" : "Hm, should have been unlocked by another process, but wasn't"
end
end
lock_for_15_secs(name)
In the second we run:
def release_and_lock(name)
job = Redis::Semaphore.new(name.to_sym, redis: NonBlockingRedis.new(), custom_blpop: true, :stale_client_timeout => 15)
release = job.release_stale_locks!
count = job.available_count
puts "Release reponse is #{release.inspect} and available count is #{count}"
if job.lock(-1) == "0"
puts "Wuhuu, we can lock it"
job.unlock
else
puts "Hmm, we can't lock it"
end
end
release_and_lock(name)
This usually plays out as expected. For 15 seconds, the second terminal can't relase the lock, but when run after 15 seconds, it releases. Below is the output from release_and_lock(name).
Before 15 seconds have passed:
irb(main):1:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 0
Hmm, we can't lock it
=> nil
After 15 seconds have passed:
irb(main):2:0> release_and_lock(name)
Release reponse is {"0"=>"1580292557.321834"} and available count is 1
Wuhuu, we can lock it
=> 1
irb(main):3:0> release_and_lock(name)
Release reponse is {} and available count is 1
Wuhuu, we can lock it
But whenever I see that a stale lock isn't released, and I run release_and_lock(name) to diagnose, this is returned:
irb(main):4:0> release_and_lock(name)
Release reponse is {} and available count is 0
Hmm, we can't lock it
And at this point my only option is to flush redis:
require 'non_blocking_redis'
non_blocking_redis = NonBlockingRedis.new()
non_blocking_redis.flushall
P.s. My NonBlockingRedis inherits from Redis:
class Redis
class Semaphore
def initialize(name, opts = {})
#custom_opts = opts
#name = name
#resource_count = opts.delete(:resources) || 1
#stale_client_timeout = opts.delete(:stale_client_timeout)
#redis = opts.delete(:redis) || Redis.new(opts)
#use_local_time = opts.delete(:use_local_time)
#custom_blpop = opts.delete(:custom_blpop) # false=queue, true=cancel
#tokens = []
end
def lock(timeout = 0)
exists_or_create!
release_stale_locks! if check_staleness?
token_pair = #redis.blpop(available_key, timeout, #custom_blpop)
return false if token_pair.nil?
current_token = token_pair[1]
#tokens.push(current_token)
#redis.hset(grabbed_key, current_token, current_time.to_f)
if block_given?
begin
yield current_token
ensure
signal(current_token)
end
end
current_token
end
alias_method :wait, :lock
end
end
class NonBlockingRedis < Redis
def initialize(options = {})
if options.empty?
options = {
url: Rails.application.secrets.redis_url,
db: Rails.application.secrets.redis_sidekiq_db,
driver: :hiredis,
network_timeout: 5
}
end
super(options)
end
def blpop(key, timeout, custom_blpop)
if custom_blpop
if timeout == -1
result = lpop(key)
return result if result.nil?
return [key, result]
else
super(key, timeout)
end
else
super
end
end
def lock(timeout = 0)
exists_or_create!
release_stale_locks! if check_staleness?
token_pair = #redis.blpop(available_key, timeout, #custom_blpop)
return false if token_pair.nil?
current_token = token_pair[1]
#tokens.push(current_token)
#redis.hset(grabbed_key, current_token, current_time.to_f)
if block_given?
begin
yield current_token
ensure
signal(current_token)
end
end
current_token
end
alias_method :wait, :lock
end
require 'non_blocking_redis'
😜 An awesome bug 👏
The bug
I think it happens if you kill the process when it does lpop on the SEMAPHORE:test:AVAILABLE
Most probably here https://github.com/dv/redis-semaphore/blob/v0.3.1/lib/redis/semaphore.rb#L67
To replicate it
NonBlockingRedis.new.flushall
release_and_lock('test');
NonBlockingRedis.new.lpop('SEMAPHORE:test:AVAILABLE')
Now initially you have:
SEMAPHORE:test:AVAILABLE 0
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
After the above code you get:
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
The code checks the SEMAPHORE:test:EXISTS and then expects to have SEMAPHORE:test:AVAILABLE / SEMAPHORE:test:GRABBED
Solution
From my brief check I don't think it is possible to make the gem work without a modification. I tried adding an expiration: but somehow it managed to disable the expiration for SEMAPHORE:test:EXISTS
NonBlockingRedis.new.ttl('SEMAPHORE:test:EXISTS') # => -1 and it should have been e.g. 20 seconds and going down
So.. maybe a fix will be
class Redis
class Semaphore
def exists_or_create!
token = #redis.getset(exists_key, EXISTS_TOKEN)
if token.nil? || all_tokens.empty?
create!
else
# Previous versions of redis-semaphore did not set `version_key`.
# Make sure it's set now, so we can use it in future versions.
if token == API_VERSION && #redis.get(version_key).nil?
#redis.set(version_key, API_VERSION)
end
true
end
end
end
end
the all_tokens is https://github.com/dv/redis-semaphore/blob/v0.3.1/lib/redis/semaphore.rb#L120
I'll open a PR to the gem shortly -> https://github.com/dv/redis-semaphore/pull/66 maybe 🤷‍♂️
Note 1
Not sure how you use the NonBlockingRedis but it is not in use in Redis::Semaphore. You do lock(-1) which does in the code lpop. Also the code never calls your lock.
Random
Here is a helper to dump the keys
class Test
def self.all
r = NonBlockingRedis.new
puts r.keys('*').map { |k|
[
k,
((r.hgetall(k) rescue r.get(k)) rescue r.lrange(k, 0, -1).join(' | '))
].join("\t\t")
}
end
end
> Test.all
SEMAPHORE:test:AVAILABLE 0
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
For completeness here is how it looks when you have grabbed the lock
SEMAPHORE:test:VERSION 1
SEMAPHORE:test:EXISTS 1
SEMAPHORE:test:GRABBED {"0"=>"1583672948.7168388"}

repeatedly read Ruby IO until X bytes have been read, Y seconds have elapsed, or EOF, whichever comes first

I want to forward logs from an IO pipe to an API. Ideally, there would be no more than e.g. 10 seconds of latency (so humans watching the log don't get impatient).
A naive way to accomplish this would be to use IO.each_byte and send each byte to the API as soon as it becomes available, but the overhead of processing a request per byte causes additional latency.
IO#each(limit) also gets close to what I want, but if the limit is 50 kB and after 10 seconds, only 20 kB has been read, I want to go ahead and send that 20 kB without waiting for more. How can I apply both a time and size limit simultaneously?
A naïve approach would be to use IO#each_byte enumerator.
The contrived, not tested example:
enum = io.each_byte
now = Time.now
res = while Time.now - now < 20 do
begin
send_byte enum.next
rescue e => StopIteration
# no more data
break :closed
end
end
puts "NO MORE DATA" if res == :closed
Here's what I ended up with. Simpler solutions still appreciated!
def read_chunks(io, byte_interval: 200 * 1024, time_interval: 5)
buffer = last = nil
reset = lambda do
buffer = ''
last = Time.now
end
reset.call
mutex = Mutex.new
cv = ConditionVariable.new
[
lambda do
IO.select [io]
mutex.synchronize do
begin
chunk = io.readpartial byte_interval
buffer.concat chunk
rescue EOFError
raise StopIteration
ensure
cv.signal
end
end
end,
lambda do
mutex.synchronize do
until io.eof? || Time.now > (last + time_interval) || buffer.length > byte_interval
cv.wait mutex, time_interval
end
unless buffer.empty?
buffer_io = StringIO.new buffer
yield buffer_io.read byte_interval until buffer_io.eof?
reset.call
end
raise StopIteration if io.eof?
end
end,
].map do |function|
Thread.new { loop { function.call } }
end.each(&:join)
end

Two version of the same code not giving the same result

I am trying to implement a simple timeout class that handles timeouts of different requests.
Here is the first version:
class MyTimer
def handleTimeout mHash, k
while mHash[k] > 0 do
mHash[k] -=1
sleep 1
puts "#{k} : #{mHash[k]}"
end
end
end
MAX = 3
timeout = Hash.new
timeout[1] = 41
timeout[2] = 5
timeout[3] = 14
t1 = MyTimer.new
t2 = MyTimer.new
t3 = MyTimer.new
first = Thread.new do
t1.handleTimeout(timeout,1)
end
second = Thread.new do
t2.handleTimeout(timeout,2)
end
third = Thread.new do
t3.handleTimeout(timeout,3)
end
first.join
second.join
third.join
This seems to work fine. All the timeouts work independently of each other.
Screenshot attached
The second version of the code however produces different results:
class MyTimer
def handleTimeout mHash, k
while mHash[k] > 0 do
mHash[k] -=1
sleep 1
puts "#{k} : #{mHash[k]}"
end
end
end
MAX = 3
timeout = Hash.new
timers = Array.new(MAX+1)
threads = Array.new(MAX+1)
for i in 0..MAX do
timeout[i] = rand(40)
# To see timeout value
puts "#{i} : #{timeout[i]}"
end
sleep 1
for i in 0..MAX do
timers[i] = MyTimer.new
threads[i] = Thread.new do
timers[i].handleTimeout( timeout, i)
end
end
for i in 0..MAX do
threads[i].join
end
Screenshot attached
Why is this happening?
How can I implement this functionality using arrays?
Is there a better way to implement the same functionality?
In the loop in which you are creating threads by using Thread.new, the variable i is shared between main thread (where threads are getting created) and in the threads created. So, the value of i seen by handleTimeout is not consistent and you get different results.
You can validate this by adding a debug statement in your method:
#...
def handleTimeout mHash, k
puts "Handle timeout called for #{mHash} and #{k}"
#...
end
#...
To fix the issue, you need to use code like below. Here parameters are passed to Thread.new and subsequently accessed using block variables.
for i in 0..MAX do
timers[i] = MyTimer.new
threads[i] = Thread.new(timeout, i) do |a, b|
timers[i].handleTimeout(a, b)
end
end
More on this issue is described in When do you need to pass arguments to Thread.new? and this article.

Ruby understanding multithreading

I'm trying to multithread loop in ruby following this exmaple: http://t-a-w.blogspot.com/2010/05/very-simple-parallelization-with-ruby.html.
I copied that coded and wrote this:
module Enumerable
def ignore_exception
begin
yield
rescue Exception => e
STDERR.puts e.message
end
end
def in_parallel(n)
t_queue = Queue.new
threads = (1..n).map {
Thread.new{
while x = t_queue.deq
ignore_exception{ yield(x[0]) }
end
}
}
each{|x| t_queue << [x]}
n.times{ t_queue << nil }
threads.each{|t|
t.join
unless t[:out].nil?
puts t[:out]
end
}
end
end
ids.in_parallel(10){ |id|
conn = open_conn(loc)
out = conn.getData(id)
Thread.current[:out] = out
}
The way I understand it is that it will dequeue 10 items at a time, process the block in the loop per id and join the 10 threads at the end, and repeat until finished. After running this code I get different results sometimes, especially if the size of my ids is less then 10, and I am confused why this is occuring. Half the time it will not output anything for upto half the ids, even though I can check on server side that output for those ids exists. For example if the correct output is "Got id 1" and "Got id 2", it will only print out {"Got id 1"} or {"Got id 2"} or {"Got id 1", "Got id 2"}. My question is that is that is my understanding of this code correct?
The issue in my code was the open_conn() function call, which was not thread safe. I fixed the issue by synchronizing around getting the connection handle:
connLock = Mutex.new
ids.in_parallel(10){ |id|
conn = nil
connLock.synchronize {
conn = open_conn(loc)
}
out = conn.getData(id)
Thread.current[:out] = out
}
Also should use http://peach.rubyforge.org/ for the loop parallelization by using:
ids.peach(10){ |id| ... }

Ruby Pause thread

In ruby, is it possible to cause a thread to pause from a different concurrently running thread.
Below is the code that I've written so far. I want the user to be able to type 'pause thread' and the sample500 thread to pause.
#!/usr/bin/env ruby
# Creates a new thread executes the block every intervalSec for durationSec.
def DoEvery(thread, intervalSec, durationSec)
thread = Thread.new do
start = Time.now
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = Thread
beginTime = Time.now
DoEvery(sample500, 0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")}}
when "pause thread\n"
sample500.stop
when "resume thread"
sample500.run
when "exit\n"
exit = TRUE
end
end
Passing Thread object as argument to DoEvery function makes no sense because you immediately overwrite it with Thread.new, check out this modified version:
def DoEvery(intervalSec, durationSec)
thread = Thread.new do
start = Time.now
Thread.current["stop"] = false
timeTakenToComplete = 0
loopCounter = 0
while(timeTakenToComplete < durationSec && loopCounter += 1)
if Thread.current["stop"]
Thread.current["stop"] = false
puts "paused"
Thread.stop
end
yield
finish = Time.now
timeTakenToComplete = finish - start
sleep(intervalSec*loopCounter - timeTakenToComplete)
end
end
thread
end
# User input loop.
exit = nil
while(!exit)
userInput = gets
case userInput
when "start thread\n"
sample500 = DoEvery(0.5, 30) {File.open('abc', 'a') {|file| file.write("a\n")} }
when "pause thread\n"
sample500["stop"] = true
when "resume thread\n"
sample500.run
when "exit\n"
exit = TRUE
end
end
Here DoEvery returns new thread object. Also note that Thread.stop called inside running thread, you can't directly stop one thread from another because it is not safe.
You may be able to better able to accomplish what you are attempting using Ruby Fiber object, and likely achieve better efficiency on the running system.
Fibers are primitives for implementing light weight cooperative
concurrency in Ruby. Basically they are a means of creating code
blocks that can be paused and resumed, much like threads. The main
difference is that they are never preempted and that the scheduling
must be done by the programmer and not the VM.
Keeping in mind the current implementation of MRI Ruby does not offer any concurrent running threads and the best you are able to accomplish is a green threaded program, the following is a nice example:
require "fiber"
f1 = Fiber.new { |f2| f2.resume Fiber.current; while true; puts "A"; f2.transfer; end }
f2 = Fiber.new { |f1| f1.transfer; while true; puts "B"; f1.transfer; end }
f1.resume f2 # =>
# A
# B
# A
# B
# .
# .
# .

Resources