I'm receiving this error:
RuntimeError (Circular dependency detected while autoloading constant Apps
when I'm multithreading. Here is my code below. Why is this happening?
The reason I am trying to multithread is because I am writing a HTML scraping app.
The call to Nokogiri::HTML(open()) is a synchronous blocking call that takes 1 second to return, and I have 100,000+ pages to visit, so I am trying to run several threads to overcome this issue. Is there a better way of doing this?
class ToolsController < ApplicationController
def getWebsites
t1=Thread.new{func1()}
t2=Thread.new{func1()}
t3=Thread.new{func1()}
t4=Thread.new{func1()}
t5=Thread.new{func1()}
t6=Thread.new{func1()}
t1.join
t2.join
t3.join
t4.join
t5.join
t6.join
end
def func1
puts Thread.current
apps = Apps.order("RANDOM()").where("apps.website IS NULL").take(1)
while apps.size == 1 do
app = apps[0]
puts app.name
puts app.iTunes
doc = Nokogiri::HTML(open(app.iTunes))
array = doc.css('div.app-links a').map { |link|
url = link['href']
url = Domainatrix.parse(url)
url.domain + "." + url.public_suffix
}
array.uniq!
if (array.size > 0)
app.website = array.join(', ')
puts app.website
else
app.website = "NONE"
end
app.save
apps = Apps.order("RANDOM()").where("apps.website IS NULL").take(1)
end
end
end
"require" isn't thread-safe
Change your methods so that everything that is to be "required" is done so before the threads start.
For example:
def get_websites
# values = Apps.all # try uncommenting this line if a second-try is required
ar = Apps.where("apps.website IS NULL")
t1 = Thread.new{ func1(ar) }
t2 = Thread.new{ func1(ar) }
t1.join
t2.join
end
def func1( ar )
apps = ar.order("RANDOM()").limit(1)
while (apps.size == 1)
puts Thread.current
end
end
But as somebody pointed out, the way you're multithreading within the controller isn't advised.
Related
I'm using httparty for making requests and currently have the following code:
def scr(users)
users.times do |id|
test_url = "siteurl/#{id}"
Thread.new do
response = HTTParty.get(test_url)
open('users.json', 'a') do |f|
f.puts "#{response.to_json}, "
end
p "added"
end
end
sleep
end
It works OK for 100-300 records.
I tried adding Thread.exit after sleep, but if I set users to something like 200000, after a while my terminal throws an error. I don't remember what it was but it's something about threads... resource is busy but some records. (About 10000 were added successfully.)
It looks like I'm doing it wrong and need to somehow break requests to batches.
up
here's what I got:
def scr(users)
threads = []
urls = []
users.times do |id|
test_url = "site_url/#{id}"
urls<<test_url
end
urls.each_slice(8) do |batch|
batch.each do |t|
threads << Thread.new do
response = HTTParty.get(t)
response.to_json
end
end
end
all_values = threads.map {|t| t.value}.join(', ')
open('users.json', 'a') do |f|
f.puts all_values
end
On quick inspection, the problem would seem to be that you have a race condition with regards to your JSON file. Even if you don't get an error, you'll definitely get corrupted data.
The simplest solution is probably just to do all the writing at the end:
def scr(users)
threads = []
users.times do |id|
test_url = "siteurl/#{id}"
threads << Thread.new do
response = HTTParty.get(test_url)
response.to_json
end
end
all_values = threads.map {|t| t.value}.join(', ')
open('users.json', 'a') do |f|
f.puts all_values
end
end
Wasn't able to test that, but it should do the trick. It's also better in general to be using Thread#join or Thread#value instead of sleep.
I'm picking up ruby language and get stuck at playing with the chatterbot i have developed. Similar issue has been asked here Click here , I did what they suggested to change the rescue in order to see the full message.But it doesn't seem right, I was running basic_client.rb at rubybot directory and fred.bot is also generated at that directory . Please see the error message below: Your help very be very much appreciated.
Snailwalkers-MacBook-Pro:~ snailwalker$ cd rubybot
Snailwalkers-MacBook-Pro:rubybot snailwalker$ ruby basic_client.rb
/Users/snailwalker/rubybot/bot.rb:12:in `rescue in initialize': Can't load bot data because: No such file or directory - bot_data (RuntimeError)
from /Users/snailwalker/rubybot/bot.rb:9:in `initialize'
from basic_client.rb:3:in `new'
from basic_client.rb:3:in `<main>'
basic_client.rb
require_relative 'bot.rb'
bot = Bot.new(:name => 'Fred', :data_file => 'fred.bot')
puts bot.greeting
while input = gets and input.chomp != 'end'
puts '>> ' + bot.response_to(input)
end
puts bot.farewell
bot.rb:
require 'yaml'
require './wordplay'
class Bot
attr_reader :name
def initialize(options)
#name = options[:name] || "Unnamed Bot"
begin
#data = YAML.load(File.read('bot_data'))
rescue => e
raise "Can't load bot data because: #{e}"
end
end
def greeting
random_response :greeting
end
def farewell
random_response :farewell
end
def response_to(input)
prepared_input = preprocess(input).downcase
sentence = best_sentence(prepared_input)
reversed_sentence = WordPlay.switch_pronouns(sentence)
responses = possible_responses(sentence)
responses[rand(responses.length)]
end
private
def possible_responses(sentence)
responses = []
#data[:responses].keys.each do |pattern|
next unless pattern.is_a?(String)
if sentence.match('\b' + pattern.gsub(/\*/, '') + '\b')
if pattern.include?('*')
responses << #data[:responses][pattern].collect do |phrase|
matching_section = sentence.sub(/^.*#{pattern}\s+/, '')
phrase.sub('*', WordPlay.switch_pronouns(matching_section))
end
else
responses << #data[:responses][pattern]
end
end
end
responses << #data[:responses][:default] if responses.empty?
responses.flatten
end
def preprocess(input)
perform_substitutions input
end
def perform_substitutions(input)
#data[:presubs].each {|s| input.gsub!(s[0], s[1])}
input
end
# select best_sentence by looking at longest sentence
def best_sentence(input)
hot_words = #data[:responses].keys.select do |k|
k.class == String && k =~ /^\w+$/
end
WordPlay.best_sentence(input.sentences, hot_words)
end
def random_response(key)
random_index = rand(#data[:responses][key].length)
#data[:responses][key][random_index].gsub(/\[name\]/, #name)
end
end
I'm assuming that you are trying to load the :data_file passed into Bot.new, but right now you are statically loading a bot_data file everytime. You never mentioned about bot_data in the question. So if I'm right it should be like this :
#data = YAML.load(File.read(options[:data_file]))
Instead of :
#data = YAML.load(File.read('bot_data'))
The code below is a simplification of a much bigger and complex code, what happens is when I invoke the stop function and do a queue clear I was expecting the lock on the get_new thread to be free and ending the whole thread, instead what happens its a dead lock on the thread.join statement.
If I do a pop instead of clear the desired behavior happens. Can you help me understand why?
class Controller
require 'thread'
require 'monitor'
require 'net/http'
attr_accessor :thread_count, :event_queue, :is_running, :producer_thread, :events
def initialize
#thread_count = 5
#event_queue = SizedQueue.new(#thread_count)
#events = [27242233, 27242232,27242231]
end
def start
#is_running = true
#producer_thread = Thread.new{get_new()}
end
def get_new
while #is_running do
#events.each do |e|
p e.to_s
#event_queue << e
end
sleep 1
end
p "thread endend"
end
def stop
p "Stoping!"
#event_queue.clear
p "Queue size: " + #event_queue.length.to_s
sleep 2
#is_running = false
sleep 2
producer_thread.join
puts "DONE!"
end
end
service = Controller.new
service.start
sleep 5
service.stop
This was a bug in ruby. It is fixed in ruby 1.9.3p545.
Early versions of ruby 2.1 and 2.0 were affected too. For those you want 2.1.2 or 2.0.0p481 respectively
Starting using rspec I have difficulties trying to test threaded code.
Here is a simplicfication of a code founded, and I made it cause i need a Queue with Timeout capabilities
require "thread"
class TimeoutQueue
def initialize
#lock = Mutex.new
#items = []
#new_item = ConditionVariable.new
end
def push(obj)
#lock.synchronize do
#items.push(obj)
#new_item.signal
end
end
def pop(timeout = :never)
timeout += Time.now unless timeout == :never
#lock.synchronize do
loop do
time_left = timeout == :never ? nil : timeout - Time.now
if #items.empty? and time_left.to_f >= 0
#new_item.wait(#lock, time_left)
end
return #items.shift unless #items.empty?
next if timeout == :never or timeout > Time.now
return nil
end
end
end
alias_method :<<, :push
end
But I can't find a way to test it using rspec. Is there any effective documentation on testing threaded code? Any gem that can helps me?
I'm a bit blocked, thanks in advance
When unit-testing we don't want any non-deterministic behavior to affect our tests, so when testing threading we should not run anything in parallel.
Instead, we should isolate our code, and simulate the cases we want to test, by stubbing #lock, #new_item, and perhaps even Time.now (to be more readable I've taken the liberty to imagine you also have attr_reader :lock, :new_item):
it 'should signal after push' do
allow(subject.lock).to receive(:synchronize).and_yield
expect(subject.new_item).to receive(:signal)
subject.push('object')
expect(subject.items).to include('object')
end
it 'should time out if taken to long to enter synchronize loop' do
#now = Time.now
allow(Time).to receive(:now).and_return(#now, #now + 10.seconds)
allow(subject.items).to receive(:empty?).and_return true
allow(subject.lock).to receive(:synchronize).and_yield
expect(subject.new_item).to_not receive(:wait)
expect(subject.pop(5.seconds)).to be_nil
end
etc...
I'm reading a Redis set within an EventMachine reactor loop using a suitable Redis EM gem ('em-hiredis' in my case) and have to check if some Redis sets contain members in a cascade. My aim is to get the name of the set which is not empty:
require 'eventmachine'
require 'em-hiredis'
def fetch_queue
#redis.scard('todo').callback do |scard_todo|
if scard_todo.zero?
#redis.scard('failed_1').callback do |scard_failed_1|
if scard_failed_1.zero?
#redis.scard('failed_2').callback do |scard_failed_2|
if scard_failed_2.zero?
#redis.scard('failed_3').callback do |scard_failed_3|
if scard_failed_3.zero?
EM.stop
else
queue = 'failed_3'
end
end
else
queue = 'failed_2'
end
end
else
queue = 'failed_1'
end
end
else
queue = 'todo'
end
end
end
EM.run do
#redis = EM::Hiredis.connect "redis://#{HOST}:#{PORT}"
# How to get the value of fetch_queue?
foo = fetch_queue
puts foo
end
My question is: how can I tell EM to return the value of 'queue' in 'fetch_queue' to use it in the reactor loop? a simple "return queue = 'todo'", "return queue = 'failed_1'" etc. in fetch_queue results in "unexpected return (LocalJumpError)" error message.
Please for the love of debugging use some more methods, you wouldn't factor other code like this, would you?
Anyway, this is essentially what you probably want to do, so you can both factor and test your code:
require 'eventmachine'
require 'em-hiredis'
# This is a simple class that represents an extremely simple, linear state
# machine. It just walks the "from" parameter one by one, until it finds a
# non-empty set by that name. When a non-empty set is found, the given callback
# is called with the name of the set.
class Finder
def initialize(redis, from, &callback)
#redis = redis
#from = from.dup
#callback = callback
end
def do_next
# If the from list is empty, we terminate, as we have no more steps
unless #current = #from.shift
EM.stop # or callback.call :error, whatever
end
#redis.scard(#current).callback do |scard|
if scard.zero?
do_next
else
#callback.call #current
end
end
end
alias go do_next
end
EM.run do
#redis = EM::Hiredis.connect "redis://#{HOST}:#{PORT}"
finder = Finder.new(redis, %w[todo failed_1 failed_2 failed_3]) do |name|
puts "Found non-empty set: #{name}"
end
finder.go
end