rTurk fails to get results from Amazon Mechanical Turk - ruby

I am experiencing a very strange problem. I can submit a HIT to Amazon Mechanical Turk correctly, and I have a cron job that keeps checking whether any job is ready to be reviewed. However, when a job is completed I never receive its results, yet the HIT has been disposed (which only happens inside the review function). Strangely, this does not happen every time, but it happens quite often.
This is the code of the function that reviews the HITs:
def self.review_hits
  hits = RTurk::Hit.all_reviewable
  p "HITS"
  p hits
  puts "REVIEWABLE HITS: " + hits.count.to_s
  hits_results = {}
  unless hits.empty?
    hits.each do |hit|
      puts "IN EACH HIT"
      p hit
      results = []
      hit.expire!
      # Get results for each HIT assignment
      hit.assignments.each do |assignment|
        # Check if the assignment has been submitted. The maximum waiting time
        # for the job may have expired while some assignments were never submitted
        if assignment.status == 'Submitted'
          p "STATUS 1"
          p assignment.status
          temp = {}
          temp[:worker_id] = assignment.worker_id
          temp[:answer] = assignment.answers
          p "STATUS 2"
          p assignment.status
          assignment.approve!
          results << temp
        end
      end
      begin
        hit.dispose!
      rescue StandardError
        # errors while disposing are deliberately ignored
      end
      hits_results[hit.id] = {}
      hits_results[hit.id][:results] = results
    end
    # Let Rails know that there are new results
    AmazonTurkHit.store_results(hits_results)
  end
end
So the puts "REVIEWABLE HITS:" line prints 0, but the HIT still gets disposed. Does anyone know why?

After going crazy over this for a while, I realised the problem was very silly: I had two instances of the system running, one in production and one in staging, both using the same AWS account, so sometimes the HIT was picked up by the other system. :)
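The cleanest fix is to give each environment its own AWS account (or point staging at the Mechanical Turk sandbox). If the account really has to be shared, one option is for each instance to review only the HITs whose ids it recorded when it created them. A minimal sketch, assuming those ids are available (for example from the app's AmazonTurkHit records; how they are stored is an assumption); Hit and own_hits are hypothetical names, not part of the rTurk API.

```ruby
require "set"

# Hypothetical stand-in for RTurk::Hit; only the id matters for this filter.
Hit = Struct.new(:id)

# Keep only the HITs this environment created, so a staging and a production
# instance sharing one AWS account don't review (and dispose) each other's HITs.
def own_hits(reviewable, own_ids)
  reviewable.select { |hit| own_ids.include?(hit.id) }
end

reviewable = [Hit.new("HIT-A"), Hit.new("HIT-B"), Hit.new("HIT-C")]
production_ids = Set["HIT-A", "HIT-C"]  # e.g. ids saved when this instance created the HITs
mine = own_hits(reviewable, production_ids)
```

In review_hits, the filtered list would replace the raw RTurk::Hit.all_reviewable result before anything is expired or disposed.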

Related

How to update an attribute after create or update

I’m new to Rails and I’m trying to figure out the following.
I’ve got an Order class with the attributes name, status and radius. The possible statuses are [draft, posted, taken]. The radius can change from 500 to 5k, incrementing by 500 each time a loop runs. I would like to know how I can make the radius change depending on the status and the time that has passed.
If @order.status is draft, then the radius is 500.
If @order.status is posted, then the radius starts at 500 and increments by 500 every 10 seconds until 40 seconds have passed.
If @order.status is taken, then the radius keeps whatever value it had when the status changed to taken.
If @order.status is still not taken after 40 seconds, then @order.status becomes draft and @order.radius becomes 500.
The following code runs fine in plain Ruby, for visualization only; if you copy it and run it in the terminal you will see it working.
count = 1
initial_radius = 500
puts "Enter the status: "
status = gets
status = status.chomp
while status == 'posted' && count < 4 # this will be the one deciding (n) times for 10k max radius of search, otherwise go back to draft
  puts ""
  puts "Run #{count}"
  radius = 500
  puts "Radius = #{radius}"
  n = 1
  while status == 'posted' && n < 10
    status = 'posted'
    puts " Status is now = #{status.capitalize}! "
    sleep(1)
    puts "n = #{n}"
    puts "Enter the status: "
    status = gets
    status = status.chomp
    n += 1
    start = Time.now
    puts "Radius = #{radius}"
  end
  if status == 'posted'
    count += 1
    radius *= count
    puts "New Radius = #{radius}"
  elsif status == 'taken'
    radius *= count
    puts ""
    puts "Order has been Taken with radius #{radius}!!"
    puts ""
  end
  #radius = 500 This was removed as it didn't affect
end
if status == 'taken'
  puts ""
else
  puts ""
  puts "No company took the order"
  puts "Order has gone back to status Draft and its radius is #{initial_radius} "
  puts ""
end
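Since the rules above depend only on the status and on how long the order has been posted, one option is to compute the radius from elapsed time instead of sleeping in a loop. A minimal sketch, assuming the elapsed seconds are available (in Rails, for example, as Time.now - posted_at with a hypothetical posted_at timestamp column); radius_state, posted_radius and the constants are illustrative names. For "taken", elapsed means the seconds between posting and being taken.

```ruby
BASE_RADIUS    = 500 # radius for drafts and freshly posted orders
STEP_RADIUS    = 500 # growth per step while posted
STEP_SECONDS   = 10  # one step every 10 seconds
WINDOW_SECONDS = 40  # after this, an untaken order goes back to draft

# Radius of a posted order after `elapsed` seconds.
def posted_radius(elapsed)
  BASE_RADIUS + STEP_RADIUS * (elapsed.to_i / STEP_SECONDS)
end

# Returns [status, radius] for an order, given seconds since it was posted.
def radius_state(status, elapsed)
  case status
  when "draft"
    ["draft", BASE_RADIUS]
  when "posted"
    if elapsed >= WINDOW_SECONDS
      ["draft", BASE_RADIUS] # nobody took it in time
    else
      ["posted", posted_radius(elapsed)]
    end
  when "taken"
    # frozen at the radius it had when taken (capped at the last step)
    ["taken", posted_radius([elapsed, WINDOW_SECONDS - 1].min)]
  else
    raise ArgumentError, "unknown status: #{status.inspect}"
  end
end
```

For example, radius_state("posted", 25) gives ["posted", 1500] and radius_state("posted", 40) gives ["draft", 500]. Because nothing sleeps, the value can be computed whenever the order is read, without blocking a request.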
But when I try the following to see how the server behaves, creating a new Order hangs until the server finishes processing any order placed before it. I know it hangs because of the sleep call. The code below only exists to see how it behaves. It does update the radius after 15 seconds, but no one would want to use an application that has to wait until someone else finishes their bit.
Order model
after_save :change_radius, on: [:create, :update]

def change_radius
  if self.status == 'posted'
    sleep 15
    update_column(:radius, 1000)
  elsif self.status == 'draft'
    update_column(:radius, 500)
  end
end
My question is: how can I make this work so that any other user can use the application (create an order) without it hanging until it finishes dealing with another user’s order? I think it may need Active Job or something similar, but I'm unsure how to set it up if that is the case. I would appreciate any guidance on this.
For a process that takes a long time to complete, consider moving it into a background job. I would check out Sidekiq with Redis; it's a good option for background processing.
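The key idea is that the request only enqueues the work and returns; a separate worker does the slow part. In Rails itself, the usual shape is an Active Job class whose perform method updates the radius, enqueued from the model with perform_later (or set(wait: 15.seconds).perform_later for a delay) and run by an adapter such as Sidekiq. The sketch below is not that API: it is a tiny in-process stand-in (TinyJobQueue and OrderStub are hypothetical names) that only shows why enqueuing does not block the caller.

```ruby
require "thread"

# Minimal in-process job runner, a stand-in for Active Job + Sidekiq used only
# to show the idea: enqueue returns at once, the slow work runs elsewhere.
class TinyJobQueue
  def initialize
    @jobs = Queue.new
    @worker = Thread.new do
      while (job = @jobs.pop) != :shutdown
        job.call
      end
    end
  end

  # Returns immediately; the block runs later on the worker thread.
  def enqueue(&job)
    @jobs << job
    self
  end

  # Lets already-queued jobs finish, then stops the worker.
  def shutdown
    @jobs << :shutdown
    @worker.join
  end
end

OrderStub = Struct.new(:status, :radius) # hypothetical stand-in for the Order model

jobs = TinyJobQueue.new
order = OrderStub.new("posted", 500)

started = Time.now
jobs.enqueue do
  sleep 0.2 # the slow part, off the request path
  order.radius = 1000 if order.status == "posted"
end
enqueue_time = Time.now - started # close to zero: enqueue did not wait
jobs.shutdown
```

A real job backend adds what this sketch lacks: persistence across restarts, retries, and workers running in separate processes.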

How do you have threads in Ruby send strings back to a parent thread

I want to be able to call a method that repeats x number of times on a separate thread and sends messages such as "still running" to the console every few moments, while I remain free to call other methods that do the same thing.
This works in my test environment and everything checks out via RSpec, but when I move the code into a gem and call it from another script, the code appears to run in additional threads, yet the strings are never sent to my console (or anywhere else I can tell).
I will put the important parts of the code below, but for a better understanding it is important to know that:
The code will check stock market prices at set intervals with the intent of notifying the user when the value of said stock reaches a specific price.
The code should print to the console a message stating that the code is still running when the price has not been met.
The code should tell the user that the stock has met the target price and then stop looping.
Here is the code:
require "trade_watcher/version"
require "market_beat"

module TradeWatcher
  def self.check_stock_every_x_seconds_for_value(symbol, seconds, value)
    Thread.new { checker(symbol, seconds, value) }
  end

  private # note: this does not make the def self. methods below private

  def self.checker(symbol, seconds, value)
    stop_time = get_stop_time
    pp stop_time
    until is_stock_at_or_above_value(symbol, value) || Time.now >= stop_time
      pp "#{Time.now} #{symbol} has not yet met your target of #{value}."
      sleep(seconds)
    end
    if Time.now >= stop_time
      out_of_time(symbol, value)
    else
      reached_target(symbol, value)
    end
  end

  def self.get_stop_time
    Time.now + 3600 # an hour from Time.now
  end

  def self.reached_target(symbol, value)
    pp "#{Time.now} #{symbol} has met or exceeded your target of #{value}."
  end

  def self.out_of_time(symbol, value)
    pp "#{Time.now} The monitoring of #{symbol} with a target of #{value} has expired due to the time limit of 1 hour being reached."
  end

  def self.last_trade(symbol)
    MarketBeat.last_trade_real_time symbol
  end

  def self.is_stock_at_or_above_value(symbol, value)
    last_trade(symbol).to_f >= value
  end
end
Here are the tests (which all pass):
require 'spec_helper'

describe "TradeWatcher" do
  context "when comparing quotes to targets values" do
    it "can report true if a quote is above a target value" do
      TradeWatcher.stub!(:last_trade).and_return(901)
      TradeWatcher.is_stock_at_or_above_value(:AAPL, 900).should == true
    end

    it "can report false if a quote is below a target value" do
      TradeWatcher.stub!(:last_trade).and_return(901)
      TradeWatcher.is_stock_at_or_above_value(:AAPL, 1000).should == false
    end
  end

  it "checks stock value multiple times while stock is not at or above the target value" do
    TradeWatcher.stub!(:last_trade).and_return(200)
    TradeWatcher.should_receive(:is_stock_at_or_above_value).at_least(2).times
    TradeWatcher.check_stock_every_x_seconds_for_value(:AAPL, 1, 400.01)
    sleep(2)
  end

  it "triggers reached_target when the stock has met or surpassed the target value" do
    TradeWatcher.stub!(:last_trade).and_return(200)
    TradeWatcher.should_receive(:reached_target).exactly(1).times
    TradeWatcher.check_stock_every_x_seconds_for_value(:AAPL, 1, 100.01)
    sleep(2)
  end

  it "returns a 'time limit reached' message once a stock has been monitored for the maximum of 1 hour" do
    TradeWatcher.stub!(:last_trade).and_return(200)
    TradeWatcher.stub!(:get_stop_time).and_return(Time.now - 3700)
    TradeWatcher.check_stock_every_x_seconds_for_value(:AAPL, 1, 100.01)
    TradeWatcher.should_receive(:out_of_time).exactly(1).times
    sleep(2)
  end
end
And here is a very simple script that (as I understand it) should print "{Time.now} AAPL has not yet met your target of 800.54." every second while the method is still running, which should be visible for at least 20 seconds (I test this using sleep in RSpec and can see the strings printed to the console):
require 'trade_watcher'
TradeWatcher.check_stock_every_x_seconds_for_value(:AAPL, 1, 800.54)
sleep(20)
However, I get no output, although the program does wait 20 seconds before finishing. If I add other lines that print to the console they work just fine, but nothing inside the thread started by my TradeWatcher method call seems to work.
In short, I don't understand how to make threads communicate with each other properly, or how to sync them up (I don't think thread.join is appropriate here, because it would leave the main thread hanging and unable to accept another method call if I chose to send one at a time in the future). My understanding of Ruby multithreading is weak; can anyone understand what I'm getting at and nudge me in the right direction?
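It looks like the pp method is simply not loaded yet when your thread goes to print, so the call fails inside the thread; an exception raised in a non-main thread ends only that thread, and on older Ruby versions it does so without printing anything. By adding: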
require 'pp'
to the top of trade_watcher.rb, I was able to get the output you're expecting. You might also want to consider adding:
$stdout.sync = $stderr.sync = true
to your binary/executable script, so that your output is not buffered internally by the IO class and is instead passed directly to the OS.
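For the question in the title (sending strings back to the parent thread), a thread-safe Queue is a common pattern: the worker pushes messages and the parent pops and prints them, so nothing is printed from inside the worker at all. A minimal sketch, with a fixed list of prices standing in for MarketBeat; watch is a hypothetical name and :done is an arbitrary end marker.

```ruby
require "thread"

# Worker thread: checks each price and reports back through the queue.
def watch(prices, target, messages)
  Thread.new do
    prices.each do |price|
      if price >= target
        messages << "met target #{target} at #{price}"
        break
      end
      messages << "still running: #{price} < #{target}"
    end
    messages << :done # tell the parent there is nothing more to read
  end
end

$stdout.sync = true
messages = Queue.new
watcher = watch([790.0, 795.5, 801.2], 800.54, messages)

received = []
while (msg = messages.pop) != :done
  received << msg
  puts msg
end
watcher.join
```

Here the parent blocks on messages.pop. If the main thread must stay free for other calls, it can instead poll with messages.pop(true), which raises ThreadError when the queue is empty rather than waiting.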

workaround for Twitter api rate limiting

I've collected a bunch of users and put them in a variable 'users'. I'm looping through them and trying to follow them with my new Twitter account. However, after about 15, Twitter stops me for exceeding the rate limit. I want to run this again, but without the users I've already followed. How do I remove 'i' from the 'users' array after they've been followed, or otherwise return a new array of the users I've yet to follow? I'm aware of methods like pop and unshift, but I'm not sure where 'i' comes from within the 'users' array. I'm a perpetual newbie, so please include as much detail as possible.
Note: users is actually a cursor, not an array, so it has no length method.
>> users.each do |i|
?> myuseraccount.twitter.follow(i)
>> end
Twitter::Error::TooManyRequests: Rate limit exceeded
A simple hack would be a call to sleep(n):
>> users.each do |i|
?> myuseraccount.twitter.follow(i)
?> sleep(3)
>> end
Increase the sleep time until the Twitter API stops throwing errors.
A proper solution to this problem is rate limiting.
One possible Ruby solution for rate-limiting method calls is glutton_ratelimit.
Edit: and, as Kyle pointed out, there is a documented solution to this problem.
Below is an enhanced version of that solution:
def rate_limited_follow(account, user)
  num_attempts = 0
  begin
    num_attempts += 1
    account.twitter.follow(user)
  rescue Twitter::Error::TooManyRequests => error
    # retry immediately twice, then wait out the rate-limit window before retrying
    sleep(15 * 60) if num_attempts % 3 == 0 # minutes * 60 seconds
    retry
  end
end
>> users.each do |i|
?> rate_limited_follow(myuseraccount, i)
>> end
There are a number of solutions, but the easiest in your case is probably shift:
while users.length > 0 do
  myuseraccount.twitter.follow(users.first)
  users.shift
end
This will remove each user from the array as they are processed.
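Because users is a cursor rather than an array (as noted in the question), one way to get "a new array of the users I've yet to follow" is to copy it into an array with to_a and drop a user only after the follow call succeeds, returning whatever is left when the rate limit hits. A minimal sketch, assuming the cursor is Enumerable so to_a works (converting a real cursor may itself issue API requests); the follow lambda and the RateLimited error are hypothetical stand-ins for myuseraccount.twitter.follow and Twitter::Error::TooManyRequests.

```ruby
# Hypothetical stand-in for Twitter::Error::TooManyRequests.
class RateLimited < StandardError; end

# Follows users in order and returns those not yet followed when the rate
# limit is hit (an empty array means everyone was followed).
def follow_until_limited(users, follow)
  remaining = users.to_a.dup
  until remaining.empty?
    follow.call(remaining.first)
    remaining.shift # only drop the user after a successful follow
  end
  remaining
rescue RateLimited
  remaining
end

calls = 0
fake_follow = lambda do |user|
  calls += 1
  raise RateLimited, "Rate limit exceeded" if calls > 2
end

left = follow_until_limited(%w[alice bob carol dave], fake_follow)
```

Here left is ["carol", "dave"], which can be saved and passed back in on the next run.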
Here is what I did:
def self.careful(&block)
  begin
    client = get_current_client()
    yield client
  rescue Twitter::Error::TooManyRequests => error
    current_user = User.find_by_token(client.instance_variable_get("@oauth_token"))
    current_user.update_attribute(:rate_limit_at, Time.now)
    change_current_client()
    retry
  end
end
This block executes an API call using the current client. If it hits a rate limit, it switches to another client using the change_current_client() method, then retries the call with the new client. You can add a sleep() there if you want.
This can be used like
careful{|client| client.search("#something")}
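As written, careful retries forever if every client is rate limited at once. A sketch of the same rotation idea with a bound, so it gives up after each client has failed once; ClientPool, FakeClient and RateLimited are hypothetical names standing in for get_current_client/change_current_client and Twitter::Error::TooManyRequests.

```ruby
# Hypothetical stand-in for Twitter::Error::TooManyRequests.
class RateLimited < StandardError; end

# Rotates through clients on rate-limit errors, re-raising once every client
# has been tried, instead of retrying forever.
class ClientPool
  def initialize(clients)
    @clients = clients
    @index = 0
  end

  def careful
    attempts = 0
    begin
      yield @clients[@index]
    rescue RateLimited
      attempts += 1
      raise if attempts >= @clients.size # all clients exhausted
      @index = (@index + 1) % @clients.size
      retry
    end
  end
end

# Fake client for illustration: raises when marked as limited.
FakeClient = Struct.new(:name, :limited) do
  def search(query)
    raise RateLimited, "#{name} is rate limited" if limited
    "#{name} searched #{query}"
  end
end

pool = ClientPool.new([FakeClient.new("a", true), FakeClient.new("b", false)])
result = pool.careful { |client| client.search("#something") }
```

Client "a" is limited, so the call is retried with "b" and result is "b searched #something". When everything is limited, the caller gets the error and can decide whether to sleep.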

Odd bug with DataMapper, Mutexes, and Threads?

I have a database full of URLs that I need to test HTTP response time for on a regular basis. I want to have many worker threads combing the database at all times for a URL that hasn't been tested recently, and if it finds one, test it.
Of course, this could cause multiple threads to grab the same URL from the database, which I don't want, so I'm trying to use mutexes to prevent it. I realize there are other options at the database level (optimistic locking, pessimistic locking), but I'd at least like to figure out why this isn't working.
Take a look at this test code I wrote:
threads = []
mutex = Mutex.new
50.times do |i|
  threads << Thread.new do
    while true do
      url = nil
      mutex.synchronize do
        url = URL.first(:locked_for_testing => false, :times_tested.lt => 150)
        if url
          url.locked_for_testing = true
          url.save
        end
      end
      if url
        # simulate testing the url
        sleep 1
        url.times_tested += 1
        url.save
        mutex.synchronize do
          url.locked_for_testing = false
          url.save
        end
      end
    end
    sleep 1
  end
end
threads.each { |t| t.join }
Of course there is no real URL testing here. But what should happen is that, in the end, each URL ends up with times_tested equal to 150, right?
(I'm basically just trying to make sure the mutexes and the worker-thread approach are working.)
But each time I run it, a few odd URLs here and there end up with times_tested at a much lower number, say 37, and locked_for_testing stuck at true.
As far as I can tell from my code, any URL that gets locked has to be unlocked again, so I don't understand how some URLs end up "frozen" like that.
There are no exceptions and I've tried adding begin/ensure but it didn't do anything.
Any ideas?
I'd use a Queue and a master thread that pulls the work you want. If you have a single master, you control what's getting accessed. This isn't perfect, but it's not going to blow up because of concurrency. Remember that if you aren't locking the database, a mutex doesn't really help you if something else accesses the db.
Code completely untested:
require 'thread'
queue = Queue.new
keep_running = true
# trap Ctrl-C or similar to reset keep_running
master = Thread.new do
  while keep_running
    # check if we need some work to do
    if queue.size == 0
      urls = URL.all(:times_tested.lt => 150)
      urls.each do |u|
        queue << u.id
      end
    end
    # keep from spinning on the queue, whether or not we refilled it
    sleep(0.1)
  end
end
workers = []
50.times do
  workers << Thread.new do
    while keep_running
      # get an id (blocks until the master enqueues one)
      id = queue.shift
      url = URL.get(id)
      next unless url # the record may have been deleted in the meantime
      # do something with the url
      url.save
      sleep(0.1)
    end
  end
end
workers.each do |w|
  w.join
end
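The master/worker idea can be checked without a database. A minimal in-memory sketch, assuming a Hash can stand in for the times_tested column and plain integers for URL ids; one producer hands out each id once per round and waits for that round to finish, so no id is ever given to two workers at the same time. The round-based waiting and the :stop markers are choices made for this sketch, not part of the answer above.

```ruby
require "thread"

TARGET = 5              # stands in for "times_tested < 150"
ids = (1..20).to_a      # stands in for the URL ids in the database
counts = Hash.new(0)    # stands in for the times_tested column
active = {}             # ids currently being processed
overlaps = 0
lock = Mutex.new
jobs = Queue.new
done = Queue.new

workers = Array.new(8) do
  Thread.new do
    while (id = jobs.pop) != :stop
      lock.synchronize do
        overlaps += 1 if active[id]
        active[id] = true
      end
      sleep 0.001 # simulate testing the URL
      lock.synchronize do
        active.delete(id)
        counts[id] += 1
      end
      done << id
    end
  end
end

# Single producer: each round enqueues every id once and waits for the whole
# round to finish before handing any id out again.
TARGET.times do
  ids.each { |id| jobs << id }
  ids.size.times { done.pop }
end

workers.size.times { jobs << :stop } # one stop marker per worker
workers.each(&:join)
```

Every id ends with exactly TARGET tests and overlaps stays 0, because only the producer decides who gets which id.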

Ruby thread callback weird behaviour

My current goal is to create a class that holds some threads, performs tasks and finally calls a callback method; nothing special so far.
My experimental class does some connection checks on specific ports of a given IP to give me status information.
So my attempt:
check = ConnectionChecker.new do | threads |
  # i am done callback
end
check.check_connectivity(ip0, port0, timeout0, identifier0)
check.check_connectivity(ip1, port1, timeout1, identifier1)
check.check_connectivity(ip2, port2, timeout2, identifier2)
sleep while not check.is_done
Maybe not the best approach, but in general it fits in my case.
So what's happening:
In my Class I store a callback, perform actions and do internal stuff:
Thread.new -> success/failure -> mark as done, when all done -> call callback:
require 'timeout'

class ConnectionChecker
  attr_reader :is_done

  def initialize(&callback)
    @callback = callback
    @thread_count = 0
    @threads = []
    @is_done = false
  end

  def check_connectivity(host, port, timeout, ident)
    @thread_count += 1
    @threads << Thread.new do
      status = false
      pid = Process.spawn("nc -z #{host} #{port} >/dev/null")
      begin
        Timeout.timeout(timeout) do
          Process.wait(pid)
          status = true
        end
      rescue Timeout::Error => e
        Process.kill('TERM', pid)
      end
      mark_as_done
      # return value for the callback.
      [status, ident]
    end
  end

  # one less to go..
  def mark_as_done
    @thread_count -= 1
    if @thread_count.zero?
      @is_done = true
      @callback.call(@threads)
    end
  end
end
This code works fine (yes, I know there is no start method, so I have to trust that I make all the calls almost instantly).
But when I swap these two lines:
@is_done = true
@callback.call(@threads)
to
@callback.call(@threads)
@is_done = true
then the very last line,
sleep while not check.is_done
becomes an endless loop. Debugging shows me that the callback is called properly, but when I check the value of is_done, it really is always false. Since I don't put it into a closure, I wonder why this is happening.
The callback itself can even be empty and is_done still remains false (so there is no mis-caught exception).
In this case I noticed that the last thread was still in the running state. Since I never asked for the thread's value, I just don't understand why it hangs.
Is there any documentation or information about this problem? A name for it would also be welcome.
Try using a Mutex to ensure thread safety :)
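A sketch of the Mutex suggestion, with two further changes that are choices made here rather than part of the answer: checks are registered first and started together (so the counter cannot reach zero before the last check is added), and the caller waits on a ConditionVariable instead of polling. Note that Kernel#sleep with no argument suspends the thread indefinitely, so a loop like sleep while not check.is_done relies on something waking the main thread. SafeChecker, add, start and wait are hypothetical names; each block stands in for the nc/Timeout check.

```ruby
require "thread"

class SafeChecker
  def initialize(&callback)
    @callback = callback
    @tasks = []
    @lock = Mutex.new
    @finished = ConditionVariable.new
    @done = false
  end

  # Register a check; nothing runs until #start, so the count is known up front.
  def add(ident, &task)
    @tasks << [ident, task]
    self
  end

  def start
    pending = @tasks.size
    results = []
    finish(results) if pending.zero?
    @tasks.each do |ident, task|
      Thread.new do
        status = task.call
        last = @lock.synchronize do
          results << [status, ident]
          pending -= 1
          pending.zero? # true only for the thread that finishes last
        end
        finish(results) if last
      end
    end
    self
  end

  # Block until the callback has run, without polling.
  def wait
    @lock.synchronize { @finished.wait(@lock) until @done }
  end

  private

  # Runs the callback outside the lock, then wakes any waiters.
  def finish(results)
    @callback.call(results)
    @lock.synchronize do
      @done = true
      @finished.broadcast
    end
  end
end

received = nil
checker = SafeChecker.new { |results| received = results.sort_by(&:last) }
checker.add(:a) { sleep 0.05; true }
checker.add(:b) { false }
checker.start.wait
```

After wait returns, received is [[true, :a], [false, :b]]. The callback receives the collected results rather than the threads, and it runs on the last worker thread, so it should not call join or value on that same thread (Ruby raises ThreadError for joining the current thread).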
