Execution expired doing Twitter gem queries with `sleep` - ruby

I am calling Twitter's GET friends/ids and GET users/lookup resources. They work great until I take rate limits into account. Here's an example of the working code:
def followers_ids(screen_name)
  cursor = "-1"
  followers_ids = []
  while cursor != 0 do
    followers = Twitter.follower_ids(screen_name, :cursor => cursor)
    cursor = followers.next_cursor
    followers_ids << followers.ids
  end
  followers_ids.flatten!
  followers_ids
end
When I throw a sleep into the while loop like so:
while cursor != 0 do
  followers = Twitter.follower_ids(screen_name, :cursor => cursor)
  cursor = followers.next_cursor
  followers_ids << followers.ids
  sleep(60)
end
I get different sets of errors at different times.
C:/RailsInstaller/Ruby1.9.3/lib/ruby/1.9.1/net/http.rb:799:in `connect': execution expired (Twitter::Error::ClientError)
from C:/RailsInstaller/Ruby1.9.3/lib/ruby/1.9.1/net/http.rb:799:in `block in connect'
..
from C:/RailsInstaller/Ruby1.9.3/lib/ruby/gems/1.9.1/gems/faraday-0.8.7/lib/faraday/adapter/net_http.rb:73:in `perform_request'
from C:/RailsInstaller/Ruby1.9.3/lib/ruby/gems/1.9.1/gems/faraday-0.8.7/lib/faraday/adapter/net_http.rb:38:in `call'
and
C:/RailsInstaller/Ruby1.9.3/lib/ruby/gems/1.9.1/gems/twitter-4.6.2/lib/twitter/response/raise_error.rb:21:in `on_complete': Twitter is down or being upgraded (Twitter::Error::BadGateway)
from C:/RailsInstaller/Ruby1.9.3/lib/ruby/gems/1.9.1/gems/faraday-0.8.7/lib/faraday/response.rb:9:in `block in call'
..
Does anyone have any idea what could be going on? I had read somewhere that, while sleeping, your program will ignore all messages. Could that be part of it? Is there a better way I could be doing this?
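For what it's worth, one way to make the loop more tolerant of these transient failures is to rescue them and retry with a growing backoff, instead of sleeping unconditionally. Below is a minimal sketch against the same gem calls used above; the retry cap and sleep times are arbitrary choices, not anything the gem prescribes:

def followers_ids(screen_name)
  cursor = -1
  ids = []
  attempts = 0
  while cursor != 0
    begin
      followers = Twitter.follower_ids(screen_name, :cursor => cursor)
      attempts = 0 # reset the counter after a successful call
    rescue Twitter::Error::ClientError, Twitter::Error::BadGateway
      attempts += 1
      raise if attempts > 5 # give up after a few tries
      sleep(60 * attempts)  # back off a little longer each time
      retry
    end
    cursor = followers.next_cursor
    ids.concat(followers.ids)
  end
  ids
end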

Related

Click through all the next pages in a loop until the last page with the Watir gem

I have a problem in my Ruby Watir script.
I want to click through all the next pages until the last page, and then print some first and last names. I know that on the last page the "next" link has one extra class, "disabled": stop = b.link(class: 'next-pagination page-link disabled').
I try to loop until that class is reached, with break if stop.exists?
loop do
  link = b.link(class: 'next-pagination page-link')
  name_array = b.divs(class: 'name-and-badge-container').map { |e|
    e.div(class: 'name-container').link(class: 'name-link profile-link').text.split("\n")
  }
  puts name_array
  stop = b.link(class: 'next-pagination page-link disabled')
  break if stop.exists?
  link.click
end
I get this error:
This code has slept for the duration of the default timeout waiting for an Element to exist. If the test is still passing, consider using Element#exists? instead of rescuing UnknownObjectException
/Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:496:in `rescue in wait_for_exists': timed out after 30 seconds, waiting for #<Watir::Div: located: false; {:class=>"name-and-badge-container", :tag_name=>"div", :index=>13}> to be located (Watir::Exception::UnknownObjectException)
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:486:in `wait_for_exists'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:487:in `wait_for_exists'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:487:in `wait_for_exists'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:639:in `element_call'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/elements/element.rb:91:in `text'
from /Users/vincentcheloudiakoff/Travail/Automation/lib/linkedin.rb:24:in `block (2 levels) in start'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/element_collection.rb:28:in `each'
from /Users/vincentcheloudiakoff/.rbenv/versions/2.4.1/lib/ruby/gems/2.4.0/gems/watir-6.2.1/lib/watir/element_collection.rb:28:in `each'
from /Users/vincentcheloudiakoff/Travail/Automation/lib/linkedin.rb:24:in `map'
from /Users/vincentcheloudiakoff/Travail/Automation/lib/linkedin.rb:24:in `block in start'
from /Users/vincentcheloudiakoff/Travail/Automation/lib/linkedin.rb:22:in `loop'
from /Users/vincentcheloudiakoff/Travail/Automation/lib/linkedin.rb:22:in `start'
from start.rb:3:in `<main>'
It clicks on the next page, but does not find the next disabled button.
Use the text to locate that element:
b.span(text: 'Suivant').click
You don't have to locate the parent link and then the span, as in b.link().span(); you can locate the span directly, as shown above.
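Putting that together, here is a rough sketch of the loop with the text-based locator (assuming the button really is labelled 'Suivant' and the last page marks it with the disabled class, as described above):

loop do
  # collect the names on the current page
  name_array = b.divs(class: 'name-and-badge-container').map { |e|
    e.div(class: 'name-container').link(class: 'name-link profile-link').text.split("\n")
  }
  puts name_array
  # stop once the "next" control is marked disabled
  break if b.link(class: 'next-pagination page-link disabled').exists?
  b.span(text: 'Suivant').click
end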

Is it reasonable to use Resque (Ruby) to manage external long-running commands (and log tasks)?

I have to run bash heavy-job.sh <data-num> (which takes 0.5~2 days) frequently on my computer to process data located at ~/a/data/num . The script calls a few sub-processes sequentially and writes a log to ~/a/result/num.log . I have done this manually until now.
I wanted to visualize the processed tasks and their status (success or fail), etc. as an HTML table. I wrote a simple Sinatra app to render a table that shows
the list of ~/a/data/num to be processed
~/a/result/num.log exists or not (process not-launched/processing/done)
its status (whether the log file contains the word "error" or not)
I found that it would be convenient if I could launch bash heavy-job.sh <data-num> from the Sinatra app, log the tasks (with info like time, date, etc.) and their args (heavy-job.sh takes some optional args), and show them as an HTML table.
So I need something that manages jobs and logs to files (or db).
First I wrote the code below as a test (just for testing, not yet integrated with my system), but later I found that Resque is what I wanted. I am a beginner and not sure whether my decision is reasonable.
My questions are:
Is it reasonable to use Resque to manage external long-running commands (and log tasks)?
Or should I use another tool (not necessarily a Ruby tool)?
(Extra:) Should the task manager and the Sinatra app run separately (and communicate with each other over REST or something), or not?
The jobs are not critical, since I can retry failed tasks manually later.
I am not good at English and my question may be unclear. I appreciate any help :).
class TaskSpawn
  def initialize
    @pids = []
  end

  def spawn(command, options = {})
    # e.g. options = { :pgroup => true }
    @pids << Kernel.spawn(command, options)
  end

  def pids
    @pids.clone
  end

  def waitany_nohang
    delete_idx = nil
    ret = nil
    @pids.each_with_index do |p, idx|
      pid, status = Process.waitpid2(p, Process::WNOHANG)
      unless pid.nil?
        delete_idx = idx
        ret = [pid, status]
        break
      end
    end
    if delete_idx
      @pids.delete_at(delete_idx)
      ret
    else
      # no task finished
      nil
    end
  end

  def waitall
    # Process.waitall blocks until every child has exited
    ret = Process.waitall
    raise "internal error" if ret.size != pids.size
    ret
  end
end
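For comparison, the same job expressed as a Resque job class would look roughly like the sketch below (it assumes Redis and a Resque worker are running; the class name and queue name are made up):

require 'resque'

class HeavyJob
  @queue = :heavy_jobs

  # Resque serializes the arguments into Redis; a worker process
  # picks the job up and invokes perform.
  def self.perform(data_num)
    ok = system('bash', 'heavy-job.sh', data_num.to_s)
    raise "heavy-job.sh failed for #{data_num}" unless ok
  end
end

# From the Sinatra app:
Resque.enqueue(HeavyJob, 42)

A failed perform (the raise above) lands the job in Resque's failure list, which already gives you a crude success/fail view in resque-web without writing your own table.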

How to time out named pipes in Ruby?

I saw an article which suggests the following code for a writer:
output = open("my_pipe", "w+") # the w+ means we don't block
output.puts "hello world"
output.flush # do this when we're done writing data
and a reader:
input = open("my_pipe", "r+") # the r+ means we don't block
puts input.gets # will block if there's nothing in the pipe
But could it happen that open, puts, or gets blocks the program? Is there some kind of timeout in place? Can one change it? Also, how come w+ means a non-blocking call? Which open(2) flags does it get converted to?
Okay, let me share with you my picture of the world. As rogerdpack said, there are two options: 1) using select in blocking mode, or 2) using non-blocking mode (the O_NONBLOCK flag and the read_nonblock, write_nonblock, and select methods). I haven't tried them, so this is just speculation.
As to why open, puts, and gets may block the thread: the open call blocks until there is at least one reader and at least one writer, which must be why we need to specify r+ or w+ for the open call. Judging from strace output, both are converted to the O_RDWR flag. Then there must be some buffer where not-yet-received data is stored, which must be why the write methods may block. The read methods may block because they expect more data to be available than there really is.
UPD
If a process attempts to read from an empty pipe, then read(2) will block until data is available. If a process attempts to write to a full pipe (see below), then write(2) blocks until sufficient data has been read from the pipe to allow the write to complete.
-- http://linux.die.net/man/7/pipe
The FIFO must be opened on both ends (reading and writing) before data can be passed. Normally, opening the FIFO blocks until the other end is opened also.
Under Linux, opening a FIFO for read and write will succeed both in blocking and nonblocking mode. POSIX leaves this behavior undefined. This can be used to open a FIFO for writing while there are no readers available.
-- http://linux.die.net/man/7/fifo
And here's the implementation I came up with:
#!/home/yuri/.rbenv/shims/ruby
require 'timeout'

# build a 128 KiB string of hex digits, enough to fill the pipe buffer
data = ((0..15).to_a.map { |v|
  (v < 10 ? '0'.ord + v : 'a'.ord + v - 10).chr
} * 4096 * 2).reduce('', :+)

timeout = 10
start = Time.now
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
  out.flock(File::LOCK_EX)
  nwritten = 0
  data_len = data.length
  begin
    delta = out.write_nonblock data
    data = data[delta..-1]
    nwritten += delta
  rescue IO::WaitWritable, Errno::EINTR
    timeout_left = timeout - (Time.now - start)
    if timeout_left < 0
      puts Time.now - start
      raise Timeout::Error
    end
    IO.select nil, [out], nil, timeout_left
    retry
  end while nwritten < data_len
}
puts Time.now - start
puts Time.now - start
But for my problem at hand, I decided to ignore the timeout issue. It will probably suffice to handle just the situation where there is no reader on the other end of the pipe (Errno::ENXIO):
open('1.fifo', File::WRONLY | File::NONBLOCK) { |out|
  out.flock(File::LOCK_EX)
  nwritten = 0
  data_len = data.length
  begin
    delta = out.write_nonblock data
    data = data[delta..-1]
    nwritten += delta
  rescue IO::WaitWritable, Errno::EINTR
    IO.select nil, [out]
    retry
  end while nwritten < data_len
}
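For symmetry, here is a reader-side sketch under the same assumptions (non-blocking open plus select, same 1.fifo pipe name; note that opening a FIFO with O_RDONLY|O_NONBLOCK succeeds even with no writer, in which case read_nonblock raises EOFError right away):

require 'timeout'

timeout = 10
start = Time.now
data = ''
open('1.fifo', File::RDONLY | File::NONBLOCK) { |input|
  begin
    loop { data << input.read_nonblock(4096) }
  rescue IO::WaitReadable
    timeout_left = timeout - (Time.now - start)
    raise Timeout::Error if timeout_left < 0
    IO.select [input], nil, nil, timeout_left
    retry
  rescue EOFError
    # the writer closed its end (or never opened it); we are done
  end
}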
P.S. Your feedback is appreciated.
This page should answer all your questions... http://www.ruby-doc.org/core-2.0.0/IO.html
In general, puts can always block the current thread, since it may have to wait for the IO to complete before returning. gets can also block the current thread, because it will keep reading until it hits the first newline, then return everything it has read. HTH.

Ruby: begin, sleep, retry: where to put incrementer

I have a method rate_limited_follow that takes my Twitter user account and a user, and I call it in a loop to follow all the users in an array users. Twitter has strict rate limits, so the method deals with that contingency by sleeping for 15 minutes and then retrying. (I didn't write this method; I got it from the Twitter Ruby gem documentation.) You'll notice that it checks whether the number of attempts is still within MAX_ATTEMPTS.
My users array has about 400 users that I'm trying to follow. It adds 15 users at a time (when the rate limit seems to kick in), then sleeps for 15 minutes. Since I set the MAX_ATTEMPTS constant to 3 (just to test it), I expected it to stop trying once it had added 45 users (3 times 15), but it has gone past that, continuing to add 15 users roughly every fifteen minutes. So it seems num_attempts is somehow staying below 3, even though it has gone through this cycle more than 3 times. Is there something I don't understand about the code? Once sleep finishes and it hits retry, where does execution resume? Is there some reason num_attempts isn't incrementing?
Calling the method in the loop
>> users.each do |i|
?> rate_limited_follow(myuseraccount, i)
>> end
Method definition with constant
MAX_ATTEMPTS = 3

def rate_limited_follow(account, user)
  num_attempts = 0
  begin
    num_attempts += 1
    account.twitter.follow(user)
  rescue Twitter::Error::TooManyRequests => error
    if num_attempts <= MAX_ATTEMPTS
      sleep(15 * 60) # minutes * 60 seconds
      retry
    else
      raise
    end
  end
end
Each call to rate_limited_follow resets your number of attempts - or, to rephrase, you are tracking attempts per user rather than attempts across your entire array of users.
Hoist num_attempts' initialization out of rate_limited_follow, so that it isn't reset on every call, and you'll get the behavior you're looking for.
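A minimal sketch of that fix, reusing the variables from the question (the counter now lives outside the method body, so it accumulates across the whole run):

MAX_ATTEMPTS = 3
num_attempts = 0

users.each do |user|
  begin
    myuseraccount.twitter.follow(user)
  rescue Twitter::Error::TooManyRequests
    num_attempts += 1
    raise if num_attempts > MAX_ATTEMPTS
    sleep(15 * 60) # minutes * 60 seconds
    retry
  end
end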

PG::ERROR: another command is already in progress

I have an importer which takes a list of emails and saves them into a postgres database. Here is a snippet of code within a tableless importer class:
query_temporary_table = "CREATE TEMPORARY TABLE subscriber_imports (email CHARACTER VARYING(255)) ON COMMIT DROP;"
query_copy = "COPY subscriber_imports(email) FROM STDIN WITH CSV;"
query_delete = "DELETE FROM subscriber_imports WHERE email IN (SELECT email FROM subscribers WHERE suppressed_at IS NOT NULL OR list_id = #{list.id}) RETURNING email;"
query_insert = "INSERT INTO subscribers(email, list_id, created_at, updated_at) SELECT email, #{list.id}, NOW(), NOW() FROM subscriber_imports RETURNING id;"
conn = ActiveRecord::Base.connection_pool.checkout
conn.transaction do
  raw = conn.raw_connection
  raw.exec(query_temporary_table)
  raw.exec(query_copy)
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
  result_delete = raw.exec(query_delete)
  result_insert = raw.exec(query_insert)
  ActiveRecord::Base.connection_pool.checkin(conn)
  {
    deleted: result_delete.count,
    inserted: result_insert.count,
    updated: 0
  }
end
The issue I am having is that when I try to upload I get an exception:
PG::ERROR: another command is already in progress: ROLLBACK
This is all done in one action; the only other queries I am making are for user validation, and I have a DB mutex preventing overlapping imports. This query worked fine up until my latest push, which included updating my pg gem from 0.13.2 to 0.14.1 (along with other "unrelated" code).
The error initially started on our staging server, but I was then able to reproduce it locally and am out of ideas.
If I need to be more clear with my question, let me know.
Thanks
Found my own answer; this might be useful if anyone hits the same issue when importing loads of data using "COPY".
An exception was being thrown within the CSV.read() block, and I did catch it, but I was not ending the COPY process correctly.
begin
  CSV.read(csv.path, headers: true).each do |row|
    raw.put_copy_data row['email'] + "\n" unless row.nil?
  end
ensure
  raw.put_copy_end
  while res = raw.get_result do; end # very important to do this after a copy
end
This block ensures that the COPY command is completed. I also added this at the end, to release the connection back into the pool without disrupting the flow in the case of a successful import:
rescue
  ActiveRecord::Base.connection_pool.checkin(conn)
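Putting the pieces together, the overall shape could look like the sketch below. Note this uses an ensure rather than a bare rescue for the checkin: an ensure guarantees the connection is returned on success and failure alike, whereas a bare rescue that doesn't re-raise would also silently swallow the import error.

conn = ActiveRecord::Base.connection_pool.checkout
begin
  conn.transaction do
    raw = conn.raw_connection
    raw.exec(query_temporary_table)
    raw.exec(query_copy)
    begin
      CSV.read(csv.path, headers: true).each do |row|
        raw.put_copy_data row['email'] + "\n" unless row.nil?
      end
    ensure
      raw.put_copy_end
      while res = raw.get_result do; end # drain results after the COPY
    end
    result_delete = raw.exec(query_delete)
    result_insert = raw.exec(query_insert)
    { deleted: result_delete.count, inserted: result_insert.count, updated: 0 }
  end
ensure
  ActiveRecord::Base.connection_pool.checkin(conn)
end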
