Getting Large Twitter User Lists (And Retrying) - ruby

I'm trying to get the list of follower user IDs for a certain handle. Something like
client = Twitter::REST::Client.new(...)
client.follower_ids(handle).each { |id| puts id }
But for a handle with thousands of followers I get Twitter::Error::TooManyRequests. I considered trying
client = Twitter::REST::Client.new(...)
begin
client.follower_ids(handle).each { |id| puts id }
rescue Twitter::Error::TooManyRequests => error
sleep error.rate_limit.reset_in
retry
end
But won't that restart the each loop from the beginning every time I get TooManyRequests (and thus never finish)?
I am using the twitter gem v5.8.0.

The gem repo was recently updated with an example of how to do this properly.
The key is to put only the loop inside the block.
client = Twitter::REST::Client.new(...)
follower_ids = client.follower_ids(handle)
begin
follower_ids.each { |id| puts id }
rescue Twitter::Error::TooManyRequests => error
sleep error.rate_limit.reset_in
retry
end
Warning:
My example is no good!
This code works because the cursor caches the results of each request, so a retry does not have to request everything from the start. However, each retry does loop through everything already cached before making another request.
As a result, this code outputs IDs multiple times whenever a retry occurs. If you write code like this, you need to include something that prevents duplicate loop iterations.
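One way to avoid the duplicates is to count how many IDs have already been handled and skip that many on each retry. The sketch below assumes the caching behaviour described above. FakeCursor and FakeRateLimitError are hypothetical stand-ins for the gem's cursor and Twitter::Error::TooManyRequests so that it runs without network access, and each_once is an illustrative helper name, not part of the gem.

```ruby
# Hypothetical stand-in for Twitter::Error::TooManyRequests.
class FakeRateLimitError < StandardError; end

# Hypothetical stand-in for the gem's cursor: it caches fetched pages
# and replays the cached items from the start on every #each call.
class FakeCursor
  include Enumerable

  def initialize(pages)
    @pages = pages   # pages not yet fetched
    @cache = []      # items already fetched
    @requests = 0
  end

  def each
    @cache.each { |id| yield id }
    until @pages.empty?
      @requests += 1
      # Simulate hitting the rate limit on every second request.
      raise FakeRateLimitError if @requests.even?
      page = @pages.shift
      @cache.concat(page)
      page.each { |id| yield id }
    end
  end
end

# Yields each ID exactly once, even when the loop is retried.
def each_once(cursor)
  processed = 0
  begin
    cursor.each_with_index do |id, index|
      next if index < processed   # already handled before a retry
      yield id
      processed = index + 1
    end
  rescue FakeRateLimitError
    # With the real gem: sleep error.rate_limit.reset_in
    retry
  end
end

ids = []
each_once(FakeCursor.new([[1, 2], [3, 4], [5]])) { |id| ids << id }
```

With the real client, the rescue clause would name Twitter::Error::TooManyRequests and sleep before retrying, as in the code above.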

Related

rake db:seed not working to seed from API in Ruby CLI app - will seed manually written data - Ruby/ActiveRecord

I’m trying to make improvements to a project for school (super beginner) using seeded data from an API to make a CLI app using Ruby and ActiveRecord, no Rails. I have had to kind of "cheat" the data by taking it (a hash of object IDs), appending that ID to the end of another URL link (creating an array of these links) and then iterating over each one and making a GET request, putting it into final hash from which I iterate over and seed into my database.
I was able to do it successfully once, but I wanted to expand the data set, so I cleared the db and went to re-seed, and it no longer works. It hangs for quite a bit, then seems to complete, but the data isn't there. The only change I made in my code was to the URL, but even when I change it back it no longer works. However, it does seed anything I've manually written. The URL works fine in my browser. I tried rake db:migrate:reset but that didn't seem to work for me.
I apologize if my code is a bit messy, I'm just trying to get to the bottom of this issue and it is my first time working with APIs / creating a project like this. I appreciate any help. Thanks!
def object_id_joiner
response = RestClient.get("https://collectionapi.metmuseum.org/public/collection/v1/search?departmentId=11&15&19&21&6q=*")
metData = JSON.parse(response)
url = "https://collectionapi.metmuseum.org/public/collection/v1/objects/"
urlArray = []
metData["objectIDs"].each do |e|
urlArray.push(url.to_s + e.to_s)
end
# urlArray.slice!(0,2)
urlArray
end
object_id_joiner
def finalHash
finalHash =[]
object_id_joiner.each do |e|
response = RestClient.get(e)
data = JSON.parse(response)
finalHash.push(data)
end
finalHash
end
finalHash
finalHash.each do |artist_hash|
if artist_hash["artistDisplayName"] == nil
next
end
if (!artist_hash["artistDisplayName"])
art1 = Artist.create(artist_name:artist_hash["artistDisplayName"])
else
next
end
if (!artist_hash["objectID"])
Artwork.create(title: artist_hash["title"],image: artist_hash["primaryImage"], department: artist_hash["department"], artist: art1, object_id: artist_hash["objectID"])
else
next
end
end
As mentioned in comments you had some rogue ! in your code.
Here is a simpler version of your last loop.
finalHash.each do |artist_hash|
next if artist_hash["artistDisplayName"] == nil
# Now you don't need conditional for artistDisplayName
art1 = Artist.create(artist_name: artist_hash["artistDisplayName"])
# Now create artwork if you HAVE objectID
if (artist_hash["objectID"])
Artwork.create(title: artist_hash["title"],image: artist_hash["primaryImage"], department: artist_hash["department"], artist: art1, object_id: artist_hash["objectID"])
end
end
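To see why the original loop created nothing, the two versions of the condition can be compared on plain hashes. This is only a sketch: the records below are made-up sample data shaped like the API response, and no database is involved.

```ruby
# Made-up sample records (hypothetical, shaped like the API response).
records = [
  { "artistDisplayName" => "Claude Monet", "objectID" => 1 },
  { "artistDisplayName" => nil,            "objectID" => 2 },
  { "artistDisplayName" => "Mary Cassatt", "objectID" => nil }
]

# Original logic: skip nil names, then continue only when the name is falsy.
# A non-nil string is never falsy, so nothing survives both checks.
buggy = records.reject { |r| r["artistDisplayName"].nil? }
               .select { |r| !r["artistDisplayName"] }

# Corrected logic: keep records with a name, then those that have an objectID.
artists  = records.reject { |r| r["artistDisplayName"].nil? }
artworks = artists.select { |r| r["objectID"] }
```

Here buggy is empty, while artists keeps the two named records and artworks keeps only the one that also has an objectID.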

Rails 5 API - Redundantly passing ID of Association

I have a simple structure of "A has_many B has_many C"
If I go into Rails console and do something like A.first.Bs.first.C.create() it'll create without an issue. However, if I use the API (or even Seeds, actually) and do something like POST to /api/v1/a/1/b with the create below, I always get rejected with "Must belong to A", which basically means it's trying to save with a.id = null.
A = Campaign. B = Party for the below snippet.
def create
  @campaign = Campaign.find_by_id(params[:campaign_id])
  if @campaign.user_id == current_user.id
    @party = Party.new(party_params)
    # @party.campaign_id = params[:campaign_id]
    if @party.save!
      render status: 201, json: {
        message: "Successfully saved the party!",
        party: @party,
        user: current_user
      }
    else
      render status: 404, json: {
        message: "Something went wrong: Check line 27 of Party Controller"
      }
    end
  end
end
The line I have commented out, where I manually assigned @party.campaign_id, resolved the error, but I am curious why it doesn't automatically pull from the information. Do route resources not function the same way as a Campaign.first.parties.create would?
I welcome any revision to this create method; it feels bulky, and likely not secure at all presently.
(Note: @campaign.user_id == current_user.id is kind of a generic catch in case someone is trying to update someone else's campaign. I will likely revisit this logic to make it more secure.)
Rails does not look anything up automatically based on routes; you need to do it yourself.
In this case you can either assign the id based on params (as you did in the commented-out line) or build the Party as an element of the Campaign.parties association:
@campaign = Campaign.find_by_id(params[:campaign_id])
@party = @campaign.parties.new(party_params)

EventMachine read and write files in chunks

I'm using EventMachine and EM-Synchrony in a REST API server. When I receive a POST request with a large binary file in the body, I receive it in chunks and write those chunks to a Tempfile without blocking the reactor.
Then, at some point, I need to read this file in chunks and write those chunks to a definitive file. This works, but it blocks the reactor as expected, and I can't find a way to make it work without blocking.
I call this function at some time, passing to it the tempfile and new file name:
def self.save(tmp_file, new_file)
tmp = File.open(tmp_file, "rb")
newf = File.open(new_file, "wb")
md5 = Digest::MD5.new
each_chunk(tmp, CHUNKSIZE) do |chunk|
newf << chunk
md5.update chunk
end
md5.hexdigest
end
def self.each_chunk(file, chunk_size=1024)
yield file.read(chunk_size) until file.eof?
end
I've been reading all the other similar questions here at StackOverflow and trying to use EM#next_tick, which is perhaps the solution (I don't have much EM experience), but I can't get it to work; perhaps I'm placing it in the wrong place.
I've also tried EM#defer, but I need the function to wait for the read/write process to complete before it returns the md5, because in my main file I do a database update with the return value after calling this function.
If someone can help me with this I would be grateful.
EDIT 1
I need the save function to return only after the file read/write has completed, because the caller is waiting for the final md5 value, something like this:
def copy_and_update(...)
checksum = SomeModule.save(temp_file, new_file)
do_database_update({:checksum => checksum}) # only with the final md5 value
end
You need to inject something in there to break it up:
def self.each_chunk(file, chunk_size=1024)
chunk_handler = lambda {
unless (file.eof?)
yield file.read(chunk_size)
EM.next_tick(&chunk_handler)
end
}
EM.next_tick(&chunk_handler)
end
It's kind of messy to do it this way, but such is asynchronous programming.
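Since the question uses EM-Synchrony, which is built on fibers, one possible way to let the caller still receive the md5 as a return value is to pause the current fiber after each chunk and resume it on the next tick. The sketch below only illustrates that idea with a toy scheduler: TinyReactor and copy_in_chunks are hypothetical names, not EventMachine APIs. In a real server the scheduling call would be EM.next_tick, and the copy would have to run inside a fiber.

```ruby
require "digest"
require "fiber"
require "stringio"

# Hypothetical miniature "reactor": a queue of callbacks, run one per tick.
class TinyReactor
  def initialize
    @queue = []
  end

  def next_tick(&block)
    @queue << block
  end

  def run
    @queue.shift.call until @queue.empty?
  end
end

# Copies src to dst in chunks, handing control back to the reactor after
# every chunk. Must be called from inside a Fiber.
def copy_in_chunks(reactor, src, dst, chunk_size)
  md5 = Digest::MD5.new
  fiber = Fiber.current
  until src.eof?
    chunk = src.read(chunk_size)
    dst << chunk
    md5.update(chunk)
    reactor.next_tick { fiber.resume } # schedule our own continuation
    Fiber.yield                        # let other callbacks run first
  end
  md5.hexdigest
end

reactor = TinyReactor.new
src = StringIO.new("a" * 10)
dst = StringIO.new
checksum = nil
log = []

reactor.next_tick do
  Fiber.new do
    checksum = copy_in_chunks(reactor, src, dst, 4)
    log << :done # a database update with the checksum would go here
  end.resume
end
reactor.next_tick { log << :other_work }
reactor.run
```

The second callback runs between chunks, so the copy no longer monopolises the loop, yet the code that needs the checksum still reads top to bottom.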

How do I output the % complete of resque-status?

I'm using resque-status for Resque/Redis...
https://github.com/quirkey/resque-status
I basically want to create a new Sinatra route, something like the one below. I only have 2 JobsWithStatus, so it could return either both or a specific one; I don't really mind which.
post '/getstatus' do
# return status here of all kinds (or specific)
end
Then I want to output the % complete via jquery on the frontend using a polling timer that checks the status every 5 seconds.
This is what I have
post '/refresh' do
job_id = PostSaver.create(:length => Forum.count)
status = Resque::Status.get(job_id)
redirect '/'
end
The documentation says I can just use status.pct_complete, but it always returns 0. Even then, I'm new to Ruby, and even if the variable showed the proper % complete, I'm not sure how to make that variable available inside a separate Sinatra route (in /getstatus rather than /refresh).
I tried this however and it keeps returning 0
post '/refresh' do
job_id = PostSaver.create(:length => Forum.count)
status = Resque::Status.get(job_id)
sleep 20
status.pct_complete.to_s
end
Saw your question over on reddit…
To have the status come back as something other than 0, you need to use the at (http://rubydoc.info/github/quirkey/resque-status/master/Resque/JobWithStatus:at) method to set a percentage during the calculation you're running.
You probably don't want sleep calls inside an action. The timer should be in jQuery.
Sharing Status
post '/refresh' do
  job_id = PostSaver.create(:length => Forum.count)
  status = Resque::Status.get(job_id)
  # Build the payload with to_json (needs require 'json') so the response is valid JSON.
  { :percent_complete => status.pct_complete, :job_id => job_id }.to_json
end
Then in whatever is getting the status (some jQuery#ajax call?), you can grab the job_id from the returned JSON and then with your next request, you might do something like:
post '/status' do
  status = Resque::Status.get(params['job_id'])
  # to_json (needs require 'json') keeps the response valid JSON.
  { :percent_complete => status.pct_complete }.to_json
end
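As for the job itself, the percentage only moves if perform reports progress through at. The sketch below shows the shape of such a loop. ProgressStub is a hypothetical stand-in written for this example so it runs without Redis or the gem; it only imitates the at(num, total, message) helper and pct_complete, and PostSaverSketch is an illustrative class name.

```ruby
# Hypothetical stand-in for the gem's status mixin (assumption: it only
# imitates the at(num, total, message) helper and pct_complete).
module ProgressStub
  attr_reader :num, :total

  def at(num, total, _message = nil)
    @num = num
    @total = total
  end

  def pct_complete
    return 0 if total.nil? || total.zero?
    (num.to_f / total * 100).to_i
  end
end

class PostSaverSketch
  include ProgressStub

  def initialize(posts)
    @posts = posts
  end

  def perform
    @posts.each_with_index do |_post, index|
      # ... save the post here ...
      at(index + 1, @posts.length, "Saved #{index + 1} of #{@posts.length}")
    end
  end
end

job = PostSaverSketch.new(%w[a b c d])
before = job.pct_complete # nothing reported yet
job.perform
after = job.pct_complete  # every post reported
```

Without the at call inside the loop, the reported percentage would stay at its initial value, which matches the behaviour described in the question.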

Sharing data between Sinatra condition and request block

I am wondering whether a condition can pass information to the request block once it has run. I doubt conditions can do this, or that they are the right place for it even if they could, since the name implies they are only for conditional logic. However, the authorisation example also redirects, so the concerns are already somewhat blurred. An example would be something like:
set(:get_model) { |body| { send_to_request_body(Model.new(body)) } }
get '/something', :get_model => request.body.data do
return "model called #{@model.name}"
end
The above is all pseudocode, so sorry for any syntax/spelling mistakes, but the idea is that I can have a condition which fetches the model and puts it into some variable for the body to use, or halts with an error.
I am sure filters (before/after) would be a better way to do this if it can be done, however from what I have seen I would need to set that up per route, whereas with a condition I would only need to have it as an option on the request.
An example with before would be:
before '/something' do
  @model = Model.new(request.body.data)
end
get '/something' do
  return "model called #{@model.name}"
end
This is great, but let's say I now had 20 routes, and 18 of them needed these models created. I would basically need to duplicate the before filter for all 18 of them and write the same model logic for each, which is why I am trying to find a better way to reuse this functionality. If I could write a catch-all before filter that checks whether the given route has an option set, that could possibly work, but I'm not sure whether that is possible.
In ASP MVC you could do this sort of thing with filters, which is what I am ideally after: some way to configure certain routes (at the route definition) to do some work beforehand and pass the result into the calling block.
Conditions can set instance variables and modify the params hash. For an example, see the built-in user_agent condition.
set(:get_model) { |body| condition { @model = Model.new(body) } }
get '/something', :get_model => something do
  "model called #{@model.name}"
end
You should be aware that request is not available at that point, though.
Sinatra has support for before and after filters:
before do
  @note = 'Hi!'
  request.path_info = '/foo/bar/baz'
end
get '/foo/*' do
  @note #=> 'Hi!'
  params[:splat] #=> 'bar/baz'
end
after '/create/:slug' do |slug|
session[:last_slug] = slug
end

Resources