I am using the Twitter rubygem and my goal is to retrieve all 1500 search results of a search.
Currently:
i = 0
Twitter.search("test", :count => 1500).results.each do |status|
  i += 1
end
puts i
The :count option does not seem to matter: it retrieves at most 100 tweets. That in itself is odd, since one page of Twitter results should return only 20 tweets.
I couldn't find an official doc (sigh~) and digging in the spec didn't help.
Anyone knows how to "turn pages" with this thing, or tell it to do it by itself?
Thanks
Twitter's API may be the limiting factor here, not the gem. – Andrew Marshall
I was mistaken; I mixed up "users/search" with "search/tweets" in the Twitter API. The limit seems to be a hundred for the latter and a thousand for the former. It's working as intended =)
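For anyone who does need more than one page from search/tweets, the usual approach is to pass the lowest tweet id seen, minus one, as :max_id on the next request. A minimal sketch of that loop, with the fetch step left as a block so the paging logic stands apart from any particular gem version (the helper name collect_paged and the hash shape are my own, not part of the gem):

```ruby
# Page through search results by repeatedly asking for tweets older than
# the last one seen (:max_id), until a target count is reached.
# `collect_paged` is a hypothetical helper; the block does the actual fetch.
def collect_paged(target, per_page)
  results = []
  max_id = nil
  loop do
    batch = yield(per_page, max_id)   # one page of tweets, newest first
    break if batch.empty?
    results.concat(batch)
    break if results.size >= target
    max_id = batch.last[:id] - 1      # continue strictly below the oldest id
  end
  results.first(target)
end

# With the twitter gem of that era the block might look like (untested sketch):
#   collect_paged(1500, 100) do |count, max_id|
#     opts = { :count => count }
#     opts[:max_id] = max_id if max_id
#     Twitter.search("test", opts).results
#   end
```

Note the API itself caps how far back search/tweets will go, so even a correct paging loop may return fewer results than requested.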
Related
I need to fetch some data but I'm completely stumped after trying a few things.
I want to access Airlines & Destinations from the Albuquerque_International_Sunport's wiki page - keep in mind, I'll be going through a prepopulated list of airports with this data.
There are multiple "types" of Airlines: Passenger, Cargo, sometimes there's other (sub?)sections; other times there are none:
Articles for multiple airports will be accessed automatically - including some less known airports. This means I need to:
Check if "Airlines & Destinations" section exists
Take all data inside of any table
Scrape it; otherwise do nothing
I've tried using the Ruby wikipedia-client gem; however, the .raw_data method isn't even returning the section data:
Next, I went to Wikipedia's API: unless I am mistaken, it doesn't return "section" names! That doesn't seem right, but I wasn't able to get it working.
So I suppose that leaves Nokogiri. I can grab and parse the pages fine, but:
How would I go about detecting the presence of the "Airlines & Destinations" section and getting all table data before the end of that section? I suspect I need some tricky XPath for this.
Seems to be the only viable solution.
Any thoughts welcome. Putting a bounty on this question when I can.
Edit: Perhaps it's better to simply grab a list of all airlines in the world and match them against the page HTML? Seems like it could be computationally expensive.
Well, I'm not an expert user of Nokogiri but maybe this can give you some idea.
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(open("https://en.wikipedia.org/wiki/Albuquerque_International_Sunport"))
# this is the passenger table
page.xpath('//*[@id="mw-content-text"]/div/table[2]/tr').each do |tr|
p tr.text()
puts "-"*50
end
# this is the cargo table
page.xpath('//*[@id="mw-content-text"]/div/table[3]/tr').each do |tr|
p tr.text()
puts "-"*50
end
Using:
getSymbols("LMT")
I get the following returns data:
As can be seen, the adjusted price is very different from the close. Going to Yahoo's site you also see different results:
Here the adjusted price is $77 on the 9th vs. $60 for the getSymbols data.
Any idea why there's a $17 difference, or how to correct it?
Yahoo is just broken in some cases. Sometimes, what is displayed on their web page differs from what their API returns. If you click on the "download data" link, you will see what the Yahoo API returns, and that it matches the quantmod results.
http://chart.finance.yahoo.com/table.csv?s=LMT&a=5&b=1&c=2010&d=5&e=30&f=2010&g=d&ignore=.csv
In this particular case, the API data seems to make more sense. If you add up the dividends (which Yahoo adjusts for, along with splits), you get the adjusted price. You can get the dividends with getDividends("LMT", src = "yahoo", auto.assign = FALSE)
I have seen these internal discrepancies cropping up more and more frequently with Yahoo. Caveat emptor
I just found the same problem while looking into GSPC, but both results (the website's and the API's) disagree with my own extraction using this:
getSymbols('GSPC',src='yahoo',return.class = 'xts',from = Sys.Date()-10,auto.assign=FALSE,to = Sys.Date())
I want to use v5.0 of the Twitter gem, and I can't figure out the documentation to understand how to get a list of followers, given a handle.
It looks like previous versions had a method that looked something like Twitter.follower_ids('ID to lookup'), but that doesn't work any more.
I don't know if there's an easier way to navigate the RDoc documentation, but I had to root around in the code and notice the RDoc comments before realizing that this is the page that documents some of the behavior I wanted.
# configure client with secrets and access keys
client.followers 'screen_name_of_interest'
This returns a certain number of followers, but I still can't figure out how to find the total number of followers, or how to use cursors to retrieve more.
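In v5 of the gem, collection methods like followers and follower_ids return a Twitter::Cursor, which is Enumerable and fetches further pages lazily as you iterate, so each page costs one API call and take/first bound how many requests are made. The total is on the user object (client.user("name").followers_count). A small wrapper, duck-typed so the idea is visible without live credentials (follower_sample is my own name, not part of the gem):

```ruby
# `client.follower_ids(screen_name)` returns a lazily-paginated Enumerable
# in twitter gem v5; `take` stops iterating (and stops issuing API calls)
# once `limit` elements have been yielded.
def follower_sample(client, screen_name, limit: 200)
  client.follower_ids(screen_name).take(limit)
end
```

With a real client you would first check client.user(screen_name).followers_count to know how many pages to expect, and rely on the cursor to handle the next_cursor bookkeeping for you.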
I want to create a complex query, e.g. return the first 100 Twitter Users that match the following criteria:
Have greater than X # of followers
Have greater than X # of tweets
Have the string "Rails developer" or "Rails" in their bio
Have tweeted in the last X days.
I was looking through their API docs and it seems so complex to just get something up and running quickly. I don't want to create a full blown app, I just want something simple that will help me do some research.
Am I overthinking this and it should be easy to do via their API (Ruby preferably) ?
I also don't mind it being run locally, and spitting out a text file or a csv file - but also if there is a nice way to have it spit out a nicely formatted HTML page that would be good too.
I just want to get at the data, that's all.
Your best bet is going to be using the GET users/search API method. You can search on "rails" and page through the results discarding any users who don't match your followers/status requirements. It isn't going to be perfect but in general Twitter tries to return popular/relevant users first.
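A sketch of that filter-and-page loop with a v5-style client. The thresholds and the matches? helper are illustrative, not part of any API; only the predicate runs without live credentials, so the paging part is left as a comment:

```ruby
require 'time'

MIN_FOLLOWERS = 1_000   # "greater than X # of followers"
MIN_TWEETS    = 500     # "greater than X # of tweets"
MAX_AGE_DAYS  = 30      # "tweeted in the last X days"

# Keep users that satisfy all four criteria from the question.
# Works on anything exposing followers_count, statuses_count,
# description, and status (the shape twitter gem user objects have).
def matches?(user, now: Time.now)
  user.followers_count > MIN_FOLLOWERS &&
    user.statuses_count > MIN_TWEETS &&
    user.description.to_s =~ /\brails\b/i &&
    !user.status.nil? &&
    (now - user.status.created_at) < MAX_AGE_DAYS * 86_400
end

# Paging sketch (untested; user_search wraps GET users/search, which
# serves roughly the first 1000 matches at ~20 per page):
#   found, page = [], 1
#   while found.size < 100 && page <= 50
#     found.concat(client.user_search("rails", page: page).select { |u| matches?(u) })
#     page += 1
#   end
```

Dumping `found` to CSV or a text file from there is plain Ruby, which keeps this at the "quick research script" level the question asks for.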
I'm trying to do a Google search using Ruby and print the first 3 results.
Could anyone point me to a sample code? I'm unable to find it.
The gem googleajax is there for that:
require 'googleajax'
GoogleAjax.referer = "your_domain_name_here.com"
GoogleAjax::Search.web("Hello world")[:results][0...3]
Now Google wants you to use this http://code.google.com/p/google-api-ruby-client/ with a limit of 100 courtesy queries, and a pricing structure for anything above that.
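If all you need is the first few hits, the successor (the Custom Search JSON API) is a plain HTTPS endpoint; you need an API key and a search-engine id (cx). The URL below is the documented endpoint, but KEY and CX are placeholders and first_titles is just a local helper:

```ruby
require 'json'
require 'net/http'

# Pull the first n result titles out of a parsed Custom Search response;
# the API puts results under the "items" key, absent when there are none.
def first_titles(response_hash, n = 3)
  (response_hash["items"] || []).first(n).map { |item| item["title"] }
end

# Live call (needs real credentials in place of KEY and CX):
#   uri = URI("https://www.googleapis.com/customsearch/v1?key=KEY&cx=CX&q=hello+world")
#   puts first_titles(JSON.parse(Net::HTTP.get(uri)))
```

Keeping the extraction separate from the HTTP call makes the parsing testable without burning any of the 100 free daily queries.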