Scraping tweets - better to use the site or the API? - ruby

I'm using the twitter gem to build a Twitter bot in Ruby. I'm trying to make it self-sustaining, as it were, so I want it to generate its own content to tweet by scraping tweets from users outside its social circle (and then perhaps garbling them with a Markov chain generator).
Which is the better strategy?
1. Search for tweets via the API
2. Load Twitter pages and scrape the tweets with Hpricot or Nokogiri
Also, how can I try to ensure that the base tweets come from outside my bot's followers' friends, so it's harder to tell it's a bot?
At the moment I use a .yml file with tweets I generated by hand, which is far from ideal.
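To make the garbling step concrete, here is a rough, illustrative word-level Markov sketch (the method name is made up, and a real implementation would need more care with punctuation and tweet length):

# Rough illustration of the Markov "garbling" idea: build a word-level chain
# from the base tweets, then walk it to produce a new tweet-sized string.
def markov_tweet(corpus, length: 12)
  chain = Hash.new { |h, k| h[k] = [] }
  corpus.each do |text|
    text.split.each_cons(2) { |a, b| chain[a] << b }
  end

  word   = chain.keys.sample
  result = [word]
  (length - 1).times do
    word = chain[word].sample or break
    result << word
  end
  result.join(" ")
end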

There are two questions here.
It's always better to use an API where one is available. This will future-proof you against the bot randomly breaking if a simple HTML element is changed, and it also allows the website (i.e. Twitter) to rate-limit your searches in case you put too high a load on the service. Although this is unlikely for Twitter, it's good practice.
Sometimes, the information you want is unobtainable via the API. In that case, you should consider whether you really need to scrape it, and if so, how to limit yourself to stay polite.
Basically, if the API allows you to do what you want, use it for maintainability.
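For the API route, a minimal sketch with the twitter gem might look like this (assuming a recent, 5.x-style Twitter::REST::Client; the search query and credentials are placeholders):

require "twitter"

# Placeholder credentials; register an app to get real ones.
client = Twitter::REST::Client.new do |config|
  config.consumer_key        = "YOUR_CONSUMER_KEY"
  config.consumer_secret     = "YOUR_CONSUMER_SECRET"
  config.access_token        = "YOUR_ACCESS_TOKEN"
  config.access_token_secret = "YOUR_ACCESS_TOKEN_SECRET"
end

# Grab some recent base tweets; rejecting retweets keeps the corpus
# closer to original phrasing for the Markov step.
base_tweets = client.search("some topic", result_type: "recent").take(50).reject(&:retweet?)
corpus      = base_tweets.map(&:text)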
As for your second question, I don't have any experience with the Twitter API. Is there a method to get the Twitter IDs of all your followers, and of the people they follow? If not, you'll be forced to scrape as mentioned earlier - if you really do need this information.
Once you have the list of people your followers follow, you can check whether the ID of the poster of a candidate tweet falls inside this set, and skip it if it does.
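For what it's worth, current versions of the twitter gem do expose follower_ids and friend_ids, so the check could be sketched like this (client and base_tweets as in the earlier sketch; note this costs one API call per follower, so cache the set and mind the rate limits):

require "set"

# Everyone my followers follow: the bot's extended circle.
circle = Set.new
client.follower_ids.each do |follower_id|
  circle.merge(client.friend_ids(follower_id).to_a) # one API call per follower
end

# Keep only base tweets whose author is outside that circle.
outside_tweets = base_tweets.reject { |tweet| circle.include?(tweet.user.id) }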
Would you consider retweeting for this aspect of the bot?

One other thing to note is performance. If you scrape the website, you have to download the entire page and then parse it (which is processor-intensive in itself), as opposed to hitting the API, which returns only JSON/XML data.
So from strictly a performance standpoint, I would go with the API.

Related

Is it possible to fetch real-time data using the Riot Valorant API?

I am making an application that shows real-time status for a Valorant game: players alive, the type of weapon each player has, time remaining, etc.
Is it possible to use Riot Valorant API to do this for live matches or for previously played matches?
As far as I know, you can't. But I think you should try Riot Games' official production API, not the development API.
Let me know if you find anything relevant.
(This is adding onto Sanskar's answer, which I cannot comment on as I lack the required 'reputation')
I'm aware that this is an old question, but for anyone who happens to have stumbled upon it: there is no way to obtain real-time in-game events. There is, however, a way to retrieve certain data from a match, just not in an official way: it goes against Riot Games' ToS on using third-party software. I wouldn't worry about this too much as long as you do not ruin the competitive integrity of the game by giving yourself an in-game advantage over others. I personally have been using this for over a year now and have not received any form of punishment for doing so.
Anyhow, back to the actual question of this thread: check out this document of API endpoints, scraped by monitoring the HTTP traffic of the Riot Client: https://github.com/techchrism/valorant-api-docs/tree/trunk/docs/ You'll need to obtain certain authorization tokens for the Valorant account through whatever methods are available to you (I pray that it is through lawful means :) ), which depends heavily on the type of endpoint. There are wrappers for these endpoints already made by other users somewhere on GitHub, and you can always ask for help in the small community of developers using these endpoints, via the README of the GitHub page above.
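As a very rough sketch of what calling one of those unofficial endpoints looks like, assuming the URL shape and header names from the community docs linked above (none of this is an official Riot API, and the shard, match ID, and tokens are placeholders you must obtain yourself):

require "net/http"
require "json"
require "uri"

shard    = "na"             # assumed shard; see the linked docs
match_id = "YOUR_MATCH_ID"  # placeholder
uri = URI("https://pd.#{shard}.a.pvp.net/match-details/v1/matches/#{match_id}")

req = Net::HTTP::Get.new(uri)
req["Authorization"]           = "Bearer YOUR_ACCESS_TOKEN"   # obtained out of band
req["X-Riot-Entitlements-JWT"] = "YOUR_ENTITLEMENTS_TOKEN"    # likewise

res   = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
match = JSON.parse(res.body)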
REMEMBER TO NOT DO ANYTHING THAT WOULD CREATE AN UNFAIR ADVANTAGE, OR ANYTHING ELSE THAT A RIOT EMPLOYEE WOULD NOT APPROVE OF USING THIS :)

How to implement complex Web API queries in ASP Core

I'm new to web API design, so I've tried to learn the best practices of web API design from these articles:
1. Microsoft REST API Guidelines
2. Web API Design: Crafting Interfaces that Developers Love, from "Apigee"
Apigee recommends that web API developers follow these recommendations to build better APIs.
I quote here two of the recommendations:
I need C# code that implements these recommendations in my Web APIs (in ASP Core), which serve as a back-end for native mobile apps and an AngularJS web site.
Sweep complexity behind the ‘?’
Most APIs have intricacies beyond the base level of a resource. Complexities can include many states that can be updated, changed, or queried, as well as the attributes associated with a resource.
Make it simple for developers to use the base URL by putting optional states and attributes behind the HTTP question mark. To get all red dogs running in the park:
GET /dogs?color=red&state=running&location=park
Partial response allows you to give developers just the information they need.
Take, for example, a request for a tweet on the Twitter API. You'll get much more than a typical Twitter app often needs, including the name of the person, the text of the tweet, a timestamp, how often the message was re-tweeted, and a lot of metadata.
Let's look at how several leading APIs handle giving developers just what they need in responses, including Google, which pioneered the idea of partial response.
LinkedIn
/people:(id,first-name,last-name,industry)
This request on a person returns the ID, first name, last name, and the industry.
LinkedIn does partial selection using this terse :(...) syntax, which isn't self-evident. Plus, it's difficult for a developer to reverse-engineer its meaning using a search engine.
Facebook
/joe.smith/friends?fields=id,name,picture
Google
?fields=title,media:group(media:thumbnail)
Google and Facebook have a similar approach, which works well.
They each have an optional parameter called fields, after which you put the names of the fields you want to be returned.
As you see in this example, you can also put sub-objects in responses to pull in other information from additional resources.
Add optional fields in a comma-delimited list
The Google approach works extremely well.
Here's how to get just the information we need from our dogs API using this approach:
/dogs?fields=name,color,location
Now I need C# code that handles these kinds of queries, or even more complex ones, like this:
api/books?publisher=Jat&writer=Tom&location=LA&fields=title,ISBN&$orderBy=location desc,writer&limit=25&offset=50
So web API users will be able to send any kind of request they want, with different complexities, fields, ordering, and so on, based on their needs.
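The question asks for C#, but the parsing-and-projection pattern itself is language-agnostic. Here is a sketch of the idea in Ruby, with a made-up in-memory BOOKS collection standing in for a real data source:

require "cgi"

# Made-up data standing in for the books table.
BOOKS = [
  { title: "A", isbn: "111", publisher: "Jat", writer: "tom", location: "LA" },
  { title: "B", isbn: "222", publisher: "Jat", writer: "ann", location: "NY" },
]

query  = "publisher=Jat&fields=title,isbn&orderBy=location&limit=25&offset=0"
params = CGI.parse(query).transform_values(&:first)

fields = (params.delete("fields")  || "").split(",").map(&:to_sym)
order  = params.delete("orderBy")&.to_sym
limit  = (params.delete("limit")   || "25").to_i
offset = (params.delete("offset")  || "0").to_i

# Whatever is left in params is treated as an equality filter.
result = BOOKS.select { |b| params.all? { |k, v| b[k.to_sym].to_s == v } }
result = result.sort_by { |b| b[order] } if order
result = result.drop(offset).first(limit)
result = result.map { |b| b.slice(*fields) } unless fields.empty?
# => [{ title: "A", isbn: "111" }]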

Twitter - how to get user's timeline

My app should, in one of its parts, reproduce the same behaviour as a web page where you can find a section with a table of Twitter posts; I guess they are a user's timeline. I took a look at the Twitter APIs and found a call that could return it, but if I got it right, you are supposed to be authenticated with that user's credentials. Is there a way to achieve it without being that user (thus without using that user's credentials)? If not, do we have to assume that web plugins have more flexibility than queries which return XML or JSON? Which kind of approach fits best, considering the app needs to support iOS from 4.3 to 6.x? Does Twitter+OAuth provide more flexibility than direct Twitter API calls?
Hm, if you are looking to just display a user's feed, you can do it as simply as:
https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=reMakeIn&count=200
Change screen_name to the user whose feed you want to show.
No need whatsoever to use authentication for this.
Not sure if this is what you want to achieve, but I use this approach to show a random user's tweet feed.
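For completeness, consuming that feed from Ruby takes only the standard library, assuming the unauthenticated v1 endpoint above is still being served (it has long since been retired, so treat this as illustrative):

require "open-uri"
require "rss"

url  = "https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=reMakeIn&count=200"
feed = RSS::Parser.parse(URI.open(url).read, false) # false = skip strict validation

feed.items.each do |item|
  puts item.title # one tweet per item
end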

How to fill out AJAX form programmatically and scrape results?

Basically, I want to use the Facebook Ads Manager Tool to estimate the number of users targeted by a particular set of targeting parameters. I know there is a published API available, but it is only usable if you are on their advertising application "whitelist." I am sure what I am asking is possible. Plus, it would be interesting to learn more about scraping.
Facebook's Ads Manager Tool is basically an AJAX UI for their ads API. In the process of creating a campaign, you can specify targeting parameters, and the page will dynamically report the number of users targeted as you modify the parameters. From what I've read on the web and here on Stack Overflow, it is possible to use Firebug or a similar tool to pick apart which requests are being made by the page and to where, and then to mimic these calls to get the information you want.
I'm having trouble interpreting the panels of Firebug. I think the URI I'm trying to send a request to is www.facebook.com/ajax/inventory_estimator.php, though I'm not sure how to form a call.
So, if I want to write a script or program that takes a list of words to use as keywords and returns the estimated number of users for each keyword, how could I do it?
Link to Facebook's Ads Manager Tool, Campaign Creation Page:
http://www.facebook.com/ads/create
Yes, using an extension like Firebug to examine the HTTP requests is a good way to do this.
The Net tab is the one you want (the last one).
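Once the Net tab shows you the request, you can replay it from a script. A hedged sketch follows: the endpoint is the one guessed at in the question, and the parameter name is a pure placeholder; copy the real names and values out of Firebug.

require "net/http"
require "uri"

uri = URI("https://www.facebook.com/ajax/inventory_estimator.php")

%w[guitar piano drums].each do |keyword|
  # "keyword" is a hypothetical parameter name; use whatever Firebug shows.
  res = Net::HTTP.post_form(uri, "keyword" => keyword)
  puts "#{keyword}: #{res.code} #{res.body[0, 80]}"
end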
Have you tried the iRobotSoft web scraper? It has good AJAX support.
Check their forum here: http://irobotsoft.org/bb/YaBB.pl

How to consume Facebook's "autocomplete anything" suggest-style dropdown

When you go to edit your favorite music or movies on Facebook, you will notice an autocomplete suggest list that is basically a list of "everything" (brand names, music artists, movies, etc.) How can someone consume that list in their own code? Is it part of the Facebook API?
They wrap some of the functionality in their FBML fields, but their developer wiki shows how they do what they do. If you want to consume their data, though, you're going to have to play with an HTTP proxy and figure out what parameters to send to their server. There are also a couple of parameters that seem to be session-based, so I don't know how well you'll be able to integrate this into your own application.
This was working for a while, but now they require the session cookie, so we'll have to hope they add support for this to the Graph API, unless you want to fight with the proxy.
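A hedged sketch of replaying such a request with the session cookie attached (the path and query parameter are placeholders; copy the real ones from your proxy capture):

require "net/http"
require "uri"

# Placeholder path and parameter; take the real ones from the proxy.
uri = URI("https://www.facebook.com/ajax/typeahead/search.php?value=radiohead")

req = Net::HTTP::Get.new(uri)
req["Cookie"] = "YOUR_SESSION_COOKIE" # required, per the note above

res = Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(req) }
puts res.body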
