What is an optimal algorithm to follow all Twitter users using the Twitter API? I have been racking my brain over this issue and cannot find an optimal iterative approach.
Thanks in advance for any suggestions.
Setting aside "why would you do such a thing?", "this will get your IP banned", etc.:
This shouldn't be all that different from writing a web crawler. I would start off by finding a few root accounts and throwing their follows/followers into a priority queue ordered by the number of follows/followers each user has, ignoring anyone you've already visited. Then keep popping the user with the most new follows/followers off the queue and visiting them, updating the queue as you go.
Again, this sounds like a terrible idea to implement in practice. Twitter had 190 million users in July 2010!
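Still, if you want to try it, here is a minimal sketch of that crawl in Python. The get_follower_count() and get_followers() helpers are hypothetical stand-ins for whatever Twitter client you use; this is just the priority-queue idea, not a production crawler.

import heapq

def crawl(seed_users, get_followers, get_follower_count):
    visited = set()
    # Heap entries are (-follower_count, user_id) so Python's min-heap
    # pops the user with the most followers first.
    heap = [(-get_follower_count(u), u) for u in seed_users]
    heapq.heapify(heap)

    while heap:
        _, user = heapq.heappop(heap)
        if user in visited:
            continue
        visited.add(user)
        for follower in get_followers(user):
            if follower not in visited:
                heapq.heappush(heap, (-get_follower_count(follower), follower))
    return visited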
As long as you have a theoretical machine, so that time and the number of API calls don't matter, the solution is simple. Every user has a unique numeric ID. A user I am following who created his account last week has an ID of 229,863,592, so let's use 250,000,000 as the theoretical end point. You can start with an ID of 1 and use the API to follow each user from 1 to 250,000,000. Anyone who has deleted their account or has been suspended will return an error when you try to follow them. The Twitter API endpoint for following a user by ID is:
http://dev.twitter.com/doc/post/friendships/create
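A minimal sketch of that brute-force loop, using tweepy as an example client (the credentials are placeholders, and in practice rate limits make this infeasible, as noted above):

import tweepy

# Placeholder credentials; authenticate however your app normally does.
auth = tweepy.OAuth1UserHandler(
    "CONSUMER_KEY", "CONSUMER_SECRET",
    "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
)
api = tweepy.API(auth)

THEORETICAL_MAX_ID = 250_000_000  # the theoretical end point from above

for user_id in range(1, THEORETICAL_MAX_ID + 1):
    try:
        api.create_friendship(user_id=user_id)  # follow this numeric user ID
    except tweepy.TweepyException:
        # Deleted or suspended accounts (and rate-limit errors) raise here; skip them.
        pass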
One customer had a problem where an incorrect email (from another customer) was assigned to a case. The incorrectly assigned email is a response to a case that was deleted. However, the current case has the same tracking token as the deleted one: it appears that the CRM system reuses a tracking token as soon as it becomes available again. This should not happen! In our view, this is a real programming error on Microsoft's part. The only solution we see is to increase the number of digits to the maximum so that it takes longer until all tracking tokens are used up. But eventually you still reach the limit.
Is there another option, or has Microsoft really made a big mistake in the way emails are assigned?
We also activated Smart Matching, but that didn't help in this case either, because the match was made via the tracking token first.
Thanks
The structure of the tracking token can be configured and is set to 3 digits by default. This means that once 999 emails have been processed, the tracking token counter starts again at 1, which is essentially a design flaw on Microsoft's part.
If automatic replies are enabled, that limit is reached very quickly. We therefore had to increase the width to 9 digits, which is still not a 100% solution: at some point that number of emails is also reached, and emails are again attached to cases that do not belong together. Microsoft needs to come up with a different solution.
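To illustrate the wrap-around (assuming the token counter simply restarts once every value has been used, which is the behaviour described above):

DIGITS = 3                   # default tracking token width
CAPACITY = 10 ** DIGITS - 1  # 999 distinct tokens with 3 digits

def token_for(nth_email):
    # The counter restarts at 1 after CAPACITY emails, so email #1 and
    # email #(CAPACITY + 1) end up with the same tracking token.
    return str((nth_email - 1) % CAPACITY + 1).zfill(DIGITS)

print(token_for(1), token_for(1000))  # both print "001"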
Trying to get all assignments for a given student, but I cannot find a reliable (fast) way to do it.
It seems like the only way would be:
Get the student's courses via courses.list
Loop through the courses list and call courses.courseWork.list for each
Say that on average a student has 10 courses; then 10 requests have to be made. But this takes a while and is kind of overkill...
I would like to know if I am missing something, is there a better way?
I guess you are the user who posted the last comment in this Feature Request. Unfortunately, the method you described is the only way.
For anyone who faces the same issue: in the Feature Request, you can click the star next to the issue number to receive updates and to give the request more priority.
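For reference, a minimal sketch of that two-step approach with the google-api-python-client, where creds is assumed to be an authorized Classroom credential (pagination omitted for brevity):

from googleapiclient.discovery import build

service = build("classroom", "v1", credentials=creds)

# Step 1: list the student's courses
courses = service.courses().list(studentId="me").execute().get("courses", [])

# Step 2: one courseWork.list call per course
assignments = []
for course in courses:
    response = service.courses().courseWork().list(courseId=course["id"]).execute()
    assignments.extend(response.get("courseWork", []))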
I noticed a huge discrepancy in the count of sessions for one of our experiments in Google Analytics.
The API says 3,123 sessions for variation 0 and 3,039 for variation 1.
GA API screenshot
At the same time the report in google.com/analytics reads 5,743 for variation 0 and 5,620 for variation 1.
GA Web screenshot
The above data is:
- on the exact same dates
- with no filters
- with no segments
- on the same Google Analytics view id
Could you please help me figure this out?
Thanks,
V.
The thing is that ga:sessions in the Query Explorer (API) is not the same metric as experiment sessions in the Content Experiments interface.
Read this for more information on experiment conversion rate and sessions calculation:
https://support.google.com/analytics/answer/6112437#
Here is a quotation:
Conversion rate is calculated using the same methodology as Analytics: total converted visits divided by total visits (once a user becomes a part of an experiment). A user is considered part of an experiment once he or she has seen the experiment page. For example, if a user sees the experiment page, then comes back the next day, the second visit is counted, even if the user does not view the experiment page again.
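If you want to reproduce the API-side numbers, here is a sketch with the Core Reporting API v3; VIEW_ID, the dates, and EXPERIMENT_ID are placeholders, and creds is assumed to be an authorized credential. As explained above, these figures will still differ from the Content Experiments report, which counts sessions per its own methodology.

from googleapiclient.discovery import build

analytics = build("analytics", "v3", credentials=creds)
result = analytics.data().ga().get(
    ids="ga:VIEW_ID",
    start_date="2016-01-01",
    end_date="2016-01-31",
    metrics="ga:sessions",
    dimensions="ga:experimentVariant",
    filters="ga:experimentId==EXPERIMENT_ID",
).execute()

for variant, sessions in result.get("rows", []):
    print(variant, sessions)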
I can't find any information about invoices/billing in the Facebook Ads API documentation.
Is it possible at all to get Facebook ad account invoice data using any api?
I know I can get the amount spent by an ad account per day. Unfortunately, this is not always identical to the amount on the invoices.
Unfortunately I don't think there is an API for this.
Spend is a little tricky to get right. Facebook can retroactively change the numbers, apply credits to your account, and so on, so you can't really rely on any spend numbers retrieved via the API being 100% accurate.
We ended up setting up a task that runs every night and retrieves spend data for the past 28 days. This was the only way we could ensure that our numbers matched FB's exactly. If you require a similar level of accuracy you might want to consider setting up something along those lines.
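A sketch of that nightly backfill with the facebook_business SDK; the credentials and account ID are placeholders. Re-pulling the last 28 days on every run lets retroactive adjustments and credits flow into your stored numbers.

from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount

FacebookAdsApi.init("APP_ID", "APP_SECRET", "ACCESS_TOKEN")  # placeholders

account = AdAccount("act_<AD_ACCOUNT_ID>")
insights = account.get_insights(
    fields=["spend"],
    params={"date_preset": "last_28d", "time_increment": 1},  # one row per day
)

for row in insights:
    print(row["date_start"], row["spend"])  # upsert into your own store here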
I am making an app with the Twitter API that needs to use the Twitter REST API to fetch the followers of a user frequently. But Twitter has a restriction of 350 API requests per hour, and my app in its current state is sure to exceed that. What kind of caching strategy should I employ to reduce the number of API calls I make, improve the speed of my app, and stay within Twitter's policies?
Abstract your access to the Twitter API and do something along these lines:
If last call to Twitter at least 12 seconds ago
    Make new call to Twitter and store returned info
    Set Timestamp
else
    Return last stored data
endif
This means that only one part of your program needs to know about the restriction and all other parts can treat the data as having come fresh from Twitter.
In light of your comment, the above pseudo-code becomes:
If last call to Twitter at least 12 seconds ago
    Make new call to Twitter and save follower list in DB
    Set Timestamp
endif
Return follower list from DB
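A minimal Python sketch of that DB-backed version. fetch_followers_from_twitter() and the db helpers are hypothetical stand-ins for your API wrapper and storage layer; a 12-second interval keeps you at roughly 300 calls per hour, under the 350 limit.

import time

MIN_INTERVAL = 12  # seconds between Twitter calls
_last_call = 0.0

def get_followers(user_id, fetch_followers_from_twitter, db):
    global _last_call
    now = time.time()
    if now - _last_call >= MIN_INTERVAL:
        followers = fetch_followers_from_twitter(user_id)  # fresh API call
        db.save_followers(user_id, followers)              # refresh the cache
        _last_call = now
    return db.load_followers(user_id)                      # always serve from the DB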
I would be inclined to have this sort of structure in one table, at least at first:
twitter_id
.
.
whatever else you want to store about the person
.
.
followers VARCHAR space-separated list of follower IDs
Obviously, this would be a simplistic approach, but on the basis of 'the simplest thing that works' it would be fine.
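Just to show the "simplest thing that works", here is a sketch of that one-table layout using sqlite3; the schema and helper names are only illustrative.

import sqlite3

conn = sqlite3.connect("twitter_cache.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS users (
           twitter_id INTEGER PRIMARY KEY,
           -- ...whatever else you want to store about the person...
           followers  TEXT  -- space-separated list of follower IDs
       )"""
)

def save_followers(twitter_id, follower_ids):
    conn.execute(
        "INSERT OR REPLACE INTO users (twitter_id, followers) VALUES (?, ?)",
        (twitter_id, " ".join(str(f) for f in follower_ids)),
    )
    conn.commit()

def load_followers(twitter_id):
    row = conn.execute(
        "SELECT followers FROM users WHERE twitter_id = ?", (twitter_id,)
    ).fetchone()
    return [int(f) for f in row[0].split()] if row and row[0] else []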