What's Google's position on programmatically querying PageRank?

Is there a public API for programmatically querying PageRank? If so, at what query volumes would a service be permitted to use it?

From my experience, it's OK to query PageRank as long as you leave one-second intervals between requests; otherwise Google blocks your IP address after a few queries. If you're using .NET there is a library to help query PageRank programmatically; you can find it here.
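For what it's worth, the pacing idea above is just a fixed delay between requests. A minimal sketch in Python (the endpoint is a pure placeholder: the old toolbar PageRank service was unofficial and has since been shut down):

```python
import time
import requests  # third-party HTTP client: pip install requests

# Placeholder endpoint: the toolbar PageRank service was unofficial and
# no longer exists, so this URL is purely illustrative.
QUERY_URL = "http://example.com/pagerank-lookup"

def query_pagerank(urls):
    """Query a hypothetical PageRank endpoint, pausing 1s between requests."""
    results = {}
    for url in urls:
        resp = requests.get(QUERY_URL, params={"q": url}, timeout=10)
        results[url] = resp.text.strip()
        time.sleep(1)  # the one-second spacing recommended above
    return results
```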

I just tried a PageRank tool, and the answer from Google is clear:
http://www.google.com/support/websearch/bin/answer.py?&answer=86640&hl=en
It says:
"If you tried the steps above and haven't resolved the issue, it's very likely that a user or a computer in your network is sending automated traffic to Google. Your network administrator may be able to locate and shut down the source of the automated traffic; feel free to refer them to this page. Sending automated queries of any sort to Google is against our Terms of Service. This includes, among other things, the following activities:
Using any software that sends queries to Google to determine how a website or webpage ranks on Google for various queries"
So the answer is no: Google forbids and blocks those kinds of queries!

This is from Google's guidelines page:
http://www.google.com/support/webmasters/bin/answer.py?answer=35769
"Don't use unauthorized computer
programs to submit pages, check
rankings, etc. Such programs consume
computing resources and violate our
Terms of Service. Google does not
recommend the use of products such as
WebPosition Gold™ that send automatic
or programmatic queries to Google"
So I guess that is a no.
But I found this: http://www.fourmilab.ch/webtools/PageRank/ that you could try. I do not know if it is "legal".

Related

Google APIs - API Key not Counting Against Queries

I've come to you today in hopes of getting some support regarding the Google Distance Matrix API. Currently I'm using it in a very simple way, with a web service request through an HTTP interface, and am having no problems getting results. Unfortunately my project seems to be running into query limits due to the 2,500-query quota. I have added billing to the project to allow going over 2,500 queries, and the increased quota is reflected in my project. What's funky, though, is that the console is not showing any usage, so I'm not sure whether these requests are being run against what I have set up.
I am using a single API key for the project, which is present in my requests, and as I said before the requests ARE working. But again, I'm hoping someone can shed some light on why I might not be seeing my queries reflected in my usage, and on how I can verify that my requests are being run under the project to which I have attached billing.
If there is any information I can provide to help assist in finding an answer, please feel free to let me know and I'll be happy to give what information I can.
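For reference, the kind of request described above looks roughly like this (the coordinates are made up, and YOUR_API_KEY is a placeholder; the key parameter is what ties usage to the billed project):

```python
import requests

# The Distance Matrix web service endpoint; the key parameter ties the
# request to the project whose quota and billing should be charged.
ENDPOINT = "https://maps.googleapis.com/maps/api/distancematrix/json"

params = {
    "origins": "40.6655101,-73.8918896",    # made-up example coordinates
    "destinations": "40.6905615,-73.9976592",
    "key": "YOUR_API_KEY",                  # placeholder
}

resp = requests.get(ENDPOINT, params=params, timeout=10)
data = resp.json()
print(data["status"])  # "OK" on success; other statuses indicate key/quota issues
```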
After doing some digging I was able to find the following relevant thread to answer my question:
Google API Key hits max request limit despite billing enabled

How to handle request traffic of a background location update application

I am working on a family networking app for Android that enables family members to share their location and track the location of others simultaneously. You can assume this app is similar to Life360 or Sygic Family Locator. At first I decided to use an MBaaS, and I completed the coding using Parse. However, I realized that although each user reads and writes geolocation data about once per minute (in some cases geolocation data is sent less frequently), the request traffic exceeds my forward-looking expectations. For this reason I want to build a well-grounded system, but I have doubts about whether Parse can still do its duty if the number of users increases to 100-500k.
Considering all this, I am looking for an alternative method/service for building such a system. I think using a backend service like Parse is a moderate solution, but not the best one. What are the possible ways to achieve this, from bad to good? To give an example, one of my friends says I could use Sinch, an instant-messaging service running in the background between users, which sets its price based on the number of active users. Nevertheless, that sounds strange to me; I have never seen an instant-messaging service used that way.
Your comments and suggestions will be highly appreciated. Thank you in advance.
Well, Sinch wouldn't handle location updates or storage of location data; that would be Parse you are asking about.
And since you implied that the requests would be too much for your liking, maybe I wrongly assumed price was the problem with Parse.
But to answer your question about sending location data: if I were you I would probably throttle it to a mile or so. There's no need for family members to know your position down to the foot in real time. If there is a need for that, I would probably implement a request method instead and ask for the user's location only when someone is interested.
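As a rough, platform-agnostic illustration of that throttling idea (the one-mile threshold is just the figure suggested above; on Android you would feed this from your location callback):

```python
import math

MILE_METERS = 1609.34  # the "mile or so" threshold suggested above

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in meters."""
    r = 6371000  # Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

class LocationThrottle:
    """Only forward an update once the user has moved far enough."""
    def __init__(self, min_distance_m=MILE_METERS):
        self.min_distance_m = min_distance_m
        self.last = None  # (lat, lon) of the last update actually sent

    def should_send(self, lat, lon):
        if self.last is None:
            self.last = (lat, lon)
            return True
        if haversine_m(*self.last, lat, lon) >= self.min_distance_m:
            self.last = (lat, lon)
            return True
        return False  # moved less than the threshold: skip this request
```

Every skipped update is one fewer backend request, which is exactly where the traffic savings come from.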

Using the geocoder gem to make more requests

I am working with the geocoder gem and would like to process a larger number of requests from one IP. By default the Google API allows only 2,500 requests per day.
Please share your thoughts on how I can make more requests than the limit allows.
As stated before: using only the Google API, the only way around the limitation is to pay for it. Or, in a more shady way, make the requests from more than one IP/API key, which I would not recommend.
But to stay on the safe side I would suggest mixing services, since there are a few more geocoding APIs out there, for free.
With the right gem mixing them is also not a big issue:
http://www.rubygeocoder.com/
It supports a couple of them with a nice interface. You would pretty much only have to add some rate-limiting counters to make sure you stay within the limits of each provider.
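Those counters could look something like this sketch (the provider names and daily caps are made up; the actual lookups would go through whichever geocoding backends you configure in the gem):

```python
import time

class ProviderRotator:
    """Pick a geocoding provider, skipping any that has hit its daily cap."""
    def __init__(self, daily_limits):
        self.daily_limits = dict(daily_limits)   # provider name -> daily cap
        self.counts = {name: 0 for name in daily_limits}
        self.day = time.strftime("%Y-%m-%d")

    def _roll_day(self):
        today = time.strftime("%Y-%m-%d")
        if today != self.day:                    # reset counters at midnight
            self.day = today
            self.counts = {name: 0 for name in self.counts}

    def pick(self):
        self._roll_day()
        for name, cap in self.daily_limits.items():
            if self.counts[name] < cap:
                self.counts[name] += 1
                return name                      # caller geocodes via this provider
        raise RuntimeError("all providers exhausted for today")

# Made-up providers and caps, purely illustrative:
rotator = ProviderRotator({"google": 2500, "other_provider": 10000})
provider = rotator.pick()  # then route the lookup to that provider
```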
Or go the heavyweight route of implementing your own geocoding, for example with your own running OpenStreetMap database. The database can be downloaded here: http://wiki.openstreetmap.org/wiki/Planet.osm#Worldwide_data
Which way is best depends on what your actual requirements are and what resources you have available.

Are there performance issues with being a client of your own API?

Take Twitter, for example: they serve twitter.com as a client of their own API. Could this be one of the reasons why Twitter is quite 'slow'?
Reference: http://engineering.twitter.com/2010/09/tech-behind-new-twittercom.html
Would you recommend using your own API for your main website/app?
If using your own API is OK, what are the ways to avoid performance issues?
Regarding using your own API: it's about trade-offs. In the Twitter example, by using their own API they were able to "allocate more resources to the API team." For them that benefit outweighed a performance hit. There are other benefits not mentioned there either, like being the first to vet your API and having a single, unified entry point into the system. There are drawbacks as well, which are mentioned in the link you posted.
For your application you should look at the architectural qualities you want to achieve and balance that with the constraints you are given and make your own choice. If ultra high performance is at the top of the list then craft your solution to meet that goal.
Regarding performance when using your own API: again, it depends. In the Twitter case they knew they would be accessing the API from JavaScript, so the physical hops are Browser --> Server --> DB. There is no way to get around these hops if you are doing client-server development. The link you posted talks about going directly to the DB. Yes, that would be faster, but I'm not sure how you would do that from a JavaScript client. I suppose if they had used WebSockets to a custom API that would have been faster, but at what development cost?
Summary: it's not that using their own API was the performance hit; it's that they wanted the client to be an HTTP hop away.
Please note that none of these comments address what the server --> DB calls look like, the caching strategy, or any of the other dozen things that could be a bottleneck.
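To make the hop trade-off concrete, here is a small sketch (all names hypothetical) contrasting an in-process call with dogfooding your own HTTP API:

```python
import requests

def get_timeline_from_db(user_id):
    """Stand-in for a direct service-layer/DB call: one hop fewer."""
    return [{"id": 1, "text": "hello"}]  # pretend this hit the database

def get_timeline_via_api(user_id):
    """Dogfooding: the site calls its own public API over HTTP.
    Extra serialization and a network hop, but a single vetted entry point."""
    resp = requests.get(
        f"https://api.example.com/1/timeline/{user_id}.json",  # hypothetical URL
        timeout=5,
    )
    return resp.json()
```

The second function pays for the hop on every call; whether that cost is worth the architectural benefits is exactly the trade-off discussed above.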

Best traffic / performance / usage monitoring module?

Are there any open source (or I guess commercial) packages that you can plug into your site for monitoring purposes? I'd like something that we can hook up to our ASP.NET site and use to provide reporting on things like:
performance over time
current load
page traffic
SQL performance
CPU time monitoring
Ideally in C# :)
With some sexy graphs.
Edit: I'd also be happy with a package that I can feed statistics and views of data to, and it would analyse trends, spot abnormal behaviour (e.g. "no one has logged in for the last hour. is this Ok?", "high traffic levels detected", "low number of API calls detected") and generally be very useful indeed. Does such a thing exist?
At my last office we had a big screen that showed loads and loads of performance counters over a couple of time ranges, and we could spot weird stuff happening, but the data was not stored and there was no way to report on it. It's a package for doing this that I'm after.
It should be noted that Google Analytics is not an accurate representation of web site usage. This is because the web beacon (web bug) used on the page does not always load, for these reasons:
Google Analytics servers are called by millions of pages every second and cannot always process the requests in a timely fashion.
Users often browse away from a page before it has fully loaded, so there is not enough time to load Google's web beacon and record a hit.
Google Analytics requires JavaScript, which can be disabled.
Quite a few (though not a substantial number of) people block google-analytics.com in their browsers, myself included.
The physical log files are the best 'real' representation of site usage, as they record every request. Alternatively, there are far better 'professional' packages, of which Omniture is my favourite, with much better response times, alternative methods for recording actions, and more functionality.
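For instance, counting raw hits from the server logs is straightforward; a sketch assuming the common/combined log format that Apache and nginx write by default:

```python
import re
from collections import Counter

# Matches the request line and status of the common log format, e.g.
# 127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')

def page_hits(log_path):
    """Count successful (2xx) requests per path from an access log."""
    hits = Counter()
    with open(log_path) as f:
        for line in f:
            m = LOG_RE.search(line)
            if m and m.group(2).startswith("2"):
                hits[m.group(1)] += 1
    return hits

# print(page_hits("/var/log/nginx/access.log").most_common(10))
```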
If you're after things like server data, would RRDTool be something worth looking at?
It's not really a web-server stats program, though, and I have no idea how it would scale.
Edit:
I've also just found Splunk Swarm, if you're interested in something that looks "cool".
Google Analytics is free (up to 50,000 hits per month, I think) and is easy to set up with just a little JavaScript snippet to insert into your header or footer, and it has great detailed reports with some very nice graphs.
Google Analytics is quick to set up and provides more sexy graphs than you can shake a stick at.
http://www.google.com/analytics/
Not invented here, but it's on my todo list to set up.
http://awstats.sourceforge.net/
@Ian: looks like they've raised the limit. Not very surprising; it is Google, after all ;)
This free version is limited to 5 million pageviews a month - however, users with an active Google AdWords account are given unlimited pageview tracking.
http://www.google.com/support/googleanalytics/bin/answer.py?hl=en&answer=55543
http://www.serverdensity.com/
One option is to use external monitoring tools, which monitor web performance from outside the firewall by simulating end-user activities.
Catchpoint Systems has an interesting approach that requires very little coding and gives you performance stats both from outside the datacenter and from inside ASP.NET (like processing time, etc.).
http://www.catchpoint.com/products.html
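A trivial version of that outside-in probing can even be rolled by hand (the URL is a placeholder; commercial tools add geographic distribution, alerting, and far more):

```python
import time
import requests

def probe(url="https://www.example.com/"):  # placeholder URL
    """Time one end-user-style request from outside the firewall."""
    start = time.monotonic()
    resp = requests.get(url, timeout=30)
    elapsed_ms = (time.monotonic() - start) * 1000
    return resp.status_code, round(elapsed_ms, 1)

while True:
    status, ms = probe()
    print(f"status={status} latency={ms}ms")
    time.sleep(60)  # one synthetic check per minute
```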
