How do Market Samurai and Long Tail Pro handle retrieving the top 10 Google search results for a keyword? - google-api

I'm curious to know how Market Samurai, Long Tail Pro and other software handle retrieving the top 10 Google search results without running into limits. It appears that these software packages use the user's own Google account. Google Custom Search limits users to 100 queries per day (the free limit), but people tend to do keyword research on hundreds or even thousands of keywords per day without paying Google anything extra.
Are they paying extra for this service, are they using a different API (perhaps the AdWords API?), or are they scraping the Google search results page (a violation of the TOS)? I'd really like to know! Thanks.

I have done this in one of my projects (in Java).
It is very simple: in Java there is a library called Jsoup that lets you send a GET request to Google, for example:
https://www.google.co.in/search?q=<your url encoded search term>
Note that the term must go in the query string; anything after a # in a URL is a fragment and is never sent to the server.
This returns the HTML of the Google search results page for your term.
With Jsoup you can find a specific HTML tag by its class or id, which lets you extract the URL, title and description of each result.
For a working example check here; in that example you can extract Google search result links for a custom search term.
I hope this helps.
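The answer uses Jsoup in Java; the same idea can be sketched in Python with only the standard library. This is a minimal sketch, not the answerer's code: the class name `result-link` and the sample HTML are made up for illustration, since the real markup of a Google results page is not given here and changes over time. It assumes the plain `/search?q=` form of the URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_search_url(term, host="www.google.co.in"):
    # urlencode percent-encodes the term and turns spaces into '+'.
    return "https://" + host + "/search?" + urlencode({"q": term})


class LinkExtractor(HTMLParser):
    """Collects (href, text) for every <a> tag carrying a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


# Hypothetical markup standing in for a downloaded results page.
SAMPLE_HTML = """
<div id="results">
  <a class="result-link" href="https://example.com/a">First <b>result</b></a>
  <a class="other" href="https://example.com/skip">Skipped</a>
  <a class="result-link big" href="https://example.org/b">Second result</a>
</div>
"""

parser = LinkExtractor("result-link")
parser.feed(SAMPLE_HTML)
for href, title in parser.links:
    print(href, "|", title)
```

In a real run the HTML would come from a GET request to `build_search_url(term)`, and the class to match would have to be read off the page actually returned.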

Related

Correct way to search videos with multiple keywords with OR condition for youtube search API

I'm trying to use the YouTube Data API (search and videos) in my web application to display the most-viewed videos related to several keywords. I'm planning to use two calls in total: the first gets a list of IDs from the search API, and the second gets details for those IDs from the videos API.
My question concerns the search API. Based on trial and error, if I pass multiple space-separated keywords in the q parameter, it appears to behave as an AND condition, which differs from the usual behavior of, say, Google search. To search for multiple keywords with an OR condition, it seems to work if I put OR between the keywords, but I would like to confirm officially, if possible, that my assumption is correct.
I expected to find this in the official documentation, but had no luck. It would be very helpful if you could share the relevant links, if they exist, or give me the official answer.
By the way, this is my first post on Stack Overflow. If my question is missing anything, please kindly advise.
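For reference, the two-call flow described above could be sketched as follows. To the best of my knowledge the documentation of the q parameter for search.list describes a pipe (|) as the OR operator, which has to be URL-escaped as %7C; treat that as something to verify against the current reference rather than as an official answer. The function names and the placeholder API key and video IDs are made up for this sketch, and it only builds the request URLs without sending them.

```python
from urllib.parse import urlencode

API_BASE = "https://www.googleapis.com/youtube/v3"


def build_youtube_search_url(keywords, api_key, max_results=10):
    # First call: search.list, keywords joined with '|' for an OR search.
    params = {
        "part": "snippet",
        "q": "|".join(keywords),
        "type": "video",
        "order": "viewCount",
        "maxResults": max_results,
        "key": api_key,
    }
    return API_BASE + "/search?" + urlencode(params)


def build_youtube_videos_url(video_ids, api_key):
    # Second call: videos.list for the IDs returned by the first call.
    params = {
        "part": "snippet,statistics",
        "id": ",".join(video_ids),
        "key": api_key,
    }
    return API_BASE + "/videos?" + urlencode(params)


search_url = build_youtube_search_url(["boating", "sailing"], "YOUR_API_KEY")
videos_url = build_youtube_videos_url(["id1", "id2"], "YOUR_API_KEY")
print(search_url)
print(videos_url)
```

`urlencode` takes care of escaping the pipe, so the keywords can be passed as a plain list.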

Has anything changed on geocode API

I just wanted to know whether anything changed in the Geocoding API on 21 February. Before the 21st it validated 9-digit ZIP codes, but since yesterday it returns an error for 9-digit ZIP codes and only validates 5-digit ones.
More information in your question would be helpful.
I haven't noticed any change, but I thought I'd take a look at the GeoCoder Documentation FAQ for you.
Yes, based on that date, I'd say something changed recently.
Perhaps this is what you're referring to, but that's only speculation, since you didn't provide any details or examples.
Troubleshooting
I’m getting more queries that return ZERO_RESULTS with the new geocoder. What’s going on?
In the new geocoder, ambiguous, incomplete and badly formatted queries, such as misspelled or nonexistent addresses, are prone to produce ZERO_RESULTS. These queries would typically produce incorrect results in the old geocoder, such as returning the suburb if the address could not be found. We believe that returning ZERO_RESULTS is actually a more correct response in such situations.
If your application deals with user input of addresses, the Place Autocomplete feature in the Places API may produce better quality results. Place Autocomplete allows users to select from a set of results based on what they’ve typed, which allows users to choose between similarly named results, and to adjust their query if they misspell an address.
If you have an application dealing with ambiguous or incomplete queries or queries that may contain errors, we recommend you use the Place Autocomplete feature in the Places API rather than the forward geocoder available in the Geocoding API. For more details, see Best Practices When Geocoding Addresses and the Address Geocoding in the Google Maps APIs blog post.
More Information:
Documentation FAQ
Related Issue Tracker
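If the 9-digit lookups are now coming back as ZERO_RESULTS, one possible workaround is to retry with just the first five digits. The sketch below is only an illustration of that idea, not something taken from the documentation: `fetch` is a hypothetical callable that returns the decoded JSON body of a Geocoding API response, and `fake_fetch` stands in for it so the example runs without a network call or an API key.

```python
import re

# Matches ZIP+4 written as 12345-6789 or 123456789.
ZIP_PLUS_4 = re.compile(r"^(\d{5})-?\d{4}$")


def candidate_queries(zip_code):
    """Return the queries to try, most specific first."""
    zip_code = zip_code.strip()
    match = ZIP_PLUS_4.match(zip_code)
    if match:
        return [zip_code, match.group(1)]
    return [zip_code]


def geocode_zip(zip_code, fetch):
    """Try each candidate until the response status is OK."""
    for query in candidate_queries(zip_code):
        response = fetch(query)
        if response.get("status") == "OK":
            return query, response["results"]
    return None, []


def fake_fetch(query):
    # Stand-in for a real request: only 5-digit queries succeed here.
    if len(query) == 5:
        return {"status": "OK", "results": [{"formatted_address": query}]}
    return {"status": "ZERO_RESULTS", "results": []}


used_query, results = geocode_zip("94043-1351", fake_fetch)
print(used_query, results)
```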

Skip common/duplicate parts while indexing web pages with ElasticSearch

I don't have any experience with ElasticSearch yet, but from what I've read I think it suits most of my needs. I have a web scraper that scrapes pages from certain domains.
I want to feed these pages into ES and offer a front-end interface to search the scraped content. I'm building a sort of vertical search engine.
But as we all know, the pages of one host often contain only a little unique content; a large part of each page is shared. The footer, header, menu etc. are the same on every page.
Does ElasticSearch have some built-in intelligence that can filter out the common parts and search only the real content?
It's not terribly difficult to pump web content into Elastic, so I'll assume you have that down. =)
I think this article is fantastic for understanding how to index/search web pages:
http://blog.urx.com/urx-blog/2014/9/4/the-science-of-crawl-part-1-deduplication-of-web-content
It's a complex problem and they have some great detail. There is nothing I know of natively in Elastic that has intelligence to help you eliminate duplicates etc.
The strategy to adopt here is to create a unique key per document. A checksum of the content, computed with SHA-1 or a similar algorithm, will do as the unique key. Make this the document ID so that only one copy of a page exists at any point in time. Use the _create API to index if you don't want new duplicates to be indexed (more efficient); if you want the newest copy to replace the stored document, use normal indexing.
If you need to modify the original document when a duplicate is discovered, use an upsert.
I have explained a great deal of this in this blog.
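The checksum-as-ID idea can be sketched like this. It is a minimal illustration under stated assumptions: the index name `pages` and the `content` field are made up, the request is only built and not sent, and the `/<index>/_create/<id>` path is the form used by recent Elasticsearch versions (older versions express the same thing with `op_type=create`).

```python
import hashlib


def document_id(content):
    # Identical content always hashes to the same 40-character hex ID.
    return hashlib.sha1(content.encode("utf-8")).hexdigest()


def create_request(index, content):
    # A create request is rejected by Elasticsearch if the ID already
    # exists, so a second copy of the same page is not indexed again.
    doc_id = document_id(content)
    path = "/{}/_create/{}".format(index, doc_id)
    return "PUT", path, {"content": content}


method, path, body = create_request("pages", "Some scraped page text")
print(method, path)
```

Note that this only catches pages whose extracted content is identical; stripping shared headers and footers before hashing is a separate step.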

Google Custom Search API (CSE) - Retrieve only discussions

I would like to use the Google Custom Search API to search only discussions, as the query string &tbm=dsc does on the regular search page.
Unfortunately there is no tbm parameter given in the API documentation.
Is it not possible to limit the search results to discussions only?
No, there is currently not a way to do the discussion search with CSE/GSS. The only special search is image which is documented in the API reference. You could use Labels and Refinements to limit your search to specific sites and/or patterns.
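To make the point concrete, here is a rough sketch of what a Custom Search JSON API request looks like. The function name and the placeholder key and engine ID are made up, and the URL is only built, not sent. There is no tbm parameter to set; the only special search mode is selected with searchType=image.

```python
from urllib.parse import urlencode

CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_url(api_key, engine_id, query, images=False):
    # key = API key, cx = search engine ID, q = search terms.
    params = {"key": api_key, "cx": engine_id, "q": query}
    if images:
        params["searchType"] = "image"
    return CSE_ENDPOINT + "?" + urlencode(params)


web_url = build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "forum software")
image_url = build_cse_url("YOUR_API_KEY", "YOUR_ENGINE_ID", "forum software", images=True)
print(web_url)
print(image_url)
```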
Limiting search results for Google Custom Search to only discussion websites is not possible. Just in case, remember that Google Custom Search is for searching over one website or a collection of websites. If your collection is all discussion sites, well, that doesn't seem to be the purpose of Google Custom Search. However, there may be some useful workarounds/solutions.
Workaround 0
Find or generate a collection of discussion sites you're interested in and create a custom search based on that. This would give you (almost) the same results you are after.
Workaround 1
You might be able to perform a redirection with refinement labels. This example redirects to a Google Scholar search. You might be able to accomplish the same result using &tbm=dsc.
<CustomSearchEngine>
  <Title>Universities</Title>
  <Context>
    <Facet>
      <FacetItem title="Papers">
        <Label name="papers" mode="FILTER"/>
        <Redirect url="http://scholar.google.com/scholar?q=$q"/>
      </FacetItem>
    </Facet>
  </Context>
</CustomSearchEngine>

Using Scrapy to download images from a google search

I am trying to download Google images for a particular search.
Currently, if I have the URL, my code will download the first 10 images.
However, my question is: how would I get the URL for a particular search on Google?
When I look at the URL for any search on Google, it looks very complicated, and it is hard to understand how the URL was constructed.
http://www.google.com/m/search?q=hello&site=images
This URL pulls up the mobile website, which is static and is easier to harvest images off of. All parts of the query are self-explanatory
The &q= part of the url is the actual search string. Note that some characters are converted such as space becoming plus etc.
Easy enough to fake by doing https://www.google.com/search?q=a+search
For image search https://www.google.com/search?q=a+search&tbm=isch
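Building these URLs in code might look like the sketch below; the function name is made up for illustration. `urlencode` handles the character conversion mentioned above, such as a space becoming a plus.

```python
from urllib.parse import urlencode


def build_google_search_url(query, images=False):
    params = {"q": query}
    if images:
        # tbm=isch switches the results page to image search.
        params["tbm"] = "isch"
    return "https://www.google.com/search?" + urlencode(params)


print(build_google_search_url("a search"))
print(build_google_search_url("a search", images=True))
```

The resulting string can then be used as the start URL for the spider.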

Resources