Google Custom Search API number of results - google-api

How can I get more than 10 results with the Google Custom Search API? It seems to only return results from the first page; when I ask the search for more than 10, I get this error:
Here is the request:
https://www.googleapis.com/customsearch/v1?q=Montenegro&cx=002715630024689775911%3Ajczmrpp_vpo&num=10&key={YOUR_API_KEY}
num=10 is the number of results.
400 Bad Request
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "invalid",
        "message": "Invalid Value"
      }
    ],
    "code": 400,
    "message": "Invalid Value"
  }
}

It is not possible to get more than 10 results per request from the Google Custom Search API.
https://developers.google.com/custom-search/v1/using_rest#query-params
As you can see, valid values for the num parameter are between 1 and 10, inclusive.
To get more results, make multiple calls, increasing the value of the 'start' parameter by 10 on each call. That should do it.

For the first page of results, use
https://www.googleapis.com/customsearch/v1?q=Montenegro&cx=002715630024689775911%3Ajczmrpp_vpo&num=10&start=1&key={YOUR_API_KEY}
This query asks Google for 10 results starting from position 1. Since you cannot ask for more than 10 results at a time, you have to query again for 10 results starting from position 11: keep num=10 and set start=11. You can retrieve further results by advancing the start value this way.
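The paging scheme above can be sketched as a small helper that generates one request URL per page. This is a minimal sketch, not an official client: the cx and key values are the placeholders from the question, and actually fetching each URL (e.g. with an HTTP library) is left out. Note also that the API serves only the first results of a query, so start cannot be advanced indefinitely.

```python
from urllib.parse import urlencode

BASE = "https://www.googleapis.com/customsearch/v1"

def page_urls(query, cx, api_key, total=30, per_page=10):
    """Yield one Custom Search request URL per page.

    The API caps `num` at 10, so `start` advances in steps of 10:
    1, 11, 21, ... (result positions are 1-based).
    """
    for start in range(1, total + 1, per_page):
        params = {"q": query, "cx": cx, "key": api_key,
                  "num": per_page, "start": start}
        yield f"{BASE}?{urlencode(params)}"

urls = list(page_urls("Montenegro", "002715630024689775911:jczmrpp_vpo",
                      "YOUR_API_KEY", total=30))
# Three URLs, with start=1, start=11, and start=21
```

Each URL is then fetched separately and the items arrays concatenated client-side.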


ElasticSearch not able to return data above a 10,000 offset; I am not allowed to make index-level changes and can't use the Scroll API

I am running the ES query step by step with different offset and limit values. For example 100 to 149, then 150 to 199, then 200 to 249, and so on.
When offset+limit exceeds 10,000, I get the exception below:
{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "xyz",
        "node": "123",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level parameter."
        }
      }
    ]
  },
  "status": 500
}
I know this can be solved by increasing "max_result_window". I tried it (15,000, then 30,000) and it helped, but I am not allowed to make index-level changes, so I changed it back to the default of 10,000.
How can I solve this problem? This query is served through an API call.
Two approaches worked for me:
increasing the max_result_window
using a filter
a. by knowing the unique id of the data records
b. by knowing the time frame
The first approach was applied as below:
PUT /index/_settings
{ "max_result_window" : 10000 }
This worked and solved my problem, but the number of records is dynamic and growing very fast, so it is not good to keep increasing this window. Also, in my case the index is shared, so this change would affect all the users and groups on that shared index. So we moved on to the second approach.
Second approach
Part 1: I first applied a filter on the last-updated timestamp; if the record count was greater than 10K, I halved the time frame and kept doing so until the count dropped below 10K.
Part 2: Since the same data is also available in OLTP, I got the complete list of unique identifiers and sorted it. I then filtered on that identifier and fetched data only in ranges of 10K. Once 10K records were fetched using pagination, I changed the filter and moved to the next batch of 10K.
Part 3: I applied sorting on the last-updated timestamp and fetched data using pagination. Once the record count reached 10K, I took the timestamp of the 9,999th record and applied a greater-than filter on the timestamp, then fetched the next 10K records.
All the mentioned solutions helped me, but I selected Part 3 of the second approach, since it is easy to implement and quickly yields sorted data.
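Part 3 of the second approach can be sketched as a query-body builder: sort by the last-updated timestamp, and after each page, filter on timestamps greater than the last one seen, so from never exceeds the 10,000-result window. This is an illustrative sketch; last_updated is a placeholder field name, and sending the body to Elasticsearch is left out.

```python
def page_query(size=10000, after_ts=None):
    """Build one Elasticsearch search body for timestamp-based paging.

    The first page uses match_all; subsequent pages filter on
    `last_updated` greater than the last timestamp already read.
    """
    body = {
        "size": size,
        "sort": [{"last_updated": "asc"}],
        "query": {"match_all": {}},
    }
    if after_ts is not None:
        body["query"] = {"range": {"last_updated": {"gt": after_ts}}}
    return body

first = page_query()
# After reading a page, pass the timestamp of its last hit:
nxt = page_query(after_ts=1400200)
```

Because the filter restarts the result window at zero on every batch, each request stays well under max_result_window regardless of how many records exist in total.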
Consider scroll API - https://www.elastic.co/guide/en/elasticsearch/reference/2.2/search-request-scroll.html
This is also suggested in the manual.

GeoCoding returns ZERO_RESULTS

The following is my query:
https://maps.googleapis.com/maps/api/geocode/json?latlng=53.477752,-2.266695&result_type=street_address&key=*****
and it returns an empty result:
{
  "plus_code": {
    "compound_code": "FPHM+48 Salford, UK",
    "global_code": "9C5VFPHM+48"
  },
  "results": [],
  "status": "ZERO_RESULTS"
}
I have tested the coordinates 53.477752,-2.266695 on https://www.latlong.net with the place name "Manchester", and the website shows King Street, which is correct. Why does the Google API return ZERO_RESULTS? Is this still a bug in Google, or do I need to add additional parameters?
Upon testing the coordinate (53.477752,-2.266695), I was able to get a couple of results. However, it seems that none of the address results contains the result_type "street_address". This is why you get zero results as a response, since you've added a result_type=street_address filter to the query.
Please see the Geocoder tool to see the actual results. Here's the sample request: https://maps.googleapis.com/maps/api/geocode/json?&latlng=53.477752,-2.266695&key=YOUR_API_KEY
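One workaround is to drop the result_type filter from the request and pick the best-matching result client-side. Below is a minimal sketch of that idea; the sample data mimics the shape of a Geocoding API response but is invented for illustration, not real API output.

```python
def pick_by_type(results, preferred=("street_address", "route", "premise")):
    """Return the first geocoding result whose `types` list contains a
    preferred type, trying the preferred types in order; fall back to
    the first result overall if none match."""
    for t in preferred:
        for r in results:
            if t in r.get("types", []):
                return r
    return results[0] if results else None

# Shape mimics the Geocoding API `results` array (sample data only):
sample = [
    {"formatted_address": "King St, Manchester, UK", "types": ["route"]},
    {"formatted_address": "Manchester, UK", "types": ["locality", "political"]},
]
best = pick_by_type(sample)
# → the "route" entry, since no "street_address" result exists here
```

This keeps the request broad (so the API can return whatever it has for the coordinate) while still preferring the most street-like address available.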

ElasticSearch: Return the query within the response body when hits = 0

Please note that the following example is a heavily minified version of a real-life use case; it is simplified to keep the question easy to read and to make a point.
I have the following document structure:
{
  "date": 1400500,
  "idc": 1001,
  "name": "somebody"
}
I am performing an _msearch query (multiple searches at a time) based on different values (the "idc" and a "date" range).
When ES cannot find any documents for the given date range, it returns:
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
But, since there are N results, I cannot tell which "idc" and which "date" range this result was for.
I would like the response to include the searched "date" range and "idc" when there are no results for the given query. For example, if I am searching for documents with IDC = 1001 and date between 1400100 and 1400200, but there are no results, the response should contain the query terms, something like this:
"hits":{
"total":0,
"max_score":null,
"query": {
"date": {
"gt": 1400100,
"lte": 1400200,
}
"idc": 1001,
}
}
That way I can tell what date range and "idc" combination has no results.
This is from the docs:
The multi search API (_msearch) response returns a responses array, which includes the search response and status code for each search request, matching its order in the original multi search request.
Since you know the order in which you sent the requests, you can find out which request returned no results.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html
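The order guarantee above can be exploited by zipping each sub-request with its response. Here is a minimal sketch with invented sample data shaped like _msearch output (not real API output); note that newer ES versions wrap hits.total in an object.

```python
def pair_msearch(requests_, responses):
    """Zip each _msearch sub-request with its response. Order is
    guaranteed to match, so empty hits can be traced back to the
    query (idc / date range) that produced them. Returns the list
    of requests that found nothing."""
    misses = []
    for req, resp in zip(requests_, responses):
        total = resp["hits"]["total"]
        # ES 7+ reports total as {"value": N, "relation": ...}
        if isinstance(total, dict):
            total = total["value"]
        if total == 0:
            misses.append(req)
    return misses

reqs = [
    {"idc": 1001, "date": {"gt": 1400100, "lte": 1400200}},
    {"idc": 1002, "date": {"gt": 1400100, "lte": 1400200}},
]
resps = [
    {"hits": {"total": 0, "max_score": None, "hits": []}},
    {"hits": {"total": {"value": 3}, "hits": [{}, {}, {}]}},
]
empty = pair_msearch(reqs, resps)
# empty now holds only the first request (idc 1001), which had no hits
```

In other words, the "query terms" the question asks for never need to come back in the response body; the client already has them, indexed by position.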

Debugging data structure errors in BigQuery

BigQuery often ends bq load with an ambiguous
Waiting on <jobid> ... (68s) Current status: DONE
BigQuery error in load operation: Error processing job
'<jobid>': Too many errors
encountered. Limit is: {1}.
When I do bq --format=prettyjson show -j <jobid> to find out what's wrong, I get:
"status": {
"errorResult": {
"message": "Too many errors encountered. Limit is: {1}.",
"reason": "invalid"
},
"errors": [
{
"message": "Too many errors encountered. Limit is: {1}.",
"reason": "invalid"
}
],
"state": "DONE"
},
This usually indicates that bq dislikes something about the data structure.
But how can I find out what is wrong? Which row or column does bq exit on with an error?
Update
Apparently, bq sometimes returns "Failure details" saying which column and line caused an error. But I couldn't reliably reproduce getting these details; they appear arbitrarily for the same instance, data, and command.
I found a few options in bq help load to let the data pass through:
--[no]autodetect: Enable auto detection of schema and options for formats that are not self
describing like CSV and JSON.
--[no]ignore_unknown_values: Whether to allow and ignore extra, unrecognized values in CSV or
JSON import data.
--max_bad_records: Maximum number of bad records allowed before the entire job fails.
(default: '0')
(an integer)
They allow dropping bad values, but many rows may be lost, and I couldn't find where bq reports the number of dropped rows.
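When the job status does contain per-record details, they show up as extra entries in status.errors, alongside the generic "Too many errors" entry. A small sketch for pulling all of them out of the bq --format=prettyjson show -j output follows; the second error entry in the sample is invented to illustrate the shape, not a real BigQuery message.

```python
import json

def load_job_errors(show_json):
    """Extract (location, message) pairs from the `status.errors`
    array of a `bq --format=prettyjson show -j <jobid>` dump."""
    status = json.loads(show_json).get("status", {})
    return [(e.get("location"), e.get("message", ""))
            for e in status.get("errors", [])]

# Sample dump: first entry is the generic error from the question,
# second is an invented per-record detail with a `location` field.
sample = '''{"status": {"errors": [
  {"message": "Too many errors encountered. Limit is: {1}.",
   "reason": "invalid"},
  {"message": "Could not parse value as INT64: line 7, column 2",
   "reason": "invalid", "location": "file-00000000"}
], "state": "DONE"}}'''

errs = load_job_errors(sample)
for loc, msg in errs:
    print(loc, msg)
```

Scanning all entries (rather than just errorResult) is the point: when details exist at all, they are buried in this array.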

How to use multiple dimension and metric values in the Google AdSense API version 1.4

When generating reports with the Google AdSense Management API v1.4, as documented here:
https://developers.google.com/adsense/management/v1.4/reference/accounts/reports/generate
I checked the metrics and dimensions given here:
https://developers.google.com/adsense/management/metrics-dimensions
I tried passing these values in the request, and it works fine for me:
startDate=2015-07-07&
endDate=2015-07-12&
dimension=AD_UNIT_NAME&
metric=AD_REQUESTS_CTR
But how should I pass multiple metric values?
For example, if I want the metrics:
AD_REQUESTS_CTR,
AD_REQUESTS_RPM,
CLICKS,
EARNINGS, etc.
I tried separating them with both the literal and URL-encoded forms of
":"
","
" " (space)
but nothing works for me; I get this error:
{
  "error": {
    "errors": [
      {
        "domain": "global",
        "reason": "invalidParameter",
        "message": "Invalid value 'AD_REQUESTS_CTR:AD_REQUESTS_RPM'. Values must match the following regular expression: '[a-zA-Z_]+'",
        "locationType": "parameter",
        "location": "metric[0]"
      }
    ],
    "code": 400,
    "message": "Invalid value 'AD_REQUESTS_CTR:AD_REQUESTS_RPM'. Values must match the following regular expression: '[a-zA-Z_]+'"
  }
}
So I found the solution for passing multiple metric and dimension values (and I think it's a really poor API design by the Google AdSense team).
This is how it works:
GET https://www.googleapis.com/adsense/v1.4/accounts/pub-423423423432/reports?alt=json&
startDate=2015-07-07&
endDate=2015-07-12&
dimension=AD_UNIT_NAME&
metric=AD_REQUESTS_CTR&
metric=AD_REQUESTS_RPM&
metric=CLICKS&
metric=EARNINGS
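Repeating the parameter name once per value is straightforward to generate in code: Python's urlencode with doseq=True emits one key=value pair per list item. A minimal sketch, using the placeholder publisher ID from the answer above:

```python
from urllib.parse import urlencode

# doseq=True expands each list into repeated parameters:
# metric=...&metric=...&metric=...
params = {
    "startDate": "2015-07-07",
    "endDate": "2015-07-12",
    "dimension": "AD_UNIT_NAME",
    "metric": ["AD_REQUESTS_CTR", "AD_REQUESTS_RPM", "CLICKS", "EARNINGS"],
}
qs = urlencode(params, doseq=True)
url = ("https://www.googleapis.com/adsense/v1.4/accounts/"
       "pub-423423423432/reports?alt=json&" + qs)
# qs contains four separate metric=... pairs, one per list element
```

Most HTTP client libraries offer an equivalent (e.g. passing a list as a parameter value), so the repeated-parameter convention rarely has to be assembled by hand.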
