Retrieve all docs using top_hits aggregation ElasticSearch - elasticsearch

I am using the top_hits aggregation to retrieve documents along with counts. Following up on my earlier post, I need to retrieve all the documents in each bucket. I thought passing size 0 would do it, but it throws the following error.
org.elasticsearch.search.query.QueryPhaseExecutionException: [my-demo][3]: query[ConstantScore(*:*)],from[0],size[10]: Query Failed [Failed to execute main query]
at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:162)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:261)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:206)
at org.elasticsearch.search.action.SearchServiceTransportAction$5.call(SearchServiceTransportAction.java:203)
at org.elasticsearch.search.action.SearchServiceTransportAction$23.run(SearchServiceTransportAction.java:517)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.IllegalArgumentException: numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:254)
at org.apache.lucene.search.TopScoreDocCollector.create(TopScoreDocCollector.java:238)
at org.elasticsearch.search.aggregations.metrics.tophits.TopHitsAggregator.collect(TopHitsAggregator.java:108)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectExistingBucket(BucketsAggregator.java:63)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucket(BucketsAggregator.java:55)
at org.elasticsearch.search.aggregations.bucket.terms.GlobalOrdinalsStringTermsAggregator$WithHash.collect(GlobalOrdinalsStringTermsAggregator.java:236)
at org.elasticsearch.search.aggregations.AggregatorFactories$1.collect(AggregatorFactories.java:114)
at org.elasticsearch.search.aggregations.BucketCollector$2.collect(BucketCollector.java:81)
at org.elasticsearch.search.aggregations.bucket.BucketsAggregator.collectBucketNoCounts(BucketsAggregator.java:74)

According to the Elasticsearch documentation, size is "the maximum number of top matching hits to return per bucket. By default the top three matching hits are returned." So size = 0 means no documents (I think); try sending a maximum value instead. – progrrammer
The top_hits aggregation response has the following format:
"top_tags_hits": {
"hits": {
"total": 25365,
"max_score": 1,
"hits": [
{
"_index": "stack",
"_type": "question",
"_id": "602679",
"_score": 1,
"_source": {
"title": "Windows port opening"
},
"sort": [
1370143231177
]
}
]
}
Here hits.total gives the total number of hits. You can paginate with from and size, as in the search API, to fetch the documents, or set size to the maximum integer value (2^31 - 1) to get all the documents.
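As a rough illustration of the advice above, the following Python sketch builds a request body with a terms aggregation and a nested top_hits aggregation whose size is greater than 0. The field name "tags", the aggregation name "top_tags", and the helper build_top_hits_request are assumptions made for the example, not part of the original question. Newer Elasticsearch versions may cap the top_hits size through an index setting, so the maximum integer value may be rejected there.

```python
import json

# Largest signed 32-bit integer, the "maximum integer value" mentioned above.
MAX_INT = 2**31 - 1


def build_top_hits_request(group_field, hits_per_bucket=MAX_INT, offset=0):
    """Build a search body that returns documents inside each terms bucket.

    group_field and the aggregation names are hypothetical; adjust them
    to match the real mapping.
    """
    if hits_per_bucket <= 0:
        # A size of 0 is what triggers "numHits must be > 0" in the stack trace.
        raise ValueError("top_hits size must be greater than 0")
    return {
        "aggs": {
            "top_tags": {
                "terms": {"field": group_field},
                "aggs": {
                    "top_tags_hits": {
                        # from/size paginate within each bucket.
                        "top_hits": {"from": offset, "size": hits_per_bucket}
                    }
                },
            }
        }
    }


body = build_top_hits_request("tags", hits_per_bucket=100)
print(json.dumps(body, indent=2))
```

The body can then be sent to the _search endpoint of the index with any HTTP client.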
Hope this helps.

Related

return empty result for a nested bool query on fields that don't have data

I'm doing the following query:
The ns.ns field is configured (both the mapping and the settings were set up successfully), but there is no source data for this field, and I get an empty result back from Elasticsearch. Is that right? I mean, without data this query would return an empty result, is that correct? I am still learning ES, and thanks for the help.
The ns.ns field has configured (has both mapping and setting set up
successfully) but there is no source data for this field. and I get
empty result returned from ElasticSearch. is that right?
without data this query would return an empty result, is that correct?
As you mentioned above, the ns field is mapped as type nested, so the index already exists and the search query will not fail with "index_not_found_exception".
The search API returns search hits that match the query defined in the request.
Running the search query from the question above returns the following response:
{
"took": 17,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 0,
"relation": "eq"
},
"max_score": null,
"hits": []
}
}
The response provides the following information about the search request:
took – how long it took Elasticsearch to run the query, in milliseconds
timed_out – whether or not the search request timed out
_shards – how many shards were searched and a breakdown of how many shards succeeded, failed, or were skipped
max_score – the score of the most relevant document found
hits.total.value – how many matching documents were found
hits.hits is the array of documents that match your search query; in the response above it is an empty array ([]). Because no documents containing this field are indexed, nothing matches the query.
max_score is null in the above response because there are no matching documents. The _score in Elasticsearch is a way of determining how relevant a match is to the query; refer to the ES documentation to learn more about how scoring works.
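The fields described above can be read from the response with a few lines of Python. This is only a sketch: the helper names total_hits and summarize are made up for the example, and the raw string is the response shown above. The total_hits helper also accepts the older format in which hits.total is a plain integer.

```python
import json


def total_hits(response):
    """Return the number of matching documents from a search response."""
    total = response["hits"]["total"]
    # Newer responses wrap the count in an object; older ones use a bare integer.
    if isinstance(total, dict):
        return total["value"]
    return total


def summarize(response):
    """Collect the response fields described above into one dictionary."""
    hits = response["hits"]
    return {
        "took_ms": response["took"],
        "timed_out": response["timed_out"],
        "failed_shards": response["_shards"]["failed"],
        "total": total_hits(response),
        "max_score": hits["max_score"],  # None when nothing matched
        "documents": [hit.get("_source") for hit in hits["hits"]],
    }


raw = """
{
  "took": 17,
  "timed_out": false,
  "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
  "hits": {
    "total": {"value": 0, "relation": "eq"},
    "max_score": null,
    "hits": []
  }
}
"""

summary = summarize(json.loads(raw))
print(summary)
```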

Why return error result when use _routing field?

For example, with the URL index-0/_search?routing=24320 I search for data with routing value 24320, but the result contains
"_index": "index-0",
"_type": "member",
"_id": "40865630",
"_score": 1,
"_routing": "22500",
Why does a document with routing value 22500 match the search?
What happens is that when specifying ?routing=24320 in your search query, you're basically selecting the single shard on which documents with the routing value of 24320 have been stored.
Now, since your query doesn't specify any other constraints, you're basically getting all documents stored on that shard, which obviously means that you also get documents whose routing value is 22500 (and probably others, too).
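One way to return only the documents that carry the requested routing value is to keep the routing parameter (to select the shard) and also add a constraint on the _routing field to the query. The Python sketch below only builds the request parts; the helper name build_routed_search is hypothetical, and it assumes the _routing field is searchable with a term query in the Elasticsearch version in use.

```python
def build_routed_search(index, routing_value):
    """Return (path, params, body) for a search limited to one routing value."""
    path = "/{}/_search".format(index)
    # The routing parameter only selects the shard that will be searched.
    params = {"routing": routing_value}
    # The term query then drops other documents stored on the same shard.
    body = {"query": {"term": {"_routing": routing_value}}}
    return path, params, body


path, params, body = build_routed_search("index-0", "24320")
print(path, params, body)
```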

Per thread event visualization from logfile using Kibana

Log records of the following shape produced by a multi-threaded application are pushed to my elasticsearch index 'log':
[2015-10-09T09:52:18.928] [Debug] [00000x2934 0x000026c0] [Visualization]: updated rendering
[2015-10-09T09:52:19.966] [Debug] [00000x2934 0x000013a0] [Database]: Query request accepted
...
The two hexadecimal entries are process and thread ID.
A document in the elasticsearch index looks like this:
{
"_index": "log",
"_type": "record",
"_id": "AVBXUnCah58TK-z65dea",
"_score": 1,
"_source": {
"process": "00000x2934",
"severity": "Debug",
"thread": "0x000026c0",
"recordId": 1,
"timestamp": "2015-10-09T09:52:18.928",
"message": "updated rendering",
"channel": "Visualization"
}
}
How can I create a Kibana visualization that has a time range as its X axis and entries for the different thread IDs on its Y axis? (My application has a thread pool with a fixed number of threads, <= 10.) There should be a point (X/Y) for each such event, with appropriate information (message or channel).
To make it short: How can I visualize the chronology of a multithreaded application by its events using this search index and Kibana?
One additional note: If there is an easy solution without Kibana, I am also okay with that. It doesn't need to be real-time.
I feel a Gantt chart might be useful to you. You can have one entry on the Y axis per thread and visualize what it is doing in each time frame. This is not available in Kibana 4 as of now, but we can expect it in the future.

Inconsistent doc count

Hi I am running Elasticsearch 1.5.2
I indexed 6,761,727 documents in one of my indexes.
When I run the following query....
GET myindex/mytype/_search
{
"size": 0
}
The hits.total count keeps alternating between 2 values...
"hits": {
"total": 6761727,
"max_score": 0,
"hits": []
}
and
"hits": {
"total": 6760368,
"max_score": 0,
"hits": []
}
No matter how many times I run the query the count goes back and forth between the 2.
I searched around a bit and found that the primary and replica shards apparently don't hold exactly the same number of docs. If I use preference=primary, the doc count returned is correct.
What is the easiest way to find which shard is the culprit and fix it without re-indexing everything?
Set the replica count to 0 for that index:
PUT /my_index/_settings
{
"index": {
"number_of_replicas": 0
}
}
Wait until GET /_cat/shards/my_index?v shows no more replicas for that index, and then set the number of replicas back to its initial value.
This deletes all the replicas for that index and then makes fresh copies from the primaries.
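To find out which shard copy is out of step before resetting the replicas, the text output of GET /_cat/shards/my_index?v can be compared per shard. The Python sketch below assumes the usual header line with the index, shard, prirep, state and docs columns; the sample rows, IP addresses and node names are invented for illustration. Counts can also differ briefly while a refresh is pending, so a mismatch is a hint rather than proof.

```python
from collections import defaultdict


def parse_cat_shards(text):
    """Turn the whitespace-separated _cat/shards table into a list of dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def mismatched_shards(rows):
    """Return shards whose replica doc count differs from the primary's."""
    counts = defaultdict(dict)
    for row in rows:
        # Unassigned or initializing copies have no reliable doc count.
        if row.get("state") != "STARTED" or "docs" not in row:
            continue
        key = (row["index"], row["shard"])
        counts[key].setdefault(row["prirep"], []).append(int(row["docs"]))
    mismatches = {}
    for key, by_role in counts.items():
        primary = by_role.get("p", [])
        replicas = by_role.get("r", [])
        if primary and any(count != primary[0] for count in replicas):
            mismatches[key] = {"primary": primary[0], "replicas": replicas}
    return mismatches


# Invented sample output; real values come from GET /_cat/shards/myindex?v
sample = """
index   shard prirep state   docs    store ip       node
myindex 0     p      STARTED 2253909 1.1gb 10.0.0.1 node-1
myindex 0     r      STARTED 2253909 1.1gb 10.0.0.2 node-2
myindex 1     p      STARTED 2253909 1.1gb 10.0.0.2 node-2
myindex 1     r      STARTED 2252550 1.1gb 10.0.0.1 node-1
myindex 2     p      STARTED 2253909 1.1gb 10.0.0.1 node-1
myindex 2     r      STARTED 2253909 1.1gb 10.0.0.2 node-2
"""

result = mismatched_shards(parse_cat_shards(sample))
print(result)
```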

ElasticSearch - how to SUM function_score by field

I have a query that is using function_score to rank the results. Here is a sample of what is returned:
{
"_index": "clone",
"_type": "authEvent",
"_id": "6431823",
"_score": 4.8,
"fields": {
"authInput.uID": "MPXWDKW2P",
"authResult.productValue": 1,
"authInput.userName": "F936F3AA-E26C-48DB-BDBC-44956B634260",
"authResult.authEventDate": "2014-02-27T09:29:30.703125-06:00",
"authResult.rulesFailed": [
"AuthCountByUser"
]
}
}
What I want to do is take the results and run the equivalent of this SQL statement:
SELECT TOP 20 "authInput.userName", SUM("_score")
FROM foo
GROUP BY "authInput.userName"
ORDER BY SUM("_score") DESC
How can I do this with ES?
NOTE: I'm using ES 0.9x, we will be moving to 1.0.0 soon but we have not yet.
Use a facet to compute the total over the documents returned by the query, with the facet defined on the field you want to group by.
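For comparison, the same grouping can be done on the client side. The Python sketch below is not the facet approach from the answer: it sums _score per authInput.userName over the hits that were actually returned, so it only matches the SQL statement if every matching document is fetched. The helper name top_users_by_score and the sample hits are invented for the example.

```python
from collections import defaultdict


def top_users_by_score(hits, limit=20):
    """Sum _score per user name and return the highest totals first."""
    totals = defaultdict(float)
    for hit in hits:
        user = hit["fields"]["authInput.userName"]
        # Some versions return field values as single-element lists.
        if isinstance(user, list):
            user = user[0]
        totals[user] += hit["_score"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]


# Invented sample hits in the shape shown in the question.
sample_hits = [
    {"_score": 4.8, "fields": {"authInput.userName": "F936F3AA-E26C-48DB-BDBC-44956B634260"}},
    {"_score": 2.5, "fields": {"authInput.userName": "user-b"}},
    {"_score": 1.5, "fields": {"authInput.userName": ["user-b"]}},
]

ranking = top_users_by_score(sample_hits)
print(ranking)
```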
