How to get search hits when executing an aggregation? - elasticsearch

As stated in the Elasticsearch documentation:
In Elasticsearch, you have the ability to execute searches returning hits and at the same time return aggregated results separate from the hits all in one response. This is very powerful and efficient in the sense that you can run queries and multiple aggregations and get the results back of both (or either) operations in one shot avoiding network roundtrips using a concise and simplified API.
I want my search to return hits as well as the aggregation results, but I am not sure how to achieve this.
I am using the following query:
curl -XPOST 'localhost:9200/employee/_search?pretty' -d '
{
  "size": 0,
  "aggs": {
    "group_by_domain": {
      "terms": {
        "field": "domain"
      }
    }
  }
}'
Here is the result I get:
{
  "took" : 92,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "group_by_domain" : {
      "doc_count_error_upper_bound" : 5,
      "sum_other_doc_count" : 744,
      "buckets" : [ {
        "key" : "finance",
        "doc_count" : 30
      } ]
    }
  }
}
As you can see, the hits array is empty. How can I get the hits back as well? Any suggestions?

The hits are empty because you have set the number of returned hits to 0 with:
"size": 0,
You can remove size completely, in which case you'll get 10 hits (the default), or you can set whatever size you want; for instance, if you specify 100 you'll get up to 100 hits in the response. This controls the regular search results.
If you also want hits inside the aggregation buckets, you can use the Top Hits aggregation for that.
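As a rough sketch (Python, only building the request body; no client library assumed), the body below keeps the regular hits by not setting "size" to 0 and adds a top_hits sub-aggregation so each domain bucket also carries a few sample documents. The helper name build_search_body and the sub-aggregation name "top_domain_hits" are hypothetical; "domain" is the field from the question.

```python
import json


def build_search_body(hits_size=10, hits_per_bucket=3):
    """Search body returning top-level hits AND a few hits per terms bucket.

    build_search_body and "top_domain_hits" are illustrative names only.
    """
    return {
        # Top-level hits: 10 is the default; any value > 0 returns hits.
        "size": hits_size,
        "aggs": {
            "group_by_domain": {
                "terms": {"field": "domain"},
                "aggs": {
                    # Top Hits sub-aggregation: sample documents per bucket.
                    "top_domain_hits": {"top_hits": {"size": hits_per_bucket}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON to paste into curl -d '...' or Kibana Dev Tools.
    print(json.dumps(build_search_body(), indent=2))
```

In the response, each bucket should then contain a "top_domain_hits" entry with its own "hits" object, alongside the normal top-level "hits" array.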

Related

How to pass custom data to response in kibana devtools

I want to execute multiple Elasticsearch requests in Kibana on AWS. Because of Amazon Cognito authentication I cannot call the API directly, so I have to use Kibana Dev Tools.
I've generated my requests, and I need to match each request to its response. How can I attach a custom string to a request so that it's echoed back in the response?
I've tried attaching things to the query string or the URL fragment, and they are displayed back, but apparently they alter the search results.
Example Dev Tools request:
GET /_search?rest_total_hits_as_int=true
{
  "query": {
    ... stuff...
  }
}
And the response:
{
  "took" : 18490,
  "timed_out" : false,
  "_shards" : {
    "total" : 87,
    "successful" : 87,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 126266,
    "max_score" : null,
    "hits" : [ ]
  }
}
I would like to add a custom string, e.g. "ccsfewafd332rs", to the request and get it back in the response, so I can correlate them when running multiple requests in the UI.
You can use named queries for exactly this purpose.
Simply add "_name": "ccsfewafd332rs" to your top-level query and you'll get it back in the response (it is reported in the matched_queries field of each returned hit, so at least one hit has to be returned), e.g.
{
  "query": {
    "match_all": {
      "_name": "ccsfewafd332rs"
    }
  }
}
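A minimal Python sketch of the same idea, assuming you generate the request bodies in a script before pasting them into Dev Tools. Instead of putting "_name" inside each specific clause, it wraps whatever query you have in a bool query that carries the name, since the bool query also accepts "_name" and this works regardless of the inner query type. The helpers tag_query and matched_tags are hypothetical names, and matched_tags only reads the hits, so a request with "size": 0 gives nothing back to correlate.

```python
def tag_query(query, tag):
    """Wrap any query clause in a bool query carrying a _name tag (hypothetical helper)."""
    return {"bool": {"must": [query], "_name": tag}}


def matched_tags(response):
    """Collect named-query tags reported in matched_queries of the returned hits."""
    tags = set()
    for hit in response.get("hits", {}).get("hits", []):
        # matched_queries is only present on hits that matched a named query.
        tags.update(hit.get("matched_queries", []))
    return tags


body = {"query": tag_query({"match_all": {}}, "ccsfewafd332rs")}
```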

Elasticsearch nested aggregations - method from documentation is not working

I'm new to ES and am struggling with nested aggregations.
Here is my dummy data object (image): https://i.stack.imgur.com/X7oaM.png. I'm just trying to get the minimum cost out of the "modern" field.
I have read the following posts regarding the problem I'm trying to solve. None of them helped me solve it:
- Elastic Search 6 Nested Query Aggregations
- https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
- https://madewithlove.be/elasticsearch-aggregations/
- https://iridakos.com/tutorials/2018/10/22/elasticsearch-bucket-aggregations.html
- https://github.com/elastic/elasticsearch/issues/9317
Moreover, I've searched all over Stack Overflow without success (yes, I've tried just about every solution I've come across).
According to the docs, the posts above and others, a nested aggregation should be run as follows:
GET /loquesea/_search
{
  "size": 0,
  "aggs": {
    "modern_costs": {
      "nested": {
        "path": "modern"
      },
      "aggs": {
        "min_cost": {
          "min": {
            "field": "modern.cost1"
          }
        }
      }
    }
  }
}
However, this is what I get:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "modern_costs" : {
      "doc_count" : 0
    }
  }
}
I've spent hours trying to make just a basic nested aggregation work. What am I doing wrong?
Problem solved. It turns out that because I hadn't declared the type of "cars" as "nested" in the mapping, the nested aggregation did not work: without an explicit "nested" type, Elasticsearch treats "cars" as a plain object.
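A minimal Python sketch of the missing piece, i.e. an explicit nested mapping, written as request bodies. It assumes the "modern" path and "cost1" field from the question's query, a "float" type for cost1 (an assumption about the data), and the typeless mapping format of Elasticsearch 7+ (the question's 6.x-era cluster would nest "properties" under a type name). Since an existing object field cannot be switched to nested in place, this mapping would go on a new index followed by a reindex. min_modern_cost is a hypothetical helper.

```python
# Hypothetical mapping for a new index; "float" for cost1 is an assumption.
MAPPING = {
    "mappings": {
        "properties": {
            "modern": {
                "type": "nested",  # without this, "modern" is indexed as a plain object
                "properties": {"cost1": {"type": "float"}},
            }
        }
    }
}

# The aggregation from the question, unchanged.
MIN_COST_AGGS = {
    "size": 0,
    "aggs": {
        "modern_costs": {
            "nested": {"path": "modern"},
            "aggs": {"min_cost": {"min": {"field": "modern.cost1"}}},
        }
    },
}


def min_modern_cost(response):
    """Read the min value from the nested aggregation (None when nothing matched)."""
    nested = response["aggregations"]["modern_costs"]
    return nested.get("min_cost", {}).get("value")
```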

Elasticsearch Distinct query after setting fielddata to true

I am trying to get the distinct values of the field "vip_name" in an index.
This is what I tried first:
curl -XGET http://172.31.38.157:9200/cb_inventory/_search -d '{"size":0,"aggs":{"vips":{"terms":{"field":"vip_name"}}}}'
{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [vip_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory. Alternatively use a keyword field instead."}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","grouped":true,"failed_shards":[{"shard":0,"index":"cb_inventory","node":"7_t7zG82QsS__Q_vRHWy9A","reason":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default.
OK. So I set fielddata to true as below:
curl -XPUT http://172.31.38.157:9200/cb_inventory/_mapping/cb_inventory -d '{"properties":{"vip_name":{"type":"text","fielddata":true}}}'
{"acknowledged":true}
Now I run the search and get back the following:
curl -XGET http://172.31.38.157:9200/cb_inventory/_search?pretty=true -d '{"size":0,"aggs":{"vips":{"terms":{"field":"vip_name","size":1000}}}}'
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 0.0,
    "hits" : [ ]
  },
  "aggregations" : {
    "vips" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "domain.com",
          "doc_count" : 3
        },
        {
          "key" : "ppcbcl00021",
          "doc_count" : 3
        }
      ]
    }
  }
}
This is odd, since I have only one distinct value, ppcbcl00021.domain.com, but it shows up as two separate values.
How do I get the distinct value "ppcbcl00021.domain.com"?
This is because vip_name is mapped as text, not keyword. Text fields are analyzed, so even though your value is ppcbcl00021.domain.com, Elasticsearch indexes it as separate tokens, ppcbcl00021 and domain.com, and the terms aggregation buckets on those tokens.
Try again with vip_name mapped as keyword. Note that the type of an existing field cannot be changed in place, so this mapping has to be applied to a new (or recreated) index and the data reindexed:
curl -XPUT http://172.31.38.157:9200/cb_inventory/_mapping/cb_inventory -d '{"properties":{"vip_name":{"type":"keyword"}}}'
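A minimal Python sketch of the keyword approach as request bodies, assuming a fresh index and the typeless mapping format of Elasticsearch 7+ (the question's URL uses the older per-type format, so adjust for that cluster). The constant names and the distinct_values helper are illustrative. distinct_values just collects the bucket keys; applied to the response shown in the question it returns the two token fragments, which is exactly the symptom a keyword field avoids.

```python
# Mapping for a NEW index: a keyword field is not analyzed, so the whole
# value "ppcbcl00021.domain.com" becomes a single term.
KEYWORD_MAPPING = {
    "mappings": {"properties": {"vip_name": {"type": "keyword"}}}
}

# Same terms aggregation as in the question.
DISTINCT_VIPS = {
    "size": 0,
    "aggs": {"vips": {"terms": {"field": "vip_name", "size": 1000}}},
}


def distinct_values(response, agg_name="vips"):
    """Return the bucket keys of a terms aggregation, in bucket order."""
    return [b["key"] for b in response["aggregations"][agg_name]["buckets"]]
```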

Score calculation for query of size = 0

I am using queries with the size parameter set to 0 to get fast counts without fetching document data.
{
  "query": {
    <query_body>
  },
  "size": 0
}
Am I right in assuming that score calculation is not performed in such cases?
I have some doubts. For example, when I query with a sort other than _score, I get "max_score": null, which confirms that the score is not calculated in that case. But in this case ("size": 0) I get "max_score": 0, which looks more like the score is being calculated, but since no docs are returned, max_score is 0.
This might not be the answer you are looking for, but it could well be that the score is still calculated. In your case I would use a different approach: set the search type of the query to count:
?search_type=count
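More information can be found here (note that the count search type was deprecated in Elasticsearch 2.0 in favor of "size": 0 and removed in 5.0):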
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-search-type.html#search-request-search-type
You can use a combination of "size": 1 and "_source": false to limit the size of the returned results, but you need at least "size": 1 for max_score to appear. By default results are sorted by score, so the top result will have the highest score.
Here is what I did:
{
  "size": 1,
  "_source": false,
  "query": {
    <query_body>
  }
}
This returns a result like this:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.37365946,
    "hits" : [
      {
        "_index" : "examples",
        "_type" : "_doc",
        "_id" : "9deff2cf-4e9b-46a9-8e56-2d97d1b2535a",
        "_score" : 0.37365946
      }
    ]
  }
}
So you get one hit without its _source field, which keeps the payload small and roughly constant in size regardless of how large your documents are.
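A small Python sketch for reading the count and the top score out of such a response, assuming the body from the answer above with match_all standing in for <query_body>. hits.total is a plain integer in older responses and an object with "value" and "relation" in newer ones (both forms appear in this thread), so the hypothetical helper hit_summary accepts either.

```python
# Body from the answer above; match_all is only a stand-in for <query_body>.
SCORE_PROBE = {"size": 1, "_source": False, "query": {"match_all": {}}}


def hit_summary(response):
    """Return (total_hits, max_score) from a parsed search response."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # Newer format: {"value": N, "relation": "eq"}; "gte" means N is a lower bound.
        total = total["value"]
    return total, hits.get("max_score")
```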

Elasticsearch: count returned results

I want to count the number of documents returned by a query with a size limit. For example, I run the following query:
curl -XGET 'http://localhost:9200/logs_-*/a_logs/_search?pretty=true' -d '
{
  "query" : {
    "match_all" : { }
  },
  "size" : 5,
  "from" : 8318
}'
And I get:
{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 159,
    "successful" : 159,
    "failed" : 0
  },
  "hits" : {
    "total" : 8319,
    "max_score" : 1.0,
    "hits" : [ {
      ....
The total number of documents matching my query is 8319, but I fetch at most 5. Only 1 document was returned since I queried "from" 8318.
In the response, I cannot tell how many documents were returned. I want to write a query such that the number of returned documents is also present in some field. Maybe a facet could help, but I could not figure it out. Kindly help.
Your query:
{
  "query" : {
    "match_all" : { }
  },
=> Means that you ask for all your data
  "size" : 5,
=> You want to display only 5 results
  "from" : 8318
=> You start from record 8318
Elasticsearch response:
....
"hits" : {
  "total" : 8319,
  ...
=> Elasticsearch tells you that there are 8319 matching results in the index.
You asked for all the results, starting from 8318.
8319 - 8318 = 1, so you have 1 result.
Try by removing the from.
Looking through the documentation, it's not clear how to make the query return this -- if indeed the API supports it. If you just want to have the count of the returned hits, the easiest way seems to be to actually count them yourself after parsing the response.
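Both answers can be sketched in a few lines of Python, assuming an integer hits.total as in the response above. expected_returned (a hypothetical helper) applies the arithmetic from the first answer: a page holds at most size hits, and at most whatever is left after skipping from hits. returned_count does what the second answer suggests and simply counts the parsed hits. The defaults size=10 and from_=0 mirror Elasticsearch's defaults.

```python
def expected_returned(total, size=10, from_=0):
    """Number of hits a from/size page should contain, given an integer hits.total."""
    return max(0, min(size, total - from_))


def returned_count(response):
    """Count the hits actually present in the parsed response."""
    return len(response["hits"]["hits"])


# The question's numbers: 8319 matches, size 5, from 8318.
page_size = expected_returned(8319, size=5, from_=8318)
```

Counting client-side with returned_count is the more robust option, since it reflects what actually came back rather than what the arithmetic predicts.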
