Elasticsearch Aggregation Scope Issue

I have an Elasticsearch index with more than 100 million records.
If I run the query below, the response (1 record) comes back within 1 second:
{
"query": {
"bool": {
"must":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
}
}
But if I run the query below, the response takes more than 2 minutes:
{
"query": {
"bool": {
"must":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}
I don't know why it takes so much longer after adding an aggregation on top of a query where _id = a36403af960840b86452bf1a6bd42fde3b4773e0, which matches only 1 record.
My assumption is that Elasticsearch applies the aggregation to the output of the query. So technically it should aggregate over only 1 record, and the response should come back within about 1 second, almost the same as without aggs.
How can I fix this?
I am using Elasticsearch version 1.5.
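The assumption above matches how aggregations are scoped: they are computed over the documents matched by the query. A minimal pure-Python sketch of that scoping, assuming documents are plain dicts with an `_id` key and a list of terms under a `post.content_terms` key (a hypothetical in-memory layout, not how Elasticsearch stores data). It models which documents get aggregated, not how expensive the aggregation is to execute.

```python
from collections import Counter


def query_scoped_terms(docs, doc_id, field="post.content_terms", size=10):
    """Mimic a term query on _id followed by a terms aggregation on `field`."""
    # Query phase: keep only documents whose _id matches (at most one).
    matched = [d for d in docs if d["_id"] == doc_id]
    # Aggregation phase: doc_count per term, counting each document once.
    counts = Counter()
    for d in matched:
        counts.update(set(d.get(field, [])))
    # Top `size` buckets by doc count; ties broken by term for determinism.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]


docs = [
    {"_id": "a1", "post.content_terms": ["cloud", "word", "cloud"]},
    {"_id": "b2", "post.content_terms": ["cloud", "other"]},
]
print(query_scoped_terms(docs, "a1"))
```

Only the single matched document contributes to the buckets, which is the behavior the question expects.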

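This is a good example of when to choose filter context over query context.
Try running the same query with the term clause in filter context, as shown below (note that the filter clause of a bool query was added in Elasticsearch 2.0; on 1.x, the equivalent is a filtered query or a constant_score query):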
GET my-index/_search
{
"query": {
"bool": {
"filter":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}

My first suggestion is to upgrade:
I tried your second query on 1.7.2 and it was very fast. I think upgrading should solve your issue.
Second suggestion
I'm not sure it will work with Elasticsearch 1.5, but try this query:
{
"query": {
"constant_score": {
"filter":{
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
},
"aggs": {
"mywordcloud": {
"terms": {
"field": "post.content_terms"
}
}
}
}
OR
{
"aggregations": {
"bylife": {
"terms": {
"field": "post.content_terms"
},
"aggregations": {
"bylife2": {
"filter": {
"term": {
"_id": "a36403af960840b86452bf1a6bd42fde3b4773e0"
}
}
}
}
}
}
}
I know this will return different data, but you can adapt your logic to this approach.
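Why the second alternative returns different data: it has no query, so its terms aggregation runs over every document in the index, and the bylife2 filter sub-aggregation only reports how many documents in each bucket have the given _id. A minimal pure-Python sketch of that result shape, reusing the same hypothetical dict layout (`_id` and `post.content_terms` keys) and simplifying the filter bucket to a plain count; it also ignores the terms aggregation's bucket size limit.

```python
from collections import Counter


def terms_with_id_filter(docs, doc_id, field="post.content_terms"):
    """Mimic terms(field) with a filter(_id) sub-aggregation and no query."""
    bucket_counts = Counter()
    filtered_counts = Counter()
    for d in docs:
        for term in set(d.get(field, [])):
            # Outer terms buckets see every document in the index.
            bucket_counts[term] += 1
            # The filter sub-aggregation counts only documents with the given _id.
            if d["_id"] == doc_id:
                filtered_counts[term] += 1
    return {
        term: {"doc_count": n, "bylife2": filtered_counts[term]}
        for term, n in bucket_counts.items()
    }


docs = [
    {"_id": "a1", "post.content_terms": ["cloud", "word"]},
    {"_id": "b2", "post.content_terms": ["cloud", "other"]},
]
print(terms_with_id_filter(docs, "a1"))
```

Buckets with a bylife2 count of 0 (here "other") are terms that do not occur in the target document, so the client has to ignore them.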

Related

Elasticsearch aggregations significant_text without query block returns zero buckets

I want to learn Elasticsearch and I am following this guide:
https://github.com/LisaHJung/Part-2-Understanding-the-relevance-of-your-search-with-Elasticsearch-and-Kibana-
This command works as described in the guide and returns buckets of significant text:
GET news_headlines/_search
{
"query": {
"match": {
"category": "ENTERTAINMENT"
}
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
I thought I'd explore by trying to find significant_text across ALL documents in my index. But both of these attempts gave me zero buckets:
GET news_headlines/_search
{
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
GET news_headlines/_search
{
"query": {
"match_all": { }
},
"aggregations": {
"popular_in_entertainment": {
"significant_text": {
"field": "headline"
}
}
}
}
What did I do wrong? Or is there something about aggregations that I don't understand?
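A likely explanation, offered tentatively: significant_text scores a term by comparing how often it appears in the foreground set (the documents matched by the query) versus the background set (by default, the whole index). With no query or with match_all, the two sets are identical, so no term stands out. A simplified pure-Python version of the JLH score, which the Elasticsearch documentation describes as the default significance heuristic, illustrates this; it is a sketch of the formula, not Elasticsearch's implementation.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """Simplified JLH significance score for one term.

    subset_*   -> foreground: documents matched by the query
    superset_* -> background: by default, every document in the index
    """
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    fg = subset_freq / subset_size      # share of foreground docs containing the term
    bg = superset_freq / superset_size  # share of background docs containing the term
    absolute_change = fg - bg
    if absolute_change <= 0:
        # The term is no more common in the foreground than in the background.
        return 0.0
    relative_change = fg / bg
    return absolute_change * relative_change


# match_all / no query: the foreground IS the background.
print(jlh_score(40, 1000, 40, 1000))
# A category query: the term is far more common among matched docs.
print(jlh_score(30, 100, 40, 1000))
```

In the first call the foreground and background frequencies are equal, so the score is zero; in the second the term is over-represented in the matched set and scores positively.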

elasticsearch multi field query is not working as expected

I've been facing some issues with a multi-field Elasticsearch query. I am trying to find all documents whose func_name field matches either of two hard-coded strings. Even though my index has documents with both function names, the query result always contains only one func_name. So far I have tried the following queries.
1) The following returns matches for only one function, even though documents with the other function exist as well:
GET /_search
{
"query": {
"multi_match": {
"query": "FEM_DS_GetTunerStatusInfo MDM_TunerStatusPrint",
"operator": "OR",
"fields": [
"func_name"
]
}
}
}
2) The following intermittently returns both functions:
GET /_search
{
"query": {
"match": {
"func_name": {
"query": "MDM_TunerStatusPrint FEM_DS_GetTunerStatusInfo",
"operator": "or"
}
}
}
}
3) The following returns matches for only one function, even though documents with the other function exist as well:
{
"query": {
"bool": {
"should": [
{ "match": { "func_name": "FEM_DS_GetTunerStatusInfo" }},
{ "match": { "func_name": "MDM_TunerStatusPrint" }}
]
}
}
}
Any help is much appreciated.
Thanks for your reply. Let's assume I have the following kinds of documents in Elasticsearch. I want my search to return the first two documents, since they match my func_name values.
{
"_index": "diag-178999",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "MDM_TunerStatusPrint",
"timestamp": "2017-06-01T02:04:51.000Z"
}
},
{
"_index": "diag-344563",
"_source": {
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "FEM_DS_GetTunerStatusInfo",
"timestamp": "2017-07-20T02:04:51.000Z"
}
},
{
"_index": "diag-101010",
"_source": {
"severity": "MIL",
"t_id": "03468500",
"p_id": "000007c6",
"func_name": "some_func",
"timestamp": "2017-09-15T02:04:51.000Z"
}
}
The "two best ways" to query ES here are to filter by terms on a particular field, or to use an aggregation, which lets you name the results, apply multiple rules, and give your response a more understandable format.
See: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html and this other, very useful, doc page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
In your case, you should do:
{
"from" : 0, "size" : 2,
"query": {
"bool": {
"filter": {
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
}
}
}
OR
{
"aggs": {
"aggregationName": {
"terms": {
"field": "func_name",
"include": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
}
}
The aggregation is only there to show how to get the same result as the query filter. Let me know if it works :)
Best regards
As I understand it, you should use a filtered query to match any document whose func_name is one of the values mentioned above:
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
}
}
See:
Filtered Query, Terms Query
UPDATE for ES 5.0 (the filtered query was removed in 5.0; use a bool query instead):
{
"query": {
"bool": {
"must": [
{
"terms": {
"func_name": [
"FEM_DS_GetTunerStatusInfo",
"MDM_TunerStatusPrint"
]
}
}
]
}
}
}
See: this answer
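The terms query answers above rely on exact-value matching: a document matches if its func_name equals any listed value. A minimal pure-Python sketch of that behavior applied to abbreviated versions of the question's sample documents, assuming func_name is indexed as an exact-value field (keyword, or not_analyzed in older versions); the mapping is not shown in the question.

```python
def terms_filter(docs, field, values):
    """Mimic a terms query on an exact-value (keyword / not_analyzed) field."""
    wanted = set(values)
    # A document matches if its field equals any listed value exactly.
    return [d for d in docs if d["_source"].get(field) in wanted]


# Abbreviated versions of the three sample documents from the question.
sample_docs = [
    {"_index": "diag-178999", "_source": {"func_name": "MDM_TunerStatusPrint"}},
    {"_index": "diag-344563", "_source": {"func_name": "FEM_DS_GetTunerStatusInfo"}},
    {"_index": "diag-101010", "_source": {"func_name": "some_func"}},
]
hits = terms_filter(
    sample_docs, "func_name", ["FEM_DS_GetTunerStatusInfo", "MDM_TunerStatusPrint"]
)
print([h["_index"] for h in hits])
```

Both target documents are returned and some_func is excluded, which is the result the question asks for.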

Elasticsearch aggregations on nested inner hits

I have a large amount of data in Elasticsearch. My documents have a nested field called "records" that contains a list of objects with several fields.
I want to be able to query specific objects from the records list, so I use inner_hits in my query, but it doesn't help: the aggregation request uses size 0, so no hits (and no inner hits) are returned.
I haven't managed to make an aggregation run only over the inner_hits; the aggregation returns results for all objects inside records, regardless of the query.
This is the query I am using:
(Each document has first_timestamp and last_timestamp fields, and each object in the records list has a timestamp field)
curl -XPOST 'localhost:9200/_msearch?pretty' -H 'Content-Type: application/json' -d'
{
"index":[
"my_index"
],
"search_type":"count",
"ignore_unavailable":true
}
{
"size":0,
"query":{
"filtered":{
"query":{
"nested":{
"path":"records",
"query":{
"term":{
"records.data.field1":"value1"
}
},
"inner_hits":{}
}
},
"filter":{
"bool":{
"must":[
{
"range":{
"first_timestamp":{
"gte":1504548296273,
"lte":1504549196273,
"format":"epoch_millis"
}
}
}
]
}
}
}
},
"aggs":{
"nested_2":{
"nested":{
"path":"records"
},
"aggs":{
"2":{
"date_histogram":{
"field":"records.timestamp",
"interval":"1s",
"min_doc_count":1,
"extended_bounds":{
"min":1504548296273,
"max":1504549196273
}
}
}
}
}
}
}'
Your query is pretty complex.
In short, here is the query you asked for:
{
"size": 0,
"aggregations": {
"nested_A": {
"nested": {
"path": "records"
},
"aggregations": {
"bool_aggregation_A": {
"filter": {
"bool": {
"must": [
{
"term": {
"records.data.field1": "value1"
}
}
]
}
},
"aggregations": {
"reverse_aggregation": {
"reverse_nested": {},
"aggregations": {
"bool_aggregation_B": {
"filter": {
"bool": {
"must": [
{
"range": {
"first_timestamp": {
"gte": 1504548296273,
"lte": 1504549196273,
"format": "epoch_millis"
}
}
}
]
}
},
"aggregations": {
"nested_B": {
"nested": {
"path": "records"
},
"aggregations": {
"my_histogram": {
"date_histogram": {
"field": "records.timestamp",
"interval": "1s",
"min_doc_count": 1,
"extended_bounds": {
"min": 1504548296273,
"max": 1504549196273
}
}
}
}
}
}
}
}
}
}
}
}
}
}
}
Now, let me explain each step by aggregation name:
size: 0 -> we are not interested in hits, only in aggregations
nested_A -> data.field1 is under records, so we move the scope into records
bool_aggregation_A -> filter by data.field1: value1
reverse_aggregation -> first_timestamp is not in the nested documents, so we need to step back out of records to the parent document
bool_aggregation_B -> filter by the first_timestamp range
nested_B -> now we scope back into records for the timestamp field (located under records)
my_histogram -> finally, build a date histogram on the timestamp field
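The step list above can be traced with a minimal pure-Python sketch of what the aggregation chain computes over in-memory documents. It assumes each parent document is a dict with an epoch-millis `first_timestamp` and a `records` list whose items carry `data.field1` and an epoch-millis `timestamp`, mirroring the question's fields. One thing the sketch makes visible: nested_B re-enters every record of the surviving parents, so records that did not match value1 are counted too.

```python
from collections import Counter


def histogram_like_answer(docs, field1_value, gte, lte, interval_ms=1000):
    """Mimic nested_A -> bool_aggregation_A -> reverse_aggregation ->
    bool_aggregation_B -> nested_B -> my_histogram (min_doc_count 1)."""
    buckets = Counter()
    for doc in docs:
        records = doc.get("records", [])
        # nested_A + bool_aggregation_A: does any record have data.field1 == value?
        if not any(r.get("data", {}).get("field1") == field1_value for r in records):
            continue
        # reverse_aggregation + bool_aggregation_B: filter the parent on first_timestamp.
        if not gte <= doc["first_timestamp"] <= lte:
            continue
        # nested_B: step back into *all* records of the surviving parent.
        for r in records:
            ts = r["timestamp"]
            # my_histogram: fixed 1s buckets keyed by bucket start (epoch millis).
            buckets[ts - ts % interval_ms] += 1
    # min_doc_count 1: only non-empty buckets exist in the Counter anyway.
    return dict(sorted(buckets.items()))


docs = [
    {"first_timestamp": 1504548300000, "records": [
        {"data": {"field1": "value1"}, "timestamp": 1504548300100},
        {"data": {"field1": "other"}, "timestamp": 1504548301500},
    ]},
    {"first_timestamp": 1504548300000, "records": [
        {"data": {"field1": "other"}, "timestamp": 1504548300200},
    ]},
    {"first_timestamp": 1, "records": [
        {"data": {"field1": "value1"}, "timestamp": 1504548300300},
    ]},
]
print(histogram_like_answer(docs, "value1", 1504548296273, 1504549196273))
```

Only the first parent survives both filters, and both of its records land in the histogram, including the one whose field1 is "other".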
Aggregating over inner_hits is not supported by Elasticsearch. The reason is that inner_hits is a very expensive operation, and running aggregations on top of inner_hits would multiply that cost considerably.
Here is the GitHub link to the issue.
If you want an aggregation over inner_hits, you can use one of the following approaches:
Make a targeted query that fetches only the required hits from Elasticsearch, and aggregate over them. Repeat it multiple times to get all the hits, aggregating as you go. This approach may lead to many search queries, which is not advisable.
Let your application layer handle the aggregation logic by writing a smart aggregation parser and running it on the responses from Elasticsearch. This approach is a little better, but you have the overhead of maintaining the parser as your needs change.
Personally, I would recommend changing your data mapping in Elasticsearch so that you are able to run the aggregation on it.
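The application-layer approach can be sketched as a small parser over an already-decoded search response. This is a minimal sketch, assuming the inner hits are named `records` (the default name is the nested path) and that each nested inner hit's `_source` is the nested object with an epoch-millis `timestamp`. Keep in mind that the response only contains as many inner hits per parent as the inner_hits `size` allows (3 by default), and only as many parents as the top-level `size`, so the client can only aggregate what was actually returned.

```python
from collections import Counter


def aggregate_inner_hits(response, inner_hits_name="records", interval_ms=1000):
    """Client-side date histogram over the inner hits in a search response."""
    buckets = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            ts = inner_hit["_source"]["timestamp"]
            # Floor to the start of the bucket, like a fixed 1s interval.
            buckets[ts - ts % interval_ms] += 1
    return dict(sorted(buckets.items()))


# A hand-written response fragment shaped like a search result with inner hits.
response = {"hits": {"hits": [{
    "_source": {},
    "inner_hits": {"records": {"hits": {"hits": [
        {"_source": {"timestamp": 1504548300100}},
        {"_source": {"timestamp": 1504548300900}},
        {"_source": {"timestamp": 1504548301000}},
    ]}}},
}]}}
print(aggregate_inner_hits(response))
```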
You can also try code like this:
PUT records
{
"mappings": {
"properties": {
"records": {
"type": "nested"
}
}
}
}
POST records/_doc
{
"records": [
{
"data": "test1",
"value": 1
},
{
"data": "test2",
"value": 2
}
]
}
GET records/_search
{
"size": 0,
"aggs": {
"all_nested_count": {
"nested": {
"path": "records"
},
"aggs": {
"bool_aggs": {
"filter": {
"bool": {
"must": [
{
"term": {
"records.data": "test2"
}
}
]
}
},
"aggs": {
"filtered_aggs": {
"sum": {
"field": "records.value"
}
}
}
}
}
}
}
}
Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/inner-hits.html
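Because the filter aggregation sits inside the nested aggregation, only the nested objects that match the filter feed the sum, which is how this example aggregates over matching records only. A minimal pure-Python sketch of the same nested -> filter -> sum chain, assuming an exact-value comparison on `data` (the indexed "test2" value matches the term "test2" in this example) and using the document posted above.

```python
def nested_filter_sum(docs, data_value, path="records"):
    """Mimic nested(path) -> filter(term data == data_value) -> sum(value)."""
    total = 0
    for doc in docs:
        # nested: step into each object of the nested array
        for record in doc.get(path, []):
            # filter: keep only nested objects whose "data" matches
            if record.get("data") == data_value:
                # sum: add up "value" of the surviving nested objects
                total += record.get("value", 0)
    return total


doc = {"records": [{"data": "test1", "value": 1}, {"data": "test2", "value": 2}]}
print(nested_filter_sum([doc], "test2"))
```

The sketch returns an integer, while Elasticsearch reports sum results as a floating-point value.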

Elasticsearch aggregation using a bool filter

I have the following query, which works fine on Elasticsearch 1.x but does not work on 2.x (I get doc_count: 0), since the bool filter has been deprecated. It's not clear to me how to rewrite this query using the new bool query.
{
"aggregations": {
"events_per_period": {
"filter": {
"bool": {
"must": [
{
"terms": {
"message.facility": [
"facility1",
"facility2",
"facility3"
]
}
}
]
}
}
}
},
"size": 0
}
Any help is greatly appreciated.
I think you might want an aggregation on multiple fields with a filter.
Here I assume a filter on id and an aggregation on facility1 and facility2:
{
"_source":false,
"query": {
"match": {
"id": "value"
}
},
"aggregations": {
"byFacility1": {
"terms": {
"field": "facility1"
},
"aggs": {
"byFacility2": {
"terms": {
"field": "facility2"
}
}
}
}
}
}
If you want an aggregation on three fields, check link.
For a Java implementation, see link2.
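The nested terms aggregation in this answer produces one byFacility1 bucket per facility1 value, each containing its own byFacility2 buckets. A minimal pure-Python sketch of that two-level bucketing, assuming single-valued facility1/facility2 fields and simplifying the match query on id to exact equality (the real match query is full-text).

```python
from collections import Counter, defaultdict


def two_level_terms(docs, id_value, outer="facility1", inner="facility2"):
    """Mimic a match on id, then terms(outer) with a terms(inner) sub-aggregation."""
    # Query scope: only documents whose id matches are aggregated.
    matched = [d for d in docs if d.get("id") == id_value]
    outer_counts = Counter(d[outer] for d in matched if outer in d)
    inner_counts = defaultdict(Counter)
    for d in matched:
        if outer in d and inner in d:
            # Each outer bucket keeps its own inner terms counts.
            inner_counts[d[outer]][d[inner]] += 1
    return {
        key: {"doc_count": n, "byFacility2": dict(inner_counts[key])}
        for key, n in outer_counts.items()
    }


docs = [
    {"id": "value", "facility1": "a", "facility2": "x"},
    {"id": "value", "facility1": "a", "facility2": "y"},
    {"id": "value", "facility1": "b", "facility2": "x"},
    {"id": "other", "facility1": "a", "facility2": "x"},
]
print(two_level_terms(docs, "value"))
```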

elasticsearch filter aggs by doc count

I have a query that counts the number of images per user:
GET images/_search
{
"query": {
"bool": {
"must": [
{
"term": {
"appID.raw": "myApp"
}
}
]
}
},
"size": 0,
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
}
}
}
}
It basically works fine, but I would like to exclude all aggregation results for users that have fewer than 200 images. How can I tweak the query above to achieve this?
Thanks.
You can achieve this with the terms aggregation's min_doc_count (minimum document count) option:
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID",
"min_doc_count": 200
}
}
}
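min_doc_count thresholds buckets after counting: a deviceID bucket is returned only if at least that many matched documents fall into it. A minimal pure-Python sketch over already-matched documents, assuming each document is a dict with a deviceID key. The real terms aggregation also returns only `size` buckets (10 by default), which the sketch mirrors; with many qualifying devices you may need to raise `size`.

```python
from collections import Counter


def terms_min_doc_count(docs, field, min_doc_count, size=10):
    """Mimic a terms aggregation with min_doc_count over already-matched docs."""
    counts = Counter(d[field] for d in docs if field in d)
    # Drop buckets below the threshold.
    kept = [(key, n) for key, n in counts.items() if n >= min_doc_count]
    # Order by doc count; ties broken by key here for determinism.
    kept.sort(key=lambda kv: (-kv[1], kv[0]))
    return kept[:size]


docs = [{"deviceID": "d1"}] * 250 + [{"deviceID": "d2"}] * 199 + [{"deviceID": "d3"}] * 200
print(terms_min_doc_count(docs, "deviceID", 200))
```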
Add a filter aggregation around your terms aggregation, with the query clause as its filter.
Filter Aggregations
You can modify your query above to look like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"appID.raw": "myApp"
}
}
]
}
},
"size": 0,
"aggs": {
"filtered_users_with_images_count": {
"filter": {
"term": {
"count": 200
}
},
"aggs": {
"perDeviceAggregation": {
"terms": {
"field": "deviceID"
}
}
}
}
}
}
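You can modify the filter inside filtered_users_with_images_count to match documents with more than 200 images (for example, by replacing the term filter with a range filter using "gt": 200); note that this assumes each document stores an image count in a count field.
Please also consider posting your data mappings along with the query when you ask questions.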
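The filter aggregation selects individual documents before any bucketing, so this approach only works if each document carries something like an image count in a `count` field (an assumption; the question's mapping isn't shown). By contrast, min_doc_count thresholds the per-device bucket counts. A minimal pure-Python sketch of the per-document filter behavior:

```python
from collections import Counter


def filter_then_terms(docs, filter_field, filter_value, group_field):
    """Mimic filter(term filter_field == filter_value) -> terms(group_field)."""
    # The filter aggregation decides per *document*, before any bucketing.
    matched = [d for d in docs if d.get(filter_field) == filter_value]
    buckets = Counter(d[group_field] for d in matched if group_field in d)
    return {"doc_count": len(matched), "perDeviceAggregation": dict(buckets)}


docs = [
    {"deviceID": "d1", "count": 200},
    {"deviceID": "d1", "count": 5},
    {"deviceID": "d2", "count": 200},
]
print(filter_then_terms(docs, "count", 200, "deviceID"))
```

Each bucket here counts documents whose own count field equals 200, not devices that own 200 or more documents.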
