Elasticsearch: avoiding maxClauseCount error in aggregation

I have an index in Elasticsearch 5.X that contains the following documents.
Each record holds the text of one line of a source file.
{"file_id":"file0001", "row_id":"1", "text":"(text field to search...)"}
{"file_id":"file0001", "row_id":"2", "text":"(text field to search...)"}
{"file_id":"file0001", "row_id":"3", "text":"(text field to search...)"}
{"file_id":"file0002", "row_id":"1", "text":"(text field to search...)"}
{"file_id":"file0002", "row_id":"2", "text":"(text field to search...)"}
...Millions of documents...
I send the following query to extract the top 500 matching rows for each file.
{
"_source":[
"file_id",
"text"
],
"size":0,
"query":{
"filtered":{
"query":{
"bool":{
"must":{
"regexp":{
"text":".*[o2].*"
}
}
}
},
"filter":{
"terms":{
"file_id":[
(Thousands of file_ids...)
]
}
}
}
},
"aggs":{
"top-docs":{
"terms":{
"field":"file_id",
"size":5000
},
"aggs":{
"top_file_hits":{
"top_hits":{
"size":500,
"highlight":{
"pre_tags":["<em>"],
"post_tags":["</em>"],
"fields":{
"text":{}
}
}
}
}
}
}
}
}
Then the following error is returned.
{
"error" : {
"root_cause" : [
{
"type" : "too_many_clauses",
"reason" : "maxClauseCount is set to 1024"
I suspect the aggregation is too heavy, but I can't think of a way to avoid using it.
Any ideas?

Related

Nested aggregation with term agg

I have a document with 2 nested paths - path.to.node and different.path.
I want to be able to get a date histogram based on the path.to.node.date field but then group the buckets based on different.path.to.name.
Is that possible?
I tried something like this but it doesn't seem to work...
{
"size":0,
"query":{...},
"aggregations":{
"path.to.node.date":{
"nested":{
"path":"path.to.node"
},
"aggregations":{
"path.to.node.date":{
"filter":{
"range":{...}
}
},
"aggregations":{
"different.path.name":{
"nested":{
"path":"different.path"
},
"terms":{
"field":"different.path.name"
...
},
"aggregations":{
"path.to.node.date":{
"date_histogram":{
"field":"path.to.node.date",
"interval":"1M",
"offset":0,
"order":{"_key":"asc"},
"keyed":false,"min_doc_count":0}
}
}
}
}
}
}
}
}
}
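One possible shape for this request, sketched as a Python dict so the nesting is easy to check: step into different.path, bucket by name, use reverse_nested to return to the root document, then step into path.to.node for the date histogram. The aggregation names (by_name_scope, by_name, back_to_root, node_scope, per_month) and the helper name are placeholders. The sketch assumes both paths are mapped as nested directly under the root, and it has not been run against the asker's mapping.

```python
import json


def build_histogram_by_name(interval="1M"):
    """Build a search body: monthly histogram of path.to.node.date per different.path.name."""
    # Innermost level: the histogram on the date field of the first nested path.
    per_month = {
        "date_histogram": {
            "field": "path.to.node.date",
            "interval": interval,
            "min_doc_count": 0,
        }
    }
    # Step into the first nested path (only valid once we are back at the root).
    node_scope = {"nested": {"path": "path.to.node"}, "aggs": {"per_month": per_month}}
    # reverse_nested with an empty body joins back to the root document.
    back_to_root = {"reverse_nested": {}, "aggs": {"node_scope": node_scope}}
    # One bucket per name found in the second nested path.
    by_name = {
        "terms": {"field": "different.path.name"},
        "aggs": {"back_to_root": back_to_root},
    }
    return {
        "size": 0,
        "aggs": {
            "by_name_scope": {
                "nested": {"path": "different.path"},
                "aggs": {"by_name": by_name},
            }
        },
    }


body = build_histogram_by_name()
print(json.dumps(body, indent=2))
```

Note that each name bucket would then cover every path.to.node entry of the documents that contain that name, since the two nested paths are only related through their shared root document.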

Get result from aggs in script ElasticSearch/Painless

I'm new to Elasticsearch. I'm trying to write a simple request in which I need the result of an aggregation inside my script, so that I can apply a simple condition. Is it possible to do it this way?
The condition below is only an example.
GET _search
{
"aggs" : {
"sum_field" : { "sum" : { "field" : "someField" } }
},
"script_fields": {
"script_name": {
"script": {
"lang": "painless",
"source": """
// get there aggs result (sum_field)
if(sum_field > 5){
return sum_field
}
"""
}
}
}
}
The requirement is to execute a sum aggregation over multiple indexes that share the same field name.
With multiple indexes, you have to check whether the field exists in each index and whether it has the same datatype in all of them.
Indexes
I've created three indexes, having a single field called num.
index_1
- num: long
index_2
- num: long
index_3
- num: text
: fielddata: true
Notice that for the field of type text I've set the property fielddata: true. If you do not set it, the query below returns the aggregation result together with an error saying that the value of a text field cannot be retrieved, because it is an analyzed string and doc can only be used for fields that are not analyzed.
Sample Query:
POST /_search
{
"size":0,
"query":{
"bool":{
"filter":[
{
"exists":{
"field":"num"
}
}
]
}
},
"aggs":{
"myaggs":{
"sum":{
"script":{
"source":"if(doc['num'].value instanceof long) return doc['num'].value;"
}
}
}
}
}
Query if you cannot set fielddata:true
In that case, you need to explicitly mention the indexes on which you'd want to aggregate.
POST /_search
{
"size":0,
"query":{
"bool":{
"filter":[
{
"exists":{
"field":"num"
}
},
{
"terms":{
"_index":[
"index_1",
"index_2"
]
}
}
]
}
},
"aggs":{
"myaggs":{
"sum":{
"script":{
"source":"if(doc['num'].value instanceof long) return doc['num'].value;"
}
}
}
}
}
Hope this helps!
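The two requests in this answer differ only in the extra terms filter on _index, so they can be generated from one helper. A minimal sketch in Python: build_sum_query and its indexes parameter are hypothetical names, and the Painless source is copied unchanged from the answer rather than verified here.

```python
import json

# Painless source taken verbatim from the answer above.
SUM_SCRIPT = "if(doc['num'].value instanceof long) return doc['num'].value;"


def build_sum_query(indexes=None):
    """Build the sum request; pass index names to restrict the aggregation to them."""
    # Always skip documents that do not have the field at all.
    filters = [{"exists": {"field": "num"}}]
    if indexes:
        # Variant for when fielddata cannot be enabled: name the usable indexes.
        filters.append({"terms": {"_index": list(indexes)}})
    return {
        "size": 0,
        "query": {"bool": {"filter": filters}},
        "aggs": {"myaggs": {"sum": {"script": {"source": SUM_SCRIPT}}}},
    }


all_indexes = build_sum_query()
numeric_only = build_sum_query(["index_1", "index_2"])
print(json.dumps(numeric_only, indent=2))
```

Either dict can be serialized with json.dumps and sent as the body of POST /_search.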

Elasticsearch sort search results without term

I want to retrieve all records of a particular type and sort them by a date field. I am using this code:
{
"query":{
"filtered":{
"filter":{
"type" : {
"value" : "articles"
}
}
}
},
"from":0,
"size":10,
"sort":[
{
"date_entered":{
"order":"desc"
}
}
]
}
However, the output of this query does not appear to be sorted by the specified field. The field date_entered is formatted like this: January 01, 1970 12:00 AM
How can I return a page of records sorted by the date_entered field?
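One possible cause, assuming date_entered is stored as a plain string rather than mapped as a date: values in this format sort alphabetically, not chronologically. The plain-Python illustration below uses made-up sample values and does not talk to Elasticsearch; it only shows how the two orderings differ.

```python
from datetime import datetime

# Matches values such as "January 01, 1970 12:00 AM".
# %B and %p rely on English month and AM/PM names (the default C locale).
DATE_FORMAT = "%B %d, %Y %I:%M %p"

# Hypothetical sample values, not taken from the asker's data.
samples = [
    "January 01, 1970 12:00 AM",
    "March 15, 2014 09:30 PM",
    "April 02, 2013 08:00 AM",
]

# Descending order when the values are compared as text.
as_strings = sorted(samples, reverse=True)

# Descending order when the values are parsed into real timestamps first.
as_dates = sorted(
    samples,
    key=lambda value: datetime.strptime(value, DATE_FORMAT),
    reverse=True,
)

print(as_strings)
print(as_dates)
```

If this is what is happening, mapping the field as a date type with a matching format (or indexing an ISO 8601 timestamp alongside it) would be the usual direction to look in.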

Elastic search query working for numbers but not for strings

In my Elasticsearch index, each document has the format below.
{
"_index" : "logstash-2014.08.11",
"_type" : "apache-error",
"_id" : "b9vZxg-wRbudJbsV6-vD-A",
"_score" : 1.0,
"_source" : {
"@version":"1",
"@timestamp":"2014-08-11T02:07:26.000Z",
"host":"127.0.0.1:49558",
"type":"apache-error",
"loglevel":"error",
"clientip":"123.12.12.12",
"errormsg":"htaccess: require valid-user not given a valid session, are you using lazy sessions?",
"timestamp_submitted":"2014-08-14 19:25:11 UTC",
"geoip":{"ip":"123.12.12.12",
"country_code2":"US",
"country_code3":"USA",
"country_name":"United States",
"continent_code":"NA",
"latitude":38.0,
"longitude":-97.0,
"dma_code":0,
"area_code":0,
"location":[-97.0,38.0]
}
}
}
I want to write a query where country_code2 in geoip is equal to US.
When I run a query on geoip.ip, it executes perfectly, and it also works for geoip.latitude, but when I run it for geoip.country_code2 I get no results.
Below is the query I am using:
curl -XGET 'http://abcd.dev:9200/_search?pretty=true' -d '
{
"query":{
"term":{
"geoip.ip":"123.12.12.12"
}
},
"filter":{
"range":{
"@timestamp":{
"gte":"2014-08-12","lte":"now"
}
}
}
}'
The actual query, for which I am not getting any results:
curl -XGET 'http://abcd.dev:9200/_search?pretty=true' -d '
{
"query":{
"term":{
"geoip.country_code2":"US"
}
},
"filter":{
"range":{
"@timestamp":{
"gte":"2014-08-12","lte":"now"
}
}
}
}'
Use a match query, instead of a term query.
{
"query":{
"match":{
"geoip.country_code2":"US"
}
}
}
The match query analyzes the search text with the same analyzer that was used to index the field, whereas a term query looks up the exact term without analyzing it.
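To see why the term query finds nothing, here is a toy model in plain Python. It assumes the field goes through an analyzer that lowercases its tokens (as the standard analyzer does); the function names and the two sample documents are invented for illustration, and the real analysis chain does more than this.

```python
def analyze(text):
    """Rough stand-in for an analyzer: split on whitespace and lowercase."""
    return [token.lower() for token in text.split()]


def build_index(docs, field):
    """Map each analyzed token to the ids of the documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in analyze(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_query(index, value):
    """Look the value up exactly as given, with no analysis."""
    return index.get(value, set())


def match_query(index, text):
    """Analyze the search text first, then look up each resulting token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


docs = {1: {"country_code2": "US"}, 2: {"country_code2": "DE"}}
index = build_index(docs, "country_code2")

print(term_query(index, "US"))   # set(): the index holds "us", not "US"
print(match_query(index, "US"))  # {1}: "US" is analyzed to "us" before lookup
```

In this model a value with no uppercase letters is stored unchanged, which would be consistent with the term query on geoip.ip returning results.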

Elasticsearch get the last ten added events

I have an index with multiple types, one of them being event, and I would like to get the last 10 events sorted by their start date.
{
"from":0,
"size":10,
"query":{
"range":{
"start":{
"from":"2014-02-25 00:00:01 UTC",
"to":"2014-03-04 23:59:00 UTC"
}
}
},
"filter" :{
"and": [
{
"type": {
"value": "event"
}
}
]
},
"sort":[
{ "start":
{"order":"asc"}
}
]
}
I have tried variations of the above query but cannot get it working: Elasticsearch does not apply the type filter.
The filter syntax above is correct (the and is not needed).
If you are only interested in events, you might as well query their endpoint directly (for example localhost:9200/idx/event/_search).
In fact, if you want to use the type in your query, you have to use the _type field name, with the underscore. Here is an example:
POST /items/_search
{
"query": {
"match": {
"_type": "item"
}
}
}
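Putting the endpoint suggestion together with the original range query, a sketch of the request as a Python helper. It targets the type's own endpoint, so no type filter is needed, and it sorts by start in descending order so that the first ten hits are the most recent ones. The helper name is hypothetical, idx comes from the example URL above, descending order is an assumption about what "last 10" means, and typed endpoints only exist on Elasticsearch versions that still support mapping types.

```python
import json


def build_last_events_request(index="idx", doc_type="event", size=10):
    """Return (path, body) for the most recent events in the given date range."""
    # Querying /index/type/_search restricts the hits to that type.
    path = "/{}/{}/_search".format(index, doc_type)
    body = {
        "from": 0,
        "size": size,
        "query": {
            "range": {
                "start": {
                    "from": "2014-02-25 00:00:01 UTC",
                    "to": "2014-03-04 23:59:00 UTC",
                }
            }
        },
        # Newest first, so the first `size` hits are the latest events.
        "sort": [{"start": {"order": "desc"}}],
    }
    return path, body


path, body = build_last_events_request()
print(path)
print(json.dumps(body, indent=2))
```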
