Elasticsearch increase term limit

I have this as part of a bool query
"should": [
{
"term": {
"column1.raw": "value1"
}
},
{
"term": {
"column1.raw": "value2"
}
}
]
Here the term section repeats multiple times, once per element of an array that I build the query from. If I limit this array to 500 values it works, but it fails when the array count is greater than 1000. How can I increase this limit? Most of the time I have over 4,000 values in the array, so the term filter needs to repeat 4,000 times. Is there an alternative method of doing the same?

Have a look at
http://george-stathis.com/2013/10/18/setting-the-booleanquery-maxclausecount-in-elasticsearch/
You can set the
index.query.bool.max_clause_count: n
parameter in the elasticsearch.yml file, where n is the number of terms you want to allow.
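As an alternative to repeating thousands of separate term clauses, a single terms query accepts an array of values and keeps the clause count down. A minimal sketch, reusing the field and values from the question (wrap it in whatever bool context you already have):

{
  "query": {
    "bool": {
      "filter": {
        "terms": {
          "column1.raw": ["value1", "value2", "value3"]
        }
      }
    }
  }
}

Depending on your version, the terms query may have its own cap on the number of values, but it avoids expanding into one boolean clause per value.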

Related

Is there a way to specify percentage value in ES DSL Sampler aggregation

I am trying to do a sum aggregation on a certain sample of data; I want the sum of the cost field over only the top 25% of records (those with the highest cost).
I know I have the option to run a sampler aggregation, which can help me achieve this, but there I need to pass the exact number of records on which I want the sampler aggregation to run.
{
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 300
      },
      "aggs": {
        "total_cost": {
          "sum": {
            "field": "cost"
          }
        }
      }
    }
  }
}
But is there a way to specify a percentage instead of an absolute number here? In my case the total number of documents changes pretty regularly, and I need to get the top 25% (costliest).
How I get it today is by doing 2 queries:
first to get the total number of records,
then divide that number by 4 and run the sampler query with the result (I have also added a descending sort on the cost field, which is not shown in the query above).
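A minimal sketch of that two-step workaround, in console style with a hypothetical index name my_index (the comment marks where the value computed from the first response goes):

POST my_index/_count
{
  "query": { "match_all": {} }
}

# Suppose the count comes back as 1200; 1200 / 4 = 300 becomes the shard_size.
POST my_index/_search
{
  "size": 0,
  "aggs": {
    "sample": {
      "sampler": {
        "shard_size": 300
      },
      "aggs": {
        "total_cost": {
          "sum": { "field": "cost" }
        }
      }
    }
  }
}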

Elasticsearch "size" value not working in terms aggregation with partitions

I am trying to paginate over a specific field using the terms aggregation with partitions.
The problem is that the number of returned terms for each partition is not equal to the size parameter that I set.
These are the steps that I am doing:
Retrieve the number of different unique values for the field with a "cardinality" aggregation; a sketch of this request is shown below.
In my data, the result is 21.
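A sketch of that cardinality request, assuming the same index and field names as the search further down (note that cardinality is an approximate count):

POST my_index/_search?pretty
{
  "size": 0,
  "aggs": {
    "unique_values": {
      "cardinality": { "field": "field_to_paginate" }
    }
  }
}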
From the web page, the user wants to display a table with 10 items per page.
if unique_values % page_size != 0:
partitions_number = (unique_values // page_size) + 1
else:
partitions_number = (unique_values // page_size)
Then I am making this simple query:
POST my_index/_search?pretty
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        {
          "match": {
            "field_to_paginate": "foo"
          }
        }
      ]
    }
  },
  "aggs": {
    "by_pchostname": {
      "terms": {
        "size": 10,
        "field": "field_to_paginate",
        "include": {
          "partition": 0,
          "num_partitions": 3
        }
      }
    }
  }
}
I am expecting to retrieve 10 results, but when I run the query I get only 7.
What am I missing here? Do I need to use a different solution here?
As a side note, I can't use composite aggregation because I need to sort results by doc_count over the whole dataset.
Partitions in a terms aggregation divide the values into roughly equal chunks.
In your case the number of partitions (num_partitions) is 3, so 21 / 3 == 7.
Partitions are meant for walking through large value sets, on the order of thousands.
You may be able to leverage the shard_size parameter. My suggestion is to read that part of the manual and experiment with the shard_size param; a sketch with it added is below.
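For illustration, the same aggregation with an explicit shard_size added (the value 100 here is an arbitrary example, not a recommendation):

POST my_index/_search?pretty
{
  "size": 0,
  "aggs": {
    "by_pchostname": {
      "terms": {
        "size": 10,
        "shard_size": 100,
        "field": "field_to_paginate",
        "include": {
          "partition": 0,
          "num_partitions": 3
        }
      }
    }
  }
}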
Terms aggregation does not allow pagination. Use composite aggregation instead (requires ES >= 6.1.0). Below is the quote from reference docs:
If you want to retrieve all terms or all combinations of terms in a nested terms aggregation you should use the Composite aggregation which allows to paginate over all possible terms rather than setting a size greater than the cardinality of the field in the terms aggregation. The terms aggregation is meant to return the top terms and does not allow pagination.
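A minimal sketch of composite-based pagination over the same field (subsequent pages are fetched by passing the returned after_key back in an after parameter):

POST my_index/_search?pretty
{
  "size": 0,
  "aggs": {
    "paginate": {
      "composite": {
        "size": 10,
        "sources": [
          { "by_field": { "terms": { "field": "field_to_paginate" } } }
        ]
      }
    }
  }
}

Keep in mind the caveat from the question, though: composite returns terms in key order, not sorted by doc_count.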

Elasticsearch query to find range overlap

Let's say I have the following indexed document:
{
  "field1": [400, 800]
}
I want to create a query using 2 search parameters (min_val = 300 and max_val = 500) to select documents where these two ranges overlap.
In my example, the above document should be selected, as we can see:
300                 500
 [===================]
            [=======================]
           400                     800
What is the most efficient way to find documents that overlap two numeric ranges?
I can do it with multiple comparisons and many ands and ors, but I'm looking for a simpler, more efficient way to achieve this.
In ES, a range of numbers like you have for field1 is not actually a range but simply two distinct values, namely 400 and 800. All you have to do is use a simple range query that compares field1 with the lower and upper bounds of the search range, i.e.
the range [300, 500] should include either 400 or 800.
Expressed in the DSL, you end up with a single range query like this one:
{
  "query": {
    "range": {
      "field1": {
        "gte": 300,
        "lte": 500
      }
    }
  }
}
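One caveat worth adding: if a stored interval can fully contain the search range (say [100, 900] against [300, 500]), neither endpoint falls inside [300, 500] and the single range query above would miss it. A sketch covering the general overlap condition (lowest value <= 500 and highest value >= 300) combines two range clauses:

{
  "query": {
    "bool": {
      "must": [
        { "range": { "field1": { "gte": 300 } } },
        { "range": { "field1": { "lte": 500 } } }
      ]
    }
  }
}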

Can Elasticsearch do a decay search on the log of a value?

I store a number, views, in Elasticsearch. I want to find documents "closest" to it on a logarithmic scale, so that 10k and 1MM are the same distance (and get scored the same) from 100k views. Is that possible?
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#exp-decay describes field value factor and decay functions but can they be "stacked"? Is there another approach?
I'm not sure if you can achieve this directly with decay, but you could easily do it with the script_score function. The example below uses dynamic scripting; be aware that file-based scripts are the recommended, far more secure approach.
In the query below, the offset parameter is set to 100,000, and documents with that value in their views field score the highest. The score decays logarithmically as the value of views departs from offset. Per your example, documents with 1,000,000 or 10,000 views get identical scores (0.30279312 with this formula).
You can invert the order of these results by changing the beginning of the script to multiply by _score instead of divide.
$ curl -XPOST localhost:9200/somestuff/_search -d '{
  "size": 100,
  "query": {
    "bool": {
      "must": [
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "params": {
                    "offset": 100000
                  },
                  "script": "_score / (1 + ((log(offset) - log(doc[\"views\"].value)).abs()))"
                }
              }
            ]
          }
        }
      ]
    }
  }
}'
Note: you may want to account for the possibility of 'views' being null, depending on your data.
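A minimal sketch of such a guard, assuming the same Groovy-era scripting as the example above (documents with no views value simply score zero):

"script": "doc[\"views\"].empty ? 0 : _score / (1 + ((log(offset) - log(doc[\"views\"].value)).abs()))"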

In ElasticSearch is there is limit to the number of items in a terms query?

In the ES docs it lists this sample query:
{
  "terms": {
    "tags": [
      "blue",
      "pill"
    ],
    "minimum_should_match": 1
  }
}
Is there a limit (or a practical limit) on the number of items I could put in the list of possible strings to search for? Could I have a hundred items here?
Yes, you can put thousands of items there (I've tested it); just follow the syntax and you'll be fine.
