Max Clause Count in ElasticSearch / Rewrite queries - elasticsearch

I know that Elasticsearch has an internal limit on how many clauses a bool query can contain. This is controlled by the max_clause_count setting in the elasticsearch.yml file.
But I thought that this limit did not apply to the values passed to a single query clause.
So I expected a query like the following to work, with more than 1024 values in the terms query:
{
"query":{
"bool":{
"should":[
{ "terms": {"id": ["cafe-babe-0000","cafe-babe-0001",... ]}}
]
}
}
}
But this query throws a TooManyClauses exception. So, in this case, the number of values in the terms query also counts toward the limit. Is that correct?
Also, I know that this is not the best way to perform this kind of query, but is it possible to rewrite the previous query so that the limit is not exceeded?

You can use the ids query.
"query": {
"ids": {
"values": [ "cafe-babe-0000","cafe-babe-0001",... ]
}
}
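As far as I know, this query is not subject to the clause limit. Note that the ids query matches on the document _id, so it only helps if those values are the _id of each document rather than a separate id field.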
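The ids query matches on the document _id. If the values live in an ordinary field instead, one fallback is to split them into batches no larger than the clause limit, send one request per batch, and merge the hits on the client. This is an assumption on my part, not something taken from the answer above. The sketch below only builds the request bodies; the helper names are hypothetical and 1024 is the limit mentioned in the question.

```python
def chunked(values, size):
    """Yield consecutive slices of `values`, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_terms_requests(field, values, max_clauses=1024):
    """Build one search body per batch so no single terms query exceeds the limit."""
    return [
        {"query": {"bool": {"should": [{"terms": {field: chunk}}]}}}
        for chunk in chunked(list(values), max_clauses)
    ]


if __name__ == "__main__":
    ids = [f"cafe-babe-{i:04d}" for i in range(2500)]
    bodies = build_terms_requests("id", ids)
    print(len(bodies), "request bodies")
```

Each body can then be sent as a separate search. Merging on the client means scores and pagination are no longer global across batches, so this only fits lookups where that does not matter.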

Related

Elasticsearch sort by terms values' order

I'm connecting my recommendation service with my product service. The recommendation service, no matter what the parameters are, always returns a list of product IDs sorted by relevance. Example:
["ID1", "ID2", "ID3"]
The product service owns Elasticsearch indices that store the details of the products. The client expects the data of the recommended products along with the product details ordered by the relevancy. Hence I'm using this search query:
{
"query":{
"bool":{
"filter":[
{
"terms": {
"product_id": ["ID1", "ID2", "ID3"]
}
}
]
}
}
}
The problem is the result from that query is not sorted by the terms values' order. What changes can I make to achieve the goals?
P.S.: Any advice or reference in Elasticsearch index design, services' response format, or the system design for recommendation system would be much welcomed.
The terms query functions as an OR filter that scores the matches in a bool manner (true -> 1, false -> 0).
That said, you could generate a similar OR query with a query_string query that boosts the individual IDs, which increases their scores and consequently sorts them higher:
{
"query":{
"bool":{
"should": [
{
"query_string": {
"default_field": "product_id",
"query": "ID1^3 OR ID2^2 OR ID3^1"
}
}
],
"filter":[
{
"terms": {
"product_id": ["ID1", "ID2", "ID3"]
}
}
]
}
}
}
The boost values above can of course be dynamically changed to account for the varying length of the list of IDs.
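A minimal sketch of generating those boosts for a list of any length, assuming the IDs contain no characters that are reserved in query_string syntax. The function names and the "size" field are my additions, not part of the answer above. Because scores also depend on term statistics, the boosted order is not strictly guaranteed, so the second helper shows a client-side reorder as a fallback.

```python
def build_ranked_query(product_ids, field="product_id"):
    """Build the search body, boosting earlier IDs more than later ones."""
    if not product_ids:
        raise ValueError("product_ids must not be empty")
    n = len(product_ids)
    # First ID gets boost n, last ID gets boost 1.
    boosted = " OR ".join(f"{pid}^{n - i}" for i, pid in enumerate(product_ids))
    return {
        "size": n,  # assumption: return every requested product, not the default page size
        "query": {
            "bool": {
                "should": [
                    {"query_string": {"default_field": field, "query": boosted}}
                ],
                "filter": [{"terms": {field: list(product_ids)}}],
            }
        },
    }


def reorder_hits(hits, product_ids, field="product_id"):
    """Sort hits into the order of product_ids; unknown IDs go last."""
    position = {pid: i for i, pid in enumerate(product_ids)}
    return sorted(
        hits, key=lambda h: position.get(h["_source"][field], len(position))
    )


if __name__ == "__main__":
    body = build_ranked_query(["ID1", "ID2", "ID3"])
    print(body["query"]["bool"]["should"][0]["query_string"]["query"])
```

The client-side reorder does not depend on scoring at all, which makes it the more predictable option when the list fits in a single response.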

Aggregation after sorting and limit in Elastic Search 5.6

I have to run an aggregation on Elasticsearch documents after sorting the results and picking the top n from them.
I tried to do this:
{
"size":1,
"query":{
"bool":{
"must":[
{
"terms":{
"name.keyword":[
"some_name"
]
}
},
{
"exists":{
"field":"3g_duration_count"
}
}
]
}
},
"sort":[
{
"tmst":{
"order":"desc"
}
}
],
"aggs":{
"fieldNameAgg":{
"avg":{
"field":"3g_duration_count"
}
}
}
}
Here the top n results are fetched after the aggregation has run (which makes no sense for my case). I want to pick the top n records based on the sort criteria and then apply the aggregation. How do I achieve this?
I am using Elasticsearch 5.6.
Is there a way to pass the results of the inner query, along with the sort and limit clauses, to a child query and then apply the avg aggregation on top of that? That way I can ensure the limit is applied before the aggregation happens.
An equivalent SQL query might look like this:
select avg(field_value) from (select field_value from t1 order by tmst desc fetch first n rows ) t2
Is this possible in Elasticsearch 5.6?
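A minimal sketch of one possible workaround, assuming it is acceptable to compute the average on the client: run the search above with "size" set to n and without the "aggs" section, then average the field over the returned hits. The helper name and the sample response are hypothetical; only the response shape (hits.hits[]._source) is standard.

```python
def average_of_top_hits(response, field):
    """Average `field` over the hits of a search response, or None if no hit has it."""
    values = [
        hit["_source"][field]
        for hit in response["hits"]["hits"]
        if field in hit.get("_source", {})
    ]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical response for size=3, sorted by tmst descending.
    sample = {
        "hits": {
            "hits": [
                {"_source": {"tmst": 3, "3g_duration_count": 10}},
                {"_source": {"tmst": 2, "3g_duration_count": 20}},
                {"_source": {"tmst": 1, "3g_duration_count": 30}},
            ]
        }
    }
    print(average_of_top_hits(sample, "3g_duration_count"))
```

This mirrors the SQL subquery: the search plays the role of the inner select with order by and limit, and the helper plays the role of the outer avg.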

Aggregation to return all values without grouping

Can an aggregation return all values? Is there any way to do this with scripts?
{
"size": 0,
"_source":["docDescription","datasource"],
"query": {
"match_all":{}
},
"aggs":{
"projectNameMatchCount": {
"filter" : { "match": { "docDescription": ".ppt" } },
"aggs":{
"names":{
"terms":{"field":"_id"}
}
}
},
"datasourceSourceMatchCount": {
"filter" : { "match": { "datasource": "NGA" } }
}
}
}
In the projectNameMatchCount aggregation I apply a filter and then call a nested aggregation to return the values, but terms does a group by. I don't want a group by; all I want is to return the field values.
Aggregations group data sets together to derive a metric. If you want individual documents to be returned, you should run a direct query or filter instead. Aggregations are post-processing steps that run on the data set narrowed down by your query, and they are comparatively more expensive than queries and filters, so avoid them unless you need aggregated metrics.
That said, from what I understand of your query, you are using two aggregations: one should return some document IDs and the other should just return a count based on a different filter. You can do this with a top_hits aggregation inside the filter aggregation projectNameMatchCount. For more details: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
Still, I believe you will benefit more, in total query time and in resources consumed on the Elasticsearch side, from simply making two separate requests: one with a query to return the IDs and another with an aggregation to return the document count.
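A minimal sketch of the top_hits variant described above, built as a Python dict so it can be serialized to a request body. The sub-aggregation name "matching_docs" and the max_docs parameter are hypothetical. top_hits returns at most "size" documents, so it is a bounded sample rather than a way to return every value.

```python
import json


def build_request(max_docs=10):
    """Build a body with a top_hits sub-aggregation in place of the terms sub-aggregation."""
    return {
        "size": 0,
        "query": {"match_all": {}},
        "aggs": {
            "projectNameMatchCount": {
                "filter": {"match": {"docDescription": ".ppt"}},
                "aggs": {
                    # Returns the matching documents themselves, not grouped buckets.
                    "matching_docs": {"top_hits": {"size": max_docs}}
                },
            },
            "datasourceSourceMatchCount": {
                "filter": {"match": {"datasource": "NGA"}}
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_request(), indent=2))
```

In the response, the documents would appear under aggregations.projectNameMatchCount.matching_docs.hits.hits, next to the doc_count of each filter aggregation.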

Elasticsearch term query does not give any results

I am very new to Elasticsearch and I have to perform the following query:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"title":"Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
According to the Elasticsearch API, it is equivalent to pseudo-SQL:
SELECT document
FROM book-lists
WHERE title = "Sociology"
AND idOwner = 17xxxxxxxxxxxx45
The problem is that my document looks like this:
{
"_index":"book-lists",
"_type":"book-list",
"_id":"AVBRSvHIXb7carZwcePS",
"_version":1,
"_score":1,
"_source":{
"title":"Sociology",
"books":[
{
"title":"The Tipping Point: How Little Things Can Make a Big Difference",
"isRead":true,
"summary":"lorem ipsum",
"rating":3.5
}
],
"numberViews":0,
"idOwner":"17xxxxxxxxxxxx45"
}
}
And the Elasticsearch query above doesn't return anything.
Whereas, this query returns the document above:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"term":{
"numberViews":"0"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
This makes me suspect that having two fields named "title" has something to do with it.
Is there a way to fix this without having to rename any of the fields? Or am I missing something else?
Thanks to anyone trying to help.
Your problem is described in the documentation.
I suspect that you don't have any explicit mapping on your index, which means elasticsearch will use dynamic mapping.
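For string fields, it will pass the string through the standard analyzer, which lowercases it (among other things). The index therefore contains the term sociology, while your term filter looks for the exact term Sociology, which is why your query returns nothing.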
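The mismatch can be illustrated with a toy inverted index. This is a rough simulation, not Elasticsearch code: the "analyzer" here only splits on non-word characters and lowercases, which is a simplification of the standard analyzer, and all function names are hypothetical.

```python
import re


def simplified_analyzer(text):
    """Rough stand-in for an analyzer: split on non-word characters, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs, field):
    """Map each analyzed token to the set of document IDs containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for token in simplified_analyzer(doc[field]):
            index.setdefault(token, set()).add(doc_id)
    return index


def term_lookup(index, term):
    """Like a term query: the input is looked up exactly as given, with no analysis."""
    return index.get(term, set())


def match_lookup(index, text):
    """Like a match query: the input is analyzed first, then each token is looked up."""
    found = set()
    for token in simplified_analyzer(text):
        found |= index.get(token, set())
    return found


if __name__ == "__main__":
    docs = {"AVBRSvHIXb7carZwcePS": {"title": "Sociology"}}
    index = build_inverted_index(docs, "title")
    print(term_lookup(index, "Sociology"))   # no match: index holds "sociology"
    print(term_lookup(index, "sociology"))   # match
    print(match_lookup(index, "Sociology"))  # match: input is analyzed first
```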
Your options are:
1. Specify an explicit mapping on the field so that it isn't analyzed before being stored in the index (index: not_analyzed).
2. Clean your term query before sending it to Elasticsearch (in this specific query lowercasing will work, but note that the standard analyzer also does other things, such as removing stop words in some configurations, so depending on the title you may still have issues).
3. Use a different query type (e.g., query_string instead of term, which will analyze the query before running it).
Looking at the sort of data you are storing, you probably need to specify an explicit not_analyzed mapping.
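A minimal sketch of the explicit mapping, built as a Python dict that could be sent as the body when creating the index. It assumes the pre-5.x mapping syntax that goes with the filtered query used in this thread; in Elasticsearch 5.x and later the equivalent field definition is "type": "keyword". The function name is hypothetical. The mapping of an existing field cannot be changed in place, so the index would need to be re-created and the documents reindexed.

```python
import json


def build_index_body(doc_type="book-list"):
    """Build an index-creation body that stores the top-level title unanalyzed."""
    return {
        "mappings": {
            doc_type: {
                "properties": {
                    # Only the top-level title is affected; books.title is a
                    # separate field path and keeps its default mapping.
                    "title": {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_body(), indent=2))
```

With this mapping the original term filter on "Sociology" would match only documents whose title is exactly that string, which is what the pseudo-SQL in the question expresses.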
For option three your query would look something like this:
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"query_string":{
"fields": ["title"],
"analyzer": "standard",
"query": "Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
Note that the query_string query has special syntax (e.g., OR and AND are not treated as literals) which means you have to be careful what you give it. For this reason explicit mapping with a term filter is probably more appropriate for your use case.
I have described this issue in this blog.
The issue is caused by the default tokenization in Elasticsearch.
In that post, I outline two solutions:
one is to set the not_analyzed flag on the required field, and the other is to use the keyword tokenizer.
To expand on solarissmoke's solution: while the contents of that field are passed through the standard analyzer, your query is not. If you refer to the Elasticsearch documentation on the term query, you will see that term queries are not analyzed.
The match query is probably more appropriate for your case. By default, the text you query for is analyzed in the same way as the contents of the title field. The query_string query brings a lot more to the table, and you should review the documentation if you plan on using it.
So again, this is pretty much what you had, with one small tweak:
GET book-lists/book-list/_search
{
"query":{
"filtered":{
"filter":{
"bool":{
"must":[
{
"match":{
"title":"Sociology"
}
},
{
"term":{
"idOwner":"17xxxxxxxxxxxx45"
}
}
]
}
}
}
}
}
It is important to note that passing a lowercased version of the terms to the term query (a hack that does not seem like a good idea, given what solarissmoke described about other features of the standard analyzer, such as the stop filter), using the query_string query, or using the match query is still very different from the SQL query you described:
SELECT document
FROM book-lists
WHERE title = "Sociology"
AND idOwner = 17xxxxxxxxxxxx45
With those Elasticsearch queries, you can match records where idOwner might be the same but title might be something like "Another Sociology Title" which is different from what you would expect with that SQL. Here is some great stuff from the documentation and another stackoverflow post that will elaborate on what was going on, where term queries and filters are appropriate, and getting exact matches:
Elasticsearch : Finding Exact Values
Stackoverflow : Exact (not substring) matching in Elasticsearch

Elastic Search SUM of aggregated values

We are using Elasticsearch to get some statistics.
1. I need to get the average value for each group.
2. Sum all of these values.
So far, step no. 1 was pretty straightforward. However, I really don't know how to sum all the values at the end. Is this possible? If yes, how?
Thanks for any suggestions.
Here is my aggs query:
{
"query":{
"filtered":{
"query":{
"query_string":{
"analyze_wildcard":true,
"query":"*"
}
}
}
},
"aggs":{
"2":{
"terms":{
"field":"person",
"size":5000,
"order":{
"1":"desc"
}
},
"aggs":{
"1":{
"avg":{
"field":"company"
}
}
}
}
}
}
Aggregating over aggregation results is not yet supported in Elasticsearch. Apparently a concept called reducers is being developed for 2.0. I would suggest having a look at scripted metric aggregations. Basically, you can create your own aggregation by controlling the collection and computation aspects yourself using scripts.
Alternatively, if possible you can precompute and store the average when indexing and then use the sum aggregation when querying.
Have a look at the following question for an example of this aggregation: Elasticsearch: Possible to process aggregation results?
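Another option, not mentioned in the answer above, is to sum the per-bucket averages on the client from the response of the query in the question. The sketch below assumes the standard response shape for a terms aggregation named "2" with an avg sub-aggregation named "1"; the helper name and sample data are hypothetical. As far as I know, the reducers mentioned above were released in 2.0 as pipeline aggregations (for example sum_bucket), which can do this on the server in later versions.

```python
def sum_of_bucket_averages(response, terms_agg="2", avg_agg="1"):
    """Sum the avg value of every bucket, skipping buckets whose avg is null."""
    buckets = response["aggregations"][terms_agg]["buckets"]
    values = (bucket[avg_agg]["value"] for bucket in buckets)
    return sum(v for v in values if v is not None)


if __name__ == "__main__":
    # Hypothetical response fragment for the aggs query in the question.
    sample = {
        "aggregations": {
            "2": {
                "buckets": [
                    {"key": "alice", "doc_count": 4, "1": {"value": 2.5}},
                    {"key": "bob", "doc_count": 2, "1": {"value": 4.0}},
                    {"key": "carol", "doc_count": 1, "1": {"value": None}},
                ]
            }
        }
    }
    print(sum_of_bucket_averages(sample))
```

The result is only as complete as the bucket list, so the terms aggregation "size" has to be large enough to cover every group.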
