Compare query with and without score calculation - elasticsearch

I would like to know whether it is possible to disable score calculation for should queries, or alternatively whether an OR is possible in filter context.
ES version: 6+
For example:
This query matches documents in either records OR voIds, and it calculates scores:
POST customers/_search
{
  "size": 10000,
  "version": true,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "voIds": [
              78031203, ...
            ]
          }
        },
        {
          "terms": {
            "records.keyword": [
              "S3G82U", ....
            ]
          }
        }
      ]
    }
  }
}
This query filters documents that match in both records AND voIds, and it does not calculate scores. It is not what I need, because it uses AND:
POST customers/_search
{
  "size": 10000,
  "version": true,
  "query": {
    "bool": {
      "filter": [
        {
          "terms": {
            "voIds": [
              78031203
            ]
          }
        },
        {
          "terms": {
            "records.keyword": [
              "S3G82U"
            ]
          }
        }
      ]
    }
  }
}
My goal is to troubleshoot the performance of the same query with and without scoring. I already have the first query, which calculates scores. How do I write the second query without scoring?
Thanks.

This is not possible, and I don't see much of a functional use case for it. Are you seeing slowness in Elasticsearch itself or in the query?

You can't disable scoring completely, but you can disable query coordination. I'm not sure how much, if at all, it helps performance.
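For comparison, one way to write the second query without scoring while keeping the OR semantics is to nest the should clauses in a bool that sits under filter. Below is a sketch, assuming ES 6.x and using the field names and single sample values from the question, as the request body for POST customers/_search. Clauses in filter context decide which documents match but do not contribute to _score, so every hit should come back with a _score of 0; minimum_should_match is set explicitly only for readability. Wrapping the inner bool in a constant_score query instead is another option, which gives every hit the same fixed score.

```json
{
  "size": 10000,
  "version": true,
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "should": [
            { "terms": { "voIds": [78031203] } },
            { "terms": { "records.keyword": ["S3G82U"] } }
          ],
          "minimum_should_match": 1
        }
      }
    }
  }
}
```

Running this and the scored should query side by side, with the same term lists, isolates the cost of scoring for the comparison the question is after.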

Related

Elasticsearch - filter conditions order

Can you please tell me whether the conditions in an Elasticsearch filter are evaluated in the order they appear in the request JSON, or whether Elasticsearch optimizes that order?
I have a query like:
{
  "sort": {
    "publishDate": "desc"
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "publishDate": {
              "lte": "2018-10-26",
              "gt": "2018-08-31"
            }
          }
        },
        {
          "terms": {
            "ico": [
              31322832,
              34444444
            ]
          }
        }
      ]
    }
  }
}
I think the optimal evaluation order is terms first and range second. So what happens in Elasticsearch: are filters evaluated in request order, or is the order optimized? Also, does anybody know how this works in Elasticsearch 2?
Thanks.
Check out this article about the execution order of filters and queries; it is really great. I hope it helps: ES execution order

Elasticsearch random scoring with filters

I've got a situation, using ES 6.5, where I can either perform a bool or a random_score, but not both at the same time. Here is one of those potential queries:
{
  "from": 0,
  "size": 50,
  "query": {
    "function_score": {
      "random_score": {
        "seed": 10,
        "field": "_seq_no"
      }
    },
    "bool": {
      "filter": [
        {
          "terms": {
            "primary_category": [
              "foobar"
            ]
          }
        },
        {
          "terms": {
            "primary_type": [
              "barbaz"
            ]
          }
        }
      ]
    }
  }
}
If I were to remove either the function_score block or the bool block, the query works, but in combination, it does not:
[function_score] malformed query, expected [END_OBJECT] but found [FIELD_NAME]
Am I missing something about the example at: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/query-dsl-function-score-query.html#function-random
All I want to do is "randomly sort" my results in a predictable way which will work across pagination, etc. Really I am just trying to display the filtered results with high variance, as any sort of standard sorting will create patterns in the result which I am trying to avoid.
Any help would be appreciated, and I'll keep tinkering with it.
I figured it out: the function_score has to be part of the bool block instead of sitting next to it, because the query element accepts only a single top-level query.
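The other common arrangement is the reverse: the bool goes inside function_score as its query. Below is a sketch for ES 6.x using the filters from the question. Because a bool with only filter clauses gives every document a score of 0, the default boost_mode of multiply would multiply the random value by 0, so this sketch sets boost_mode to replace so that the random value itself becomes the score.

```json
{
  "from": 0,
  "size": 50,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "filter": [
            { "terms": { "primary_category": ["foobar"] } },
            { "terms": { "primary_type": ["barbaz"] } }
          ]
        }
      },
      "random_score": {
        "seed": 10,
        "field": "_seq_no"
      },
      "boost_mode": "replace"
    }
  }
}
```

Keeping the same seed across requests is what keeps the random order stable from one page to the next.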

Elastic Search Filter performing much slower than Query

As my ES index/cluster has scaled up (~2 billion docs now), I have noticed a more significant performance loss, so I started experimenting with my queries to see if I could squeeze some performance out of them.
While doing this, I noticed that when I used a bool query in my filter, results took about 3.5-4 seconds to come back, but when I did the same thing in my query, it took more like 10-20 ms.
Here are the 2 queries:
Using a filter
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
  "query": {"bool":{"should":[],"must":[{"match_all":{}}]}},
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
          }
        },
        {
          "term": {
            "subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
          }
        },
        {
          "term": {
            "subscriptionType": 0
          }
        },
        {
          "terms": {
            "entityType": [
              "4"
            ]
          }
        }
      ]
    }
  }
}
Using a query
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
  "query": {"bool":{"should":[],"must":[
    {
      "term": {
        "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
      }
    },
    {
      "term": {
        "subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
      }
    },
    {
      "term": {
        "subscriptionType": 0
      }
    },
    {
      "terms": {
        "entityType": [
          "4"
        ]
      }
    }
  ]}}
}
Like I said, the second method where I don't use a Filter at all takes mere milliseconds, while the first query takes almost 4 seconds. This seems completely backwards from what the documentation says. They say that the Filter should actually be very quick and the Query should be the one that takes longer. So why am I seeing the exact opposite here?
Could it be something with my index mapping? If anyone has any idea why this is happening I would love to hear suggestions.
Thanks
The root-level filter element is actually another name for the post_filter element. The top-level filter was supposed to be removed in ES 1.1, but it slipped through and still exists in 2.x versions as well.
It was removed completely in ES 5, though.
So your first query is not a "filter" query. It is a query whose results are used afterwards in aggregations (if there are any), and only then is the post_filter/filter applied to those results. So you basically have a two-step process: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-post-filter.html
More about its performance here:
While we have gained cacheability of the tag filter, we have potentially increased the cost of scoring significantly. Post filters are useful when you need aggregations to be unfiltered, but hits to be filtered. You should not be using post_filter (or its deprecated top-level synonym filter) if you do not have facets or aggregations.
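To illustrate the case the quoted passage describes, here is a sketch in which an aggregation counts every entity type for one service while the returned hits are limited to a single type. The aggregation name entity_types is hypothetical, and entityType is assumed to be a not_analyzed field so the terms aggregation returns whole values. This is the situation post_filter (or the old top-level filter) is meant for.

```json
{
  "query": {
    "term": { "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b" }
  },
  "aggs": {
    "entity_types": {
      "terms": { "field": "entityType" }
    }
  },
  "post_filter": {
    "term": { "entityType": "4" }
  }
}
```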
A proper filter query is the following:
{
  "query": {
    "filtered": {
      "query": {
        "bool": {
          "should": [],
          "must": [
            {
              "match_all": {}
            }
          ]
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
              }
            },
            {
              "term": {
                "subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
              }
            },
            {
              "term": {
                "subscriptionType": 0
              }
            },
            {
              "terms": {
                "entityType": [
                  "4"
                ]
              }
            }
          ]
        }
      }
    }
  }
}
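The filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0. As a sketch for 2.x and later, assuming the same fields, the equivalent request puts the term clauses under the filter clause of a bool query. There is no need for the match_all wrapper, and the clauses run in filter context without scoring.

```json
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b" } },
        { "term": { "subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0" } },
        { "term": { "subscriptionType": 0 } },
        { "terms": { "entityType": ["4"] } }
      ]
    }
  }
}
```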
A filter is faster. Your problem is that you include the match_all query in your filter case. This matches on all 2 billion of your documents. A set operation has to then be done against the filter to cull the set. Omit the query portion in your filter test and you'll see that the results are much faster.

Can _score from different queries be compared?

In my application, I issue multiple queries, each against a different index. Then I merge the results from these queries and sort them by the _score attribute to rank them by relevance. But I wonder whether this makes sense at all, since the results come from different queries.
I guess my question is: can _scores from different queries be compared?
Instead of issuing multiple queries, it would be a good idea to combine them into a single query.
You can use the indices query to do index-specific operations.
So something like:
{
  "bool": {
    "should": [
      {
        "indices": {
          "indices": [
            "index1"
          ],
          "query": {
            "term": {
              "tag": "wow"
            }
          }
        }
      },
      {
        "indices": {
          "indices": [
            "index2"
          ],
          "query": {
            "term": {
              "name": "laptop"
            }
          }
        }
      }
    ]
  }
}
Once this is done, the results will be sorted by _score.
Hope that helps.
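The indices query was deprecated in Elasticsearch 5.0 in favour of querying the _index field directly. Here is a sketch of the same idea for newer versions, assuming it is sent as POST index1,index2/_search. Each should clause is restricted to its index with a term filter on _index.

```json
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "filter": { "term": { "_index": "index1" } },
            "must": { "term": { "tag": "wow" } }
          }
        },
        {
          "bool": {
            "filter": { "term": { "_index": "index2" } },
            "must": { "term": { "name": "laptop" } }
          }
        }
      ]
    }
  }
}
```

Scores are still computed from each shard's local term statistics, so they remain only approximately comparable across indices. search_type=dfs_query_then_fetch can narrow that gap by gathering global term frequencies first.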

Is there a way to have elasticsearch return a hit per generated bucket during an aggregation?

Right now I have a query like this:
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
          }
        },
        {
          "range": {
            "date": {
              "from": "now-12h",
              "to": "now"
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "query": {
      "terms": {
        "field": "query",
        "size": 3
      }
    }
  }
}
The aggregation works perfectly well, but I can't find a way to control which hits are returned. I can use the size parameter at the top of the DSL, but the hits are not returned in the same order as the buckets, so the bucket results do not line up with the hit results. Is there any way to fix this, or do I have to issue two separate queries?
To expand on Filipe's answer, it seems like the top_hits aggregation is what you are looking for, e.g.
{
  "query": {
    ... snip ...
  },
  "aggs": {
    "query": {
      "terms": {
        "field": "query",
        "size": 3
      },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 42
          }
        }
      }
    }
  }
}
Your query uses exact matches (match and range) and binary logic (must, bool) and thus should probably be converted to use filters instead:
"filtered": {
  "filter": {
    "bool": {
      "must": [
        {
          "term": {
            "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx"
          }
        },
        {
          "range": {
            "date": {
              "from": "now-12h",
              "to": "now"
            }
          }
        }
      ]
    }
  }
}
As for the aggregations,
The hits that are returned do not represent all the buckets that were returned. So if I have buckets for terms 'a', 'b', and 'c', I want to have hits that represent those buckets as well.
Perhaps you are looking to control the scope of the buckets? You can make an aggregation bucket global so that it will not be influenced by the query or filter.
Keep in mind that Elasticsearch will not "group" hits in any way -- it is always a flat list ordered according to score and additional sorting options.
Aggregations can be organized in a nested structure and return computed or extracted values in a specific order. For the terms aggregation, buckets are ordered by descending document count (highest number of hits first). The hits section of the response is never influenced by your choice of aggregations. Similarly, you cannot find hits in the aggregation sections.
If your goal is to group documents by a certain field, yes, you will need to run multiple queries in the current Elasticsearch release.
I'm not 100% sure, but I think there's no way to do that in the current version of Elasticsearch (1.2.x). The good news is that there will be when version 1.3.x gets released:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html
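On Elasticsearch 5 and later, where filtered no longer exists, the filter rewrite and the terms + top_hits aggregation from the answers above can be combined into one request, sketched below. It assumes uuid is indexed as a keyword field so the term filter matches the whole value; the range uses gte/lte, which are equivalent to the inclusive from/to in the question. Setting the top-level size to 0 and the top_hits size to 1 returns exactly one hit per bucket instead of the unrelated flat hit list, and sorting by date makes that hit the most recent one; all three are choices, not requirements.

```json
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "uuid": "xxxxxxx-xxxx-xxxx-xxxxx-xxxxxxxxxxxxx" } },
        { "range": { "date": { "gte": "now-12h", "lte": "now" } } }
      ]
    }
  },
  "aggs": {
    "query": {
      "terms": { "field": "query", "size": 3 },
      "aggs": {
        "top": {
          "top_hits": {
            "size": 1,
            "sort": [{ "date": { "order": "desc" } }]
          }
        }
      }
    }
  }
}
```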
