Difference between elasticsearch queries - elasticsearch

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields more results than either of those two queries.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.

The first query is invalid: the nested filter cannot be an array. I suspect ES doesn't parse it correctly and only applies one of the match clauses instead of both, which is probably why it returns more results than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.
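As a rough sketch of the fix, a small Python helper can wrap several clauses in a bool/must so that the nested filter always receives a single object rather than an array. The helper name nested_all is hypothetical and not part of any client library; it only builds the request body.

```python
import json


def nested_all(path, clauses):
    """Build an ES 2.x nested query whose filter requires every clause to
    match within the same nested object. Hypothetical helper, for
    illustration only."""
    return {
        "nested": {
            "path": path,
            # A single bool object instead of a bare array of clauses.
            "filter": {"bool": {"must": list(clauses)}},
        }
    }


query = nested_all("details", [
    {"match": {"details.id": "color"}},
    {"match": {"details.value_str": "red"}},
])
print(json.dumps(query, indent=2))
```

The printed body has the same shape as the second query in the question.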

Related

Elasticsearch "boost" not working when inside "filter"

I'm trying to boost matches on a certain field over another.
This works fine:
{
"query": {
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
}
When I look at the documents matched on mainField, I see they have a _score of 2.0 as expected.
But when i wrap this same query in a filter:
{
"query": {
"bool": {
"filter": [
{
"bool": {
"should": [
{
"terms": {
"boost": 2,
"mainField": "foo"
}
},
{
"terms": {
"otherField": "foo"
}
}
]
}
}
]
}
}
}
The _score for all documents is 0.0.
The same thing happens for multi_match. By itself (e.g. inside a query) it works fine, but inside a bool + filter, it doesn't work.
Can someone explain why this is the case? I need to wrap in a filter due to the way my app composes queries.
Some context might also help: I'm trying to return documents that match on either mainField or otherField, but sort the ones matching on mainField first, so I figured boost would be the most appropriate choice here. But let me know if there is a better way.
Queries inside a filter clause are always executed in filter context. They always return a score of zero and only contribute to filtering documents.
Refer to the Elasticsearch documentation on query and filter context to learn more.
Because of this, you do not get a _score of 2.0 in the second query, even with the boost applied.
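One possible workaround, sketched in Python under the assumption that your app can add a sibling clause next to the filter: keep the filter for matching, and put the boosted clause in a should of the same bool. A should clause next to a filter is optional and runs in query context, so it can contribute to _score. The helper name filter_with_scoring is made up, and the term clauses with the value/boost form are my substitution for the terms clauses in the question.

```python
import json


def filter_with_scoring(filters, scoring_clauses):
    """Combine non-scoring filters with optional scoring clauses.

    Hypothetical helper: `filters` decide which documents match,
    `scoring_clauses` only influence _score."""
    return {
        "query": {
            "bool": {
                "filter": list(filters),          # filter context, no score
                "should": list(scoring_clauses),  # query context, scored
            }
        }
    }


either_field = {
    "bool": {
        "should": [
            {"term": {"mainField": "foo"}},
            {"term": {"otherField": "foo"}},
        ]
    }
}
prefer_main = {"term": {"mainField": {"value": "foo", "boost": 2}}}

body = filter_with_scoring([either_field], [prefer_main])
print(json.dumps(body, indent=2))
```

With this shape, documents matching only otherField still pass the filter but get no score from the should clause, so documents matching mainField should sort first.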

Elasticsearch terms query on array of values

I have data on ElasticSearch index that looks like this
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way for me to search for all the people whose name starts with
"ling" (case insensitive) and get distinct terms properly cased, i.e. "Ling Deponte" and not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries imply an order of the values in the text. match simply looks for any of the values. Since you only have one value, it's pretty much irrelevant.
The problem is that you cannot limit the document responses to just that name because the search API returns documents. With that said, you can use nested documents and get the desired behavior via inner_hits.
You do not want to do wildcard prefixing whenever possible because it simply does not work at scale. To put it in SQL terms, that's like doing a full table scan; you effectively lose the benefit of the inverted index because it has to walk it entirely to find the actual start.
Combining the two should work pretty well though. Here, I use the query to whittle down the results to the documents you are interested in, then I use your inner aggregation to include only the matching values.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
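To see what the include pattern keeps, here is a rough Python equivalent. It assumes Python's re behaves like the Lucene regular expression for this particular pattern. Lucene patterns are anchored to the whole term, which is why fullmatch is used. The sample names other than those from the question are made up.

```python
import re

# Same pattern as the "include" above: some word in the value starts with "ling".
INCLUDE = re.compile(r"(.* )?[lL][iI][nN][gG].*")


def included(value):
    """Return True if the whole value matches the include pattern."""
    return INCLUDE.fullmatch(value) is not None


people = [
    "Ling Deponte",
    "Dana Madin",
    "Shameka Woodard",
    "Bennie Craddock",
    "Sandie Bakker",
]
print([name for name in people if included(name)])
```

Because the aggregation runs on people.raw, the bucket keys keep their original casing, while the match_phrase query on people does the case-insensitive narrowing.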
Here is a query that may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention here is to find documents starting with "jav".

Elastic Search Filter performing much slower than Query

As my ES index/cluster has scaled up (# ~2 billion docs now), I have noticed more significant performance loss. So I started messing around with my queries to see if I could squeeze some perf out of them.
As I did this, I noticed that when I used a Boolean Query in my Filter, my results would take about 3.5-4 seconds to come back. But if I do the same thing in my Query it is more like 10-20ms
Here are the 2 queries:
Using a filter
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[{"match_all":{}}]}},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
Using a query
POST /backup/entity/_search?routing=39cd0b95-efc3-4eee-93d1-93e6f5837d6b
{
"query": {"bool":{"should":[],"must":[
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]}}
}
Like I said, the second method where I don't use a Filter at all takes mere milliseconds, while the first query takes almost 4 seconds. This seems completely backwards from what the documentation says. They say that the Filter should actually be very quick and the Query should be the one that takes longer. So why am I seeing the exact opposite here?
Could it be something with my index mapping? If anyone has any idea why this is happening I would love to hear suggestions.
Thanks
The root filter element is actually another name for the post_filter element. The top-level filter was supposed to be removed in ES 1.1, but it slipped through and still exists in the 2.x versions as well.
It is removed completely in ES 5 though.
So, your first query is not a "filter" query. It's a query whose results are used afterwards (if applicable) in aggregations, and then the post_filter/filter is applied on the results. So you basically have a two steps process in there: https://www.elastic.co/guide/en/elasticsearch/reference/1.5/search-request-post-filter.html
More about its performance here:
While we have gained cacheability of the tag filter, we have potentially increased the cost of scoring significantly. Post filters are useful when you need aggregations to be unfiltered, but hits to be filtered. You should not be using post_filter (or its deprecated top-level synonym filter) if you do not have facets or aggregations.
A proper filter query is the following:
{
"query": {
"filtered": {
"query": {
"bool": {
"should": [],
"must": [
{
"match_all": {}
}
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"serviceId": "39cd0b95-efc3-4eee-93d1-93e6f5837d6b"
}
},
{
"term": {
"subscriptionId": "3eb5021e-2f1d-4292-9fd5-95788ebfafa0"
}
},
{
"term": {
"subscriptionType": 0
}
},
{
"terms": {
"entityType": [
"4"
]
}
}
]
}
}
}
}
}
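If your requests are built in code, the rewrite above can be sketched as a small Python function. The name to_filtered_query is hypothetical. It only reshapes the request body and assumes the filtered query form used in this answer, which applies to the 1.x/2.x versions discussed here.

```python
import copy


def to_filtered_query(body):
    """Move a top-level "filter" (a post_filter synonym in ES 1.x/2.x)
    into a filtered query. Returns a new body; the input is not modified."""
    body = copy.deepcopy(body)
    top_filter = body.pop("filter", None)
    if top_filter is None:
        return body  # nothing to move
    filtered = {"filter": top_filter}
    if "query" in body:
        # Keep the original query, if any, as the scored part.
        filtered["query"] = body["query"]
    body["query"] = {"filtered": filtered}
    return body


original = {
    "query": {"match_all": {}},
    "filter": {"term": {"subscriptionType": 0}},
}
rewritten = to_filtered_query(original)
print(rewritten)
```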
A filter is faster. Your problem is that you include the match_all query in your filter case. This matches on all 2 billion of your documents. A set operation has to then be done against the filter to cull the set. Omit the query portion in your filter test and you'll see that the results are much faster.

Must match multiple values

I have a query that works fine when I need the property of a document
to match just one value.
However I also need to be able to search with must with two values.
So if a banana has id 1 and a lemon has id 2 and I search for yellow
I will get both if I have 1 and 2 in the must clause.
But if I have just 1 I will only get the banana.
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match":
{ "fruit.color": "yellow" }}
],
"must" : [
{ "match": { "fruit.id" : "1" } }
]
}
}
}
I haven't found a way to search with two values with must.
Is that possible?
If the document "must" be returned only if the id is 1 or 2, that sounds like another should clause. If I'm understanding your question properly, you want documents with either id 1 OR id 2. Additionally, if the color is yellow, give it a higher score.
Here's one way you might achieve what you're looking for:
{
"query": {
"bool": {
"should": {
"match": {
"fruit.color": "yellow"
}
},
"must": {
"bool": {
"should": [
{
"match": {
"fruit.id": "1"
}
},
{
"match": {
"fruit.id": "2"
}
}
]
}
}
}
}
}
Here I put the two match queries in the should clause of a separate bool query. This achieves the OR behavior you are looking for.
Have another look at the Bool Query documentation and take note of the nuances of should. It behaves differently by default depending on whether or not there is a sibling must clause and whether or not the bool query is being executed in filter context.
Another key option that is adjustable and can help you achieve your expected results is the minimum_should_match parameter. Have a look at this documentation page.
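As a minimal sketch, the OR group can also be built with minimum_should_match set explicitly, so the behavior does not depend on the defaults described above. The helper name any_of is made up for illustration; it only assembles the request body in Python.

```python
import json


def any_of(clauses, minimum=1):
    """Bool query that requires at least `minimum` of the clauses to match.
    Hypothetical helper, for illustration only."""
    return {"bool": {"should": list(clauses), "minimum_should_match": minimum}}


id_is_1_or_2 = any_of([
    {"match": {"fruit.id": "1"}},
    {"match": {"fruit.id": "2"}},
])

body = {
    "query": {
        "bool": {
            "must": [id_is_1_or_2],  # required: id 1 OR id 2
            "should": [{"match": {"fruit.color": "yellow"}}],  # optional, affects score
        }
    }
}
print(json.dumps(body, indent=2))
```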
Instead of a match query, you could simply try the terms query for ORing between multiple terms.
Match queries are generally used for analyzed fields. For exact matching, you should use term queries
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [
{ "match": { "fruit.color": "yellow" } }
],
"must" : [
{ "terms": { "fruit.id": ["1","2"] } }
]
}
}
}
A term or terms query is the right way to fetch an exact text or id. A match query analyzes the search text, so depending on how the field is analyzed it can match inside the id or text.
Ex:
id = '4'
id = '44'
If the id field is analyzed into partial tokens (for example with an n-gram analyzer), a match query for id = 4 returns both 4 and 44, since 4 matches in both. This is where the terms query comes into play.
The same search using a terms query on a keyword field returns 4 only.
So the accepted answer is absolutely wrong. Use Rahul's answer instead. There is just one more thing you need to do: instead of text, the field needs to be indexed as a keyword.
Example of indexing a field as both text and keyword (the mapping is for a flat field; for a nested one, change it accordingly).
{
"index_patterns": [ "test" ],
"mappings": {
"kb_mapping_doc": {
"_source": {
"enabled": true
},
"properties": {
"id": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}
}
Using Rahul's answer as-is may not work because your field might be analyzed as text.
id - accesses the text field
id.keyword - accesses the keyword field
So the query would be:
{
"from": 0,
"size": 20,
"query": {
"bool": {
"should": [{
"match": {
"color": "yellow"
}
}],
"must": [{
"terms": {
"id.keyword": ["1", "2"]
}
}]
}
}
}
So I would say the accepted answer will return false results. Please use Rahul's answer with the corresponding mapping.

Elasticsearch Nested Filters being inclusive vs. exclusive

I have an object mapping that uses nested objects (props in our example) in a tag-like fashion.
Each tag can belong to a client/user, and we want to allow our users to run query_string style searches against props.name.
The issue is that when we run our query, an object with multiple props is returned if any one of its props matches the filter, even when the others don't. We want the opposite: if one prop fails the filter, don't return the object, rather than returning it as soon as one prop passes.
I have posted a comprehensive example here: https://gist.github.com/d2kagw/1c9d4ef486b7a2450d95
Thanks in advance.
I believe here you might need the advantage of a flattened list of values, like an array of values. The major difference between an array and nested objects is that the latter "knows" which value of a nested property corresponds to another value of another property in the same nested object. The array of values, on the other hand will flatten the values of a certain property and you lose the "association" between a client_id and a name. Meaning, with arrays you have props.client_id = [null, 2] and props.name = ["petlover", "premiumshopper"].
With your nested filter you want to match that string to all values for props.name meaning ALL nested props.names of one parent doc needs to match. Well, this doesn't happen with nested objects, because the nested documents are separate and are queried separately. And, if at least one nested document matches then it's considered a match.
In other words, for a query like "query": "props.name:(carlover NOT petlover)" you basically need to run it against a flattened list of values, just like arrays. You need that query run against ["carlover", "petlover"].
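The difference can be illustrated with a toy Python simulation. This is not how Elasticsearch evaluates queries internally, only a sketch of the two semantics. The function names and the sample client_id values are made up.

```python
def matches(names, required, forbidden):
    """True if `required` is among `names` and `forbidden` is not."""
    return required in names and forbidden not in names


def nested_match(props, required, forbidden):
    # Each nested object is evaluated on its own; one hit is enough.
    return any(matches([p["name"]], required, forbidden) for p in props)


def flattened_match(props, required, forbidden):
    # All names of the parent are evaluated together, like an array field.
    names = [p["name"] for p in props]
    return matches(names, required, forbidden)


# Sample parent document with two props (values are illustrative).
props = [
    {"client_id": 1, "name": "carlover"},
    {"client_id": None, "name": "petlover"},
]

print(nested_match(props, "carlover", "petlover"))     # True
print(flattened_match(props, "carlover", "petlover"))  # False
```

The nested evaluation returns the parent because the carlover object satisfies the condition by itself, while the flattened evaluation rejects it because petlover is also present.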
My suggestion is to make your nested documents "include_in_parent": true (meaning, keep a flattened, array-like list of values in the parent) and change the queries a bit:
For the query_string part, use the flattened properties approach to be able to match your query against the combined list of elements, not element by element.
For the match (or term, see below) and missing parts, use the nested properties approach because you can have nulls in there. A missing filter on an array matches only if the whole array is missing, not just one value in it, so you cannot use the same flattened approach here as for the query_string part.
Optionally, for the integer match I would use term, since the value is an integer rather than a string and is by default not_analyzed.
With that said, here are the changes:
{
"mappings" : {
...
"props": {
"type": "nested",
"include_in_parent": true,
...
should (and does) return zero results
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "props.name:((carlover AND premiumshopper) NOT petlover)" }
}
},
{
"nested": {
"path": "props",
"filter": {
"or": [ { "query": { "match": { "props.client_id": 1 } } }, { "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 1
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{"query": {"query_string": { "query": "props.name:(carlover NOT petlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "match": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 2
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{ "query": {"query_string": { "query": "props.name:(* NOT carlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "term": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } }
]
}
}
}
]
}
}
}
}
