Elasticsearch Nested Filters being inclusive vs. exclusive

I have an object mapping that uses nested objects (props in our example) in a tag-like fashion.
Each tag can belong to a client/user, and we want to let our users run query_string-style searches against props.name.
The issue is that when an object has multiple props and only one of them matches the filter, the object is still returned. We want the opposite: if any prop fails the filter, the object should not be returned, rather than being returned as soon as one prop passes.
I have posted a comprehensive example here: https://gist.github.com/d2kagw/1c9d4ef486b7a2450d95
Thanks in advance.

I believe you need the advantage of a flattened list of values here, like an array of values. The major difference between an array and nested objects is that nested objects "know" which value of one property corresponds to which value of another property in the same nested object. An array of values, on the other hand, flattens the values of each property, and you lose the association between a client_id and a name. With arrays you have props.client_id = [null, 2] and props.name = ["petlover", "premiumshopper"].
With your nested filter, you want the query string to be evaluated against all the props.name values of a parent document at once. That doesn't happen with nested objects, because the nested documents are separate and are queried separately: if at least one nested document matches, the parent is considered a match.
In other words, a query like "query": "props.name:(carlover NOT petlover)" needs to run against a flattened list of values, just like an array. You need that query run against ["carlover", "petlover"].
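To make the difference concrete, here is a toy Python model (not Elasticsearch code) of evaluating props.name:(carlover NOT petlover) once per nested document versus once over the flattened list. The parent document and its props are hypothetical, loosely based on the example values above.

```python
# Toy model: the same boolean expression evaluated per nested document
# versus over a flattened list of values.

def carlover_not_petlover(names):
    """Evaluate props.name:(carlover NOT petlover) against a set of names."""
    return "carlover" in names and "petlover" not in names

# Hypothetical parent document with two props (tags).
props = [
    {"client_id": None, "name": "petlover"},
    {"client_id": 2, "name": "carlover"},
]

# Nested semantics: each prop is its own hidden document, and the parent
# matches if ANY single prop satisfies the expression on its own.
nested_match = any(carlover_not_petlover({p["name"]}) for p in props)

# Flattened (include_in_parent) semantics: all names are indexed together on
# the parent, so the expression sees the whole list at once.
flattened_match = carlover_not_petlover({p["name"] for p in props})

print(nested_match)     # True: the "carlover" prop passes on its own
print(flattened_match)  # False: "petlover" is in the combined list
```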
My suggestion is to make your nested documents "include_in_parent": true (which keeps a flattened, array-like copy of the values in the parent) and to change the queries a bit:
for the query_string part, use the flattened properties, so that the query is matched against the combined list of elements, not element by element.
for the match (or term, see below) and missing parts, use the nested properties, because you can have nulls in there. A missing filter on an array matches only if the whole array is missing, not a single value in it, so here you cannot use the same approach as for the query_string, where the values were flattened into an array.
optionally, for the integer client_id, use term instead of match: it is an integer, not a string, and is not analyzed by default.
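The point about missing can be sketched the same way. This toy model assumes, as above, that the flattened client_id is [null, 2], and that null values are not indexed, so the flattened field still exists through its 2. The prop data is hypothetical.

```python
# Toy model: "client_id = 1 OR client_id missing", checked on the flattened
# array versus per nested prop. Assumption: nulls are not indexed.

props = [
    {"client_id": None, "name": "petlover"},
    {"client_id": 2, "name": "premiumshopper"},
]

def client_ok(prop, client_id):
    """match/term on client_id OR missing client_id, for a single prop."""
    return prop["client_id"] == client_id or prop["client_id"] is None

# Flattened view: [None, 2] is indexed as just [2], so the field is not missing.
flat_ids = [p["client_id"] for p in props if p["client_id"] is not None]
flat_missing = len(flat_ids) == 0
flat_match = 1 in flat_ids or flat_missing

# Nested view: the first prop has no client_id, so it satisfies "missing".
nested_match = any(client_ok(p, 1) for p in props)

print(flat_match)    # False
print(nested_match)  # True
```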
With that said, these are the changes:
{
"mappings" : {
...
"props": {
"type": "nested",
"include_in_parent": true,
...
should (and does) return zero results
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "props.name:((carlover AND premiumshopper) NOT petlover)" }
}
},
{
"nested": {
"path": "props",
"filter": {
"or": [ { "query": { "match": { "props.client_id": 1 } } }, { "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 1
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{"query": {"query_string": { "query": "props.name:(carlover NOT petlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "match": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 2
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{ "query": {"query_string": { "query": "props.name:(* NOT carlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "term": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } }
]
}
}
}
]
}
}
}
}

Related

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to retrieve records matching multiple values (email_address) passed to a bool query, within a date range. For example, I want information about a few employees based on their email_address (annie#test.com, charles#test.com, heman#test.com) for the period starting at project_date 2019-01-01.
I used a should clause, but unfortunately it pulls every record in the date range, i.e. it even returns other employees' records from project_date 2019-01-01 onwards.
{
"query": {
"bool": {
"should": [
{ "match": { "email_address": "annie#test.com" }},
{ "match": { "email_address": "chalavadi#test.com" }}
],
"filter": [
{ "range": { "project_date": { "gte": "2019-08-01" }}}
]
}
}
}
I also tried a must clause but got no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
should (OR) clauses are optional
Quoting from this article:
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, should only influences the score and does not actually filter any documents. Wrap the should clauses in their own bool inside must, as below, or put that inner bool in filter if scoring is not required.
GET employeeindex/_search
{
"query": {
"bool": {
"filter": {
"range": {
"projectdate": {
"gte": "2019-01-01"
}
}
},
"must": [
{
"bool": {
"should": [
{
"term": {
"email.raw": "abc#text.com"
}
},
{
"term": {
"email.raw": "efg#text.com"
}
}
]
}
}
]
}
}
}
You can also replace the should clauses with a single terms clause, as in #AlwaysSunny's answer.
You can do it in a shorter way with a terms clause and a range clause, both inside filter. Your existing query doesn't work as expected because of the should clause: next to a filter, it only affects scoring, which makes your filter weaker than you intended. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
"query": {
"bool": {
"filter": [
{
"terms": {
"email_address.keyword": [
"annie#test.com", "chalavedi#test.com"
]
}
},
{
"range": {
"project_date": {
"gte": "2019-08-01"
}
}
}
]
}
}
}
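A toy Python model of the bool semantics quoted above, with scoring ignored. It shows the original should + filter query letting every in-range document through, while both the must-wrapped bool and the terms filter keep only the listed addresses. The bob#test.com document is invented for illustration, and the model covers only the cases described in the quote.

```python
# Toy model of bool-query matching in query context (scores ignored).
# Assumption: should clauses are optional when must/filter clauses exist,
# otherwise at least one of them must match (per the quoted docs).

def bool_matches(doc, must=(), should=(), filters=()):
    """Return True if doc passes a simplified bool query."""
    if not all(q(doc) for q in must) or not all(q(doc) for q in filters):
        return False
    if must or filters or not should:
        return True  # should clauses would only affect the score here
    return any(q(doc) for q in should)

docs = [
    {"email_address": "annie#test.com", "project_date": "2019-08-02"},
    {"email_address": "bob#test.com", "project_date": "2019-08-03"},  # hypothetical
]

def is_email(address):
    return lambda d: d["email_address"] == address

def in_range(d):
    # ISO-8601 date strings compare correctly as plain strings
    return d["project_date"] >= "2019-08-01"

emails = [is_email("annie#test.com"), is_email("chalavadi#test.com")]

# Original query: should + filter, so the should clauses filter nothing.
original = [d for d in docs if bool_matches(d, should=emails, filters=[in_range])]

# Fix 1: wrap the should clauses in their own bool inside must.
def either_email(d):
    return bool_matches(d, should=emails)

wrapped = [d for d in docs if bool_matches(d, must=[either_email], filters=[in_range])]

# Fix 2: a terms clause is an OR over exact values, placed in filter.
def terms_email(d):
    return d["email_address"] in {"annie#test.com", "chalavadi#test.com"}

with_terms = [d for d in docs if bool_matches(d, filters=[terms_email, in_range])]

print(len(original), len(wrapped), len(with_terms))  # 2 1 1
```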

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query works like an or/should, but if I replace the must in the second query with a should, it returns even more results than either of those two queries.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong: the nested filter cannot be an array, so I suspect ES doesn't parse it correctly and applies only one of the match clauses instead of both, which is probably why it returns more documents than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.
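Whatever the parser actually does with the array (the answer only suspects it), applying a single clause instead of both can only widen the result set. A toy sketch with hypothetical documents:

```python
# Hypothetical parents, each with a list of nested "details" objects.
docs = {
    "a": [{"id": "color", "value_str": "red"}],
    "b": [{"id": "color", "value_str": "blue"}, {"id": "trim", "value_str": "red"}],
    "c": [{"id": "size", "value_str": "L"}],
}

def is_color(d):
    return d["id"] == "color"

def is_red(d):
    return d["value_str"] == "red"

def nested_hits(child_filter):
    """Parents with at least one nested detail passing child_filter."""
    return {name for name, details in docs.items() if any(child_filter(d) for d in details)}

# Second query: both conditions must hold on the SAME detail object.
both = nested_hits(lambda d: is_color(d) and is_red(d))

# Suspected behavior of the first query: only one clause is applied.
one_clause = nested_hits(is_red)

print(sorted(both))        # ['a']
print(sorted(one_clause))  # ['a', 'b']
```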

Elasticsearch: Find all parents where all children have a property with a specific value

I have the current schema:
Object parent, with a property date and n children of type child
Object child, which contains a single property foo
I want to retrieve all parents where every child has its property foo equal to 0.
I tried different approaches, but whatever I do, some parents are retrieved even though one of their children has foo at 1.
Example of my query:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"lt": "2018-07-05T10:00:00.000Z"
}
}
},
{
"nested": {
"path": "child",
"query": {
"bool": {
"must": {
"term": {
"child.foo": 0
}
}
}
}
}
}
]
}
}
}
I tried with should, match, range... even must_not/should_not. I also tried filtering without any luck, I keep getting hits with the foo property at 1.
I also tried the aggregation path but I don't understand how to apply it to my need.
EDIT: I looked at the possible duplicate. While it did not answer my question, it put me on the right track. My issue was that I was thinking in a SQL way, with joins and such, when I should have been thinking the Elasticsearch way.
So what I wanted could not be asked directly. What I needed was to look for parents where at least one child has foo different from 0 (e.g. at 1 or more), then exclude those results and keep the others. The answer is simple: a must_not does it.
As I explained in the edit of my question, the answer is fairly easy once you start thinking the right way. The nested query moves into the parent's must_not, and inside it a must_not selects the children whose foo is not 0:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"lt": "2018-07-05T10:00:00.000Z"
}
}
}
],
"must_not": [
{
"nested": {
"path": "child",
"query": {
"bool": {
"must_not": {
"term": {
"child.foo": 0
}
}
}
}
}
}
]
}
}
}
The trick is the double must_not: "no child has foo other than 0" is the same as "every child has foo equal to 0". Parents with no children at all also match.
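A toy Python model of the double negation, using hypothetical parents described only by their children's foo values. It also shows why placement matters: the same nested clause under the parent's must selects exactly the parents you want to drop.

```python
# Toy model (hypothetical data): each parent maps to its children's foo values.
parents = {
    "p1": [0, 0],  # all children have foo == 0
    "p2": [0, 1],  # one child has foo == 1
    "p3": [],      # no children at all
}

def nested_matches(foos, child_query):
    """A nested query matches the parent if ANY child matches child_query."""
    return any(child_query(f) for f in foos)

def foo_not_zero(f):
    # inner bool { must_not: term child.foo: 0 }
    return f != 0

# Nested clause under the parent's must: parents with SOME child foo != 0.
under_must = {p for p, foos in parents.items() if nested_matches(foos, foo_not_zero)}

# Nested clause under the parent's must_not: parents with NO child foo != 0.
under_must_not = {p for p, foos in parents.items() if not nested_matches(foos, foo_not_zero)}

print(sorted(under_must))      # ['p2']
print(sorted(under_must_not))  # ['p1', 'p3']
```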

Boolean AND with exact matches in Elasticsearch

In our Elasticsearch collection of products, we have an array of hashes called "nutrients". A partial example of the data:
"_source": {
"quantity": "150.0",
"id": 1001,
"barcode": "7610809001066",
"nutrients": [
{
"per_hundred": "1010.0",
"name_fr": "Énergie",
"per_portion": "758.0",
"name_de": "Energie",
"per_day": "9.0",
"name_it": "Energia",
"name_en": "Energy"
},
{
"per_hundred": "242.0",
"name_fr": "Énergie (kCal)",
"per_portion": "181.0",
"name_de": "Energie (kCal)",
"per_day": "9.0",
"name_it": "Energia (kCal)",
"name_en": "Energy (kCal)"
},
{
"per_hundred": "18.0",
"name_fr": "Matières grasses",
"per_portion": "13.5",
"name_de": "Fett",
"per_day": "19.0",
"name_it": "Grassi",
"name_en": "Fat"
},
In the search, we are trying to bring back products based on an exact match on two of the fields contained in the nutrients array. What I am finding is that the conditions behave like an OR, not an AND.
The two attempts have been:
"query": {
"bool": {
"must": [
{ "match": { "nutrients.name_fr": "Énergie" } },
{ "match": { "nutrients.per_hundred": "242.0" } }
]
}
}
}
and
"query": {
"filtered": {
"filter": {
"and": [
{ "term": { "nutrients.name_fr": "Énergie" } },
{ "term": { "nutrients.per_hundred": "242.0" } }
]
}
}
}
Both of these do bring back entries with Énergie and 242.0, but they also match when 242.0 belongs to a different name_fr, e.g.:
{
"per_hundred": "242.0",
"name_fr": "Acide folique",
"per_portion": "96.0",
"name_de": "Folsäure",
"per_day": "48.0",
"name_it": "Acido folico",
"name_en": "Folic acid"
},
They also match non-exact values, i.e. "Énergie (kCal)", when we want to match only "Énergie".
On your first problem:
You have to make the nutrients field nested, so that you can query each object inside it on its own (see Elasticsearch Nested Objects).
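A rough Python sketch of both problems, using the nutrients quoted in the question combined into one hypothetical product. The tokenizer is only a crude stand-in for an analyzed text field (not Elasticsearch's standard analyzer), and plain equality stands in for a not_analyzed field. The second problem (matching "Énergie (kCal)") comes from the field being analyzed; comparing exact values avoids it.

```python
import re

def analyzed_match(field_value, query):
    """Crude stand-in for a match query on an analyzed text field."""
    tokens = set(re.findall(r"\w+", field_value.lower()))
    return any(t in tokens for t in re.findall(r"\w+", query.lower()))

def exact_match(field_value, query):
    """Stand-in for a term query on a not_analyzed (exact) field."""
    return field_value == query

# Nutrients from the question; no single one has name_fr "Énergie" AND 242.0.
nutrients = [
    {"name_fr": "Énergie", "per_hundred": "1010.0"},
    {"name_fr": "Énergie (kCal)", "per_hundred": "242.0"},
    {"name_fr": "Acide folique", "per_hundred": "242.0"},
]

# Object (non-nested) mapping: each field is flattened into its own array,
# so the two conditions may be satisfied by different nutrients.
# (per_hundred is compared exactly here for simplicity.)
flat_names = [n["name_fr"] for n in nutrients]
flat_values = [n["per_hundred"] for n in nutrients]
flat_hit = any(analyzed_match(v, "Énergie") for v in flat_names) and "242.0" in flat_values

# Nested mapping plus exact matching: both conditions on the SAME nutrient.
nested_hit = any(
    exact_match(n["name_fr"], "Énergie") and n["per_hundred"] == "242.0"
    for n in nutrients
)

print(flat_hit)    # True
print(nested_hit)  # False
```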

Elasticsearch terms query on array of values

I have data on ElasticSearch index that looks like this
{
"title": "cubilia",
"people": [
"Ling Deponte",
"Dana Madin",
"Shameka Woodard",
"Bennie Craddock",
"Sandie Bakker"
]
}
Is there a way to search for all the people whose name starts with "ling" (case-insensitively) and get distinct, properly cased terms, i.e. "Ling Deponte" and not "ling deponte"?
I am fine with changing the mappings on the index in any way.
Edit: this does what I want, but it is a really bad query:
{
"size": 0,
"aggs": {
"person": {
"filter": {
"bool":{
"should":[
{"regexp":{
"people.raw":"(.* )?[lL][iI][nN][gG].*"
}}
]}
},
"aggs": {
"top-colors": {
"terms": {
"size":10,
"field": "people.raw",
"include":
{
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
}
}
people.raw is not_analyzed
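Because people.raw is not_analyzed, each full name is a single term, and the character classes in the pattern emulate case-insensitive matching. Lucene regular expressions must match the whole term, so this Python sketch uses re.fullmatch to mimic that. The two extra names are invented to probe the pattern: (.* )? only lets "ling" appear at the start of a word.

```python
import re

# Same pattern as in the aggregation; fullmatch mimics Lucene's whole-term anchoring.
pattern = re.compile(r"(.* )?[lL][iI][nN][gG].*")

raw_names = ["Ling Deponte", "Dana Madin", "Shameka Woodard", "Bennie Craddock", "Sandie Bakker"]
hypothetical = ["Mei LING", "Bling Smith"]  # invented names to probe the pattern

kept = [n for n in raw_names + hypothetical if pattern.fullmatch(n)]
print(kept)  # ['Ling Deponte', 'Mei LING']
```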
Yes, and you can do it without a regular expression by taking advantage of Elasticsearch's full text capabilities.
GET /test/_search
{
"query": {
"match_phrase": {
"people": "Ling"
}
}
}
Note: This could also be match or match_phrase_prefix in this case. The match_phrase* queries require the terms to appear in the given order in the text, while match simply looks for any of them. Since you only have one term, it's pretty much irrelevant.
The problem is that you cannot limit the response to just that name, because the search API returns whole documents. That said, you can use nested documents and get the desired behavior via inner_hits.
Avoid leading wildcards whenever possible, because they simply do not work at scale. To put it in SQL terms, it's like doing a full table scan: you effectively lose the benefit of the inverted index, because it has to be walked entirely to find the matching terms.
Combining the two should work pretty well, though. Here I use the query to whittle the results down to what you are interested in, then use the include pattern from your inner aggregation to keep only the matching values.
{
"size": 0,
"query": {
"match_phrase": {
"people": "Ling"
}
},
"aggs": {
"person": {
"terms": {
"size":10,
"field": "people.raw",
"include": {
"pattern": ["(.* )?[lL][iI][nN][gG].*"]
}
}
}
}
}
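A toy model of why a leading wildcard is expensive while a trailing one such as ling* is cheap. The term dictionary is built from lowercased tokens of the sample document, assuming standard-analyzer-like analysis. Real Lucene uses more elaborate structures, so only the shape of the cost is meaningful here.

```python
import bisect
import re

# Toy term dictionary: sorted, lowercased tokens of the sample "people" field.
terms = sorted({
    "ling", "deponte", "dana", "madin", "shameka",
    "woodard", "bennie", "craddock", "sandie", "bakker",
})

def prefix_lookup(prefix):
    """Trailing wildcard (ling*): seek with binary search, then scan forward
    only while terms share the prefix. Returns (matches, terms_examined)."""
    start = bisect.bisect_left(terms, prefix)
    matches, examined = [], 0
    for term in terms[start:]:
        examined += 1
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches, examined

def leading_wildcard_lookup(regex):
    """Leading wildcard (.*ing.*): there is no prefix to seek to, so every
    term in the dictionary has to be tested."""
    rx = re.compile(regex)
    return [t for t in terms if rx.fullmatch(t)], len(terms)

print(prefix_lookup("ling"))               # (['ling'], 2)
print(leading_wildcard_lookup(".*ing.*"))  # (['ling'], 10)
```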
Hi, please find a query below; it may help with your request:
GET skills/skill/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"wildcard": {
"skillNames.raw": "jav*"
}
}
]
}
}
}
}
}
My intention is to find documents whose skillNames start with "jav".
