Elasticsearch: Find all parents where all children have a property with a specific value

I have the following schema:
An object parent with a property date and n children (child).
Each child object contains a single property, foo.
I want to retrieve every parent whose children all have foo equal to 0.
I tried different approaches, but whatever I do, some parents are still retrieved even though one of their children has foo at 1.
Example of my query:
{
"query": {
"bool": {
"must": [
{
"range": {
"date": {
"lt": "2018-07-05T10:00:00.000Z"
}
}
},
{
"nested": {
"path": "child",
"query": {
"bool": {
"must": {
"term": {
"child.foo": 0
}
}
}
}
}
}
]
}
}
}
I tried should, match, range... even must_not/should_not. I also tried filtering, without any luck: I keep getting hits where a child has foo at 1.
I also tried the aggregation route, but I don't understand how to apply it to my need.
EDIT: I looked at the possible duplicate. While it did not answer my question, it put me on the right track. My issue was that I was thinking in a SQL way, with joins and such, when I should have been thinking the Elasticsearch way.
Thus, what I wanted could not be done directly. What I needed was to look for parents where at least one child has foo at 1 or more, then ignore these results and take the others. The answer is simple: I just had to change the must of the nested query to a must_not, and that was it!

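As I explained in the edit of my question, the answer is fairly easy once you start thinking in the right way: find the parents that have at least one child whose foo is not 0, and exclude them.
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "date": {
              "lt": "2018-07-05T10:00:00.000Z"
            }
          }
        }
      ],
      "must_not": [
        {
          "nested": {
            "path": "child",
            "query": {
              "bool": {
                "must_not": {
                  "term": {
                    "child.foo": 0
                  }
                }
              }
            }
          }
        }
      ]
    }
  }
}
The trick is the must_not: the inner one selects children whose foo is not 0, and the outer one excludes every parent that has such a child. With the nested clause left under must and only the inner must_not, the query returns the opposite set, that is, parents with at least one child whose foo is not 0.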
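The difference between "some child has foo at 0" and "every child has foo at 0" can be checked outside Elasticsearch. The following is only a small Python sketch that mimics how a nested clause matches (a parent matches as soon as one child matches); the sample parents and the helper nested_matches are made up for illustration and are not part of any Elasticsearch API.

```python
# Hypothetical sample data: each parent carries a list of nested children.
parents = [
    {"id": "p1", "child": [{"foo": 0}, {"foo": 0}]},
    {"id": "p2", "child": [{"foo": 0}, {"foo": 1}]},
    {"id": "p3", "child": [{"foo": 2}]},
]


def nested_matches(parent, predicate):
    """Mimic a nested clause: true if at least one child satisfies the predicate."""
    return any(predicate(child) for child in parent["child"])


# nested(term foo = 0) under must: "some child has foo == 0".
some_child_zero = [
    p["id"] for p in parents if nested_matches(p, lambda c: c["foo"] == 0)
]

# must_not around nested(foo != 0): "no child has foo != 0", i.e. all children are 0.
all_children_zero = [
    p["id"] for p in parents if not nested_matches(p, lambda c: c["foo"] != 0)
]

print(some_child_zero)    # ['p1', 'p2']
print(all_children_zero)  # ['p1']
```

Note that p2 shows up in the first list even though one of its children has foo at 1, which is exactly the symptom described in the question.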

Related

Is it possible to access a query term in a script field?

I would like to construct an elasticsearch query in which I can search for a term and on-the-fly compute a new field for each found document, which is calculated based on some existing fields as well as the query term. Is this possible?
For example, let's say my Elasticsearch query searches for documents which have the keyword "amsterdam" in the "text" field.
"filter": [
{
"match_phrase": {
"text": {
"query": "amsterdam"
}
}
}]
Now I would also like to have a script field in my query, which computes some value based on other fields as well as the query.
So far, I have only found how to access the other fields of a document, using doc['someOtherField'], for example:
"script_fields" : {
"new_field" : {
"script" : {
"lang": "painless",
"source": """
if (doc['citizens'].value > 10000) {
return 'large';
}
return 'small';
"""
}
}
}
How can I integrate the query term, e.g. if I wanted to add to the if statement "if the query term starts with a-e"?
You're on the right track but script_fields are primarily used to post-process your documents' attributes — they won't help you filter any docs because they're run after the query phase.
With that being said, you can use scripts to filter your documents through script queries. Before you do that, though, you should explore alternatives.
In other words, scripts should be used when all other mechanisms and techniques have been exhausted.
Back to your example. I see three possibilities off the top of my head.
Match phrase prefix queries as a group of bool-should subqueries:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"bool": {
"should": [
{
"match_phrase_prefix": {
"text_field": "a"
}
},
{
"match_phrase_prefix": {
"text_field": "b"
}
},
{
"match_phrase_prefix": {
"text_field": "c"
}
},
... till the letter "e"
]
}
}
]
}
}
}
A regexp query:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"regexp": {
"text_field": "[a-e].+"
}
}
]
}
}
}
Script queries using .charAt comparisons:
POST your-index/_search
{
"query": {
"bool": {
"must": [
{
"script": {
"script": {
"source": """
char c = doc['text_field.keyword'].value.charAt(0);
return c >= params.gte.charAt(0) && c <= params.lte.charAt(0);
""",
"params": {
"gte": "a",
"lte": "e"
}
}
}
}
]
}
}
}
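Coming back to the original question about reading the query term inside a script field: as far as I know, a script cannot see the query text directly, but the client can pass the same value to the script through params. Below is a minimal Python sketch that only builds such a request body. The field names text and citizens come from the question; the helper build_search_body and the param names term, lo and hi are assumptions for illustration, and the Painless source is untested.

```python
import json


def build_search_body(term, lo="a", hi="e"):
    """Build a search body that sends the same term to the query and to the script."""
    # Painless source (assumed, untested): compare the first character of the
    # query term against the bounds passed in params, as in the script query above.
    source = (
        "char c = params.term.charAt(0); "
        "boolean inRange = c >= params.lo.charAt(0) && c <= params.hi.charAt(0); "
        "if (doc['citizens'].value > 10000 && inRange) { return 'large'; } "
        "return 'small';"
    )
    return {
        "query": {
            "bool": {"filter": [{"match_phrase": {"text": {"query": term}}}]}
        },
        "script_fields": {
            "new_field": {
                "script": {
                    "lang": "painless",
                    "source": source,
                    "params": {"term": term, "lo": lo, "hi": hi},
                }
            }
        },
    }


body = build_search_body("amsterdam")
print(json.dumps(body, indent=2))
```

The point of the design is that the term is written once on the client side and injected in both places, so the query and the script cannot drift apart.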
If you're relatively new to ES and would love to see real-world examples, check out my recently released Elasticsearch Handbook. One chapter is dedicated to scripting and as it turns out, you can achieve a lot with scripts (if of course executed properly).

Elasticsearch : filter results based on the date range

I'm using Elasticsearch 6.6 and trying to extract multiple records based on multiple values (email_address) passed to a bool query, restricted to a date range. For example, I want to extract information about a few employees based on their email_address (annie@test.com, charles@test.com, heman@test.com) and from a given period, i.e. project_date (2019-01-01).
I used a should expression, but unfortunately it pulls all the records from Elasticsearch that fall in the date range, i.e. it even pulls other employees' information from project_date 2019-01-01.
{
"query": {
"bool": {
"should": [
{ "match": { "email_address": "annie@test.com" }},
{ "match": { "email_address": "chalavadi@test.com" }}
],
"filter": [
{ "range": { "project_date": { "gte": "2019-08-01" }}}
]
}
}
}
I also tried a must expression, but I get no results. Could you please help me find employees by their email_address within the date range?
Thanks in advance.
should (OR) clauses are optional
Quoting from this article:
"In a query, if must and filter queries are present, the should query occurrence then helps to influence the score. However, if bool query is in a filter context or has neither must nor filter queries, then at least one of the should queries must match a document."
So in your query, should only influences the score and does not actually filter documents. You must wrap the should in a must, or move it into filter (if scoring is not required).
GET employeeindex/_search
{
"query": {
"bool": {
"filter": {
"range": {
"projectdate": {
"gte": "2019-01-01"
}
}
},
"must": [
{
"bool": {
"should": [
{
"term": {
"email.raw": "abc@text.com"
}
},
{
"term": {
"email.raw": "efg@text.com"
}
}
]
}
}
]
}
}
}
You can also replace the should clause with a terms clause, as in @AlwaysSunny's answer.
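The rule quoted above can be written down as a few lines of Python. This is only a sketch of the documented default for a bool query in query context, not Elasticsearch code: the helper effective_min_should_match is hypothetical, and it only handles an integer minimum_should_match, not the percentage forms.

```python
def effective_min_should_match(bool_query):
    """Return how many should clauses must match for a bool query in query context."""
    # An explicit setting always wins (integer form only in this sketch).
    if "minimum_should_match" in bool_query:
        return bool_query["minimum_should_match"]
    # Without should clauses there is nothing to require.
    if not bool_query.get("should"):
        return 0
    # With must or filter present, should clauses only influence the score.
    has_required = bool(bool_query.get("must")) or bool(bool_query.get("filter"))
    return 0 if has_required else 1


# The bool query from the question: should next to filter.
asked = {
    "should": [
        {"match": {"email_address": "annie@test.com"}},
        {"match": {"email_address": "chalavadi@test.com"}},
    ],
    "filter": [{"range": {"project_date": {"gte": "2019-08-01"}}}],
}

print(effective_min_should_match(asked))                                  # 0
print(effective_min_should_match({"should": asked["should"]}))            # 1
print(effective_min_should_match({**asked, "minimum_should_match": 1}))   # 1
```

The last call hints at a third option besides wrapping the should in a must: keeping the query as it is and setting minimum_should_match to 1.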
You can do it in a shorter way with terms and range, both inside filter. Your existing query doesn't work as expected because of the should clause: next to a filter, should clauses become optional and only affect scoring. Read more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html
{
"query": {
"bool": {
"filter": [
{
"terms": {
"email_address.keyword": [
"annie@test.com", "chalavedi@test.com"
]
}
},
{
"range": {
"project_date": {
"gte": "2019-08-01"
}
}
}
]
}
}
}

Parent child relationship in ElasticSearch - search for a sentence in all the child docs combined

I am super new to Elasticsearch. I have a use case that seems solvable with a parent-child relationship. The parent doc contains the description of an NGO. The child docs contain the various feedback messages sent to the NGO.
Parent Doc structure
{
name
address
description
}
Child doc
{
feedbackContent
}
Let's say NGO-A has 4 feedbacks (meaning 4 child documents)
best teachers
best facilities
good students
location is too far
Another NGO-B has 2 feedbacks (meaning 2 child documents)
good food quality
awesome location
The client should be able to look up the NGOs whose feedbacks, taken together, contain all the terms in the query string. Example: the client searches for
"best" AND "location".
Since best is present in child 1 and child 2 and location is present in child 4, NGO-A is a valid result. For NGO-B, however, child 2 contains one search term and the other search term is not present in any other child doc, so NGO-B is not a valid result.
I read https://blog.mimacom.com/parent-child-elasticsearch/, which is quite good, but I was unable to conclude whether this can be done.
Examples I tried:
PUT message_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"mapping.single_type": true
},
"mappings": {
"doc": {
"properties": {
"ngo": {"type": "text"},
"feedback": {"type": "text"},
"ngo_relations": {
"type": "join",
"relations": {
"ngo": "feedback"
}
}
}
}
}
}
POST message_index/doc/_bulk
{"index": {"_id":1}}
{"name":"teach for india", "ngo_relations": {"name":"ngo"}}
{"index":{"_id":2}}
{"name":"hope for autism", "ngo_relations": {"name":"ngo"}}
PUT message_index/doc/3?routing=1
{"feedback":"best food","ngo_relations":{"name":"feedback", "parent":1}}
PUT message_index/doc/4?routing=1
{"feedback":"average location","ngo_relations":{"name":"feedback", "parent":1}}
PUT message_index/doc/5?routing=1
{"feedback":"awesome staff","ngo_relations":{"name":"feedback", "parent":1}}
PUT message_index/doc/6?routing=2
{"feedback":"best teachers","ngo_relations":{"name":"feedback", "parent":2}}
PUT message_index/doc/7?routing=2
{"feedback":"awesome overload","ngo_relations":{"name":"feedback", "parent":2}}
For a search on best and location, only the "teach for india" NGO should be returned.
No hits:
GET message_index/_search
{
"query": {
"has_child": {
"type": "feedback",
"query": {
"bool": {
"must": {
"term": {"feedback": "best"}
},
"must": {
"term": {"feedback": "location"}
}
}
}
}
}
}
Both documents are returned:
GET message_index/_search
{
"query": {
"has_child": {
"type": "feedback",
"query": {
"bool": {
"should": {
"term": {"feedback": "best"}
},
"should": {
"term": {"feedback": "location"}
}
}
}
}
}
}
This can be done. You were close; there is just a small mistake in the query.
In your child query, you run a bool with two must/should clauses. Your query therefore reads: give me all documents that have a child containing both (or, with should, at least one of) the terms "best" and "location".
What you want instead is: give me all documents that have a child containing the term "best" and also have a child containing the term "location".
Tweak your query as follows:
GET message_index/_search
{
"query": {
"bool": {
"must": [
{
"has_child": {
"type": "feedback",
"query": {
"term": {
"feedback": "best"
}
}
}
},
{
"has_child": {
"type": "feedback",
"query": {
"term": {
"feedback": "location"
}
}
}
}
]
}
}
}
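To see why two separate has_child clauses behave differently from one has_child with both terms, here is a small Python sketch over the sample feedbacks from the question. It is not Elasticsearch code: the helper has_child only mimics the "at least one child matches" behaviour, and splitting on whitespace is a stand-in for the real text analysis.

```python
# Feedback documents from the question, grouped by their parent NGO.
ngos = {
    "teach for india": ["best food", "average location", "awesome staff"],
    "hope for autism": ["best teachers", "awesome overload"],
}


def has_child(feedbacks, child_predicate):
    """Mimic has_child: true if at least one child document matches."""
    return any(child_predicate(set(text.split())) for text in feedbacks)


# One has_child with both terms: a single child must contain both words.
single_clause = [
    name for name, feedbacks in ngos.items()
    if has_child(feedbacks, lambda words: "best" in words and "location" in words)
]

# Two has_child clauses under must: each term may come from a different child.
two_clauses = [
    name for name, feedbacks in ngos.items()
    if has_child(feedbacks, lambda words: "best" in words)
    and has_child(feedbacks, lambda words: "location" in words)
]

print(single_clause)  # []
print(two_clauses)    # ['teach for india']
```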

Difference between elasticsearch queries

I'm having a hard time trying to figure out why these two queries do not return the same number of results (I'm using elasticsearch 2.4.1):
{
"nested": {
"path": "details",
"filter": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
{
"nested": {
"path": "details",
"filter": {
"bool": {
"must": [
{ "match": { "details.id": "color" } },
{ "match": { "details.value_str": "red" } }
]
}
}
}
}
The first query has more results.
My guess was that the filter clause in the first query was working like an or/should, but if I replace the must in the second query with a should, the query yields a greater number of results than that of those two.
How does the meaning of those queries differ?
I'm afraid I have no knowledge of the structure of the indexed documents; all I know is how many rows each query returns.
The first query is wrong, the nested filter cannot be an array, so I suspect ES doesn't parse it correctly and only takes one match instead of both, which is probably why it returns more data than the second one.
The second query is correct in terms of nested filter and yields exactly what you expect.

Elasticsearch Nested Filters being inclusive vs. exclusive

I have an object mapping that uses nested objects (props in our example) in a tag-like fashion.
Each tag can belong to a client/user, and we want to allow our users to run query_string style searches against props.name.
The issue is that when an object has multiple props and only one of them matches the filter while the others don't, the object is still returned. We want the opposite: if one prop fails the filter, the object should not be returned, rather than being returned as soon as one prop passes.
I have posted a comprehensive example here: https://gist.github.com/d2kagw/1c9d4ef486b7a2450d95
Thanks in advance.
I believe what you need here is a flattened list of values, like an array of values. The major difference between an array and nested objects is that the latter "know" which value of a nested property corresponds to which value of another property in the same nested object. An array of values, on the other hand, flattens the values of a given property, so you lose the association between a client_id and a name. That is, with arrays you have props.client_id = [null, 2] and props.name = ["petlover", "premiumshopper"].
With your nested filter you want to match that string against all values of props.name, meaning ALL nested props.name values of one parent doc need to match. This doesn't happen with nested objects, because the nested documents are separate and are queried separately: if at least one nested document matches, the parent is considered a match.
In other words, for a query like "query": "props.name:(carlover NOT petlover)" you need to run it against a flattened list of values, just like an array. You need that query run against ["carlover", "petlover"].
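The difference between the two evaluation modes can be illustrated with a short Python sketch. The documents below are hypothetical and only modeled on the question; the function matches hard-codes the query "carlover NOT petlover", and nothing here is Elasticsearch code.

```python
# Hypothetical documents: each one has a list of nested props.
docs = {
    "doc1": [
        {"client_id": 1, "name": "carlover"},
        {"client_id": None, "name": "petlover"},
    ],
    "doc2": [{"client_id": 1, "name": "carlover"}],
}


def matches(names):
    """Evaluate 'carlover NOT petlover' against a collection of names."""
    return "carlover" in names and "petlover" not in names


# Nested semantics: each prop is checked on its own; one matching prop is enough.
nested_hits = [
    doc_id for doc_id, props in docs.items()
    if any(matches({prop["name"]}) for prop in props)
]

# Flattened semantics: all names of the parent are checked together.
flat_hits = [
    doc_id for doc_id, props in docs.items()
    if matches({prop["name"] for prop in props})
]

print(nested_hits)  # ['doc1', 'doc2']
print(flat_hits)    # ['doc2']
```

doc1 slips through the nested evaluation because its carlover prop, looked at in isolation, contains no petlover.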
My suggestion is to set "include_in_parent": true on your nested documents (meaning the parent keeps a flattened, array-like list of values) and to change the queries a bit:
for the query_string part, use the flattened properties so the query is matched against the combined list of elements, not element by element.
for the match (or term, see below) and missing parts, use the nested properties, because you can have nulls in there. A missing filter on an array matches only if the whole array is missing, not a single value in it, so the flattened approach used for the query_string part cannot be used here.
optionally, for matching the integer I would use term instead of match, since the value is an integer rather than a string and is not_analyzed by default.
With that said, these are the changes:
{
"mappings" : {
...
"props": {
"type": "nested",
"include_in_parent": true,
...
should (and does) return zero results
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{
"query": {
"query_string": { "query": "props.name:((carlover AND premiumshopper) NOT petlover)" }
}
},
{
"nested": {
"path": "props",
"filter": {
"or": [ { "query": { "match": { "props.client_id": 1 } } }, { "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 1
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{"query": {"query_string": { "query": "props.name:(carlover NOT petlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "match": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } } ]
}
}
}
]
}
}
}
}
should (and does) return just 2
GET /nesting-test/_search?pretty=true
{
"query": {
"filtered": {
"filter": {
"and": [
{ "query": {"query_string": { "query": "props.name:(* NOT carlover)" } } },
{
"nested": {
"path": "props",
"filter": {
"or": [{ "query": { "term": { "props.client_id": 1 } } },{ "missing": { "field": "props.client_id" } }
]
}
}
}
]
}
}
}
}
