"update by query" not working as expected with straight calls - elasticsearch

I have a script that calls Elasticsearch with several update_by_query requests.
Here I update the item with id=299966 and change the trash flag, trash=0:
_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=0"
}
}
Then I update the item with id=299966 (same item as above) to trash=1:
_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=1"
}
}
The problem is that after these two operations, if I search for the item with id=299966, I get trash=0, when it should be trash=1, since that was the last update executed. I always maintain the order, and my own log shows that the trash=0 update is executed first, followed by the trash=1 update.
Is there anything in the update_by_query logic that prevents two consecutive calls from both taking effect? Do I have to wait a few seconds before making the second update_by_query?
PS: Never mind the doubled query in the code. It works fine.
Thanks in advance.

The solution I found is to call _flush after every _update or _update_by_query:
myindex/_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=0"
}
}
myindex/_flush
myindex/_update_by_query
{
"query": {
"query": {
"bool": {
"must": [
{
"terms": {
"_id": [
299966
]
}
}
],
"should": [
]
}
}
},
"script": {
"inline": "ctx._source.trash=1"
}
}
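The sequence above can be sketched as a small request builder. This is only a sketch under assumptions, not a client: the helper names `update_trash_request`, `flush_request` and `trash_then_restore` are hypothetical, the index name `myindex` comes from the answer, and the body uses a single `query` level rather than the doubled one in the post. Nothing is sent over the network; the resulting tuples would still have to be handed to whatever HTTP client the script uses.

```python
import json


def update_trash_request(index, doc_id, trash):
    """Build (method, path, body) for an _update_by_query that sets the trash flag."""
    body = {
        # Single "query" level (assumption); the post wraps it twice.
        "query": {"bool": {"must": [{"terms": {"_id": [doc_id]}}]}},
        # "inline" is the script key used in the post.
        "script": {"inline": f"ctx._source.trash={trash}"},
    }
    return ("POST", f"/{index}/_update_by_query", body)


def flush_request(index):
    """Build (method, path, body) for the _flush call placed after each update."""
    return ("POST", f"/{index}/_flush", None)


def trash_then_restore(index, doc_id):
    """Ordered requests: set trash=0, flush, set trash=1, flush."""
    return [
        update_trash_request(index, doc_id, 0),
        flush_request(index),
        update_trash_request(index, doc_id, 1),
        flush_request(index),
    ]


requests_to_send = trash_then_restore("myindex", 299966)
for method, path, body in requests_to_send:
    print(method, path, json.dumps(body) if body is not None else "")
```

Keeping the flush as its own entry in the list makes the ordering explicit, so the log of sent requests can be compared directly with the expected sequence.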

Related

ElasticSearch: Query nested array for empty and specific value in single query

Documents structure -
{
"hits": [
{
"_type": "_doc",
"_id": "ef0a2c44179a513476b080cc2a585d95",
"_source": {
"DIVISION_NUMBER": 44,
"MATCHES": [
{
"MATCH_STATUS": "APPROVED",
"UPDATED_ON": 1599171303000
}
]
}
},
{
"_type": "_doc",
"_id": "ef0a2c44179a513476b080cc2a585d95",
"_source": {
"DIVISION_NUMBER": 44,
"MATCHES": [ ]
}
}
]
}
Question: MATCHES is a nested array. Inside it there is a text field, MATCH_STATUS, that can take values such as "APPROVED" or "REJECTED".
I want to find ALL documents where MATCH_STATUS is one of, say, "APPROVED" or "RECOMMENDED", as well as documents where MATCHES has no data (empty array "MATCHES": [ ]). Please note that I want this in a single query.
I am able to do this with two separate queries, like this:
GET all matches with status = RECOMMENDED, APPROVED
"must": [
{
"nested": {
"path": "MATCHES",
"query": {
"terms": {
"MATCHES.MATCH_STATUS.keyword": [
"APPROVED",
"RECOMMENDED"
]
}
}
}
}
]
GET all matches having empty array "MATCHES" : [ ]
{
"size": 5000,
"query": {
"bool": {
"filter": [],
"must_not": [
{
"nested": {
"path": "MATCHES",
"query": {
"exists": {
"field": "MATCHES"
}
}
}
}
]
}
},
"from": 0
}
You can combine both queries using a should clause:
{
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"nested": {
"path": "MATCHES",
"query": {
"bool": {
"minimum_should_match": 1,
"should": [
{
"terms": {
"MATCHES.MATCH_STATUS.keyword": [
"APPROVED",
"RECOMMENDED"
]
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"nested": {
"path": "MATCHES",
"query": {
"bool": {
"filter": {
"exists": {
"field": "MATCHES"
}
}
}
}
}
}
]
}
}
]
}
}
}
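To make the intended logic of the combined query concrete, here is a rough in-memory model of it in Python. It does not talk to Elasticsearch; it only mirrors the two should branches (a nested entry with a wanted status, or no nested entries at all). The function name `matches_combined` and the third sample document are hypothetical additions for illustration.

```python
WANTED_STATUSES = {"APPROVED", "RECOMMENDED"}


def matches_combined(source, wanted=WANTED_STATUSES):
    """Mirror the two should branches of the combined query on a plain dict."""
    matches = source.get("MATCHES") or []
    # Branch 1: some nested entry has one of the wanted statuses.
    has_wanted_status = any(m.get("MATCH_STATUS") in wanted for m in matches)
    # Branch 2: there are no nested entries at all.
    has_no_matches = len(matches) == 0
    return has_wanted_status or has_no_matches


docs = [
    {"DIVISION_NUMBER": 44,
     "MATCHES": [{"MATCH_STATUS": "APPROVED", "UPDATED_ON": 1599171303000}]},
    {"DIVISION_NUMBER": 44, "MATCHES": []},
    # Hypothetical extra document that should be excluded.
    {"DIVISION_NUMBER": 44, "MATCHES": [{"MATCH_STATUS": "REJECTED"}]},
]

selected = [matches_combined(d) for d in docs]
print(selected)  # [True, True, False]
```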
Update: To answer your comment.
The missing aggregation does not support nested fields for now; there is an open issue for this at the moment.
To get the count of documents with empty MATCHES, you can use a filter aggregation with the nested query wrapped in the must_not clause of a bool query.
{
"aggs": {
"missing_matches_agg": {
"filter": {
"bool": {
"must_not": {
"nested": {
"query": {
"match_all": {}
},
"path": "MATCHES"
}
}
}
}
}
}
}

ElasticSearch should with nested and bool must_not exists

With the following mapping:
"categories": {
"type": "nested",
"properties": {
"category": {
"type": "integer"
},
"score": {
"type": "float"
}
}
},
I want to use the categories field to return documents that either:
have a score above a threshold in a given category, or
do not have the categories field
This is my query:
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
<id>
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [
{
"exists": {
"field": "categories"
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
It correctly returns documents both with and without the categories field, and orders the results so that the ones I want come first, but it doesn't filter out the results that have a score below the 0.5 threshold.
Great question.
That is because, from Elasticsearch's point of view, categories is not itself a field (that is, a field on which an inverted index is built and used for querying), whereas categories.category and categories.score are.
As a result, the exists check on categories finds nothing in any document, so the must_not clause is true for every document, which explains the results you see.
Modify the query as below and you should see your use case working correctly.
POST <your_index_name>/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"terms": {
"categories.category": [
"100"
]
}
},
{
"range": {
"categories.score": {
"gte": 0.5
}
}
}
]
}
}
}
},
{
"bool": {
"must_not": [ <----- Note this
{
"nested": {
"path": "categories",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "categories.category"
}
},
{
"exists": {
"field": "categories.score"
}
}
]
}
}
}
}
]
}
}
],
"minimum_should_match": 1
}
}
}
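The behaviour the corrected query is aiming for can be modelled on plain dictionaries. The following Python sketch is only an approximation of the query's logic, not of Elasticsearch itself; the function names and the sample documents are hypothetical. It shows that both conditions in the first branch must hold on the same nested entry, and that the second branch only passes documents with no usable nested entry.

```python
def entry_passes(entry, category_ids, threshold):
    """Both conditions must hold on the same nested entry."""
    score = entry.get("score")
    return (
        entry.get("category") in category_ids
        and score is not None
        and score >= threshold
    )


def has_scoring_category(doc, category_ids, threshold=0.5):
    # First should branch: nested must [terms category, range score].
    return any(entry_passes(e, category_ids, threshold)
               for e in doc.get("categories", []))


def has_no_categories(doc):
    # Second should branch: must_not nested(exists category AND exists score).
    return not any(
        e.get("category") is not None and e.get("score") is not None
        for e in doc.get("categories", [])
    )


def matches(doc, category_ids, threshold=0.5):
    return has_scoring_category(doc, category_ids, threshold) or has_no_categories(doc)


# Hypothetical sample documents.
docs = [
    {"id": 1, "categories": [{"category": 100, "score": 0.9}]},
    {"id": 2, "categories": [{"category": 100, "score": 0.2}]},
    {"id": 3},
    {"id": 4, "categories": [{"category": 100, "score": 0.2},
                             {"category": 200, "score": 0.9}]},
]

kept_ids = [d["id"] for d in docs if matches(d, {100})]
print(kept_ids)  # [1, 3]
```

Document 4 is rejected because its high score belongs to a different category than the one requested, which is the per-entry behaviour a nested query is meant to give.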

passing multiple combination query in elastic search

`"query": {
"function_score": {
"query": {
"bool": {
"must": [],
"should": [],
"filter": [
{
"terms": {
"category": "type-1",
"product": "product-A"
},
"terms": {
"category": "type-2",
"product": "product-B"
}
}
]
}
},
"functions": []
}
},`
I want to pass multiple combinations in one query, as above. Is this possible, and what is the correct query format?
In SQL my query would be:
select * from product where (category='type1' and product='product-A') or (category='type2' and product='product-B') or (category='type3' and product='product-C')
I want to replicate the above query.
If you want an OR statement in a bool query, you should use a nested bool query with multiple should clauses.
So try:
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [],
"should": [],
"filter": [
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"category": "type-1"
}
},
{
"term": {
"product": "product-A"
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"category": "type-2"
}
},
{
"term": {
"product": "product-B"
}
}
]
}
}
]
}
}
]
}
},
"functions": []
}
}
}
If you have no must clause, you can also move the filter clauses into the main should, since only documents that match at least one of the clauses will match.
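Since the SQL in the question has three combinations and the list may grow, the nested bool structure can be generated rather than written by hand. The Python sketch below builds the same should-of-must shape from a list of pairs. The helper names `any_combination_filter` and `build_search_body` are hypothetical, the function_score wrapper is left out for brevity, and `minimum_should_match` is set explicitly as a precaution rather than because the answer requires it.

```python
import json


def any_combination_filter(pairs):
    """(category AND product) OR (category AND product) ... as a bool query."""
    return {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must": [
                            {"term": {"category": category}},
                            {"term": {"product": product}},
                        ]
                    }
                }
                for category, product in pairs
            ],
            # Explicit for clarity: at least one combination must match.
            "minimum_should_match": 1,
        }
    }


def build_search_body(pairs):
    """Wrap the OR-of-ANDs in a filter clause (function_score omitted here)."""
    return {"query": {"bool": {"filter": [any_combination_filter(pairs)]}}}


combinations = [
    ("type-1", "product-A"),
    ("type-2", "product-B"),
    ("type-3", "product-C"),
]
search_body = build_search_body(combinations)
print(json.dumps(search_body, indent=2))
```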

Elasticsearch filtered query with script for term frequency

I'm using the attachment plugin: https://github.com/elastic/elasticsearch-mapper-attachments
I'm able to find documents with a specific word in one or more fields, but I'm unable to filter out documents with a lower term frequency than the one searched for.
This works:
POST /crm/employee/_search
{
"query": {"filtered": {
"query": {"match": {
"employee.cv.content": "transitie"
}},
"filter": {
"bool": {
"should": [
{"terms": {
"employee.listEmployeeType.id": [
2
]
}}
]
}
}
}},
"highlight": {"fields": {"employee.cv.content" : {}}}
}
After a long search, I've found the following:
"script": {
"script": "crm['employee.cv.content'][lookup].tf() > occurrence",
"params": {
"lookup": "transitie",
"occurrence": 1
}
},
Unfortunately, I'm unable to implement it. I hope I've explained the issue well enough for someone to give me a push in the right direction!
{
"query": {
"filtered": {
"query": {
"match": {
"employee.cv.content": "transitie"
}
},
"filter": {
"bool": {
"should": [
{
"terms": {
"employee.listEmployeeType.id": [
2
]
}
}
],
"must": [
{
"script": {
"script": "_index['employee.cv.content'][lookup].tf() > occurrence",
"params": {
"lookup": "transitie",
"occurrence": 1
}
}
}
]
}
}
}
},
"highlight": {
"fields": {
"employee.cv.content": {}
}
}
}

Elastic Search - OR querying for non matches

I'm having trouble with a query in Elasticsearch. I'm searching over a specific set of data defined by the state_id, and I want to return all the states which do not have either one of the cities defined by the identifiers below.
The query below returns 18 results with just "city_id_1", and 0 results with just "city_id_2". With both, though, it returns 0 results (since "city_id_2" is on every state record). What I want is to still return the 18 results while querying over both cities.
I feel like my query should work as a NOT (A or B) style query, equivalent to NOT A and NOT B, but the 0 results seem to be overriding the 18.
Is there a way I can change my query to get the results I want, or is this something Elasticsearch cannot do?
{
"query": {
"bool": {
"must": [
{ "terms": { "state_id": ["4ca16f80-da79-11e5-9874-64006a4f57cb"]}}
],
"must_not": [
{
"nested": {
"path": "cities",
"query": {
"bool": {
"should": [
{"term": { "cities.identifier": "city_id_1"}},
{"term": { "cities.identifier": "city_id_2"}}
]
}
}
}
}
]
}
},
"size": 10
}
Try this on for size. Elasticsearch is silly. The filter needs to be in each of the nested queries.
{
"query": {
"bool": {
"should": [
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "cities",
"query": {
"term": { "cities.identifier": "city_id_1"}
}
}
}
],
"filter":[
{
"term":{
"state_id":"4ca16f80-da79-11e5-9874-64006a4f57cb"
}
}
]
}
}
},
{
"query": {
"bool": {
"must_not": [
{
"nested": {
"path": "cities",
"query": {
"term": { "cities.identifier": "city_id_2"}
}
}
}
],
"filter":[
{
"term":{
"state_id":"4ca16f80-da79-11e5-9874-64006a4f57cb"
}
}
]
}
}
}
]
}
},
"size": 10
}
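The difference between the question's query and the rewritten one can be checked on toy data. The Python sketch below models both as boolean predicates; it does not run against Elasticsearch, and the three records, their names, and "city_id_3" are hypothetical. As in the question, "city_id_2" appears on every record.

```python
STATE = "4ca16f80-da79-11e5-9874-64006a4f57cb"

# Hypothetical records; every one of them contains city_id_2.
records = [
    {"name": "r1", "state_id": STATE, "cities": ["city_id_1", "city_id_2"]},
    {"name": "r2", "state_id": STATE, "cities": ["city_id_2"]},
    {"name": "r3", "state_id": STATE, "cities": ["city_id_2", "city_id_3"]},
]


def has_city(record, identifier):
    return identifier in record["cities"]


def original_query(record):
    # must(state) AND must_not(nested should [A, B])  ->  NOT (A OR B)
    in_state = record["state_id"] == STATE
    return in_state and not (
        has_city(record, "city_id_1") or has_city(record, "city_id_2")
    )


def rewritten_query(record):
    # should [(NOT A AND state), (NOT B AND state)]  ->  NOT A OR NOT B
    in_state = record["state_id"] == STATE
    return (in_state and not has_city(record, "city_id_1")) or (
        in_state and not has_city(record, "city_id_2")
    )


original_hits = [r["name"] for r in records if original_query(r)]
rewritten_hits = [r["name"] for r in records if rewritten_query(r)]
print(original_hits)   # []
print(rewritten_hits)  # ['r2', 'r3']
```

Because every record has city_id_2, NOT (A OR B) can never be true, while NOT A OR NOT B keeps the records that lack city_id_1, which corresponds to the 18 results the question wants to keep.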
If you want NOT A AND NOT B behaviour, you need to make a small change:
{
"query": {
"bool": {
"must": [
{ "terms": { "state_id": ["4ca16f80-da79-11e5-9874-64006a4f57cb"]}}
],
"must_not": [
{
"nested": {
"path": "cities",
"query": {
"bool": {
"must": [ ====> Use must instead of should
{"term": { "cities.identifier": "city_id_1"}},
{"term": { "cities.identifier": "city_id_2"}}
]
}
}
}
}
]
}
},
"size": 10
}
This will exclude those records which have both city_id_1 and city_id_2.
As per my understanding, you are looking for a NOT A or NOT B kind of clause. Please check the query below and see if it fits your requirement:
{
"query": {
"bool": {
"must": [
{ "terms": { "state_id": ["4ca16f80-da79-11e5-9874-64006a4f57cb"]}}
],
"should": [
{
"nested": {
"path": "cities",
"query": {
"bool": {
"must_not": [
{"term": { "cities.identifier": "city_id_1"}}
]
}
}
}
},
{
"nested": {
"path": "cities",
"query": {
"bool": {
"must_not": [
{"term": { "cities.identifier": "city_id_2"}}
]
}
}
}
}
],
"minimum_number_should_match": 1
}
},
"size": 10
}

Resources