Elasticsearch OR condition with Multiple Criteria

I need to query specific results in Elasticsearch, using a set of global filters plus an extended OR clause, where each case in the OR clause consists of a specific tuple of conditions. In words, it would be something like this: "Fetch records which match these global filters: category = 'foo' and name = 'bar'. From that set, fetch records where the keywords (x, y, z) match any of the following tuples: (1, 2, 3), (4, 5, 6), or (7, 8, 9)."
For example, if I have these items:
Item 1:
category: foo, name: bar,
x: 1, y: 2, z: 3
Item 2:
category: foo, name: baz,
x: 1, y: 2, z: 3
Item 3:
category: foo, name: bar,
x: 4, y: 5, z: 6
Item 4:
category: foo, name: bar,
x: 10, y: 11, z: 12
The search should not return Item 2 (because it fails the global condition that name = 'bar'), or Item 4 (because it has (x, y, z) = (10, 11, 12), which was not one of my specified/allowed tuples). It should return the other items, which match both the global conditions and fall within the list of specified/allowed tuples for the values of x, y, z.
I know I could issue one simple query per item to do this; but I assume this would be very inefficient, since I need to specify on the order of 10K tuples or more, each time.
Apologies if this was already answered; one of the existing answers may already be adaptable for this, but I am too new to elasticsearch to recognize how to do it.
Environment: elasticsearch 7.10.1 in Python 3.8.

Here you go,
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "category": "foo"
          }
        },
        {
          "term": {
            "name": "bar"
          }
        },
        {
          "bool": {
            "should": [
              {
                "bool": {
                  "must": [
                    { "term": { "x": 1 } },
                    { "term": { "y": 2 } },
                    { "term": { "z": 3 } }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    { "term": { "x": 4 } },
                    { "term": { "y": 5 } },
                    { "term": { "z": 6 } }
                  ]
                }
              },
              {
                "bool": {
                  "must": [
                    { "term": { "x": 7 } },
                    { "term": { "y": 8 } },
                    { "term": { "z": 9 } }
                  ]
                }
              }
            ]
          }
        }
      ]
    }
  }
}
All you need is a bool query with a combination of must and should clauses.
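Since the tuples are generated at query time (the question mentions roughly 10K of them), the should clause can be built programmatically. Here is a minimal sketch in Python, assuming an index named "items", the elasticsearch-py client, and the field names from the question; note that at that scale the request may exceed the default indices.query.bool.max_clause_count limit (1024 in 7.x), so it may need raising or the tuples splitting across several requests:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster; adjust hosts as needed

# Allowed (x, y, z) tuples; in practice this list could hold ~10K entries.
allowed_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]

# One sub-bool (x AND y AND z) per allowed tuple.
should_clauses = [
    {
        "bool": {
            "must": [
                {"term": {"x": x}},
                {"term": {"y": y}},
                {"term": {"z": z}},
            ]
        }
    }
    for x, y, z in allowed_tuples
]

body = {
    "query": {
        "bool": {
            "must": [
                {"term": {"category": "foo"}},
                {"term": {"name": "bar"}},
                {"bool": {"should": should_clauses}},
            ]
        }
    }
}

response = es.search(index="items", body=body)
print(response["hits"]["total"])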

Related

Elasticsearch conditional sorting by different fields

Let's say that my business need is to sort results differently, based on some "external" parameter that I'm passing to the query.
Documents are more or less like:
{
  "transfer_rate": 2000.00,
  "some_collection": [
    { "transfer_rate": 1000.00, "identifier": 1, "campaign": 1 },
    { "transfer_rate": 500.00, "identifier": 2, "campaign": 2 },
    { "transfer_rate": 750.00, "identifier": 3, "campaign": 3 },
    //...
  ]
},
{
  "transfer_rate": 500.00,
  "some_collection": [
    { "transfer_rate": 1000.00, "identifier": 4, "campaign": 1 },
    { "transfer_rate": 2000.00, "identifier": 5, "campaign": 2 },
    { "transfer_rate": 625.00, "identifier": 6, "campaign": 3 },
    { "transfer_rate": 225.00, "identifier": 7, "campaign": 1 },
    //...
  ]
}
Now say my "parameter" is equal to 750.00.
I would like to order this set of documents differently, depending on how the root transfer_rate compares to the given parameter, as follows:
If doc['transfer_rate'] >= _param then sort by doc['transfer_rate'], else sort by the MIN of doc['some_collection'].transfer_rate.
I know the document model could be optimized, but I didn't invent it, nor am I allowed to change or re-index it.
The tricky part about the nested objects is that they contain a property (campaign in this example) that has to match a criterion, so basically:
When doc['transfer_rate'] is less than _param_, order by the minimum value of doc['some_collection'].transfer_rate where campaign equals XYZ.
So for the given example and parameter, documents like the first one should be ordered by doc['transfer_rate'], and documents like the second one should be ordered by the nested minimum.
Thanks for any advice / links / support.
This is going to be a pain if you cannot reindex the data.
I came up with this query:
GET /71095886/_search
{
  "query": {
    "nested": {
      "path": "some_collection",
      "query": {
        "match": {
          "some_collection.campaign": 1
        }
      }
    }
  },
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "lang": "painless",
        "source": """
          if (doc['transfer_rate'].value >= params.factor) {
            return doc['transfer_rate'].value;
          } else {
            def min = 10000;
            for (item in doc['some_collection']) {
              if (item['transfer_rate'] < min) {
                min = item['transfer_rate'];
              }
            }
            return min;
          }
        """,
        "params": {
          "factor": 2000
        }
      },
      "order": "asc"
    }
  }
}
But it won't work because of the nested objects and how they are stored in Elasticsearch (actually Lucene, but let's not go down that road... yet).
If you add "nested_path": "some_collection" in _script, you won't have access to the root transfer_rate anymore (because it is stored in a different Lucene document).
Maybe one thing you can look into is runtime fields.

Query return the search difference on elasticsearch

How would the following query look:
Scenario:
I have two bases (base1 and base2), each with one column. I would like to see the difference between them, that is, what exists in base1 that does not exist in base2, using hostname as the fictitious column name.
Example:
Does the selected value of Base1.Hostname exist in Base2.Hostname?
YES → DO NOT RETURN
NO → RETURN
I have this in python for the following function:
def diff(first, second):
    second = set(second)
    return [item for item in first if item not in second]
Example of matching equal values:
GET /base1/_search
{
  "query": {
    "multi_match": {
      "query": "webserver",
      "fields": [
        "hostname"
      ],
      "type": "phrase"
    }
  }
}
I would like to migrate this architecture to Elasticsearch in order to generate forecasts in the future based on the frequency of change of these searches in the bases.
This could be done with an aggregation:
Collect all the hostname values from the base1 & base2 indices.
For each hostname, count the occurrences in base2.
Keep only the buckets whose base2 count is 0.
GET base*/_search
{
  "size": 0,
  "aggs": {
    "all": {
      "composite": {
        "size": 10,
        "sources": [
          {
            "host": {
              "terms": {
                "field": "hostname"
              }
            }
          }
        ]
      },
      "aggs": {
        "base2": {
          "filter": {
            "match": {
              "_index": "base2"
            }
          }
        },
        "index_count_bucket_filter": {
          "bucket_selector": {
            "buckets_path": {
              "base2_count": "base2._count"
            },
            "script": "params.base2_count == 0"
          }
        }
      }
    }
  }
}
By the way, don't forget to use pagination to get the rest of the results.
References:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html
https://discuss.elastic.co/t/data-set-difference-between-fields-on-different-indexes/160015/4
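The composite aggregation is paginated with the after_key it returns. Here is a minimal sketch in Python of that loop (the index pattern, page size, and the elasticsearch-py client setup are assumptions), collecting the hostnames missing from base2:

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster

# Same aggregation as above, built as a Python dict.
body = {
    "size": 0,
    "aggs": {
        "all": {
            "composite": {
                "size": 1000,
                "sources": [{"host": {"terms": {"field": "hostname"}}}],
            },
            "aggs": {
                "base2": {"filter": {"match": {"_index": "base2"}}},
                "index_count_bucket_filter": {
                    "bucket_selector": {
                        "buckets_path": {"base2_count": "base2._count"},
                        "script": "params.base2_count == 0",
                    }
                },
            },
        }
    },
}

missing_in_base2 = []
while True:
    resp = es.search(index="base*", body=body)
    agg = resp["aggregations"]["all"]
    missing_in_base2.extend(b["key"]["host"] for b in agg["buckets"])
    after = agg.get("after_key")
    if not after:
        break
    body["aggs"]["all"]["composite"]["after"] = after  # page to the next slice

print(missing_in_base2)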

Elasticsearch query to return the most recent of 'each document' based on a condition

I am trying to retrieve the most recent version of each document in my dataset when the document is not already archived (archived: false). So when any version of the document has archived set to true, it should not appear in my result.
An example of my dataset:
{
  name: "soccer game",
  base_id: 1,
  hours_remaining: 10,
  updatedDate: 2019-03-10,
  archived: false
}
{
  name: "basketball game",
  base_id: 2,
  hours_remaining: 20,
  updatedDate: 2019-03-10,
  archived: false
}
{
  name: "soccer game",
  base_id: 1,
  hours_remaining: 5,
  updatedDate: 2019-03-14,
  archived: true
}
The expected result is:
{
  name: "basketball game",
  base_id: 2,
  hours_remaining: 20,
  updatedDate: 2019-03-10,
  archived: false
}
After writing several queries, I haven't been able to achieve my goal. This is one of my attempts.
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*",
            "fields": ["name.keyword"]
          }
        },
        {
          "term": {
            "archived": false
          }
        }
      ]
    }
  },
  "collapse": {
    "field": "base_id",
    "inner_hits": {
      "name": "most_recent",
      "size": 1,
      "sort": [{ "updatedDate": "desc" }]
    }
  }
}
What am I doing wrong?
I believe your query_string can be avoided; if you only want documents with archived: false, the name condition is not necessary.
I think you should first use a must condition to filter on documents that have the field archived set to false, then use a terms aggregation on the name field so it brings back the unique names that fulfill the must condition.
You can then use a max aggregation as a sub-aggregation to return only the largest value of the updatedDate field. The final query should look like this:
"size": 0, #We don't care about the size of this
"query":{
"bool": {
"must": {
"term":{
"archieved": false #Only false will be shown
}
}
},
}
"aggs":{
"names":{
"terms":{
"field": "name.keyword" #Unique names will be shown here
},
"aggs":{
"most_recent":{
"max": {
"field": "updatedDate" #The max value of this field
}
}
}
}
}
Hope this is helpful! :D
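As a side note on the approach above: if the full most-recent document is needed rather than just its updatedDate, a top_hits sub-aggregation sorted by updatedDate descending is a common variant. A minimal sketch in Python (the index name "games" and the elasticsearch-py client setup are assumptions, not from the original post):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster

body = {
    "size": 0,
    "query": {"bool": {"must": {"term": {"archived": False}}}},
    "aggs": {
        "names": {
            "terms": {"field": "name.keyword"},
            "aggs": {
                "most_recent": {
                    "top_hits": {
                        "size": 1,  # keep only the newest version per name
                        "sort": [{"updatedDate": {"order": "desc"}}],
                    }
                }
            },
        }
    },
}

resp = es.search(index="games", body=body)  # "games" is a placeholder index name
for bucket in resp["aggregations"]["names"]["buckets"]:
    print(bucket["most_recent"]["hits"]["hits"][0]["_source"])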
A simpler option: query on archived, regardless of anything else,
then order by date descending and keep only the first result (size = 1):
{
  "size": 1,
  "query": {
    "bool": {
      "must": {
        "term": {
          "archived": "false"
        }
      }
    }
  },
  "sort": [
    {
      "updatedDate": {
        "order": "desc"
      }
    }
  ]
}

Elasticsearch match against filter only

We have a multi-tenant index and need to perform queries against the index for a single tenant only. Basically, for all documents that match the filter, return any documents that match the following query, but do not include documents that only match the filter.
For example, say we have a list of documents like so:
{ _id: 1, account_id: 1, name: "Foo" }
{ _id: 2, account_id: 2, name: "Bar" }
{ _id: 3, account_id: 2, name: "Foo" }
I thought this query would work but it doesn't:
{
  "bool": {
    "filter": { "term": { "account_id": 2 } },
    "should": [
      { "match": { "name": "Foo" } }
    ]
  }
}
It returns both documents matching account_id: 2:
{ _id: 3, account_id: 2, name: "Foo", score: 1.111 }
{ _id: 2, account_id: 2, name: "Bar", score: 0.0 }
What I really want is it just to return document _id: 3, which is basically "Of all documents where account_id is equal to 2, return only the ones whose names match Foo".
How can I accomplish this with ES 6.2? The caveat is that the number of should and must match conditions are not always known and I really want to avoid using minimum_should_match.
Try this instead: simply replace should with must. Because the bool query already has a filter clause, its should clauses are optional (minimum_should_match defaults to 0 in that case) and only affect scoring; moving the match into must makes it required:
{
  "bool": {
    "filter": { "term": { "account_id": 2 } },
    "must": [
      { "match": { "name": "Foo" } }
    ]
  }
}
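Since the number of match conditions is not known up front, they can be appended to the must array programmatically. A minimal sketch in Python (the index name "tenants" and the elasticsearch-py client setup are assumptions):

from elasticsearch import Elasticsearch

es = Elasticsearch()  # assumes a local cluster

account_id = 2
conditions = [{"match": {"name": "Foo"}}]  # build this list at runtime

query = {
    "bool": {
        "filter": {"term": {"account_id": account_id}},  # tenant scoping
        "must": conditions,  # every condition here must match
    }
}

resp = es.search(index="tenants", body={"query": query})
for hit in resp["hits"]["hits"]:
    print(hit["_id"], hit["_source"])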

How to calculate the sum amount of parent field in Elasticsearch?

Hi, I have denormalized the data to be flat in Elasticsearch.
e.g.
{childId: 123, childAmount: 3.4, parentId: 1, parentAmount: 5.6}
{childId: 234, childAmount: 4.4, parentId: 1, parentAmount: 5.6}
{childId: 345, childAmount: 5.4, parentId: 2, parentAmount: 1.2}
Note there are 3 children and 2 distinct parents (parent 1 appears on two rows with identical values).
How do I calculate the sum of parentAmount, counting each parent once (which should be 6.8)?
Thanks. And if possible, how can I use a Kibana metric visualization to show this?
In Kibana you can do it using a Metric visualization, and with a query like the one below: a terms aggregation per parentId, a max sub-aggregation that picks up parentAmount once per parent (it is duplicated on every child row), and a sum_bucket that adds those per-parent values:
{
  "size": 0,
  "aggs": {
    "per_parent": {
      "terms": {
        "field": "parentId",
        "size": 25
      },
      "aggs": {
        "max": {
          "max": {
            "field": "parentAmount"
          }
        }
      }
    },
    "sum_amounts": {
      "sum_bucket": {
        "buckets_path": "per_parent>max"
      }
    }
  }
}
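A small worked example in plain Python (no cluster needed) mirroring the aggregation's logic on the three sample documents: take parentAmount once per parentId, then sum those per-parent values.

docs = [
    {"childId": 123, "childAmount": 3.4, "parentId": 1, "parentAmount": 5.6},
    {"childId": 234, "childAmount": 4.4, "parentId": 1, "parentAmount": 5.6},
    {"childId": 345, "childAmount": 5.4, "parentId": 2, "parentAmount": 1.2},
]

per_parent = {}
for d in docs:
    # "max" per parent; the value is identical on every child row anyway
    per_parent[d["parentId"]] = max(per_parent.get(d["parentId"], float("-inf")),
                                    d["parentAmount"])

print(round(sum(per_parent.values()), 1))  # 6.8, matching the sum_bucket result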
