elasticsearch - find document by exactly matching a nested object - elasticsearch

I have documents that contain multiple role/right definitions as an array of nested objects:
{
...
'roleRights': [
{'roleId':1, 'right':1},
{'roleId':2, 'right':1},
{'roleId':3, 'right':2},
]
}
I am trying to filter for documents with specific roleRights, but my query seems to mix up combinations. Here is my filter query as pseudocode:
boolFilter > must > termQuery > roleRights.roleId: 1
boolFilter > must > termQuery > roleRights.right: 2
The above should only return
documents that have role 1 assigned with right 2.
But it looks like I get
all documents that have role 1 assigned, regardless of the right,
and all documents that have right 2 assigned, regardless of the role.
Any hints?

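You need to map roleRights as nested. Otherwise the array of objects is flattened, and the association between roleId and right within each object is lost. The mapping looks like this: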
PUT your_index
{
"mappings": {
"your_type": {
"properties": {
"roleRights": {
"type": "nested",
"properties": {
"roleId": { "type": "integer" },
"right": { "type": "integer" }
}
}
}
}
}
}
Make sure to delete your index first, recreate it and re-populate it.
Then you'll be able to make your query like this:
POST your_index/_search
{
  "query": {
    "nested": {
      "path": "roleRights",
      "query": {
        "bool": {
          "must": [
            { "term": { "roleRights.roleId": 1 } },
            { "term": { "roleRights.right": 2 } }
          ]
        }
      }
    }
  }
}
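When the set of field/value pairs varies, the request body can be assembled in code. The following is a minimal Python sketch, assuming a hypothetical helper named nested_all_terms (it is not part of any Elasticsearch client). It only builds the dictionary for the search body and keeps every term clause inside one nested query, so all conditions are checked against the same nested object.

```python
import json


def nested_all_terms(path, terms):
    """Build a nested query whose term clauses must all hold for one nested object.

    `path` is the nested field and `terms` maps sub-field names to exact values.
    Hypothetical helper for illustration; it only returns a plain dict.
    """
    clauses = [{"term": {f"{path}.{field}": value}} for field, value in terms.items()]
    return {"nested": {"path": path, "query": {"bool": {"must": clauses}}}}


# Body for "role 1 assigned with right 2" on the roleRights field.
body = {"query": nested_all_terms("roleRights", {"roleId": 1, "right": 2})}
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and sent as the body of a _search request with whichever HTTP or Elasticsearch client is in use.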

Related

What's the best way of storing tags into elasticsearch

I have an index 'product' in Elasticsearch, and I want to add tags such as 'environmental', 'energy-saving', 'recyclable', and 'medical-grade' to each item. After some searching I found three approaches: array, nested, and bit.
1. Use array.
{
"mappings": {
"properties": {
"tags": {
"type": "keyword"
}
}
}
}
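It can store the tag names directly.
Query for items that have both 'environmental' and 'medical-grade' (a single terms query would match items that have either tag, so each tag gets its own term clause):
{
  "query": {
    "bool": {
      "must": [
        { "term": { "tags": "environmental" } },
        { "term": { "tags": "medical-grade" } }
      ]
    }
  }
}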
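2. Use nested.
{
  "mappings": {
    "properties": {
      "tags": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          }
        }
      }
    }
  }
}
It can also store the tag name directly, along with an id or other attributes.
Query for items that have both 'environmental' and 'medical-grade' (each tag is a separate nested object, so each one needs its own nested clause):
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "tags",
            "query": { "term": { "tags.name": "environmental" } }
          }
        },
        {
          "nested": {
            "path": "tags",
            "query": { "term": { "tags.name": "medical-grade" } }
          }
        }
      ]
    }
  }
}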
3. Use bit.
{
"mappings": {
"properties": {
"tags": {
"type": "long"
}
}
}
}
It stores tags indirectly: each tag must be assigned a bit.
Suppose the n-th bit represents the n-th tag: 0 -> 'environmental', 1 -> 'energy-saving', 2 -> 'recyclable', 3 -> 'medical-grade'. Then 1001 in binary (9 in decimal) means the item has both 'environmental' and 'medical-grade'.
Query for items that have both 'environmental' and 'medical-grade':
{
"query": {
"bool": {
"must": {
"script": {
"script": "doc['tags'].size() != 0 && (doc['tags'].value&9)==9"
}
}
}
}
}
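The bit encoding described for this option can be sketched in Python. This is only an illustration of the arithmetic, assuming the bit positions listed above; the names TAG_BITS, encode_tags and has_all_tags are hypothetical and not part of Elasticsearch.

```python
# Bit positions follow the assignment given in the question.
TAG_BITS = {"environmental": 0, "energy-saving": 1, "recyclable": 2, "medical-grade": 3}


def encode_tags(tags):
    """Pack a collection of tag names into one integer bitmask."""
    mask = 0
    for tag in tags:
        mask |= 1 << TAG_BITS[tag]
    return mask


def has_all_tags(stored, required):
    """Same condition as the script: (stored & required) == required."""
    return (stored & required) == required


required = encode_tags(["environmental", "medical-grade"])
print(required)  # 9
print(has_all_tags(encode_tags(["environmental", "recyclable", "medical-grade"]), required))  # True
print(has_all_tags(encode_tags(["environmental"]), required))  # False
```

Note that a long field gives at most 64 distinct tags with this scheme, and the tag-to-bit table has to be maintained outside the index.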
I don't know how they perform, but I actually like the third way. Please give me some advice or suggest a better way.
My suggestion is to go with option 1 and use an array. It is easy to query and can also be used in aggregations.
Option 2 would work, but I don't think it is the best fit for your case: you don't have nested or parent-child data, so storing the tags as nested is unnecessary.
Option 3 I would not suggest, because you need to run a script at query time, which will hurt performance.

How do I refer to multiple nesting levels in an Elastic Search's Filter Aggregation?

Let's call my root level foo and my child level events. I want to aggregate on the events level but with a filter that EITHER the event has color "orange" OR the parent foo has customerId "35".
So, I want to have a filter aggregation that's inside a nested aggregation. In this filter's query clause, I have one child that refers to a field on foo and the other refers to a field on events. However, that first child has no way to actually reference the parent like that! I can't use a reverse_nested aggregation because I can't put one of those as a child of a compound query, and I can't filter before nesting because I'd lose the OR semantics that way. How do I reference the field on foo?
Concrete example if it helps. Mapping:
{
"foo": {
"properties": {
"customer_id": { "type": "long" },
"events": {
"type": "nested",
"properties": {
"color": { "type": "keyword" },
"coord_y": { "type": "double" }
}
}
}
}
}
(update for clarity: that's an index named foo with the root mapping named foo)
The query I want to be able to make:
{
"aggs": {
"OP0_nest": {
"nested": { "path": "events" },
"aggs": {
"OP0_custom_filter": {
"filter": {
"bool": {
"should": [
{ "term": { "events.color": "orange" } },
{ "term": { "customer_id": 35 } }
]
}
},
"aggs": {
"OP0_op": {
"avg": { "field": "events.coord_y" }
}
}
}
}
}
}
}
Of course, this does not work: the term query on customer_id is always false, because customer_id can't be accessed inside the nested aggregation.
Thanks in advance!
Since the fields you want to filter on are at different levels, you need a separate query for each level. Place both in the should clause of a bool query, which becomes the filter of the filter aggregation. Inside that aggregation, add a nested aggregation to get the avg of coord_y.
The aggregation will be (UPDATED: since foo is index name removed foo from field names):
{
"aggs": {
"OP0_custom_filter": {
"filter": {
"bool": {
"should": [
{
"term": {
"customer_id": 35
}
},
{
"nested": {
"path": "events",
"query": {
"term": {
"events.color": "orange"
}
}
}
}
]
}
},
"aggs": {
"OP0_op": {
"nested": {
"path": "events"
},
"aggs": {
"OP0_op_avg": {
"avg": {
"field": "events.coord_y"
}
}
}
}
}
}
}
}
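To see which events end up in the average, here is a small pure-Python model of this aggregation, run on hypothetical sample data. It assumes, as far as I understand the semantics, that the filter aggregation selects whole parent documents and the nested aggregation then covers every event of those parents. That means a document matched only through an orange event still contributes its non-orange events.

```python
# Hypothetical sample documents shaped like the mapping in the question.
docs = [
    {"customer_id": 35, "events": [{"color": "blue", "coord_y": 1.0}, {"color": "green", "coord_y": 3.0}]},
    {"customer_id": 7, "events": [{"color": "orange", "coord_y": 10.0}, {"color": "blue", "coord_y": 20.0}]},
    {"customer_id": 7, "events": [{"color": "blue", "coord_y": 100.0}]},
]


def parent_matches(doc):
    """Filter step: customer_id is 35 OR at least one event is orange."""
    return doc["customer_id"] == 35 or any(e["color"] == "orange" for e in doc["events"])


def avg_coord_y(documents):
    """Nested step: average coord_y over every event of the matching parents."""
    values = [e["coord_y"] for d in documents if parent_matches(d) for e in d["events"]]
    return sum(values) / len(values) if values else None


print(avg_coord_y(docs))  # 8.5
```

In this sample the blue event with coord_y 20.0 is part of the average because its parent also holds an orange event. If only the orange events of such documents should count, the result needs a different aggregation shape.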

Multiple (AND) queries for a nested index structure in Elasticsearch

I have an index with the below mapping
{
"mappings": {
"xxxxx": {
"properties": {
"ID": {
"type": "text"
},
"pairs": {
"type": "nested"
},
"xxxxx": {
"type": "text"
}
}
}
}
}
The pairs field is essentially an array of objects; each object has a unique ID associated with it.
What I'm trying to do is get only one object from the pairs field for updates. To that end, I've tried this:
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"match": {
"pairs.id": "c1vNGnnQLuk"
}
}
]
}
},
"_source": "pairs"
}
But this just returns an empty result even though both IDs are valid. If I remove the pairs.id clause, I get the entire array of objects.
What do I need to add or edit so that I can query by both IDs (the top-level one and the nested one)?
Since pairs is of nested type, you need to use a nested query. You will probably also want to use nested inner_hits, which return only the matching nested objects:
GET /sample/_search/?size=1000
{
"query": {
"bool": {
"must": [
{
"match": {
"ID": "2rXdCf5OM9g1ebPNFdZNqW"
}
},
{
"nested": {
"path": "pairs",
"query": {
"match": {
"pairs.id": "c1vNGnnQLuk"
}
},
"inner_hits": {}
}
}
]
}
},
"_source": false
}
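On the client side, the matching nested objects are found under inner_hits of each hit, keyed by the nested path unless a name is given. The following Python sketch pulls them out of a response dictionary. The helper name matched_pairs and the trimmed sample response are assumptions for illustration only.

```python
def matched_pairs(response, path="pairs"):
    """Collect the _source of each nested object reported under inner_hits."""
    found = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(path, {})
        for nested_hit in inner.get("hits", {}).get("hits", []):
            found.append(nested_hit.get("_source"))
    return found


# Hypothetical, heavily trimmed response used only to exercise the helper.
sample_response = {
    "hits": {"hits": [{
        "_id": "1",
        "inner_hits": {"pairs": {"hits": {"hits": [
            {"_source": {"id": "c1vNGnnQLuk"}}
        ]}}},
    }]}
}

print(matched_pairs(sample_response))  # [{'id': 'c1vNGnnQLuk'}]
```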

Elasticsearch Query Filter for Word Count

I am currently looking for a way to return documents with a maximum of n words in a certain field.
The query could look like this for a result set that contains only documents with at most three words in the "name" field, but as far as I know there is nothing like word_count.
Does anyone know how to handle this, maybe even in a different way?
GET myindex/myobject/_search
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"word_count": {
"name": {
"lte": 3
}
}
}
]
}
},
"query": {
"match_all" : { }
}
}
}
}
You can use the token_count data type in order to index the number of tokens in a given field and then search on that field.
# 1. create the index/mapping with a token_count field
PUT myindex
{
"mappings": {
"myobject": {
"properties": {
"name": {
"type": "string",
"fields": {
"word_count": {
"type": "token_count",
"analyzer": "standard"
}
}
}
}
}
}
}
# 2. index some documents
PUT myindex/myobject/1
{
"name": "The quick brown fox"
}
PUT myindex/myobject/2
{
"name": "brown fox"
}
# 3. the following query will only return document 2
POST myindex/_search
{
"query": {
"range": {
"name.word_count": { 
"lt": 3
}
}
}
}
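The effect of the range filter on name.word_count can be approximated outside Elasticsearch. The Python sketch below counts runs of word characters, which is only a rough stand-in for the standard analyzer (real tokenization can differ, for example around punctuation). The function name approx_token_count is hypothetical.

```python
import re


def approx_token_count(text):
    """Rough stand-in for a token count: number of runs of word characters."""
    return len(re.findall(r"\w+", text))


# The two documents indexed above, keyed by id.
docs = {1: "The quick brown fox", 2: "brown fox"}

# Keep only ids whose name has fewer than three tokens, like "lt": 3.
matching = [doc_id for doc_id, name in docs.items() if approx_token_count(name) < 3]
print(matching)  # [2]
```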

How to query for two fields in one and the same tuple in an array in ElasticSearch?

Let's say there are some documents in my index which look like this:
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"2"
},
{
"name":"boo",
"value":"2"
}
]
},
{
"category":"2020",
"properties":[
{
"name":"foo",
"value":"8"
},
{
"name":"boo",
"value":"2"
}
]
}
I'd like to query the index in a way to return only those documents that match "foo":"2" but not "boo":"2".
I tried to write a query that matches both properties.name and properties.value, but then I'm getting false positives. I need a way to tell ElasticSearch that name and value have to be part of the same properties tuple.
How can I do that?
You need to map properties as a nested type. So your mapping would look similar to this:
{
"your_type": {
"properties": {
"category": {
"type": "string"
},
"properties": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"value": {
"type": "string"
}
}
}
}
}
}
Then, your query to match documents having "foo=2" in the same tuple but not "boo=2" in the same tuple would need to use the nested query accordingly, like the one below.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "foo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "properties",
"query": {
"bool": {
"must": [
{
"match": {
"properties.name": "boo"
}
},
{
"match": {
"properties.value": "2"
}
}
]
}
}
}
}
]
}
}
}
@Val's answer is as good as it gets. One thing I would add, though, since it makes the difference between this type of query and others that might benefit from the "opposite" of nested's behavior.
In Elasticsearch, the default type used to auto-create a field such as "properties":[{"name":"foo","value":"2"},{"name":"boo","value":"2"}] is object. The drawback of object is that it doesn't associate one sub-field's value with another sub-field's value, meaning foo is not necessarily associated with 2. name is just an array of values and value is another array of values, with no association between the two.
If one needs the above association to work, then nested is a must.
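The difference can be modeled in a few lines of Python using the two documents from the question. This is only a sketch of the matching semantics, not of how Elasticsearch stores the data; object_match and nested_match are hypothetical names.

```python
docs = [
    {"category": "2020", "properties": [{"name": "foo", "value": "2"}, {"name": "boo", "value": "2"}]},
    {"category": "2020", "properties": [{"name": "foo", "value": "8"}, {"name": "boo", "value": "2"}]},
]


def object_match(doc, name, value):
    """Object-style matching: names and values are checked as independent arrays."""
    names = [p["name"] for p in doc["properties"]]
    values = [p["value"] for p in doc["properties"]]
    return name in names and value in values


def nested_match(doc, name, value):
    """Nested-style matching: both conditions must hold for the same tuple."""
    return any(p["name"] == name and p["value"] == value for p in doc["properties"])


print([object_match(d, "foo", "2") for d in docs])  # [True, True]
print([nested_match(d, "foo", "2") for d in docs])  # [True, False]
```

The second document is the false positive: with object-style matching, "foo" comes from one tuple and "2" from another.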
But I have encountered situations where both behaviors were needed. If you need both, you can set include_in_parent: true in the mapping so that you can take advantage of both. One of the situations that I have seen is here.
"properties": {
"type": "nested",
"include_in_parent": true,
"properties": {
"name": {
"type": "string"
...

Resources