elasticsearch match on every element in the nested collection - elasticsearch

I am trying to write an Elasticsearch query that returns documents where every element of a nested collection matches, not just one.
For example, I have a Driver object with a list of cars, and each car has a color attribute.
Driver index:
curl --location --request PUT 'localhost:9200/driver' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"properties": {
"driver": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"car": {
"type": "nested",
"properties": {
"color": {
"type": "text"
}
}
}
}
}
}
}
}'
And the following data: driver John with a red and a green car, and driver Bob with two green cars.
curl --location --request PUT 'localhost:9200/driver/_doc/1' \
--header 'Content-Type: application/json' \
--data-raw '{
"driver": {
"name": "John",
"car": [
{
"color": "red"
},
{
"color": "green"
}
]
}
}'
curl --location --request PUT 'localhost:9200/driver/_doc/2' \
--header 'Content-Type: application/json' \
--data-raw '{
"driver": {
"name": "Bob",
"car": [
{
"color": "green"
},
{
"color": "green"
}
]
}
}'
I want to find the driver that has ONLY green cars (i.e. Bob).
I tried the following query, but it returns a driver that has at least one car that matches color:
curl --location --request GET 'localhost:9200/driver/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"nested": {
"path": "driver",
"query": {
"nested": {
"path": "driver.car",
"query": {
"bool": {
"must": [
{
"match": {
"driver.car.color": "green"
}
}
]
}
}
}
}
}
}
}'
This query returns every driver that has at least one green car. What is the fix? Thank you.

You can add a must_not query explicitly ruling out red:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "driver.car",
"query": {
"match": {
"driver.car.color": "green"
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "driver.car",
"query": {
"match": {
"driver.car.color": "red"
}
}
}
}
]
}
}
}
BTW you don't need the double nesting -- driver.car will work exactly the same as driver -> driver.car.
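If the set of "other" colors is not known in advance, the same idea can be written as a double negation: at least one car is green, and no car is not green. Below is a minimal sketch that builds such a request body as a Python dict. The helper names every_nested_matches and simulate are made up for illustration, and simulate is only a plain-Python restatement of the logic for the two sample drivers, not something Elasticsearch runs.

```python
import json


def every_nested_matches(path, field, value):
    """Build a bool query: some nested element matches `value`,
    and no nested element fails to match it."""
    match = {"match": {field: value}}
    return {
        "query": {
            "bool": {
                "must": [{"nested": {"path": path, "query": match}}],
                "must_not": [
                    {
                        "nested": {
                            "path": path,
                            "query": {"bool": {"must_not": [match]}},
                        }
                    }
                ],
            }
        }
    }


def simulate(doc, value):
    """Plain-Python restatement of the same condition for the sample data."""
    colors = [car["color"] for car in doc["driver"]["car"]]
    has_match = any(c == value for c in colors)
    has_other = any(c != value for c in colors)
    return has_match and not has_other


john = {"driver": {"name": "John", "car": [{"color": "red"}, {"color": "green"}]}}
bob = {"driver": {"name": "Bob", "car": [{"color": "green"}, {"color": "green"}]}}

body = every_nested_matches("driver.car", "driver.car.color", "green")
print(json.dumps(body, indent=2))
print("John:", simulate(john, "green"), "Bob:", simulate(bob, "green"))
```

The printed body can be sent with the same curl command as in the question. The must clause is kept so that drivers without any cars are not returned.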

Related

ElasticSearch multiple AND/OR query

I have a schema like below -
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004",
"readBy": [
{
"userId": "scha3055"
},
{
"userId": "abcd1234"
}
]
}
I'm trying to search on a combination of "errorCode", "storeId" and "businessFunction" with a date range, like below -
{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}
But when I add another condition with "businessFunction" the query does not work.
{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"errorDescription": [
"Description e020",
"71103"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}
What am I missing in the query? When I add the third "terms" condition, the query does not work. Please suggest a fix or let me know of any alternative way.
In your query you are searching for "Description e020", but the document in your example stores "Description e015".
Short answer, I hope that's right for you:
"Description e015" will have been indexed as the two terms ["description","e015"].
Use match_phrase instead of terms:
...
{
"match_phrase": {
"errorDescription": "Description e015"
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
....
Without knowing your mapping, I think that your errorDescription field is analyzed.
Other option (not recommended):
If your field is analyzed and you require an exact match, search on errorDescription.keyword:
{
"terms": {
"errorDescription.keyword": [
"Description e015"
]
}
}
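To make the difference concrete, here is a rough sketch of why terms finds nothing while match_phrase does. The simple_analyze function is only a crude stand-in for an analyzed text field (lowercase, split on non-alphanumeric characters) and is an assumption for illustration; the real analyzer does more than this.

```python
import re


def simple_analyze(text):
    """Crude stand-in for a text analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def terms_hits(stored_text, values):
    """terms looks each value up as-is; the query values are not analyzed."""
    indexed = set(simple_analyze(stored_text))
    return any(v in indexed for v in values)


def match_phrase_hits(stored_text, phrase):
    """match_phrase analyzes the phrase and needs its tokens in order."""
    doc_tokens = simple_analyze(stored_text)
    tokens = simple_analyze(phrase)
    n = len(tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == tokens for i in range(len(doc_tokens) - n + 1))


stored = "Description e015"
print(simple_analyze(stored))
print(terms_hits(stored, ["Description e015"]))
print(match_phrase_hits(stored, "Description e015"))
print(match_phrase_hits(stored, "Description e020"))
```

The whole string "Description e015" is never stored as a single term in the analyzed field, so a terms lookup for it cannot hit, while the phrase query is analyzed the same way as the document.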
UPDATE
Long answer:
As I mentioned previously, your field value was probably analyzed, so "PriceFeedIntegration2" was converted to "pricefeedintegration2".
There are 2 options:
Search on the keyword sub-field of your field, i.e. businessFunction.keyword
Change your field mapping to not analyzed. Then you can get the results you expect using terms.
Option 1
This is the easy way. If you never run full-text searches on that field, it is better not to rely on the default mapping. If that does not matter to you, use this option; it is the simplest.
Check your businessFunction.keyword field (created by default if you don't specify a mapping).
Indexing data without mapping on my000001 index
curl -X "POST" "http://localhost:9200/my000001/_doc" \
-H "Content-type: application/json" \
-d $'
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004"
}'
Check
curl -X "GET" "localhost:9200/my000001/_analyze" \
-H "Content-type: application/json" \
-d $'{
"field": "businessFunction.keyword",
"text": "PriceFeedIntegration"
}'
Result:
{
"tokens": [
{
"token": "PriceFeedIntegration",
"start_offset": 0,
"end_offset": 20,
"type": "word",
"position": 0
}
]
}
Get the results using businessFunction.keyword
curl -X "GET" "localhost:9200/my000001/_search" \
-H "Content-type: application/json" \
-d $'{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"businessFunction.keyword": [
"PriceFeedIntegration2",
"PriceFeedIntegration"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}' | jq
Why isn't it recommended as the default option?
"The default dynamic string mappings will index string fields both as
text and keyword. This is wasteful if you only need one of them."
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html
Option 2
Run on my000001 index
curl -X "GET" "localhost:9200/my000001/_analyze" \
-H "Content-type: application/json" \
-d $'{
"field": "businessFunction",
"text": "PriceFeedIntegration"
}'
You can see that your field value was analyzed (tokenized, lowercased, and possibly modified further, depending on the analyzer and the value provided).
Results:
{
"tokens": [
{
"token": "pricefeedintegration",
"start_offset": 0,
"end_offset": 20,
"type": "<ALPHANUM>",
"position": 0
}
]
}
That is the reason why your search doesn't return results.
"PriceFeedIntegration" doesn't match with "pricefeedintegration"
"The problem isn’t with the term query; it is with the way the data
has been indexed."
Your businessFunction field value was analyzed.
If you need to find (search/filter) by exact values, you may need to change your "businessFunction" field mapping to not analyzed (in current versions, type "keyword").
Changing the mapping requires deleting your index and creating it again with the required mapping.
If you try to create an index that already exists, you will get a "resource_already_exists_exception" error.
Here is the background that you need to know in order to solve your problem:
https://www.elastic.co/guide/en/elasticsearch/guide/master/_finding_exact_values.html#_finding_exact_values
Create a Mapping on a new my000005 index
curl -X "PUT" "localhost:9200/my000005" \
-H "Content-type: application/json" \
-d $'{
"mappings" : {
"properties" : {
"businessFunction" : {
"type" : "keyword"
},
"errorDescription" : {
"type" : "text"
},
"errorCode" : {
"type" : "keyword"
},
"createdDate" : {
"type" : "date"
},
"storeId": {
"type" : "keyword"
}
}
}
}'
Indexing data
curl -X "POST" "http://localhost:9200/my000005/_doc" \
-H "Content-type: application/json" \
-d $'
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004"
}'
Get the results that you expect using terms on businessFunction
curl -X "GET" "localhost:9200/my000005/_search" \
-H "Content-type: application/json" \
-d $'{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"businessFunction": [
"PriceFeedIntegration2",
"PriceFeedIntegration"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}' | jq
This answer is based on what I think is your mapping and your needs.
In the future share your mapping and your ES version, in order to get a better answer from the community.
curl -X "GET" "localhost:9200/yourindex/_mappings"
Please read this https://www.elastic.co/guide/en/elasticsearch/guide/master/_finding_exact_values.html#_finding_exact_values
and this https://www.elastic.co/blog/strings-are-dead-long-live-strings
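Once you have the output of the _mappings call, choosing between field and field.keyword can be automated. This is a small sketch, assuming the "properties" part of the mapping has already been loaded into a dict; exact_field is a made-up helper name, and the dynamic_mapping dict mirrors what the default dynamic string mapping looks like.

```python
def exact_field(properties, name):
    """Return the field name to use in a terms query, or None if there is
    no keyword field or keyword sub-field to search on."""
    field = properties.get(name, {})
    if field.get("type") == "keyword":
        return name
    for sub_name, sub_def in field.get("fields", {}).items():
        if sub_def.get("type") == "keyword":
            return f"{name}.{sub_name}"
    return None


# Shape produced by the default dynamic string mapping (Option 1).
dynamic_mapping = {
    "businessFunction": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    }
}

# Explicit mapping as in Option 2.
explicit_mapping = {
    "businessFunction": {"type": "keyword"},
    "errorDescription": {"type": "text"},
}

print(exact_field(dynamic_mapping, "businessFunction"))
print(exact_field(explicit_mapping, "businessFunction"))
print(exact_field(explicit_mapping, "errorDescription"))
```

A None result means the field is only available as analyzed text, so a match or match_phrase query is the better fit there.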

Elassandra: UDT List Match Query- No Results

I am using Elassandra. In Cassandra, I have a UDT:
CREATE TYPE test.entity_attributes (
attribute_key text,
attribute_value text
);
It is used in table
CREATE TABLE test.attributes_test (
id text PRIMARY KEY,
attr list<frozen<entity_attributes>>
)
I mapped the attributes_test using:
curl --location --request PUT 'localhost:9200/attr_index' \
--header 'Content-Type: application/json' \
--data-raw '{
"settings": { "keyspace": "test" },
"mappings": {
"attributes_test" : {
"discover":".*"
}
}
}'
(copied from postman)
It returns the following as mapping:
{
"attr_index": {
"aliases": {},
"mappings": {
"attributes_test": {
"properties": {
"attr": {
"type": "nested",
"cql_udt_name": "entity_attributes",
"properties": {
"attribute_key": {
"type": "keyword",
"cql_collection": "singleton"
},
"attribute_value": {
"type": "keyword",
"cql_collection": "singleton"
}
}
},
"id": {
"type": "keyword",
"cql_collection": "singleton",
"cql_partition_key": true,
"cql_primary_key_order": 0
}
}
}
},
"settings": {
"index": {
"keyspace": "test",
"number_of_shards": "1",
"provided_name": "attr_index",
"creation_date": "1615291749532",
"number_of_replicas": "0",
"uuid": "Oua1ACLbRvCATC-kcGPoQg",
"version": {
"created": "6020399"
}
}
}
}
}
This is what I have in the table:
id | attr
----+----------------------------------------------------------------------------------------------
2 | [{attribute_key: 'abc', attribute_value: '2'}, {attribute_key: 'def', attribute_value: '1'}]
1 | [{attribute_key: 'abc', attribute_value: '1'}]
The problem now is, when I run the following query, it does not return any result.
curl --location --request POST 'localhost:9200/attr_index/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"match": {
"attr.attribute_key": "abc"
}
}
}'
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html - describes the way to search in nested objects.
How can I search in the list of nested objects?
It was a mistake in my query. The correct query would be:
{
"query": {
"nested": {
"path": "attr",
"query": {
"match": {
"attr.attribute_key": "abc"
}
}
}
}
}
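Since the mapping already says which fields are nested, the wrapping can be derived from it instead of being written by hand. A minimal sketch, assuming the "properties" section of the mapping shown above is available as a dict; wrap_nested is a hypothetical helper, not part of any client library.

```python
def wrap_nested(properties, field, query):
    """Wrap `query` in one nested query per nested ancestor of `field`."""
    parts = field.split(".")
    nested_paths = []
    props = properties
    for i, part in enumerate(parts[:-1]):
        node = props.get(part, {})
        if node.get("type") == "nested":
            nested_paths.append(".".join(parts[: i + 1]))
        props = node.get("properties", {})
    # Wrap from the innermost nested path outwards.
    for path in reversed(nested_paths):
        query = {"nested": {"path": path, "query": query}}
    return query


properties = {
    "attr": {
        "type": "nested",
        "properties": {
            "attribute_key": {"type": "keyword"},
            "attribute_value": {"type": "keyword"},
        },
    },
    "id": {"type": "keyword"},
}

match = {"match": {"attr.attribute_key": "abc"}}
print({"query": wrap_nested(properties, "attr.attribute_key", match)})
print({"query": wrap_nested(properties, "id", {"match": {"id": "1"}})})
```

For attr.attribute_key this produces the same nested query as the corrected one above, while a top-level field such as id is left unwrapped.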

Elastic Search: return matching parents with matched/unmatched childs

I am using elastic search 7.8.1 and have used parent-child method to index the documents. My requirement is to search both parent and child documents, but return response in a format that parent document is the main document and child document is a field within the parent document. i.e
1) If the child matches, I wish to return parent & child in a document. I am able to achieve this using has_child and inner_hits.
2) If the parent matches the query, I wish to return parent and child in a document even if the child does not match. (Not sure how to achieve this)
# This is the parent child relationship mapping in index
curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
"mappings": {
"properties": {
"my_id": {
"type": "keyword"
},
"my_join_field": {
"type": "join",
"relations": {
"question": "answer"
}
}
}
}
}
'
Below is the query I am trying to use, but it does not return the child when the parent matches:
curl -X POST "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"should": [
{
"has_child": {
"type": "answer",
"query": {
"match": {
"my_id": "4"
}
},
"inner_hits": {
"size": 1
}
}
},
{
"match": {
"my_id": "1"
}
}
]
}
}
}'
# Parent docs
curl -X PUT "localhost:9200/my-index-000001/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "1",
"text": "This is a question",
"my_join_field": "question"
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "2",
"text": "This is another question",
"my_join_field": "question"
}
'
# Child docs
curl -X PUT "localhost:9200/my-index-000001/_doc/3?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "3",
"text": "This is an answer",
"my_join_field": {
"name": "answer",
"parent": "1"
}
}
'
curl -X PUT "localhost:9200/my-index-000001/_doc/4?routing=1&refresh&pretty" -H 'Content-Type: application/json' -d'
{
"my_id": "4",
"text": "This is another answer",
"my_join_field": {
"name": "answer",
"parent": "1"
}
}
'
How can I search both parent and child, but return the child as a field in the parent doc? Thanks in advance.

How to index some fields of an object

I have to log a dynamic object, but I'm only interested in indexing some of its fields (not all). When I configure this behaviour, I can't search on those fields.
Here is an example of what I'm doing with Elastic 6.x:
curl --request PUT 'http://localhost:9200/manuel-prova?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"log": {
"properties": {
"hello": {
"type": "object",
"enabled": false,
"properties": {
"my-api-key": {
"type": "text"
}
}
},
"check": {
"type": "boolean"
}
}
}
}
}'
Then I insert the data:
curl --request POST 'http://localhost:9200/manuel-prova/log?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"hello": {
"foo": "bar",
"my-api-key": "QWERTY"
},
"check": true
}'
Finally, I tried to query:
curl --request POST 'http://localhost:9200/manuel-prova/_search?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"query": {
"bool": {
"must": [
{ "exists": { "field": "hello.my-api-key" } }
]
}
}
}'
This query doesn't work.
If I change to { "exists": { "field": "check" } } for example, it works.
Do you have any suggestion?
This is because your hello object is defined with enabled: false, which makes ES ignore the content of the field altogether, and hence, it is not searchable.
In order to fix that, remove enabled: false from the hello object and add dynamic: false to the mapping type instead, so that fields you have not mapped explicitly are still not indexed. With the mapping below it will work:
curl --request PUT 'http://localhost:9200/manuel-prova?pretty' \
--header 'Content-Type: application/json' \
--data-raw '{
"mappings": {
"log": {
"dynamic": false,
"properties": {
"hello": {
"type": "object",
"properties": {
"my-api-key": {
"type": "text"
}
}
},
"check": {
"type": "boolean"
}
}
}
}
}'
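As a rough way to see the difference between the two settings, the sketch below walks a mapping and a document and lists which field paths would end up searchable. It is a simplified model written for this example (searchable_fields is a made-up name), assuming that enabled: false skips a whole object and that dynamic: false is inherited by inner objects; it is not how Elasticsearch is implemented.

```python
def searchable_fields(mapping, doc, dynamic=True, prefix=""):
    """List the leaf field paths of `doc` that this simplified model would index."""
    if mapping.get("enabled", True) is False:
        return []  # the whole object is kept in _source only
    dynamic = mapping.get("dynamic", dynamic)  # inherited unless overridden
    props = mapping.get("properties", {})
    found = []
    for key, value in doc.items():
        if key not in props and dynamic is False:
            continue  # unmapped field under dynamic: false is not indexed
        child = props.get(key, {})
        path = prefix + key
        if isinstance(value, dict):
            found += searchable_fields(child, value, dynamic, path + ".")
        else:
            found.append(path)
    return sorted(found)


doc = {"hello": {"foo": "bar", "my-api-key": "QWERTY"}, "check": True}

original = {
    "properties": {
        "hello": {
            "type": "object",
            "enabled": False,
            "properties": {"my-api-key": {"type": "text"}},
        },
        "check": {"type": "boolean"},
    }
}

fixed = {
    "dynamic": False,
    "properties": {
        "hello": {
            "type": "object",
            "properties": {"my-api-key": {"type": "text"}},
        },
        "check": {"type": "boolean"},
    },
}

print(searchable_fields(original, doc))
print(searchable_fields(fixed, doc))
```

With the original mapping only check is listed, which matches what the exists queries in the question show. With the fixed mapping hello.my-api-key is listed as well, while hello.foo stays out of the index.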

Understanding Elasticsearch aggregations

My scenario is the following:
I have people who can have regular or one-time income. I would like to sum the regular income of all people who are not deleted and were born within a date range. The query part works well, but when I start to put together the aggregation part of the Elastic query, I get the wrong figures and can't understand what I am doing wrong.
This is how I've created the mapping for my data type:
curl -X PUT -i http://localhost:9200/people --data '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"person" : {
"properties" : {
"birthDate" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"company" : {
"type" : "string"
},
"deleted" : {
"type" : "boolean"
},
"income" : {
"type": "nested",
"properties" : {
"income_type" : {
"type" : "string"
},
"value" : {
"type" : "double"
}
}
},
"name" : {
"type" : "string"
}
}
}
}
}'
This is the data:
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/1 --data '{
"deleted":false,
"birthDate":"1980-10-10",
"name":"John Smith",
"company": "IBM",
"income": [{"income_type":"regular","value":55.5}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/2 --data '{
"deleted":true,
"birthDate":"1960-10-10",
"name":"Mary Legend",
"company": "ASUS",
"income": [{"income_type":"one-time","value":10},{"income_type":"regular","value":55}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/3 --data '{
"deleted":false,
"birthDate":"2000-10-10",
"name":"F. King Elastic",
"income": [{"income_type":"one-time","value":1},{"income_type":"regular","value":5}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/4 --data '{
"deleted":false,
"birthDate":"1989-10-10",
"name":"Prison Lesley",
"income": [{"income_type":"regular","value":120.7},{"income_type":"one-time","value":99.3}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/5 --data '{
"deleted":false,
"birthDate":"1983-10-10",
"name":"Prison Lesley JR.",
"income": [{"income_type":"one-time","value":99.3}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/6 --data '{
"deleted":true,
"birthDate":"1986-10-10",
"name":"Hono Lulu",
"income": [{"income_type":"regular","value":11.3}]
}'
This is a query which filters for undeleted people who have at least one regular income and were born between the given dates. The query below still works as expected (two persons fulfil the criteria):
curl -X POST -H 'Content-Type: application/json' -i 'http://localhost:9200/people/person/_search?pretty=true' --data '{
"size": 100,
"filter": {
"bool": {
"must": [
{
"match": {
"deleted": false
}
},
{
"range": {
"birthDate": {
"gte": "1980-01-01",
"lte": "1990-12-31"
}
}
},
{
"nested": {
"path": "income",
"query": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
}
]
}
}
}
}
]
}
}
}'
But when I add the aggregation section, everything goes wrong, and I do not understand, why :(
curl -X POST -H 'Content-Type: application/json' -i 'http://localhost:9200/people/person/_search?pretty=true' --data '{
"size": 100,
"filter": {
"bool": {
"must": [
{
"match": {
"deleted": false
}
},
{
"range": {
"birthDate": {
"gte": "1980-01-01",
"lte": "1990-12-31"
}
}
},
{
"nested": {
"path": "income",
"query": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"incomes": {
"nested": {
"path": "income"
},
"aggs": {
"income_type": {
"filter": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
},
{
"match": {
"deleted": false
}
}
]
}
},
"aggs": {
"totalIncome": {
"sum": {
"field": "income.value"
}
}
}
}
}
}
}
}'
The result is this:
...
"aggregations": {
"incomes": {
"doc_count": 9,
"income_type": {
"doc_count": 0,
"totalIncome": {
"value": 0.0
}
}
}
}
}
I was expecting the doc_count to be 2, and the totalIncome should be 176.2 (120.7 + 55.5)
Does anyone have an idea what I am doing wrong?
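Great start! You don't need the filter on the deleted field in your aggregation. Inside the nested aggregation the context is the nested income documents, and deleted lives on the parent document, so that match never hits and income_type ends up with a doc_count of 0. The filtering on deleted belongs in the search part of the request. Note also that the doc_count of 9 for incomes covers the incomes of all six people, which suggests that the top-level filter is acting as a post filter and is not restricting the aggregation; if that is the case, move those conditions into the query part of the request so that the aggregation only sees the matching people. Try this: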
"aggs": {
"incomes": {
"nested": {
"path": "income"
},
"aggs": {
"income_type": {
"filter": {
"match": {
"income.income_type": "regular"
}
},
"aggs": {
"totalIncome": {
"sum": {
"field": "income.value"
}
}
}
}
}
}
}
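As a sanity check on the expected figures, the intended calculation can be reproduced in plain Python from the six sample documents. This is only a sketch of what the query plus aggregation is meant to compute, with a made-up function name regular_income_total; it does not talk to Elasticsearch.

```python
from datetime import date

people = [
    {"deleted": False, "birthDate": "1980-10-10",
     "income": [{"income_type": "regular", "value": 55.5}]},
    {"deleted": True, "birthDate": "1960-10-10",
     "income": [{"income_type": "one-time", "value": 10},
                {"income_type": "regular", "value": 55}]},
    {"deleted": False, "birthDate": "2000-10-10",
     "income": [{"income_type": "one-time", "value": 1},
                {"income_type": "regular", "value": 5}]},
    {"deleted": False, "birthDate": "1989-10-10",
     "income": [{"income_type": "regular", "value": 120.7},
                {"income_type": "one-time", "value": 99.3}]},
    {"deleted": False, "birthDate": "1983-10-10",
     "income": [{"income_type": "one-time", "value": 99.3}]},
    {"deleted": True, "birthDate": "1986-10-10",
     "income": [{"income_type": "regular", "value": 11.3}]},
]


def regular_income_total(people, born_from, born_to):
    """Count matching people and sum their regular income values."""
    matched = 0
    total = 0.0
    for person in people:
        if person["deleted"]:
            continue
        born = date.fromisoformat(person["birthDate"])
        if not (born_from <= born <= born_to):
            continue
        regular = [i["value"] for i in person["income"]
                   if i["income_type"] == "regular"]
        if regular:
            matched += 1
            total += sum(regular)
    return matched, total


matched, total = regular_income_total(people, date(1980, 1, 1), date(1990, 12, 31))
print(matched, round(total, 2))
```

If the aggregation returns a larger sum than this, the aggregation is most likely running over people that the filter was supposed to exclude.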
