ElasticSearch multiple AND/OR query - elasticsearch

I have a schema like below -
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004",
"readBy": [
{
"userId": "scha3055"
},
{
"userId": "abcd1234"
}
]
}
I'm trying to search on a combination of "errorCode", "storeId", and "businessFunction" with a date range, like below -
{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}
But when I add another condition with "businessFunction" the query does not work.
{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"errorDescription": [
"Description e020",
"71103"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}
What am I missing in the query? When I add the third "terms" condition, the query does not work. Please suggest a fix or an alternative approach.

In your query you are searching for "Description e020", but the document you stored contains "Description e015".
Short answer (I hope it fits your case):
"Description e015" will have been indexed as the two terms ["description", "e015"], so a terms query for the whole string cannot match.
Use match_phrase instead of terms:
...
{
"match_phrase": {
"errorDescription": "Description e015"
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
....
Without knowing your mapping, I think that your errorDescription field is analyzed.
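To make the tokenization point concrete, here is a minimal Python sketch. It assumes the field uses something like the default standard analyzer, which is approximated very roughly here by splitting on non-alphanumeric characters and lowercasing. approx_standard_analyze and terms_query_matches are hypothetical helper names for illustration, not Elasticsearch APIs.

```python
import re

def approx_standard_analyze(text):
    # Rough stand-in for the standard analyzer (assumption):
    # split on non-alphanumeric characters and lowercase each token.
    return [token.lower() for token in re.findall(r"[A-Za-z0-9]+", text)]

def terms_query_matches(indexed_terms, wanted_values):
    # A terms query is not analyzed: each wanted value is compared
    # as-is against the terms stored in the index.
    return any(value in indexed_terms for value in wanted_values)

indexed_terms = approx_standard_analyze("Description e015")
print(indexed_terms)
print(terms_query_matches(indexed_terms, ["Description e015"]))
print(terms_query_matches(indexed_terms, ["e015"]))
```

The whole string never appears as a single term in the index, which is why the terms clause on errorDescription finds nothing while the individual lowercase tokens would match.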
Another option (not recommended):
If your field is analyzed and you need an exact match, search on errorDescription.keyword
{
"terms": {
"errorDescription.keyword": [
"Description e015"
]
}
}
UPDATE
Long answer:
As I mentioned previously, your field value was probably analyzed and converted from "PriceFeedIntegration2" to "pricefeedintegration2".
You have 2 options:
Search on the keyword sub-field, i.e. businessFunction.keyword
Change your field mapping so the field is not analyzed. Then you get the results you expect using terms on the field itself.
Option 1
This is the easy way. If you never run full-text searches on that field, it is better not to rely on the default mapping. If that does not matter to you, use this option; it is the simplest.
Check your businessFunction.keyword field (created by default if you don't specify a mapping)
Indexing data without a mapping into the my000001 index
curl -X "POST" "http://localhost:9200/my000001/_doc" \
-H "Content-type: application/json" \
-d $'
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004"
}'
Check
curl -X "GET" "localhost:9200/my000001/_analyze" \
-H "Content-type: application/json" \
-d $'{
"field": "businessFunction.keyword",
"text": "PriceFeedIntegration"
}'
Result:
{
"tokens": [
{
"token": "PriceFeedIntegration",
"start_offset": 0,
"end_offset": 20,
"type": "word",
"position": 0
}
]
}
Get the results using businessFunction.keyword
curl -X "GET" "localhost:9200/my000001/_search" \
-H "Content-type: application/json" \
-d $'{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"businessFunction.keyword": [
"PriceFeedIntegration2",
"PriceFeedIntegration"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}' | jq
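If you build this request from application code, the same body can be assembled programmatically. The following is only a sketch in Python: build_error_query is a hypothetical helper name, and it assumes the default dynamic mapping, so that the businessFunction.keyword sub-field exists.

```python
import json

def build_error_query(error_codes, store_ids, business_functions, gte, lte):
    # Every clause in "must" has to match (AND); the values inside each
    # "terms" clause are alternatives (OR).
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"errorCode": error_codes}},
                    {"terms": {"storeId": store_ids}},
                    # Exact, case-sensitive match on the keyword sub-field.
                    {"terms": {"businessFunction.keyword": business_functions}},
                    {"range": {"createdDate": {"gte": gte, "lte": lte}}},
                ]
            }
        }
    }

body = build_error_query(
    ["e015", "e020", "e022"],
    ["71102", "71103"],
    ["PriceFeedIntegration2", "PriceFeedIntegration"],
    "2021-02-16T09:17:04.000",
    "2021-02-22T00:00:00.005",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the -d payload of the curl command above.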
Why isn't this recommended as the default option?
"The default dynamic string mappings will index string fields both as
text and keyword. This is wasteful if you only need one of them."
https://www.elastic.co/guide/en/elasticsearch/reference/current/tune-for-disk-usage.html
Option 2
Run on my000001 index
curl -X "GET" "localhost:9200/my000001/_analyze" \
-H "Content-type: application/json" \
-d $'{
"field": "businessFunction",
"text": "PriceFeedIntegration"
}'
You can see that your field value was analyzed (tokenized, lowercased, and otherwise modified, depending on the analyzer and the value provided).
Results:
{
"tokens": [
{
"token": "pricefeedintegration",
"start_offset": 0,
"end_offset": 20,
"type": "<ALPHANUM>",
"position": 0
}
]
}
That is the reason why your search doesn't return results.
"PriceFeedIntegration" doesn't match "pricefeedintegration".
"The problem isn’t with the term query; it is with the way the data
has been indexed."
Your businessFunction field value was analyzed.
If you need to find (search/filter) by exact values, you probably need to change the mapping of your "businessFunction" field to an exact-value type (keyword in current versions, not_analyzed in older ones).
Changing the mapping of an existing field requires deleting your index and creating it again with the required mapping.
If you try to create an index that already exists, you will get a "resource_already_exists_exception" error.
Here is the background that you need to know in order to solve your problem:
https://www.elastic.co/guide/en/elasticsearch/guide/master/_finding_exact_values.html#_finding_exact_values
Create a Mapping on a new my000005 index
curl -X "PUT" "localhost:9200/my000005" \
-H "Content-type: application/json" \
-d $'{
"mappings" : {
"properties" : {
"businessFunction" : {
"type" : "keyword"
},
"errorDescription" : {
"type" : "text"
},
"errorCode" : {
"type" : "keyword"
},
"createdDate" : {
"type" : "date"
},
"storeId": {
"type" : "keyword"
}
}
}
}'
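If the old index already holds documents, one alternative to posting them again by hand is the _reindex API, which copies documents from one index into another. This goes beyond the steps shown here and is only a sketch: it assumes the source index is my000001 and the destination is the my000005 index created above, and it only builds the request body rather than sending it.

```python
import json

# Body for POST /_reindex (assumed index names, taken from this answer).
reindex_body = {
    "source": {"index": "my000001"},
    "dest": {"index": "my000005"},
}

# The destination index should already exist with the explicit mapping;
# otherwise dynamic mapping would guess the field types again.
print("POST /_reindex")
print(json.dumps(reindex_body, indent=2))
```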
Indexing data
curl -X "POST" "http://localhost:9200/my000005/_doc" \
-H "Content-type: application/json" \
-d $'
{
"errorCode": "e015",
"errorDescription": "Description e015",
"storeId": "71102",
"businessFunction": "PriceFeedIntegration",
"createdDate": "2021-02-20T09:17:04.004"
}'
Get the results that you expect, using terms on businessFunction
curl -X "GET" "localhost:9200/my000005/_search" \
-H "Content-type: application/json" \
-d $'{
"query": {
"bool": {
"must": [
{
"terms": {
"errorCode": [
"e015",
"e020",
"e022"
]
}
},
{
"terms": {
"storeId": [
"71102",
"71103"
]
}
},
{
"terms": {
"businessFunction": [
"PriceFeedIntegration2",
"PriceFeedIntegration"
]
}
},
{
"range": {
"createdDate": {
"gte": "2021-02-16T09:17:04.000",
"lte": "2021-02-22T00:00:00.005"
}
}
}
]
}
}
}' | jq
This answer is based on what I think is your mapping and your needs.
In the future, share your mapping and your ES version in order to get a better answer from the community.
curl -X "GET" "localhost:9200/yourindex/_mappings"
Please read this https://www.elastic.co/guide/en/elasticsearch/guide/master/_finding_exact_values.html#_finding_exact_values
and this https://www.elastic.co/blog/strings-are-dead-long-live-strings

Related

Understanding Elasticsearch aggregations

My scenario is the following:
I have people who can have regular or one-time income. I would like to sum the regular income of all people who are not deleted and were born within a date range. The query part works well, but when I put together the aggregation part of the Elasticsearch query, I get the wrong figures and can't understand what I am doing wrong.
This is how I created the mapping for my data type:
curl -X PUT -i http://localhost:9200/people --data '{
"settings" : {
"number_of_shards" : 1
},
"mappings" : {
"person" : {
"properties" : {
"birthDate" : {
"type" : "date",
"format" : "strict_date_optional_time||epoch_millis"
},
"company" : {
"type" : "string"
},
"deleted" : {
"type" : "boolean"
},
"income" : {
"type": "nested",
"properties" : {
"income_type" : {
"type" : "string"
},
"value" : {
"type" : "double"
}
}
},
"name" : {
"type" : "string"
}
}
}
}
}
}'
This is the data:
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/1 --data '{
"deleted":false,
"birthDate":"1980-10-10",
"name":"John Smith",
"company": "IBM",
"income": [{"income_type":"regular","value":55.5}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/2 --data '{
"deleted":true,
"birthDate":"1960-10-10",
"name":"Mary Legend",
"company": "ASUS",
"income": [{"income_type":"one-time","value":10},{"income_type":"regular","value":55}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/3 --data '{
"deleted":false,
"birthDate":"2000-10-10",
"name":"F. King Elastic",
"income": [{"income_type":"one-time","value":1},{"income_type":"regular","value":5}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/4 --data '{
"deleted":false,
"birthDate":"1989-10-10",
"name":"Prison Lesley",
"income": [{"income_type":"regular","value":120.7},{"income_type":"one-time","value":99.3}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/5 --data '{
"deleted":false,
"birthDate":"1983-10-10",
"name":"Prison Lesley JR.",
"income": [{"income_type":"one-time","value":99.3}]
}'
curl -X PUT -H 'Content-Type: application/json' -i http://localhost:9200/people/person/6 --data '{
"deleted":true,
"birthDate":"1986-10-10",
"name":"Hono Lulu",
"income": [{"income_type":"regular","value":11.3}]
}'
This is a query that filters for undeleted people who have at least one regular income and were born between the given dates. The query below still works as expected (two persons fulfil the criteria):
curl -X POST -H 'Content-Type: application/json' -i 'http://localhost:9200/people/person/_search?pretty=true' --data '{
"size": 100,
"filter": {
"bool": {
"must": [
{
"match": {
"deleted": false
}
},
{
"range": {
"birthDate": {
"gte": "1980-01-01",
"lte": "1990-12-31"
}
}
},
{
"nested": {
"path": "income",
"query": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
}
]
}
}
}
}
]
}
}
}'
But when I add the aggregation section, everything goes wrong, and I do not understand why :(
curl -X POST -H 'Content-Type: application/json' -i 'http://localhost:9200/people/person/_search?pretty=true' --data '{
"size": 100,
"filter": {
"bool": {
"must": [
{
"match": {
"deleted": false
}
},
{
"range": {
"birthDate": {
"gte": "1980-01-01",
"lte": "1990-12-31"
}
}
},
{
"nested": {
"path": "income",
"query": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
}
]
}
}
}
}
]
}
},
"aggs": {
"incomes": {
"nested": {
"path": "income"
},
"aggs": {
"income_type": {
"filter": {
"bool": {
"must": [
{
"match": {
"income.income_type": "regular"
}
},
{
"match": {
"deleted": false
}
}
]
}
},
"aggs": {
"totalIncome": {
"sum": {
"field": "income.value"
}
}
}
}
}
}
}
}'
The result is this:
...
"aggregations": {
"incomes": {
"doc_count": 9,
"income_type": {
"doc_count": 0,
"totalIncome": {
"value": 0.0
}
}
}
}
}
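I was expecting the doc_count to be 2 and the totalIncome to be 176.2 (120.7 + 55.5).
Does anyone have an idea what I am doing wrong?
Great start! You don't need the filter on the deleted field in your aggregation, since your query already filters out all deleted documents. Also, inside a nested aggregation only the fields of the nested income documents are visible, so a match on the root-level deleted field finds nothing there, which is why the doc_count is 0. Try this: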
"aggs": {
"incomes": {
"nested": {
"path": "income"
},
"aggs": {
"income_type": {
"filter": {
"match": {
"income.income_type": "regular"
}
},
"aggs": {
"totalIncome": {
"sum": {
"field": "income.value"
}
}
}
}
}
}
}
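As a sanity check on the expected figures, here is a small Python sketch that recomputes them from the six sample documents in the question. It is a plain in-memory simulation of the intended logic, not Elasticsearch code; in_query is a hypothetical helper name.

```python
import math
from datetime import date

# The six sample people: deleted flag, birth date, (income_type, value) pairs.
people = [
    {"deleted": False, "birthDate": date(1980, 10, 10), "income": [("regular", 55.5)]},
    {"deleted": True, "birthDate": date(1960, 10, 10), "income": [("one-time", 10), ("regular", 55)]},
    {"deleted": False, "birthDate": date(2000, 10, 10), "income": [("one-time", 1), ("regular", 5)]},
    {"deleted": False, "birthDate": date(1989, 10, 10), "income": [("regular", 120.7), ("one-time", 99.3)]},
    {"deleted": False, "birthDate": date(1983, 10, 10), "income": [("one-time", 99.3)]},
    {"deleted": True, "birthDate": date(1986, 10, 10), "income": [("regular", 11.3)]},
]

def in_query(person):
    # Root-level conditions (deleted, birthDate) plus the nested condition
    # "has at least one regular income".
    has_regular = any(kind == "regular" for kind, _ in person["income"])
    in_range = date(1980, 1, 1) <= person["birthDate"] <= date(1990, 12, 31)
    return not person["deleted"] and in_range and has_regular

matching = [p for p in people if in_query(p)]

# The nested part of the aggregation only looks at income entries,
# so the only condition left at that level is the income type.
total_regular = sum(
    value for p in matching for kind, value in p["income"] if kind == "regular"
)
print(len(matching), round(total_regular, 1))
```

The root-level conditions are applied to people and the income_type condition to income entries, which mirrors the split between the query and the nested aggregation.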

Difference of two query results in Elasticsearch

Let's say we have indexes of e-commerce store data, and we want to get the difference between the lists of products that are present in 2 stores.
Information on the index content: sample data stored in each document looks like below:
{
"product_name": "sample 1",
"store_slug": "store 1",
"sales_count": 42,
"date": "2018-04-04"
}
Below are the queries that get me all products present in each of the 2 stores individually:
Data for store 1
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": ["product_name"],
"query": {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "store_slug" : "store_1"}}]}}}}}'
Data for store 2
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d'
{
"_source": ["product_name"],
"query": {
"constant_score" : {
"filter" : {
"bool" : {
"must" : [
{ "term" : { "store_slug" : "store_2"}}]}}}}}'
Is it possible to get the difference of both results with an Elasticsearch query (without using a script or another language)?
E.g. of the above operation: let's say "store 1" is selling products ["product 1", "product 2"] and "store 2" is selling products ["product 1", "product 3"]; the expected output of the difference of products of "store 1" and "store 2" is "product 2".
Why not do it in a single query?
Products that are in store 1 but not in store 2:
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
"_source": [
"product_name"
],
"query": {
"constant_score": {
"filter": {
"bool": {
"filter": [
{
"term": {
"store_slug": "store_1"
}
}
],
"must_not": [
{
"term": {
"store_slug": "store_2"
}
}
]
}
}
}
}
}'
You can easily do the opposite, too.
UPDATE
After reading your updates, I think the best way to solve this is with terms aggregations, first by product and then by store, selecting only the products that have a single store bucket (using a pipeline aggregation):
curl -XGET 'localhost:9200/store/_search?pretty' -H 'Content-Type: application/json' -d '{
"size": 0,
"aggs": {
"products": {
"terms": {
"field": "product_name"
},
"aggs": {
"stores": {
"terms": {
"field": "store_slug"
}
},
"min_bucket_selector": {
"bucket_selector": {
"buckets_path": {
"count": "stores._bucket_count"
},
"script": {
"source": "params.count == 1"
}
}
}
}
}
}
}'
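To see what the bucket selector keeps, here is a minimal in-memory Python sketch using the example from the question (store 1 sells products 1 and 2, store 2 sells products 1 and 3). The document list is made up from that example, and the code only imitates the aggregation logic.

```python
from collections import defaultdict

docs = [
    {"product_name": "product 1", "store_slug": "store_1"},
    {"product_name": "product 2", "store_slug": "store_1"},
    {"product_name": "product 1", "store_slug": "store_2"},
    {"product_name": "product 3", "store_slug": "store_2"},
]

# Outer terms aggregation: one bucket per product;
# inner terms aggregation: the set of stores seen for that product.
stores_by_product = defaultdict(set)
for doc in docs:
    stores_by_product[doc["product_name"]].add(doc["store_slug"])

# bucket_selector with "params.count == 1": keep single-store products.
single_store = {p: s for p, s in stores_by_product.items() if len(s) == 1}

# The inner bucket tells which store the product belongs to.
only_in_store_1 = sorted(p for p, s in single_store.items() if s == {"store_1"})
print(sorted(single_store), only_in_store_1)
```

Note that the selector keeps the products unique to either store, so the inner stores bucket is still needed to tell "only in store 1" apart from "only in store 2". Also keep in mind that a terms aggregation returns only the top buckets unless its size is raised.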

Search in two different types with different mappings in Elasticsearch

Having the following mapping of the index tester with two types items and items_two:
curl -XPUT 'localhost:9200/tester?pretty=true' -d '{
"mappings": {
"items": {
"properties" : {
"body" : { "type": "string" }
}},
"items_two": {
"properties" : {
"body" : { "type": "string" },
"publised" : { "type": "integer"}
}}}}'
I put three elements on it.
curl -XPUT 'localhost:9200/tester/items/1?pretty=true' -d '{
"body" : "Hey there im reading a book"
}'
curl -XPUT 'localhost:9200/tester/items_two/1?pretty=true' -d '{
"body" : "I love the new book of my brother",
"publised" : 0
}'
curl -XPUT 'localhost:9200/tester/items_two/2?pretty=true' -d '{
"body" : "Stephen kings book is very nice",
"publised" : 1
}'
I need a query that matches documents containing the word book that have publised = 1, as well as documents whose type has no publised field in its mapping but that contain book (such as the only document of type items).
With the following query I only get a match for the "Stephen kings book is very nice" item (obviously).
curl -XGET 'localhost:9200/tester/_search?pretty=true' -d '{
"query": {
"bool": {
"must": [
{
"match": { "body": "book" }
},
{
"match": { "publised": "1" }
}]
}}}'
My desired output, if I search for the string book, should match item #1 from the type items ("Hey there im reading a book") and item #2 from the type items_two ("Stephen kings book is very nice").
I don't want to change the mapping or anything else; I need to achieve this with one query, so how can I build my query?
Thanks in advance.
You can use the _type field for this kind of search. Try the following query:
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"body": "text"
}
},
{
"match": {
"publised": "1"
}
}
],
"filter": {
"term": {
"_type": "items_two"
}
}
}
},
{
"bool": {
"must": [
{
"match": {
"body": "text"
}
}
],
"filter": {
"term": {
"_type": "items"
}
}
}
}
]
}
}
}
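The logic of the two should branches can be checked against the three documents from the question with a small Python simulation. This is only a sketch of the boolean logic, with "book" used as the search word instead of the placeholder "text"; body_matches and should_match are hypothetical helper names, and the word matching is a crude stand-in for a match query.

```python
docs = [
    {"_type": "items", "_id": "1", "body": "Hey there im reading a book"},
    {"_type": "items_two", "_id": "1", "body": "I love the new book of my brother", "publised": 0},
    {"_type": "items_two", "_id": "2", "body": "Stephen kings book is very nice", "publised": 1},
]

def body_matches(doc, word):
    # Crude stand-in for a match query: whole-word, case-insensitive.
    return word.lower() in doc["body"].lower().split()

def should_match(doc, word):
    # First branch: type items_two, body matches AND publised == 1.
    if doc["_type"] == "items_two":
        return body_matches(doc, word) and doc.get("publised") == 1
    # Second branch: type items, body matches (no publised field).
    if doc["_type"] == "items":
        return body_matches(doc, word)
    return False

hits = [(d["_type"], d["_id"]) for d in docs if should_match(d, "book")]
print(hits)
```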

Elasticsearch multi-select faceting

Let's imagine we have the following documents:
curl -XPUT 'http://localhost:9200/multiselect/demo/1' -d '{
"title": "One",
"tags": ["tag1"],
"keywords": ["keyword1"]
}'
curl -XPUT 'http://localhost:9200/multiselect/demo/2' -d '{
"title": "Two",
"tags": ["tag2"],
"keywords": ["keyword2"]
}'
If we do the query:
curl -XGET '
{
"post_filter": {
"and": [
{
"terms": {
"tags": [
"tag1",
"tag2"
]
}
},
{
"terms": {
"keywords": [
"keyword1"
]
}
}
]
},
"aggs": {
"tagFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "tags",
"size": 0
}
}
},
"filter": {
"terms": {
"keywords": [
"keyword1"
]
}
}
},
"keywordFacet": {
"aggs": {
"aggs": {
"terms": {
"field": "keywords",
"size": 0
}
}
},
"filter": {
"terms": {
"tags": [
"tag1",
"tag2"
]
}
}
}
}
}
'
We will get the document "One" and a list of facets: tag1 - 1, keyword1 - 1, keyword2 - 0 and tag2 - 1. But the last one, tag2, should not be there, because we don't have anything for keyword2 in our filter (and the facets).
The question is whether it is possible to get the facets without tag2, without making 2 requests.
Let me know if you need a better explanation, but I guess the basic idea should be clear.
PS. A better explanation of this pattern can be found here: https://gist.github.com/mattweber/1947215; it's the same thing, and it has the same issue described here.

Sort an Elasticsearch result set based on a filter term

For an e-commerce site I am implementing Elasticsearch in order to get a sorted and paginated result set of product IDs for a category.
I have a product document which looks like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"sort": 102,
"categories": [
"28554568",
"28554577",
"28554578"
]
}
To get the resultset I filter and sort like this:
POST /products/_search
{
"filter": {
"term": {
"categories": "28554666"
}
},
"sort" : [
{ "sort" : {"order" : "asc"}}
]
}
However, I have now learned that the product sorting must depend on the category. Looking at the example above, this means that I need a different sort value for each value in the categories array, and depending on the category that I filter by, I want to sort by the corresponding sort value.
The document should look something like this:
PUT /products_test/product/1
{
"id": "1",
"title": "foobar",
"categories": [
{ "id": "28554568", "sort": "102" },
{ "id": "28554577", "sort": "482" },
{ "id": "28554578", "sort": "2" }
]
}
My query now should be able to sort something like this:
POST /products/_search
{
"filter": {
"term": {
"categories.id": "28554666"
}
},
"sort" : [
{ "categories.{filtered_category_id}.sort" : {"order" : "asc"}}
]
}
Is it somehow possible to accomplish this?
To achieve this, you will have to store your categories as nested documents. If not, Elasticsearch will not know what sort is associated with what category ID.
Then, you will have to sort on the nested documents, by also filtering to choose the right one.
Here's a runnable example you can play with: https://www.found.no/play/gist/47282a07414e1432de6d
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"categories": {
"type": "nested"
}
}
}
}
}'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"id":1,"title":"foobar","categories":[{"id":"28554568","sort":102},{"id":"28554577","sort":482},{"id":"28554578","sort":2}]}
{"index":{"_index":"play","_type":"type"}}
{"id":2,"title":"barbaz","categories":[{"id":"28554577","sort":0}]}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"nested": {
"path": "categories",
"query": {
"term": {
"categories.id": {
"value": 28554577
}
}
}
}
},
"sort": {
"categories.sort": {
"order": "asc",
"nested_filter": {
"term": {
"categories.id": 28554577
}
}
}
}
}
'
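The effect of the nested filter on the sort can be imitated in plain Python with the two documents from the bulk request above. This is only an in-memory sketch of the idea: sort_value and search_sorted are hypothetical helper names, and taking the minimum of multiple matching values is an assumption that mirrors the usual behavior of ascending sorts.

```python
docs = [
    {"id": 1, "title": "foobar", "categories": [
        {"id": "28554568", "sort": 102},
        {"id": "28554577", "sort": 482},
        {"id": "28554578", "sort": 2},
    ]},
    {"id": 2, "title": "barbaz", "categories": [
        {"id": "28554577", "sort": 0},
    ]},
]

def sort_value(doc, category_id):
    # Like the nested filter: only the nested category with the
    # requested id contributes its sort value.
    values = [c["sort"] for c in doc["categories"] if c["id"] == category_id]
    return min(values) if values else None

def search_sorted(docs, category_id):
    # Like the nested query: keep only docs that have the category,
    # then order them by that category's own sort value (ascending).
    hits = [d for d in docs if sort_value(d, category_id) is not None]
    return sorted(hits, key=lambda d: sort_value(d, category_id))

ordered_ids = [d["id"] for d in search_sorted(docs, "28554577")]
print(ordered_ids)
```

Without the filter, document 1 would be ordered by one of its other sort values (2 or 102) rather than the 482 that belongs to the requested category.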
