I'm currently using ES 7.10.
Creating an index and mapping:
PUT /rathan_index
PUT /rathan_index/_mappings
{
"properties" : {
"Term__kt" : {
"type" : "nested",
"include_in_root" : true,
"properties" : {
"Fee_Amount" : {
"type" : "text",
"fields" : {
"raw" : {
"type" : "double"
}
}
}
}
},
"type" : {
"type" : "keyword"
}
}
}
Insert some data
POST /rathan_index/_doc/contracts-74900
{
"type": "contracts",
"Term__kt" : [
{
"Fee_Amount" : 105.75
}
]
}
POST /rathan_index/_doc/contracts-74901
{
"type": "contracts",
"Term__kt" : [
{
"Fee_Amount" : 105.75
}
]
}
Query for search
GET rathan_index/_search
{
"from": 0,
"size": 1000,
"_source": [
"*__kt.*"
],
"track_total_hits": true,
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"term": {
"type": "contracts"
}
}
]
}
},
"should": [
{
"query_string": {
"query": "0105.75",
"type": "best_fields",
"fields": [
"*__kt.*"
],
"lenient": true,
"default_operator": "AND",
"boost": 3
}
},
{
"query_string": {
"query": "0105.75",
"type": "phrase_prefix",
"fields": [
"*__kt.*"
],
"lenient": true,
"boost": 2
}
},
{
"query_string": {
"query": "0105.75",
"fields": [
"*__kt.*"
],
"lenient": true,
"type": "best_fields"
}
}
],
"minimum_should_match": 1
}
}
}
},
"min_score": 0.000001,
"highlight": {
"fields": {
"*": {
"type": "plain"
}
}
}
}
The result contains 105.75, but it is not highlighted.
I assume an implicit conversion is turning 0105.75 into 105.75.
I want the value to be highlighted regardless. Is there a way to do this?
Thanks in advance.
{
"took" : 565,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 4.0,
"hits" : [
{
"_index" : "rathan_index",
"_type" : "_doc",
"_id" : "contracts-74900",
"_score" : 4.0,
"_source" : {
"Term__kt" : [
{
"Fee_Amount" : 105.75
}
]
},
"highlight" : {
"type" : [
"<em>contracts</em>"
]
}
},
{
"_index" : "rathan_index",
"_type" : "_doc",
"_id" : "contracts-74901",
"_score" : 4.0,
"_source" : {
"Term__kt" : [
{
"Fee_Amount" : 105.75
}
]
},
"highlight" : {
"type" : [
"<em>contracts</em>"
]
}
}
]
}
}
I have an index with documents like this:
[
{
"customer_id" : "123",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-23"
...
},
{
"customer_id" : "123",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-24"
...
},
{
"customer_id" : "345",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-25"
...
}
]
I want to get the list of all documents from a specific country, e.g. USA, within a given time range, where the same customer_id occurs at least 2 times.
With the above data, it should return
[
{
"customer_id" : "123",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-24"
...
}
]
Now, I tried the below ES query
POST /index_name/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"creation_date": {
"gte": "2021-06-23",
"lte": "2021-08-23"
}
}
},
{
"match": {
"country": "USA"
}
}
]
}
},
"aggs": {
"customer_agg": {
"terms": {
"field": "customer_id",
"min_doc_count": 2
}
}
}
}
The above query returns the following result:
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 1.5587491,
"hits" : [...]
},
"aggregations" : {
"person_agg" : {
"doc_count_error_upper_bound" : 1,
"sum_other_doc_count" : 1,
"buckets" : [
{
"key" : "customer_id",
"doc_count" : 2
}
]
}
}
I don't need the list of buckets in response, but only the list of documents satisfying the condition. How can I achieve it?
At first glance, I noticed that your search query filters on a field called creation_timestamp, while the documents you posted name the date field creation_date.
I decided to test this locally on Elasticsearch 7.10, and here are the settings I used:
PUT /test-index-v1
PUT /test-index-v1/_mapping
{
"properties": {
"customer_id": {
"type": "keyword"
},
"country": {
"type": "keyword"
},
"department": {
"type": "keyword"
},
"creation_date": {
"type": "date"
}
}
}
As you can see, I'm mapping these fields as keyword so that they can be used for sorting, aggregations, and exact matching.
After I created the index I imported the documents you gave as an example
POST /test-index-v1/_doc
{
"customer_id" : "345",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-25"
}
POST /test-index-v1/_doc
{
"customer_id" : "123",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-25"
}
POST /test-index-v1/_doc
{
"customer_id" : "123",
"country": "USA",
"department": "IT",
"creation_date" : "2021-06-24"
}
Then I executed this search query including a must match on the customer_id as well:
POST /test-index-v1/_search
{
"query": {
"bool": {
"must": [
{
"range": {
"creation_date": {
"gte": "2021-06-23",
"lte": "2021-08-23"
}
}
},
{
"match": {
"country": "USA"
}
},
{
"match": {
"customer_id": "123"
}
}
]
}
},
"aggs": {
"customer_agg": {
"terms": {
"field": "customer_id",
"min_doc_count": 2
}
}
}
}
This query returns the search hits along with the aggregation. The aggregation on its own only returns buckets, not the documents inside them.
Here is the response I received:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.6035349,
"hits" : [
{
"_index" : "test-index-v1",
"_type" : "_doc",
"_id" : "vbVD9HsBRVWFAvvZTW-l",
"_score" : 1.6035349,
"_source" : {
"customer_id" : "123",
"country" : "USA",
"department" : "IT",
"creation_date" : "2021-06-25"
}
},
{
"_index" : "test-index-v1",
"_type" : "_doc",
"_id" : "vrVD9HsBRVWFAvvZU29q",
"_score" : 1.6035349,
"_source" : {
"customer_id" : "123",
"country" : "USA",
"department" : "IT",
"creation_date" : "2021-06-24"
}
}
]
},
"aggregations" : {
"customer_agg" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123",
"doc_count" : 2
}
]
}
}
}
Hope this helps with your issue. Feel free to leave a comment if you have other questions regarding Elastic! :)
EDIT:
Regarding the grouping by customer_id in a certain date range I used this query:
POST /test-index-v1/_search
{
"aggs": {
"group_by_customer_id": {
"terms": {
"field": "customer_id"
},
"aggs": {
"dates_between": {
"filter": {
"range": {
"creation_date": {
"gte": "2020-06-23",
"lte": "2021-06-24"
}
}
}
}
}
}
}
}
And the response is:
"aggregations" : {
"group_by_customer_id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "123",
"doc_count" : 2,
"dates_between" : {
"doc_count" : 1
}
},
{
"key" : "345",
"doc_count" : 1,
"dates_between" : {
"doc_count" : 0
}
}
]
}
}
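If the goal is to get the documents themselves for every customer_id that occurs at least twice, one option is to nest a top_hits sub-aggregation under the terms aggregation and set size to 0 so the flat hit list is suppressed. The following is only a minimal sketch, not a tested production query: the function names, the "docs" aggregation name, and the docs_per_customer parameter are made up for illustration, and it assumes country and customer_id are mapped as keyword, as in the mapping above. It builds the request body as a plain Python dict and flattens the response on the client side.

```python
def repeated_customers_request(country, start, end, min_count=2, docs_per_customer=10):
    """Build a search body that groups matching documents by customer_id.

    Hypothetical helper: only the field names come from the question.
    """
    return {
        "size": 0,  # suppress the flat hit list; documents come back per bucket
        "query": {
            "bool": {
                "filter": [
                    # term query assumes `country` is a keyword field
                    {"term": {"country": country}},
                    {"range": {"creation_date": {"gte": start, "lte": end}}},
                ]
            }
        },
        "aggs": {
            "customer_agg": {
                # keep only customers with at least `min_count` matching documents
                "terms": {"field": "customer_id", "min_doc_count": min_count},
                "aggs": {
                    # return up to `docs_per_customer` documents for each bucket
                    "docs": {"top_hits": {"size": docs_per_customer}}
                },
            }
        },
    }


def documents_from_response(response):
    """Flatten the per-bucket top_hits into one list of _source documents."""
    buckets = response["aggregations"]["customer_agg"]["buckets"]
    return [
        hit["_source"]
        for bucket in buckets
        for hit in bucket["docs"]["hits"]["hits"]
    ]
```

The dict returned by repeated_customers_request("USA", "2021-06-23", "2021-08-23") can be sent as the body of POST /test-index-v1/_search. Keep in mind that a terms aggregation returns only the top buckets (10 by default), so with many customers you would need to raise its size or page through the buckets with a composite aggregation.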
My mapping looks like so:
"condition": {
"properties": {
"name": {
"type": "keyword"
},
"value": {
"type": "keyword"
}
}
},
and some data I have looks like:
"condition": [
{
"name": "condition",
"value": "new"
},
{
"name": "condition",
"value": "gently-used"
}
]
How can I write a query that finds all objects within the array that have a new condition?
I have the following but I am getting 0 results back:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"attribute_condition": "new"
}
}
]
}
}
}
First, you need to map your condition field as a nested type.
"condition": {
"type": "nested",
"properties": {
"name": { "type": "keyword" },
"value": { "type": "keyword" }
}
},
Now you can query each element of the condition array independently of the others. Next, use a nested query and request inner hits; the matching array elements are then returned in the inner_hits object of each hit in the response.
{
"query": {
"bool": {
"must": {
"nested": {
"path": "condition",
"query": {
"match": {
"condition.value": "new"
}
},
"inner_hits": {}
}
}
}
}
}
An example response will look like below:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_score" : 0.6931471,
"_source" : {
"condition" : [
{
"name" : "condition",
"value" : "new"
},
{
"name" : "condition",
"value" : "gently-used"
}
]
},
"inner_hits" : { <--- here begins the list of inner hits
"condition" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.6931471,
"hits" : [
{
"_index" : "nested",
"_type" : "_doc",
"_id" : "Xx_LN3gBp5RUqdfAef3B",
"_nested" : {
"field" : "condition",
"offset" : 0
},
"_score" : 0.6931471,
"_source" : {
"name" : "condition",
"value" : "new"
}
}
]
}
}
}
}
]
}
}
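To pull the matching array elements out of such a response on the client side, something like the following could work. It is only a sketch that operates on the response as a plain Python dict (for example, the parsed JSON body); the function name matched_nested_objects is made up for illustration.

```python
def matched_nested_objects(response, path):
    """Map each document _id to the nested objects that matched the query.

    `path` is the nested path used in the query, e.g. "condition".
    Documents without inner hits for that path map to an empty list.
    """
    matches = {}
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {})
        inner_hits = inner.get("hits", {}).get("hits", [])
        matches[hit["_id"]] = [h["_source"] for h in inner_hits]
    return matches
```

For the example response above, matched_nested_objects(response, "condition") would contain only the object whose value is "new" under the document's _id, while the hit's _source still holds the full array.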
Assume I have the following two documents in my Elasticsearch index:
{
"name": "bob",
"likes": ["computer", "cat", "water"]
},
{
"name": "alice",
"likes": ["gaming", "gambling"]
}
I would now like to query for documents that like computer, laptop, or cat. This should match bob; note that it needs to be an exact string match.
As a result I need the matches, as well as the count of matches, so would like to get the following back (since it found computer and cat, but not laptop or water):
{
"name": "bob",
"likes": ["computer", "cat"],
"likes_count": 2
}
Is there a way to achieve this with a single Elasticsearch query? (Note that I'm still stuck with ES 2.4, but will hopefully be able to upgrade soon.)
Ideally I would also like to sort the output by likes_count.
Thank you!
The best approach would be to map likes as a nested data type.
Mapping
PUT index71
{
"mappings": {
"properties": {
"name":{
"type": "text"
},
"likes":{
"type": "nested",
"properties": {
"name":{
"type":"keyword"
}
}
}
}
}
}
Query:
GET index71/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "likes",
"query": {
"bool": {
"must": [
{
"terms": {
"likes.name": [
"computer",
"cat",
"laptop"
]
}
}
]
}
},
"inner_hits": {} ---> It will return matched elements in nested type
}
}
]
}
},
"aggs": {
"likes": {
"nested": {
"path": "likes"
},
"aggs": {
"matcheLikes": {
"filter": {
"bool": {
"must": [
{
"terms": {
"likes.name": [
"computer",
"cat",
"laptop"
]
}
}
]
}
},
"aggs": {
"likeCount": {
"value_count": {
"field": "likes.name"
}
}
}
}
}
}
}
}
Result:
"hits" : [
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_score" : 1.0,
"_source" : {
"name" : "bob",
"likes" : [
{
"name" : "computer"
},
{
"name" : "cat"
},
{
"name" : "water"
}
]
},
"inner_hits" : {
"likes" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_nested" : {
"field" : "likes",
"offset" : 0
},
"_score" : 1.0,
"_source" : {
"name" : "computer"
}
},
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_nested" : {
"field" : "likes",
"offset" : 1
},
"_score" : 1.0,
"_source" : {
"name" : "cat"
}
}
]
}
}
}
}
]
},
"aggregations" : {
"likes" : {
"doc_count" : 3,
"matcheLikes" : {
"doc_count" : 2,
"likeCount" : {
"value" : 2
}
}
}
}
If likes cannot be changed to the nested type, you need to use scripts instead, which will hurt performance.
Mapping
{
"index72" : {
"mappings" : {
"properties" : {
"likes" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
}
Query:
{
"script_fields": { ---> It will iterate through likes and get matched values
"matchedElements": {
"script": "def matchedLikes=[];def list_to_check = ['computer', 'laptop', 'cat']; def do_not_return = true; for(int i=0;i<doc['likes.keyword'].length;i++){ if(list_to_check.contains(doc['likes.keyword'][i])) {matchedLikes.add(doc['likes.keyword'][i])}} return matchedLikes;"
}
},
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"terms": {
"likes": [
"computer",
"laptop",
"cat"
]
}
}
]
}
}
}
},
"aggs": {
"Name": {
"terms": {
"field": "name.keyword",
"size": 10
},
"aggs": {
"Count": {
"scripted_metric": { --> get count of matched values
"init_script": "state.matchedLikes=[]",
"map_script": " def list_to_check = ['computer', 'laptop', 'cat']; def do_not_return = true; for(int i=0;i<doc['likes.keyword'].length;i++){ if(list_to_check.contains(doc['likes.keyword'][i])) {state.matchedLikes.add(doc['likes.keyword'][i]);}}",
"combine_script": "int count = 0; for (int i=0;i<state.matchedLikes.length;i++) { count += 1 } return count;",
"reduce_script": "int count = 0; for (a in states) { count += a } return count"
}
}
}
}
}
}
Result:
"hits" : [
{
"_index" : "index72",
"_type" : "_doc",
"_id" : "wtqso3ABH6obcmRR0hSV",
"_score" : 0.0,
"fields" : {
"matchedElements" : [
"cat",
"computer"
]
}
}
]
},
"aggregations" : {
"Name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "bob",
"doc_count" : 1,
"Count" : {
"value" : 2
}
}
]
}
}
EDIT 1
To give a higher score to documents with more matches, replace the terms query with a should clause. Each matching term in the should clause contributes to the score.
GET index71/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "likes",
"query": {
"bool": {
"should": [
{
"term": {
"likes.name": "computer"
}
},
{
"term": {
"likes.name": "cat"
}
},
{
"term": {
"likes.name": "laptop"
}
}
]
}
},
"inner_hits": {}
}
}
]
}
},
"aggs": {
"likes": {
"nested": {
"path": "likes"
},
"aggs": {
"matcheLikes": {
"filter": {
"bool": {
"must": [
{
"terms": {
"likes.name": [
"computer",
"cat",
"laptop"
]
}
}
]
}
},
"aggs": {
"likeCount": {
"value_count": {
"field": "likes.name"
}
}
}
}
}
}
}
}
Result
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.5363467,
"hits" : [
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_score" : 1.5363467,
"_source" : {
"name" : "bob",
"likes" : [
{
"name" : "computer"
},
{
"name" : "cat"
},
{
"name" : "water"
}
]
},
"inner_hits" : {
"likes" : {
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.7917595,
"hits" : [
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_nested" : {
"field" : "likes",
"offset" : 1
},
"_score" : 1.7917595,
"_source" : {
"name" : "cat"
}
},
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "u9qTo3ABH6obcmRRRhSA",
"_nested" : {
"field" : "likes",
"offset" : 0
},
"_score" : 1.2809337,
"_source" : {
"name" : "computer"
}
}
]
}
}
}
},
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "pr-lqHABcSMy6UhGAWtW",
"_score" : 1.2809337,
"_source" : {
"name" : "bob",
"likes" : [
{
"name" : "computer"
},
{
"name" : "gaming"
},
{
"name" : "gambling"
}
]
},
"inner_hits" : {
"likes" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2809337,
"hits" : [
{
"_index" : "index71",
"_type" : "_doc",
"_id" : "pr-lqHABcSMy6UhGAWtW",
"_nested" : {
"field" : "likes",
"offset" : 0
},
"_score" : 1.2809337,
"_source" : {
"name" : "computer"
}
}
]
}
}
}
}
]
},
"aggregations" : {
"likes" : {
"doc_count" : 6,
"matcheLikes" : {
"doc_count" : 3,
"likeCount" : {
"value" : 3
}
}
}
}
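The aggregation above counts matches across all documents. To get the per-document shape asked for in the question (name, matched likes, likes_count), the inner_hits of each hit can be reshaped on the client side. This is a rough sketch, assuming the 7.x response format shown above, where total is an object with a value field; the function name likes_summary is hypothetical.

```python
def likes_summary(response, path="likes"):
    """Build one row per hit with the matched likes and their count.

    Rows are sorted by likes_count, highest first. The sort only covers
    the hits present in this response page.
    """
    rows = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][path]["hits"]
        rows.append({
            "name": hit["_source"]["name"],
            # matched nested objects returned for this document
            "likes": [h["_source"]["name"] for h in inner["hits"]],
            # total number of matching nested objects in this document
            "likes_count": inner["total"]["value"],
        })
    rows.sort(key=lambda row: row["likes_count"], reverse=True)
    return rows
```

Note that inner_hits returns only the top 3 nested objects per document by default, so if more than three likes can match, the likes list will be truncated unless you raise "size" inside inner_hits. The likes_count value comes from the total and is not affected by that limit.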
Our Elastic Mapping
{"mappings": {
"products" : {
"properties":{
"name" : {
"type" : "keyword"
},
"resellers" : {
"type" : "nested",
"properties" : {
"name" : { "type" : "text" },
"price" : { "type" : "double" }
}
}
}
}
}}
In this mapping, each product stores the list of resellers that sell it, each at a specific price.
We have a requirement to get the count of products sold by specific resellers at a specific price. I am able to get it for a single reseller by combining a reverse_nested aggregation with a cardinality aggregation, using the following query DSL.
For example, getting the total number of products sold by Amazon at a price of 2:
{
"query": {
"bool": {
"must": [
{
"match_all": {
"boost": 1.0
}
}
]
}
},
"aggs": {
"patchFilter": {
"nested": {
"path": "resellers"
},
"aggs": {
"nestedfilter": {
"filter": {
"bool": {
"must":[
{
"term" :{
"resellers.name.keyword": {
"value": "Amazon"
}
}
},{
"terms" :{
"resellers.price":[2]
}
}
]
}
},
"aggs": {
"resellerprice": {
"reverse_nested" :{},
"aggs": {
"resellers_price":{
"cardinality" : {
"field" : "name.keyword"
}
}
}
}
}
}
}
}
}
}
I want to fetch this for multiple resellers (Amazon, Flipkart, Walmart) that are selling at 2, in a single query. Can somebody help me do that?
Mapping:
PUT productreseller
{
"mappings": {
"properties": {
"name": {
"type": "keyword"
},
"resellers": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields":{
"keyword":{
"type": "keyword"
}
}
},
"price": {
"type": "double"
}
}
}
}
}
}
Data:
[
{
"_index" : "productreseller",
"_type" : "_doc",
"_id" : "JNbCLm0B00idyGV0Pn1Z",
"_score" : 1.0,
"_source" : {
"name" : "P2",
"resellers" : [
{
"name" : "amazon",
"price" : 3
},
{
"name" : "abc",
"price" : 2
}
]
}
},
{
"_index" : "productreseller",
"_type" : "_doc",
"_id" : "JdbCLm0B00idyGV0Wn0y",
"_score" : 1.0,
"_source" : {
"name" : "P1",
"resellers" : [
{
"name" : "amazon",
"price" : 2
},
{
"name" : "abc",
"price" : 3
}
]
}
},
{
"_index" : "productreseller",
"_type" : "_doc",
"_id" : "JtbPLm0B00idyGV0D32Y",
"_score" : 1.0,
"_source" : {
"name" : "P4",
"resellers" : [
{
"name" : "xyz",
"price" : 2
},
{
"name" : "abc",
"price" : 3
}
]
}
}
]
Query:
GET productreseller/_search
{
"query": {
"bool": {
"must": [
{
"match_all": {
"boost": 1
}
}
]
}
},
"aggs": {
"patchFilter": {
"nested": {
"path": "resellers"
},
"aggs": {
"nestedfilter": {
"filter": {
"bool": {
"must": [
{
"terms": {
"resellers.price": [
2
]
}
}
]
}
},
"aggs": {
"NAME": {
"terms": {
--->terms aggregation to list resellers and reverse_nested as subaggregation
"field": "resellers.name.keyword",
"size": 10
},
"aggs": {
"resellerprice": {
"reverse_nested": {},
"aggs": {
"resellers_price": {
"cardinality": {
"field": "name"
}
}
}
}
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"patchFilter" : {
"doc_count" : 8,
"nestedfilter" : {
"doc_count" : 3,
"NAME" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "abc",
"doc_count" : 1,
"resellerprice" : {
"doc_count" : 1,
"resellers_price" : {
"value" : 1
}
}
},
{
"key" : "amazon",
"doc_count" : 1,
"resellerprice" : {
"doc_count" : 1,
"resellers_price" : {
"value" : 1
}
}
},
{
"key" : "xyz",
"doc_count" : 1,
"resellerprice" : {
"doc_count" : 1,
"resellers_price" : {
"value" : 1
}
}
}
]
}
}
}
}
If you want to display only certain resellers, you can add a terms query on resellers.name.keyword to the nested filter.
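As a rough illustration of restricting the buckets to certain resellers, the aggregation part of the request could be built like this. It is a sketch only: the function name reseller_count_aggs is made up, while the aggregation names and fields are the ones from the query above. Because resellers.name.keyword is an exact-match field, the values must match the indexed strings exactly ("amazon" in the sample data, not "Amazon").

```python
def reseller_count_aggs(resellers, price):
    """Build the `aggs` section counting products per reseller at one price.

    Only nested reseller entries whose name is in `resellers` and whose
    price equals `price` reach the terms aggregation.
    """
    return {
        "patchFilter": {
            "nested": {"path": "resellers"},
            "aggs": {
                "nestedfilter": {
                    "filter": {
                        "bool": {
                            "must": [
                                # restrict to the requested resellers (exact match)
                                {"terms": {"resellers.name.keyword": list(resellers)}},
                                {"term": {"resellers.price": price}},
                            ]
                        }
                    },
                    "aggs": {
                        "NAME": {
                            "terms": {"field": "resellers.name.keyword", "size": 10},
                            "aggs": {
                                "resellerprice": {
                                    # step back to the parent product documents
                                    "reverse_nested": {},
                                    "aggs": {
                                        "resellers_price": {
                                            "cardinality": {"field": "name"}
                                        }
                                    },
                                }
                            },
                        }
                    },
                }
            },
        }
    }
```

For example, {"size": 0, "aggs": reseller_count_aggs(["amazon", "xyz"], 2)} could be sent as the body of GET productreseller/_search; with the sample data above, only the amazon and xyz buckets should remain.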