Can one document fall into two buckets? - elasticsearch

In Elasticsearch, I have a list of documents. Each document contains a field type (possible values for type are 1, 2, 3, 4, 5). Now I want to create two buckets:
one containing the documents whose type field value is 1, and
one containing all the documents (including type 1).
Is this possible in Elasticsearch? If yes, how?
I searched on the internet but did not find anything helpful.
Following is the document structure:
"_source": { "city": "Ahmadabad",
"pId": "A1332605",
"sellerType": 1,
"seller": "Dealer",
"makeId": 7,
"makeName": "ABC",
"modelId": 673,
"type": 1
},
"_source": { "city": "Surat",
"pId": "A265843",
"sellerType": 1,
"seller": "Dealer",
"makeId": 45,
"makeName": "XYZ",
"modelId": 520,
"type": 2
}

I copied this request from a visualization that Kibana generated, so it should work just the same; put your own query in the query section. I picked one of your integer fields (sellerType); change it if you need another one, such as type.
{
"query": {
"match_all": {}
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"filters": {
"filters": {
"filter_for_specific": {
"query_string": {
"query": "sellerType: 1",
"analyze_wildcard": true
}
},
"filter_for_existing": {
"query_string": {
"query": "sellerType: *",
"analyze_wildcard": true
}
}
}
}
}
}
}
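A filters aggregation evaluates every named filter against every document independently, which is why one document can land in more than one bucket. The Python sketch below only mimics that counting locally (it is not an Elasticsearch client call). It uses the type field from the question, and the lambda predicates are hypothetical stand-ins for the query_string filters. Note that a "field: *" filter only matches documents that have the field. If a bucket must hold literally every document, a match_all filter would do that instead.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]


def filters_agg(docs: List[Doc], filters: Dict[str, Callable[[Doc], bool]]) -> Dict[str, int]:
    """Count documents per named filter; each filter is tested against every doc on its own."""
    return {name: sum(1 for d in docs if pred(d)) for name, pred in filters.items()}


docs = [
    {"city": "Ahmadabad", "pId": "A1332605", "sellerType": 1, "type": 1},
    {"city": "Surat", "pId": "A265843", "sellerType": 1, "type": 2},
]

buckets = filters_agg(docs, {
    # stand-in for the query_string "type: 1"
    "filter_for_specific": lambda d: d.get("type") == 1,
    # stand-in for "type: *", i.e. the field exists
    "filter_for_existing": lambda d: "type" in d,
})
print(buckets)  # {'filter_for_specific': 1, 'filter_for_existing': 2}
```

The first document is counted in both buckets, which is exactly the behavior the question asks for.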

Related

Is there a way to build an Elastic query with changing search values?

I want to use Elastic in PHP to process a search request from my website. For example, I have the search parameters
name
age
height
weight.
But it should not be necessary to always search for all parameters.
So it could be that only (name AND age) have values and (height AND weight) do not.
Is there a way to build one query with flexible/changing input values?
The query below would not work when there are no search values for (height AND weight).
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
]
}
}
}
Search templates to the rescue:
POST _scripts/my-search-template
{
"script": {
"lang": "mustache",
"source": """
{
"query": {
"bool": {
"should": [
{{#name}}
{ "match": { "name.keyword": "{{name}}" } },
{{/name}}
{{#age}}
{ "match": { "age": "{{age}}" } },
{{/age}}
{{#height}}
{ "match": { "height": "{{height}}" } },
{{/height}}
{{#weight}}
{ "match": { "weight": "{{weight}}" } },
{{/weight}}
{ "match_none": { } }
]
}
}
}
"""
}
}
Note that, since you don't know how many criteria you will have, the last condition (match_none) is always false and is only there to keep the JSON valid (i.e. so that the last comma isn't left dangling).
You can then run your query like this:
POST my-index/_search/template
{
"id": "my-search-template",
"params": {
"name": "Anna",
"age": 30
}
}
You need to handle this in the application that constructs your Elasticsearch query. That is easy to do, because the application knows which search parameter values it received from the UI: only include a field in the Elasticsearch query if its value is not null.
Elasticsearch doesn't support if...else-like conditions in the query itself.
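The application-side approach can be sketched as follows. This is a minimal Python sketch (the question uses PHP, but the same idea carries over to PHP arrays). FIELDS and build_query are hypothetical names, and falling back to match_none when nothing is filled in is an assumption that mirrors the search template above. The resulting dict would be sent as the search body.

```python
from typing import Any, Dict, List, Optional

# Hypothetical mapping from UI parameter name to the Elasticsearch field used in the question.
FIELDS = {"name": "name.keyword", "age": "age", "height": "height", "weight": "weight"}


def build_query(params: Dict[str, Optional[Any]]) -> Dict[str, Any]:
    """Return a bool/should query with one match clause per parameter that has a value."""
    clauses: List[Dict[str, Any]] = [
        {"match": {field: params[key]}}
        for key, field in FIELDS.items()
        if params.get(key) is not None
    ]
    if not clauses:
        # Nothing was filled in: mirror the search template's match_none fallback.
        return {"query": {"match_none": {}}}
    return {"query": {"bool": {"should": clauses}}}


print(build_query({"name": "Anna", "age": 30, "height": None}))
```

Only name and age end up in the should array. height is None and weight is absent, so both are simply left out.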
TL;DR
There are multiple ways to address your problem in Elasticsearch.
You could play with the minimum_should_match parameter.
You could use search templates with conditions.
You could also write more complex bool queries that enumerate the possible combinations for a match.
You could also use scripts to implement the logic you want.
Minimum should match
POST /_bulk
{"index":{"_index":"73121817"}}
{"name": "ana", "age": 1, "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "jack", "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "emma", "age": 1, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "william", "age": 1, "height": 180}
{"index":{"_index":"73121817"}}
{"name": "jenny", "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "marco", "age": 1}
{"index":{"_index":"73121817"}}
{"name": "giulia", "height": 180}
{"index":{"_index":"73121817"}}
{"name": "paul"}
GET 73121817/_search
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
],
"minimum_should_match": 2
}
}
}
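With minimum_should_match set to 2, only two documents are returned, ana and jack, because they are the only ones matching both height and weight (no document matches name.keyword "Anna" or age 30).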
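The counting behind minimum_should_match is easier to see in a local Python simulation over the bulk-indexed documents. It is only an approximation: exact equality stands in for match (reasonable here, since the values are single keywords or numbers), the name key stands for name.keyword, and matching is a hypothetical helper.

```python
from typing import Any, Dict, List

docs: List[Dict[str, Any]] = [
    {"name": "ana", "age": 1, "height": 180, "weight": 70},
    {"name": "jack", "height": 180, "weight": 70},
    {"name": "emma", "age": 1, "weight": 70},
    {"name": "william", "age": 1, "height": 180},
    {"name": "jenny", "weight": 70},
    {"name": "marco", "age": 1},
    {"name": "giulia", "height": 180},
    {"name": "paul"},
]

# "name" stands for name.keyword; each entry is one should clause of the query.
criteria = {"name": "Anna", "age": 30, "height": 180, "weight": 70}


def matching(docs: List[Dict[str, Any]], criteria: Dict[str, Any], minimum_should_match: int) -> List[str]:
    hits = []
    for doc in docs:
        # Count how many should clauses this document satisfies.
        matched = sum(1 for field, value in criteria.items() if doc.get(field) == value)
        if matched >= minimum_should_match:
            hits.append(doc["name"])
    return hits


print(matching(docs, criteria, 2))  # ['ana', 'jack']
```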
Template queries
Well, Val's answer is quite complete.
You could also refer to the documentation.
Complex queries
Refer to the SO post behind the link.
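The linked post isn't reproduced here, so the following is only one possible reading of "enumerate the possibilities". It is a Python sketch (enumerate_combinations is a hypothetical helper) that builds a should clause with one bool/must group per combination of at_least criteria. For "at least 2 of 4" it produces 6 groups. That is logically the same as minimum_should_match: 2, but the number of groups grows combinatorially with the number of criteria.

```python
from itertools import combinations
from typing import Any, Dict, List


def enumerate_combinations(criteria: Dict[str, Any], at_least: int) -> Dict[str, Any]:
    """Build a should of must-groups: one group per combination of `at_least` criteria."""
    groups: List[Dict[str, Any]] = []
    for combo in combinations(criteria.items(), at_least):
        groups.append({"bool": {"must": [{"match": {field: value}} for field, value in combo]}})
    # With only should clauses, a document has to match at least one group.
    return {"query": {"bool": {"should": groups}}}


query = enumerate_combinations({"name.keyword": "Anna", "age": 30, "height": 180, "weight": 70}, 2)
print(len(query["query"]["bool"]["should"]))  # 6
```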
Scripted queries
GET 73121817/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": """
return (!doc["name.keyword"].empty && !doc["age"].empty);
"""
}
}
}
}
}

What does a "must" clause with an array of "match" clauses really mean?

I have an elasticsearch query which looks like this...
"query": {
"bool": {
"must": [{
"match": {"attrs.name": "username"}
}, {
"match": {"attrs.value": "johndoe"}
}]
}
}
... and documents in the index that look like this:
{
"key": "value",
"attrs": [{
"name": "username",
"value": "jimihendrix"
}, {
"name": "age",
"value": 23
}, {
"name": "alias",
"value": "johndoe"
}]
}
Which of the following does this query really mean?
1. The document should contain either attrs.name = username OR attrs.value = johndoe.
2. Or, the document should contain both attrs.name = username AND attrs.value = johndoe, even if they match different elements in the attrs array (which would mean that the document above matches the query).
3. Or, the document should contain both attrs.name = username AND attrs.value = johndoe, but they must match the same element in the attrs array (which would mean that the document above does not match the query).
Further, how do I write a query to express #3 from the list above, i.e. the document should match only if a single element inside the attrs array matches both of the following conditions:
attrs.name = username
attrs.value = johndoe
A must clause stands for "AND", so only documents satisfying all the clauses in the must array are returned.
must will not satisfy point 1 (the document should contain either attrs.name = username OR attrs.value = johndoe); for that you need a should clause, which works like "OR".
Whether must satisfies point 2 or point 3 depends on the type of the "attrs" field.
If the "attrs" field is of type object, the fields are flattened, i.e. no relationship is maintained between the fields of different array elements. So the must query will return a document if any element has attrs.name = "username" and any element has attrs.value = "johndoe", even if they are not part of the same object in that array.
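The difference between object and nested mappings can be illustrated with a small local Python simulation. The function names are hypothetical, and the sketch only models the matching logic, not how Lucene stores the fields. With an object mapping, the array is flattened into separate attrs.name and attrs.value value lists, so different elements can satisfy the two conditions. With a nested mapping, both conditions must hold on the same element.

```python
from typing import Any, Dict, List

attrs: List[Dict[str, Any]] = [
    {"name": "username", "value": "jimihendrix"},
    {"name": "age", "value": 23},
    {"name": "alias", "value": "johndoe"},
]


def object_match(attrs: List[Dict[str, Any]], name: str, value: Any) -> bool:
    # Object mapping: attrs.name = [...] and attrs.value = [...] are independent lists.
    names = [a["name"] for a in attrs]
    values = [a["value"] for a in attrs]
    return name in names and value in values


def nested_match(attrs: List[Dict[str, Any]], name: str, value: Any) -> bool:
    # Nested mapping: each element is matched on its own, like a separate document.
    return any(a["name"] == name and a["value"] == value for a in attrs)


print(object_match(attrs, "username", "johndoe"))  # True
print(nested_match(attrs, "username", "johndoe"))  # False
```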
If you want each object in an array to act like a separate document, you need to map the field as nested and use a nested query to match documents:
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
The hits in the response contain the full source documents, including all of their nested objects; to get only the nested objects that matched, inner_hits has to be specified.
Based on your requirements, you need to define your attrs field as nested; please refer to the nested type in the Elasticsearch documentation for more information. Disclaimer: it maintains the relationship, but it is more costly to query.
The answer to your other two questions also depends on which data type you are using; please refer to nested vs. object data types for more details.
Edit: solution using sample mapping, example docs and expected result
Index mapping using nested type
{
"mappings": {
"properties": {
"attrs": {
"type": "nested"
}
}
}
}
Index two sample docs, one which satisfies the criteria and one which doesn't. The first one satisfies the criteria:
{
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
}
The second one doesn't satisfy the criteria (username and johndoe are in different elements):
{
"attrs": [
{
"name": "username",
"value": "jimihendrix"
},
{
"name": "age",
"value": 23
},
{
"name": "alias",
"value": "johndoe"
}
]
}
And the search query:
{
"query": {
"nested": {
"path": "attrs",
"inner_hits": {},
"query": {
"bool": {
"must": [
{
"match": {
"attrs.name": "username"
}
},
{
"match": {
"attrs.value": "johndoe"
}
}
]
}
}
}
}
}
And the search result:
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_score": 1.7509375,
"_source": {
"attrs": [
{
"name": "username",
"value": "johndoe"
},
{
"name": "alias",
"value": "myname"
}
]
},
"inner_hits": {
"attrs": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.7509375,
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "2",
"_nested": {
"field": "attrs",
"offset": 0
},
"_score": 1.7509375,
"_source": {
"name": "username",
"value": "johndoe"
}
}
]
}
}
}
}
]

Sort keyword field array within ElasticSearch document by relevance

I've got an ElasticSearch index that looks something like this:
{
"mappings": {
"article": {
"properties": {
"title": { "type": "string" },
"tags": {
"type": "keyword"
}
}
}
}
}
And data that looks something like this:
{ "title": "Something about Dogs", "tags": ["articles", "dogs"] },
{ "title": "Something about Cats", "tags": ["articles", "cats"] },
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
If I search for dog, I get the first and third documents, as I'd expect. And I can weight the search documents the way I like (in reality, I'm using a function_score query to weight on a bunch of fields irrelevant to this question).
What I'd like to do is sort the tags field so that the most relevant tags are returned first, without affecting the sort order of the documents themselves. So I'm hoping for a result like this:
{ "title": "Something about Dog Food", "tags": ["dogs", "dogfood", "articles"] }
Instead of what I get now:
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
The documentation on sort and function score doesn't cover my case. Any help appreciated. Thanks!
You cannot sort the _source of the documents (your array of tags) by how well each element matches. One way of doing this is by using nested fields and inner_hits, which allows you to sort the matching nested values.
My suggestion is to transform your tags into a nested field (I chose keyword here just for simplicity, but you can also use text and the analyzer of your choice):
PUT test
{
"mappings": {
"article": {
"properties": {
"title": {
"type": "string"
},
"tags": {
"type": "nested",
"properties": {
"value": {
"type": "keyword"
}
}
}
}
}
}
}
And use this kind of query:
GET test/_search
{
"_source": {
"exclude": "tags"
},
"query": {
"bool": {
"must": [
{
"match": {
"title": "dogs"
}
},
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"match": {
"tags.value": "dogs"
}
}
]
}
},
"inner_hits": {
"sort": {
"_score": "desc"
}
}
}
}
]
}
}
}
Here you match the tags.value nested field against the same text you match on title. Then, using inner_hits sorting, you can sort the nested values by their inner score: the match_all clause keeps every tag in the inner hits, while the tags that also match the search text score higher and come first. Note that inner_hits returns only 3 hits by default, so set its size if a document can have more tags.
@Val's suggestion is very good, as long as you are OK with simple substring matching (i1.indexOf(params.search)) to determine your "relevant" tags. Its biggest advantage is that you don't have to change the mapping.
The big advantage of my solution is that you are actually using Elasticsearch's real search capabilities to determine the "relevant" tags. The drawback is that you need a nested field instead of a simple keyword field.
What you get from a search call are the source documents. The documents in the response are returned in exactly the same form as when you indexed them, which means that if you indexed ["articles", "dogs", "dogfood"], you'll always get that array in that unaltered form.
One way to get around this is to declare a script_field that applies a small script to sort your array and return the result of that sort.
The script simply moves the terms that contain the search term to the front of the list. The comparator returns 0 for two matching or two non-matching terms, so it is consistent, and because the stream sort is stable, terms keep their original relative order within each group:
{
"_source": ["title"],
"query" : {
"match_all": {}
},
"script_fields" : {
"sorted_tags" : {
"script" : {
"lang": "painless",
"source": "return params._source.tags.stream().sorted((i1, i2) -> (i1.indexOf(params.search) > -1 ? 0 : 1) - (i2.indexOf(params.search) > -1 ? 0 : 1)).collect(Collectors.toList())",
"params" : {
"search": "dog"
}
}
}
}
}
This will return something like the following; as you can see, the sorted_tags array contains the terms in the order you expect.
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"title": "Something about Dog Food"
},
"fields": {
"sorted_tags": [
"dogs",
"dogfood",
"articles"
]
}
}
]
}
}
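The reordering logic can be checked locally with a Python sketch (move_matches_first is a hypothetical name, not part of any Elasticsearch API). Sorting by a key (0 for tags containing the search text, 1 otherwise) avoids writing a two-argument comparator, which must return 0 for ties to be consistent. Python's sorted() is stable, so tags keep their original relative order within each group.

```python
from typing import List


def move_matches_first(tags: List[str], search: str) -> List[str]:
    # Key 0 for tags that contain the search text, 1 for the rest; the sort is stable.
    return sorted(tags, key=lambda tag: 0 if search in tag else 1)


print(move_matches_first(["articles", "dogs", "dogfood"], "dog"))  # ['dogs', 'dogfood', 'articles']
```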

Item variants in ElasticSearch

What is the best way to model item variants in Elasticsearch and retrieve only one item per variant group?
For example, let's say I have the following items:
[{
"sku": "abc-123",
"group": "abc",
"color": "red",
"price": 10
},
{
"sku": "def-123",
"group": "def",
"color": "red",
"price": 10
},
{
"sku": "abc-456",
"group": "abc",
"color": "black",
"price": 20
}
]
The first item and the last one are in the same group, so if I query, for example, for items below the price of 20, I want to return only one of them: the one with the best hit score.
Feel free to suggest documents design and queries accordingly.
If your mapping uses the nested datatype (here the variants are stored in a nested field called childs), you can use this query to retrieve them:
GET index/type/_search
{
"size": 2000,
"_source": false,
"query": {
"bool": {
"filter": {
"nested": {
"path": "childs",
"query": {
"bool": {
"filter": {
"term": {
"childs.group.keyword": "abc"
}
}
}
},
"inner_hits": {}
}
}
}
}
}
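The query above does not cover the question's requirement of keeping only the best-scoring variant per group. As a swapped-in alternative, here is a client-side Python sketch that post-processes hits. best_per_group is a hypothetical helper, and the scores below are made up; in practice they would be each hit's _score. It only sees the hits that were actually returned, so it does not change hit counts or pagination.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


def best_per_group(hits: List[Tuple[Item, float]]) -> List[Item]:
    """Keep only the highest-scoring hit of each variant group, ordered by score."""
    best: Dict[str, Tuple[Item, float]] = {}
    for item, score in hits:
        group = item["group"]
        if group not in best or score > best[group][1]:
            best[group] = (item, score)
    return [item for item, _ in sorted(best.values(), key=lambda pair: pair[1], reverse=True)]


items = [
    {"sku": "abc-123", "group": "abc", "color": "red", "price": 10},
    {"sku": "def-123", "group": "def", "color": "red", "price": 10},
    {"sku": "abc-456", "group": "abc", "color": "black", "price": 20},
]
# Hypothetical scores standing in for each hit's _score.
hits = [(items[0], 1.2), (items[1], 0.8), (items[2], 1.5)]
print([i["sku"] for i in best_per_group(hits)])  # ['abc-456', 'def-123']
```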

Return results based on the sequence of criteria in an Elasticsearch query

I would like the results to be in the same order as the criteria in the query. To elaborate, consider the following example.
I have documents with an ID field, which look somewhat like this:
[
{
"ID": 102,
"Name": "Mark"
},
{
"ID": 104,
"Name": "Pete"
},
{
"ID": 101,
"Name": "Su"
},
{
"ID": 107,
"Name": "Kate"
},
{
"ID": 106,
"Name": "Roger"
}
]
And my query is:
{
"query": {
"bool": {
"should": [
{
"match":
{
"ID": "101"
}
},
{
"match":
{
"ID": "104"
}
},
{
"match":
{
"ID": "107"
}
},
{
"match":
{
"ID": "102"
}
},
{
"match":
{
"ID": "106"
}
}
]
}
}
}
Now I'm expecting the results to be in the same order as the search criteria, i.e.:
{
"ID": 101,
"Name": "Su"
},
{
"ID": 104,
"Name": "Pete"
},
{
"ID": 107,
"Name": "Kate"
},
{
"ID": 102,
"Name": "Mark"
},
{
"ID": 106,
"Name": "Roger"
}
Been trying to figure out a way to do this. Is it even possible?
Any help appreciated.
I don't think you can achieve this kind of scoring mechanism just by using queries or filters.
How do those IDs get into your code when your query builder builds the query? Do they come in the same order in which you want to sort the documents?
If yes, then I would suggest enriching your query with function score or script-based scoring logic. You can use the Groovy sandbox or MVEL.
{
"query": {"filtered": {
"query": {"function_score": {
"query": {"match_all": {}},
"functions": [
{"script_score": {
"params": {
"id_list":[101,104,107,102,106]
},
"script": "id_list.length - id_list.indexOf(doc['ID'].value)"
}}
]
}}
}}
}
Scripting in elasticsearch.
Function score in elastic
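Think of it as passing an array that contains the IDs, in the order in which you want the results returned, as params to the script_score inside the function_score query.
The script looks up each document's ID (from its doc fields) in that array and assigns it a score of array length minus index, so IDs earlier in the array get higher scores. Be aware that a document whose ID is not in the array gets an index of -1 and therefore the highest score, so you may want to restrict the query to the listed IDs.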
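The scoring rule (array length minus index) can be tried locally with a Python sketch. order_by_id_list is a hypothetical name, and this is not the answer's JavaScript script. Unlike the script, it skips documents whose ID is not in the list, because Python's list.index raises an error instead of returning -1.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def order_by_id_list(docs: List[Doc], id_list: List[int]) -> List[Doc]:
    """Score each listed document as len(id_list) - index, then sort by score, highest first."""
    scored = []
    for doc in docs:
        if doc["ID"] not in id_list:
            # Skip unlisted documents instead of giving them the top score.
            continue
        scored.append((len(id_list) - id_list.index(doc["ID"]), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


docs = [
    {"ID": 102, "Name": "Mark"},
    {"ID": 104, "Name": "Pete"},
    {"ID": 101, "Name": "Su"},
    {"ID": 107, "Name": "Kate"},
    {"ID": 106, "Name": "Roger"},
]
print([d["ID"] for d in order_by_id_list(docs, [101, 104, 107, 102, 106])])  # [101, 104, 107, 102, 106]
```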
I think this will pretty much solve your problem. It is not a concrete solution but a workaround.
Note: be careful, my cluster is configured to allow inline script execution and its default script language is set to JavaScript. If you follow this approach, you may have to check your cluster configuration as well.
