Elasticsearch - Filter on string array and then aggregate on relevant keywords only

I have an index with an attribute containing a list of keywords.
Let's say my documents look like this:
{
"product_name": "Iphone",
"keywords" : ["Best seller", "Apple", "Black", "Awesome"]
}
{
"product_name": "Galaxy S21",
"keywords" : ["Awesome", "Android"]
}
I want to let my users get autocompletions on the keywords (like suggestions), but I also want to run aggregations on the suggestions so they know how many documents match each one.
So if a user types "A", we should return 3 results :
{"expression": "Android", "count": 1}
{"expression": "Apple", "count": 1}
{"expression": "Awesome", "count": 2}
"Best seller" / "Black" should not be returned as results by Elasticsearch.
There's no mapping constraint.
I've tried queries like the one below, but unexpected keywords are returned in the aggregations:
{
"query": {
"multi_match": {
"query": "a",
"fields": ["keywords"],
"type": "bool_prefix"
}
},
"size": 0,
"aggs": {
"matched_keywords": {
"terms": {
"field": "keywords",
"size": 10
}
}
}
}
Any solution / advice would be helpful.
Thanks.
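To see where the extra keywords come from: the query only decides which documents match, and the terms aggregation then counts every keyword of every matched document, so "Best seller" and "Black" ride along with "Apple". The pure-Python sketch below (no Elasticsearch client involved; the case-insensitive prefix test is an assumption about the desired matching) contrasts that behavior with filtering at the bucket level, which is the idea behind the terms aggregation's include parameter.

```python
from collections import Counter

# The two sample documents from the question.
DOCS = [
    {"product_name": "Iphone", "keywords": ["Best seller", "Apple", "Black", "Awesome"]},
    {"product_name": "Galaxy S21", "keywords": ["Awesome", "Android"]},
]


def starts_with(keyword, prefix):
    # Assumed matching rule: case-insensitive prefix test.
    return keyword.lower().startswith(prefix.lower())


def naive_terms_agg(docs, prefix):
    # Query step: keep documents having at least one matching keyword.
    matched = [d for d in docs if any(starts_with(k, prefix) for k in d["keywords"])]
    # Aggregation step: count documents per keyword for ALL keywords of the matched docs.
    return Counter(k for d in matched for k in set(d["keywords"]))


def filtered_terms_agg(docs, prefix):
    # Only keywords that themselves match the prefix become buckets.
    counts = Counter(k for d in docs for k in set(d["keywords"]) if starts_with(k, prefix))
    return [{"expression": k, "count": c} for k, c in sorted(counts.items())]


print(dict(naive_terms_agg(DOCS, "A")))  # includes "Best seller" and "Black"
print(filtered_terms_agg(DOCS, "A"))
```

The second function returns exactly the three suggestions listed in the question, with Awesome counted twice.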

Related

Elastic Search Filter on the result of terms aggregation

Apply a match phrase prefix query on the result of a terms aggregation in Elasticsearch.
I have a terms aggregation, and the result looks something like this:
"buckets": [
{
"key": "KEY",
"count": 20
},
{
"key": "LOCK",
"count": 30
}
]
Now the requirement is to keep only the buckets whose key starts with a certain prefix, similar to a match phrase prefix query. For example, if the input is "LOC", only one bucket (the second one) should be returned. So effectively it's a filter on the terms aggregation. Thanks for your thoughts.
You could use the include parameter on your terms aggregation to keep only the values that match a regular expression.
Something like this should work:
GET stackoverflow/_search
{
"_source": false,
"aggs": {
"groups": {
"terms": {
"field": "text.keyword",
"include": "LOC.*"
}
}
}
}
For example, say you have three documents in an index with three different terms (LOCK, KEY and LOL). If you perform the following request:
GET stackoverflow/_search
{
"_source": false,
"aggs": {
"groups": {
"terms": {
"field": "text.keyword",
"include": "L.*"
}
}
}
}
You'll get the following buckets:
"buckets" : [
{
"key" : "LOCK",
"doc_count" : 1
},
{
"key" : "LOL",
"doc_count" : 1
}
]
Hope it is helpful.
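A rough Python emulation of what include does to the buckets, for illustration only. It assumes simple patterns like the ones above, where Python's re syntax and the Lucene regular expression syntax used by Elasticsearch agree. Lucene regexps must match the whole term, which is why the sketch uses re.fullmatch and why the pattern is "L.*" rather than just "L".

```python
import re

# Buckets a terms aggregation might produce before include is applied
# (the KEY / LOCK / LOL example from the answer).
BUCKETS = [
    {"key": "KEY", "doc_count": 1},
    {"key": "LOCK", "doc_count": 1},
    {"key": "LOL", "doc_count": 1},
]


def apply_include(buckets, pattern):
    # The include regex has to match the entire term, so use fullmatch, not search.
    regex = re.compile(pattern)
    return [b for b in buckets if regex.fullmatch(b["key"])]


print([b["key"] for b in apply_include(BUCKETS, "L.*")])    # ['LOCK', 'LOL']
print([b["key"] for b in apply_include(BUCKETS, "LOC.*")])  # ['LOCK']
```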

Can one document come into two buckets?

In Elasticsearch, I have a list of documents, and each document contains a field type (possible values for type are 1, 2, 3, 4 and 5). Now I want to create two buckets:
one containing the documents whose type field value is 1, and
one containing all the documents (including type 1).
Is this possible in Elasticsearch? If yes, how?
I searched the internet but did not find anything helpful.
Here is the document structure:
"_source": { "city": "Ahmadabad",
"pId": "A1332605",
"sellerType": 1,
"seller": "Dealer",
"makeId": 7,
"makeName": "ABC",
"modelId": 673,
"type": 1
},
"_source": { "city": "Surat",
"pId": "A265843",
"sellerType": 1,
"seller": "Dealer",
"makeId": 45,
"makeName": "XYZ",
"modelId": 520,
"type": 2
}
I copied this request from a visualization that Kibana made; it should work just the same. I picked one of your integer fields, sellerType; change it if you need something else.
{
"query": {
// your query
},
"size": 0,
"_source": {
"excludes": []
},
"aggs": {
"2": {
"filters": {
"filters": {
"filter_for_specific": {
"query_string": {
"query": "sellerType: 1",
"analyze_wildcard": true
}
},
"filter_for_existing": {
"query_string": {
"query": "sellerType: *",
"analyze_wildcard": true
}
}
}
}
}
}
}
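The reason one document can land in both buckets is that a filters aggregation evaluates each named filter independently against every document. A small Python emulation of that behavior, assuming the question's type field instead of the sellerType field used in the request above:

```python
# Two sample documents from the question, reduced to the fields used here.
DOCS = [
    {"pId": "A1332605", "sellerType": 1, "type": 1},
    {"pId": "A265843", "sellerType": 1, "type": 2},
]

# One predicate per named bucket, mirroring the two filters in the request,
# but applied to the question's "type" field.
FILTERS = {
    "filter_for_specific": lambda doc: doc.get("type") == 1,
    "filter_for_existing": lambda doc: "type" in doc,
}


def filters_agg(docs, filters):
    # Every filter is checked against every document independently,
    # so a single document can be counted in more than one bucket.
    return {name: sum(1 for doc in docs if pred(doc)) for name, pred in filters.items()}


print(filters_agg(DOCS, FILTERS))  # {'filter_for_specific': 1, 'filter_for_existing': 2}
```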

Sort keyword field array within ElasticSearch document by relevance

I've got an ElasticSearch index that looks something like this:
{
"mappings": {
"article": {
"properties": {
"title": { "type": "string" },
"tags": {
"type": "keyword"
}
}
}
}
}
And data that looks something like this:
{ "title": "Something about Dogs", "tags": ["articles", "dogs"] },
{ "title": "Something about Cats", "tags": ["articles", "cats"] },
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
If I search for dog, I get the first and third documents, as I'd expect. And I can weight the search documents the way I like (in reality, I'm using a function_score query to weight on a bunch of fields irrelevant to this question).
What I'd like to do is sort the tags field so that the most relevant tags are returned first, without affecting the sort order of the documents themselves. So I'm hoping for a result like this:
{ "title": "Something about Dog Food", "tags": ["dogs", "dogfood", "articles"] }
Instead of what I get now:
{ "title": "Something about Dog Food", "tags": ["articles", "dogs", "dogfood"] }
The documentation on sort and function score doesn't cover my case. Any help appreciated. Thanks!
You cannot sort the _source of a document (your array of tags) by how well each value matches. One way to do this is with nested fields and inner_hits, which let you sort the matching nested values.
My suggestion is to turn your tags into a nested field (I chose keyword there for simplicity, but you can also use text with the analyzer of your choice):
PUT test
{
"mappings": {
"article": {
"properties": {
"title": {
"type": "string"
},
"tags": {
"type": "nested",
"properties": {
"value": {
"type": "keyword"
}
}
}
}
}
}
}
And use this kind of query:
GET test/_search
{
"_source": {
"exclude": "tags"
},
"query": {
"bool": {
"must": [
{
"match": {
"title": "dogs"
}
},
{
"nested": {
"path": "tags",
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"match": {
"tags.value": "dogs"
}
}
]
}
},
"inner_hits": {
"sort": {
"_score": "desc"
}
}
}
}
]
}
}
}
Here you match the tags nested field value against the same text you match on title. Then, with inner_hits sorting, you can sort the nested values by their inner score.
@Val's suggestion is very good, but it only works as long as, for your "relevant tags", you are fine with simple substring matching (i1.indexOf(params.search)). Its biggest advantage is that you don't have to change the mapping.
My solution's big advantage is that you actually use Elasticsearch's real search capabilities to determine the "relevant" tags. The drawback is that you need a nested field instead of a regular, simple keyword field.
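With the nested mapping, each tag has to be indexed as an object with a value property rather than as a bare string, so the question's documents need reshaping before indexing. A minimal Python helper for that, hypothetical and not part of the answer:

```python
def to_nested_tags(doc):
    """Return a copy of doc with tags reshaped for the nested mapping.

    A plain list such as ["articles", "dogs"] becomes
    [{"value": "articles"}, {"value": "dogs"}], matching tags.value.
    """
    converted = dict(doc)
    converted["tags"] = [{"value": tag} for tag in doc.get("tags", [])]
    return converted


doc = {"title": "Something about Dogs", "tags": ["articles", "dogs"]}
print(to_nested_tags(doc))
```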
What you get from a search call are the source documents. The documents in the response are returned in exactly the same form as when you indexed them, which means that if you indexed ["articles", "dogs", "dogfood"], you'll always get that array in that unaltered form.
One way to get around this is to declare a script_field that applies a small script to sort your array and return the result of that sort.
The script simply moves the terms that contain the search term to the front of the list:
{
"_source": ["title"],
"query" : {
"match_all": {}
},
"script_fields" : {
"sorted_tags" : {
"script" : {
"lang": "painless",
"source": "return params._source.tags.stream().sorted((i1, i2) -> i1.indexOf(params.search) > -1 ? -1 : 1).collect(Collectors.toList())",
"params" : {
"search": "dog"
}
}
}
}
}
This returns something like the following; as you can see, the sorted_tags array contains the terms in the expected order.
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"title": "Something about Dog Food"
},
"fields": {
"sorted_tags": [
"dogfood",
"dogs",
"articles"
]
}
}
]
}
}
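One caveat on the script above: its comparator returns -1 whenever the first argument contains the search term, so for two matching tags both compare(a, b) and compare(b, a) return -1. That is not a consistent ordering, so the relative order of matching tags depends on the sort implementation (here dogfood ends up before dogs). A stable partition gives a predictable result. The Python sketch below only illustrates that intended ordering; it is not Painless.

```python
def move_matches_to_front(tags, search):
    # sorted() is stable: the key is False for tags containing the search
    # string and True otherwise, and False sorts first, so matching tags come
    # first and both groups keep their original relative order.
    return sorted(tags, key=lambda tag: search not in tag)


print(move_matches_to_front(["articles", "dogs", "dogfood"], "dog"))
# ['dogs', 'dogfood', 'articles']
```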

Elasticsearch - bump individual result to the top

I'm working with Elasticsearch. I have an array of documents, and I'm trying to sort them by their price property, except that I'd like one particular document to be the first result no matter what.
Below is the "sort" array I'm using in an attempt to put document 1213 first, followed by all other documents ordered by price ascending.
[
{
"id": {
"mode": "max",
"order": "desc",
"nested_filter": {
"term": {
"id": 1213
}
},
"missing": "_last"
}
},
{
"price": {
"order": "asc"
}
}
]
This doesn't appear to be working, though: document 1213 doesn't appear first. What am I doing wrong here?
As an example, this is the ideal result:
[{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
Instead, I get:
[{"id": 1000, "name": "Green Sunglasses", "price": 2},
{"id": 1031, "name": "Purple Sunglasses", "price": 4},
{"id": 1213, "name": "Blue Sunglasses", "price": 12},
{"id": 5923, "name": "Yellow Sunglasses", "price": 18}]
As others have already asked, what is the reason for the nested_filter?
There are many possible ways to do what you need. Here is one that fits the simple requirements you've mentioned so far:
{
"query" : {
"custom_filters_score" : {
"query" : {
"match_all" : {}
},
"filters" : [
{
"filter" : {
"term" : {
"id" : "1213"
}
},
"boost" : 2
}
]
}
},
"sort" : [
"_score",
"price"
]
}
The assumption here is that your query is simple, like the match_all query, and does not affect the scores in any way. If your query is more complicated, you can wrap it in a constant_score query so that it does not affect the scores. Ideally, you get the document set you want with all documents having the same score, and then the custom_filters_score query boosts the score of the document you want. You can do this for any number of documents by adding further filters or, if they should all get the same boost, by using a terms filter. Finally, sort by score and then by price. Note that custom_filters_score was removed in Elasticsearch 1.0 in favor of function_score.
In this case you need to use function_score to modify the score of each document:
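A Python sketch of the "score first, then price" ordering this answer relies on. It assumes, as the answer does, that match_all gives every document the same base score (taken here as 1.0) and that the matching filter multiplies that score by its boost of 2:

```python
# The four documents from the question.
DOCS = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]


def boosted_score(doc, boosted_id=1213, boost=2.0):
    # Assumption: equal base score for every document; the filter boosts one.
    base = 1.0
    return base * boost if doc["id"] == boosted_id else base


def sort_by_score_then_price(docs):
    # "_score" sorts descending and "price" ascending, so negate the score
    # to express both directions in one ascending tuple key.
    return sorted(docs, key=lambda doc: (-boosted_score(doc), doc["price"]))


print([doc["id"] for doc in sort_by_score_then_price(DOCS)])  # [1213, 1000, 1031, 5923]
```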
{
"query": {
"function_score": {
"functions": [
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
},
{
"script_score": {
"script": "(1.0 / doc['price'].value)"
}
}
],
"score_mode": "sum",
"boost_mode" : "replace",
"query" : {
//YOUR QUERY GOES HERE
}
}
}
}
Explanation:
{
"script_score": {
"script": "(1.0 / doc['price'].value)"
}
}
This computes a score from the price that is at most 1 (for prices of at least 1): the higher the price, the smaller the score, so sorting by score gives ascending price. Writing 1.0 rather than 1 keeps the division in floating point; with an integer price field, 1 / price truncates to 0 in Painless. If you want descending price instead, replace it with
"script": "(1 - (1.0 / doc['price'].value))"
{
"filter": {
"term": {
"id": "1213"
}
},
"weight": 1
}
This gives any document with "id" = 1213 an extra score of 1. The final score is the sum of those two functions.
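The scoring arithmetic can be checked with a short Python sketch that mirrors score_mode "sum" with boost_mode "replace" on the question's four documents. It assumes prices of at least 1, so 1.0 / price never exceeds the pinned weight of 1:

```python
# The four documents from the question.
DOCS = [
    {"id": 1000, "name": "Green Sunglasses", "price": 2},
    {"id": 1031, "name": "Purple Sunglasses", "price": 4},
    {"id": 1213, "name": "Blue Sunglasses", "price": 12},
    {"id": 5923, "name": "Yellow Sunglasses", "price": 18},
]


def function_score(doc, pinned_id=1213):
    # score_mode "sum" + boost_mode "replace": the final score is the sum of
    # the function results, and the query score is discarded.
    pinned = 1.0 if doc["id"] == pinned_id else 0.0  # filter on id with "weight": 1
    by_price = 1.0 / doc["price"]                     # script_score: 1.0 / price
    return pinned + by_price


ranked = sorted(DOCS, key=function_score, reverse=True)
print([doc["id"] for doc in ranked])  # [1213, 1000, 1031, 5923]
```

Document 1213 scores 1 + 1/12, about 1.083, while every other document stays below 1, so it always sorts first.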

Elasticsearch terms query: less tags - more relevance

I've got an index with products. Each product has its own tags, like this:
{
"_id": 1,
"tags": ["red", "blue", "green"]
...
},
{
"_id": 2,
"tags": ["red", "blue"]
...
},
{
"_id": 3,
"tags": ["red"]
...
}
How do I create a terms query against the "tags" field that takes into account how many tags each product has? I'd like to see the following results for a query on the "red" tag:
1: _id=3
2: _id=2
3: _id=1
You can use a script-based sort. Because the script returns a count, use the number sort type (a string sort would put 10 before 2):
{
"query": {
"match_all": {}
},
"sort": {
"_script": {
"script": "doc.containsKey('tags') == false ? 0 : doc['tags'].values.size()",
"order": "asc",
"type": "number"
}
}
}
Security regarding scripts has changed in the more recent versions of ElasticSearch, for more information take a look at this page:
http://www.elasticsearch.org/blog/scripting-security/
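A pure-Python sketch of the intended result. The answer's request uses match_all as a placeholder, so the sketch adds the "red" tag filter the question asks for, then orders by tag count ascending like the script sort, counting a missing tags field as 0:

```python
# The three products from the question.
PRODUCTS = [
    {"_id": 1, "tags": ["red", "blue", "green"]},
    {"_id": 2, "tags": ["red", "blue"]},
    {"_id": 3, "tags": ["red"]},
]


def rank_by_tag_count(products, tag):
    # Filter step: keep products carrying the tag (the terms query on "tags").
    matching = [p for p in products if tag in p.get("tags", [])]
    # Sort step: fewest tags first, like the ascending script sort on
    # doc['tags'].values.size(), with a missing field counted as 0.
    return sorted(matching, key=lambda p: len(p.get("tags", [])))


print([p["_id"] for p in rank_by_tag_count(PRODUCTS, "red")])  # [3, 2, 1]
```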
