Elasticsearch 2.x, query for tag, and sort results by tag weigth - elasticsearch

i am using elasticsearch 2.3
i have an index of books. each book has tag, and each tag has weight.
i want to get all books that have the requested tag, sorted by tag weight.
for example:
PUT book/book/0
{
"name": "book 0",
"tags": [
{"t": "comedy", "w": 30},
{"t": "drama","w": 20},
]
}
PUT book/book/1
{
"name": "book 1",
"tags": [
{"t": "comedy", "w": 10},
{"t": "drama","w": 5},
{"t": "other","w": 50},
]
}
PUT book/book/2
{
"name": "book 2",
"tags": [
{"t": "comedy", "w": 5},
{"t": "drama","w": 30},
]
}
PUT book/book/3
{
"name": "book 3",
"tags": [
{"t": "comedy", "w": 5},
{"t": "other","w": 30},
]
}
i want to search for all books that has tags comedy and drama.
the result order is:
book 0 (20+30)
book 2 (30+5)
book 1 (10+5)
UPDATE:
i want to return only books that match both tags (and sort only by requested tags). so if i search for 'drama' and 'comedy', only books that has both tags will return (in this case book 0, book 1, book2), sorted by requested tag weights.
how can i get this? any example for query?

Ibrahim's answer is correct if you always want to sum up all weights, even for tags that don't match your query.
If you only want to take the weights of tags into account that you're searching for, you'll have to index tags as a nested object. This is because otherwise all ts and ws are flattened into lists, losing the associations in the process (described here).
Then you can use a function_score query wrapped in a nested query to sum up only the weights of the matching tags. You will have to enable scripting.
Here is an example:
GET /book/_search
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"filter": [
{
"terms": {
"tags.t": [
"comedy",
"drama"
]
}
}
]
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
},
"score_mode": "sum"
}
}
}
=== EDIT following #Eyal Ch's comment ===
If only books matching BOTH tags (comedy and drama in the example) are to be returned, it gets a bit more complicated, as each search term needs its own nested query.
Here's an example:
GET /book/_search
{
"query": {
"bool": {
"must":
[
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"term": {
"tags.t": {
"value": "comedy"
}
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
}
}
},
{
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"term": {
"tags.t": {
"value": "drama"
}
}
},
"functions": [
{
"script_score": {
"script": "return doc['tags.w'].value"
}
}
],
"boost_mode": "replace"
}
}
}
}
]
}
}
}

Try this:
POST book/book/_search
{
"query": {
"match": {
"tags.t": "comedy drama"
}
},
"sort": [
{
"tags.w": {
"order": "desc",
"mode": "sum"
}
}
]
}

Related

Is there a way to build an Elastic query with changing search values?

I want to use Elastic in PHP to process a search request from my website. For example, I have the search parameter
name
age
height
weight.
But it should not be necessary to always search for all parameters.
So it could be that only (name AND age) have values and (height AND weight) have not.
Is there a way to build one query with flexible/changing input values?
The query below would not work when there are no search values for (height AND weight).
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
]
}
}
}
Search templates to the rescue:
POST _scripts/my-search-template
{
"script": {
"lang": "mustache",
"source": """
{
"query": {
"bool": {
"should": [
{{#name}}
{ "match": { "name.keyword": "{{name}}" } },
{{/name}}
{{#age}}
{ "match": { "age": "{{age}}" } },
{{/age}}
{{#height}}
{ "match": { "height": "{{height}}" } },
{{/height}}
{{#weight}}
{ "match": { "weight": "{{weight}}" } },
{{/weight}}
{ "match_none": { } }
]
}
}
}
"""
}
}
Note that since you don't know how many criteria you have, the last condition is always false and is only there to make sure the JSON is valid (i.e. the last comma doesn't stay dangling)
You can then run your query like this:
POST my-index/_search/template
{
"id": "my-search-template",
"params": {
"name": "Anna",
"age": 30
}
}
You need to handle in your application that constructs your Elasticsearch query and its very easy to do it in the application as you know what all search parameter value you got from UI, if they are not null than only includes those fields in your Elasticsearch query.
Elasticsearch doesn't support if...else like condition in query.
Tldr;
They are multiple way to address your problem in Elasticsearch.
You could be playing with the parameter minimum_should_match
You could be using template queries with conditions.
You could also perform more complex bool queries, that enumerate the possibilities for a match.
You could also use scripts to program the logic you want to see.
Minimum should match
POST /_bulk
{"index":{"_index":"73121817"}}
{"name": "ana", "age": 1, "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "jack", "height": 180, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "emma", "age": 1, "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "william", "age": 1, "height": 180}
{"index":{"_index":"73121817"}}
{"name": "jenny", "weight": 70}
{"index":{"_index":"73121817"}}
{"name": "marco", "age": 1}
{"index":{"_index":"73121817"}}
{"name": "giulia", "height": 180}
{"index":{"_index":"73121817"}}
{"name": "paul"}
GET 73121817/_search
{
"query": {
"bool": {
"should": [
{ "match": { "name.keyword": "Anna" } },
{ "match": { "age": "30" } },
{ "match": { "height": "180" } },
{ "match": { "weight": "70" } }
],
"minimum_should_match": 2
}
}
}
with the minimum should match set to 2 only 2 documents are returned ana and jack
Template queries
Well Val's answer is quite complete
You could also refer to the doc
Complex queries
Refer to the so post behind the link
Scripted queries
GET 73121817/_search
{
"query": {
"bool": {
"filter": {
"script": {
"script": """
return (!doc["name.keyword"].empty && !doc["age"].empty);
"""
}
}
}
}
}

Search for documents matching all terms in a nested array Elasticsearch

I am learning to use Elasticsearch as a basic recommender engine.
My elasticsearch document contains records with nested entities as follows
PUT recs/user/1
{
"name" : "Brad Pitt",
"movies_liked": [
{
"name": "Forrest Gump",
"score": 1
},
{
"name": "Terminator",
"score": 4
},
{
"name": "Rambo",
"score": 4
},
{
"name": "Rocky",
"score": 4
},
{
"name": "Good Will Hunting",
"score": 2
}
]
}
PUT recs/user/2
{
"name" : "Tom Cruise",
"movies_liked": [
{
"name": "Forrest Gump",
"score": 2
},
{
"name": "Terminator",
"score": 1
},
{
"name": "Rocky IV",
"score": 1
},
{
"name": "Rocky",
"score": 1
},
{
"name": "Rocky II",
"score": 1
},
{
"name": "Predator",
"score": 4
}
]
}
I would like to search for users who specifically like "Forrest Gump","Terminator" and "Rambo".
I have used a nested query which currently looks like this
POST recs/user/_search
{
"query": {
"nested": {
"path": "movies_liked",
"query": {
"terms": {
"movies_liked.name": ["Forrest Gump","Terminator","Rambo"]
}
}
}
}
}
However when I execute this search, I expected to see only the first record which has all the required terms, but in the results I am getting both the records. In the second record the user clearly does not have "Rambo" in his liked list. I understand that this query is doing an "OR" operation with the given terms, How do I tweak this query to do an "AND" operation so that only the records having all the terms get matched?
How do I tweak this query to do an "AND" operation so that only the records having all the terms get matched?
By using a bool query:
POST recs/user/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "movies_liked",
"query": {
"bool": {
"must": [
{
"terms": {
"movies_liked.name": [
"Forrest Gump"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "movies_liked",
"query": {
"bool": {
"must": [
{
"terms": {
"movies_liked.name": [
"Terminator"
]
}
}
]
}
}
}
},
{
"nested": {
"path": "movies_liked",
"query": {
"bool": {
"must": [
{
"terms": {
"movies_liked.name": [
"Rambo"
]
}
}
]
}
}
}
}
]
}
}
}
Note that bool wraps around several nested queries, not the other way around. It is important because the scope of a nested query is the nested document, because it basically a hidden separate object.
Hope that helps!

Elasticsearch must_not filter not works with a big bunch of values

I have the next query that include some filters:
{
"from": 0,
"query": {
"function_score": {
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"idpais": [
115
]
}
},
{
"term": {
"tipo": [
1
]
}
}
],
"must_not": [
{
"term": {
"idregistro": [
5912471,
3433876,
9814443,
11703069,
6333176,
8288242,
9924922,
6677850,
11852501,
12530205,
4703469,
12776479,
12287659,
11823679,
12456304,
12777457,
10977614,
...
]
}
}
]
}
},
"query": {
"bool": {
"should": [
{
"match_phrase": {
"area": "Coordinator"
}
},
{
"match_phrase": {
"company": {
"boost": 5,
"query": "IBM"
}
}
},
{
"match_phrase": {
"topic": "IT and internet stuff"
}
},
{
"match_phrase": {
"institution": {
"boost": 5,
"query": "University of my city"
}
}
}
]
}
}
}
},
"script_score": {
"params": {
"idpais": 115,
"idprovincia": 0,
"relationships": []
},
"script_id": "ScoreUsuarios"
}
}
},
"size": 24,
"sort": [
{
"_script": {
"order": "desc",
"script_id": "SortUsuarios",
"type": "number"
}
}
]
}
The must_not filter has a big bunch of values to exclude (around 200 values), but it looks like elasticsearch ignores those values and it includes on the result set. If I try to set only a few values (10 to 20 values) then elasticsearch applies the must_not filter.
Exists some restriction a bout the amount of values in the filters? Exists some way to remove a big amount of results from the query?
terms query is used for passing a list of values not term query.You have to use it like below in your must filter.
{
"query": {
"terms": {
"field_name": [
"VALUE1",
"VALUE2"
]
}
}
}

Elasticsearch additional boost if multiple conditions are met

Imagine I have a document, which looks like this:
{
"Title": "Smartphones in United Kingdom",
"Text": "A huge text about the topic",
"CategoryTags": [
{
"CategoryID": 1,
"CategoryName": "Smartphone"
},
{
"CategoryID": 2,
"CategoryName": "Apple"
},
{
"CategoryID": 3,
"CategoryName": "Samsung"
}
],
"GeographyTags": [
{
"GeographyID": 1,
"GeographyName": "Western Europe"
},
{
"GeographyID": 2,
"GeographyName": "United Kingdom"
}
]
}
CategoryTags and GeographyTags are stored as nested subdocuments.
I'd be looking for "apple united kingdom" in my search bar. How'd I make a query that would boost this document if it has both matching category and geography at the same time?
I was thinking of multi_match query, but I didn't figure out how would I deal with nested documents here...
I was thinking of nesting must into should statement. Would that make any sense?
POST /_search
{
"template": {
"size": "50",
"_source": {
"include": "Title"
},
"query": {
"filtered": {
"query": {
"bool": {
"minimum_number_should_match": "2<50%",
"must": [
{
"match": {
"Text": {
"query": "{{SearchPhrase}}"
}
}
}
],
"should": [
{
"match": {
"Title": {
"query": "{{SearchPhrase}}",
"type": "phrase",
"boost": "20"
}
}
},
{
"bool": {
"must": [
{
"nested": {
"path": "CategoryTags",
"query": {
"match": {
"CategoryTags.CategoryName": "{{SearchPhrase}}"
}
}
}
},
{
"nested": {
"path": "GeographyTags",
"query": {
"match": {
"GeographyTags.GeographyName": "{{SearchPhrase}}"
}
}
}
}
]
}
}
]
}
}
}
}
}
}

Bool AND search in properties in ElasticSearch

I've got a very small dataset of documents put in ES :
{"id":1, "name": "John", "team":{"code":"red", "position":"P"}}
{"id":2, "name": "Jack", "team":{"code":"red", "position":"S"}}
{"id":3, "name": "Emily", "team":{"code":"green", "position":"P"}}
{"id":4, "name": "Grace", "team":{"code":"green", "position":"P"}}
{"id":5, "name": "Steven", "team":[
{"code":"green", "position":"S"},
{"code":"red", "position":"S"}]}
{"id":6, "name": "Josephine", "team":{"code":"red", "position":"S"}}
{"id":7, "name": "Sydney", "team":[
{"code":"red", "position":"S"},
{"code":"green", "position":"P"}]}
I want to query ES for people who are in the red team, with position P.
With the request
curl -XPOST 'http://localhost:9200/teams/aff/_search' -d '{
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}'
I've got a wrong result.
ES gives
"name": "John",
"team":
{ "code": "red", "position": "P" }
and
"name": "Sydney",
"team":
[
{ "code": "red", "position": "S"},
{ "code": "green", "position": "P"}
]
For the last entry, ES took the property code=red in the first record and took the property position=P in the second record.
How can I specify that the search must match the 2 two terms in the same record (within or not a list of nested records) ?
In fact, the good answer is only the document 1, with John.
Here is the gist that creates the dataset :
https://gist.github.com/flrt/4633ef59b9b9ec43d68f
Thanks in advance
When you index document like
{
"name": "Sydney",
"team": [
{"code": "red", "position": "S"},
{"code": "green","position": "P"}
]
}
ES implicitly create inner object for your field (team in particular example) and flattens it to structure like
{
'team.code': ['red', 'green'],
'team.position: ['S', 'P']
}
So you lose your order. To avoid this you need explicitly put nested mapping, index your document as always and query them with nested query
So, this
PUT so/nest/_mapping
{
"nest": {
"properties": {
"team": {
"type": "nested"
}
}
}
}
PUT so/nest/
{
"name": "Sydney",
"team": [
{
"code": "red",
"position": "S"
},
{
"code": "green",
"position": "P"
}
]
}
GET so/nest/_search
{
"query": {
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{
"match": {
"team.code": "red"
}
},
{
"match": {
"team.position": "P"
}
}
]
}
}
}
}
}
will result with empty hits.
Further reading on relation management: https://www.elastic.co/blog/managing-relations-inside-elasticsearch
You can use a Nested Query so that your searches happen individually on the subdocuments in the team array, rather than across the entire document.
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "team",
"query": {
"bool": {
"must": [
{ "match": { "team.code": "red" } },
{ "match": { "team.position": "P" } }
]
}
}
}
}
]
}
}
}

Resources