elastic search: use terms in nested field - elasticsearch

I have a lot of documents with data that looks like this:
"paymentMethods": [
{
"id": 194,
"name": "Wire",
"logo": "wire.gif"
}, {
"id": 399,
"name": "Paper Check",
"logo": "papercheck.gif"
}
Mapping:
"paymentMethods": {
"type": "nested",
"properties": {
"id": {
"type": "long"
},
"logo": {
"type": "string",
"index": "not_analyzed"
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
I try to get all the documents that have paymentMethos.id 399 & 194.
this query is works for me:
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"nested": {
"path": "paymentMethods",
"query": {
"bool": {
"must": [
{
"term": {
"paymentMethods.id": 399
}
}
]
}
}
}
}]
}
},
"query": {
"match_all": {}
}
}
}
the problem is that I need all the documents with id 399 & 194
so I tried it:
"must" : [
{ "terms":{"paymentMethods.id" : [399,194]} }
]
but the result is kind of OR I want it as AND.
I also tried this one but it don't work at all
"must" : [{
"term": {
"paymentMethods.id": 399
}
}, {
"term": {
"paymentMethods.id": 194
}
}]
any suggestions how can I get paymentMethods.id 399 & 194?
thanks.

well, after some digging around i found the problem, each bool clause (must,should,must_not) should have it's own nested query i.e.
{"query": {
"bool": {
"must": [
{
"nested": {
"path": "paymentMethods",
"query": {
"term" : { "paymentMethods.id":399 }
}
}
},
{
"nested": {
"path": "paymentMethods",
"query": {
"term" : { "paymentMethods.id":187 }
}
},
etc..
before i tried to search with "terms" which returns document with ANY of the match terms so i got documents with 187 or 399
the code above queries the nested hidden documents twice once for 187 and once for 399 and returns the intersection of both queries => all the documents with 187 & 399
(of-course the second query does not run on all the documents again but runs on the results of the previous filter result)

Related

full-text and knn_vector hybrid search for elastic

I am currently working on a search engine and i've started to implement semantic search. I use open distro version of elastic and my mapping look like this for the moment :
{
"settings": {
"index": {
"knn": true,
"knn.space_type": "cosinesimil"
}
},
"mappings": {
"properties": {
"title": {
"type" : "text"
},
"data": {
"type" : "text"
},
"title_embeddings": {
"type": "knn_vector",
"dimension": 600
},
"data_embeddings": {
"type": "knn_vector",
"dimension": 600
}
}
}
}
for basic knn_vector search i use this :
{
"size": size,
"query": {
"script_score": {
"query": {
"match_all": { }
},
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
and i've managed to get a, kind of, hybrid search with this :
{
"size": size,
"query": {
"function_score": {
"query": {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"script_score": {
"script": {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
}
}
}
}
The problem is that if i don't have the word in the document, then it is not returned. For example, with the first search query, when i search for trump (which is not in my dataset) i manage to get document about social network and politic. I don't have these results with the hybrid search.
I have tried this :
{
"size": size,
"query": {
"function_score": {
"query": {
"match_all": { }
},
"functions": [
{
"filter" : {
"multi_match": {
"query": query,
"fields": ["data", "title"]
}
},
"weight": 1
},
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 4
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
but the multi match part give a constant score to all documents that match and i want to use the filter to rank my document like in normal full text query. Any idea to do it ? Or should i use another strategy? Thank you in advance.
After the help of Archit Saxena here is the solution of my problems :
{
"size": size,
"query": {
"function_score": {
"query": {
"bool": {
"should" : [
{
"multi_match" : {
"query": query,
"fields": ["data", "title"]
}
},
{
"match_all": { }
}
],
"minimum_should_match" : 0
}
},
"functions": [
{
"script_score" : {
"script" : {
"source": "cosineSimilarity(params.query_value, doc[params.field1]) + cosineSimilarity(params.query_value, doc[params.field2])",
"params": {
"field1": "title_embeddings",
"field2": "data_embeddings",
"query_value": query_vec
}
}
},
"weight": 20
}
],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}

Elasticsearch query by generic properties: keywords and numeric values

I have this mapping in ES 7.9:
{
"mappings": {
"properties": {
"cid": {
"type": "keyword",
"store": true
},
"id": {
"type": "keyword",
"store": true
},
"a": {
"type": "nested",
"properties": {
"attribute":{
"type": "keyword"
},
"key": {
"type": "keyword"
},
"num": {
"type": "float"
}
}
}
}
}
}
And some documents indexed like:
{
"cid": "177",
"id": "1",
"a": [
{
"attribute": "tags",
"key": [
"heel",
"thong",
"low_heel",
"economic"
]
},
{
"attribute": "weight",
"num": 15
}
]
}
Basically, an object can have multiple attributes (a property array).
Those attributes can be different for each client. In this example, I have 2 types of attributes: tag and weight, however other documents could have other attributes like vendor, size, power, etc., so the model has to be generic enough to support beforehand unknown attributes.
An attribute can be a list of keywords (like tags) or a numeric value (like weight).
I need an ES query to fetch the documents ids with this pseudo-query:
cid="177" and (tag="flat" or tag="heel") and tag="economic" and weight<20
I managed to reach this query that seems to be working as expected:
{
"_source": ["id"],
"query": {
"bool": {
"must" : [
{"term" : { "cid" : "177" }},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"terms" : { "a.key": ["flat","heel"]}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "tags"}},
{"term" : { "a.key": "economic"}}
]
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"bool":{
"must":[
{"term" : { "a.attribute": "weight" } },
{"range": { "a.num": {"lt": 20} } }
]
}
}
}
}
]
}
}
}
Is this query correct or I am getting the correct results by chance?
Is the query (or mapping) optimal or I should rethink something?
Can the query be simplified?
The query is correct.
The mapping is great and the query is optimal.
While the query can be simplified:
{
"_source": [
"id"
],
"query": {
"bool": {
"must": [
{
"term": {
"cid": "177"
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:tags AND ((a.key:flat OR a.key:heel) AND a.key:economic)"
}
}
}
},
{
"nested": {
"path": "a",
"query": {
"query_string": {
"query": "a.attribute:weight AND a.num:<20"
}
}
}
}
]
}
}
}
it'd be less optimal due to the fact that these query_strings would still need to be internally compiled into essentially the query DSL that you've got above. Plus you'd still be needing the two separate nested groups so... You're good to roll with what you've got.

Function_score, multi_match, script_score, and filter in Elasticsearch

I'm having trouble adding a filter to my existing multimatch query which is embedded inside of a function_score.
Ideally, I'd like to filter by "term" : { "lang" : "en" }, only get back documents which are in the english language.
I've tried moving around the order, tried wrapping my query in bool, but just can't get the filter to work with the other functions I'm using.
My query code:
GET /my_index/_search/
{
"query": {
"function_score": {
"query": {
"bool": {
"filter": {
"term": {
"lang": "en"
}
},
"multi_match": {
"query": "Sample Query here",
"type": "most_fields",
"fields": [
"body",
"title",
"permalink",
"name"
]
}
}
},
"script_score": {
"script": {
"source": "_score + 10"
}
}
}
}
}
Error code:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[bool] query does not support [multi_match]",
"line": 11,
"col": 19
}
],
"type": "parsing_exception",
"reason": "[bool] query does not support [multi_match]",
"line": 11,
"col": 19
},
"status": 400
}
I'm using the latest version of Elasticsearch (I believe 6.2)
Try wrapping your multi_match in a must clause like so
"must": {
"multi_match": ...
}
The error message is clear, bool query accepts only filter, must, should
Final Solution:
GET /my_index/_search/
{
"query": {
"function_score": {
"query": {
"bool" : {
"filter": {
"term": {
"lang": "en"
}
},
"must" : {
"multi_match" : {
"query": "Sample Query Here",
"type": "most_fields",
"fields": [ "body", "title", "permalink", "name"]
}
}
}
},
"script_score" : {
"script" : {
"source": "_score + 10"
}
}
}
}
}

Elasticsearch : search document with conditional filter

I have two documents in my index (same type) :
{
"first_name":"John",
"last_name":"Doe",
"age":"24",
"phone_numbers":[
{
"contract_number":"123456789",
"phone_number":"987654321",
"creation_date": ...
},
{
"contract_number":"123456789",
"phone_number":"012012012",
"creation_date": ...
}
]
}
{
"first_name":"Roger",
"last_name":"Waters",
"age":"36",
"phone_numbers":[
{
"contract_number":"546987224",
"phone_number":"987654321",
"creation_date": ...,
"expired":true
},
{
"contract_number":"87878787",
"phone_number":"55555555",
"creation_date": ...
}
]
}
Clients would like to perform a full text search. Okay no problem here
My problem :
In this full text search, sometimes user will search by phone_number. In this case there is a parameter like expired=true.
Example :
First client search request : "987654321" with expired absent or set to false
--> Result : Only first document
Second client search request : "987654321" with expired set to true
--> Result : The two documents
How can I achieve that ?
Here is my mapping :
{
"user": {
"_all": {
"auto_boost": true,
"omit_norms": true
},
"properties": {
"phone_numbers": {
"type": "nested",
"properties": {
"phone_number": {
"type": "string"
},
"creation_date": {
"type": "string",
"index": "no"
},
"contract_number": {
"type": "string"
},
"expired": {
"type": "boolean"
}
}
},
"first_name":{
"type": "string"
},
"last_name":{
"type": "string"
},
"age":{
"type": "string"
}
}
}
}
Thanks !
MC
EDIT :
I tried this query :
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"filter": {
"bool": {
"should":[
{
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
},
{
"missing": {
"field": "expired"
}
}
]
}
},
{
"bool": {
"must_not": [
{
"term": {
"phone_number": "987654321"
}
}
]
}
}
]
}
}
}
}
}
}}
But I get the two documents instead of get only the first one
You're very close. Try using a combination of must and should, where the must clause ensures the phone_number matches the search value, and the should clause ensures that either the expired field is missing or set to false. For example:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "987654321",
"analyze_wildcard": "true"
}
},
"filter": {
"nested": {
"path": "phone_numbers",
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"phone_number": "987654321"
}
}
],
"should": [
{
"missing": {
"field": "expired"
}
},
{
"term": {
"expired": false
}
}
]
}
}
}
}
}
}
}
}
}
I ran this query using your mapping and sample documents and it returned the one document for John Doe, as expected.

Elastic Search Relevance for query based on most matches

I have a following mapping
posts":{
"properties":{
"prop1": {
"type": "nested",
"properties": {
"item1": {
"type": "string",
"index": "not_analyzed"
},
"item2": {
"type": "string",
"index": "not_analyzed"
},
"item3": {
"type": "string",
"index": "not_analyzed"
}
}
},
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
Consider the objects indexed like following for these mapping
{
"name": "Name1",
"prop1": [
{
"item1": "val1",
"item2": "val2",
"item3": "val3"
},
{
"item1": "val1",
"item2": "val5",
"item3": "val6"
}
]
}
And another object
{
"name": "Name2",
"prop1": [
{
"item1": "val2",
"item2": "val7",
"item3": "val8"
},
{
"item1": "val12",
"item2": "val9",
"item3": "val10"
}
]
}
Now say i want to search documents which have prop1.item1 value to be either "val1" or "val2". I also want the result to be sorted in such a way that the document with both val1 and val2 would have more score than the one with only one of "val1" or "val2".
I have tried the following query but that doesnt seem to score based on number of matches
{
"query": {
"filtered": {
"query": {"match_all": {}},
"filter": {
"nested": {
"path": "prop1",
"filter": {
"or": [
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val2"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val1"}},
{"term": {"prop1.item2": "val5"}}
]
},
{
"and": [
{"term": {"prop1.item1": "val12"}},
{"term": {"prop1.item2": "val9"}}
]
}
]
}
}
}
}
}
}
Now although it should give both documents, first document should have more score as it contains 2 of the things in the filter whereas second contains only one.
Can someone help with the right query to get results sorted based on most matches ?
The biggest problem you have with your query is that you are using a filter. Therefore no score is calculated. Than you use a match_all query which gives all documents a score of 1. Replace the filtered query with a query and use the bool query instead of the bool filter.
Hope that helps.
Scores aren't calculated on filters use a nested query instead:
{
"query": {
"nested": {
"score_mode": "sum",
"path": "prop1",
"query": {
"bool": {
"should": [{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val2"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val1"
}
},
{
"match": {
"prop1.item2": "val5"
}
}]
}
},
{
"bool": {
"must": [{
"match": {
"prop1.item1": "val12"
}
},
{
"match": {
"prop1.item2": "val9"
}
}]
}
}]
}
}
}
}
}

Resources