Elasticsearch: exact match query on string array

Given this document:
{"name": "Perfect Sunny-Side Up Eggs","ingredientList": ["canola oil","eggs"]}
How can I build a query in Elasticsearch that returns exact matches on a string array, given the query term "oil eggs"? This is what I have so far, but it returns other, irrelevant documents:
POST /recipes/recipe/_search
{
"query": {
"match": {
"ingredientList": {
"query": [
"oil",
"eggs"
],
"operator": "and"
}
}
}
}
For instance, this document is returned even though it doesn't contain "oil". Results should only include documents that contain both "oil" and "eggs":
{"name": "Quick Baked French Toast","ingredientList": ["butter","cinnamon raisin bread","eggs"]}

Your query will look like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"ingredientList": "oil"
}
},
{
"term": {
"ingredientList": "eggs"
}
}
]
}
}
}
This gives me the following results:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "ingredients",
"_type" : "recipe",
"_id" : "AVeprXFrNutW6yNguPqp",
"_score" : 1.0,
"_source" : {
"name" : "Perfect Sunny-Side Up Eggs",
"ingredientList" : [ "canola oil", "eggs" ]
}
} ]
}
}
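The bool/must pattern above extends to any number of words. Here is a minimal Python sketch that builds the same request body from a whitespace-separated query string. The function name all_terms_query is made up for illustration, and the sketch assumes the field is analyzed so that "canola oil" is indexed as the tokens "canola" and "oil".

```python
def all_terms_query(field, terms):
    """Build a bool query whose must list holds one term clause per word."""
    return {
        "query": {
            "bool": {
                # Every clause in must has to match, so this is an AND of terms.
                "must": [{"term": {field: term}} for term in terms]
            }
        }
    }


# "oil eggs" split on whitespace gives the two terms used in the answer above.
body = all_terms_query("ingredientList", "oil eggs".split())
```

The resulting dict can be sent as the JSON body of a _search request with whatever HTTP or Elasticsearch client you already use.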

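Elasticsearch doesn't have a dedicated API for exact matching on an array, but the same result can be achieved in two ways:
Using multiple must blocks (not preferred)
Using a terms_set query together with a script query
The second approach, written as a Python dict. Here ingredients is the list of values to match, and this sketch assumes ingredientList is mapped as keyword so that both the terms and the doc values hold whole values:
ingredients = ["canola oil", "eggs"]  # example input
body = {
    "query": {
        "bool": {
            "must": [
                {
                    # Every supplied term must be present in the document.
                    "terms_set": {
                        "ingredientList": {
                            "terms": ingredients,
                            "minimum_should_match_script": {
                                "source": "Math.min(params.num_terms, {})".format(len(ingredients))
                            }
                        }
                    }
                },
                {
                    # The document must not hold any additional values.
                    "script": {
                        "script": {
                            # "source" replaces the deprecated "inline" key
                            "source": "doc['ingredientList'].length == params.list_length",
                            "lang": "painless",
                            "params": {
                                "list_length": len(ingredients)
                            }
                        }
                    }
                }
            ]
        }
    }
}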
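To see why the two clauses together amount to an exact array match, here is a small local simulation in plain Python. It does not talk to Elasticsearch; matches_exactly is a hypothetical helper that mirrors the logic only. It assumes the query list has no duplicates and treats the document's values as a set, on the assumption that doc values for a keyword field hold unique terms.

```python
def matches_exactly(doc_values, wanted):
    """Mirror the two clauses: all wanted terms present, and equal counts."""
    unique_doc = set(doc_values)  # assumption: doc values hold unique terms
    # terms_set clause: every wanted term must occur in the document.
    contains_all = all(term in unique_doc for term in wanted)
    # script clause: the document must hold no values beyond the wanted ones.
    same_length = len(unique_doc) == len(wanted)
    return contains_all and same_length


exact = matches_exactly(["canola oil", "eggs"], ["eggs", "canola oil"])
superset = matches_exactly(["butter", "cinnamon raisin bread", "eggs"], ["eggs"])
subset = matches_exactly(["eggs"], ["eggs", "canola oil"])
```

Only the first call should come out as a match: the second document has extra values, and the third is missing one.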

Related

match_bool_prefix query doesn't return the "exact match"

I'm using a match_bool_prefix query, but I can't get exact matches ranked first. I can't use prefix queries because I also need the non-exact results, as well as fuzziness and word completion. The match_bool_prefix query gives me everything I need (although the fuzziness doesn't work that well), but when I search for an exact term such as "apple", it returns everything that includes "apple". I need exact matches to rank higher than the others.
GET /_search
{
"query": {
"bool": {
"must": [
{
"match_bool_prefix": {
"name": {
"query": "apple",
"fuzziness": "auto"
}
}
},
{
"bool": {
"must_not": [
{
"match": {
"type": "3"
}
},
{
"match": {
"type": "4"
}
}
]
}
},
{
"match": {
"status": "A"
}
}
],
"should": [
{
"exists": {
"field": "",
"boost": 10
}
}
]
}
},
"indices_boost": [
{
"index1": 3
},
{
"index2": 1.3
},
{
"index3": 1.5
}
],
"size": 20
}
The result I'm getting with this query is:
{
"took" : 6,
"timed_out" : false,
"_shards" : {
"total" : 20,
"successful" : 20,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4970,
"relation" : "eq"
},
"max_score" : 14.451834,
"hits" : [
{
"_index" : "index",
"_id" : "11434",
"_score" : 14.451834,
"_source" : {
"name" : "Apple Slices With Peanut Butter".
Is there any solution for this?

Using named queries (matched_queries) for nested types in Elasticsearch?

Using named queries, I can get a list of the matched_queries for boolean expressions such as:
(query1) AND (query2 OR query3 OR true)
Here is an example of using named queries to match on top-level document fields:
DELETE test
PUT /test
PUT /test/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"type": {
"type": "text"
},
"TAGS": {
"type": "nested"
}
}
}
POST /test/_doc
{
"name" : "doc1",
"type": "msword",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
},
{
"ID" : "tag3",
"TYPE" : "BASIC"
}
]
}
# (query1) AND (query2 or query3 or true)
GET /test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "doc1",
"_name": "query1"
}
}
}
],
"should": [
{
"match": {
"type": {
"query": "msword",
"_name": "query2"
}
}
},
{
"exists": {
"field": "type",
"_name": "query3"
}
}
]
}
}
}
The above query correctly returns all three matched_queries in the response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5753641,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TKNJ9G4BbvPS27u-ZYux",
"_score" : 1.5753641,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "BASIC"
},
{
"ID" : "wb1",
"TYPE" : "BASIC"
}
]
},
"matched_queries" : [
"query1",
"query2",
"query3"
]
}
]
}
}
However, I'm trying to run a similar search:
(query1) AND (query2 OR query3 OR true)
only this time on the nested TAGS object rather than top-level document fields.
I've tried the following query, but the problem is I need to supply the inner_hits object for nested objects in order to get the matched_queries in the response, and I can only add it to one of the three queries.
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
},
// "inner_hits" : {}
}
},
"should": [
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
// "inner_hits" : {}
}
},
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
},
// "inner_hits" : {}
}
}
]
}
}
}
Elasticsearch will complain if I add more than one 'inner_hits'. I've commented out the places above where I can add it, but each of these will only return the single matched query.
I want my response to this query to return:
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
Any help is much appreciated, thanks!
A colleague helpfully provided a solution to this: move the _name parameter so that it sits directly under each nested section:
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"_name": "tag1-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1"
}
}
}
}
},
"should": [
{
"nested": {
"_name": "tag2-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2"
}
}
}
}
},
{
"nested": {
"_name": "tag3-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3"
}
}
}
}
}
]
}
}
}
This correctly returns all three tags now in the matched_queries response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 2.9424875,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TaNy9G4BbvPS27u--oto",
"_score" : 2.9424875,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "DATASOURCE"
},
{
"ID" : "wb1",
"TYPE" : "WORKBOOK"
},
{
"ID" : "wb2",
"TYPE" : "WORKBOOK"
}
]
},
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
}
]
}
}
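The same request body can be generated rather than written by hand. Below is a minimal Python sketch that puts _name directly on each nested query, as in the solution above. The helper names named_nested and tag_search are invented for this example, and the "<tag>-query" naming simply follows the names used in the question.

```python
def named_nested(path, field, value, name):
    """Return a nested clause with _name set on the nested query itself."""
    return {
        "nested": {
            "_name": name,  # named at the nested level, not inside match
            "path": path,
            "query": {"match": {field: {"query": value}}},
        }
    }


def tag_search(required_tag, optional_tags):
    """One required tag goes in must, the others in should."""
    return {
        "query": {
            "bool": {
                "must": named_nested(
                    "TAGS", "TAGS.ID", required_tag, f"{required_tag}-query"
                ),
                "should": [
                    named_nested("TAGS", "TAGS.ID", tag, f"{tag}-query")
                    for tag in optional_tags
                ],
            }
        }
    }


body = tag_search("tag1", ["tag2", "tag3"])
```

Because the name lives on the nested query, no inner_hits block is needed to get the names back in matched_queries.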

access query value from function_score to compute new score

I need to customize ES score. The score function I need to implement is:
score = len(document_term) - len(query_term)
For instance, one of the documents in my ES index is:
{
"name": "foobar"
}
And the search query
{
"query": {
"function_score": {
"query": {
"match": {
"name": {
"query": "foo"
}
}
},
"functions": [
{
"script_score": {
"script": {
"source": "doc['name'].value.length() - ?LEN(query_term)?"
}
}
}
],
"boost_mode": "replace"
}
}
}
The above search should give a score of 6 - 3 = 3, but I couldn't find a way to access the value of the query term.
Is it possible to access the value of the query term in a function_score context?
There is no direct way to do this. As a workaround, you can pass the query string in two places in the request: once in the query itself and once as a script parameter.
One important note first: by default you cannot use doc['myfield'].value if the field is of type text. Instead, create a keyword sub-field and refer to that in the script, as shown below:
Mapping:
PUT myindex
{
"mappings" : {
"properties" : {
"myfield" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
}
}
Sample Document:
POST myindex/_doc/1
{
"myfield": "I've become comfortably numb"
}
Query:
POST <your_index_name>/_search
{
"query": {
"function_score": {
"query": {
"match": {
"myfield": "numb"
}
},
"functions": [
{
"script_score": {
"script": {
"source": "return doc['myfield.keyword'].value.length() - params.myquery.length()",
"params": {
"myquery": "numb" // add the query string here as well
}
}
}
}
],
"boost_mode": "replace"
}
}
}
Response:
{
"took" : 558,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 24.0,
"hits" : [
{
"_index" : "myindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 24.0,
"_source" : {
"myfield" : "I've become comfortably numb"
}
}
]
}
}
Hope this helps!
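Since the query string has to appear in two places, it is easy for the two copies to drift apart. A minimal Python sketch that takes the string once and fills in both spots is shown below. The helper names length_diff_query and expected_score are hypothetical, and the sketch assumes the keyword sub-field is called "keyword", as in the mapping above.

```python
def length_diff_query(field, query_text):
    """Build the function_score body with query_text used in both places."""
    keyword_field = field + ".keyword"  # assumption: sub-field named "keyword"
    source = (
        f"return doc['{keyword_field}'].value.length()"
        " - params.myquery.length()"
    )
    return {
        "query": {
            "function_score": {
                "query": {"match": {field: query_text}},
                "functions": [
                    {
                        "script_score": {
                            "script": {
                                "source": source,
                                "params": {"myquery": query_text},
                            }
                        }
                    }
                ],
                "boost_mode": "replace",
            }
        }
    }


def expected_score(stored_value, query_text):
    """Local check of the formula: len(document value) - len(query)."""
    return len(stored_value) - len(query_text)


body = length_diff_query("myfield", "numb")
score = expected_score("I've become comfortably numb", "numb")
```

For the sample document the local formula gives 28 - 4 = 24, which agrees with the _score of 24.0 in the response above.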

Search partial words or wildcard with Elasticsearch

When I search as shown below, I get results successfully. This also works for sentences (or complete words). However, searching for a partial word does not find anything.
For example, let's look at this sentence:
embedded image can place here.
When I search for embedded, it finds this content. But embed does not find anything.
Let me show you:
GET _search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "Embedded"
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
Result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "8c76f6a5-115a-4102-94e6-a3abef914d13",
"_score" : 0.2876821,
"_source" : {
"user_id" : 10,
"content" : "Embedded image"
}
}
]
}
}
However, let's search for the word embed only:
GET _search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "Embed"
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
Result: Empty...
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Is it possible to find related content when searching like this? Please note that it should also find the document when I search for embed image:
GET _search
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "embed image"
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
I solved this by using query_string:
GET _search
{
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "embed image",
"fields": [
"content"
]
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
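A note on why this may work: as far as I understand, query_string combines the words with OR by default, so "embed image" can match the document through "image" alone. If you want "embed" on its own to match "Embedded" as well, one option is the trailing wildcard that the query_string syntax supports. The Python sketch below is only an illustration under assumptions: prefix_query_string is a made-up helper, it assumes the search text contains no reserved query_string characters, and whether wildcard terms behave well depends on your analyzer and index size.

```python
def prefix_query_string(text, field):
    """Turn each whitespace-separated word into a trailing-wildcard term."""
    # Assumption: words contain no reserved query_string characters.
    wildcard_terms = [word + "*" for word in text.split()]
    return {
        "query_string": {
            "query": " ".join(wildcard_terms),
            "fields": [field],
        }
    }


def search_body(text, user_id):
    """Same bool structure as above, with the wildcard clause in must."""
    return {
        "query": {
            "bool": {
                "must": [prefix_query_string(text, "content")],
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


body = search_body("embed image", 10)
```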

How to see which of the queries in boolean is matched?

I have given multiple queries using the bool query. Now it can happen that some of them might have matches and some queries might not have matches in the database. How can I know which of the queries had a match?
For example, here I have a bool query with two should conditions against the field landMark.
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": "wendys"
}
},
{
"match": {
"landMark": "starbucks"
}
}
]
}
}
}
How can I know which one of them matched in the above query if only one of them matches the documents?
You can use named queries for this purpose. Try this:
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbucks match"
}
}
}
]
}
}
}
You can use any value for _name. In the response you will get something like this:
"matched_queries": ["wendy match"]
so you will be able to tell which query matched that specific document.
Named queries are certainly the way to go.
Link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
The idea of a named query is simple: you attach a name to each of your queries, and the result shows which of those names matched for each document. Given these documents:
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys near starbucks" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "starbucks" }'
Then create your query in this fashion:
curl -XPOST 'http://localhost:9200/data/_search?pretty' -d '{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy_is_a_match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbuck_is_a_match"
}
}
}
]
}
}
}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.581694,
"hits" : [ {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNNCY3OZJfBZCJ_tO",
"_score" : 0.581694,
"_source": { "landMark" : "wendys near starbucks" },
"matched_queries" : [ "starbuck_is_a_match", "wendy_is_a_match" ] // the tags that matched this document
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNS0z3OZJfBZCJ_tQ",
"_score" : 0.1519148,
"_source": { "landMark" : "starbucks" },
"matched_queries" : [ "starbuck_is_a_match" ]
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNRsF3OZJfBZCJ_tP",
"_score" : 0.04500804,
"_source": { "landMark" : "wendys" },
"matched_queries" : [ "wendy_is_a_match" ]
} ]
}
}
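Once the response comes back, the matched_queries arrays can be collected per document. The following Python sketch works on an already-parsed response dict. The helper name matched_by_id is hypothetical, and sample_response is a trimmed copy of the response above that keeps only the fields the function reads.

```python
def matched_by_id(response):
    """Map each hit's _id to the list of query names that matched it."""
    return {
        hit["_id"]: hit.get("matched_queries", [])  # absent if nothing was named
        for hit in response["hits"]["hits"]
    }


# Trimmed copy of the response above (only the fields used here).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "AVMCNNCY3OZJfBZCJ_tO",
                "matched_queries": ["starbuck_is_a_match", "wendy_is_a_match"],
            },
            {"_id": "AVMCNS0z3OZJfBZCJ_tQ", "matched_queries": ["starbuck_is_a_match"]},
            {"_id": "AVMCNRsF3OZJfBZCJ_tP", "matched_queries": ["wendy_is_a_match"]},
        ]
    }
}

matches = matched_by_id(sample_response)
```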
