How to see which of the queries in boolean is matched? - elasticsearch

I have combined multiple queries in a bool query. Some of them may match documents in the index and some may not. How can I tell which of the queries matched?
For example, here I have a bool query with two should conditions against the field landMark.
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": "wendys"
}
},
{
"match": {
"landMark": "starbucks"
}
}
]
}
}
}
How can I know which one of them matched in the above query if only one of them matches the documents?

You can use named queries for this purpose. Try this:
{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbucks match"
}
}
}
]
}
}
}
You can use any value for _name. In the response you will get something like this:
"matched_queries": ["wendy match"]
so you will be able to tell which query matched that specific document.

Named queries are certainly the way to go.
Link: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-named-queries-and-filters.html
The idea of a named query is simple: you tag each of your queries with a name, and the result shows which tags matched for each document.
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys near starbucks" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "wendys" }'
curl -XPOST 'http://localhost:9200/data/data' -d ' { "landMark" : "starbucks" }'
Hence, create your query in this fashion:
curl -XPOST 'http://localhost:9200/data/_search?pretty' -d '{
"query": {
"bool": {
"should": [
{
"match": {
"landMark": {
"query": "wendys",
"_name": "wendy_is_a_match"
}
}
},
{
"match": {
"landMark": {
"query": "starbucks",
"_name": "starbuck_is_a_match"
}
}
}
]
}
}
}'
{
"took" : 7,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.581694,
"hits" : [ {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNNCY3OZJfBZCJ_tO",
"_score" : 0.581694,
"_source": { "landMark" : "wendys near starbucks" },
"matched_queries" : [ "starbuck_is_a_match", "wendy_is_a_match" ] ---> matched tags
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNS0z3OZJfBZCJ_tQ",
"_score" : 0.1519148,
"_source": { "landMark" : "starbucks" },
"matched_queries" : [ "starbuck_is_a_match" ]
}, {
"_index" : "data",
"_type" : "data",
"_id" : "AVMCNRsF3OZJfBZCJ_tP",
"_score" : 0.04500804,
"_source": { "landMark" : "wendys" },
"matched_queries" : [ "wendy_is_a_match" ]
} ]
}
}
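A minimal sketch of reading the matched tags on the client side, assuming the search response has already been parsed into a Python dict shaped like the output above. The helper name matched_by_hit is illustrative and not part of any Elasticsearch client; the response literal is a trimmed copy of the result shown above.

```python
def matched_by_hit(response):
    """Map each hit's _id to the list of named queries it matched."""
    result = {}
    for hit in response["hits"]["hits"]:
        # Fall back to an empty list if the hit carries no matched_queries key.
        result[hit["_id"]] = hit.get("matched_queries", [])
    return result


# Trimmed copy of the response above (only the keys the helper reads).
response = {
    "hits": {
        "hits": [
            {
                "_id": "AVMCNNCY3OZJfBZCJ_tO",
                "_source": {"landMark": "wendys near starbucks"},
                "matched_queries": ["starbuck_is_a_match", "wendy_is_a_match"],
            },
            {
                "_id": "AVMCNS0z3OZJfBZCJ_tQ",
                "_source": {"landMark": "starbucks"},
                "matched_queries": ["starbuck_is_a_match"],
            },
            {
                "_id": "AVMCNRsF3OZJfBZCJ_tP",
                "_source": {"landMark": "wendys"},
                "matched_queries": ["wendy_is_a_match"],
            },
        ]
    }
}

for doc_id, names in matched_by_hit(response).items():
    print(doc_id, names)
```

Keying the result by _id keeps the lookup independent of the order in which hits are returned.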

Related

elasticsearch filter nested object

I have an index with a nested object containing two attributes, scopeId and categoryName. The relevant part of the index mapping is:
"mappedCategories" : {
"type" : "nested",
"properties": {
"scopeId": {"type":"long"},
"categoryName": {"type":"text",
"analyzer" : "productSearchAnalyzer",
"search_analyzer" : "productSearchQueryAnalyzer"}
}
}
A sample document containing the nested mappedCategories object is as follows:
POST productsearchna_2/_doc/1
{
"categoryName" : "Operating Systems",
"contexts" : [
0
],
"countryCode" : "US",
"id" : "10076327-1",
"languageCode" : "EN",
"localeId" : 1,
"mfgpartno" : "test123",
"manufacturerName" : "Hewlett Packard Enterprise",
"productDescription" : "HPE Microsoft Windows 2000 Datacenter Server - Complete Product - Complete Product - 1 Server - Standard",
"productId" : 10076327,
"skus" : [
{"sku": "43233004",
"skuName": "UNSPSC"},
{"sku": "43233049",
"skuName": "SP Richards"},
{"sku": "43234949",
"skuName": "Ingram Micro"}
],
"mappedCategories" : [
{"scopeId": 3228552,
"categoryName": "Laminate Bookcases"},
{"scopeId": 3228553,
"categoryName": "Bookcases"},
{"scopeId": 3228554,
"categoryName": "Laptop"}
]
}
I want to filter categoryName "lap" on scopeId: 3228553, i.e. my query should return 0 hits, since Laptop is mapped to scopeId 3228554. But the following query returns 1 hit, with scopeId: 3228554:
POST productsearchna_2/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.categoryName": "lap"
}
},
"inner_hits": {}
}
}
],
"filter": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.scopeId": {
"value": 3228552
}
}
}
}
}
]
}
},
"_source": ["mappedCategories.categoryName", "productId"]
}
Following is part of the result of the query:
"inner_hits" : {
"mappedCategories" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.5586993,
"hits" : [
{
"_index" : "productsearchna_2",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "mappedCategories",
"offset" : 2
},
"_score" : 1.5586993,
"_source" : {
"scopeId" : 3228554,
"categoryName" : "Laptop"
}
}
]
}
}
I want my query to return zero hits, and in case I search for "book" with scopeId: 3228552, I want my query to return 2 hits, 1 for Bookcases and another for Laminate Bookcases categoryNames. Please help.
This query solves part of the problem, but when searching for "book" with scopeId: 3228552 it will only return 1 result.
GET idx_test/_search?filter_path=hits.hits.inner_hits
{
"query": {
"nested": {
"path": "mappedCategories",
"query": {
"bool": {
"filter": [
{
"term": {
"mappedCategories.scopeId": {
"value": 3228553
}
}
}
],
"must": [
{
"match": {
"mappedCategories.categoryName": "laptop"
}
}
]
}
},
"inner_hits": {}
}
}
}
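The difference between the two query shapes can be sketched in plain Python, with no Elasticsearch involved. This is only a simulation under stated assumptions: name_matches is a crude prefix match standing in for the productSearchAnalyzer, whose definition is not shown in the question, and the function names are made up for illustration.

```python
# The nested objects from the sample document in the question.
mapped_categories = [
    {"scopeId": 3228552, "categoryName": "Laminate Bookcases"},
    {"scopeId": 3228553, "categoryName": "Bookcases"},
    {"scopeId": 3228554, "categoryName": "Laptop"},
]


def name_matches(obj, prefix):
    # Assumption: a prefix match on lowercased words, standing in for the
    # custom analyzer that the question does not show.
    words = obj["categoryName"].lower().split()
    return any(word.startswith(prefix) for word in words)


def separate_nested_clauses(objs, scope_id, prefix):
    # Two nested clauses: each is satisfied by *any* nested object, so the
    # two conditions can be met by two different objects.
    scope_ok = any(o["scopeId"] == scope_id for o in objs)
    name_ok = any(name_matches(o, prefix) for o in objs)
    return scope_ok and name_ok


def single_nested_clause(objs, scope_id, prefix):
    # One nested clause: both conditions must hold on the *same* object.
    return [o for o in objs if o["scopeId"] == scope_id and name_matches(o, prefix)]


print(separate_nested_clauses(mapped_categories, 3228552, "lap"))  # True
print(single_nested_clause(mapped_categories, 3228552, "lap"))     # []
```

The first call mirrors the original query: one object satisfies the scopeId clause and a different one satisfies the categoryName clause, so the document is returned. The second mirrors the single nested query, where no one object satisfies both.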

Getting only the most recent records from ElasticSearch

I have a user index in Elasticsearch where each user has a name and various other user-related information, plus an indexedAt field that specifies when the user information was indexed. Whenever any of a user's information changes, I create a new record for the user and store it. Therefore each user can have multiple records in the index.
Now I simply want to get only the most up-to-date information for the queried users.
For example, if I run the following query, it returns all of the records for John and Smith. But I want only the most recent record for each user.
{
"size": 10000,
"query": {
"bool": {
"should": [
{
"match_phrase": {
"name": "John"
}
},
{
"match_phrase": {
"name": "Smith"
}
}
]
}
},
"sort": [
{
"indexedAt": {
"order": "desc"
}
}
]
}
You can use field collapsing together with inner_hits to get your answer:
GET /temp_index/_search
{
"size": 10,
"query": {
"bool": {
"should": [
{
"match_phrase": {
"name": "John"
}
},
{
"match_phrase": {
"name": "Smith"
}
}
]
}
},
"collapse": {
"field": "name.keyword",
"inner_hits": {
"name": "most_recent",
"size": 1,
"sort": [{"indexedAt": "desc"}]
}
}
}
This will give you a result similar to the one below:
{
"_index" : "temp_index",
"_type" : "_doc",
"_id" : "KSHBjnMBPr3VGlJjXe3d",
"_score" : 0.8266786,
"_source" : {
"name" : "John",
"indexedAt" : 1015
},
"fields" : {
"name.keyword" : [
"John"
]
},
"inner_hits" : {
"most_recent" : {
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "temp_index",
"_type" : "_doc",
"_id" : "LyHBjnMBPr3VGlJji-24",
"_score" : null,
"_source" : {
"name" : "John",
"indexedAt" : 1050
},
"sort" : [
1050
]
}
]
}
}
}
},
You can access the inner_hits portion to get the document that was most recently indexed (i.e. the one with the largest indexedAt value).
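As a rough sketch of that last step, the helper below pulls the newest record out of each collapsed group. It assumes the response has been parsed into a Python dict and that the hit fragment shown above sits under hits.hits, as in a regular search response; the function name most_recent_sources is hypothetical.

```python
def most_recent_sources(response, inner_name="most_recent"):
    """Return the newest _source for each collapsed group."""
    newest = []
    for hit in response["hits"]["hits"]:
        inner = hit["inner_hits"][inner_name]["hits"]["hits"]
        if inner:
            # inner_hits was requested with size 1 and sorted by indexedAt
            # descending, so the first entry is the newest record.
            newest.append(inner[0]["_source"])
    return newest


# The fragment above, wrapped in the usual hits.hits envelope and trimmed.
response = {
    "hits": {
        "hits": [
            {
                "_id": "KSHBjnMBPr3VGlJjXe3d",
                "_source": {"name": "John", "indexedAt": 1015},
                "inner_hits": {
                    "most_recent": {
                        "hits": {
                            "hits": [
                                {
                                    "_id": "LyHBjnMBPr3VGlJji-24",
                                    "_source": {"name": "John", "indexedAt": 1050},
                                }
                            ]
                        }
                    }
                },
            }
        ]
    }
}

print(most_recent_sources(response))
```

Note that the top-level _source of a collapsed hit (indexedAt 1015 here) is not necessarily the newest record, which is why the helper reads from inner_hits instead.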

Using named queries (matched_queries) for nested types in Elasticsearch?

Using named queries, I can get a list of the matched_queries for boolean expressions such as:
(query1) AND (query2 OR query3 OR true)
Here is an example of using named queries to match on top-level document fields:
DELETE test
PUT /test
PUT /test/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"type": {
"type": "text"
},
"TAGS": {
"type": "nested"
}
}
}
POST /test/_doc
{
"name" : "doc1",
"type": "msword",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
},
{
"ID" : "tag3",
"TYPE" : "BASIC"
}
]
}
# (query1) AND (query2 or query3 or true)
GET /test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "doc1",
"_name": "query1"
}
}
}
],
"should": [
{
"match": {
"type": {
"query": "msword",
"_name": "query2"
}
}
},
{
"exists": {
"field": "type",
"_name": "query3"
}
}
]
}
}
}
The above query correctly returns all three matched_queries in the response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5753641,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TKNJ9G4BbvPS27u-ZYux",
"_score" : 1.5753641,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "BASIC"
},
{
"ID" : "wb1",
"TYPE" : "BASIC"
}
]
},
"matched_queries" : [
"query1",
"query2",
"query3"
]
}
]
}
}
However, I'm trying to run a similar search:
(query1) AND (query2 OR query3 OR true)
only this time on the nested TAGS object rather than top-level document fields.
I've tried the following query, but the problem is I need to supply the inner_hits object for nested objects in order to get the matched_queries in the response, and I can only add it to one of the three queries.
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
},
// "inner_hits" : {}
}
},
"should": [
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
// "inner_hits" : {}
}
},
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
},
// "inner_hits" : {}
}
}
]
}
}
}
Elasticsearch will complain if I add more than one 'inner_hits'. I've commented out the places above where I can add it, but each of these will only return the single matched query.
I want my response to this query to return:
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
Any help is much appreciated, thanks!
A colleague helpfully provided a solution to this: move the _name parameter to directly under each nested section:
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"_name": "tag1-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1"
}
}
}
}
},
"should": [
{
"nested": {
"_name": "tag2-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2"
}
}
}
}
},
{
"nested": {
"_name": "tag3-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3"
}
}
}
}
}
]
}
}
}
This correctly returns all three tags now in the matched_queries response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 2.9424875,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TaNy9G4BbvPS27u--oto",
"_score" : 2.9424875,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "DATASOURCE"
},
{
"ID" : "wb1",
"TYPE" : "WORKBOOK"
},
{
"ID" : "wb2",
"TYPE" : "WORKBOOK"
}
]
},
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
}
]
}
}
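If the query is assembled in code, the same shape can be generated by a small helper. This is a sketch only: the function names named_nested_match and tags_query are invented for illustration, and the must clause is emitted as a list rather than a single object, which the bool query also accepts.

```python
import json


def named_nested_match(path, field, value, name):
    """Build a nested query whose _name sits on the nested clause itself."""
    return {
        "nested": {
            "_name": name,  # named at the nested level, not inside the match
            "path": path,
            "query": {"match": {f"{path}.{field}": {"query": value}}},
        }
    }


def tags_query(required, optional):
    """(required tags) AND (optional tags), each clause individually named."""
    return {
        "query": {
            "bool": {
                "must": [
                    named_nested_match("TAGS", "ID", tag, f"{tag}-query")
                    for tag in required
                ],
                "should": [
                    named_nested_match("TAGS", "ID", tag, f"{tag}-query")
                    for tag in optional
                ],
            }
        }
    }


body = tags_query(["tag1"], ["tag2", "tag3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the request body of GET /test/_search.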

Filter Full Text Search based on User ID

GET _search
{
"query": {
"match": {
"content": "this test"
}
}
}
This gave me below result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "6bb174ab-a4ce-4409-a626-c9a42c98b89e",
"_score" : 0.5753642,
"_source" : {
"user_id" : 13,
"content" : "This is a test"
}
},
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "1304cf2e-a1d4-40ca-9876-9abb08c4474d",
"_score" : 0.36464313,
"_source" : {
"user_id" : 10,
"content" : "This is a test"
}
},
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "623c093c-4408-445e-abb1-460d2c5004cd",
"_score" : 0.36464313,
"_source" : {
"user_id" : 15,
"content" : "This is a test"
}
}
]
}
}
Which is good. However, I need to filter the results by user_id, i.e. I need to score only a specific user's content.
GET _search
{
"query": {
"match": {
"content": "this test",
"user_id": 10
}
}
}
When I add user_id I get this error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query doesn't support multiple fields, found [content] and [user_id]",
"line": 5,
"col": 18
}
],
"type": "parsing_exception",
"reason": "[match] query doesn't support multiple fields, found [content] and [user_id]",
"line": 5,
"col": 18
},
"status": 400
}
Why? And how do I properly filter based on user_id?
You can use a term query to filter the results by user_id.
Update your query as below:
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this test"
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
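A minimal sketch of building this request body in Python, assuming the field names content and user_id from the question; the function name search_user_content is hypothetical. Clauses under filter restrict the result set without contributing to the score, so the relevance ranking still comes from the match on content alone.

```python
import json


def search_user_content(text, user_id):
    """Full-text match on content, restricted to a single user."""
    return {
        "query": {
            "bool": {
                # Scored clause: decides the relevance ranking.
                "must": [{"match": {"content": text}}],
                # Non-scoring clause: only narrows the result set.
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


body = search_user_content("this test", 10)
print(json.dumps(body, indent=2))
```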
The query should be like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this test"
}
},
{
"match": {
"user_id": 10
}
}
]
}
}
}
Use a bool query to combine the filters:
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this is content"
}
},
{
"term": {
"user_id": {
"value": 47545
}
}
}
]
}
}
}

Elastic search: exact match query on string array

Given this document:
{"name": "Perfect Sunny-Side Up Eggs","ingredientList": ["canola oil","eggs"]}
How can I build a query in Elasticsearch to return exact matches on a string array, given the query term "oil eggs"? So far this is what I have, but it returns other irrelevant documents:
POST /recipes/recipe/_search
{
"query": {
"match": {
"ingredientList": {
"query": [
"oil",
"eggs"
],
"operator": "and"
}
}
}
}
For instance, this document is returned but it doesn't contain "oil". Results should only contain "oil" and "eggs":
{"name": "Quick Baked French Toast","ingredientList": ["butter","cinnamon raisin bread","eggs"]}
Your query will look like this:
{
"query": {
"bool": {
"must": [
{
"term": {
"ingredientList": "oil"
}
},
{
"term": {
"ingredientList": "eggs"
}
}
]
}
}
}
This gives me the following result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "ingredients",
"_type" : "recipe",
"_id" : "AVeprXFrNutW6yNguPqp",
"_score" : 1.0,
"_source" : {
"name" : "Perfect Sunny-Side Up Eggs",
"ingredientList" : [ "canola oil", "eggs" ]
}
} ]
}
}
Elasticsearch doesn't have an API for exact-matching an array, but the same result can be achieved in two ways:
Using multiple must blocks (not preferred)
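Using a terms_set query and a script (written here as a Python dict, where ingredients is the list of terms to match)
# Example input; replace with the terms to match exactly
ingredients = ["canola oil", "eggs"]
body = {
"query": {
"bool": {
"must": [
{
"terms_set": {
"ingredientList": {
"terms": ingredients,
# every supplied term must be present in the field
"minimum_should_match_script": {
"source": "Math.min(params.num_terms, {})".format(len(ingredients))
}
}
}
},
{
# the field must hold no values beyond the supplied terms
"script": {
"script": {
"inline": "doc['ingredientList'].length == params.list_length",
"lang": "painless",
"params": {
"list_length": len(ingredients)
}
}
}
}
]
}
}
}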
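The logic behind combining the two clauses can be checked in plain Python. This is a simulation rather than Elasticsearch code, and it assumes ingredientList holds unique, non-analyzed (keyword) values, so that each list entry counts as exactly one term; the function names are made up for illustration.

```python
def terms_set_ok(doc_values, terms):
    # terms_set clause: the number of query terms found in the field must
    # reach the required minimum, which here equals len(terms).
    required = len(terms)
    found = sum(1 for term in terms if term in doc_values)
    return found >= required


def length_ok(doc_values, terms):
    # script clause: the field must not hold more (or fewer) values.
    return len(doc_values) == len(terms)


def exact_match(doc_values, terms):
    # Together the two checks amount to set equality for unique values.
    return terms_set_ok(doc_values, terms) and length_ok(doc_values, terms)


eggs_doc = ["canola oil", "eggs"]
toast_doc = ["butter", "cinnamon raisin bread", "eggs"]

print(exact_match(eggs_doc, ["canola oil", "eggs"]))  # True
print(exact_match(toast_doc, ["eggs"]))               # False
```

The second call fails only on the length check: the term "eggs" is present, but the document holds two extra values, which is exactly the case the script clause is there to reject.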

Resources