Nested Query Elastic Search - spring

Currently I am trying to search/filter a nested document in Elasticsearch with Spring Data.
The current document structure is:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": [
{
"id": 1,
"status": true,
"issue": "Variation Issue"
},
{
"id": 32,
"status": false,
"issue": "NoiseIssue"
}
]
}
}
Now we need to filter policy_data down to the entries whose issue is Noise Issue. If no policy_data entry has Noise Issue, policy_data should be null inside the parent document.
I have tried this query:
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
},
{
"nested": {
"path": "policydetails.policy_data",
"query": {
"bool": {
"must": {
"terms": {
"policydetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
}
}
This works fine for filtering the nested documents. But if no nested document matches, the entire parent document is removed from the results.
What I want when the nested filter does not match is:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": null
}
}

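When the nested query is in a must clause and no nested document matches, the parent document is not returned.
You can put the nested query for policy_data in a should clause instead. Because the bool query already has a must clause, the should clause is optional: if a nested document matches, it is returned under inner_hits; otherwise the parent document is still returned, with empty inner_hits.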
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
}
],
"should": [
{
"nested": {
"path": "policydetails.policy_data",
"inner_hits": {}, // returns the matching policy_data objects
"query": {
"bool": {
"must": {
"terms": {
"policydetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
},
"_source": ["id","customername","policydetails.address"] // return only these parent fields
}
Result:
{
"_index" : "index116",
"_type" : "_doc",
"_id" : "f1SxGHoB5tcHqHDtAkTC",
"_score" : 0.2876821,
"_source" : {
"policydetails" : {
"address" : {
"city" : "Irvine",
"address2" : "23994384, Out OF World",
"post_code" : "92617",
"state" : "CA"
}
},
"id" : 1,
"customername" : "Cust#123"
},
"inner_hits" : {
"policydetails.policy_data" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ] // nested query result: no policy_data matched, but the parent document is still returned
}
}
}
}
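Elasticsearch does not rewrite policy_data to null by itself; with the _source filtering above, policy_data is simply left out of the parent document. To get the exact shape asked for, the matched objects can be folded back in on the client. The sketch below assumes the response shape shown above (inner_hits keyed by the nested path, each inner hit carrying the matched nested object in its own _source); merge_policy_data is a hypothetical helper, not part of any client library.

```python
# Hypothetical client-side post-processing, assuming the hit shape shown above.
NESTED_PATH = "policydetails.policy_data"


def merge_policy_data(hit):
    """Return a copy of hit["_source"] whose policydetails.policy_data holds the
    matched nested objects from inner_hits, or None when nothing matched."""
    source = dict(hit.get("_source", {}))          # shallow copy of the parent
    details = dict(source.get("policydetails", {}))  # copy so the hit is untouched
    inner = hit.get("inner_hits", {}).get(NESTED_PATH, {})
    # Each nested inner hit is assumed to carry the matched object in _source.
    matched = [h.get("_source") for h in inner.get("hits", {}).get("hits", [])]
    details["policy_data"] = matched or None       # empty list becomes None
    source["policydetails"] = details
    return source
```

For the result above, the empty inner hits produce "policy_data": null alongside the address, which is the output the question asks for.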

Related

elasticsearch filter nested object

I have an index with a nested object containing two attributes, scopeId and categoryName. Here is the relevant part of the index mapping:
"mappedCategories" : {
"type" : "nested",
"properties": {
"scopeId": {"type":"long"},
"categoryName": {"type":"text",
"analyzer" : "productSearchAnalyzer",
"search_analyzer" : "productSearchQueryAnalyzer"}
}
}
A sample document containing the nested mappedCategories object is as follows:
POST productsearchna_2/_doc/1
{
"categoryName" : "Operating Systems",
"contexts" : [
0
],
"countryCode" : "US",
"id" : "10076327-1",
"languageCode" : "EN",
"localeId" : 1,
"mfgpartno" : "test123",
"manufacturerName" : "Hewlett Packard Enterprise",
"productDescription" : "HPE Microsoft Windows 2000 Datacenter Server - Complete Product - Complete Product - 1 Server - Standard",
"productId" : 10076327,
"skus" : [
{"sku": "43233004",
"skuName": "UNSPSC"},
{"sku": "43233049",
"skuName": "SP Richards"},
{"sku": "43234949",
"skuName": "Ingram Micro"}
],
"mappedCategories" : [
{"scopeId": 3228552,
"categoryName": "Laminate Bookcases"},
{"scopeId": 3228553,
"categoryName": "Bookcases"},
{"scopeId": 3228554,
"categoryName": "Laptop"}
]
}
I want to filter for categoryName "lap" on scopeId 3228553, i.e. my query should return 0 hits, since Laptop is mapped to scopeId 3228554. But the following query returns 1 hit, with scopeId 3228554:
POST productsearchna_2/_search
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.categoryName": "lap"
}
},
"inner_hits": {}
}
}
],
"filter": [
{
"nested": {
"path": "mappedCategories",
"query": {
"term": {
"mappedCategories.scopeId": {
"value": 3228552
}
}
}
}
}
]
}
},
"_source": ["mappedCategories.categoryName", "productId"]
}
Following is part of the result of the query:
"inner_hits" : {
"mappedCategories" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.5586993,
"hits" : [
{
"_index" : "productsearchna_2",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "mappedCategories",
"offset" : 2
},
"_score" : 1.5586993,
"_source" : {
"scopeId" : 3228554,
"categoryName" : "Laptop"
}
}
]
}
}
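I want my query to return zero hits, and when I search for "book" with scopeId 3228552, I want it to return 2 hits: one for Bookcases and another for Laminate Bookcases. Please help.
Put both conditions inside a single nested query so that they have to match the same nested object. In your query, the nested clause in filter only checks that some mappedCategories object has the scopeId, while inner_hits is requested on the nested clause in must, so it reports whichever object matched "lap" (Laptop). The query below solves part of the problem, but when searching for "book" with scopeId 3228552 it returns only 1 result, because Laminate Bookcases is the only category with that scopeId.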
GET idx_test/_search?filter_path=hits.hits.inner_hits
{
"query": {
"nested": {
"path": "mappedCategories",
"query": {
"bool": {
"filter": [
{
"term": {
"mappedCategories.scopeId": {
"value": 3228553
}
}
}
],
"must": [
{
"match": {
"mappedCategories.categoryName": "laptop"
}
}
]
}
},
"inner_hits": {}
}
}
}
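The difference between the two query shapes can be modelled in plain Python over the sample mappedCategories array. This is an illustration, not Elasticsearch's matching logic: the hypothetical name_matches helper assumes the custom analyzers (productSearchAnalyzer / productSearchQueryAnalyzer) behave like a case-insensitive word-prefix match.

```python
# Sample nested objects from the document above.
MAPPED_CATEGORIES = [
    {"scopeId": 3228552, "categoryName": "Laminate Bookcases"},
    {"scopeId": 3228553, "categoryName": "Bookcases"},
    {"scopeId": 3228554, "categoryName": "Laptop"},
]


def name_matches(name, term):
    # ASSUMPTION: approximates the custom analyzers as a word-prefix match.
    return any(word.lower().startswith(term.lower()) for word in name.split())


def two_nested_clauses(objs, term, scope_id):
    """Question's query: each nested clause may be satisfied by a different object."""
    has_term = any(name_matches(o["categoryName"], term) for o in objs)
    has_scope = any(o["scopeId"] == scope_id for o in objs)
    return has_term and has_scope


def one_nested_clause(objs, term, scope_id):
    """Answer's query: both conditions must hold for the same nested object."""
    return [
        o for o in objs
        if o["scopeId"] == scope_id and name_matches(o["categoryName"], term)
    ]
```

With two separate nested clauses, "lap" plus scopeId 3228552 matches the parent because different objects satisfy each clause; with a single nested clause, "lap" on scopeId 3228553 matches no object, and "book" on scopeId 3228552 yields only Laminate Bookcases.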

Should and Filter combination in ElasticSearch

I have this query, which returns the correct result:
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"nameDetails.name.nameValue.surname": {
"value": "Pibba",
"fuzziness": "AUTO"
}
}
},
{
"fuzzy": {
"nameDetails.nameValue.firstName": {
"value": "Fawsu",
"fuzziness": "AUTO"
}
}
}
]
}
}
}
and the result is below:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.6012557,
"hits" : [
{
"_index" : "person",
"_type" : "_doc",
"_id" : "70002",
"_score" : 3.6012557,
"_source" : {
"gender" : "Male",
"activeStatus" : "Inactive",
"deceased" : "No",
"nameDetails" : {
"name" : [
{
"nameValue" : {
"firstName" : "Fawsu",
"middleName" : "L.",
"surname" : "Pibba"
},
"nameType" : "Primary Name"
},
{
"nameValue" : {
"firstName" : "Fausu",
"middleName" : "L.",
"surname" : "Pibba"
},
"nameType" : "Spelling Variation"
}
]
}
}
}
]
}
}
But when I add the filter for gender, it returns no results:
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"nameDetails.name.nameValue.surname": {
"value": "Pibba",
"fuzziness": "AUTO"
}
}
},
{
"fuzzy": {
"nameDetails.nameValue.firstName": {
"value": "Fawsu",
"fuzziness": "AUTO"
}
}
}
],
"filter": [
{
"term": {
"gender": "Male"
}
}
]
}
}
}
Even if I use only the filter, it returns no results:
GET /person/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"gender": "Male"
}
}
]
}
}
}
You are not getting any results because the filter clause uses a term query. A term query is not analyzed, so it returns a document only if the indexed term matches the search value exactly.
When no analyzer is specified, the standard analyzer is used, which lowercases Male to the token male. So you can either search for male instead of Male, or use one of the solutions below.
If you have not defined an explicit index mapping, query the gender.keyword sub-field instead (notice the ".keyword" after the gender field). Dynamic mapping creates this keyword sub-field for string values, and it indexes the whole value unchanged instead of running it through the standard analyzer. Try this query:
{
"query": {
"bool": {
"filter": [
{
"term": {
"gender.keyword": "Male"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "66879128",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"gender": "Male",
"activeStatus": "Inactive",
"deceased": "No",
"nameDetails": {
"name": [
{
"nameValue": {
"firstName": "Fawsu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Primary Name"
},
{
"nameValue": {
"firstName": "Fausu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Spelling Variation"
}
]
}
}
}
]
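If you have defined an explicit index mapping, map the gender field as keyword, as shown below. The type of an existing field cannot be changed in place, so you will need to create a new index with this mapping and reindex your data.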
{
"mappings": {
"properties": {
"gender": {
"type": "keyword"
}
}
}
}
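The mismatch can be sketched in a few lines of Python. This assumes a simplified standard analyzer (lowercased runs of word characters); the real analyzer follows Unicode text segmentation rules, and the helper names are hypothetical.

```python
import re


def standard_analyze(text):
    # ASSUMPTION: rough approximation of the standard analyzer.
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches_text_field(indexed_value, term):
    # A term query is not analyzed: the search value is compared as-is
    # against the tokens the analyzer produced at index time.
    return term in standard_analyze(indexed_value)


def term_matches_keyword_field(indexed_value, term):
    # A keyword field (without a normalizer) indexes the whole value unchanged.
    return term == indexed_value
```

So "Male" never matches the text field (only "male" does), while the keyword field matches "Male" exactly and not "male".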

Documents repeating in the query of elasticsearch

I'm new to Elasticsearch. I need to build the query dynamically, where for each field name the corresponding file is fetched.
I have the query below. Can anyone say whether it's the right approach? Also, with this query, the documents just repeat for one particular file name.
Please let me know how to go about it.
GET index_name/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"match_phrase": {
"field_name": "program"
}
},
{
"match_phrase": {
"field_value": "aaa-123"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"field_name": "species"
}
},
{
"match_phrase": {
"field_value": "mouse"
}
}
]
}
},
{
"bool": {
"must": [
{
"match_phrase": {
"field_name": "model name"
}
},
{
"match_phrase": {
"field_value": "b45"
}
}
]
}
}
]
}
},
"aggs": {
"2": {
"terms": {
"field": "myfile_file_name.keyword",
"size": 1000,
"order": {
"_key": "asc"
}
},
"aggs": {
"3": {
"terms": {
"field": "field_name.keyword",
"size": 1000,
"order": {
"_key": "asc"
}
}
}
}
}
}
}
Sample output:
{
"_index" : "test",
"_type" : "test_data",
"_id" : "123",
"_score" : 1.0,
"_source" : {
"document_id" : 123,
"m_id" : 1,
"source" : "ADDD",
"type" : "M",
"name" : "Animal",
"value" : "None",
"test_type" : "Test123",
"file_name" : "AA.zip",
"description" : "testing",
"program" : ["hello"],
"species" : ["mouse"],
"study" : ["Study1"],
"create_date" : "2020-08-20 11:51:21.152",
"update_date" : "2020-08-20 11:51:21.152",
"source_name" : "Anim",
"auth" : ["na"],
"treatment" : ["TR001", "TR002", "TR004"],
"timepoint" : ["72", "48"],
"findings_reports" : "na",
"model" : ["None"],
"additional" : "{'view': '', 'load': []}",
"data" : "Pre"
}
}
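Because the query above repeats one (field_name, field_value) pattern per field, one way to build it dynamically is to generate the should clauses from a list of pairs. A minimal sketch in Python, producing the same request body as the query above; build_pair_query is a hypothetical helper, and the field names (field_name, field_value, myfile_file_name.keyword) are copied from the query, not verified against the index mapping.

```python
import json


def build_pair_query(pairs, size=1000):
    """pairs: list of (field_name, field_value) tuples; one should clause per pair."""
    should = [
        {
            "bool": {
                "must": [
                    {"match_phrase": {"field_name": name}},
                    {"match_phrase": {"field_value": value}},
                ]
            }
        }
        for name, value in pairs
    ]
    return {
        "query": {"bool": {"should": should}},
        "aggs": {
            "2": {
                "terms": {
                    "field": "myfile_file_name.keyword",
                    "size": size,
                    "order": {"_key": "asc"},
                },
                "aggs": {
                    "3": {
                        "terms": {
                            "field": "field_name.keyword",
                            "size": size,
                            "order": {"_key": "asc"},
                        }
                    }
                },
            }
        },
    }


body = json.dumps(
    build_pair_query([("program", "aaa-123"), ("species", "mouse"), ("model name", "b45")])
)
```

The resulting body string can be sent as the request body of GET index_name/_search with whatever client is in use.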

Using named queries (matched_queries) for nested types in Elasticsearch?

Using named queries, I can get a list of the matched_queries for boolean expressions such as:
(query1) AND (query2 OR query3 OR true)
Here is an example of using named queries to match on top-level document fields:
DELETE test
PUT /test
PUT /test/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"type": {
"type": "text"
},
"TAGS": {
"type": "nested"
}
}
}
POST /test/_doc
{
"name" : "doc1",
"type": "msword",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
},
{
"ID" : "tag3",
"TYPE" : "BASIC"
}
]
}
# (query1) AND (query2 or query3 or true)
GET /test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "doc1",
"_name": "query1"
}
}
}
],
"should": [
{
"match": {
"type": {
"query": "msword",
"_name": "query2"
}
}
},
{
"exists": {
"field": "type",
"_name": "query3"
}
}
]
}
}
}
The above query correctly returns all three matched_queries in the response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5753641,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TKNJ9G4BbvPS27u-ZYux",
"_score" : 1.5753641,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "BASIC"
},
{
"ID" : "wb1",
"TYPE" : "BASIC"
}
]
},
"matched_queries" : [
"query1",
"query2",
"query3"
]
}
]
}
}
However, I'm trying to run a similar search:
(query1) AND (query2 OR query3 OR true)
only this time on the nested TAGS object rather than top-level document fields.
I've tried the following query, but the problem is I need to supply the inner_hits object for nested objects in order to get the matched_queries in the response, and I can only add it to one of the three queries.
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
},
// "inner_hits" : {}
}
},
"should": [
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
// "inner_hits" : {}
}
},
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
},
// "inner_hits" : {}
}
}
]
}
}
}
Elasticsearch will complain if I add more than one 'inner_hits'. I've commented out the places above where I can add it, but each of these will only return the single matched query.
I want my response to this query to return:
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
Any help is much appreciated, thanks!
A colleague helpfully provided a solution: move the _name parameter so it sits directly under each nested query:
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"_name": "tag1-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1"
}
}
}
}
},
"should": [
{
"nested": {
"_name": "tag2-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2"
}
}
}
}
},
{
"nested": {
"_name": "tag3-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3"
}
}
}
}
}
]
}
}
}
This now correctly returns all three query names in matched_queries:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 2.9424875,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TaNy9G4BbvPS27u--oto",
"_score" : 2.9424875,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "DATASOURCE"
},
{
"ID" : "wb1",
"TYPE" : "WORKBOOK"
},
{
"ID" : "wb2",
"TYPE" : "WORKBOOK"
}
]
},
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
}
]
}
}
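When the query is generated rather than written by hand, the fix amounts to placing _name on the nested query instead of on the inner match. A small Python sketch that produces the same structure as the working query; named_nested and build_tag_query are hypothetical helpers, and the "&lt;id&gt;-query" naming simply follows the example.

```python
def named_nested(path, field, value, name):
    """Nested query with _name at the nested level, as in the working query."""
    return {
        "nested": {
            "_name": name,
            "path": path,
            "query": {"match": {field: {"query": value}}},
        }
    }


def build_tag_query(required, optional):
    """required/optional: lists of tag IDs, one named nested clause per ID."""
    def clause(tag):
        return named_nested("TAGS", "TAGS.ID", tag, tag + "-query")

    return {
        "query": {
            "bool": {
                "must": [clause(t) for t in required],
                "should": [clause(t) for t in optional],
            }
        }
    }
```

build_tag_query(["tag1"], ["tag2", "tag3"]) gives the same clauses as the query above, with must written as a one-element list, which Elasticsearch accepts as equivalent to a single object.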

How to sort result set in order of matching words

How to sort result set in order of matching words?
I have a couple of words, "heinz meyer".
My query returns:
Heinz A. Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz Meyer
Karl-Heinz Meyer GmbH
but I need the results ordered by the position of the matching words, like this:
Heinz Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz A. Meyer
Karl-Heinz Meyer GmbH
My query is:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "meyer*"
}
}],
"must_not": [],
"should": [],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
FINAL SOLUTION:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "meyer*"
}
}, {
"span_near": {
"clauses": [{
"span_term": {
"name": {
"value": "heinz"
}
}
}, {
"span_term": {
"name": {
"value": "meyer"
}
}
}],
"slop": 4,
"in_order": true
}
}],
"must_not": [],
"should": [{
"span_first": {
"match": {
"span_term": {
"name": "heinz"
}
},
"end": 1
}
}, {
"span_first": {
"match": {
"span_term": {
"name": "meyer"
}
},
"end": 2
}
}],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
You can implement this using a combination of the span_first, span_term, and span_near queries.
For the sake of simplicity, I've created a sample index with a single text field, name, and the documents below.
Documents:
POST sortindex/_doc/1
{
"name": "Heinz A. Meyer"
}
POST sortindex/_doc/2
{
"name": "Heinz Meyer GmbH Heizung-Sanitär"
}
POST sortindex/_doc/3
{
"name": "Heinz Meyer"
}
POST sortindex/_doc/4
{
"name": "Karl-Heinz Meyer GmbH"
}
Query:
POST sortindex/_search
{
"query": {
"bool": {
"must": [
{
"span_near": { // Span Near query
"clauses": [
{
"span_term": { // Span Term query
"name": {
"value": "heinz"
}
}
},
{
"span_term": {
"name": {
"value": "meyer"
}
}
}
],
"slop": 4, // match docs containing both heinz and meyer with at most 4 positions between them
"in_order": true // heinz must always come before meyer
}
}
],
"should": [
{
"span_first": { // Span First query
"match": {
"span_term": { // Span Term query
"name": "heinz"
}
},
"end": 1 // heinz's span must end at or before position 1, i.e. heinz is the first word
}
}
]
}
}
}
Notice that span_near is placed in the must clause, whereas span_first is placed in the should clause. That way, documents matching the should clause get a higher score than the ones that don't.
Internally, both use span_term, which works like a term query but is specifically meant for use inside span queries. Like a term query, it is not analyzed, so the values must be the lowercase tokens heinz and meyer.
I'd suggest going through the Elasticsearch documentation on span queries if you would like to understand more about them.
From the documentation:
Span queries are low-level positional queries which provide expert control over the order and proximity of the specified terms. These are typically used to implement very specific queries on legal documents or patents.
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 0.38327998,
"hits" : [
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.38327998,
"_source" : {
"name" : "Heinz Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.26893127,
"_source" : {
"name" : "Heinz Meyer GmbH Heizung-Sanitär"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.25940484,
"_source" : {
"name" : "Heinz A. Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.19908611,
"_source" : {
"name" : "Karl-Heinz Meyer GmbH"
}
}
]
}
}
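The matching side of this ordering can be sketched in Python: which names satisfy span_near (heinz before meyer, slop 4) and which satisfy span_first (heinz is the first word). This assumes a simplified standard analyzer that splits "Karl-Heinz" into karl and heinz; it does not model relevance scoring, which is what orders the three names that match both clauses. The helper names are hypothetical.

```python
import re

NAMES = [
    "Heinz A. Meyer",
    "Heinz Meyer GmbH Heizung-Sanitär",
    "Heinz Meyer",
    "Karl-Heinz Meyer GmbH",
]


def tokens(text):
    # ASSUMPTION: approximates the standard analyzer (lowercased word runs).
    return [t.lower() for t in re.findall(r"\w+", text)]


def span_first(text, term, end):
    """True if term occurs at a 0-based position below end (its span ends <= end)."""
    return term in tokens(text)[:end]


def span_near_in_order(text, first, second, slop):
    """True if first precedes second with at most slop positions between them."""
    toks = tokens(text)
    firsts = [i for i, t in enumerate(toks) if t == first]
    seconds = [j for j, t in enumerate(toks) if t == second]
    return any(0 <= j - i - 1 <= slop for i in firsts for j in seconds)


for name in NAMES:
    print(name, span_near_in_order(name, "heinz", "meyer", 4), span_first(name, "heinz", 1))
```

All four names satisfy the must clause, but only "Karl-Heinz Meyer GmbH" fails span_first, because heinz is its second token; missing the should clause is why it ends up last in the response above.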
You can then add the above clauses to your existing query.
Hope this helps!
