Should and Filter combination in Elasticsearch

I have this query, which returns the correct result:
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"nameDetails.name.nameValue.surname": {
"value": "Pibba",
"fuzziness": "AUTO"
}
}
},
{
"fuzzy": {
"nameDetails.nameValue.firstName": {
"value": "Fawsu",
"fuzziness": "AUTO"
}
}
}
]
}
}
}
and the result is below:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 3.6012557,
"hits" : [
{
"_index" : "person",
"_type" : "_doc",
"_id" : "70002",
"_score" : 3.6012557,
"_source" : {
"gender" : "Male",
"activeStatus" : "Inactive",
"deceased" : "No",
"nameDetails" : {
"name" : [
{
"nameValue" : {
"firstName" : "Fawsu",
"middleName" : "L.",
"surname" : "Pibba"
},
"nameType" : "Primary Name"
},
{
"nameValue" : {
"firstName" : "Fausu",
"middleName" : "L.",
"surname" : "Pibba"
},
"nameType" : "Spelling Variation"
}
]
}
}
}
]
}
}
But when I add a filter on gender, it returns no results:
GET /person/_search
{
"query": {
"bool": {
"should": [
{
"fuzzy": {
"nameDetails.name.nameValue.surname": {
"value": "Pibba",
"fuzziness": "AUTO"
}
}
},
{
"fuzzy": {
"nameDetails.nameValue.firstName": {
"value": "Fawsu",
"fuzziness": "AUTO"
}
}
}
],
"filter": [
{
"term": {
"gender": "Male"
}
}
]
}
}
}
Even if I use only the filter, it returns no results:
GET /person/_search
{
"query": {
"bool": {
"filter": [
{
"term": {
"gender": "Male"
}
}
]
}
}
}

You are not getting any search results because you are using a term query (in the filter clause). A term query returns a document only if the indexed term matches the query value exactly.
When no analyzer is specified, the standard analyzer is used, and it indexes Male as the lowercase token male. So you can either search for male instead of Male or use one of the solutions below.
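To illustrate the mismatch, here is a minimal Python sketch. The helper names standard_analyze and term_matches are made up for this illustration, and the tokenizer is only a rough stand-in for the standard analyzer (split on non-word characters, then lowercase), not the real implementation.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into words and lowercase them."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_matches(indexed_value, query_value):
    """A term query is not analyzed, so the query value must equal an indexed token."""
    return query_value in standard_analyze(indexed_value)


print(standard_analyze("Male"))      # ['male']
print(term_matches("Male", "Male"))  # False: the index holds 'male', not 'Male'
print(term_matches("Male", "male"))  # True
```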
If you have not defined an explicit index mapping, add .keyword to the gender field name. With dynamic mapping, gender is indexed as a text field with a keyword sub-field, gender.keyword, which stores the value unanalyzed, so the exact value Male matches. Try the query below:
{
"query": {
"bool": {
"filter": [
{
"term": {
"gender.keyword": "Male"
}
}
]
}
}
}
Search Result:
"hits": [
{
"_index": "66879128",
"_type": "_doc",
"_id": "1",
"_score": 0.0,
"_source": {
"gender": "Male",
"activeStatus": "Inactive",
"deceased": "No",
"nameDetails": {
"name": [
{
"nameValue": {
"firstName": "Fawsu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Primary Name"
},
{
"nameValue": {
"firstName": "Fausu",
"middleName": "L.",
"surname": "Pibba"
},
"nameType": "Spelling Variation"
}
]
}
}
}
]
If you have defined an explicit index mapping, map the gender field as keyword, as shown below. Changing the type of an existing field requires creating a new index and reindexing.
{
"mappings": {
"properties": {
"gender": {
"type": "keyword"
}
}
}
}
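One more thing to watch when combining should and filter: once a bool query contains a filter (or must) clause, the should clauses become optional by default, so the filter alone decides which documents match. If the name clauses should still be required, set minimum_should_match. Below is a minimal Python sketch that builds such a request body. The function name person_query is hypothetical, and the firstName path nameDetails.name.nameValue.firstName is an assumption based on the document structure shown in the question.

```python
import json


def fuzzy_clause(field, value):
    """Build one fuzzy clause in the same shape as the question's query."""
    return {"fuzzy": {field: {"value": value, "fuzziness": "AUTO"}}}


def person_query(surname, first_name, gender):
    """Fuzzy name match (at least one clause required) plus an exact gender filter."""
    return {
        "query": {
            "bool": {
                "should": [
                    fuzzy_clause("nameDetails.name.nameValue.surname", surname),
                    # Assumed path: follows the nesting of the indexed document.
                    fuzzy_clause("nameDetails.name.nameValue.firstName", first_name),
                ],
                # Without this, should is optional as soon as filter is present.
                "minimum_should_match": 1,
                "filter": [{"term": {"gender.keyword": gender}}],
            }
        }
    }


print(json.dumps(person_query("Pibba", "Fawsu", "Male"), indent=2))
```

The printed body can be sent as-is to GET /person/_search.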

Related

Nested Query Elastic Search

Currently I am trying to search/filter a nested document in Elasticsearch using Spring Data.
The current document structure is:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": [
{
"id": 1,
"status": true,
"issue": "Variation Issue"
},
{
"id": 32,
"status": false,
"issue": "NoiseIssue"
}
]
}
}
Now we need to return only the policy_data entries whose issue is Noise Issue. If no policy_data entry has that issue, policy_data should be null inside the parent document.
I have tried this query:
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
},
{
"nested": {
"path": "policiesDetails.policy_data",
"query": {
"bool": {
"must": {
"terms": {
"policiesDetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
}
}
This works fine for filtering the nested documents. But if no nested document matches, the entire parent document is removed from the results.
What I want when the nested filter does not match is:
{
"id": 1,
"customername": "Cust#123",
"policydetails": {
"address": {
"city": "Irvine",
"state": "CA",
"address2": "23994384, Out OF World",
"post_code": "92617"
},
"policy_data": null
}
}
With the nested query in a must clause, the parent document is not returned when no nested document matches.
You can move the nested query for policy_data into a should clause instead. If a nested document matches, it is returned under inner_hits; otherwise the parent document is still returned, with empty inner_hits.
{
"query": {
"bool": {
"must": [
{
"match": {
"customername": "Cust#345"
}
}
],
"should": [
{
"nested": {
"path": "policydetails.policy_data",
"inner_hits": {}, --> to return matched policy_data
"query": {
"bool": {
"must": {
"terms": {
"policydetails.policy_data.issue": [
"Noise Issue"
]
}
}
}
}
}
}
]
}
},
"_source": ["id","customername","policydetails.address"] --> selected fields
}
Result:
{
"_index" : "index116",
"_type" : "_doc",
"_id" : "f1SxGHoB5tcHqHDtAkTC",
"_score" : 0.2876821,
"_source" : {
"policydetails" : {
"address" : {
"city" : "Irvine",
"address2" : "23994384, Out OF World",
"post_code" : "92617",
"state" : "CA"
}
},
"id" : 1,
"customername" : "Cust#123"
},
"inner_hits" : {
"policydetails.policy_data" : {
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ] --> nested query result; matched nested documents are returned here
}
}
}
}
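To get the shape asked for in the question (policy_data set to null when nothing matched), the inner_hits can be folded back into the parent on the client side. The following is a minimal Python sketch that assumes the response layout shown above; the function name with_policy_data is hypothetical.

```python
import copy


def with_policy_data(hit, path="policydetails.policy_data"):
    """Rebuild policy_data from inner_hits; use None when no nested document matched."""
    doc = copy.deepcopy(hit["_source"])
    inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
    matched = [nested["_source"] for nested in inner]
    doc.setdefault("policydetails", {})["policy_data"] = matched or None
    return doc


# Trimmed copy of the hit shown above: the nested query matched nothing.
hit_without_match = {
    "_source": {
        "id": 1,
        "customername": "Cust#123",
        "policydetails": {"address": {"city": "Irvine", "state": "CA"}},
    },
    "inner_hits": {
        "policydetails.policy_data": {
            "hits": {"total": {"value": 0, "relation": "eq"}, "hits": []}
        }
    },
}

print(with_policy_data(hit_without_match)["policydetails"]["policy_data"])  # None
```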

Elasticsearch filter by multiple fields in an object which is in an array field

The goal is to filter products with multiple prices.
The data looks like this:
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
I would like to filter by membershipLevel and price. For example, if I am a silver member and query price range 0-10, the product should not appear, but if I am a gold member, the product "a" should appear. Is this kind of query supported by Elasticsearch?
You need to map price with the nested datatype and use a nested query for your use case.
See the mapping, sample document, query, and response below:
Mapping:
PUT my_price_index
{
"mappings": {
"properties": {
"name":{
"type":"text"
},
"price":{
"type":"nested",
"properties": {
"membershipLevel":{
"type":"keyword"
},
"price":{
"type":"double"
}
}
}
}
}
}
Sample Document:
POST my_price_index/_doc/1
{
"name":"a",
"price":[
{
"membershipLevel":"Gold",
"price":"5"
},
{
"membershipLevel":"Silver",
"price":"50"
},
{
"membershipLevel":"Bronze",
"price":"100"
}
]
}
Query:
POST my_price_index/_search
{
"query": {
"nested": {
"path": "price",
"query": {
"bool": {
"must": [
{
"term": {
"price.membershipLevel": "Gold"
}
},
{
"range": {
"price.price": {
"gte": 0,
"lte": 10
}
}
}
]
}
},
"inner_hits": {} <---- Do note this.
}
}
}
The above query returns all documents that have a nested price entry with price.price between 0 and 10 and price.membershipLevel equal to Gold.
Notice that I've used inner_hits. The reason is that, even with a nested query, Elasticsearch returns the entire parent document in the response rather than only the nested documents that satisfied the query clauses.
To find exactly which nested documents matched, you need to use inner_hits.
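To see why the nested type matters here, the sketch below contrasts the two matching behaviours in plain Python. It is only a simulation under simplifying assumptions (prices are plain numbers, and the function names are made up): with a regular object array the field values are flattened, so membershipLevel and price are checked independently across all entries, whereas nested keeps each entry as its own unit.

```python
product = {
    "name": "a",
    "price": [
        {"membershipLevel": "Gold", "price": 5},
        {"membershipLevel": "Silver", "price": 50},
        {"membershipLevel": "Bronze", "price": 100},
    ],
}


def matches_flattened(doc, level, low, high):
    """Object-array behaviour: each condition may be satisfied by a different entry."""
    levels = [entry["membershipLevel"] for entry in doc["price"]]
    prices = [entry["price"] for entry in doc["price"]]
    return level in levels and any(low <= p <= high for p in prices)


def matches_nested(doc, level, low, high):
    """Nested behaviour: both conditions must hold for the same entry.

    The returned list plays the role of inner_hits."""
    return [
        entry for entry in doc["price"]
        if entry["membershipLevel"] == level and low <= entry["price"] <= high
    ]


print(matches_flattened(product, "Silver", 0, 10))  # True: a false positive
print(matches_nested(product, "Silver", 0, 10))     # []
print(matches_nested(product, "Gold", 0, 10))       # [{'membershipLevel': 'Gold', 'price': 5}]
```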
The response is shown below.
Response:
{
"took" : 128,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.9808291,
"_source" : {
"name" : "a",
"price" : [
{
"membershipLevel" : "Gold",
"price" : "5"
},
{
"membershipLevel" : "Silver",
"price" : "50"
},
{
"membershipLevel" : "Bronze",
"price" : "100"
}
]
},
"inner_hits" : {
"price" : {
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.9808291,
"hits" : [
{
"_index" : "my_price_index",
"_type" : "_doc",
"_id" : "1",
"_nested" : {
"field" : "price",
"offset" : 0
},
"_score" : 1.9808291,
"_source" : {
"membershipLevel" : "Gold",
"price" : "5"
}
}
]
}
}
}
}
]
}
}
Hope this helps!
Let me show you how to do it using nested fields together with query and filter context. I will use your example to show how to define the index mapping, index sample documents, and write the search query.
Note the include_in_parent parameter in the Elasticsearch mapping, which lets us query these nested fields without using the nested query.
The Elasticsearch documentation describes it as follows:
If true, all fields in the nested object are also added to the parent
document as standard (flat) fields. Defaults to false.
Index Def
{
"mappings": {
"properties": {
"product": {
"type": "nested",
"include_in_parent": true
}
}
}
}
Index sample docs
{
"product": {
"price" : 5,
"membershipLevel" : "Gold"
}
}
{
"product": {
"price" : 50,
"membershipLevel" : "Silver"
}
}
{
"product": {
"price" : 100,
"membershipLevel" : "Bronze"
}
}
Search query to show Gold with price range 0-10
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Gold"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
Result
"hits": [
{
"_index": "so-60620921-nested",
"_type": "_doc",
"_id": "1",
"_score": 1.0296195,
"_source": {
"product": {
"price": 5,
"membershipLevel": "Gold"
}
}
}
]
Search query to exclude Silver, with same price range
{
"query": {
"bool": {
"must": [
{
"match": {
"product.membershipLevel": "Silver"
}
}
],
"filter": [
{
"range": {
"product.price": {
"gte": 0,
"lte" : 10
}
}
}
]
}
}
}
The above query returns no results, because no document matches both conditions.
P.S. This SO answer might help you understand nested fields and how to query them in detail.
You have to use nested fields and a nested query to achieve this: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html
Define your price property with type nested; you will then be able to filter on every property of the nested object.

Using named queries (matched_queries) for nested types in Elasticsearch?

Using named queries, I can get a list of the matched_queries for boolean expressions such as:
(query1) AND (query2 OR query3 OR true)
Here is an example of using named queries to match on top-level document fields:
DELETE test
PUT /test
PUT /test/_mapping/_doc
{
"properties": {
"name": {
"type": "text"
},
"type": {
"type": "text"
},
"TAGS": {
"type": "nested"
}
}
}
POST /test/_doc
{
"name" : "doc1",
"type": "msword",
"TAGS" : [
{
"ID" : "tag1",
"TYPE" : "BASIC"
},
{
"ID" : "tag2",
"TYPE" : "BASIC"
},
{
"ID" : "tag3",
"TYPE" : "BASIC"
}
]
}
# (query1) AND (query2 or query3 or true)
GET /test/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": {
"query": "doc1",
"_name": "query1"
}
}
}
],
"should": [
{
"match": {
"type": {
"query": "msword",
"_name": "query2"
}
}
},
{
"exists": {
"field": "type",
"_name": "query3"
}
}
]
}
}
}
The above query correctly returns all three matched_queries in the response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5753641,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TKNJ9G4BbvPS27u-ZYux",
"_score" : 1.5753641,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "BASIC"
},
{
"ID" : "wb1",
"TYPE" : "BASIC"
}
]
},
"matched_queries" : [
"query1",
"query2",
"query3"
]
}
]
}
}
However, I'm trying to run a similar search:
(query1) AND (query2 OR query3 OR true)
only this time on the nested TAGS object rather than top-level document fields.
I've tried the following query, but the problem is I need to supply the inner_hits object for nested objects in order to get the matched_queries in the response, and I can only add it to one of the three queries.
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1",
"_name": "tag1-query"
}
}
},
// "inner_hits" : {}
}
},
"should": [
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2",
"_name": "tag2-query"
}
}
},
// "inner_hits" : {}
}
},
{
"nested": {
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3",
"_name": "tag3-query"
}
}
},
// "inner_hits" : {}
}
}
]
}
}
}
Elasticsearch will complain if I add more than one 'inner_hits'. I've commented out the places above where I can add it, but each of these will only return the single matched query.
I want my response to this query to return:
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
Any help is much appreciated, thanks!
A colleague helpfully provided a solution to this: move the _name parameter to directly under each nested section:
GET /test/_search
{
"query": {
"bool": {
"must": {
"nested": {
"_name": "tag1-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag1"
}
}
}
}
},
"should": [
{
"nested": {
"_name": "tag2-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag2"
}
}
}
}
},
{
"nested": {
"_name": "tag3-query",
"path": "TAGS",
"query": {
"match": {
"TAGS.ID": {
"query": "tag3"
}
}
}
}
}
]
}
}
}
This correctly returns all three tags now in the matched_queries response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 2.9424875,
"hits" : [
{
"_index" : "test",
"_type" : "_doc",
"_id" : "TaNy9G4BbvPS27u--oto",
"_score" : 2.9424875,
"_source" : {
"name" : "doc1",
"type" : "msword",
"TAGS" : [
{
"ID" : "ds1",
"TYPE" : "DATASOURCE"
},
{
"ID" : "wb1",
"TYPE" : "WORKBOOK"
},
{
"ID" : "wb2",
"TYPE" : "WORKBOOK"
}
]
},
"matched_queries" : [
"tag1-query",
"tag2-query",
"tag3-query"
]
}
]
}
}
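If the tag list varies per request, the same request body can be generated instead of written by hand. This is a minimal Python sketch; the helper names named_nested and tag_query and the "<tag>-query" naming scheme are assumptions chosen to mirror the query above.

```python
import json


def named_nested(path, field, value, name):
    """Nested clause with _name on the nested query itself, not on the inner match."""
    return {
        "nested": {
            "_name": name,
            "path": path,
            "query": {"match": {field: {"query": value}}},
        }
    }


def tag_query(required, optional):
    """(all required tags) AND (any optional tags), each clause individually named."""
    return {
        "query": {
            "bool": {
                "must": [named_nested("TAGS", "TAGS.ID", t, f"{t}-query") for t in required],
                "should": [named_nested("TAGS", "TAGS.ID", t, f"{t}-query") for t in optional],
            }
        }
    }


print(json.dumps(tag_query(["tag1"], ["tag2", "tag3"]), indent=2))
```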

How to sort result set in order of matching words

How to sort result set in order of matching words?
I have the two words "heinz meyer", and my query returns:
Heinz A. Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz Meyer
Karl-Heinz Meyer GmbH
but I need the results ordered by the position of the matching words, like this:
Heinz Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz A. Meyer
Karl-Heinz Meyer GmbH
My query is:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "meyer*"
}
}],
"must_not": [],
"should": [],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
FINAL SOLUTION:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "mayer*"
}
}, {
"span_near": {
"clauses": [{
"span_term": {
"name": {
"value": "heinz"
}
}
}, {
"span_term": {
"name": {
"value": "mayer"
}
}
}],
"slop": 4,
"in_order": true
}
}],
"must_not": [],
"should": [{
"span_first": {
"match": {
"span_term": {
"name": "heinz"
}
},
"end": 1
}
}, {
"span_first": {
"match": {
"span_term": {
"name": "mayer"
}
},
"end": 2
}
}],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
You can implement this ordering with a combination of the Span First, Span Term, and Span Near queries.
For simplicity, I've created a sample index with a single field, name, of type text, and indexed the documents below.
Documents:
POST sortindex/_doc/1
{
"name": "Heinz A. Meyer"
}
POST sortindex/_doc/2
{
"name": "Heinz Meyer GmbH Heizung-Sanitär"
}
POST sortindex/_doc/3
{
"name": "Heinz Meyer"
}
POST sortindex/_doc/4
{
"name": "Karl-Heinz Meyer GmbH"
}
Query:
POST sortindex/_search
{
"query": {
"bool": {
"must": [
{
"span_near": { <---- Span Near Query
"clauses": [
{
"span_term": { <---- Span Term Query
"name": {
"value": "heinz"
}
}
},
{
"span_term": {
"name": {
"value": "meyer"
}
}
}
],
"slop": 4, <---- Retrieve all docs having both heinz and meyer with distance of <= 4 words
"in_order": true <---- Heinz must always come before Meyer
}
}
],
"should": [
{
"span_first": { <---- Span First Query
"match": {
"span_term": { <---- Span Term Query
"name": "heinz"
}
},
"end": 1 <---- Match docs where heinz is the first word, i.e. the match ends at position <= 1
}
}
]
}
}
}
Notice that Span Near is placed in the must clause, whereas Span First is placed in the should clause. That way, documents that also satisfy the should clause get a higher score than those that do not.
Internally, both use Span Term, which behaves like a term query but is meant specifically for use inside span queries.
I'd suggest going through the linked documentation if you would like to understand span queries better.
From the link:
Span queries are low-level positional queries which provide expert
control over the order and proximity of the specified terms. These are
typically used to implement very specific queries on legal documents
or patents.
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 0.38327998,
"hits" : [
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.38327998,
"_source" : {
"name" : "Heinz Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.26893127,
"_source" : {
"name" : "Heinz Meyer GmbH Heizung-Sanitär"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.25940484,
"_source" : {
"name" : "Heinz A. Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.19908611,
"_source" : {
"name" : "Karl-Heinz Meyer GmbH"
}
}
]
}
}
You can add the above clauses to the query you already have.
Hope this helps!
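For intuition, the positional checks can be approximated in plain Python. This is only a sketch: the tokenizer is a rough stand-in for the standard analyzer, the function names are made up, and it reports which documents satisfy the must and should clauses without reproducing Elasticsearch's scoring.

```python
import re

NAMES = {
    1: "Heinz A. Meyer",
    2: "Heinz Meyer GmbH Heizung-Sanitär",
    3: "Heinz Meyer",
    4: "Karl-Heinz Meyer GmbH",
}


def tokens(name):
    """Lowercase and split on non-word characters, so 'Karl-Heinz' gives karl, heinz."""
    return re.findall(r"\w+", name.lower())


def span_first(toks, term, end):
    """True when the term occurs within the first `end` token positions."""
    return term in toks[:end]


def span_near_in_order(toks, first, second, slop):
    """True when `first` precedes `second` with at most `slop` tokens in between."""
    for i, tok in enumerate(toks):
        if tok != first:
            continue
        for j in range(i + 1, len(toks)):
            if toks[j] == second and j - i - 1 <= slop:
                return True
    return False


must = [i for i, n in NAMES.items() if span_near_in_order(tokens(n), "heinz", "meyer", 4)]
boosted = [i for i in must if span_first(tokens(NAMES[i]), "heinz", 1)]

print(must)     # [1, 2, 3, 4]
print(boosted)  # [1, 2, 3]: document 4 starts with 'karl', so it gets no boost
```

This matches the response above, where document 4 is the only one without the Span First boost and ranks last.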

Filter Full Text Search based on User ID

GET _search
{
"query": {
"match": {
"content": "this test"
}
}
}
This gave me below result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 6,
"successful" : 6,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 0.5753642,
"hits" : [
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "6bb174ab-a4ce-4409-a626-c9a42c98b89e",
"_score" : 0.5753642,
"_source" : {
"user_id" : 13,
"content" : "This is a test"
}
},
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "1304cf2e-a1d4-40ca-9876-9abb08c4474d",
"_score" : 0.36464313,
"_source" : {
"user_id" : 10,
"content" : "This is a test"
}
},
{
"_index" : "inbox",
"_type" : "mailbox",
"_id" : "623c093c-4408-445e-abb1-460d2c5004cd",
"_score" : 0.36464313,
"_source" : {
"user_id" : 15,
"content" : "This is a test"
}
}
]
}
}
Which is good. However, I need to filter the results by user_id, i.e. score only a specific user's content.
GET _search
{
"query": {
"match": {
"content": "this test",
"user_id": 10
}
}
}
When I add user_id, I get this error:
{
"error": {
"root_cause": [
{
"type": "parsing_exception",
"reason": "[match] query doesn't support multiple fields, found [content] and [user_id]",
"line": 5,
"col": 18
}
],
"type": "parsing_exception",
"reason": "[match] query doesn't support multiple fields, found [content] and [user_id]",
"line": 5,
"col": 18
},
"status": 400
}
Why? And how do I properly filter based on user_id?
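As the error says, a match query accepts only a single field, which is why adding user_id to it fails. Wrap the match query in a bool query and use a term query in the filter clause to restrict the results by user_id.
Update your query as below: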
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this test"
}
}
],
"filter": [
{
"term": {
"user_id": 10
}
}
]
}
}
}
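Clauses in filter restrict the result set without contributing to the score, so the ranking still comes from the match on content only. If the user id changes per request, the body can be built with a small helper. This is a minimal Python sketch; the function name inbox_search is hypothetical.

```python
import json


def inbox_search(text, user_id):
    """Full-text match on content (scored), restricted to one user (not scored)."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {"content": text}}],
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


print(json.dumps(inbox_search("this test", 10), indent=2))
```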
The query should be like this:
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this test"
}
},
{
"match": {
"user_id": 10
}
}
]
}
}
}
Use a bool query to combine the clauses:
{
"query": {
"bool": {
"must": [
{
"match": {
"content": "this is content"
}
},
{
"term": {
"user_id": {
"value": 47545
}
}
}
]
}
}
}

Resources