Elasticsearch match one array with another array

Let's say I have two indexes kids and outings_for_kids with the following data
kids
[
{
"name": "little kid 1",
"i_like":["drawing","teddybears"]
},
]
outings for kids
[
{
"name": "Teddybear drawing fights with apples!",
"for_kids_that_like":["apples","teddybears","drawing", "play outside games"]
},
{
"name": "drawing and teddies!",
"for_kids_that_like":["teddybears","drawing"]
}
]
I want to find the outings that match the things little kid 1 likes, with a lower score when an outing covers more than that.
Little kid 1 should not match the first outing 100%. It has everything little kid 1 wants, but it also has more (e.g. apples), so it should match 50%.
Little kid 1 should match the second outing 100%.

This will be a two-step process:
1. Get the i_like values from the kids index.
2. Use the i_like values from step 1 to query the outings index:
- Use a term query to match each value.
- Use a script to compare the array size with the number of values searched.
- Use constant_score to give a fixed score based on that size comparison.
Query
GET outings/_search
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()==2" --> replace 2 with size of elements searched
}
}
]
}
},
"boost": 100
}
},
{
"constant_score": {
"filter": {
"bool": {
"must": [
{
"term": {
"for_kids_that_like": {
"value": "teddybears"
}
}
},
{
"term": {
"for_kids_that_like": {
"value": "drawing"
}
}
},
{
"script": {
"script": "doc['for_kids_that_like.keyword'].size()>2"
}
}
]
}
},
"boost": 50
}
}
]
}
}
}
Result:
"hits" : [
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IH7tVHEBbLcSRUWr6wPj",
"_score" : 100.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing"
]
}
},
{
"_index" : "outings",
"_type" : "_doc",
"_id" : "IX7zVHEBbLcSRUWrhgM9",
"_score" : 50.0,
"_source" : {
"name" : "Teddybear drawing fights with apples!",
"for_kids_that_like" : [
"teddybears",
"drawing",
"apples"
]
}
}
]
If you just want to show exact-match documents on top, followed by partial matches, then you don't need constant_score (a must query with term clauses will work). By default, exact matches are given a higher score.
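The query above hard-codes the two values and the array size. As a minimal sketch of step 2, assuming the likes have already been fetched from the kids index, the same request body could be generated in Python. The function name `build_outing_query` and its parameters are illustrative and not part of any Elasticsearch client; the field names are taken from the query above.

```python
import json


def build_outing_query(likes, field="for_kids_that_like",
                       count_field="for_kids_that_like.keyword"):
    """Build the two-tier constant_score query for a list of likes."""

    def tier(comparison, boost):
        # One term filter per liked thing: the outing must contain all of them.
        must = [{"term": {field: {"value": like}}} for like in likes]
        # Script filter: compare the outing's array size with len(likes).
        script = f"doc['{count_field}'].size(){comparison}{len(likes)}"
        must.append({"script": {"script": script}})
        return {"constant_score": {"filter": {"bool": {"must": must}},
                                   "boost": boost}}

    # Same size -> boost 100, larger array -> boost 50.
    return {"query": {"bool": {"should": [tier("==", 100), tier(">", 50)]}}}


if __name__ == "__main__":
    print(json.dumps(build_outing_query(["teddybears", "drawing"]), indent=2))
```

The resulting dictionary can be sent as the body of GET outings/_search with whatever HTTP or Elasticsearch client you already use.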

Related

Search multi field with term query

I have some documents in an index:
"hits" : [
{
"_index" : "siem-referencedata-table-table2d526444eff99b1706053853ef7",
"_type" : "_doc",
"_id" : "0table222cc244b04b59d9ecafb0476e6",
"_score" : 1.0,
"_source" : {
"column-name1" : "10.1.10.1",
"column-name2" : "range(100,200)",
"column-name3" : "nam3",
"create_time" : "2022-05-21 03:30:39",
"last_seen" : "2022-05-21 03:30:39",
"id" : "0table222cc244b04b59d9ecafb0476e6"
}
},...
I want to search documents by three fields: column-name1, column-name2 and column-name3.
I use the query below, with term queries, to match the exact values:
{
"query": {
"bool": {
"must": [
{
"term": {
"column-name1": {"value":"10.1.10.1"}
}
},
{
"term": {
"column-name2": {"value":"range(100,200)"}
}
},
{
"term": {
"column-name3": {"value":"nam3"}
}
}
]
}
}
}
It works without the "column-name2": {"value":"range(100,200)"} clause. How should I handle the range(...) value? Is there another way to do this?
The problem was solved by adding .keyword to the field names, as below:
{
"query": {
"bool": {
"must": [
{
"term": {
"column-name1.keyword": {"value":"10.1.10.1"}
}
},
{
"term": {
"column-name2.keyword": {"value":"range(100,200)"}
}
},
{
"term": {
"column-name3.keyword": {"value":"nam3"}
}
}
]
}
}
}
Thanks to Barkha Jain!
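The fix presumably works because a term query is not analyzed, while a text field is: the stored tokens of a value such as range(100,200) no longer equal the full string, whereas the .keyword sub-field keeps it verbatim. A minimal Python sketch of building such a query follows. It assumes the index uses Elasticsearch's default dynamic mapping, where each string field gets a .keyword sub-field; the helper name `exact_match_query` is hypothetical.

```python
def exact_match_query(criteria, suffix=".keyword"):
    """Build a bool/must query with one exact term clause per field.

    `criteria` maps field names to the exact values to look up. The
    `.keyword` suffix is an assumption about the mapping (default
    dynamic mapping for strings); pass suffix="" for keyword fields.
    """
    must = [
        {"term": {f"{name}{suffix}": {"value": value}}}
        for name, value in criteria.items()
    ]
    return {"query": {"bool": {"must": must}}}


body = exact_match_query({
    "column-name1": "10.1.10.1",
    "column-name2": "range(100,200)",
    "column-name3": "nam3",
})
```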

Elastic search dynamic field mapping with range query on price field

I have two fields in my Elasticsearch index: lowest_local_price and lowest_global_price.
I want to map a dynamic value to a third field, price, at run time, based on whether the country is local or global.
If the local country matches, I want to map the lowest_local_price value to the price field.
If the global country matches, I want to map the lowest_global_price value to the price field.
If either the local or the global country matches, I want to apply a range query on the price field and boost that doc by 2.0.
Note: this is not a compulsory filter or query; if it matches, I just want to boost the doc.
I have tried the solutions below, but they do not work for me.
Query 1:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price]]
],
"boost" => 2.0
]
]
];
Query 2:
$params["body"] = [
"runtime_mappings" => [
"price" => [
"type" => "double",
"script" => [
"source" => "if (params['_source']['country_en_name'] == '$country_name' ) { emit(params['_source']['lowest_local_price']); } else { emit( params['_source']['global_rates']['$country->id']['lowest_global_price']); }"
]
]
],
"query" => [
"bool" => [
"filter" => [
"range" => [ "price" => [ "gte" => $min_price, "boost" => 2.0]]
],
]
]
];
Neither of them works for me, because neither boosts the doc. I know that filter does not work with boost, so what is the solution for dynamic field mapping with a range query and a boost?
Please help me solve this.
Thank you in advance!
You can (most likely) achieve what you want without runtime_mappings by using a combination of bool queries, here's how.
Let's define test mapping
We need to clarify what mapping we are working with, because different field types require different query types.
Let's assume that your mapping looks like this:
PUT my-index-000001
{
"mappings": {
"dynamic": "runtime",
"properties": {
"country_en_name": {
"type": "text"
},
"lowest_local_price": {
"type": "float"
},
"global_rates": {
"properties": {
"UK": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"FR": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
},
"US": {
"properties":{
"lowest_global_price": {
"type": "float"
}
}
}
}
}
}
}
}
Note that country_en_name is of type text. In general such fields should be indexed as keyword, but to demonstrate the use of runtime_mappings I kept it as text and will show later how to overcome this limitation.
bool is the same as if for Elasticsearch
The query without runtime mappings might look like this:
POST my-index-000001/_search
{
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"match": {
"country_en_name": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
This can be interpreted as the following:
Any document
OR (
(document with country_en_name=UK AND lowest_local_price >= X)
OR
(document with global_rates.UK.lowest_global_price >= X)
)[boost this part of OR]
The match_all is needed to return also documents that do not match the other queries.
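Since the original question builds the request in application code, here is a rough Python sketch of generating the same bool structure for an arbitrary country and minimum price. The function name `build_boosted_price_query` and its parameters are illustrative; the field names follow the test mapping assumed above.

```python
def build_boosted_price_query(country, min_price, boost=2):
    """Return all documents, boosting those whose price clause matches."""
    # Local branch: the document's own country AND its local price.
    local = {"bool": {"must": [
        {"match": {"country_en_name": country}},
        {"range": {"lowest_local_price": {"gte": min_price}}},
    ]}}
    # Global branch: the per-country global rate.
    global_rate = {"range": {
        f"global_rates.{country}.lowest_global_price": {"gte": min_price}
    }}
    return {"query": {"bool": {"should": [
        {"match_all": {}},  # keeps non-matching documents in the result
        {"bool": {"should": [local, global_rate], "boost": boost}},
    ]}}}


body = build_boosted_price_query("UK", 1000)
```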
What will the response of the query look like?
Let's put some documents in the ES:
POST my-index-000001/_doc/1
{
"country_en_name": "UK",
"lowest_local_price": 1500,
"global_rates": {
"FR": {
"lowest_global_price": 1000
},
"US": {
"lowest_global_price": 1200
}
}
}
POST my-index-000001/_doc/2
{
"country_en_name": "FR",
"lowest_local_price": 900,
"global_rates": {
"UK": {
"lowest_global_price": 950
},
"US": {
"lowest_global_price": 1500
}
}
}
POST my-index-000001/_doc/3
{
"country_en_name": "US",
"lowest_local_price": 950,
"global_rates": {
"UK": {
"lowest_global_price": 1100
},
"FR": {
"lowest_global_price": 1000
}
}
}
Now the result of the search query above will be something like:
{
...
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : 4.9616585,
"hits" : [
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "1",
"_score" : 4.9616585,
"_source" : {
"country_en_name" : "UK",
"lowest_local_price" : 1500,
...
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "3",
"_score" : 3.0,
"_source" : {
"country_en_name" : "US",
"lowest_local_price" : 950,
"global_rates" : {
"UK" : {
"lowest_global_price" : 1100
},
...
}
}
},
{
"_index" : "my-index-000001",
"_type" : "_doc",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"country_en_name" : "FR",
"lowest_local_price" : 900,
"global_rates" : {
"UK" : {
"lowest_global_price" : 950
},
...
}
}
}
]
}
}
Note that document with _id:2 is on the bottom because it didn't match any of the boosted queries.
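To double-check that ordering, the boolean logic of the boosted clause can be replayed in plain Python over the three sample documents. This is only a sketch of the matching logic, not of Elasticsearch scoring, and it simplifies the match query on country_en_name to a case-insensitive comparison.

```python
DOCS = {
    "1": {"country_en_name": "UK", "lowest_local_price": 1500,
          "global_rates": {"FR": {"lowest_global_price": 1000},
                           "US": {"lowest_global_price": 1200}}},
    "2": {"country_en_name": "FR", "lowest_local_price": 900,
          "global_rates": {"UK": {"lowest_global_price": 950},
                           "US": {"lowest_global_price": 1500}}},
    "3": {"country_en_name": "US", "lowest_local_price": 950,
          "global_rates": {"UK": {"lowest_global_price": 1100},
                           "FR": {"lowest_global_price": 1000}}},
}


def matches_boosted_clause(doc, country, min_price):
    """True if the document would hit the boosted part of the query."""
    # (country_en_name = country AND lowest_local_price >= min_price)
    local = (doc["country_en_name"].lower() == country.lower()
             and doc["lowest_local_price"] >= min_price)
    # OR (global_rates.<country>.lowest_global_price >= min_price)
    rate = doc.get("global_rates", {}).get(country, {}).get("lowest_global_price")
    global_match = rate is not None and rate >= min_price
    return local or global_match


boosted = [doc_id for doc_id, doc in DOCS.items()
           if matches_boosted_clause(doc, "UK", 1000)]
print(boosted)  # ['1', '3']
```

Document 1 matches through the local branch and document 3 through the global branch, which is consistent with both appearing above document 2 in the response.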
Will runtime_mappings be of any use?
Runtime mappings are useful when an existing mapping has data types that do not permit a certain type of query. In previous versions (before 7.11) one would have to reindex in such cases, but now it is possible to use runtime mappings instead (at the cost of a more expensive query).
In our case, country_en_name is indexed as text, which is suited to full-text search and not to exact lookups. We should use keyword instead. This is what the query may look like with the help of runtime_mappings:
POST my-index-000001/_search
{
"runtime_mappings": {
"country_en_name_keyword": {
"type": "keyword",
"script": {
"source": "emit(params['_source']['country_en_name'])"
}
}
},
"query": {
"bool": {
"should": [
{
"match_all": {}
},
{
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"country_en_name_keyword": "UK"
}
},
{
"range": {
"lowest_local_price": {
"gte": 1000
}
}
}
]
}
},
{
"range": {
"global_rates.UK.lowest_global_price": {
"gte": 1000
}
}
}
],
"boost": 2
}
}
]
}
}
}
Notice how we created a new runtime field country_en_name_keyword with type keyword and used a term lookup instead of match query.

Update the score of Pinned Documents in Elastic Search

I have a requirement to show some documents always on top of search results and for that, I have used pinned query to pin some documents and the pinned documents will have a score value of 1.7014122E38.
But I have another requirement to modify this score of pinned documents which I'm unable to achieve at the query level.
Sample Documents
{
"docs": [
{
"_id": 1,
"name": "jack"
},
{
"_id": 2,
"name": "ryan"
},
{
"_id": 3,
"name": "mark"
},
{
"_id": 4,
"name": "taylor"
},
{
"_id": 5,
"name": "taylor"
}
]
}
ES Query
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [
"3"
],
"organic": {
"query": {
"bool": {
"must": [
{
"multi_match": {
"name": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
}
]
}
}
}
Now I want to multiply the pinned documents' score by some value, which I'm unable to achieve in ES.
Can someone please help me to solve this requirement?
Since the pinned queries' scores are calculated at query time, there's no way of knowing what they'll end up being. It could be 1.7014122E38 but also 1.7014122402528844E38 etc.
What you could do is use a sort script and check whether the implicit score is unusually high (I chose Integer.MAX_VALUE as the boundary), which would indicate that you're dealing with a pinned document. If that's the case, you can override the pinned documents' scores however you like.
POST your-index/_search?track_scores&filter_path=hits.hits._id,hits.hits._source,hits.hits.sort
{
"query": {
"bool": {
"should": [
{
"pinned": {
"ids": [ "3" ],
"organic": {
"bool": {
"must": [
{
"multi_match": {
"query": "taylor",
"fields": [
"name"
]
}
}
]
}
}
}
}
]
}
},
"sort": [
{
"_script": {
"order": "desc",
"type": "number",
"script": {
"source": "_score >= Integer.MAX_VALUE ? params.score_rewrite : _score",
"params": {
"score_rewrite": 42
}
}
}
}
]
}
Note that it's necessary to set the track_scores URI parameter because when sorting on a field, the scores are not computed by default.
That way, the resulting hits would look along the lines of:
{
"hits" : {
"hits" : [
{
"_id" : "3", <-- pinned ID
"_source" : {
"name" : "mark"
},
"sort" : [
42.0 <-- overridden sort
]
},
{
"_id" : "4",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926 <-- default sort
]
},
{
"_id" : "5",
"_source" : {
"name" : "taylor"
},
"sort" : [
0.875468730926
]
}
]
}
}
P.S.: Integer.MAX_VALUE is arbitrary and there's absolutely no guarantee that it'll catch all pinned docs. In other words, a bit of experimentation will be needed to choose a bulletproof boundary.
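For illustration only, the decision made by the sort script can be mirrored in Python to try out different boundaries against scores you have observed. The function name `rewrite_sort_value` is hypothetical; the constant is the value of Java's Integer.MAX_VALUE, and the sample scores are the ones quoted in this thread.

```python
INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


def rewrite_sort_value(score, score_rewrite=42.0, boundary=INT_MAX):
    """Mirror of: _score >= Integer.MAX_VALUE ? params.score_rewrite : _score"""
    return score_rewrite if score >= boundary else score


# (doc id, raw _score): the pinned doc carries the huge pinned score.
hits = [("3", 1.7014122e38), ("4", 0.875468730926), ("5", 0.875468730926)]
sort_values = [(doc_id, rewrite_sort_value(score)) for doc_id, score in hits]
print(sort_values)
```

Any organic score at or above the chosen boundary would be rewritten as well, which is why the boundary needs to sit far above realistic organic scores.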

Elasticsearch multi-match fields doesn't include query string

I am doing a multi-match search using the following query object:
{
_source: [
'baseline',
'cdrp',
'date',
'description',
'dev_status',
'element',
'event',
'id'
],
track_total_hits: true,
query: {
bool: {
filter: [{name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]}],
should: [
{
multi_match:{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
}
},
highlight: { fields: { '*': {} } },
sort: [],
from: 0,
size: 50
}
I'm expecting the word "national" to be found within the description or narrative.* fields, but only one of the 2 records returned meets my expectations. I'm trying to understand why.
elasticsearch.config.ts
"settings": {
"analysis": {
"analyzer": {
"search_synonyms": {
"tokenizer": "whitespace",
"filter": [
"graph_synonyms",
"lowercase",
"asciifolding"
],
}
}
}
},
"mappings": {
"properties": {
"description": {
"type": "text",
"analyzer": "search_synonyms"
},
"narrative": {
"type":"object",
"properties":{
"_all":{
"type": "text",
"analyzer": "search_synonyms"
}
}
},
}
}
A should clause works like OR: it doesn't filter out documents, it only affects scoring. Documents which match the should clause are scored higher.
If you want to filter on the multi_match, you can move it inside the filter clause:
filter: [
{
name: "baseline", values: ["1f.0.1.0", "1f.1.8.3"]
},
{
multi_match:
{
query: "national",
fields: ["cdrp","description","narrative.*","title","cop"]
}
}
]
Filter vs must: both return only documents that match the specified clauses, but filter doesn't score them. So if you are not interested in the score or the order of the returned documents, you can use filter. The two differ only in scoring.
Documents with more matches are scored higher
multi_match uses the best_fields type by default:
Finds documents which match any field, but uses the _score from the
best field.
It takes the score of the single best-scoring field as the score of each document.
Example
Document 1 has matches in two fields: field1 (score 2) and field2 (score 1)
Document 2 has a match in one field: field2 (score 3)
Document 2 will be ranked higher even though only one field matched.
You can change it to most_fields:
Finds documents which match any field and combines the _score from
each field.
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"query": "test",
"fields": [],
"type": "most_fields"
}
}
]
}
}
}
Still, a document with fewer matching fields can be ranked higher because of a high score in a single field caused by multiple matching terms.
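The difference between the two types can be sketched with a few lines of Python arithmetic. This is a simplification under stated assumptions: best_fields is modelled as the maximum per-field score (ignoring tie_breaker), most_fields as the plain sum, and the per-field scores are the made-up numbers from the example above plus one hypothetical document 3.

```python
def best_fields(field_scores):
    """best_fields, simplified: the score of the best-scoring field."""
    return max(field_scores.values(), default=0.0)


def most_fields(field_scores):
    """most_fields, simplified: per-field scores added together."""
    return sum(field_scores.values())


doc1 = {"field1": 2.0, "field2": 1.0}  # matches in two fields
doc2 = {"field2": 3.0}                 # match in one field
doc3 = {"field2": 5.0}                 # hypothetical: one field, many terms

for name, doc in (("doc1", doc1), ("doc2", doc2), ("doc3", doc3)):
    print(name, best_fields(doc), most_fields(doc))
```

With best_fields, doc2 (3.0) outranks doc1 (2.0). With most_fields they tie at 3.0, and the hypothetical doc3 still outranks doc1 with a single matching field.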
If you want to give the same score to a field irrespective of the number of tokens matched, you need to use a constant_score query:
{
"query": {
"bool": {
"should": [
{
"constant_score": {
"filter": {
"term": {
"field1": "test"
}
}
}
},
{
"constant_score": {
"filter": {
"term": {
"field2": "test"
}
}
}
}
]
}
},
"highlight": {
"fields": {
"field1": {},
"field2": {}
}
}
}
Result:
"hits" : [
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iSCe6nEB8J88APx3YBGn",
"_score" : 2.0, --> one score per field matched
"_source" : {
"field1" : "test",
"field2" : "test"
},
"highlight" : {
"field1" : [
"<em>test</em>"
],
"field2" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iiCe6nEB8J88APx3ghF-",
"_score" : 1.0,
"_source" : {
"field1" : "test",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em>"
]
}
},
{
"_index" : "index18",
"_type" : "_doc",
"_id" : "iyCf6nEB8J88APx3UhF8",
"_score" : 1.0,
"_source" : {
"field1" : "test do",
"field2" : "abc"
},
"highlight" : {
"field1" : [
"<em>test</em> do"
]
}
}
]
}

How to sort result set in order of matching words

I am searching for the two words "heinz meyer".
My query returns:
Heinz A. Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz Meyer
Karl-Heinz Meyer GmbH
But I need the results ordered by the position of the matching words, like this:
Heinz Meyer
Heinz Meyer GmbH Heizung-Sanitär
Heinz A. Meyer
Karl-Heinz Meyer GmbH
my query is:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "meyer*"
}
}],
"must_not": [],
"should": [],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
FINAL SOLUTION:
{
"query": {
"bool": {
"must": [{
"wildcard": {
"name": "heinz*"
}
}, {
"wildcard": {
"name": "mayer*"
}
}, {
"span_near": {
"clauses": [{
"span_term": {
"name": {
"value": "heinz"
}
}
}, {
"span_term": {
"name": {
"value": "mayer"
}
}
}],
"slop": 4,
"in_order": true
}
}],
"must_not": [],
"should": [{
"span_first": {
"match": {
"span_term": {
"name": "heinz"
}
},
"end": 1
}
}, {
"span_first": {
"match": {
"span_term": {
"name": "mayer"
}
},
"end": 2
}
}],
"filter": {
"bool": {
"must": [{
"range": {
"latestRevenueStatistics.revenue": {
"gte": "0",
"lte": "40000000"
}
}
}, {
"range": {
"latestRevenueStatistics.number_of_employees": {
"gte": "0",
"lte": "300"
}
}
}, {
"term": {
"addresses.postal_code_length": 5
}
}]
}
}
}
},
"from": 0,
"size": 10
}
You can implement this matching with a combination of the Span First, Span Term and Span Near queries.
For the sake of simplicity, I've created a sample index with only one field, name, of type text, along with the documents below.
Documents:
POST sortindex/_doc/1
{
"name": "Heinz A. Meyer"
}
POST sortindex/_doc/2
{
"name": "Heinz Meyer GmbH Heizung-Sanitär"
}
POST sortindex/_doc/3
{
"name": "Heinz Meyer"
}
POST sortindex/_doc/4
{
"name": "Karl-Heinz Meyer GmbH"
}
Query:
POST sortindex/_search
{
"query": {
"bool": {
"must": [
{
"span_near": { <---- Span Near Query
"clauses": [
{
"span_term": { <---- Span Term Query
"name": {
"value": "heinz"
}
}
},
{
"span_term": {
"name": {
"value": "meyer"
}
}
}
],
"slop": 4, <---- Retrieve all docs having both heinz and meyer with distance of <= 4 words
"in_order": true <---- Heinz must always come before Meyer
}
}
],
"should": [
{
"span_first": { <---- Span First Query
"match": {
"span_term": { <---- Span Term Query
"name": "heinz"
}
},
"end": 1 <---- Retrieve docs having heinz's position <= 1 and > 0 i.e. the first word
}
}
]
}
}
}
Notice that Span Near is placed in the must clause whereas Span First is placed in the should clause. That way the documents that also satisfy the should clause get a higher score than the ones that don't.
Internally, both search using Span Term, which is much like a term query but is specifically meant for use within span queries.
I'd suggest going through the links if you would like to understand more about span queries.
From the link:
Span queries are low-level positional queries which provide expert
control over the order and proximity of the specified terms. These are
typically used to implement very specific queries on legal documents
or patents.
Response:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 0.38327998,
"hits" : [
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.38327998,
"_source" : {
"name" : "Heinz Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "2",
"_score" : 0.26893127,
"_source" : {
"name" : "Heinz Meyer GmbH Heizung-Sanitär"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.25940484,
"_source" : {
"name" : "Heinz A. Meyer"
}
},
{
"_index" : "sortindex",
"_type" : "_doc",
"_id" : "4",
"_score" : 0.19908611,
"_source" : {
"name" : "Karl-Heinz Meyer GmbH"
}
}
]
}
}
You can go ahead and add the above query to the one you have.
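If the search words come from user input, the span clauses can be generated instead of written by hand. Below is a minimal Python sketch under a few assumptions: the words are lowercased because span_term is not analyzed and the name field is assumed to use an analyzer that lowercases tokens, and the function name `build_positional_query` and its parameters are illustrative.

```python
def build_positional_query(phrase, field="name", slop=4):
    """Build the span_near (must) + span_first (should) query for a phrase."""
    terms = phrase.lower().split()
    if not terms:
        raise ValueError("phrase must contain at least one word")
    # All words must occur in order, at most `slop` positions apart.
    near = {"span_near": {
        "clauses": [{"span_term": {field: {"value": t}}} for t in terms],
        "slop": slop,
        "in_order": True,
    }}
    # Extra score when the first word is the first token of the field.
    first = {"span_first": {
        "match": {"span_term": {field: terms[0]}},
        "end": 1,
    }}
    return {"query": {"bool": {"must": [near], "should": [first]}}}


body = build_positional_query("Heinz Meyer")
```

The must and should lists of the returned body can then be merged into the bool query that already carries your range and term filters.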
Hope this helps!
