How to score based on number of substring occurrences within a field in ElasticSearch

I have an ElasticSearch document setup like so
{
metadata: {
content: "",
other_fields: ...
},
other_fields: ...
}
and am querying the content field like so
{
"function_score": {
"query": {
"multi-match": {
"query": "searchTerm",
"fields": ["content", "other_fields", ...]
}
},
"functions": [
{
"weight": {
"filter": {
"match": {
"metadata.content": {
"query": "searchTerm"
},
"weight": 3
}
}
},
{
other weight functions for the other fields with custom score values
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
This works great calculating the score for all of the fields based on matches (including the content field). However, I want to multiply the content field score by the number of occurrences of the search term in the content field.
For example when I search for "test" in the doc below, the score is 3 but I want it to be 9:
{
metadata: {
content: "test test test"
}
}
Any suggestions on how I can do this?

I could not replicate your query -- too many typos/syntax issues. Here's what I could reconstruct:
Let's first create some sample docs
POST metaa/_doc
{
"metadata": {
"content": "test"
}
}
POST metaa/_doc
{
"metadata": {
"content": "test test"
}
}
POST metaa/_doc
{
"metadata": {
"content": "test test test"
}
}
Then query with a script_score function (inspired by another answer):
GET metaa/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "test",
"fields": [
"metadata.content",
"other_fields"
]
}
},
"functions": [
{
"script_score": {
"script": {
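"source": """
// Strip every occurrence of the term, then divide the number of removed
// characters by the term length (4 for 'test') to get the occurrence count.
def docval = doc['metadata.content.keyword'].value;
String temp = docval.replace('test', "");
return (docval.length() - temp.length()) / 4;
"""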
}
},
"filter": {
"match": {
"metadata.content": {
"query": "test"
}
}
},
"weight": 3
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
}
yielding
[
{
"_index":"metaa",
"_type":"_doc",
"_id":"suh763EBG_KW3EFnjwNq",
"_score":9.0,
"_source":{
"metadata":{
"content":"test test test"
}
}
},
{
"_index":"metaa",
"_type":"_doc",
"_id":"seh763EBG_KW3EFnewP6",
"_score":6.0,
"_source":{
"metadata":{
"content":"test test"
}
}
},
{
"_index":"metaa",
"_type":"_doc",
"_id":"s-h_63EBG_KW3EFnMQMU",
"_score":3.0,
"_source":{
"metadata":{
"content":"test"
}
}
}
]
_score = 9 -> 6 -> 3.
Note: you may want to perform some validity checks within the script (could be a simple try/catch). But that's an exercise for the reader.
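The counting trick in the script can be checked outside Elasticsearch. Below is a minimal Python sketch of the same arithmetic; `count_occurrences` and `weighted_score` are hypothetical helper names (not part of any Elasticsearch API), and the default weight of 3 simply mirrors the query above.

```python
def count_occurrences(content: str, term: str) -> int:
    """Count non-overlapping occurrences of term, the way the script does."""
    if not term:
        raise ValueError("term must be non-empty")
    # Removing every occurrence shortens the string by len(term) per hit.
    stripped = content.replace(term, "")
    return (len(content) - len(stripped)) // len(term)


def weighted_score(content: str, term: str, weight: int = 3) -> int:
    """Occurrence count multiplied by the function weight (assumed to be 3)."""
    return count_occurrences(content, term) * weight


if __name__ == "__main__":
    for text in ("test", "test test", "test test test"):
        print(text, "->", weighted_score(text, "test"))
```

Note that this is plain substring counting, so "testing" or "attest" would also count as a hit for "test"; whether that is acceptable depends on your data.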

Related

How to match multiple fields inside the filter keyword in an Elasticsearch query?

I want to add one more field inside the match clause in the functions block of my query, but when I add it, I get this error: "reason" : "[match] query doesn't support multiple fields, found [gender] and [id]".
How do I do it?
GET exp/_search
{
"_source": ["score","answer","gender","id"],
"query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"match":{
"gender":"male",
"id":1
}
},
"weight": 2
}
]
}
}
}
A match query accepts only one field, so it cannot take two different fields and values. Wrap the two match clauses in a bool query inside the filter instead:
{
"_source": [
"score",
"answer",
"gender",
"id"
],
"query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"bool": {
"must": [
{
"match": {
"gender": "male"
}
},
{
"match": {
"id": 1
}
}
]
}
},
"weight": 2
}
]
}
}
}
Also, if you want to apply a different boost to gender and id, give each its own function with a separate filter, as shown below:
{
"_source": [
"score",
"answer",
"gender",
"id"
],
"query": {
"function_score": {
"query": {
"match": {
"score": 10
}
},
"functions": [
{
"filter": {
"match": {
"gender": "male"
}
},
"weight": 2
},
{
"filter": {
"match": {
"id": 1
}
},
"weight": 1
}
]
}
}
}
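If the function list is assembled in application code, both shapes above can be generated rather than written by hand. The following is a small Python sketch under that assumption; `combined_filter` and `weighted_filters` are hypothetical helper names, and the output is just a plain dict to place under "functions".

```python
from typing import Any, Dict, List, Tuple


def combined_filter(conditions: Dict[str, Any], weight: float) -> Dict[str, Any]:
    """One function whose bool filter requires every field/value pair to match."""
    return {
        "filter": {
            "bool": {
                "must": [{"match": {field: value}} for field, value in conditions.items()]
            }
        },
        "weight": weight,
    }


def weighted_filters(weights: Dict[str, Tuple[Any, float]]) -> List[Dict[str, Any]]:
    """One function per field, each with its own weight: {field: (value, weight)}."""
    return [
        {"filter": {"match": {field: value}}, "weight": weight}
        for field, (value, weight) in weights.items()
    ]


if __name__ == "__main__":
    print(combined_filter({"gender": "male", "id": 1}, 2))
    print(weighted_filters({"gender": ("male", 2), "id": (1, 1)}))
```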

Full-text search through complex structure Elasticsearch

I have the following issue with full-text search in Elasticsearch. I would like to search across all indexed attributes. However, one of my Project attributes is a very complex array of hashes/objects:
[
{
"title": "Group 1 title",
"name": "Group 1 name",
"id": "group_1_id",
"items": [
{
"pos": "1",
"title": "Position 1 title"
},
{
"pos": "1.1",
"title": "Position 1.1 title",
"description": "<p>description</p>",
"extra_description": {
"rotation": "2 years",
"amount": "1.947m²"
},
"inputs": {
"unit_price": true,
"total_net": true
},
"additional_inputs": [
{
"name": "additonal_input_name",
"label": "Additional input label:",
"placeholder": "Additional input placeholder",
"description": "Additional input description",
"type": "text"
}
]
}
]
}
]
My mappings look like this:
{:title=>{:type=>"text", :analyzer=>"english"},
:description=>{:type=>"text", :analyzer=>"english"},
:location=>{:type=>"keyword"},
:company=>{:type=>"keyword"},
:created_at=>{:type=>"date"},
:due_date=>{:type=>"date"},
:specification=>
{:type=>:nested,
:properties=>
{:id=>{:type=>"keyword"},
:title=>{:type=>"text"},
:items=>
{:type=>:nested,
:properties=>
{:pos=>{:type=>"keyword"},
:title=>{:type=>"text"},
:description=>{:type=>"text", :analyzer=>"english"},
:extra_description=>{:type=>:nested, :properties=>{:rotation=>{:type=>"keyword"}, :amount=>{:type=>"keyword"}}},
:additional_inputs=>
{:type=>:nested,
:properties=>
{:label=>{:type=>"keyword"},
:placeholder=>{:type=>"text"},
:description=>{:type=>"text"},
:type=>{:type=>"keyword"},
:name=>{:type=>"keyword"}
}
}
}
}
}
}
}
The question is how to search through it properly. For non-nested attributes it works like a charm, but when I search by, for instance, the title in the specification, no result is returned. I tried both:
query:
{ nested:
{
multi_match: {
query: keyword,
fields: ['title', 'description', 'company', 'location', 'specification']
}
}
}
Or
{
nested: {
path: 'specification',
query: {
multi_match: {
query: keyword
}
}
}
}
Without any result.
Edit:
It's with elasticsearch-ruby for Ruby.
I am trying to query by: MODEL_NAME.all.search(query: with_specification("Group 1 title")) where with_specification is:
def with_specification(keyword)
{
bool: {
should: [
{
nested: {
path: 'specification',
query: {
bool: {
should: [
{
match: {
'specification.title': keyword,
}
},
{
multi_match: {
query: keyword,
fields: [
'specification.title',
'specification.id'
]
}
},
{
nested: {
path: 'specification.items',
query: {
match: {
'specification.items.title': keyword,
}
}
}
}
]
}
}
}
}
]
}
}
end
Querying multi-level nested documents must follow a certain schema: each nested level needs its own nested query whose path names that level.
A single multi_match cannot mix nested and non-nested fields, nor can it cover nested fields that live under different paths.
You can wrap your queries in a bool-should, but keep the two rules above in mind:
GET your_index/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "specification",
"query": {
"bool": {
"should": [
{
"match": {
"specification.title": "TEXT" <-- standalone match
}
},
{
"multi_match": { <-- multi-match but 1st level path
"query": "TEXT",
"fields": [
"specification.title",
"specification.id"
]
}
},
{
"nested": {
"path": "specification.items", <-- 2nd level path
"query": {
"match": {
"specification.items.title": "TEXT"
}
}
}
}
]
}
}
}
}
]
}
}
}
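The wrapping pattern used above (one nested clause per nested level, innermost field last) can be generated mechanically. Here is a minimal Python sketch, assuming you know which paths are mapped as nested; `nested_match` is a hypothetical helper name and the result is a plain dict to send as the query body.

```python
from typing import Any, Dict, Sequence


def nested_match(field: str, text: str, nested_paths: Sequence[str]) -> Dict[str, Any]:
    """Wrap a match on `field` in one nested query per enclosing nested path."""
    query: Dict[str, Any] = {"match": {field: text}}
    # Keep only the nested paths that enclose the field.
    enclosing = [path for path in nested_paths if field.startswith(path + ".")]
    # Wrap from the deepest path outwards so the shallowest path ends up outermost.
    for path in sorted(enclosing, key=len, reverse=True):
        query = {"nested": {"path": path, "query": query}}
    return query


if __name__ == "__main__":
    paths = ["specification", "specification.items"]
    print(nested_match("specification.items.title", "TEXT", paths))
```

A non-nested field such as "title" comes back as a bare match clause, so the results can be combined in a bool-should as in the query above.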

Sort Elasticsearch results based on field value

Assuming I have 3 documents (users) that each know multiple programming languages with associated scores, as described below, how can I search across multiple fields (multi_match, for example) and, if a search keyword hits a language, sort by that language's score?
// user1
{
"name": "John Bayes",
"prog_langs": [
{
"name": "python",
"score": 10
},
{
"name": "java",
"score": 500
}
]
}
// user2
{
"name": "John Russel",
"prog_langs": [
{
"name": "python",
"score": 100
},
{
"name": "PHP",
"score": 200
}
]
}
// user3
{
"name": "Terry Guy",
"prog_langs": [
{
"name": "C++",
"score": 600
},
{
"name": "Javascript",
"score": 200
}
]
}
For example, searching "John python"
should return user1 and user2, with user2 showing up first.
I've been trying to use sort and functions, but I think they always use the lowest/highest/average value of score.
Thanks!
[Edit]
In the meantime I tested the sorting without the full-text/multi-match part and found that I had to make "prog_langs" nested, so I changed the mapping and it works as expected.
Now I'm only missing the part where a full-text multi-match search is merged with the current query.
Thanks again!
I managed to fix the query and now it's working as expected.
Before posting my solution, a few things to keep in mind:
I made a new mapping and added some nested objects, so my original query had to change (prog_langs is now of type nested).
I wanted at least two fields to match, each of them mandatory and matching at least once.
{
"query": {
"bool": {
"must": [
{
"query": {
"match": {
"name": {
"query": "john python",
"boost": 5
}
}
}
},
{
"bool": {
"should": [
{
"nested": {
"path": "prog_langs",
"query": {
"match": {
"prog_langs.name": {
"query": "john python",
"boost": 5
}
}
}
}
}
]
}
}
],
"should": [
{
"function_score": {
"query": {
"match": {
"prog_langs.name": "john python"
}
},
"functions": [
{
"script_score": {
"script": "_score * (1 + doc['prog_langs.score'].value)"
}
}
]
}
}
]
}
},
"highlight": {
"fields": {
"name": {},
"prog_langs.name": {}
}
}
}
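To make the intended ordering concrete, here is a plain Python illustration of the ranking the question asks for. It is not how Elasticsearch computes scores; `rank` is a hypothetical function that requires a hit on the name and on a language (like the two must clauses) and then orders by the score of the matched language.

```python
from typing import Any, Dict, List

USERS: List[Dict[str, Any]] = [
    {"name": "John Bayes", "prog_langs": [{"name": "python", "score": 10}, {"name": "java", "score": 500}]},
    {"name": "John Russel", "prog_langs": [{"name": "python", "score": 100}, {"name": "PHP", "score": 200}]},
    {"name": "Terry Guy", "prog_langs": [{"name": "C++", "score": 600}, {"name": "Javascript", "score": 200}]},
]


def rank(users: List[Dict[str, Any]], query: str) -> List[str]:
    """Return names of users matching on name AND language, best language score first."""
    terms = query.lower().split()
    hits = []
    for user in users:
        name_hit = any(term in user["name"].lower().split() for term in terms)
        # Only the scores of languages that the query actually hit are considered.
        lang_scores = [lang["score"] for lang in user["prog_langs"] if lang["name"].lower() in terms]
        if name_hit and lang_scores:
            hits.append((max(lang_scores), user["name"]))
    hits.sort(reverse=True)
    return [name for _, name in hits]


if __name__ == "__main__":
    print(rank(USERS, "John python"))
```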

Elasticsearch boost query in a field having multiple values

I have some documents in an Elasticsearch index. Here are the sample documents:
DOC1
{
"feild1":["hi","hello","goodmorning"],
"feild2":"some string",
"feild3":{}
}
DOC2
{
"feild1":["hi","goodmorning"],
"feild2":"some string",
"feild3":{}
}
DOC3
{
"feild1":["hi","hello"],
"feild2":"some string",
"feild3":{}
}
I want to query feild1 for the values "hi" and "hello". If both are present, the document should come first; if only one is present, it should come after.
For example:
The result should be in the order DOC1, DOC3, DOC2. I tried a boost query, but it does not return the documents in the order I want. Here is the query I am trying:
{
"query": {
"bool": {
"must": [
{
"match_phrase": {
"avail_status": true
}
},
{
"bool": {
"should": [
{
"constant_score": {
"filter": {
"terms": {
"feild1": [
"hi"
]
}
},
"boost": 20
}
},
{
"constant_score": {
"filter": {
"terms": {
"feild1": [
"hello"
]
}
},
"boost": 18
}
}
]
}
}
]
}
}
}
This returns the documents having "hi" first and then those having "hello". Thanks in advance!
To give an extra boost to documents whose array field holds more values, you can wrap the query in a function_score with a script_score function.
Mappings
{
"mappings": {
"document_type" : {
"properties": {
"field1" : {
"type": "text",
"fielddata": true
},
"field2" : {
"type": "text"
},
"field3" : {
"type": "text"
}
}
}
}
}
Index documents
POST custom_score_index1/document_type
{
"feild1":["hi","hello","goodmorning"],
"feild2":"some string",
"feild3":{}
}
POST custom_score_index1/document_type
{
"feild1":["hi","goodmorning"],
"feild2":"some string",
"feild3":{}
}
POST custom_score_index1/document_type
{
"feild1":["hi","hello"],
"feild2":"some string",
"feild3":{}
}
Query with function_score, adding extra _score for documents with more values in the array field
POST custom_score_index1/document_type/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"must": [{
"match_phrase": {
"avail_status": true
}
},
{
"bool": {
"should": [{
"constant_score": {
"filter": {
"terms": {
"feild1": [
"hi"
]
}
},
"boost": 20
}
},
{
"constant_score": {
"filter": {
"terms": {
"feild1": [
"hello"
]
}
},
"boost": 18
}
}
]
}
}
]
}
},
"functions": [{
"script_score": {
"script": {
"inline": "_score + 10000 * doc['feild1.keyword'].length"
}
}
}],
"score_mode": "sum",
"boost_mode": "sum"
}
}
}
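The resulting order can be worked out by hand. The Python sketch below reproduces the arithmetic under two stated assumptions: the avail_status clause contributes the same amount to every document and is ignored, and the script sees the number of values in each document's array. `final_score` and `ranked` are hypothetical names used only for this illustration.

```python
from typing import Dict, List

BOOSTS = {"hi": 20, "hello": 18}  # constant_score boosts taken from the query

DOCS: Dict[str, List[str]] = {
    "DOC1": ["hi", "hello", "goodmorning"],
    "DOC2": ["hi", "goodmorning"],
    "DOC3": ["hi", "hello"],
}


def final_score(values: List[str]) -> int:
    # Query score: sum of the boosts of the should clauses that match.
    query_score = sum(boost for term, boost in BOOSTS.items() if term in values)
    # script_score: _score + 10000 * number of values in the field.
    function_score = query_score + 10000 * len(values)
    # boost_mode "sum": the query score and the function score are added.
    return query_score + function_score


def ranked(docs: Dict[str, List[str]]) -> List[str]:
    return sorted(docs, key=lambda name: final_score(docs[name]), reverse=True)


if __name__ == "__main__":
    for name in ranked(DOCS):
        print(name, final_score(DOCS[name]))
```

The large 10000 multiplier makes the value count dominate, and the term boosts only break ties between documents with the same number of values.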

Query a multi level nested document at different levels

I have data in the following format
{
"mappings": {
"blog": {
"properties": {
"comments": {
"type": "nested",
"properties": {
"subComments": {
"type": "nested"
}
}
}
}
}
}
}
And I have multiple documents with data like
{
"blog_post_id": "blog1",
"comments": [
{
"id": "c1",
"user_id": "u1",
"timestamp": 1487781975676,
"value": "CVLA1",
"subComments": [
{
"value": "sub comment 1"
},
{
"value": "sub comment 2"
}
]
},
{
"id": "c2",
"user_id": "u1",
"timestamp": 1487781975686,
"value": "CVLA2",
"subComments": [
{
"value": "sub comment 3"
},
{
"value": "sub comment 4"
}
]
}
]
}
I'd like to match the blog documents that have a comment with value CVLA1 and a sub comment with value "sub comment 2".
I wrote a query like
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{
"match": {
"comments.value": "CVLA1"
}
},
{
"nested": {
"path": "comments.subComments",
"query": {
"match": {
"commnets.subComments.value": "sub comment 2"
}
}
}
}
]
}
}
}
}
}
But this one doesn't work as expected. Any help on how to query at different levels of a multi-level nested document would be appreciated.
You have a typo in your query around commnets.subComments.value. It should be comments.subComments.value. So the entire query would look like this:
{
"query": {
"nested": {
"path": "comments",
"query": {
"bool": {
"must": [
{
"match": {
"comments.value": "CVLA1"
}
},
{
"nested": {
"path": "comments.subComments",
"query": {
"match": {
"comments.subComments.value": "sub comment 2"
}
}
}
}
]
}
}
}
}
}
I double checked - it works fine for me.
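Typos like this are easy to miss because Elasticsearch simply returns no hits for an unknown field. As a rough safeguard, a query dict can be checked before it is sent; the Python sketch below uses a hypothetical helper, `misplaced_fields`, that reports match fields which do not sit under their enclosing nested path. It only inspects match and nested clauses.

```python
from typing import Any, Dict, List


def misplaced_fields(query: Dict[str, Any], path: str = "") -> List[str]:
    """Return match fields that do not start with their enclosing nested path."""
    problems: List[str] = []
    for key, value in query.items():
        if key == "nested":
            # Descend into the nested clause, remembering its path.
            problems += misplaced_fields(value["query"], value["path"])
        elif key == "match":
            problems += [f for f in value if path and not f.startswith(path + ".")]
        elif isinstance(value, dict):
            problems += misplaced_fields(value, path)
        elif isinstance(value, list):
            for item in value:
                if isinstance(item, dict):
                    problems += misplaced_fields(item, path)
    return problems


if __name__ == "__main__":
    inner = {"match": {"commnets.subComments.value": "sub comment 2"}}
    broken = {"query": {"nested": {"path": "comments", "query": {"bool": {"must": [
        {"match": {"comments.value": "CVLA1"}},
        {"nested": {"path": "comments.subComments", "query": inner}},
    ]}}}}}
    print(misplaced_fields(broken))
```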
