Boost score based on integer value - Elasticsearch

I'm not very experienced with Elasticsearch and would like to know how to boost a search based on a certain integer value.
This is an example of a document:
{
"_index": "links",
"_type": "db1",
"_id": "mV32vWcBZsblNn1WqTcN",
"_score": 8.115617,
"_source": {
"url": "example.com",
"title": "Example website",
"description": "This is an example website, used for various of examples around the world",
"likes": 9,
"popularity": 543,
"tags": [
{
"name": "example",
"votes": 5
},
{
"name": "test",
"votes": 2
},
{
"name": "testing",
"votes": 1
}
]
}
}
Now, in this particular search the focus is on the tags, and I would like to know how to boost the _score by multiplying it by the votes integer under tags.
If this is not possible (or very hard to achieve), I would simply like to know how to boost the _score by the votes (not under tags).
For example, add 0.1 to the _score for each vote.
This is the current search query I'm using (for searching tags only):
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool":{
"should":{
"match":{
"tags.name":"example,testing,something else"
}
}
}
}
}
}
}
I couldn't find much online, and hope someone can help me out.
How do I boost the _score with an integer value?
Update
For more info, here is the mapping:
{
"links": {
"mappings": {
"db1": {
"properties": {
"url": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"description": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"likes": {
"type": "long"
},
"popularity": {
"type": "long"
},
"tags": {
"type": "nested",
"properties": {
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"votes": {
"type": "long"
}
}
}
}
}
}
}
}
Update 2
Changed tags.likes/tags.dislikes to tags.votes, and mapped tags as a nested type.

This took a long time to figure out. I have learnt so much on my way there.
Here is the final result:
{
"query": {
"nested": {
"path": "tags",
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"tags.name": "example"
}
},
{
"match": {
"tags.name": "testing"
}
},
{
"match": {
"tags.name": "test"
}
}
]
}
},
"functions": [
{
"field_value_factor": {
"field": "tags.votes"
}
}
],
"boost_mode": "multiply"
}
}
}
}
}
Using an array in should helped a lot, and I was glad I could combine it with function_score.
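The question also asked how to add 0.1 to the _score for each vote instead of multiplying. A hedged sketch of that variant, not part of the original post, assuming Elasticsearch 6.x or later and the links mapping above: field_value_factor with "factor": 0.1 yields 0.1 * tags.votes, "missing": 0 (an assumption) covers tags without a votes value, and "boost_mode": "sum" adds that to the tag's match score. The nested "score_mode": "sum" is also an assumption (the default is avg); it adds up the scores of all matching tags.

```json
# Add 0.1 per vote to each matching tag's score, then sum over matching tags
GET links/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": {
            "match": { "tags.name": "example testing test" }
          },
          "functions": [
            {
              "field_value_factor": {
                "field": "tags.votes",
                "factor": 0.1,
                "missing": 0
              }
            }
          ],
          "boost_mode": "sum"
        }
      }
    }
  }
}
```

A single match on "example testing test" ORs the three terms, which behaves much like the should array above.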

You are looking for the function score query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
combined with the field value factor function: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html#function-field-value-factor
Snippet adapted from the documentation:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "tags.dislikes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Alternatively, use a script score. Note that tags is a nested field, so field_value_factor (or a script) can only read tags.votes when the function_score runs inside a nested query on tags.
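A script_score version of the "add 0.1 per vote" idea, as a hedged sketch that is not from the original answer. It assumes Elasticsearch 6.x or later (where Painless scripts use "source") and the links index above. The function_score sits inside the nested query so that doc['tags.votes'] refers to the current tag, the script treats a missing votes value as 0, and "boost_mode": "replace" makes the script result the tag's score.

```json
# Score each matching tag as _score + 0.1 * votes (0 votes if the field is missing)
GET links/_search
{
  "query": {
    "nested": {
      "path": "tags",
      "query": {
        "function_score": {
          "query": {
            "match": { "tags.name": "example" }
          },
          "script_score": {
            "script": {
              "source": "double votes = doc['tags.votes'].size() == 0 ? 0.0 : (double) doc['tags.votes'].value; return _score + 0.1 * votes;"
            }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}
```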

Related

Elasticsearch - terms lookup - filter

I have two indices: users and cars.
users contains a user_id and that user's car ratings.
Each object in ratings represents one car rating by the user:
"user_id": 3,
"ratings": [
{
"score": 10.0,
"car_id": "xxx"
},
{
"score": 50.0,
"car_id": "yyy"
}
]
I'm trying to build a query that fetches the cars rated by user 3 with a score higher than 20.
That means the query should return only car "yyy" (based on the document above): user 3 has two ratings, but only one of them has a score greater than 20.
I've managed to build a query that returns all cars rated by a given user.
GET _search
{
"query": {
"bool": {
"filter": [
{
"terms": {
"car_id": {
"index": "users",
"type": "_doc",
"id": "3",
"path": "ratings.car_id"
}
}
}
]
}
}
}
The problem is that I can't figure out how to filter ratings by the ratings.score.
This query doesn't return any cars, even though user 3 has rated two cars with a score greater than 20:
GET _search
{
"query": {
"bool": {
"filter": [
{
"terms": {
"car_id": {
"index": "users",
"type": "_doc",
"id": "3",
"path": "ratings.car_id"
}
}
},
{
"range": {
"ratings.score": {
"gte": 20
}
}
}
]
}
}
}
Can you tell me what's wrong and how to make it work?
MAPPINGS
users
{
"mappings": {
"_doc": {
"properties": {
"ratings": {
"type": "nested",
"properties": {
"car_id": {
"type": "text"
},
"score": {
"type": "float"
},
"type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"user_id": {
"type": "integer"
}
}
}
}
}
cars
{
"mappings": {
"_doc": {
"properties": {
"name": {
"type": "text"
},
"color": {
"type": "boolean"
},
....
The ratings field is nested, so I recommend using a nested query.
I couldn't test this query:
GET _search
{
"query": {
"bool": {
"filter": [
{
"terms": {
"car_id": {
"index": "users",
"id": "3",
"path": "ratings.car_id"
}
}
},
{
"nested": {
"path": "ratings",
"query": {
"range": {
"ratings.score": {
"gte": 20
}
}
}
}
}
]
}
}
}
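One caveat: a terms lookup copies every value at the given path from user 3's document, so it cannot drop low-scoring ratings, and a range filter on ratings.score is evaluated against cars documents, which have no ratings field. A hedged two-step sketch (not from the original answer; it assumes the 7.x typeless API and that car_id in cars can be matched as an exact value): first fetch the matching nested ratings from users via inner_hits, then pass the collected car IDs to the cars search. inner_hits returns only 3 nested hits by default, hence "size": 100 (the default index.max_inner_result_window).

```json
# Step 1: ratings of user 3 with score >= 20; the matching
# ratings (and their car_id values) appear under inner_hits
GET users/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "user_id": 3 } },
        {
          "nested": {
            "path": "ratings",
            "query": { "range": { "ratings.score": { "gte": 20 } } },
            "inner_hits": { "size": 100 }
          }
        }
      ]
    }
  }
}

# Step 2: fetch the cars whose IDs were collected in step 1 (here "yyy")
GET cars/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "car_id": ["yyy"] } }
      ]
    }
  }
}
```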

Elasticsearch search for attributes matching query over multiple documents

I have data modeled so that multiple documents with different attributes are logically connected via a chainID. They are separate documents because they are indexed with an undefined amount of time between them, i.e. after they're executed in the backend. All documents are indexed in the same index. Example documents:
Doc 1:
{
"att1": "a",
"att2": "b",
"chainID": "123"
}
Doc 2:
{
"att3": "c",
"att4": "d",
"chainID": "123"
}
Doc 3:
{
"att1": "x",
"att2": "y",
"chainID": "678"
}
Doc 4:
{
"att3": "z",
"att4": "u",
"chainID": "678"
}
Mapping:
{
"properties": {
"att1": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att2": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att3": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"att4": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"chainID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
I want to group the documents by chainID and search through the aggregated results so that a query with att1=a AND att3=c would have chainID=123 as a result.
I tried the following query, which returned no matching documents:
{
"query": {
"bool": {
"must": [
{
"term": {
"att1.keyword": "a"
}
},
{
"term": {
"att3.keyword": "c"
}
}
]
}
},
"aggs": {
"chainIDs": {
"terms": {
"field": "chainID.keyword"
},
"aggs": {
"docs": {
"top_hits": {
"_source": [
"chainID"
]
}
}
}
}
}
}
It seems like the aggregation happens after the query is processed. What I would like to do is aggregate the documents by their chainID and run the query against the aggregated documents. Is this possible with Elasticsearch, or do I need to adjust my mappings/data model?
Try replacing "must" with "should" (logical OR). "must" requires the same document to have att1=a and att3=c (logical AND), and no single document in your data has both attributes.
{
"query": {
"bool": {
"should": [
{
"term": {
"att1.keyword": "a"
}
},
{
"term": {
"att3.keyword": "c"
}
}
]
}
},
"aggs": {
"chainIDs": {
"terms": {
"field": "chainID.keyword"
},
"aggs": {
"docs": {
"top_hits": {
"_source": [
"chainID"
]
}
}
}
}
}
}
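With should alone, any chain with either attribute is returned. To keep only chains where both conditions are met by some document, one hedged option (not from the original answer) is to count each condition per chainID bucket with filter sub-aggregations and drop buckets with a bucket_selector. The index name my_index is hypothetical, and the terms "size": 1000 is an assumption that caps how many chains are considered. For the example data, only the 123 bucket should survive.

```json
# Keep only chainID buckets containing a doc with att1=a AND a doc with att3=c
GET my_index/_search
{
  "size": 0,
  "query": {
    "bool": {
      "should": [
        { "term": { "att1.keyword": "a" } },
        { "term": { "att3.keyword": "c" } }
      ]
    }
  },
  "aggs": {
    "chainIDs": {
      "terms": { "field": "chainID.keyword", "size": 1000 },
      "aggs": {
        "has_att1": { "filter": { "term": { "att1.keyword": "a" } } },
        "has_att3": { "filter": { "term": { "att3.keyword": "c" } } },
        "both_match": {
          "bucket_selector": {
            "buckets_path": {
              "att1Count": "has_att1>_count",
              "att3Count": "has_att3>_count"
            },
            "script": "params.att1Count > 0 && params.att3Count > 0"
          }
        }
      }
    }
  }
}
```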

Searching Elasticsearch document by existing field not found but the field exists

First of all, I should mention that I'm on Elasticsearch 5.6.16.
I'm trying to figure out what's happening here. I have several documents indexed like this one (copied directly from Kibana):
{
"_index": "my_index",
"_type": "doc",
"_id": "Outbreak_10346",
"_version": 1,
"_score": 1,
"_source": {
"outbreakId": 10346,
"reference": "XX-AD-2021-00003",
"countryCode": "BE",
"adisNotificationReasonType": {
"code": "TERRESTRIAL"
},
"approximateLocation": false,
"latitude": 50.93766,
"longitude": 3.97156,
"adminZoneLevelOne": {
"zoneId": 40,
"zoneCode": "BE2"
},
"affectedSpecies": [
{
"speciesId": 16703,
"name": "Swine",
"measuringUnit": "ANIMAL",
"casesQuantity": 10,
"deadQuantity": 1,
"susceptibleQuantity": 100,
"isAquatic": false
}
],
"affectedSpeciesTotalSusceptible": 100,
"affectedSpeciesTotalCases": 10
}
}
If I do this query in Kibana:
GET my_index/_search
{
"query": {
"exists": {
"field": "adminZoneLevelOne"
}
}
}
I don't get any results, but if I change the field to any of the others, I do find the documents.
Also, when I retrieve the documents, I can access the adminZoneLevelOne field.
How is this possible? Why doesn't Elasticsearch find any documents with that field?
The index mapping for adminZoneLevelOne field is:
"adminZoneLevelOne": {
"type": "nested",
"properties": {
"zoneCode": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "WHITESPACE"
},
"zoneId": {
"type": "long"
}
}
}
And the mapping for adisNotificationReasonType, which works fine, is:
"adisNotificationReasonType": {
"properties": {
"code": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
},
"analyzer": "LOWERCASE_KEYWORD"
}
}
}
Since adminZoneLevelOne is of the nested type, you need to wrap the exists query in a nested query:
{
"query": {
"nested": {
"path": "adminZoneLevelOne",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "adminZoneLevelOne"
}
}
]
}
}
}
}
}
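A slightly simpler variant, offered as a hedged sketch rather than the original answer's query: target a concrete subfield inside the nested context, and drop the bool/must wrapper, which isn't needed for a single clause. This assumes zoneId is always set whenever adminZoneLevelOne is present.

```json
# Docs with at least one adminZoneLevelOne nested object that has a zoneId
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "adminZoneLevelOne",
      "query": {
        "exists": { "field": "adminZoneLevelOne.zoneId" }
      }
    }
  }
}
```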

How to apply filter only if nested mapping exists

I'm trying to apply a location radius filter in a nested ES query, but the nested value is not always present, which causes this exception:
"[nested] nested object under path [contact.address] is not of nested type"
I tried checking whether the property exists before applying the filter, but nothing has worked so far.
The mapping looks like this:
{
"records": {
"mappings": {
"user": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"properties": {
"address": {
"properties": {
"city": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"location": {
"properties": {
"lat": {
"type": "long"
},
"lng": {
"type": "long"
},
"lon": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
},
"email": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"properties": {
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
},
"created_at": {
"type": "date"
}
}
}
}
}
}
Sometimes the records don't have location or address data, which causes problems. Sample record:
{
"contact": {
"name": {
"first_name": "Test",
"last_name": "User"
},
"email": "test#user.com",
"address": {}
},
"user_id": 532188
}
Here is what I'm trying:
GET records/_search
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.address.location"
}
}
],
"minimum_should_match": 1,
"should": [
{
"bool": {
"filter": {
"nested": {
"ignore_unmapped": true,
"path": "contact.address",
"query": {
"geo_distance": {
"distance": "50mi",
"contact.address.location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
}
}
}
}
}
]
}
}
}
You need to define a proper mapping with the nested datatype to avoid this issue; it looks like dynamic mapping is causing the problem.
I defined my own mapping with the nested datatype, and even when some data is missing from the nested fields, it doesn't complain.
Index definition:
{
"mappings": {
"properties": {
"user_id": {
"type": "long"
},
"contact": {
"type": "nested"
}
}
}
}
Index a sample doc (note the empty address object):
{
"contact": {
"name": {
"first_name": "raza",
"last_name": "ahmed"
},
"email": "opster#user.com",
"address": {}
},
"user_id": 123456
}
Index another doc with data in the nested field (note the location data under address):
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": {
"location": {
"lat" : 51.5073509,
"lon" : -0.1277583
}
}
},
"user_id": 123456
}
Index another doc that doesn't have even an empty address:
{
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com"
},
"user_id": 123456
}
Search query using the nested field (note "path": "contact"):
{
"query": {
"nested": {
"path": "contact",
"query": {
"bool": {
"must": [
{
"exists": {
"field": "contact.address"
}
},
{
"exists": {
"field": "contact.name.first_name"
}
}
]
}
}
}
}
}
The search doesn't complain about documents that lack the nested data (the case that was causing your issue). The matching hit contains the nested address data:
"hits": [
{
"_index": "nested",
"_type": "_doc",
"_id": "3",
"_score": 2.0,
"_source": {
"contact": {
"name": {
"first_name": "foo",
"last_name": "bar"
},
"email": "opster#user.com",
"address": {
"location": {
"lat": 51.5073509,
"lon": -0.1277583
}
}
},
"user_id": 123456
}
}
]
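For the radius filter itself, a hedged sketch that goes beyond the original answer: the error arises because contact.address was mapped dynamically as a plain object, and geo_distance additionally needs contact.address.location to be a geo_point (the dynamic mapping above made lat/lng long and lon text). This assumes Elasticsearch 7.x typeless mappings, a new index named records_v2 (hypothetical), and stored location objects that use lat and lon keys; a "lng" key would be rejected by geo_point. Documents with an empty or missing address simply don't match.

```json
# Hypothetical new index: contact.address is nested, location is a geo_point
PUT records_v2
{
  "mappings": {
    "properties": {
      "user_id": { "type": "long" },
      "contact": {
        "properties": {
          "address": {
            "type": "nested",
            "properties": {
              "location": { "type": "geo_point" }
            }
          }
        }
      }
    }
  }
}

# Copy the existing documents into the new index
POST _reindex
{
  "source": { "index": "records" },
  "dest": { "index": "records_v2" }
}

# Radius filter inside a nested query on contact.address
GET records_v2/_search
{
  "query": {
    "bool": {
      "filter": {
        "nested": {
          "path": "contact.address",
          "query": {
            "geo_distance": {
              "distance": "50mi",
              "contact.address.location": {
                "lat": 51.5073509,
                "lon": -0.1277583
              }
            }
          }
        }
      }
    }
  }
}
```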

elastic search for mark character

I have two values in Vietnamese, "mắt biếc" and "mật mã", in an index called books.
In the books index, I use asciifolding to transform "mắt biếc" into "mat biec" and "mật mã" into "mat ma".
When I query these two values for the term "mắt", both get the same score, but I want "mắt biếc" to score higher than "mật mã".
How can I do that in Elasticsearch?
You should use a function score query.
Try this (based on version 7.x):
GET my_index/_search
{
"query": {
"function_score": {
"query": {
"match": {
"title": "mắt"
}
},
"functions": [
{
"filter": {
"term": {
"title.keyword": {
"value": "mắt biếc"
}
}
},
"weight": 30
}
],
"max_boost": 30,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
Mappings example
PUT my_index
{
"settings": {
"analysis": {
"analyzer": {
"product_analyzer": {
"tokenizer": "standard",
"filter": [
"asciifolding"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "product_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"price": {
"type": "keyword"
},
"author": {
"type": "keyword"
},
"publisher": {
"type": "keyword"
}
}
}
}
You have to update your mapping in order to use title.keyword.
Update Query
POST my_index/_mapping
{
"properties": {
"title": {
"type": "text",
"analyzer": "product_analyzer",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
Then update all documents so that the new title.keyword subfield gets indexed:
POST my_index/_update_by_query?conflicts=proceed
Hope this helps
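An alternative that doesn't hard-code a title, sketched under assumptions and not taken from the answer above: index title twice, once with the standard analyzer (accents kept) and once with asciifolding in a folded subfield, then search both with a most_fields multi_match. "mắt biếc" matches "mắt" in both fields, while "mật mã" matches only title.folded, so the accent-exact title should score higher. The names books_v2 and folding_analyzer are hypothetical, and Elasticsearch 7.x is assumed.

```json
# Hypothetical index: accent-preserving title plus an accent-folded subfield
PUT books_v2
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "folded": {
            "type": "text",
            "analyzer": "folding_analyzer"
          }
        }
      }
    }
  }
}

# Exact-accent matches score on both fields, folded-only matches on one
GET books_v2/_search
{
  "query": {
    "multi_match": {
      "query": "mắt",
      "fields": ["title", "title.folded"],
      "type": "most_fields"
    }
  }
}
```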
