I have an Elasticsearch database in which all documents have a title field and are queried by this field.
By default, Elasticsearch gives a higher score to documents with short titles. But in my use case, short titles are irrelevant.
For example, when I search for 'Deep Learning' the first results are
'Deep Learning'
'Machine Learning and Deep Learning'
'Semi-Supervised Deep Learning With Memory'
I would like the document titled 'Semi-Supervised Deep Learning With Memory' to appear before the document titled 'Deep Learning'.
Is there any solution to achieve that without changing the mapping?
Thanks
The best way would be to compute the title length at ingest time, store it in a field, and use it with a function score query.
{
"query": {
"function_score": {
"query": {
"match": { "title": "elasticsearch" }
},
"script_score": {
"script": {
"source": "doc['title_length'].value * _score"
}
}
}
}
}
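A minimal sketch of the ingest-time step, assuming documents are plain Python dicts before they are indexed. The helper names `with_title_length` and `length_boosted_query` are hypothetical; `title_length` is the field name used in the query above. Measuring length in characters is an assumption (a word count would work too).

```python
def with_title_length(doc):
    """Return a copy of doc with a numeric title_length field added."""
    enriched = dict(doc)
    # Assumption: length is measured in characters, not words.
    enriched["title_length"] = len(doc["title"])
    return enriched


def length_boosted_query(text):
    """Build the function_score request body shown above for the given text."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": text}},
                "script_score": {
                    "script": {"source": "doc['title_length'].value * _score"}
                },
            }
        }
    }


doc = with_title_length({"title": "Deep Learning"})
body = length_boosted_query("Deep Learning")
```

The enriched dict is what you would send to the index; the body is what you would send to the search endpoint.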
Alternatively, you can use a script score query that reads the length of the field value directly, provided the field has a keyword mapping.
If it does not, you would have to enable fielddata on it, which is very expensive and not recommended.
https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-script-score-query.html
GET /_search
{
"query": {
"script_score": {
"query": {
"match": { "title": "elasticsearch" }
},
"script": {
"source": "doc['title'].value.length() * _score"
}
}
}
}
I'm using cosineSimilarity in elasticsearch for searching documents and the query looks like the following:
{
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}}
The issue here is that I get all documents back regardless of their similarity score. I want to filter the results using a threshold on that score.
I tried using bool with following script:
query = {
"query": {
"bool" : {
"must": {
"match_all": {}
},
"filter" : {
"script" : {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0 > 1.4",
"params": {
"queryVector": list(feat)
}
}
}
}
}
}
But this throws an error:
RequestError(400, 'x_content_parse_exception', '[source] query malformed, no start_object after query name')
From Text similarity search with vector fields
Important limitations
The script_score query is designed to wrap a restrictive query, and modify the scores of the documents it returns. However, we’ve provided a match_all query, which means the script will be run over all documents in the index. This is a current limitation of vector similarity in Elasticsearch — vectors can be used for scoring documents, but not in the initial retrieval step. Support for retrieval based on vector similarity is an important area of ongoing work.
EDIT
Adding min_score to the request filters out documents whose calculated score is below the threshold, after the match_all and the script have run.
{
"min_score": 1.4,
"query": {
"script_score": {
"query": {
"match_all": {}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
"params": {
"queryVector": list(feat)
}
}
}
}
}
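Since the script adds 1.0 to the cosine similarity, a min_score of 1.4 corresponds to a cosine similarity of at least 0.4. A small sketch of building that request body in Python; the function name `thresholded_vector_query` and its `min_cosine` parameter are hypothetical, and `title_vector` is the field name from the question.

```python
def thresholded_vector_query(query_vector, min_cosine):
    """Build a script_score body that drops hits below a cosine threshold."""
    return {
        # The script adds 1.0 to the cosine, so the cutoff is shifted by 1.0 too.
        "min_score": min_cosine + 1.0,
        "query": {
            "script_score": {
                "query": {"match_all": {}},
                "script": {
                    "source": "cosineSimilarity(params.queryVector, 'title_vector') + 1.0",
                    "params": {"queryVector": list(query_vector)},
                },
            }
        },
    }


body = thresholded_vector_query((0.1, 0.2, 0.3), min_cosine=0.4)
```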
I have a problem with scoring in Elasticsearch. When a user enters a query that contains three terms, a document that contains two of the words many times sometimes outscores a document that contains all three. For example, if the user enters "elasticsearch query tutorial", I want documents that contain all of these words to score higher than a document with many occurrences of "tutorial" and "elasticsearch".
PS: I am using minimum should match and shingles in my query. Although they made the ranking a lot better, they did not solve this problem completely. I need something like the query coordination factor in Lucene's practical scoring function. Is there anything like that in Elasticsearch with BM25?
One of the possible solutions could be using function score:
{
"query": {
"function_score": {
"query": { "match_all": {} },
"functions": [
{
"filter": { "match": { "title": "elasticsearch" } },
"weight": 1
},
{
"filter": { "match": { "title": "tutorial" } },
"weight": 1
}
],
"score_mode": "sum"
}
}
}
In this case, documents that match more of the terms would clearly rank higher. However, this completely ignores TF-IDF and any other relevance factors.
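As a rough sketch, the per-term functions can be generated from the user's query instead of being written by hand. The helper name `per_term_function_score` is hypothetical, and splitting on whitespace is an assumption that ignores the field's analyzer.

```python
def per_term_function_score(query_text, field="title"):
    """Build a function_score body with one weight-1 filter per query term."""
    # Assumption: simple whitespace tokenization, not the index analyzer.
    terms = query_text.lower().split()
    functions = [
        {"filter": {"match": {field: term}}, "weight": 1} for term in terms
    ]
    return {
        "query": {
            "function_score": {
                "query": {"match_all": {}},
                "functions": functions,
                "score_mode": "sum",
            }
        }
    }


body = per_term_function_score("elasticsearch query tutorial")
```

With match_all as the base query and score_mode set to sum, a document matching all three terms scores 3 and a document matching two scores 2, regardless of how often each term occurs.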
As a logged-in user, I want to be able to hide a single record that I never want to see again when I perform the same search. Is this possible with Elasticsearch?
I've read about multitenancy and filters, but I'm not quite sure what a top-level implementation might look like.
One of my ideas is to store some reference to the unwanted record in an RDB and then add those references to a filter query, but I'm not sure what reference to use, since Elasticsearch generates its own IDs, which may not stay the same when a re-index happens.
It depends. If you don't have many users and the documents are not too big, you can store this on the document itself: add a dismissedBy field, and when a user dismisses a record, send an update to the document:
POST test/type1/1/_update
{
"script" : {
"inline": "ctx._source.dismissedBy.add(params.userId)",
"lang": "painless",
"params" : {
"userId" : "1"
}
}
}
And query:
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"term": {
"dismissedBy": 1
}
}
}
}
}
The problem with this approach is that if you re-index the document, these values will be overwritten, so you must keep a copy somewhere else too.
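One way to picture that extra copy, as a sketch only: keep the dismissals in an external store and merge them back into each document before it is re-indexed. The function `reapply_dismissals` and the `dismissals` mapping (document id to set of user ids, for example loaded from a relational table) are hypothetical, and this assumes documents are indexed with explicit, stable ids.

```python
def reapply_dismissals(doc_id, source, dismissals):
    """Return a copy of source with dismissedBy restored from an external store.

    dismissals is a hypothetical mapping of document id -> set of user ids.
    """
    restored = dict(source)
    # Sorted only to make the output deterministic.
    restored["dismissedBy"] = sorted(dismissals.get(doc_id, set()))
    return restored


dismissals = {"1": {"7", "3"}}
doc = reapply_dismissals("1", {"title": "Deep Learning"}, dismissals)
untouched = reapply_dismissals("2", {"title": "Other"}, dismissals)
```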
The other option, if documents are large or you have lots of users, is the parent/child approach.
When a user hits dismiss, index a child document:
PUT /indexname/dissmisses/1?parent=dismissforid
{
"userId": 1
}
Then when you search you do
POST /index/documents/_search
{
"query": {
"bool": {
"must_not": {
"has_child": {
"type": "dissmisses",
"query": {
"term": {
"userId": 1
}
}
}
}
}
}
}
I've set up an index that has many types representing user data such as a ShoppingList, Playlist, etc. Each type has an "identity_id" field for the user's unique identifier. I use the following query to search across all types and fields for a user (for a search function in a website):
GET _search
{
"query": {
"filtered": {
"query": {
"match_phrase_prefix": {
"_all": "awesome"
}
},
"filter": {
"match": {
"identity_id": 1
}
}
}
}
}
My questions are:
Is there a way to give a higher score to matches on fields that have "name" in the field name? For example, the ShoppingList type will have a shopping_list_name field, and I want a match on that to be higher than its other fields.
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
How about this query that boosts certain fields:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name",
"field*"
]
}
},
"functions": [
{
"weight": 2,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"*_name"
]
}
}
},
{
"weight": 1,
"filter": {
"multi_match": {
"query": "awesome",
"fields": [
"field*"
]
}
}
}
]
}
}
}
The query above boosts (weight: 2) matches on the *_name fields and applies no boost to the fields called field*.
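To make the arithmetic concrete: function_score defaults to score_mode "multiply" across the functions whose filters match, and boost_mode "multiply" with the query score. The sketch below mirrors that in plain Python; `boosted_score` is a hypothetical helper and the query score of 1.5 is a made-up number for illustration.

```python
from math import prod


def boosted_score(query_score, matched_weights):
    """Combine a query score with the weights of the functions that matched.

    Mirrors function_score's defaults: weights are multiplied together,
    then multiplied with the query score.
    """
    return query_score * prod(matched_weights)


name_hit = boosted_score(1.5, [2])     # matched a *_name field only
other_hit = boosted_score(1.5, [1])    # matched a field* field only
both_hit = boosted_score(1.5, [2, 1])  # matched both kinds of field
```

So a hit on a *_name field ends up with twice the score of an otherwise equal hit on a field* field.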
Is the above way of doing a full text search for a particular user (query then filter) the most efficient way? What about creating an index per user?
Regarding this question: that's more complicated. You also need to consider how many users you have, the hardware resources of the cluster, the structure of the data, the queries used, etc.
I have an Elasticsearch index that I am trying to search for matches based on multiple fields (title and description). If a particular term shows up in the title, I want to boost the score to twice the original score. If it is in the description, it should keep the original score. I am a bit confused by the Elasticsearch documentation. Can anyone help me adjust the following query to reflect this logic?
{
"query": {
"query_string": {
"query": "string",
"fields": ["title","description"]
}
}
}
You just need to add ^2 to get a boost on the field you want:
{
"query": {
"query_string": {
"query": "string",
"fields": ["title^2","description"]
}
}
}
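If the boosts come from configuration rather than being hard-coded, the `field^boost` strings can be generated. A small sketch; `boosted_fields` and `query_string_body` are hypothetical helper names.

```python
def boosted_fields(boosts):
    """Turn {"title": 2, "description": 1} into ["title^2", "description"]."""
    return [
        name if boost == 1 else f"{name}^{boost}"
        for name, boost in boosts.items()
    ]


def query_string_body(text, boosts):
    """Build a query_string request body with per-field boosts."""
    return {
        "query": {
            "query_string": {"query": text, "fields": boosted_fields(boosts)}
        }
    }


body = query_string_body("string", {"title": 2, "description": 1})
```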