Possible to have a document always return above certain position - sorting

I've got a bunch of documents from a query which are sorted by a modified date. However I'd like certain documents (identified by a field value) to always return in the top ten results regardless of whether there are ten or more documents with a more recent modified date.
From what I've read about the various ways of sorting in Elasticsearch (score, boost, scripts) I don't think I have any way of determining the actual position of a document in the search results, let alone some way of manipulating the score to push a document into the top ten.

Assuming you have a field called "important_field" that contains 1 for the documents you want on top and, say, 0 for all other documents, you can use multi-field sorting as below:
{
"sort": [
{ "important_field": { "order": "desc" }},
{ "modified_date": { "order": "desc" }}
]
}
This sorts by important_field first and falls back to modified_date only when the important_field values are equal. So all documents with important_field set to 1 come first, and the rest are still sorted by modified_date.
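To see the effect without a cluster, here is a minimal Python sketch that mimics the two-key sort on a few in-memory documents. The sample documents and the `sort_like_es` helper are made up for illustration; only the field names come from the answer above.

```python
from datetime import date

# Hypothetical sample documents; only the field names come from the answer.
docs = [
    {"id": "a", "important_field": 0, "modified_date": date(2021, 5, 3)},
    {"id": "b", "important_field": 1, "modified_date": date(2020, 1, 1)},
    {"id": "c", "important_field": 0, "modified_date": date(2021, 6, 9)},
    {"id": "d", "important_field": 1, "modified_date": date(2021, 2, 14)},
]


def sort_like_es(documents):
    """Order by important_field desc, then modified_date desc."""
    return sorted(
        documents,
        key=lambda d: (d["important_field"], d["modified_date"]),
        reverse=True,
    )


ordered_ids = [d["id"] for d in sort_like_es(docs)]
print(ordered_ids)
```

Note that the flagged documents always come first, not just somewhere within the top ten, so this fits best when only a handful of documents carry the flag.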

Elasticsearch - Limit of total fields [1000] in index exceeded

I saw that there are some concerns about raising the total field limit above 1000.
I have a situation that I am not sure how to approach from a design point of view.
I have lots of simple key-value pairs:
key1:15, key2:45, key99999:1313123.
Each key is a string and each value is an integer that I would like to sort my results on: if a document has a given key, it gets sorted by that key's value.
I ended up creating an object and putting the key-value pairs inside it so I can match them easily.
For example, I sort on "object.key".
If I just use a simple object with a bunch of strings inside that are only there for exact matching, should I worry about raising this limit to 10k or 20k?
I now have an issue where there can be more than 1k of these records. I've found I could use nested sorting, but it still has a default limit of 10k.
Is there a good design pattern for this, or should I not worry about raising the field limit?
Simplified version of the query:
GET products/_search
{
"query": {
"match_all": {}
},
"sort": [
{
"sortingObject.someSortingKey1": {
"order": "desc",
"missing": 2,
"unmapped_type":"float"
}
}
]
}
The point is that I get the sorting key from the request and use it to sort my results. There are, for example, 100k different ways to sort the results.
There were some recent improvements (in 7.16) that should help there, but 10K or 20K fields is still a lot of overhead.
I'm not sure what kind of queries you need to run on those keyX fields, but maybe the flattened data-type would work for you? https://www.elastic.co/guide/en/elasticsearch/reference/current/flattened.html
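As a rough sketch of what that could look like, the Python below builds a mapping and a sort clause as plain dicts. The `sortingObject` field and the key name are taken from the question; the helper names are made up. One caveat worth checking first: a flattened field treats every leaf value as a keyword, so sorting on it is lexicographic rather than numeric, and integer values may need zero-padding at index time (the width of 10 used here is an arbitrary assumption).

```python
# Mapping that stores all sorting keys under one flattened field, so the
# mapping holds a single field no matter how many keys exist.
mapping = {
    "mappings": {
        "properties": {
            "sortingObject": {"type": "flattened"}
        }
    }
}


def sort_clause(sorting_key, order="desc"):
    """Build a sort clause for one key inside the flattened object."""
    return [{f"sortingObject.{sorting_key}": {"order": order}}]


def pad(value, width=10):
    """Zero-pad a non-negative integer so string order matches numeric order."""
    return str(value).zfill(width)


values = [15, 45, 1313123, 9]
lexicographic = sorted(str(v) for v in values)   # how unpadded values compare
padded = [int(p) for p in sorted(pad(v) for v in values)]

print(sort_clause("someSortingKey1"))
print(lexicographic)
print(padded)
```

If the values can be negative or fractional, padding alone is not enough, and a flattened field is probably the wrong fit for numeric sorting.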

Search After (pagination) in Elasticsearch when sorting by score

search_after in Elasticsearch must match the sort parameters in count and order. So I was wondering how to get the score from the previous result (for example, page 1) to use as the search_after value for the next page.
I ran into an issue when using the score of the last document from the previous search. The score was 1.0, and since all documents have a score of 1.0, the result for the next page turned out to be empty.
That actually makes sense, since I am asking Elasticsearch for results that rank lower than 1.0, of which there are none. So which score do I use to get the next page?
Note:
I am sorting by score and then by TieBreakerID, so one possible solution is to use a high value (say 1000) for the score.
What you're doing sounds like it should work, as explained by an Elastic team member. It works for me (in ES 7.7) even with tied scores when using the document ID (copied into another indexed field) as a tiebreaker. It's true that indexing additional documents while paginating will make your scores slightly unstable, but not likely enough to cause a significant problem for an end user. If you need it to be reliable for a batch job, the Scroll API is the better choice.
{
"query": {
...
},
"search_after": [
12.276552,
14173
],
"sort": [
{ "_score": "desc" },
{ "id": "asc" }
]
}
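The key point is that the search_after values should be copied from the `sort` array of the last hit on the previous page, tiebreaker included, not just its score. The sketch below is a small in-memory simulation of that behaviour in Python (no cluster involved; the `search` function and the sample data are made up). It shows that pagination still advances when every score is tied, as long as the tiebreaker is unique.

```python
def sort_key(sort_values):
    """Turn [score, id] into a tuple ordered by score desc, id asc."""
    score, doc_id = sort_values
    return (-score, doc_id)


def search(docs, size, search_after=None):
    """Mimic a request sorted by _score desc, id asc with optional search_after."""
    hits = [{"_id": d["id"], "sort": [d["score"], d["id"]]} for d in docs]
    hits.sort(key=lambda h: sort_key(h["sort"]))
    if search_after is not None:
        # Keep only hits that come strictly after the given sort values.
        hits = [h for h in hits if sort_key(h["sort"]) > sort_key(search_after)]
    return hits[:size]


# Every document has the same score, as in the question.
docs = [{"id": i, "score": 1.0} for i in range(1, 6)]

pages = []
after = None
while True:
    page = search(docs, size=2, search_after=after)
    if not page:
        break
    pages.append([h["_id"] for h in page])
    after = page[-1]["sort"]  # reuse the last hit's sort values verbatim

print(pages)
```

Passing only a score with a made-up tiebreaker, or an artificially high score such as 1000, would restart from the top instead of continuing from the last hit.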

ElasticSearch Score Function Depending on Neighbor Documents

I have an ElasticSearch index with 2 mappings (types).
In the app I need to display a paginated feed containing items of both types.
Currently the items are sorted just by creation date, but I also want to have control on how the items alternate with each other on the page.
For example, I want to set a rule for sequence "3 items of type A, 1 item of type B, and so on".
I need it to make sure items of both types are displayed on each page and equally distributed across the pages.
But as far as I can see, it's not possible to access other documents in a custom score function script.
Of course it's easy to implement directly in the app logic, but it's not clear how to implement pagination that way.
Any ideas on how to achieve that?
I don't think you can do this.
One approach (that doesn't work) is to keep a global variable in a script and increment it each time a document is returned/processed, then take this number modulo 3 and sort the docs based on the result. But "global" variables are not possible in scripts.
The only two approaches I can think of are these. One is to use a script to generate a random number and sort on that. This way, you get some chance of a "mixed" list of types.
Or, if you want a simple deterministic way of sorting the docs, take the ID of the document in a script (you said it is a number), compute it modulo 3, and use that value to sort.
For the random approach:
"sort": [
{
"date": {
"order": "desc"
}
},
{
"_script": {
"script": "Math.random()",
"type": "number",
"order": "asc"
}
}
]
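If a strict "3 of type A, then 1 of type B" pattern is required, one alternative is to do the mixing in the app after all: page each type separately and derive both offsets from the page number. This is not what the script-based options above do. It is a sketch of the app-side approach mentioned in the question, with made-up helper names, and it assumes each page holds a whole number of A-A-A-B groups.

```python
def offsets(page, groups_per_page=1, a_per_group=3, b_per_group=1):
    """Return (from, size) pairs for the type A and type B queries of a page."""
    a_size = groups_per_page * a_per_group
    b_size = groups_per_page * b_per_group
    return (page * a_size, a_size), (page * b_size, b_size)


def interleave(a_items, b_items, a_per_group=3, b_per_group=1):
    """Merge two already-sorted lists into repeating A..A, B..B groups."""
    merged = []
    a_pos = b_pos = 0
    while a_pos < len(a_items) or b_pos < len(b_items):
        merged.extend(a_items[a_pos:a_pos + a_per_group])
        merged.extend(b_items[b_pos:b_pos + b_per_group])
        a_pos += a_per_group
        b_pos += b_per_group
    return merged


# Stand-ins for the results of two queries, each sorted by creation date.
all_a = [f"A{i}" for i in range(1, 8)]
all_b = [f"B{i}" for i in range(1, 4)]


def fetch_page(page):
    (a_from, a_size), (b_from, b_size) = offsets(page)
    a_hits = all_a[a_from:a_from + a_size]  # would be from/size on the type A query
    b_hits = all_b[b_from:b_from + b_size]  # would be from/size on the type B query
    return interleave(a_hits, b_hits)


for page_number in range(3):
    print(fetch_page(page_number))
```

When one type runs out, later pages simply contain fewer items, so the total page count has to be derived from whichever type lasts longer.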

Retrieve largest document size in ElasticSearch

Is it possible to retrieve the largest document (or just its size) in ElasticSearch with a single query?
The motivation for doing so is to cache returned documents in a MySQL store, so I would like to get an idea of the order of magnitude of largest docs, to decide whether to go with TEXT, MEDIUMTEXT or LONGTEXT.
EDIT:
This is on ES 1.3.
To the best of my knowledge, there's no such possibility out of the box.
You could, however, try a scripted aggregation, where the value of the aggregation is the sum of the length of all fields (or all fields you care about).
Another option:
Try a script-based sort order for the documents, for example:
"sort": {
"_script": {
"script": "doc['field1'].value.size() + doc['field2'].value.size()",
"type": "number",
"order": "desc"
}
}
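Once you have a maximum size, mapping it to a column type is straightforward. As a sketch (the helper names are made up, and sizes are measured on the JSON-encoded UTF-8 bytes of each document, which is an assumption about what you would store in MySQL), the same check could also be run client-side over a sample of documents:

```python
import json

# Maximum byte lengths of the MySQL text types.
MYSQL_TEXT_LIMITS = [
    ("TEXT", 65_535),
    ("MEDIUMTEXT", 16_777_215),
    ("LONGTEXT", 4_294_967_295),
]


def encoded_size(document):
    """Size in bytes of the document when stored as UTF-8 JSON."""
    return len(json.dumps(document, ensure_ascii=False).encode("utf-8"))


def column_type_for(max_bytes):
    """Pick the smallest MySQL text type that can hold max_bytes."""
    for name, limit in MYSQL_TEXT_LIMITS:
        if max_bytes <= limit:
            return name
    raise ValueError("document is larger than LONGTEXT allows")


# Hypothetical sample of documents pulled from the index.
sample = [
    {"title": "short"},
    {"title": "long", "body": "x" * 70_000},
]

largest = max(encoded_size(doc) for doc in sample)
print(largest, column_type_for(largest))
```

Since a sample can miss the real maximum, it is safer to leave some headroom or pick the next larger type.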

How to sort elastic search results by score + boost + field?

Given an index of books that have a title, an author, and a description, I'd like the resulting search results to be sorted this way:
all books that match the title sorted by downloads (a numeric value)
all books that match on author sorted by downloads
all books that match on description sorted by downloads
I use the search query below, but the problem is that each entry has a different score thus making sorting by downloads irrelevant.
e.g. when the search term is 'sorting' - title: 'sorting in elastic search' will score higher than title: 'postgresql sorting is awesome' (because of the word position).
query = QueryBuilders.multiMatchQuery(queryString, "title^16", "author^8", "description^4")
elasticClient.prepareSearch(Index)
.setTypes(Book)
.setQuery(query)
.addSort(SortBuilders.scoreSort())
.addSort(SortBuilders.fieldSort("downloads").order(SortOrder.DESC))
How do I construct my query so that I could get the desired book sorting?
I use standard analysers and I need the search query to be analysed; I will also have to handle multi-word search query strings.
Thx.
What you need here is a way to compute a score based on three weighted fields and a numeric field. Sorting by _score and then by downloads uses downloads only to break ties between equal scores, and since almost every hit has a different score, downloads rarely has any effect.
Hence a better approach would be to multiply downloads by the score obtained from the match.
So I would recommend a function_score query:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "sorting",
"fields": [
"title^16",
"author^8",
"description^4"
]
}
},
"functions": [
{
"field_value_factor": {
"field": "downloads"
}
}
],
"boost_mode": "multiply"
}
}
}
This computes the score based on all three fields and then multiplies it by the value in the downloads field to get the final score. The multiply boost_mode determines how the value computed by the functions is combined with the score computed by the query.
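To make the arithmetic concrete, here is a small Python sketch of what `boost_mode: multiply` does with a `field_value_factor` on downloads, assuming its default settings (factor 1, no modifier). The query scores and download counts are invented numbers, not real Elasticsearch output.

```python
# Hypothetical hits: query_score stands in for the multi_match score.
books = [
    {"title": "sorting in elastic search", "query_score": 4.0, "downloads": 10},
    {"title": "postgresql sorting is awesome", "query_score": 3.0, "downloads": 500},
    {"title": "a book about sorting", "query_score": 3.5, "downloads": 0},
]


def final_score(book):
    """query score multiplied by the raw downloads value."""
    function_value = book["downloads"]  # field_value_factor with defaults
    return book["query_score"] * function_value


ranked = sorted(books, key=final_score, reverse=True)
for book in ranked:
    print(book["title"], final_score(book))
```

Two things stand out: a book with zero downloads ends up with a score of 0, and large download counts can swamp the text relevance. The `modifier` (for example `log1p`) and `factor` options of field_value_factor are there to dampen that.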

Resources