I'm using Elasticsearch and would like to sort my collection based on a combination of relevance and id.
When I sort by id and search for my name, "john kealy", tons of Johns come up before me. If I sort by relevance, I lose the ability to order the john kealys by id. From my understanding, a combination sort only falls back to the second parameter in the event of a tie, but there is such a difference between a result "john kealy" and "john blabla" that I'd like the john kealys to come first, sorted by id, then everything else sorted by id.
Is this possible?
First recommendation for this: custom scoring lets you weight/adjust/emphasize/deemphasize/whatever to reflect the ranking you want/need.
EDIT: Crikey, I'm getting old. What you want is function score, not custom score. Sorry about that.
https://www.elastic.co/guide/en/elasticsearch/reference/0.90/query-dsl-function-score-query.html
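A minimal sketch of how function score could produce the "exact matches first, then by id" ordering. It only builds the request body in Python. The field names `name` and `id` are assumptions, and the body assumes a version where a function's `filter` accepts a query and `weight` is available (older releases used different syntax). With `boost_mode` set to `replace`, the relevance score is discarded and only the tier value remains, so the secondary sort on id decides the order inside each tier. As far as I know, documents that match no function keep a neutral score of 1, which puts them in the lower tier.

```python
import json


def tiered_search_body(full_name, first_name):
    """Build a search body that puts exact-phrase matches in a higher tier.

    The field names `name` and `id` are hypothetical.
    """
    return {
        "query": {
            "function_score": {
                # Base query: anything matching the first name is a candidate.
                "query": {"match": {"name": first_name}},
                "functions": [
                    # Documents matching the whole phrase get a tier value of 2.
                    {"filter": {"match_phrase": {"name": full_name}}, "weight": 2}
                ],
                # Discard the relevance score and keep only the tier value.
                "boost_mode": "replace",
            }
        },
        # Tier first, then id inside each tier.
        "sort": [{"_score": "desc"}, {"id": "asc"}],
    }


body = tiered_search_body("john kealy", "john")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a `_search` request.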
Related
Maybe a dummy question: is it possible to have multiple score fields?
I use a custom score based on a function_score query. This score is displayed to the user to show how much each document matches his/her preferences. So far so good.
But! The user should be able to filter the documents and (of course) sort them not only by the custom relevance (how much each document matches his/her preferences) but also by the common relevance - how much each document matches the filter criteria.
So my first idea was to store the score calculated by the function_score query in a custom field, but that does not seem to be supported.
Or am I completely wrong and I should use another approach?
I took a different approach: when the user applies a filter, I run the query without the function_score percolation, use the score calculated by ES, and sort by it. Then I take all the IDs from the result page and run the percolation query with those IDs to get the custom "matching score". It does not seem to cause a noticeable slowdown.
Anyway, I welcome any feedback.
I am having issues with the search_after API in Elasticsearch.
Please see this link, where I posted the full description of the problem:
https://discuss.elastic.co/t/weird-results-using-search-after-elastic-search/116609?u=ayshwarya_sree
As per the documentation for search_after:
A field with one unique value per document should be used as the
tiebreaker of the sort specification. Otherwise the sort order for
documents that have the same sort values would be undefined. The
recommended way is to use the field _id which is certain to contain
one unique value for each document.
Since you are only passing gender as the sort criterion, your second request assumes that you want the results after Female, which are the results with gender Male.
Try adding _id to both the sort and the search_after parameters.
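A rough sketch of what that could look like, written as Python dictionaries for the request bodies. The helper names are made up for illustration, and the sample hit is invented; the only things taken from the thread are the `gender` sort field and the `_id` tiebreaker. Each hit in a sorted response carries a `sort` array, and that array from the last hit of a page is what gets passed as `search_after` for the next page.

```python
def first_page_body(size=10):
    """Sort by gender, with _id as the unique tiebreaker."""
    return {
        "size": size,
        "sort": [{"gender": "asc"}, {"_id": "asc"}],
    }


def next_page_body(last_hit, size=10):
    """Build the follow-up request from the last hit of the previous page."""
    body = first_page_body(size)
    # One value per sort clause, in the same order as the sort specification.
    body["search_after"] = last_hit["sort"]
    return body


# Invented example of the last hit on a page.
last_hit = {"_id": "42", "_source": {"gender": "Female"}, "sort": ["Female", "42"]}
print(next_page_body(last_hit))
```

Because the second sort value is unique per document, the next page starts right after document 42 instead of jumping to the first Male document.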
I am familiar with how to sort GSA results on metadata.
I'm interested in sorting across multiple indexes.
For example, sort by Last Name, then by First Name.
So that Alice Smith appears before Bob Smith.
In SQL, this would be quite simple, equivalent to:
SELECT value FROM table ORDER BY last, first
Does GSA support this?
I've been playing with a few different syntaxes, but haven't found a way yet.
If it's only possible to sort on one index, how does Google sort within the set of equivalent results? E.g. how does GSA determine whether Alice or Bob appears first? I can't find any good explanation of this.
Sorry if I post it as answer but I can't comment your question because of my reputation is still too low.. (wtf stackoverflow!?).
I just wanna know if you find a way to solve this problem. Thank you!
From what I can tell, GSA does not support multiple dependent sort orders.
Instead, I've built an additional meta index that combines the two indexes I want to sort.
So, for example, I have index A for "First Name", index B for "Last Name", and index C which is the combination of both values into "Last Name"_"First Name".
This seems to be working well for me so far.
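To illustrate the idea outside of GSA, here is a small Python sketch that builds the combined "Last Name"_"First Name" value and checks that sorting on it gives the same order as SQL's ORDER BY last, first. The sample names are made up. One caveat I am sure of: the separator takes part in the string comparison, so a separator that sorts below every character used in names is the safer choice when one last name is a prefix of another.

```python
people = [
    {"first": "Bob", "last": "Smith"},
    {"first": "Alice", "last": "Smith"},
    {"first": "Carol", "last": "Jones"},
]


def combined_key(person, sep="_"):
    """Value for the extra index C: last name, separator, first name."""
    return f"{person['last']}{sep}{person['first']}"


# Sorting on the single combined value...
by_combined = sorted(people, key=combined_key)
# ...versus a true two-level sort, like ORDER BY last, first.
by_tuple = sorted(people, key=lambda p: (p["last"], p["first"]))

for person in by_combined:
    print(combined_key(person))
```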
I have an index for each quarter of a year ("index-2015.1", "index-2015.2", ...).
I have around 30 million documents in each index.
A document has a text field ('title').
My documents are sorted by (1) _score, then (2) created date.
The problem is:
When I search for some text in the 'title' field across all indexes ("index-201*"), the first results always come from one index.
Let's say I am searching for 'title=home', and I have 10k documents in "index-2015.1" with title=home and 10k documents in "index-2015.2" with title=home. Then the first results are all documents from "index-2015.1" (and not from "index-2015.2", or mixed), even though "index-2015.2" contains documents with a more recent "created date" than those in "index-2015.1".
Is there a reason for this?
The reason is probably that the scores are specific to the index. So if you really have multiple indices, the score of the documents will be calculated (slightly) differently for each index.
Simply put, among other things, the score of a matching document depends on the query terms and their occurrences in the index. The score is calculated with regard to the index (actually, by default, to each separate shard). There are some normalizations Elasticsearch does, but I don't know the details of those.
I'm not really able to explain it well, but here's the article about scoring. I think you want to read at least the part about TF/IDF, which should explain why you get different scores.
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
EDIT:
So, after testing it a bit on my machine, it seems possible to use another search_type, to achieve a score suitable for your case.
POST /index1,index2/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": {
      "title": "home"
    }
  }
}
The important part is search_type=dfs_query_then_fetch. If you are programming in Java or something similar, there should be a way to specify it in the request. For details about the search types, refer to the documentation.
Basically it will first collect the term-frequencies on all affected shards (+ indexes). Therefore the score should be generalized over all these.
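To make the effect concrete, here is a small Python sketch of why the same term can score differently per index and how collecting the frequencies first evens that out. It assumes the classic Lucene TF/IDF formula for inverse document frequency, idf = 1 + ln(numDocs / (docFreq + 1)); newer Elasticsearch versions use a different default similarity, and the document counts below are invented purely for illustration.

```python
import math


def idf(doc_freq, num_docs):
    """Classic Lucene inverse document frequency (assumed similarity)."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Invented statistics for the term "home" in two indices.
shards = {
    "index-2015.1": {"num_docs": 1000, "doc_freq": 10},
    "index-2015.2": {"num_docs": 1000, "doc_freq": 200},
}

# Default behaviour: each shard scores with its own local statistics.
local_idf = {name: idf(s["doc_freq"], s["num_docs"]) for name, s in shards.items()}

# dfs_query_then_fetch: statistics are gathered from all shards first,
# so every shard scores with the same global value.
total_docs = sum(s["num_docs"] for s in shards.values())
total_doc_freq = sum(s["doc_freq"] for s in shards.values())
global_idf = idf(total_doc_freq, total_docs)

for name, value in local_idf.items():
    print(f"{name}: local idf = {value:.3f}")
print(f"global idf = {global_idf:.3f}")
```

With local statistics, the index where the term is rarer gives every match a higher score, which is one way all the top results can end up coming from a single index.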
As suggested by Andrei Stefan and Slomo, index boosting solved my problem:
body={
"indices_boost" : { "index-2015.4" : 1.4, "index-2015.3" : 1.3,"index-2015.2" : 1.2 ,"index-2015.1" : 1.1 }
}
EDIT:
Using search_type=dfs_query_then_fetch (as Slomo described) solves the problem in a better way (depending on your business model...).
I have within my SOLR index song objects which belong to a higher level album object. An example is shown below:
<song>
<album title>Blood Sugar Sex Magic</album title>
<song title>Under the Bridge</song title>
<description>A sad song about junkies</description>
</song>
What I can do at the moment is create a facet on the album title so that a search on songs will also show me what albums contain hits for that keyword.
The default behaviour for SOLR is that the facets are shown in the order of most hits to least. However what I want to achieve is the facet list to be sorted according to the relevancy of the top hit for that album.
For example, a search on the word "sad" may show a facet with one hit for "Blood Sugar Sex Magic", and there may also be an album called "Sad Clown songs" where there are 10 hits. "Sad Clown songs" will show as the first facet even though it may be that "Under the Bridge" comes up as the most relevant song.
My question is how can I get all the facets back but then have them ordered by the relevancy of the songs within them? If I would need to change or extend some underlying SOLR code what would that be?
Thanks in advance.
Solr can only sort facets in lexicographical order or by count (see the facet.sort parameter).
If you want to implement a different sort order I'd start in the SimpleFacets class.
In the end, we decided the easiest way to do this, without needing to modify the SOLR source code, would be to query Solr, ask for the facets, then iterate through the results.
Not ideal, but works for now.
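Roughly, the client-side iteration could look like the Python sketch below. The document shape (`album_title`, `score`) and the sample scores are assumptions for illustration, not actual Solr output; the score would have to be requested alongside the fields. It also only sees the documents that were actually fetched, so albums whose hits fall outside the returned rows are not ranked.

```python
def albums_by_top_hit(docs):
    """Order albums by the score of their best-matching song."""
    best = {}
    for doc in docs:
        album = doc["album_title"]
        entry = best.setdefault(album, {"top_score": doc["score"], "count": 0})
        entry["top_score"] = max(entry["top_score"], doc["score"])
        entry["count"] += 1
    # Highest top score first; the hit count is kept only for display.
    return sorted(best.items(), key=lambda kv: kv[1]["top_score"], reverse=True)


# Invented search results for the keyword "sad".
docs = [
    {"album_title": "Sad Clown songs", "song_title": "Song A", "score": 2.1},
    {"album_title": "Blood Sugar Sex Magic", "song_title": "Under the Bridge", "score": 3.2},
    {"album_title": "Sad Clown songs", "song_title": "Song B", "score": 1.9},
]

for album, info in albums_by_top_hit(docs):
    print(album, info["count"], info["top_score"])
```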
You could use Edismax to perform your search query, and use result grouping to group by a specific field, in your case you mentioned Album Title.
https://lucene.apache.org/solr/guide/7_0/result-grouping.html
https://lucene.apache.org/solr/guide/7_0/the-extended-dismax-query-parser.html