I have a Lucene.NET index with fields such as "title", "description" and a few others.
For searching I use the TopScoreDocCollector. When I search, the results are ordered by Lucene's score.
Now, some results have the same score, and Lucene sorts the documents first by score and second by creation date in the index.
I would like to sort first by Lucene score and second by the Lucene score of the "title" field. Is there such a thing?
I have only found a way to sort first by Lucene score and second by title alphabetically.
You will need to do your own sorting, or write a custom Collector.
By default, TopScoreDocCollector will sort by score, and then by doc ID. The ordering by doc ID is important internally when the scores are equal.
Doing a custom sort when you are displaying results should be trivial to do.
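As a rough illustration of such a display-time sort, here is a minimal sketch in Python rather than C#. The Hit class and its title_score field are hypothetical: it assumes you have already obtained a per-title score by some means (for example, a second query restricted to the title field), which is not something the collector gives you by itself.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: int
    score: float        # overall score of the document
    title_score: float  # hypothetical: score of the "title" field alone


def sort_hits(hits):
    # Highest overall score first; ties are broken by the title score,
    # and remaining ties by ascending doc ID (mirroring the default order).
    return sorted(hits, key=lambda h: (-h.score, -h.title_score, h.doc_id))


hits = [Hit(1, 1.5, 0.2), Hit(2, 1.5, 0.9), Hit(3, 2.0, 0.1)]
ordered = [h.doc_id for h in sort_hits(hits)]
```

Documents 1 and 2 share the same overall score here, so the title score decides which of them comes first.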
This doesn't really make sense. If the title fields don't get the same score, it would be reasonable to assume the overall score won't be the same either, so your secondary sort would never actually have any effect on the order. The case where the overall score is the same but the score for a particular field is different is fairly unlikely.
If you meant the reverse, to sort first on title score, then on overall score, I would just do that by boosting the title field. Index-time boosting might make the most sense in this case.
I'm scraping a few sites and relisting their products; each site has its own index in Elasticsearch. Some sites have affiliate programs, and I'd like to list those first in my search results.
Is there a way for me to "boost" results from a certain index?
Should I write a field hasAffiliate: true into ES when I'm scraping and then boost the query clauses that match that value? Or is there a better way?
With boosting, it can be difficult to guarantee that they appear first in the search results. According to the official guide:
Practically, there is no simple formula for deciding on the “correct”
boost value for a particular query clause. It’s a matter of
try-it-and-see. Remember that boost is just one of the factors
involved in the relevance score
https://www.elastic.co/guide/en/elasticsearch/guide/current/query-time-boosting.html
It depends on the type of queries you are running, but here are a couple of other options:
A score function with weights: this could be a more predictable option.
Simply sorting by hasAffiliate (the easiest one).
Note: I'm not sure whether sorting by a boolean field is possible; if it isn't, you could map hasAffiliate as a byte (the smallest integer type) and set it to 1 when true.
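The two options above could look roughly like the following request bodies, built here as plain Python dictionaries. This is only a sketch: the searched fields title and description and the weight of 10 are assumptions for illustration, and only hasAffiliate comes from the question.

```python
import json


def base_query(user_query):
    # Hypothetical text query; "title" and "description" are assumed field names.
    return {"multi_match": {"query": user_query, "fields": ["title", "description"]}}


def affiliate_first_by_sort(user_query):
    # Option 2: sort on hasAffiliate first, then on relevance.
    return {
        "query": base_query(user_query),
        "sort": [{"hasAffiliate": {"order": "desc"}}, "_score"],
    }


def affiliate_first_by_weight(user_query, weight=10):
    # Option 1: function_score that multiplies the score of affiliate documents.
    return {
        "query": {
            "function_score": {
                "query": base_query(user_query),
                "functions": [
                    {"filter": {"term": {"hasAffiliate": True}}, "weight": weight}
                ],
                "boost_mode": "multiply",
            }
        }
    }


print(json.dumps(affiliate_first_by_sort("shoes"), indent=2))
```

Only the sort variant puts every affiliate document ahead of the others; with the weighted variant a very relevant non-affiliate document can still outrank a weakly matching affiliate one.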
I have a sorting requirement: when a value matches a given field, that document should get a higher score than other documents. Can Algolia do this?
To reflect the importance of one attribute compared to another, the way to go with Algolia is definitely to order the attributes you want to search in the searchableAttributes index setting.
For instance, if you want to search in both title and description, but title is more important; you should go for:
searchableAttributes:
- title
- description
Compared to the boosting approach, this ensures that the number of match occurrences won't impact the overall ranking (a common issue in ES is: are 4 words matching here and there in description better than 2 words matching exactly in title?).
With Algolia, the objects matching the longest expression (in terms of proximity between query words in the text) will always be used to identify the best matching attribute, and the results are then sorted according to the attributes' importance.
I'm trying to understand boosting and if boosting is the answer to my problem.
I have an index that holds different types of data.
E.g.: an index Animals. One of the fields is animaltype. Its value can be Carnivorous, Herbivorous, etc.
Now, when we run a search query, I want to show results of type Carnivorous at the top, and then the Herbivorous type.
Also, would it be possible to show only, say, the top 3 results from one type and then the remaining results from other types?
Let's assume that for the Herbivorous type we have a field named vegetables. This will have values only for the Herbivorous animaltype.
Now, can it be possible to have boosting rules specified as follows:
Boost Levels:
animaltype:Carnivorous
then animaltype:Herbivorous and vegetablesfield:spinach
then animaltype:Herbivorous and vegetablesfield:carrot
etc. Basically, boosting on various fields at various levels. I'm new to this concept. It would be really helpful to get some input/guidance.
Thanks,
Kasturi Chavan
Your example is closer to sorting than boosting, as you have a priority list for how important each document is, while boosting (in Solr) is usually applied more fluidly, meaning that there is no hard line between documents of type X and type Y.
However - boosting with appropriately large values will in effect give you the same result, putting the documents into different score "areas" which will then give you the sort order you're looking for. You can see the score contributed by each term by appending debugQuery=true to your query. Boosting says that 'a document with this value is z times more important than those with a different value', but if the document only contains low scoring tokens from the search (usually words that are very common), while other documents contain high scoring tokens (words that are infrequent), the latter document might still be considered more important.
Example: Searching for "city paris", where most documents contain the word 'city', but only a few contain the word 'paris' (and those do not contain 'city'). Even if you boost all documents assigned to the country 'germany', the score contributed by 'city' might still be lower, even with the boost factor, than what 'paris' contributes alone. This might not occur in real life, but you should know what the boost actually changes.
Using the edismax handler, you can apply the boost in two different ways - one is to use boost=, which is multiplicative, or to use either bq= or bf=, which are additive. The difference is how the boost contributes to the end score.
For your example, the easiest way to get something similar to what you're asking, is to use bq (boost query):
bq=animaltype:Carnivorous^1000&
bq=animaltype:Herbivorous^10
These boosts will probably be large enough to move all documents matching these queries into their own buckets, without moving between groups. To create "different levels" as your example shows, you'll need to tweak these values (and remember, multiple boosts can be applied to the same document if something is both herbivorous and eats spinach).
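For illustration, the request parameters above could be assembled like this in Python. It is a minimal sketch: the search term "lion" is made up, and only defType=edismax, bq and debugQuery=true come from the discussion above.

```python
from urllib.parse import urlencode, parse_qs


def build_params(user_query):
    # A list of tuples is used because bq appears more than once.
    params = [
        ("defType", "edismax"),
        ("q", user_query),
        ("bq", "animaltype:Carnivorous^1000"),
        ("bq", "animaltype:Herbivorous^10"),
        ("debugQuery", "true"),  # shows the score contributed by each term
    ]
    return urlencode(params)


query_string = build_params("lion")
decoded = parse_qs(query_string)  # decoded back only to inspect the result
```

The resulting string would be appended to the select URL of your own core or collection.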
A different approach would be to create a function query using query, if and similar functions to result in a single integer value that you can use as a sorting value. You can also calculate this value when indexing the document if it's static (which your example is), and then sort by that field instead. It will require you to reindex your documents if the sorting values change, but it might be an easy and effective solution.
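The index-time variant could be sketched as follows. The function name sort_rank and the concrete rank numbers are assumptions chosen to mirror the levels in the question; the computed value would be stored in a field of your own and used as the primary ascending sort key, with score as the secondary key.

```python
def sort_rank(doc):
    """Return a static priority for a document; lower values sort first."""
    animaltype = doc.get("animaltype", "").lower()
    vegetables = [v.lower() for v in doc.get("vegetables", [])]
    if animaltype == "carnivorous":
        return 0
    if animaltype == "herbivorous":
        if "spinach" in vegetables:
            return 1
        if "carrot" in vegetables:
            return 2
        return 3  # herbivorous, but none of the listed vegetables
    return 4      # every other animaltype


docs = [
    {"id": "rabbit", "animaltype": "Herbivorous", "vegetables": ["carrot"]},
    {"id": "lion", "animaltype": "Carnivorous"},
    {"id": "goat", "animaltype": "Herbivorous", "vegetables": ["spinach"]},
]
ranked_ids = [d["id"] for d in sorted(docs, key=sort_rank)]
```

A document that eats both spinach and carrot gets the spinach rank here, because the checks run in priority order.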
To achieve the "top 3 results from a type" you're probably going to want to look at result grouping support, which makes it possible to get "x documents" for each value in a single field. There is, as far as I know, no way to say "I want three of these at the top, then the rest from other values", except for doing multiple queries (and excluding the three you've already retrieved from the second query). Usually issuing multiple queries works just as well (or better) performance-wise.
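The multiple-query approach can be sketched as a small merge step. This assumes both result lists have already been fetched and that each hit is a dictionary with an id key; here the exclusion is done on the client side, although it could equally be expressed as a filter on the second query.

```python
def top_n_then_rest(type_hits, all_hits, n=3):
    # Take the first n hits of the preferred type ...
    top = type_hits[:n]
    seen = {h["id"] for h in top}
    # ... then append everything else, skipping what is already shown.
    rest = [h for h in all_hits if h["id"] not in seen]
    return top + rest


carnivores = [{"id": "lion"}, {"id": "tiger"}, {"id": "wolf"}, {"id": "fox"}]
everything = [{"id": "goat"}, {"id": "lion"}, {"id": "fox"}, {"id": "cow"}]
merged_ids = [h["id"] for h in top_n_then_rest(carnivores, everything)]
```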
I have a very specific order in which I would like facets returned. I see that the default for Elasticsearch is count, and optionally you can use term, which is alphabetical. (see: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html)
Besides doing the sort in my application, I was curious whether there is a way of sorting the facets in the order I want on the ES side.
I can't say anything about an arbitrary order, but if your documents contain something the order can rely on, you can sort documents in the query/filter/aggregation before picking up facets. By the way, don't use facets at all: aggregations are faster (by ten times in my case) and more powerful, with almost the same syntax. The catch is that ordering can change the search results if there are more than the "top results".
I'm using elasticsearch to find similar documents to a given document using the "more like this" query.
Is there an easy way to get the Elasticsearch score between 0 and 1 (using cosine similarity)?
Thanks!
You may want to look into the Function Score functionality of Elasticsearch, more specifically the script_score and field_value_factor functions. This will allow you to take the score from default scoring (_score) and enhance or replace it in other ways. It really depends on what sort of boosting or transformation you'd like. The default scoring model takes the vector space model into account, but other things as well.
I don't think that's possible to retrieve directly.
But perhaps this workaround would make sense?
Elasticsearch returns max_score in the hits object of the response.
You can potentially divide each document's _score by max_score. The document with the highest score will then get 1, and documents that are less similar to the given one will get less.
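A minimal sketch of that workaround, applied to an already parsed search response. The sample response is made up; the result is a score relative to the best hit of this particular query, not a true cosine similarity.

```python
def normalize_scores(response):
    hits = response["hits"]
    max_score = hits.get("max_score")
    if not max_score:
        # No hits, or max_score missing/zero: nothing to normalize.
        return []
    return [(h["_id"], h["_score"] / max_score) for h in hits["hits"]]


sample = {
    "hits": {
        "max_score": 4.0,
        "hits": [
            {"_id": "a", "_score": 4.0},
            {"_id": "b", "_score": 1.0},
        ],
    }
}
normalized = normalize_scores(sample)
```

Because the divisor changes from query to query, the normalized values are only comparable within a single result set.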
Elasticsearch uses the Boolean model to find matching documents, and a formula called the practical scoring function to calculate relevance. This formula borrows concepts from term frequency/inverse document frequency and the vector space model but adds more-modern features like a coordination factor, field length normalization, and term or query clause boosting.