Is it possible to boost boolean fields in Solr so that they receive a higher score?
We've got an index which looks a bit like this:
document_id
title
description
keywords
is_reviewed
When searching, documents that have been reviewed (i.e. is_reviewed = true) should be weighted more heavily than those that haven't, rather than the unreviewed ones being excluded completely.
Using is_review:true^100 doesn't seem to work: it excludes unreviewed items instead of just giving them a lower weighting. Is there a different way this can be achieved? Thanks!
Some query parsers have a feature dedicated to this kind of usage. For example, the dismax query parser has a boost query, bq, which boosts documents that match a query by adding its clauses to the original query as optional clauses. There is also a boost function, bf, which adds the result of a function to the score (multiplying the score by a function is what the boost parameter of the extended dismax parser does). For example, using is_review as this bf parameter,
the score of every document whose is_review field is undefined will be increased by 0.
the score of every document for which is_review=false will be increased by one.
the score of every document for which is_review=true will be increased by two.
is_review:true^100 shouldn't exclude unreviewed items unless you are using AND as the default query operator. In that case, you could try replacing is_review:true^100 with (is_review:true^100 OR is_review:false^0).
If you are interested in the boost feature of the dismax query parser but would like to stick to the default query parser, you can use the boost query parser which will allow you to multiply the scores of any query with any function.
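As a rough illustration of the bq approach, the request parameters could be assembled like this. This is only a sketch: the field list in qf is taken from the index layout in the question, the field name is_reviewed is assumed to be the real one (the question uses both is_reviewed and is_review), and build_boosted_params is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query, boost_field="is_reviewed", boost=100):
    """Return dismax parameters that boost, but do not filter on, a boolean field.

    The bq clause is optional: documents that do not match it are still
    returned, they just score lower than the ones that do.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        # Assumed searchable fields, taken from the index layout in the question.
        "qf": "title description keywords",
        "bq": f"{boost_field}:true^{boost}",
    }


params = build_boosted_params("solar panels")
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to the select handler URL of your Solr core.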
Drupal
Here is a solution for those using the Drupal CMS.
First, find your field name in Schema Browser at /solr/admin/schema.jsp
Then, depending on the module which you use, try the following examples:
Apachesolr module
Code example:
/**
 * Implements hook_apachesolr_query_alter().
 */
function hook_apachesolr_query_alter(DrupalSolrQueryInterface $query) {
  $query->addParam('bq', array('is_review' =>
    '(is_review:true^100 OR is_review:false^0)'
  ));
}
Search API Solr module
Code example:
/**
 * Implements hook_search_api_solr_query_alter().
 */
function hook_search_api_solr_query_alter(&$call_args, SearchApiQueryInterface $query) {
  $call_args['params']['bq'][] = '(is_review:true^100 OR is_review:false^0)';
}
Related
I have 10+ indexes on my Elasticsearch server.
Each index has one or more fields with different kinds of analyzers: keyword, standard, ngram, etc.
For global search I am using multi_match without specifying any explicit fields.
For querying I am using the elasticsearch-dsl library; the code is below:
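from elasticsearch_dsl import Search

def search_for_index(indice, term, num_of_result=10):
    # Sort by relevance and keep only the top num_of_result hits.
    s = Search(index=indice).sort({"_score": "desc"})
    s = s[:num_of_result]
    # No fields are given, so multi_match runs against every field.
    s = s.query('multi_match', query=term, operator='and')
    response = s.execute()
    return response.to_dict()['hits']['hits']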
I get very good results and the search works just fine, but sometimes someone enters a slightly longer text and I get a maxClauseCount error.
For example, the search raises an error when the search term is:
term=We are working on your request and will keep you posted at the earliest.
Any other slightly longer text raises the same error.
Can you help me figure out a better approach for my global search, so that I can avoid this kind of error?
First of all, this limitation is there for a reason. The more boolean clauses you have, the heavier the search is. Think of it as intersecting (AND) or uniting (OR) the sets of document ids matched by each clause. This is a very heavy operation, which is why the limit defaults to 1024 clauses.
The general recommendation is to reduce the number of fields you're searching. Maybe you have fields which contain no text data or just hold internal ids. You could leave them out of the multi_match query by specifying the fields section explicitly.
If you still decide to go with the current approach and you're using Elasticsearch 5.5 or higher, you can raise the limit by adding the following line to elasticsearch.yml and restarting your instance.
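A minimal sketch of what specifying the fields explicitly could look like, written as a plain request body so it does not depend on any client library. The field names in the usage line are placeholders, and build_global_query is a hypothetical helper name.

```python
def build_global_query(term, fields, size=10):
    """Build a multi_match search body restricted to an explicit field list.

    Fewer fields means fewer boolean clauses are generated per search term,
    which keeps long inputs further away from the max clause count limit.
    """
    if not fields:
        raise ValueError("at least one field is required")
    return {
        "size": size,
        "sort": [{"_score": "desc"}],
        "query": {
            "multi_match": {
                "query": term,
                "fields": list(fields),
                "operator": "and",
            }
        },
    }


# Placeholder field names; use the text fields that actually matter to you.
body = build_global_query(
    "We are working on your request", ["title", "description"]
)
print(body)
```

With elasticsearch-dsl the same restriction would be passed as the fields argument of the multi_match query.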
indices.query.bool.max_clause_count: 250000
If you're using a pre-5 version of Elasticsearch, the setting is called index.query.bool.max_clause_count.
We learn a user's behavior by analyzing the tags he usually searches for.
Now we need to give higher precedence to those tags for that user. I would like to know how we can achieve this using Elasticsearch in an elegant manner.
Well, the best approach for this would be to:
Analyse the behavior of the user.
See which keywords he is interested in.
Maintain one document per user, in another index, which holds all these keywords.
On searches for that user, boost the occurrence of these keywords using a function_score query.
You can use a terms filter inside a boost function to achieve this. Add the boost function under functions in the function_score query.
In the terms filter, you can point to this user's document and get the values dynamically.
Use a custom filter key so that the constructed cache key won't eat too much memory.
With this approach, you can avoid a lot of code paths in the client code.
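The steps above could translate into a request body along these lines. This is a sketch under several assumptions: the index name user_interests, the fields tags, title and keywords, and the helper build_personalised_query are all made up for illustration, and the terms lookup shown here uses the index/id/path form (older Elasticsearch versions also expect a type in the lookup).

```python
def build_personalised_query(term, user_id, weight=5.0):
    """Build a function_score body that boosts documents carrying the user's tags.

    The terms filter does not list the tags itself; it points at the user's
    document in another index and lets Elasticsearch fetch them (terms lookup).
    """
    return {
        "query": {
            "function_score": {
                # The user's actual search; "title" is a placeholder field.
                "query": {"match": {"title": term}},
                "functions": [
                    {
                        "filter": {
                            "terms": {
                                "tags": {
                                    "index": "user_interests",  # hypothetical index
                                    "id": str(user_id),
                                    "path": "keywords",
                                }
                            }
                        },
                        # Applied only to documents matching the filter.
                        "weight": weight,
                    }
                ],
                "boost_mode": "multiply",
            }
        }
    }


body = build_personalised_query("running shoes", user_id=42)
print(body)
```

Documents that do not carry any of the user's tags are still returned; they simply do not receive the extra weight.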
Search query which I send to SOLR is:
?q=iphone 4s&sort=sold desc
By default the search works great, but the problem appears when I want to
sort results by some field, e.g. sold (the number of products sold).
SOLR finds all the results which have: (iphone 4s) or (iphone) or (4s)
So, when I apply the sort by the field 'sold', the first result is "iPhone 3GS...", which is a problem.
I need the results matching the phrase ("iphone 4s") first and then the rest of the results, all sorted by sold.
So, the questions are:
Is it possible to have query like this, and how?
q=iphone 4s&sort={some algorithm for phrase results first} desc, sold desc
Or can I achieve this by configuring the query analyzer, and if so, how?
At the moment this is solved by sending 2 requests to SOLR:
the first with the phrase "iphone 4s" and, if this returns 0 results,
a second request without the phrase, i.e. only: iphone 4s.
If sorting by score, id, or field is not sufficient, Lucene lets you implement a custom sorting mechanism by providing your own subclass of the FieldComparatorSource abstract base class.
Within that custom sort logic, you can implement whatever ordering your requirements call for.
Example Java code:
if (modelNum1.equals(modelNum2)) {
    // return based on number of units sold.
} else {
    // ALWAYS return a value such that the preferred model beats others.
}
DISCLAIMER: This may lead to maintenance problems as you will have to change the logic when a new phone model arrives.
Steps:
1) The Sort object accepts a FieldComparatorSource instance during instantiation.
2) Extend FieldComparatorSource.
3) Load the field information that participates in the sorting using FieldCache, within the FieldComparatorSource, in setNextReader().
4) Override FieldComparatorSource.newComparator() to return your custom FieldComparator.
5) In the method FieldComparator.compare(slot1DocId, slot2DocId), you may include your custom logic by accessing the corresponding field information, via the loaded FieldCache, using the docIds passed in.
Incorporating Lucene code into Solr as a plug-in should not be difficult.
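The ordering such a comparator has to produce can be sketched outside Lucene. The snippet below is plain Python, not Lucene or Solr code, and only illustrates the comparison rule (phrase matches first, then units sold, descending); the product records and the rank_products helper are invented for the example.

```python
def rank_products(products, phrase):
    """Order products so that phrase matches come first, each group by units sold."""
    phrase = phrase.lower()

    def sort_key(product):
        has_phrase = phrase in product["name"].lower()
        # False sorts before True, so "not has_phrase" puts matches first;
        # negating "sold" gives a descending order within each group.
        return (not has_phrase, -product["sold"])

    return sorted(products, key=sort_key)


# Invented sample data.
products = [
    {"name": "iPhone 3GS 8GB", "sold": 900},
    {"name": "iPhone 4S 16GB", "sold": 300},
    {"name": "iPhone 4S 32GB", "sold": 500},
    {"name": "Case for 4S", "sold": 700},
]
ranked = rank_products(products, "iphone 4s")
for product in ranked:
    print(product["name"], product["sold"])
```

A custom FieldComparator would apply the same two-level rule inside compare(), which avoids hard-coding a preferred model number.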
EDIT:
Note: the termfreq function cannot take a term containing a space; it only works with a single term.
As of Solr3.1, sorting can also be done on arbitrary function queries
(as in FunctionQuery) that produce a single value per document.
So I will use the termfreq function in the sort:
termfreq(field,term) returns the number of times the term appears in
the field for that document.
Search query will be
q=iphone 4s&sort=termfreq(product_name,"iphone 4s") desc, sold desc
Note: The termfreq function is available from Solr 4.0 onwards.
We're running Solr 3.6 and are trying to apply a conditional sort on the result set. To clarify, the data is a set of bids, and we want to add the option to sort by the current user's bid, so it can't function as a regular sort (as the bid will be different for each user that runs the query).
The documents in the result set include a "CurrentUserId" and "CurrentBid" field, so I think we need something like the following to sort:
sort=((CurrentUserId = 12345) ? CurrentBid : 0) desc
This is just pseudocode, but the idea is that if the currentUserId in Solr matches the user Id (12345 in this example), then sort by CurrentBid, otherwise, just use 0.
It seems like doing a sort by query might be the way to go with achieving this (or at least form part of the solution), using something like the following query:
http://localhost:8080/solr/select/?q=*:*&sort=query(CurrentUserId:10330 AND CurrentBid:[1 TO *])+desc
This doesn't seem to be working for me though, and results in the following error:
sort param could not be parsed as a query, and is not a field that exists in the index: ...
The Solr documentation indicates that the query function can be used as a sort parameter from Solr 1.4 onwards, so this seems like it should work.
Any advice on how to go about achieving this would be greatly appreciated.
According to the Solr Documentation link you provided,
Any type of subquery is supported through either parameter dereferencing $otherparam or direct specification of the query string in the LocalParams via "v".
So based on the examples and your query, I think one or both of the following should work:
http://localhost:8080/solr/select/?q=*:*&sort=query($qq)+desc&qq=(CurrentUserId:10330 AND CurrentBid:[1 TO *])
http://localhost:8080/solr/select/?q=*:*&sort=query({!v='CurrentUserId:10330 AND CurrentBid:[1 TO *]'})+desc
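Because the subquery contains spaces, brackets and a dollar-sign reference, it may be easier to let a library do the URL encoding. Here is a small sketch of the parameter-dereferencing variant, assuming the main query is the match-all query *:* and using a hypothetical helper build_bid_sort_url:

```python
from urllib.parse import urlencode, parse_qs


def build_bid_sort_url(base_url, user_id):
    """Build a select URL that sorts by a subquery passed in the qq parameter."""
    params = {
        "q": "*:*",  # assumed match-all main query
        # The subquery lives in its own parameter and is referenced as $qq.
        "qq": f"CurrentUserId:{user_id} AND CurrentBid:[1 TO *]",
        "sort": "query($qq) desc",
    }
    return f"{base_url}?{urlencode(params)}"


url = build_bid_sort_url("http://localhost:8080/solr/select/", 10330)
print(url)
```

As far as I understand, query() yields the relevance score of the subquery, so this puts the current user's bids first but does not by itself order them by the value of CurrentBid.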
I've got an Entity model (in Mongoid) that I'm trying to search on its keywords field which is an array. I want to do a query where I pass in an array of potential search terms, and any entity that matches any of the terms will pass.
I don't have this working well yet.
The reason I'm asking, though, is that it's more complex than that. I also DON'T want to return any entities that have been marked as "do not return", which I do via an "ignore_project_ids" parameter.
As it stands, when I query, I get 0 results. I was using Bonsai.io, but I've moved this to my own EC2 instance to reduce the complexity and the number of variables while solving the problem.
So, what am I doing wrong? Here are the relevant bits of code.
https://gist.github.com/3405763
You want a terms query rather than a term query - a term query is only interested in equality, whereas a terms query requires that the field match any of the specified values.
Given that you don't seem to care about the query score (you're sorting by another attribute), you'll get faster queries by using a filtered query and expressing your conditions as filters.
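A sketch of how the two conditions could be expressed as filters. It assumes an Elasticsearch version where bool accepts a filter clause (on older versions the same two conditions would go inside a filtered query), and the field name project_id as well as the helper build_entity_query are guesses, since the gist is not reproduced here.

```python
def build_entity_query(search_terms, ignore_project_ids):
    """Match entities having any of the keywords, excluding ignored projects.

    Both conditions run in filter context, so no relevance score is computed.
    """
    return {
        "query": {
            "bool": {
                # terms: the field must contain at least one of the values.
                "filter": [{"terms": {"keywords": list(search_terms)}}],
                # project_id is an assumed field name for the exclusion list.
                "must_not": [{"terms": {"project_id": list(ignore_project_ids)}}],
            }
        }
    }


body = build_entity_query(["solar", "wind"], ["p1", "p2"])
print(body)
```

Keep in mind that a terms query is not analyzed, so the values passed in should match the tokens as they are stored in the index.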