Solr boost query: sort by whether a result is boosted, then by another field

I'm using Solr to run a query on one of our cores. Suppose my documents have two fields, Id and Name. I also have a separate list of IDs that I pull from a database and pass into the query to boost certain results.
If a document is returned by the query and its ID is in the list, it goes to the top of the results; if it is returned but its ID is not in the list, it goes below the ones that are. The first part comes from the boost. My query is something like this:
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&bq=Id%3a(36+OR+76+OR+90+OR+224+OR+391)
I have the boost query working, but I need the boosted results to be in alphabetical order by name, followed by the non-boosted results, also in alphabetical order by name. I need to know what to use for the &sort= parameter.
&sort=score%20desc,Name+asc does not work.
I've looked over a lot of documentation, but I still don't know if this is even possible. Any help is appreciated. Thanks!
Solr version is 6.0.1. I am actually using SolrNet to interface with Solr, but I think I can figure out the SolrNet part if I know what the URL's &sort= parameter value needs to be.

I figured it out by doing away with the boost query. Instead, I added a sort using the "exists" function and passed it a sub-query on the ID. exists returns a boolean value to sort on, and then I added the name as a second sort. It works perfectly!
The URL looks like this:
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&sort=exists(query({!v=%27Id:(36+OR+76+OR+90+OR+224+OR+391)%27}))%20DESC,%20Name%20ASC
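A minimal Java sketch of assembling that sort value from the ID list fetched from the database, assuming the `Id` and `Name` field names from the question. The class and method names are hypothetical, and the sketch only builds the URL string; it does not call Solr. Encoding each parameter value with `URLEncoder` avoids hand-escaping the quotes, braces, and `!` in the local-params syntax.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper that builds the exists(query(...)) sort described above.
class BoostedFirstSort {

    // Docs whose Id is in the list get exists() == true and sort first; Name breaks ties.
    static String sortExpression(List<Integer> ids) {
        String idClause = ids.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" OR "));
        return "exists(query({!v='Id:(" + idClause + ")'})) desc, Name asc";
    }

    // Assembles the request URL; every parameter value is URL-encoded.
    static String requestUrl(String handlerUrl, String q, List<Integer> ids) {
        return handlerUrl
                + "?q=" + encode(q)
                + "&start=0&rows=25"
                + "&sort=" + encode(sortExpression(ids));
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<Integer> ids = List.of(36, 76, 90, 224, 391);
        System.out.println(sortExpression(ids));
        System.out.println(requestUrl(
                "http://mysolrserver:8983/solr/MyCore/MyQueryHandler", "Smith", ids));
    }
}
```

`URLEncoder` turns spaces into `+`, which Solr decodes back to spaces, so the encoded form is equivalent to the hand-written URL above.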

The closest match to your requirement is the Query Elevation Component [1].
In your particular case, I would first sort the IDs according to your requirements (by name, for example) and then maintain them in elevate.xml.
At query time, you can use the "forceElevation" parameter to keep the elevated documents on top even when sorting, and then sort the remaining results by name.
[1] https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component
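A sketch of the "sort the IDs by name, then maintain them in elevate.xml" step in Java. It assumes that `Id` is the core's uniqueKey (elevate.xml refers to documents by uniqueKey) and, as the answer relies on, that elevated documents come back in the order they are listed. The class, the `Doc` record, and the sample data are hypothetical.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical generator for elevate.xml with the elevated Ids written in Name order.
class ElevateXmlWriter {

    // Hypothetical pair of a document's uniqueKey value and its Name field.
    record Doc(String id, String name) {}

    static String build(String queryText, List<Doc> docs) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n");
        xml.append("<elevate>\n");
        xml.append("  <query text=\"").append(escape(queryText)).append("\">\n");
        // Write the docs sorted by Name so the listed (elevated) order is alphabetical.
        docs.stream()
                .sorted(Comparator.comparing(Doc::name))
                .forEach(d -> xml.append("    <doc id=\"").append(escape(d.id())).append("\" />\n"));
        xml.append("  </query>\n");
        xml.append("</elevate>\n");
        return xml.toString();
    }

    // Minimal escaping for XML attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        System.out.print(build("Smith", List.of(
                new Doc("76", "Smith, Zoe"),
                new Doc("36", "Smith, Adam"))));
    }
}
```

The output would go into the elevate.xml configured for the component; requests would then pass forceElevation=true along with sort=Name asc.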

Related

elasticsearch: decide which query should run first

We have a simple web page where the user can provide some input and query the database. We currently use MongoDB but want to migrate to Elasticsearch, since its queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string to match parent entries. Parent-child relations are described only through a field containing each entry's ancestors' IDs.
The question is the following: if both the search string and the parent search string are provided, is there a way to know, before executing the queries, which query should be executed first in order to return results faster and be more performant?
For example, a specific parent search might match only 2 parent entries, after which we can fetch all children matching the search string. In that case, we should execute the parent query first and then the entry query.
One option would be to get the count for both queries and then execute the one with the smaller count first, but isn't this solution worse, since each query is executed twice: once for the count and once for the actual results?
Are there any other options to solve this?
P.S. We use Elasticsearch v1.7.
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their IDs. Then fetch all entries that match the searchString and whose ancestors field contains any of those parent IDs.
OR
fetch all entries that match the searchString and store all their ancestors' IDs. Then fetch all entries that match the parentSearchString and whose ID is one of those ancestor IDs.
Just to clarify, parent and child entries have exactly the same structure and reside in the same index. We cannot use separate indices, since the parent-child relation can be nested as many as 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like this:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you to upgrade your Elasticsearch version if possible. A lot has changed since 1.7, and to be honest, I can't tell whether everything in the following article applies to such an old version (it probably doesn't).
But to your actual question: if I understand you correctly, you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you combine all the 'queries' into one compound query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating scores takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filters instead of queries. The same applies if you only want to sort by the _score of parent matches: then you could put the query for children into a filter.
Update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general, you want to send the second query with the fewest possible IDs/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you're going from child to parent, you would have to count the distinct ancestors (field values), not the number of matching documents.
I would argue that the most expensive operation is very often fetching the result source from disk. So whichever way you go, you should probably fetch only what you need in the first query. Your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
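To make the two options concrete, here is an in-memory Java sketch in which a `List` stands in for the index and `Predicate`s stand in for the two query strings. It is not Elasticsearch client code, and all names are hypothetical. Option 1 keeps only parent IDs from the first pass; option 2 keeps only the child matches' IDs and ancestors, checks candidate parents by ID, and then filters the already-fetched children locally, so both return the same children.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// In-memory sketch of the two application-side join orders; a List stands in for the index.
class AppSideJoin {

    // Same shape as the entry in the question: ancestors holds the IDs of all ancestors.
    record Entry(String id, String name, String type, List<String> ancestors) {}

    // Option 1: collect only parent IDs, then filter child matches on their ancestors.
    static List<Entry> parentsFirst(List<Entry> index, Predicate<Entry> parentQuery, Predicate<Entry> childQuery) {
        Set<String> parentIds = index.stream()
                .filter(parentQuery)
                .map(Entry::id)
                .collect(Collectors.toSet());
        return index.stream()
                .filter(childQuery)
                .filter(e -> e.ancestors().stream().anyMatch(parentIds::contains))
                .collect(Collectors.toList());
    }

    // Option 2: collect child matches and their ancestors, then check candidate parents by ID.
    static List<Entry> childrenFirst(List<Entry> index, Predicate<Entry> parentQuery, Predicate<Entry> childQuery) {
        List<Entry> children = index.stream().filter(childQuery).collect(Collectors.toList());
        Set<String> candidateIds = children.stream()
                .flatMap(e -> e.ancestors().stream())
                .collect(Collectors.toSet());
        // Stands in for the second query: an ID filter plus the parent search.
        Set<String> parentIds = index.stream()
                .filter(e -> candidateIds.contains(e.id()))
                .filter(parentQuery)
                .map(Entry::id)
                .collect(Collectors.toSet());
        return children.stream()
                .filter(e -> e.ancestors().stream().anyMatch(parentIds::contains))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> index = List.of(
                new Entry("p1", "parentTest", "FOLDER", List.of()),
                new Entry("p2", "other", "FOLDER", List.of()),
                new Entry("c1", "test", "BLOCK", List.of("p1")),
                new Entry("c2", "test", "BLOCK", List.of("p2")));
        // parentSearchString: name:parentTest AND NOT type:BLOCK
        Predicate<Entry> parentQuery = e -> e.name().equals("parentTest") && !e.type().equals("BLOCK");
        // searchString: type:BLOCK AND name:test
        Predicate<Entry> childQuery = e -> e.type().equals("BLOCK") && e.name().equals("test");
        System.out.println(parentsFirst(index, parentQuery, childQuery));
        System.out.println(childrenFirst(index, parentQuery, childQuery));
    }
}
```

Which order is cheaper depends on which first pass yields the smaller ID set to ship into the second query, which is the trade-off discussed above.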
Unfortunately, I can't help you more than that, since I don't have enough experience comparing the speed of those approaches. My guess would be that an ID filter might be faster in general. But that's just a guess...

Possible to use GroupBy in ElasticSearch querystring?

I have a few records in my Elasticsearch collection, and I want to use a GroupBy-style aggregation in an Elasticsearch query string.
I want to know if it is possible, because when I google it, I always get results about this.
I want to use something like this in the query string, so that it gives me the records per group.
For example:
http://localhost:9200/_all/tweets/_count?q=user:Pu*+user:Kim*
This gives me the count of all records whose name starts with Pu or Kim,
but I want to know how many records have a name starting with Pu
and how many have a name starting with Kim.
Aggregations must be specified separately in the search request; you cannot specify them as part of a query string query.
For this particular requirement, you could also just execute two queries, one per prefix...
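A minimal Java sketch of the "two queries" route, building one `_count` URL per prefix against the endpoint from the question. `countUrls` is a hypothetical helper and nothing is sent over the network; the `count` field of each response would then give the per-prefix total.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the "two queries" approach: one _count request per name prefix.
class PrefixCounts {

    // Builds one _count URL per prefix; map keys keep the input order.
    static Map<String, String> countUrls(String baseUrl, String field, List<String> prefixes) {
        Map<String, String> urls = new LinkedHashMap<>();
        for (String prefix : prefixes) {
            String q = field + ":" + prefix + "*";
            urls.put(prefix, baseUrl + "/_count?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8));
        }
        return urls;
    }

    public static void main(String[] args) {
        countUrls("http://localhost:9200/_all/tweets", "user", List.of("Pu", "Kim"))
                .forEach((prefix, url) -> System.out.println(prefix + " -> " + url));
    }
}
```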

How to return fields in correct order for an ElasticSearch query

I'm performing a multimatch search against an ElasticSearch index, and I want to get back the source object with fields in the same order as they were stored in.
However, when I get the response back from the ElasticSearch query, the fields are in alphabetical order (which is not particularly useful for what I'm doing). I'm fairly confident that it used to behave the desired way in a previous version of ES, but since I upgraded recently it is only returning the fields in alphabetical order.
Edit: Note that if I perform a standard match_all search, then I do get the fields back in the original order. I wonder if it has something to do with the multimatch query?
Edit 2: OK, I just ran it again and it returned the fields in a random order (not alphabetical). Maybe this is a bug in ElasticSearch?
You cannot guarantee any order in what is returned. The source document is a plain old JSON object and by definition:
An object is an unordered set of name/value pairs.
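If a client genuinely needs a fixed field order, one option is to impose it after parsing the response. A hypothetical Java sketch, assuming `_source` has already been parsed into a `Map` by whatever JSON library you use; the field list is something you would supply yourself.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: JSON object order is not guaranteed, so impose the order you need.
class FieldOrder {

    // Fields named in `order` come first, in that order; any other fields follow
    // in the iteration order of `source`.
    static Map<String, Object> reorder(Map<String, Object> source, List<String> order) {
        Map<String, Object> ordered = new LinkedHashMap<>();
        for (String field : order) {
            if (source.containsKey(field)) {
                ordered.put(field, source.get(field));
            }
        }
        source.forEach(ordered::putIfAbsent);
        return ordered;
    }

    public static void main(String[] args) {
        // Map.of has no defined iteration order, much like a parsed _source.
        Map<String, Object> source = Map.of("title", "Hello", "id", 42, "body", "...");
        System.out.println(reorder(source, List.of("id", "title", "body")));
    }
}
```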

Prioritize phrase matches when sorting in Solr

The search query I send to Solr is:
?q=iphone 4s&sort=sold desc
By default the search works great, but the problem appears when I want to
sort results by some field, e.g. sold (the number of products sold).
Solr finds all the results that contain (iphone 4s) or (iphone) or (4s).
So when I sort by the field 'sold', the first result is "iPhone 3GS...", which is a problem.
I need the results matching the phrase ("iphone 4s") first and then the rest of the results, all sorted by sold.
So, the questions are:
Is it possible to have a query like this, and if so, how?
q=iphone 4s&sort={some algorithm for phrase results first} desc, sold desc
Or can I achieve this by configuring the query analyzer, and if so, how?
At the moment this is solved by sending 2 requests to Solr:
first with the phrase "iphone 4s", and if this returns 0 results,
a second request without the phrase, only: iphone 4s.
If sorting by score, ID, or a field is not sufficient, Lucene lets you implement a custom sorting mechanism by providing your own subclass of the FieldComparatorSource abstract base class.
Within that custom sort logic, you can implement whatever ordering your requirements call for.
Example Java code:
if (modelNum1.equals(modelNum2)) {
    // Same model: return a result based on the number of units sold.
} else {
    // Different models: ALWAYS return a value such that the preferred model beats the others.
}
DISCLAIMER: This may lead to maintenance problems, as you will have to change the logic whenever a new phone model arrives.
Steps:
1) A SortField accepts a FieldComparatorSource instance at construction; the Sort is then built from that SortField.
2) Extend FieldComparatorSource.
3) Load the field information that participates in sorting using the FieldCache, in your FieldComparator's setNextReader().
4) Override FieldComparatorSource.newComparator() to return your custom FieldComparator.
5) In FieldComparator.compare(slot1, slot2), apply your custom logic to the corresponding field values loaded via the FieldCache.
Incorporating Lucene code into Solr as a plug-in should not be difficult.
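The ordering rule itself can be sketched in plain Java with a `Comparator`: phrase matches first, then units sold in descending order. This only illustrates the comparison logic on an in-memory list; it is not Lucene's `FieldComparator` API, and a case-insensitive substring check stands in for a real phrase match. `Product` and the sample data are hypothetical.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Plain-Java illustration of the ordering rule only; NOT Lucene's FieldComparator API.
class PhraseFirstOrder {

    // Hypothetical product with the two values the comparison needs.
    record Product(String name, long sold) {}

    // Phrase matches first (false sorts before true), then units sold, highest first.
    static Comparator<Product> phraseFirst(String phrase) {
        String needle = phrase.toLowerCase(Locale.ROOT);
        Comparator<Product> byPhrase = Comparator.comparing(
                (Product p) -> !p.name().toLowerCase(Locale.ROOT).contains(needle));
        return byPhrase.thenComparing(Comparator.comparingLong(Product::sold).reversed());
    }

    static List<Product> sort(List<Product> products, String phrase) {
        return products.stream().sorted(phraseFirst(phrase)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> results = List.of(
                new Product("iPhone 3GS 8GB", 900),
                new Product("iPhone 4S 16GB", 500),
                new Product("iPhone 4S 32GB", 700));
        sort(results, "iphone 4s").forEach(System.out::println);
    }
}
```

Here the 3GS model sells the most but still lands last, because the phrase check is compared before the sold count.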
EDIT:
Note: you cannot use a space in that function; the term must be a single word without spaces.
As of Solr 3.1, sorting can also be done on arbitrary function queries
(as in FunctionQuery) that produce a single value per document.
So, I will use the function termfreq in the sort:
termfreq(field,term) returns the number of times the term appears in
the field for that document.
Search query will be
q=iphone 4s&sort=termfreq(product_name,"iphone 4s") desc, sold desc
Note: the termfreq function is available from Solr 4.0 onwards.

Conditional sorting in Solr 3.6

We're running Solr 3.6 and are trying to apply a conditional sort on the result set. To clarify, the data is a set of bids, and we want to add the option to sort by the current user's bid, so it can't function as a regular sort (as the bid will be different for each user that runs the query).
The documents in the result set include a "CurrentUserId" and "CurrentBid" field, so I think we need something like the following to sort:
sort=((CurrentUserId = 12345) ? CurrentBid : 0) desc
This is just pseudocode, but the idea is that if the CurrentUserId in Solr matches the user ID (12345 in this example), then sort by CurrentBid; otherwise, just use 0.
It seems like doing a sort by query might be the way to go with achieving this (or at least form part of the solution), using something like the following query:
http://localhost:8080/solr/select/?q=*:*&sort=query(CurrentUserId:10330 AND CurrentBid:[1 TO *])+desc
This doesn't seem to be working for me though, and results in the following error:
sort param could not be parsed as a query, and is not a field that exists in the index: ...
The Solr documentation indicates that the query function can be used as a sort parameter from Solr 1.4 onwards, so this seems like it should work.
Any advice on how to go about achieving this would be greatly appreciated.
According to the Solr Documentation link you provided,
Any type of subquery is supported through either parameter dereferencing $otherparam or direct specification of the query string in the LocalParams via "v".
So based on the examples and your query, I think one or both of the following should work:
http://localhost:8080/solr/select/?q=*:*&sort=query($qq)+desc&qq=(CurrentUserId:10330 AND CurrentBid:[1 TO *])
http://localhost:8080/solr/select/?q=*:*&sort=query({!v='CurrentUserId:10330 AND CurrentBid:[1 TO *]'})+desc
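A Java sketch of the parameter-dereferencing variant, with each parameter URL-encoded so that the brackets, `*`, and `$` survive the trip. It uses `*:*` as the match-all query, the class name is hypothetical, and it only builds the URL; whether this fully covers the conditional bid sort on Solr 3.6 is untested. Note that `query($qq)` yields the sub-query's score (0 for non-matching documents), so it ranks this user's bids above the rest rather than ordering them by bid amount.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: builds the query($qq) sort URL with every parameter value URL-encoded.
class SortByQueryUrl {

    static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Documents matching the sub-query (passed separately as qq) sort above the rest.
    static String build(String selectUrl, long userId) {
        String qq = "CurrentUserId:" + userId + " AND CurrentBid:[1 TO *]";
        return selectUrl
                + "?q=" + encode("*:*")
                + "&sort=" + encode("query($qq) desc")
                + "&qq=" + encode(qq);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8080/solr/select/", 10330));
    }
}
```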
