Sort by an attribute value sequence - sorting

We have a specific requirement of sorting the products as per a specific attribute value sequence. Any pointer or source of info would help us.
Example of the scenario;
Let's say for search result I want to sort results based on a attribute producttype. Where producttype has following values:
A, B, C, D
so while in Solr query I can give either producttype asc, producttype desc.
But I want get result in a specific way by saying first give me All results of values 'C' then B, A, D.
We are using Solr with IBM wcs7.

I recommend you to take a look on this article - http://sujitpal.blogspot.ru/2011/05/custom-sorting-in-solr-using-external.html or sorting by FunctionQuery - http://wiki.apache.org/solr/FunctionQuery#Sort_By_Function
UPD. Also, one possible solution is add term query with boosting, e.g.
*:* OR type:A^100 OR type:B^200 OR type:C^2
So, it will be boosted in needed way. All you need to do, is play with boosting weight a little bit

Related

How to sort (and give weight) by Availability dates in SolR

i'm facing a big problem in my SolR DB.
My objects have a datetime field "Available_From" and a datetime field "Available_To".
We also have a "Ranking" field for the sorting.
I can search correctly with direct queries (eg. give me all the items that are available at the moment) but when i do a regular search i cannot find a way to show the items that result "available NOW" in the first places in the results, usually sorted by "Ranking" field.
How can i do this? Am I forced to write some java classes (the nearest thing i've found is there https://medium.com/#devchaitu18/sorting-based-on-a-custom-function-in-solr-c94ddae99a12) or is there a way to do with standard SolR queries?
Thanks in advance to everyone!
In your case you actually don't want sorting, since that indicates that you want one field to determine the returned sequence of documents.
Instead, use boosting - apply a very large boost to those that are available now, either through bq or boost, then apply a boost based on ranking. You'll have to tweak the weights given to each part based on how you want the search results to be presented.

Solr boost query sort by whether result is boosted then by another field

I'm using Solr to run a query on one of our cores. Suppose my documents have two fields: ID, and Name. I also have a separate list of IDs I'm grabbing from a database and passing into the query to boost certain results.
If the document gets returned in the query and the ID is in the list it goes to the top of the results, and if it gets returned in the query and the ID is not in the list then it goes below those that are in the list. The former is from the "boost". My query is something like this -
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&bq=Id%3a(36+OR+76+OR+90+OR+224+OR+391)
I am able to get the boost query working but I need the boosted results to be in alphabetical order by name, then the non boosted results under that also in alphabetical order by name. I need to know what to user for the &sort= parameter.
&sort=score%20desc,Name+asc does not work.
I've looked over a lot of documentation, but I still don't know if this even possible. Any help is appreciated. Thanks!
Solr version is 6.0.1. I am actually using SolrNet to interface with Solr, but I think I can figure out the SolrNet part if I know what the url's &sort= parameter value needs to be.
I figured it out, by doing away with the boost query. I added a sort query using the "exists" function and passing it a sub-query for the ID. The exists returns a boolean value to sort on, then I added the name as a second sort. It works perfect!!
The URL looks like this:
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&sort=exists(query({!v=%27Id:(36+OR+76+OR+90+OR+224+OR+391)%27}))%20DESC,%20Name%20ASC
The closest match to your requirement is the query elevation component[1] .
In your particular case I would first sort my Ids according to my requirements ( sorting them by name for example), then maintain them in the elevate.xml.
At query time you can use the "forceElevation" parameter to force the elevation and then sort the remaining results by name.
[1] https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component

elasticsearch: decide which query should run first

We have a simple web page, where the user can provide some input and query the database. We currently use mongodb but want to migrate to elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute firstly the parent query and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and children entries have the exact same structure and reside in the same index. We cannot have different indices since the pare-child relation can be 10 times nested, so an entry can be both a parent and a child. An entry looks more or less like:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you, to upgrade your Elasticsearch version, if possible. There happened a lot since 1.7 and to be honest, I can't tell if all of what's written in the following article is valid for such an old version (probably it isn't).
But to your actual question: Hopefully I am understanding you correctly, but you try to estimate how costly a query for Elasticsearch is? Well, you don't have to. If you provide all 'queries' in one nested query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating score does take time. So if sorting is not based on the elasticsearch _score, you want to use boolean filter queries. This would also apply, if you want to sort only by _score of parent matches, then you could put the query for children into a filter.
update
Thanks to your example, I now see the problem. Self referencial Parent-Child relations are unfortunately not supported by ElasticSearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-joins.
So yes, in general, you want to send the second query with the least possible amount of ids/terms. While getting counts for both queries is not as bad as you might think, because the results are most likely still cached, does it actually help? Because if you're going from child to parent, you would have to count the ancestors (field values), and not the actual document count.
I would argue, that the most expensive operation is very often fetching result source from disk. So whichever way you go, you probably should only fetch what you need in the first query. So your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience in comparing speed of those approaches. My guess would be, that an id filter might be faster in general. But that's just a guess...

Solr Sorting by Relevancy and Containing of Multiple fields

I am using Solr 5.3.0 to make a news searching system. Assume I have these following fields of news:{
Title
Content
Date
NewsType
}
I am searching both of the company name and the manager name in this searching system. Lets say "Stark Industries" as the company name and "Tony Stark" as the manager name. I want to sort the result by date (this is easy to do), relevancy, and following rules:
A:
News that the terms exist in both 'Title' field and 'Content' field.
News that the terms exist only in the 'Title' field.
News that the terms exist only in the 'Content' field.
B:
News that both Company Name (Stark Industries)and Manager Name (Tony Stark) exist.
News that only Company Name exist.
News that only Manager Name exist.
The order should be 1>2>3 (which mean 1 should on the top of 2). And A and B should be two different ways to score the news. And the final score may equal to A*B.
I give "Title" field more weight than "Content" field using this code defType = edismax & qf=notice_title^200+notice_content. So I make the "Title" field more important than the "Content" field.
But in this way, I cannot make sure that A1 > A2 > A3. It only increases the score of the 'Title' field.
Same with the rule B, I only can use qf to increase the weight of the Comany Name.
If there is a way to increase the weight of (Title && Content):(CompanyName && ManagerName) should help. (I try to mean both terms exist in both fields.) But this syntax doesn't work in qf.
Any helps will be appreciated.
You can set omitTermFreqAndPosition for your field, which will ignore the frequency of the terms in the field, making the score independent on the number of times the term appear in the document.
That being said, it's usually better to be a bit more fluent in relevancy calculations than having hard rules like this, but you can implement them by sorting by a function query. Using the function query, you can issue the queries by themselves and then sort by each query.
Make use of Solr boost queries to achieve that.

Sorting Solr multivalue fields based on field values

I have multiple Solr instances with separate schemas.
I need to receive multivalue field in sorted order, e.g. by type: train_station, airport, city_district, and so on:
q=köln&sort=query({!v="type:(airport OR train_station)"}) desc
I would like to see airport type document before train_station type. For now I am always getting train_station type at the top.
How should I write the query?
You are getting train_stations at the top because of the IDF.
A quick hack to fix it would be to use a range query (which has the advantage of having constant scores) and query boosts: q=köln&sort=query({!v="type:([airport TO airport]^3 OR [train_station TO train_station]^2)"}) desc.
This way, documents which have airport in their type field will have a score of 3, documents which have train_station in their type field will have a score of 2 and documents which have airport and train_station in their field type will have a score of 2+3=5 (to a multiplicative constant).
A more elegant (and effective) way of doing this would be to write a custom query parser (or even a function query).
You can sort on a function only if it returns a single value per document. You definitely can't sort on a multiValued field or any field that is tokenized. Seems like you would need a function that returns "airport" if the field contains "airport" (even if it contains "train station") and "train station" if it contains "train station" but not "airport", and then sort on that.
Another option would be to handle this at index time. Add a field called "airport_train_station_sort" that returns 1 if the field contains "airport", 2 if the field contains "train station" but NOT airport, and 3 if it contains neither. Then simply sort on that field.
You cannot solve this problem inside SOLR. Check the documentation, SOLR does not sort multivalued fields. Older versions of SOLR let you try, but the results were undefined and unpredictable.
You either change your schema and put this sort data into single value indexed fields, or you need to make several queries, first for airports, then city districts, then train stations.
To order items within the field itself you have to either index it in order you want, or do post processing. Solr's sort will sort only docs!

Resources