Solr Sorting by Relevancy and Containing of Multiple fields - sorting

I am using Solr 5.3.0 to build a news search system. Assume each news item has the following fields: {
Title
Content
Date
NewsType
}
I search for both a company name and a manager name in this system, say "Stark Industries" as the company name and "Tony Stark" as the manager name. I want to sort the results by date (this is easy to do), by relevancy, and by the following rules:
A:
1) News in which the terms exist in both the 'Title' field and the 'Content' field.
2) News in which the terms exist only in the 'Title' field.
3) News in which the terms exist only in the 'Content' field.
B:
1) News in which both the Company Name (Stark Industries) and the Manager Name (Tony Stark) exist.
2) News in which only the Company Name exists.
3) News in which only the Manager Name exists.
Within each rule the order should be 1 > 2 > 3 (meaning 1 should rank above 2, and 2 above 3). A and B should be two separate ways of scoring the news, and the final score could be A*B.
I give the "Title" field more weight than the "Content" field using defType=edismax&qf=notice_title^200+notice_content, so the "Title" field counts for more than the "Content" field.
But this way I cannot guarantee that A1 > A2 > A3; it only increases the score contributed by the 'Title' field.
The same goes for rule B: with qf I can only increase the weight of the Company Name.
A way to increase the weight of (Title && Content):(CompanyName && ManagerName) would help (I mean both terms occurring in both fields), but this syntax doesn't work in qf.
Any help will be appreciated.

You can set omitTermFreqAndPositions="true" on your field, which makes Solr ignore how often a term occurs in the field, so the score no longer depends on the number of times the term appears in the document. Note that this also drops position information, so phrase queries on that field stop working.
That being said, it is usually better to keep relevancy calculations somewhat fluid than to impose hard rules like these, but you can implement them by sorting on a function query. With function queries, you can issue each sub-query on its own and then sort by each of them in turn.
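As a rough sketch of the function-query approach, the following builds request parameters that turn rules A and B into two tier values (3, 2, 1) and sort by their product. It relies on Solr's if(), exists(), query() and product() functions and on $name parameter dereferencing. The field names come from the qf example in the question; the parameter names (qt, qc, qcompany, qmanager, a, b) and the helper functions are made up for illustration, and the output has not been run against a live Solr instance.

```python
from urllib.parse import urlencode

# Field names from the question's qf example; search phrases from the question.
TITLE, CONTENT = "notice_title", "notice_content"
COMPANY, MANAGER = '"Stark Industries"', '"Tony Stark"'


def tier(first: str, second: str) -> str:
    """Function query: 3 if both sub-queries match, 2 if only the first,
    1 if only the second, 0 if neither."""
    return (f"if(exists(query(${first})),"
            f"if(exists(query(${second})),3,2),"
            f"if(exists(query(${second})),1,0))")


def build_params() -> dict:
    terms = f"({COMPANY} OR {MANAGER})"
    return {
        "q": f"{TITLE}:{terms} OR {CONTENT}:{terms}",
        # Sub-queries referenced from the function queries via $name (hypothetical names).
        "qt": f"{TITLE}:{terms}",
        "qc": f"{CONTENT}:{terms}",
        "qcompany": f"{TITLE}:{COMPANY} OR {CONTENT}:{COMPANY}",
        "qmanager": f"{TITLE}:{MANAGER} OR {CONTENT}:{MANAGER}",
        "a": tier("qt", "qc"),              # rule A: title and content
        "b": tier("qcompany", "qmanager"),  # rule B: company and manager
        # Final ordering: A*B first, then the normal relevancy score.
        "sort": "product($a,$b) desc, score desc",
    }


if __name__ == "__main__":
    print(urlencode(build_params()))
```

A date criterion could be appended to the sort parameter as a further clause; where it belongs in the ordering depends on how much weight the date should carry.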

Make use of Solr boost queries to achieve that.
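A minimal sketch of what such boost queries might look like for rule B, assuming the edismax parser and the field names from the question. The boost values 300, 200 and 100 are arbitrary placeholders, and because bq boosts are added to the main score they nudge the ordering rather than guarantee strict tiers.

```python
from urllib.parse import urlencode

FIELDS = ("notice_title", "notice_content")  # from the question's qf


def anywhere(phrase: str) -> str:
    """Clause that matches the phrase in any of the searched fields."""
    return "(" + " OR ".join(f'{f}:"{phrase}"' for f in FIELDS) + ")"


def build_bq_params(company: str, manager: str) -> list:
    c, m = anywhere(company), anywhere(manager)
    return [
        ("defType", "edismax"),
        ("q", f'"{company}" "{manager}"'),
        ("qf", "notice_title^200 notice_content"),
        # bq may be repeated; the boost values below are illustrative only.
        ("bq", f"({c} AND {m})^300"),  # both company and manager present
        ("bq", f"{c}^200"),            # company present
        ("bq", f"{m}^100"),            # manager present
    ]


if __name__ == "__main__":
    print(urlencode(build_bq_params("Stark Industries", "Tony Stark")))
```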

Related

How to sort (and give weight) by Availability dates in SolR

I'm facing a big problem in my Solr DB.
My objects have a datetime field "Available_From" and a datetime field "Available_To".
We also have a "Ranking" field for sorting.
I can search correctly with direct queries (e.g. give me all the items that are available at the moment), but when I do a regular search I cannot find a way to show the items that are "available NOW" at the top of the results, which are usually sorted by the "Ranking" field.
How can I do this? Am I forced to write some Java classes (the nearest thing I've found is at https://medium.com/#devchaitu18/sorting-based-on-a-custom-function-in-solr-c94ddae99a12), or is there a way to do it with standard Solr queries?
Thanks in advance to everyone!
In your case you actually don't want sorting, since that indicates that you want one field to determine the returned sequence of documents.
Instead, use boosting - apply a very large boost to those that are available now, either through bq or boost, then apply a boost based on ranking. You'll have to tweak the weights given to each part based on how you want the search results to be presented.
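A sketch of that boosting setup, assuming the edismax parser: a bq clause gives a large boost to documents whose availability window contains NOW, and bf adds the value of the Ranking field to the score. The field names are the ones from the question; the weight of 1000 and the helper name are placeholders that would need tuning.

```python
from urllib.parse import urlencode


def availability_boost_params(user_query: str, now_boost: int = 1000) -> dict:
    # Matches documents that are available at query time.
    available_now = "(Available_From:[* TO NOW] AND Available_To:[NOW TO *])"
    return {
        "defType": "edismax",
        "q": user_query,
        # Large additive boost for items that are available now (weight is a guess).
        "bq": f"{available_now}^{now_boost}",
        # Additive boost from the numeric Ranking field.
        "bf": "field(Ranking)",
    }


if __name__ == "__main__":
    print(urlencode(availability_boost_params("beach house")))
```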

Elasticsearch multi term search

I am using Elasticsearch to allow a user to type in a term to search. I have the following property 'name' I'd like to search, for instance:
'name': 'The car is black'
I'd like this document to be returned when the search is either black car or car black.
I've tried a bool must with multiple terms ['black', 'car'], but it seems to work only if the entire string is a match.
So what I'd really like is to check whether the field contains both words, in any order.
Can someone please get me on the right track? I've been banging my head on this one for a while.
If it seems to work only when the entire string matches, first make sure that the string property name is analysed in your index mapping, i.e. that the mapping for this property doesn't contain "index": "not_analyzed". If the property is not analysed, you'll need to change the mapping and reindex in order to search for individual tokens rather than only the whole phrase.
Once you're sure your strings are analysed, you can use one of:
A terms query with the "minimum_should_match" parameter equal to the number of words entered.
A bool query with a must clause containing one term query per word.
A common terms query, which has a nice clean syntax for this purpose (you don't need to break down the input string and build a more complex query structure in your app, as with the previous two) and also takes a smarter approach to stopwords.
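The second option (a bool query with one term query per word) can be sketched as a request body like this. The helper name is made up, and lowercasing the input assumes the field is indexed with an analyzer that lowercases tokens, as the standard analyzer does.

```python
import json


def all_words_query(field: str, text: str) -> dict:
    """Bool query that requires every word of `text` to occur in `field`, in any order."""
    # Term queries are not analysed, so lowercase here to match lowercased index tokens.
    words = text.lower().split()
    return {"query": {"bool": {"must": [{"term": {field: w}} for w in words]}}}


if __name__ == "__main__":
    print(json.dumps(all_words_query("name", "black car"), indent=2))
```

Both "black car" and "car black" produce two must clauses, so the document 'The car is black' would match either input.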

Give advantage to search by phrase in sort SOLR

Search query which I send to SOLR is:
?q=iphone 4s&sort=sold desc
By default the search works great, but a problem appears when I sort the results by a field, e.g. sold (the number of products sold).
Solr finds all results that match (iphone 4s) or (iphone) or (4s).
So when I sort by the 'sold' field, the first result is "iPhone 3GS...", which is a problem.
I need the results that match the phrase ("iphone 4s") first and then the rest of the results, all sorted by sold.
So, the questions are:
Is it possible to have a query like this, and how?
q=iphone 4s&sort={some algorithm for phrase results first} desc, sold desc
Or can I achieve this by setting up the query analyzer, and how?
At the moment I solve this by sending two requests to Solr: the first with the phrase "iphone 4s" and, if that returns 0 results, a second request without the phrase, i.e. just iphone 4s.
If sorting by score, id, or field is not sufficient, Lucene lets you implement a custom sorting mechanism by providing your own subclass of the FieldComparatorSource abstract base class.
Within that custom sort logic, you can implement whatever ordering your requirements call for.
Example Java code:
if (modelNum1.equals(modelNum2)) {
    // Same model: return a result based on the number of units sold.
} else {
    // ALWAYS return a value such that the preferred model beats the others.
}
DISCLAIMER: This may lead to maintenance problems, as you will have to change the logic whenever a new phone model arrives.
Steps:
1) A SortField (which you pass to the Sort object) accepts a FieldComparatorSource instance when it is constructed.
2) Extend FieldComparatorSource.
3) Load the field information that takes part in the sorting using FieldCache, in your FieldComparator's setNextReader().
4) Override FieldComparatorSource.newComparator() to return your custom FieldComparator.
5) In the method FieldComparator.compare(slot1DocId, slot2DocId), you can include your custom logic by accessing the corresponding field information, via the loaded FieldCache, using the docIds passed in.
Incorporating Lucene code into Solr as a plug-in should not trouble you.
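To make the intended ordering concrete, here is a small sketch of the comparison rule outside Lucene. It does not use the FieldComparatorSource API; it only shows the ordering such a comparator would need to produce. The preferred model, the record layout and the sample data are all invented for illustration, and the rule is slightly generalised: documents are first split into preferred and non-preferred, then ordered by units sold within each group.

```python
from functools import cmp_to_key

PREFERRED_MODEL = "iphone 4s"  # hypothetical: the model the user searched for


def compare(a: dict, b: dict) -> int:
    """Negative if a should come first, positive if b should come first."""
    a_pref = a["model"] == PREFERRED_MODEL
    b_pref = b["model"] == PREFERRED_MODEL
    if a_pref != b_pref:
        # The preferred model always beats the others.
        return -1 if a_pref else 1
    # Same group: the document with more units sold comes first.
    return b["sold"] - a["sold"]


products = [
    {"model": "iphone 3gs", "sold": 900},
    {"model": "iphone 4s", "sold": 100},
    {"model": "iphone 4s", "sold": 500},
    {"model": "iphone 4", "sold": 50},
]
ranked = sorted(products, key=cmp_to_key(compare))

if __name__ == "__main__":
    for p in ranked:
        print(p["model"], p["sold"])
```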
EDIT:
This does not work for a term that contains a space; termfreq only accepts a single term without spaces.
As of Solr 3.1, sorting can also be done on arbitrary function queries (as in FunctionQuery) that produce a single value per document.
So I will use the termfreq function in the sort. termfreq(field,term) returns the number of times the term appears in the field for that document.
The search query will be:
q=iphone 4s&sort=termfreq(product_name,"iphone 4s") desc, sold desc
Note: the termfreq function is available from Solr 4.0 onward.
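Since termfreq is limited to a single term, one possible workaround is to sort on whether a phrase sub-query matches, using Solr's query(), exists() and if() functions instead. This is not part of the original post and is only a sketch: the pq parameter name and the helper are made up, the field names come from the question, quotes inside the user input are not escaped, and the result has not been verified against a live Solr instance.

```python
from urllib.parse import urlencode


def phrase_first_params(user_query: str,
                        field: str = "product_name",
                        sold_field: str = "sold") -> dict:
    return {
        "q": user_query,
        # Phrase sub-query referenced from the sort function via $pq (hypothetical name).
        "pq": f'{field}:"{user_query}"',
        # 1 for documents matching the phrase, 0 otherwise; ties are broken by units sold.
        "sort": f"if(exists(query($pq)),1,0) desc, {sold_field} desc",
    }


if __name__ == "__main__":
    print(urlencode(phrase_first_params("iphone 4s")))
```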

Sorting Solr multivalue fields based on field values

I have multiple Solr instances with separate schemas.
I need to receive multivalue field in sorted order, e.g. by type: train_station, airport, city_district, and so on:
q=köln&sort=query({!v="type:(airport OR train_station)"}) desc
I would like to see airport type document before train_station type. For now I am always getting train_station type at the top.
How should I write the query?
You are getting train_stations at the top because of the IDF.
A quick hack to fix it would be to use a range query (which has the advantage of having constant scores) and query boosts: q=köln&sort=query({!v="type:([airport TO airport]^3 OR [train_station TO train_station]^2)"}) desc.
This way, documents that have airport in their type field will have a score of 3, documents that have train_station in their type field will have a score of 2, and documents that have both airport and train_station in their type field will have a score of 2+3=5 (up to a multiplicative constant).
A more elegant (and effective) way of doing this would be to write a custom query parser (or even a function query).
You can sort on a function only if it returns a single value per document. You definitely can't sort on a multiValued field or any field that is tokenized. Seems like you would need a function that returns "airport" if the field contains "airport" (even if it contains "train station") and "train station" if it contains "train station" but not "airport", and then sort on that.
Another option would be to handle this at index time. Add a field called "airport_train_station_sort" that returns 1 if the field contains "airport", 2 if the field contains "train station" but NOT airport, and 3 if it contains neither. Then simply sort on that field.
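The index-time option could be sketched like this: before sending a document to Solr, compute the value of the extra sort field from its multivalued type field. The function name and the document layout are illustrative; the field name airport_train_station_sort and the values 1, 2 and 3 follow the description above.

```python
def airport_train_station_sort(types: list) -> int:
    """Sort key: 1 if the document is an airport, 2 if it is a train station
    but not an airport, 3 if it is neither."""
    if "airport" in types:
        return 1
    if "train_station" in types:
        return 2
    return 3


def with_sort_field(doc: dict) -> dict:
    """Return a copy of the document with the single-valued sort field added."""
    enriched = dict(doc)
    enriched["airport_train_station_sort"] = airport_train_station_sort(doc.get("type", []))
    return enriched


if __name__ == "__main__":
    print(with_sort_field({"name": "Köln Bonn", "type": ["train_station", "airport"]}))
```

Sorting ascending on airport_train_station_sort then puts airports first, train stations second and everything else last.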
You cannot solve this problem inside SOLR. Check the documentation, SOLR does not sort multivalued fields. Older versions of SOLR let you try, but the results were undefined and unpredictable.
You either change your schema and put this sort data into single value indexed fields, or you need to make several queries, first for airports, then city districts, then train stations.
To order items within the field itself you have to either index it in order you want, or do post processing. Solr's sort will sort only docs!

Custom sorting for lucene

I have documents with fields like (title, content, datetime).
I want to sort the results using the following formula:
1) title boost 2.5
2) content boost 1.5
3) IMPORTANT: boost newer documents (those whose datetime field is close to today's date) by 3
How can I write a query that takes the above criteria into account?
What should I do for #3?
Any help would be greatly appreciated.
+title:foo^2.5 +content:bar^1.5 datetime:20100721^3
Obviously, fill in appropriate values for the datetime field. The key here is that the datetime term is not a required term; it only serves to increase the score of documents that match it. You can add another datetime term for yesterday's date, another for the day before, and so on, decreasing the boost as you get farther away from today's date.
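Generating those optional date clauses by hand gets tedious, so here is a small sketch that builds them. It assumes the datetime field is indexed as a yyyyMMdd term, as in the query above; the decay rule (top boost divided by age in days plus one) and the function names are arbitrary choices for illustration.

```python
from datetime import date, timedelta


def recency_clauses(today: date, days: int = 3, top_boost: float = 3.0) -> str:
    """Optional datetime clauses for the last `days` days, with a decreasing boost."""
    clauses = []
    for age in range(days):
        day = today - timedelta(days=age)
        boost = top_boost / (age + 1)  # arbitrary decay: 3, 1.5, 1, ...
        clauses.append(f"datetime:{day:%Y%m%d}^{boost:g}")
    return " ".join(clauses)


def build_query(title_term: str, content_term: str, today: date) -> str:
    # Required title and content terms, followed by the optional recency clauses.
    return f"+title:{title_term}^2.5 +content:{content_term}^1.5 " + recency_clauses(today)


if __name__ == "__main__":
    print(build_query("foo", "bar", date(2010, 7, 21)))
```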
You can use a function query to boost the score of documents that match each of the text fields, i.e. Title and Content, according to how recent they are, multiplying the recency boost by the weightings given above.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
{!boost b=product(recip(ms(NOW,datetime),3.16e-11,1,1),2.5)}Title:<query>
{!boost b=product(recip(ms(NOW,datetime),3.16e-11,1,1),1.5)}Content:<query>
You can't use a sort here, because the ordering of the secondary and tertiary sorts will be meaningless unless the precision of your dates is sufficiently low.
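To see what the recip(ms(NOW,datetime),3.16e-11,1,1) expression in the boost above evaluates to, the arithmetic can be reproduced directly. Solr defines recip(x,m,a,b) as a/(m*x+b), and 3.16e-11 is roughly one divided by the number of milliseconds in a year, so the recency factor is 1 for a brand-new document and about 0.5 for a one-year-old one. The helper names below are illustrative.

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # about 3.156e10 milliseconds


def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms: float, weight: float = 1.0) -> float:
    """Value of product(recip(ms(NOW,datetime),3.16e-11,1,1), weight) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1) * weight


if __name__ == "__main__":
    for years in (0, 1, 2, 5):
        print(years, round(recency_boost(years * MS_PER_YEAR, weight=2.5), 3))
```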
If you are looking for custom sorting based on your own definition, you can look at the example below. It will only help you define a sort on an individual field, but you can later add multiple sorts to your query.
I'm not entirely sure whether it helps.
https://github.com/smadha/lucene-sorting-example/blob/master/CustomSorter.java
