Custom sorting for Lucene

I have documents with fields such as title, content, and datetime.
I want to rank the results using the following weights:
1) title: boost 2.5
2) content: boost 1.5
3) IMPORTANT: boost newer documents (those whose datetime field is close to today's date) by 3
How can I write a query that takes these criteria into account?
In particular, what should I do for #3?
Any help would be greatly appreciated.

+title:foo^2.5 +content:bar^1.5 datetime:20100721^3
Obviously, fill in appropriate values for the datetime field. The key here is that the datetime term is not a required term; it only functions to increase the score for documents that match the term. You can add another datetime term for yesterday's date, and another for the day before, and so on, while decreasing the boost as you get farther away from today's date.
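As a rough sketch of the "one optional term per day, with a shrinking boost" idea, the query string could be generated like this. The helper names, the 7-day window, and the linear decay are assumptions made for illustration; the datetime field is assumed to hold yyyyMMdd values, as in the query above.

```python
from datetime import date, timedelta


def recency_clauses(today, days=7, top_boost=3.0, field="datetime"):
    """Build optional (non-required) date terms whose boost shrinks with age."""
    clauses = []
    for age in range(days):
        day = today - timedelta(days=age)
        # Linear decay from top_boost towards 0 (an assumed schedule;
        # any decreasing sequence of boosts would follow the same idea).
        boost = round(top_boost * (days - age) / days, 2)
        clauses.append(f"{field}:{day:%Y%m%d}^{boost}")
    return " ".join(clauses)


def build_query(title_term, content_term, today):
    """Required title/content terms plus optional recency terms."""
    required = f"+title:{title_term}^2.5 +content:{content_term}^1.5"
    return required + " " + recency_clauses(today)


if __name__ == "__main__":
    print(build_query("foo", "bar", date(2010, 7, 21)))
```

Because the date terms carry no + prefix, documents older than the window still match; they simply receive no recency bonus.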

You can use a function query to boost the score of documents matching each of the text fields, i.e. Title and Content, according to their date. The recency boost is then multiplied by the weightings you gave above.
http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents
{!boost b=product(recip(ms(NOW,datetime),3.16e-11,1,1),2.5)}Title:<query>
{!boost b=product(recip(ms(NOW,datetime),3.16e-11,1,1),1.5)}Content:<query>
You can't use a sort, as the ordering of the secondary and tertiary sorts will be meaningless unless, of course, the precision of your dates is sufficiently low.
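To get a feel for what the boost expressions above compute: Solr's recip(x,m,a,b) evaluates to a/(m*x+b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. The following sketch reproduces the arithmetic outside Solr; the function names are illustrative only.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_days, weight):
    """Mirror of product(recip(ms(NOW,datetime),3.16e-11,1,1),weight)."""
    age_ms = age_days * MS_PER_DAY
    return weight * recip(age_ms, 3.16e-11, 1, 1)


if __name__ == "__main__":
    for age in (0, 30, 365, 730):
        print(age, round(recency_boost(age, 2.5), 3), round(recency_boost(age, 1.5), 3))
```

A document dated today gets the full weight, a one-year-old document about half of it, and a two-year-old document about a third.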

If you are looking for custom sorting based on your own definition, you can look at the example below. Note that it only helps you define a sort on an individual field; you can then add multiple sorts to your query.
Not entirely sure if that helps
https://github.com/smadha/lucene-sorting-example/blob/master/CustomSorter.java

Related

How to disable data aggregation in AWS QuickSight?

I have simple analytics data to display in AWS QuickSight: some date fields and amounts. By default, QuickSight aggregates all date fields, and the lowest granularity is aggregation by minute. I need to display all the data without any aggregation at all. I have searched but have not found a way to disable aggregation entirely. Any ideas?
First contribution to Stack Overflow, feels good man.
I ran into the same problem and the solution is to convert the field selected for the y-axis to a dimension rather than a measure.
Converting fields from measure to dimension
Aggregations are automatically applied to 'measure' fields, so those fields need to be converted to 'dimension' fields.

ElasticSearch and Searching in Arrays

We have an ES index with a field that stores its data as an array. In this field, we include the original text, plus the text without any punctuation, special characters, etc. The problem is that, when searching on the field, the multiple values appear to be skewing the score.
For example, if we search on the term 'up', the document which has the array ['up, up and away', 'up up and away'] scores higher with a multi_match (which we use because we may search more than one field) than the document whose array is simply ['up'].
In the end, I guess what I am looking for is a score that emulates calculating a score for each item in the array and returning me the highest. I believe in this case, comparing 'up' to 'Up' and 'Up, Up and Away' will give me a higher score for 'Up'.
With my research, I believe I may need to do custom scoring on this field...? If that is true, am I looking at "score_mode": "max" as what I want?
I think you slightly over-engineered your index. You don't need to create duplicate fields for the same information, or to remove punctuation and lowercase the text yourself.
I'd recommend reading up on Elasticsearch token filters and on how to create multiple analyzers for the same field.
For your exact use case, a document sample would certainly help. In any case, looking at what you are dealing with: index your array of strings with the default analyzer and with a custom one that you build yourself. Then you can use the same field, but with different analyzers (differently processed text), to control your score.
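A minimal sketch of what "the same field with two analyzers" could look like as an index definition, written here as a Python dict that serializes to the JSON request body. The field name title, the sub-field name clean, and the analyzer and char filter names are hypothetical; the typeless mappings layout assumes Elasticsearch 7 or later.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "char_filter": {
                # Drops everything that is not a letter, digit, or whitespace.
                "strip_punctuation": {
                    "type": "pattern_replace",
                    "pattern": "[^\\p{L}\\p{N}\\s]",
                    "replacement": "",
                }
            },
            "analyzer": {
                "no_punctuation": {
                    "type": "custom",
                    "char_filter": ["strip_punctuation"],
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            # The main field uses the default analyzer; the "clean"
            # sub-field indexes the same source text with the custom one.
            "title": {
                "type": "text",
                "fields": {
                    "clean": {"type": "text", "analyzer": "no_punctuation"}
                },
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(index_body, indent=2))
```

With a mapping along these lines, each string is stored once, and a multi_match can target title and title.clean with separate weights instead of relying on duplicated array entries.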

How to sort (and give weight) by Availability dates in SolR

I'm facing a big problem with my Solr DB.
My objects have a datetime field "Available_From" and a datetime field "Available_To".
We also have a "Ranking" field for sorting.
I can search correctly with direct queries (e.g. give me all the items that are available at the moment), but when I do a regular search I cannot find a way to show the items that are "available NOW" at the top of the results, which are usually sorted by the "Ranking" field.
How can I do this? Am I forced to write some Java classes (the nearest thing I've found is here: https://medium.com/#devchaitu18/sorting-based-on-a-custom-function-in-solr-c94ddae99a12), or is there a way to do it with standard Solr queries?
Thanks in advance to everyone!
In your case you actually don't want sorting, since that indicates that you want one field to determine the returned sequence of documents.
Instead, use boosting - apply a very large boost to those that are available now, either through bq or boost, then apply a boost based on ranking. You'll have to tweak the weights given to each part based on how you want the search results to be presented.
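One possible shape for such a request, assuming the edismax query parser, is sketched below. The weight of 1000 is an arbitrary starting point to be tuned, the function name is made up for illustration, and using Ranking directly as a multiplicative boost assumes it is a positive numeric field.

```python
from urllib.parse import urlencode


def build_params(user_query):
    """Request parameters that boost 'available now' items, then Ranking."""
    return {
        "q": user_query,
        "defType": "edismax",
        # bq adds to the score of documents whose availability
        # window contains the current time.
        "bq": "(+Available_From:[* TO NOW] +Available_To:[NOW TO *])^1000",
        # boost multiplies the score by the value of the Ranking field
        # (assumed positive; a value of 0 would zero out the score).
        "boost": "Ranking",
    }


if __name__ == "__main__":
    print(urlencode(build_params("hotel")))
```

Documents outside their availability window still match the query; they just rank lower, which is the difference from filtering with fq.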

Elasticsearch query on string representation of number

Good day:
I have an indexed field called amount, which is of string type. The value of amount can be either one or 1. Say, in this example, we have an indexed document with amount=1; if I search for one, Elasticsearch will not return the document unless I use 1 in the search query. Any thoughts on how I can get this to work? I'm thinking a tokenizer is what's needed.
Thanks.
You probably don't want this for sevenmillionfourhundredfifteenthousandtwohundredfourteen and the like, but only for a small number of values.
At index time I would convert everything to a proper number and store it in a numerical field, which then even allows sorting, if you need it. Apart from this, I would use synonyms at index and at query time to map everything to the digit strings, but in a general text field that is searched by default.
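The index-time conversion could be as simple as the following sketch, which would run in the application before documents (and user queries) reach Elasticsearch. The word list and the function name are assumptions; only a small, fixed vocabulary is covered, in line with the caveat above.

```python
# Small, fixed vocabulary; extend as needed.
NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def normalize_amount(raw):
    """Return an int for '1' or 'one'; None if the value is not recognized."""
    text = raw.strip().lower()
    if text in NUMBER_WORDS:
        return NUMBER_WORDS[text]
    try:
        return int(text)
    except ValueError:
        return None


if __name__ == "__main__":
    for value in ("one", "1", " Seven ", "lots"):
        print(repr(value), "->", normalize_amount(value))
```

Applying the same function to the search term means a query for one and a query for 1 both end up looking for the number 1 in the numeric field.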

Sorting Solr multivalue fields based on field values

I have multiple Solr instances with separate schemas.
I need to get a multivalued field in sorted order, e.g. by type: train_station, airport, city_district, and so on:
q=köln&sort=query({!v="type:(airport OR train_station)"}) desc
I would like to see airport-type documents before train_station-type documents. For now, I always get the train_station type at the top.
How should I write the query?
You are getting train_stations at the top because of the IDF.
A quick hack to fix it would be to use a range query (which has the advantage of having constant scores) and query boosts: q=köln&sort=query({!v="type:([airport TO airport]^3 OR [train_station TO train_station]^2)"}) desc.
This way, documents which have airport in their type field will have a score of 3, documents which have train_station in their type field will have a score of 2, and documents which have both airport and train_station in their type field will have a score of 2+3=5 (up to a multiplicative constant).
A more elegant (and effective) way of doing this would be to write a custom query parser (or even a function query).
You can sort on a function only if it returns a single value per document. You definitely can't sort on a multiValued field or any field that is tokenized. Seems like you would need a function that returns "airport" if the field contains "airport" (even if it contains "train station") and "train station" if it contains "train station" but not "airport", and then sort on that.
Another option would be to handle this at index time. Add a field called "airport_train_station_sort" that returns 1 if the field contains "airport", 2 if the field contains "train station" but NOT airport, and 3 if it contains neither. Then simply sort on that field.
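A sketch of how that index-time field could be computed before the document is sent to Solr. It assumes the type values are spelled as in the question's query (airport, train_station) and that the document is a plain dict; the enrich helper is hypothetical.

```python
def airport_train_station_sort(types):
    """1 if 'airport' is present, 2 if only 'train_station', otherwise 3."""
    if "airport" in types:
        return 1
    if "train_station" in types:
        return 2
    return 3


def enrich(doc):
    """Return a copy of the document with the single-valued sort field added."""
    enriched = dict(doc)
    enriched["airport_train_station_sort"] = airport_train_station_sort(doc.get("type", []))
    return enriched


if __name__ == "__main__":
    print(enrich({"id": "1", "type": ["train_station", "airport"]}))
    print(enrich({"id": "2", "type": ["city_district"]}))
```

The query can then use sort=airport_train_station_sort asc, since the new field holds exactly one value per document.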
You cannot solve this problem inside Solr. Check the documentation: Solr does not sort multivalued fields. Older versions of Solr let you try, but the results were undefined and unpredictable.
You either change your schema and put this sort data into single value indexed fields, or you need to make several queries, first for airports, then city districts, then train stations.
To order the items within the field itself, you have to either index them in the order you want or post-process the results. Solr's sort only sorts documents!
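If the goal is to order the values inside the field, the post-processing step can be done on the client after Solr returns the documents. The priority list below is an assumed example order, and the function name is made up for illustration.

```python
# Assumed priority order; adjust to the order you actually want.
TYPE_ORDER = ["airport", "train_station", "city_district"]


def order_types(values):
    """Sort field values by TYPE_ORDER; unknown values go last, alphabetically."""
    rank = {name: position for position, name in enumerate(TYPE_ORDER)}
    return sorted(values, key=lambda v: (rank.get(v, len(TYPE_ORDER)), v))


if __name__ == "__main__":
    doc = {"id": "1", "type": ["city_district", "train_station", "airport"]}
    doc["type"] = order_types(doc["type"])
    print(doc)
```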
