Weird results using Search After () elastic search - elasticsearch

I am having issues with the Search After API in Elasticsearch.
Please see this link, where I posted the full description of the problem:
https://discuss.elastic.co/t/weird-results-using-search-after-elastic-search/116609?u=ayshwarya_sree

As per the documentation for searchAfter:
A field with one unique value per document should be used as the
tiebreaker of the sort specification. Otherwise the sort order for
documents that have the same sort values would be undefined. The
recommended way is to use the field _id which is certain to contain
one unique value for each document.
Since you are passing only gender as the sort criterion, your second request asks for results that come after Female in the sort order, which means only documents with gender Male. Any Female documents that did not fit on the first page are skipped.
Try adding _id to the sort, and pass its value in the search_after parameter too.
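A minimal sketch of that suggestion, building the request bodies as plain Python dicts. The match_all query, the page size, and the helper names are assumptions for illustration, since the original query is only shown at the linked post. Some Elasticsearch versions discourage sorting on _id directly, so check the documentation for your version.

```python
import copy


def first_page_body(size=10):
    """Request body for the first page: no search_after yet."""
    return {
        "size": size,
        "query": {"match_all": {}},  # assumption: the real query is not shown here
        "sort": [
            {"gender": "asc"},  # primary sort key from the question
            {"_id": "asc"},     # unique tiebreaker, as the answer suggests
        ],
    }


def next_page_body(previous_body, last_hit):
    """Copy the previous body and resume after the last hit of that page.

    search_after must carry one value per sort clause, in the same order.
    Each hit returns exactly that list in its "sort" array.
    """
    body = copy.deepcopy(previous_body)
    body["search_after"] = last_hit["sort"]
    return body


# Hypothetical last hit of page one, shaped like an Elasticsearch hit.
last_hit = {"_id": "42", "_source": {"gender": "Female"}, "sort": ["Female", "42"]}
page_two = next_page_body(first_page_body(size=2), last_hit)
print(page_two["search_after"])  # ['Female', '42']
```

With both values in search_after, the next page resumes inside the Female group instead of jumping straight to Male.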

Related

elasticsearch similarity mechanism in array field

My use case: I have a field called subjects in an Elasticsearch index, which is a list holding multiple values. For example, one doc has ['subject one', 'subject two', 'subject three'] in the subjects field, and another doc has ['one test', 'one example', 'two'] in the same field. When I search for subject one in that field, I should get the first document first since it is the most relevant, but I get the second doc first, even though I am sorting the results by _score.
Basically, when the user searches for multiple terms and all of them are present in one document's field, that document should be listed first. This works fine for text fields, but not for array fields. My list field has more data than in this example.
Is there any way to achieve this using an ES similarity mechanism such as BM25?
Thank you

Solr boost query sort by whether result is boosted then by another field

I'm using Solr to run a query on one of our cores. Suppose my documents have two fields: ID, and Name. I also have a separate list of IDs I'm grabbing from a database and passing into the query to boost certain results.
If the document gets returned in the query and the ID is in the list it goes to the top of the results, and if it gets returned in the query and the ID is not in the list then it goes below those that are in the list. The former is from the "boost". My query is something like this -
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&bq=Id%3a(36+OR+76+OR+90+OR+224+OR+391)
I am able to get the boost query working, but I need the boosted results to be in alphabetical order by name, then the non-boosted results under them, also in alphabetical order by name. I need to know what to use for the &sort= parameter.
&sort=score%20desc,Name+asc does not work.
I've looked over a lot of documentation, but I still don't know if this is even possible. Any help is appreciated. Thanks!
Solr version is 6.0.1. I am actually using SolrNet to interface with Solr, but I think I can figure out the SolrNet part if I know what the url's &sort= parameter value needs to be.
I figured it out by doing away with the boost query. I added a sort clause using the "exists" function and passed it a sub-query for the ID. The exists function returns a boolean value to sort on, and I added the name as a second sort. It works perfectly!
The URL looks like this:
http://mysolrserver:8983/solr/MyCore/MyQueryHandler?q=Smith&start=0&rows=25&sort=exists(query({!v=%27Id:(36+OR+76+OR+90+OR+224+OR+391)%27}))%20DESC,%20Name%20ASC
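The same URL can be assembled programmatically, which avoids hand-escaping the local-params syntax. This is a sketch using only Python's standard library. The helper names are hypothetical, and the host, core, handler, and field names are taken from the question.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def boosted_first_sort(ids, id_field="Id", name_field="Name"):
    """Sort clause: documents whose ID is in `ids` first, then by name."""
    id_clause = " OR ".join(str(i) for i in ids)
    # exists(query(...)) is true for documents matching the sub-query,
    # so sorting on it descending puts the listed IDs on top.
    return f"exists(query({{!v='{id_field}:({id_clause})'}})) desc, {name_field} asc"


def build_url(base, q, ids, start=0, rows=25):
    """Full request URL with every parameter percent-encoded."""
    params = {"q": q, "start": start, "rows": rows, "sort": boosted_first_sort(ids)}
    return f"{base}?{urlencode(params)}"


url = build_url(
    "http://mysolrserver:8983/solr/MyCore/MyQueryHandler",
    "Smith",
    [36, 76, 90, 224, 391],
)
# Decode the sort parameter again to check what Solr would receive.
print(parse_qs(urlsplit(url).query)["sort"][0])
```

Because urlencode escapes the braces, quotes, and spaces, the list of IDs can come straight from the database without manual escaping.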
The closest match to your requirement is the query elevation component [1].
In your particular case, I would first sort the IDs according to my requirements (sorting them by name, for example), then maintain them in elevate.xml.
At query time you can use the "forceElevation" parameter to force the elevation and then sort the remaining results by name.
[1] https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component

Lucene: Filter query by doc ID

I want the search response to contain only documents with specified doc IDs. On Stack Overflow I found this question (Lucene filter with docIds), but as far as I understand, it creates an additional field in the document and then searches by that field. Is there another way to do this?
Lucene's docids are intended only to be internal keys. You should not be using them as search keys, or storing them for later use. Those ids are subject to change without warning. They will be changed when updating or reindexing documents, and can change at other times, such as segment merges, as well.
If you want your documents to have a unique identifier, you should generate that key separate from the docId, and index it as a field in your document.

Solr: Excluding certain documents from getting sorted

I have a Solr query where I am trying to sort the results based on a certain field.
I want to modify it in such a way that only a particular set of documents get sorted and the remaining are simply appended to the end of the sorted list.
Is there a way to achieve this?
Please help.
Regards.
If you want to sort by a field condition that is dynamic, you can boost the documents matching that condition and sort by score.
For example: bq=some_field:some_value^10
This boosts the scores of only the documents matching the criteria.
The scores of all other documents are unchanged, so they follow the boosted documents in their existing order.
EDIT:
You can boost on multiple fields, e.g. bq=string_array_field:some_value^10&bq=ranking^10 would boost the documents that match the value and have a higher ranking to the top.
The rest of the documents would follow.
For each <fieldType> definition in your schema.xml you can set a sortMissingLast="true" option that would give you the desired sorting behavior. For your specific example, I would recommend creating a new field with sortMissingLast="true" set, then populating this additional field based on your criteria and leaving it empty for the documents you want to appear at the end when sorted.
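To make the intended result concrete, here is a small client-side illustration of the ordering that the extra field with sortMissingLast="true" is meant to produce. It does not call Solr, and the field name sort_key and the sample documents are hypothetical.

```python
def sort_missing_last(docs, field, reverse=False):
    """Sort docs that have `field`; append docs without it in their original order."""
    present = [d for d in docs if d.get(field) is not None]
    missing = [d for d in docs if d.get(field) is None]
    return sorted(present, key=lambda d: d[field], reverse=reverse) + missing


docs = [
    {"id": 1, "sort_key": 3},
    {"id": 2},                 # no value: should land at the end
    {"id": 3, "sort_key": 1},
    {"id": 4},                 # no value: should land at the end
]
print([d["id"] for d in sort_missing_last(docs, "sort_key")])  # [3, 1, 2, 4]
```

Only the documents that were given a value take part in the sort. The rest keep their relative order after them, whichever direction the sort runs.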

Sorting Solr multivalue fields based on field values

I have multiple Solr instances with separate schemas.
I need to receive documents sorted by the values of a multivalued field, e.g. by type: train_station, airport, city_district, and so on:
q=köln&sort=query({!v="type:(airport OR train_station)"}) desc
I would like to see airport type document before train_station type. For now I am always getting train_station type at the top.
How should I write the query?
You are getting train_stations at the top because of the IDF.
A quick hack to fix it would be to use a range query (which has the advantage of having constant scores) and query boosts: q=köln&sort=query({!v="type:([airport TO airport]^3 OR [train_station TO train_station]^2)"}) desc.
This way, documents which have airport in their type field will have a score of 3, documents which have train_station in their type field will have a score of 2, and documents which have both airport and train_station in their type field will have a score of 2+3=5 (up to a multiplicative constant).
A more elegant (and effective) way of doing this would be to write a custom query parser (or even a function query).
You can sort on a function only if it returns a single value per document. You definitely can't sort on a multiValued field or any field that is tokenized. Seems like you would need a function that returns "airport" if the field contains "airport" (even if it contains "train station") and "train station" if it contains "train station" but not "airport", and then sort on that.
Another option would be to handle this at index time. Add a field called "airport_train_station_sort" that returns 1 if the field contains "airport", 2 if the field contains "train station" but NOT airport, and 3 if it contains neither. Then simply sort on that field.
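The index-time option can be sketched as a small function that derives the single-valued sort key before the document is sent to Solr. The field name airport_train_station_sort comes from the answer above. The sample documents are hypothetical, and the type values use the underscore spelling (train_station) from the question.

```python
def airport_train_station_sort(types):
    """Single-valued sort key derived from a multivalued type field.

    1: contains "airport" (even if it also contains "train_station")
    2: contains "train_station" but not "airport"
    3: contains neither
    """
    if "airport" in types:
        return 1
    if "train_station" in types:
        return 2
    return 3


def with_sort_key(doc):
    """Return a copy of the document with the sort field added."""
    enriched = dict(doc)
    enriched["airport_train_station_sort"] = airport_train_station_sort(doc.get("type", []))
    return enriched


docs = [
    {"id": "a", "type": ["train_station"]},
    {"id": "b", "type": ["city_district"]},
    {"id": "c", "type": ["train_station", "airport"]},
]
indexed = [with_sort_key(d) for d in docs]
ordered = sorted(indexed, key=lambda d: d["airport_train_station_sort"])
print([d["id"] for d in ordered])  # ['c', 'a', 'b']
```

At query time the request would then sort ascending on airport_train_station_sort, which is a plain single-valued field and so is safe to sort on.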
You cannot solve this problem inside Solr. Check the documentation: Solr does not sort on multivalued fields. Older versions of Solr let you try, but the results were undefined and unpredictable.
You either change your schema and put this sort data into single value indexed fields, or you need to make several queries, first for airports, then city districts, then train stations.
To order the items within the field itself, you have to either index them in the order you want or do post-processing. Solr's sort only sorts documents!

Resources