Elasticsearch: sort only the results by a date field

I have an Elasticsearch index from which I want to retrieve the 10 most relevant records (by score) and then sort only those 10 results by a date field. With the sort option below I don't get the results I want, and they are not sorted by date in ascending order either. Any suggestions?
"sort": [
{ "_score": { "order": "desc" }},
{ "startTime": { "order": "desc" }}
]
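With `_score` listed first, `startTime` only acts as a tie-breaker between hits that have exactly the same score. One possible approach, sketched here under assumptions, is to let Elasticsearch return the top 10 by `_score` alone and reorder just those hits on the client. The helper name `sort_hits_by_date` and the sample hits are hypothetical, and the sketch assumes `startTime` is present in `_source` as an ISO 8601 string.

```python
from datetime import datetime


def sort_hits_by_date(hits, field="startTime", descending=False):
    """Reorder already-fetched hits by a date field stored in _source."""
    return sorted(
        hits,
        key=lambda hit: datetime.fromisoformat(hit["_source"][field]),
        reverse=descending,
    )


# Request body: relevance only, top 10. No date sort on the server side.
search_body = {"size": 10, "sort": [{"_score": {"order": "desc"}}]}

# Hypothetical hits, standing in for response["hits"]["hits"].
sample_hits = [
    {"_score": 3.2, "_source": {"startTime": "2020-03-01T10:00:00"}},
    {"_score": 2.9, "_source": {"startTime": "2020-01-15T08:30:00"}},
    {"_score": 1.1, "_source": {"startTime": "2020-02-20T12:00:00"}},
]

for hit in sort_hits_by_date(sample_hits):
    print(hit["_source"]["startTime"], hit["_score"])
```

Passing `descending=True` gives newest-first order instead.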

Related

Elasticsearch pagination does not work (at least not in the way it would be naturally understood)

I have around 38,000 documents in my index. According to Elasticsearch, I can only query 10,000 at a time.
This query works:
GET /vendor/vendors/_search
{
"from": 0,
"size": 10000,
"_source": ["_id", "name", "vendor_type"],
"query": {
"match_all": {}
},
"sort": {
"weight": {
"order": "desc"
}
}
}
This query does not. How am I supposed to get the next 10,000 documents if I can't even get the next 10?
ERROR GIVEN
"type": "query_phase_execution_exception",
"reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10010]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
FOR THIS QUERY
GET /vendor/vendors/_search
{
"from": 10000,
"size": 10,
"_source": ["_id", "name", "vendor_type"],
"query": {
"match_all": {}
},
"sort": {
"weight": {
"order": "desc"
}
}
}
Using the scroll API I was able to do what I needed to do.
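Besides the scroll API, Elasticsearch (5.0 and later) also supports `search_after` for paging beyond the result window. A minimal sketch of how the body for the next page could be derived from the previous page's last hit. The helper `next_page_body` is hypothetical, and so is the `vendor_id` field: it stands in for any sortable field that is unique per document, which `search_after` needs as a tie-breaker.

```python
import copy


def next_page_body(body, last_hit):
    """Return a copy of `body` that continues after `last_hit`."""
    page = copy.deepcopy(body)
    page.pop("from", None)  # search_after is used instead of an offset
    page["search_after"] = last_hit["sort"]  # sort values of the last hit seen
    return page


first_page = {
    "size": 10000,
    "_source": ["name", "vendor_type"],
    "query": {"match_all": {}},
    "sort": [
        {"weight": {"order": "desc"}},
        {"vendor_id": {"order": "asc"}},  # hypothetical unique tie-breaker field
    ],
}

# Hypothetical last hit of the first page; Elasticsearch echoes the sort
# values of every hit in its "sort" array when a sort is specified.
last_hit = {"_id": "abc", "sort": [0.75, "v-10000"]}

print(next_page_body(first_page, last_hit))
```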

Elasticsearch include other fields in top level aggregation

My indexed documents are as follows:
{
"user": {
"email": "test#test.com",
"firstName": "test",
"lastName": "test"
},
...
"category": "test_category"
}
Currently I have an aggregation which counts documents by the user's email and then a sub aggregation to count categories for each user:
"aggs": {
"users": {
"terms": {
"field": "user.email",
"order": {
"_count": "desc"
}
},
"aggs": {
"categories": {
"terms": {
"field": "category",
"order": {
"_count": "desc"
}
}
}
}
}
}
I am trying to include the user's first and last name to the buckets generated by the top aggregation, while still getting the same results from the categories sub aggregation. I've tried including the top_hits aggregation, but I didn't have any luck getting the results I want.
Any advice? Thanks!
EDIT:
Let me rephrase. I actually did get the desired result in terms of user data with the top_hits aggregation, I just don't know how to properly include it in my original aggregation so that the categories sub aggregation still gives me the same result. I tried the following top_hits aggregation:
"aggs": {
"user": {
"top_hits": {
"size": 1,
"_source": {
"include": ["user"]
}
}
}
}
I want to have the user data in the top level agg buckets and then still have the aggregation by category below that.
If I understand correctly, each user email maps to exactly one first name and last name.
So you could retrieve them with a custom script over these fields, then split each bucket key on the client side using "_" or whatever separator you choose:
"aggs": {
"users": {
"terms": {
"script": "doc['user.email'].value + '_' + doc['user.firstName'].value + '_' + doc['user.lastName'].value"
}
}
}
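As an alternative to concatenating fields in a script, the `top_hits` aggregation from the question can sit next to `categories` as a sibling sub-aggregation of `users`. Sibling sub-aggregations are computed independently for each bucket, so the category counts should not change. A sketch in Python that builds such a body: the aggregation name `user_info` and the helper `user_from_bucket` are hypothetical, and `includes` is the spelling used by recent versions (older ones accepted `include`, as in the question).

```python
import json

aggs_body = {
    "size": 0,  # assumption: only the aggregations are needed, not the hits
    "aggs": {
        "users": {
            "terms": {"field": "user.email", "order": {"_count": "desc"}},
            "aggs": {
                # Unchanged category counts per user.
                "categories": {
                    "terms": {"field": "category", "order": {"_count": "desc"}}
                },
                # One representative document per user, carrying the names.
                "user_info": {
                    "top_hits": {"size": 1, "_source": {"includes": ["user"]}}
                },
            },
        }
    },
}


def user_from_bucket(bucket):
    """Pull the user object out of one bucket of the `users` aggregation."""
    return bucket["user_info"]["hits"]["hits"][0]["_source"]["user"]


print(json.dumps(aggs_body, indent=2))
```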

How to order on doc count for terms aggregation within a composite aggregation?

I was trying the composite aggregation in Elasticsearch and found it odd that something I can normally do within a terms aggregation isn't supported for terms within a composite aggregation.
See the query below :
GET _search
{
"size": 0,
"query": {
"match_all": {}
},"aggs": {
"compo": {
"composite": {
"sources": [
{
"terms_inside": {
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // not supported here!
}
}
}
}
]
}
},
"just_terms" :{
"terms": {
"field": "result_type",
"order": {
"_count": "asc" // supported here
}
}
}
}
}
Is this just the way it is, or is there a way to get buckets sorted by doc count with a nested terms aggregation? I want to use paging and sorting on the terms aggregation.
It cannot be done. The composite aggregation paginates its buckets, so by design it never computes the counts for all buckets at once, only for those in the current page, and ordering by doc count would need the full set.
https://discuss.elastic.co/t/composite-aggregation-order-by/139563/5
You cannot aggregate on multiple terms and order by doc_count before Elasticsearch 7.12. From Elasticsearch 7.12 on, you can use a multi terms aggregation.
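If the number of distinct buckets is small enough to hold in memory, one workaround is to page through the whole composite aggregation using the `after_key` it returns and sort the collected buckets on the client. A rough sketch of that loop: `collect_composite_buckets`, `fetch_page`, and `fake_fetch` are hypothetical names, and the fake responses only imitate the shape of a real composite result.

```python
def collect_composite_buckets(fetch_page, agg_name="compo"):
    """Page through a composite aggregation and return all of its buckets.

    `fetch_page(after_key)` is a hypothetical callable that runs the search
    with "after": after_key in the composite body (omitted when None) and
    returns the "aggregations" part of the response.
    """
    buckets, after_key = [], None
    while True:
        agg = fetch_page(after_key)[agg_name]
        if not agg["buckets"]:
            break  # an empty page means everything has been read
        buckets.extend(agg["buckets"])
        after_key = agg.get("after_key")
        if after_key is None:
            break
    return buckets


def fake_fetch(after_key):
    """Stand-in for Elasticsearch: two pages of buckets, then an empty page."""
    if after_key is None:
        return {"compo": {
            "after_key": {"terms_inside": "error"},
            "buckets": [
                {"key": {"terms_inside": "click"}, "doc_count": 7},
                {"key": {"terms_inside": "error"}, "doc_count": 2},
            ],
        }}
    if after_key == {"terms_inside": "error"}:
        return {"compo": {
            "after_key": {"terms_inside": "view"},
            "buckets": [{"key": {"terms_inside": "view"}, "doc_count": 5}],
        }}
    return {"compo": {"buckets": []}}


all_buckets = collect_composite_buckets(fake_fetch)
by_count = sorted(all_buckets, key=lambda b: b["doc_count"])  # ascending count
print([(b["key"]["terms_inside"], b["doc_count"]) for b in by_count])
```

Paging over the sorted list then happens on the client, which is only practical when the full bucket set is modest in size.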

Elasticsearch: How to sort a result of a Query String query?

I'm using Elasticsearch to search Japanese documents, and combining multiple fields (Morphological analysis + N-Gram) for querying.
The query below:
{
"query":{
"bool":{
"must":{
"query_string":{
"query":"QUERY_KEYWORD",
"fields":[
"title",
"description",
"content",
"content.ngram^3"
]
}
},
"should":[
[
{
"range":{
"created":{
"boost":"5",
"gte":"now-1M"
}
}
}
]
]
}
},
"sort":{
"view_count": {"order": "desc"}
}
}
returns every document in the index sorted by view_count (ignoring the score from query_string), which isn't what I want.
Multilevel sorting doesn't work well either:
"sort": [
{ "view_count": { "order": "desc" }},
{ "_score": { "order": "desc" }}
]
returns the same as the query above, and
"sort": [
{ "_score": { "order": "desc" }},
{ "view_count": { "order": "desc" }}
]
doesn't sort anything at all (I suspect view_count only works as a tie-breaker for _score).
The question is: How can I get documents matching QUERY_KEYWORD sorted by view_count?
Thanks in advance.
An explicit sort overrides sorting by score. In your case I think you should look at the function score query, specifically the field value factor function.
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/query-dsl-function-score-query.html#function-field-value-factor
With this, the number of views influences the score.
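A sketch of what such a request body might look like, built in Python. The helper name `popularity_query` is hypothetical, and the `log1p` modifier and `missing` value are illustrative assumptions rather than recommended settings; they would need tuning against real data.

```python
import json


def popularity_query(keyword):
    """Wrap the query_string query so view_count scales the relevance score."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "query_string": {
                        "query": keyword,
                        "fields": [
                            "title",
                            "description",
                            "content",
                            "content.ngram^3",
                        ],
                    }
                },
                "field_value_factor": {
                    "field": "view_count",
                    "modifier": "log1p",  # assumption: dampen very large counts
                    "missing": 1,  # assumption: value used when the field is absent
                },
                "boost_mode": "multiply",  # final score = query score * factor
            }
        }
    }


print(json.dumps(popularity_query("QUERY_KEYWORD"), indent=2))
```

With `multiply`, a document whose view_count is 0 ends up with a score of 0 under `log1p`; `"boost_mode": "sum"` is one way to avoid that. No `sort` clause is included, so the results come back ordered by the combined score.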

Filter elasticsearch results to contain only unique documents based on one field value

All my documents have a uid field with an ID that links the document to a user. There are multiple documents with the same uid.
I want to perform a search over all the documents returning only the highest scoring document per unique uid.
The query selecting the relevant documents is a simple multi_match query.
You need a top_hits aggregation.
And for your specific case:
{
"query": {
"multi_match": {
...
}
},
"aggs": {
"top-uids": {
"terms": {
"field": "uid"
},
"aggs": {
"top_uids_hits": {
"top_hits": {
"sort": [
{
"_score": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
The query above performs your multi_match query and aggregates the results by uid. For each uid bucket it returns only one result, chosen after all the documents in the bucket have been sorted by _score in descending order.
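On the client side, the per-uid documents then have to be read out of the aggregation part of the response rather than from the regular hits. A small sketch of that step, using the aggregation names from the query above; the helper `best_hit_per_uid` and the sample response are hypothetical.

```python
def best_hit_per_uid(response):
    """Map each uid to its single highest-scoring hit."""
    best = {}
    for bucket in response["aggregations"]["top-uids"]["buckets"]:
        hits = bucket["top_uids_hits"]["hits"]["hits"]
        if hits:  # size is 1, so the first hit is the best one
            best[bucket["key"]] = hits[0]
    return best


# Hypothetical, trimmed-down response in the shape Elasticsearch returns.
sample_response = {
    "aggregations": {
        "top-uids": {
            "buckets": [
                {
                    "key": "user-1",
                    "doc_count": 3,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "d7", "_score": 4.1}]}
                    },
                },
                {
                    "key": "user-2",
                    "doc_count": 1,
                    "top_uids_hits": {
                        "hits": {"hits": [{"_id": "d2", "_score": 1.3}]}
                    },
                },
            ]
        }
    }
}

for uid, hit in best_hit_per_uid(sample_response).items():
    print(uid, hit["_id"], hit["_score"])
```

A terms aggregation returns only its top buckets (10 by default), so the `size` of the `terms` aggregation would need raising to cover more uids.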
Elasticsearch 5.3 added support for field collapsing. You should be able to do something like:
GET /_search
{
"query": {
"multi_match" : {
"query": "this is a test",
"fields": [ "subject", "message", "uid" ]
}
},
"collapse" : {
"field" : "uid"
},
"size": 20,
"from": 100
}
The benefit of using field collapsing instead of a top hits aggregation is that you can use pagination with field collapsing.
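To illustrate the pagination point, here is a sketch of a helper that builds the body for a given page of collapsed results. The name `collapsed_page` and its parameters are hypothetical; the sketch assumes `uid` is a keyword or numeric field with doc values, which field collapsing requires.

```python
def collapsed_page(query, page, page_size=20, field="uid"):
    """Build a search body for one page (0-based) of collapsed results."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return {
        "query": query,
        "collapse": {"field": field},  # one hit per distinct value of `field`
        "from": page * page_size,
        "size": page_size,
    }


query = {
    "multi_match": {
        "query": "this is a test",
        "fields": ["subject", "message", "uid"],
    }
}

# Page 5 with 20 results per page gives the same from/size as the example above.
print(collapsed_page(query, page=5))
```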
