How to sort parents by number of children in Elasticsearch - elasticsearch

I have parent/child-related documents in my index and want to get a list of parents sorted by number of children. Is there any way to do it? I'm using Elasticsearch 1.5.1.
Right now I can easily get the number of child documents together with the parent query results by using the inner_hits feature, but there seems to be no way to access the inner_hits.{child_type_name}.hits.total value from a script or a search/score function. Any ideas?

Well, I finally found the answer myself, thanks to hints from #doctorcal on the #elasticsearch IRC channel.
As I mentioned in the question, we can get the list of children together with each parent using inner_hits in Elasticsearch 1.5.
To sort parents by the number of their children we need a small trick: put the number of children into the parent's score (which is what results are sorted by by default). For that, we just use score_mode sum in the has_child query. Since match_all gives every child a score of 1, the sum equals the number of children:
{
  "query": {
    "has_child": {
      "type": "comment",
      "score_mode": "sum",
      "query": {
        "match_all": {}
      },
      "inner_hits": {}
    }
  }
}
NOTE: this query has a limitation. It seems you can't keep the original relevance scores for the query, since we replace them with the number of children.
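As a rough illustration of how a client could consume this, here is a minimal Python sketch. It only builds the request body shown above and reads the child counts back from a response dict. The helper names and the sample response are hypothetical; the response shape assumes the inner_hits.{child_type_name}.hits.total layout mentioned in the question.

```python
def parents_by_child_count_query(child_type):
    """Build the has_child request body from the answer for a given child type."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "score_mode": "sum",
                "query": {"match_all": {}},
                "inner_hits": {},
            }
        }
    }


def child_counts(response, child_type):
    """Return (parent id, number of children) pairs in the order the hits arrived."""
    return [
        (hit["_id"], hit["inner_hits"][child_type]["hits"]["total"])
        for hit in response["hits"]["hits"]
    ]


# Hypothetical, hand-written response trimmed to the fields used here.
sample_response = {
    "hits": {
        "hits": [
            {"_id": "p2", "_score": 3.0,
             "inner_hits": {"comment": {"hits": {"total": 3}}}},
            {"_id": "p1", "_score": 1.0,
             "inner_hits": {"comment": {"hits": {"total": 1}}}},
        ]
    }
}

body = parents_by_child_count_query("comment")
counts = child_counts(sample_response, "comment")
```

In the sample, the parent with three comments comes first because its summed score is 3.0.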

Related

Search for parents only (with joins)

I'm relatively new to elasticsearch and I'm trying to follow this documentation page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
It is mentioned here that a plain match_all query will return everything, both the "parents" and the "children" - https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html#_searching_with_parent_join .
What isn't mentioned there (maybe because it's basic knowledge) is how do you get the parents only? I simply want to get all the parents, without the children.
What would a query like that look like?
I believe you already know the relations of your join field from when you defined the mappings. Using the example from the documentation, the parent is question and the child is answer. You simply need to query for documents whose join field is question:
{
  "query": {
    "term": {
      "my_join_field": "question"
    }
  },
  "sort": ["my_id"]
}
If you use this query, only the documents with my_join_field of question will be returned (and they are your parent documents). If you want only the children, you could do the same (my_join_field will be answer).

Search After (pagination) in Elasticsearch when sorting by score

With search_after in Elasticsearch, the values must match the sort parameters in count and order. So I was wondering how to get the score from the previous result (for example, page 1) to use as the search_after value for the next page.
I ran into an issue when using the score of the last document from the previous search. The score was 1.0, and since all documents have a score of 1.0, the result for the next page turned out to be empty.
That actually makes sense, since I am asking Elasticsearch for results with a score lower than 1.0, and there are none. So which score do I use to get the next page?
Note:
I am sorting by score and then by TieBreakerID, so one possible solution is to use a high value (say 1000) for the score.
What you're doing sounds like it should work, as explained by an Elastic team member. It works for me (in ES 7.7) even with tied scores when using the document ID (copied into another indexed field) as a tiebreaker. It's true that indexing additional documents while paginating will make your scores slightly unstable, but not likely enough to cause a significant problem for an end user. If you need it to be reliable for a batch job, the Scroll API is the better choice.
{
  "query": {
    ...
  },
  "search_after": [
    12.276552,
    14173
  ],
  "sort": [
    { "_score": "desc" },
    { "id": "asc" }
  ]
}
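To see why tied scores are not a problem once a unique tiebreaker is part of the sort, here is a small local simulation in Python. It is not Elasticsearch code: search_page and paginate are hypothetical helpers that mimic sorting by score descending, then id ascending, and returning only documents that sort strictly after the cursor. In a real request the cursor values would be taken from the "sort" array of the last hit on the previous page.

```python
def sort_key(doc):
    """Order by score descending, then by tiebreaker id ascending."""
    score, doc_id = doc
    return (-score, doc_id)


def search_page(docs, size, search_after=None):
    """Return the next `size` docs that sort strictly after the cursor."""
    ordered = sorted(docs, key=sort_key)
    if search_after is not None:
        after_key = (-search_after[0], search_after[1])
        ordered = [d for d in ordered if sort_key(d) > after_key]
    return ordered[:size]


def paginate(docs, size):
    """Walk through all pages, using the last doc of each page as the cursor."""
    pages = []
    cursor = None
    while True:
        page = search_page(docs, size, cursor)
        if not page:
            break
        pages.append(page)
        cursor = page[-1]  # (score, id) of the last hit on this page
    return pages


# Seven documents that all share the score 1.0, as in the question.
docs = [(1.0, doc_id) for doc_id in range(1, 8)]
pages = paginate(docs, 3)
flat = [doc for page in pages for doc in page]
```

Because the cursor is compared as a pair, a document with the same score but a higher id still sorts after it, so no page comes back empty early.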

How does elasticsearch fetch AND operator query from its indexes

Suppose I have an AND/MUST query in Elasticsearch on two different indexed fields, as follows:
"bool": {
  "must": [
    {
      "multi_match": {
        "query": "Will",
        "fields": [ "first" ],
        "minimum_should_match": "100%" // assuming this is q1
      }
    },
    {
      "multi_match": {
        "query": "Smith",
        "fields": [ "last" ],
        "minimum_should_match": "100%" // assuming this is q2
      }
    }
  ]
}
Now I want to know how Elasticsearch fetches the documents behind the scenes.
Does it get all the IDs of documents that match q1 and then iterate over them to check which also match q2?
Or
does it intersect the two sets, and if so, how?
How can I index my data to optimize AND queries on two separate fields?
First some basics: Elasticsearch uses Lucene behind the scenes. In Lucene a query returns a scorer, and that scorer is responsible for returning the list of documents matching the query.
Your boolean query will internally be translated to a Lucene BooleanQuery, which in this case will return a ConjunctionScorer, as it has only must clauses.
Each of the clauses is a TermQuery that returns a TermScorer which, when advanced, gives the next matching document in increasing order of document id.
ConjunctionScorer computes the intersection of the matching documents returned by the scorers for each clause by simply advancing each scorer in turn.
So you can think of a TermScorer as something that returns an ordered list of documents, and of a ConjunctionScorer as something that simply intersects two ordered lists.
There's not much you can do to optimize it. Maybe, since you're not really interested in scores, you could use a filter query instead and let Elasticsearch cache it.
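The "intersecting two ordered lists" idea can be sketched in a few lines of Python. This is only an illustration of the technique, not Lucene's actual code: PostingIterator and intersect are hypothetical names, and each iterator stands in for a scorer that can be advanced to the first document id at or beyond a target.

```python
from bisect import bisect_left


class PostingIterator:
    """Iterates over a sorted list of document ids, like a per-term scorer."""

    def __init__(self, doc_ids):
        self._ids = doc_ids  # must be sorted in increasing order
        self._pos = 0

    def doc_id(self):
        """Current document id, or None when the list is exhausted."""
        return self._ids[self._pos] if self._pos < len(self._ids) else None

    def advance(self, target):
        """Move to the first document id >= target and return it."""
        self._pos = bisect_left(self._ids, target, self._pos)
        return self.doc_id()


def intersect(a, b):
    """Return the ids present in both iterators by advancing them in turn."""
    result = []
    doc_a, doc_b = a.doc_id(), b.doc_id()
    while doc_a is not None and doc_b is not None:
        if doc_a == doc_b:
            result.append(doc_a)
            doc_a = a.advance(doc_a + 1)
            doc_b = b.advance(doc_b + 1)
        elif doc_a < doc_b:
            doc_a = a.advance(doc_b)  # skip a ahead to b's position
        else:
            doc_b = b.advance(doc_a)  # skip b ahead to a's position
    return result


# Made-up posting lists for q1 (first:will) and q2 (last:smith).
matches = intersect(
    PostingIterator([1, 3, 5, 8, 13]),
    PostingIterator([2, 3, 8, 9, 13, 21]),
)
```

Neither list is materialized into a set first; whichever iterator is behind is skipped forward to the other one's current document.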

elasticsearch: find the newest elements, return "asc"

Using Elasticsearch in Go, I need to search for the newest X elements, ordered by time.
I think having something like this will accomplish the goal:
"query": {"constant_score": {}},
"sort": {"time": {"order": "desc"}},
"size": X
However, this would return the newest elements in reverse order, wouldn't it?
Is there a way to return the newest X elements in ascending order?
That request returns the newest X elements matching the query, newest first.
To get them in ascending order straight from Elasticsearch, you could make a count query (ordering not needed) and then an asc-sorted request with the from parameter set to count-X. This solution is ugly and very inefficient.
You'd be a lot better off reversing the results in your Go app.
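The client-side approach amounts to fetching the newest X hits with a descending sort and then reversing them. Here is a minimal sketch of that idea, written in Python for brevity; the same two steps apply in Go. The helper names and the sample hits are hypothetical, and match_all stands in for whatever query is actually used.

```python
def newest_first_request(size):
    """Request body asking for the `size` newest documents, newest first."""
    return {
        "query": {"match_all": {}},
        "sort": [{"time": {"order": "desc"}}],
        "size": size,
    }


def oldest_to_newest(hits):
    """Reverse the newest-first hits so they read in ascending time order."""
    return list(reversed(hits))


# Hypothetical hits as they would come back from the descending sort.
hits_desc = [
    {"_id": "c", "_source": {"time": 30}},
    {"_id": "b", "_source": {"time": 20}},
    {"_id": "a", "_source": {"time": 10}},
]

request_body = newest_first_request(3)
hits_asc = oldest_to_newest(hits_desc)
```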

How is Elastic Search sorting when no sort option specified and no search query specified

I wonder how Elasticsearch sorts (on what field) when no search query is specified (I just filter documents) and no sort option is specified. It looks like the sorting is then random... The default sort order is _score, but the score is always 1 when you do not specify a search query...
You got it right. It's then more or less random, with the score being 1. You still get consistent results as far as I remember. It's the same situation as getting results in SQL without specifying ORDER BY.
Just in case someone sees this post, even though it was posted over 6 years ago:
When you want to know how Elasticsearch calculates its score, known as _score, you can use the explain option.
I suppose that your query (with a filter and without a search) looks more or less like this (the point is setting the explain option to true):
POST /goods/_search
{
  "explain": true,
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "maker_name": "nike"
        }
      }
    }
  }
}
When you run this, you will notice that the _explanation of each hit looks like this:
"_explanation" : {
  "value" : 1.0,
  "description" : "ConstantScore(maker_name:nike)",
  "details" : [ ]
}
which means ES gave a constant score to all of the hits.
So to answer the question: yes.
The results are sorted more or less randomly because all the filtered results have the same (constant) score when there is no search query.
By the way, enabling the explain option is more helpful when you use search queries. You will see how ES calculates the score and understand why it returns results in that order.
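With real search queries the _explanation is a nested tree, so a small helper for printing it can be handy. The following Python sketch assumes only the value/description/details layout shown above; format_explanation is a hypothetical helper, and the sample is the explanation from this answer.

```python
def format_explanation(node, depth=0):
    """Flatten an _explanation tree into indented 'value <- description' lines."""
    lines = ["  " * depth + f"{node['value']} <- {node['description']}"]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# The explanation shown above for the filter-only query.
explanation = {
    "value": 1.0,
    "description": "ConstantScore(maker_name:nike)",
    "details": [],
}

lines = format_explanation(explanation)
```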
Score is mainly used for sorting. It is calculated by Lucene's scoring, which takes several factors into account.
