Search for parents only (with joins) - elasticsearch

I'm relatively new to elasticsearch and I'm trying to follow this documentation page:
https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html
It is mentioned here that a plain match_all query will return everything, both the "parents" and the "children" - https://www.elastic.co/guide/en/elasticsearch/reference/current/parent-join.html#_searching_with_parent_join .
What isn't mentioned there (maybe because it's basic knowledge) is how do you get the parents only? I simply want to get all the parents, without the children.
How would a query like that look like?

I believe that you have already known the relationship of your join when you do the mappings. Let's use the example from the documentation, you know that the parent will be question, and the answer will be children. You simply need to query your join field to be question only:
{
"query": {
"term": {
"my_join_field": "question"
}
},
"sort": ["my_id"]
}
If you use this query, only the documents with my_join_field of question will be returned (and they are your parent documents). If you want only the children, you could do the same (my_join_field will be answer).

Related

Group by field in found document

The best way to explain what I want to accomplish is by example.
Let us say that I have an object with fields name and color and transaction_id. I want to search for documents where name and color match the specified value and that I can accomplish easily with boolean queries.
But, I do not want only documents which were found with search query. I also want transaction to which those documents belong, and that is specified with transaction_id. For example, if a document has been found with transaction_idequal to 123, I want my query to return all documents with transaction_idequal to 123.
Of course, I can do that with two queries, first one to fetch all documents that match criteria, and the second one that will return all documents that have one of transaction_idvalues found in first query.
But is there any way to do it in a single query?
You can use parent-child relation ship between transaction and your object. Or nest the denormalize your data to include the objects in the transactions. Otherwise you'll have to do an application side join, meaning 2 queries.
Try an index mapping similar to the following, and include a parent_id in the objects.
{
"mappings": {
"transaction": {},
"object": {
"_parent": {
"type": "transaction"
}
}
}
}
Further reading:
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html

Relative Performance of ElasticSearch on inner fields vs outer fields

All other things being equal, including indexing, I'm wondering if it is more performant to search on fields closer to the root of the document.
For example, lets say we have a document with a customer ID. Two ways to store this:
{
"customer_id": "xyz"
}
and
{
"customer": {
"id": "xyz"
}
}
Will it be any slower to search for documents where "customer.id = 'xyq'" than to search for documents where "customer_id = 'xyz'" ?
That's pure syntactic sugar. The second form, i.e. using object type, will be flattened out and internally stored as
"customer.id": "xyz"
Hence, both forms you described are semantically equivalent as far as what gets indexed into ES, i.e.:
"customer_id": "xyz"
"customer.id": "xyz"

How to sort parents by number of children in Elasticsearch

I have parent/child-related documents in my index and want to get list of parents sorted by number of children. Is it any way to do it? I'm using Elasticsearch 1.5.1
Right now I can easily get number of children documents together with parent query results by using inner_hits feature, but it seems no way to access inner_hits.{child_type_name}.hits.total value from the script or search/score function. Any ideas?
Well, I found answer myself, finally. Thanks to hints from #doctorcal on #elasticsearch IRC
As I mentioned in the question, we can get list of children together with each parent using inner_hits in Elasticsearch 1.5.
To be able to sort parents by number of their children we need to use a small trick - put number of children into the parent's score (which is used to sort by default). For that, we just use the score mode sum for has_child query:
{
"query": {
"has_child": {
"type": "comment",
"score_mode": "sum",
"query": {
"match_all": {}
},
"inner_hits": {}
}
}
}
NOTE: this query has a limitation - it seems you can't keep information about initial scores (relevance scores for the query), since we replace them with number of children.

one-to-many relationships in Elastic Search

Suppose I have 2 tables called "twitter_user" and "twitter_comments".
twitter_users has the fields: username and bio
twitter_comments has the fields: username and comment
Obviously, an user has 1 entry in twitter_users and potentially many in twitter_comments
I want to model both twitter_users and twitter_comments in Elastic Search, have ES search both models when I query, knowing that a comment counts towards the overall relevancy score for a twitter user.
I know I can mimic this with just 1 model, by creating a single extra field (in addition to username and bio) with all the comments concatenated. But is there another "cleaner" way?
It depends.
If you just want to be able to search for a users comments ,full-text and over all fields, simply store all comments within the user object (no need to concatenate anything):
{
"user" : {
"username" : "TestUser",
"bio" : "whatever",
"comments" : [
{
"title" : "First comment",
"text" : "My 1st comment"
},
{
"title" : "Second comment",
"text" : "My 2nd comment"
}
]
}
}
If you need per-comment-based queries you need to map the comments as nested (before submitting any data), so that every comment gets treated as a single item.
For your scoring, simply add another field "comment_count" and use this for your boost/scoring.
As Thorsten already suggested you can use nested query and it's a good approach.
Alternatively, you can index comments as children of users. Then you can can search users as you do now, search comments using top_children query to find all relevant to your search comments, and finally combine scores from both of them together using bool or dis_max queries.
Nested approach would be more efficient during search, but you will have to reindex the user and all comments every time an additional comment is added. With child/parent approach you will need to index only new comments, but search will be slower and it will require more memory.

How can I query/filter an elasticsearch index by an array of values?

I have an elasticsearch index with numeric category ids like this:
{
"id": "50958",
"name": "product name",
"description": "product description",
"upc": "00302590602108",
"**categories**": [
"26",
"39"
],
"price": "15.95"
}
I want to be able to pass an array of category ids (a parent id with all of it's children, for example) and return only results that match one of those categories. I have been trying to get it to work with a term query, but no luck yet.
Also, as a new user of elasticsearch, I am wondering if I should use a filter/facet for this...
ANSWERED!
I ended up using a terms query (as opposed to term). I'm still interested in knowing if there would be a benefit to using a filter or facet.
As you already discovered, a termQuery would work. I would suggest a termFilter though, since filters are faster, and cache-able.
Facets won't limit result, but they are excellent tools. They count hits within your total results of specific terms, and be used for faceted navigation.

Resources