Elasticsearch get sibling documents for a document matching a query - elasticsearch

I have parent and child documents in my Elasticsearch index related through a join: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/parent-join.html.
I would like to be able to submit a query which matches on child documents and returns the siblings of the matching child documents.
My situation: I have students divided into groups. Each student in my index is a separate child document, and all students in the same group have the same parentId. The parent document contains no meaningful fields other than a groupId. I want to get the list of all students who are in the same group as student X with a single query.
For example my query would look similar to:
{
"query": {
"match": {
"studentName": "Bob"
}
}
}
And my response would list all the students who are in the same group as "Bob"
NOTE: I realize this problem could easily be solved by nesting the children of a group inside a single document. However, I cannot do this for my use case, because I need to support a second query: search for a student by name and return the results sorted by relevance. If I nest the student documents inside the same document, to my understanding, I can no longer achieve this second query.
Does anyone know if the search for siblings query is possible?
Or more broadly does anyone know of any ES construct that would allow me to achieve both searching for students in the group with student X with a single query AND searching for student by name in a single query?

It looks like this can be achieved by nesting has_child inside has_parent. Still, you can't sort by the child doc's properties, and this query is going to be slow depending on your index size.
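A minimal sketch of that nesting, written as a Python dict so it can be serialized and sent to the _search endpoint. The relation names "group" and "student" are assumptions (the question does not name them); studentName comes from the question. The inner has_child finds groups containing a matching student, and the outer has_parent returns every child of those groups.

```python
import json

def siblings_query(student_name, parent_type="group", child_type="student"):
    """Build a search body returning all children whose parent has a matching child.

    parent_type and child_type are assumed join relation names, not taken
    from the original mapping.
    """
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {
                    "has_child": {
                        "type": child_type,
                        "query": {"match": {"studentName": student_name}},
                    }
                },
            }
        }
    }

body = siblings_query("Bob")
print(json.dumps(body, indent=2))
```

Note that the matching student ("Bob") is itself a child of the group, so it is part of the result as well.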

Related

ElasticSearch: update by query from a different index

I have the following problem with ElasticSearch. Let's say I have one index called "products". In general, its documents have the following fields:
productId
productPackId
productName
price
And then (for reasons that I cannot explain here, but let's say they weren't my decision) I have another index called "productPacks" with:
productPackId
name
imageUrl
Now, I need to copy the imageUrl field from the "productPacks" index into the "products" index, according to the "productPackId" each document in the "products" index has. To clarify: let's say that in "productPacks" the document with
"productPackId" = 1
has as
imageUrl: "https://mywebsite.com/image1.jpg",
what I need is that all documents on the "products" index that have "productPackId" === 1 get then
imageUrl: "https://mywebsite.com/image1.jpg"
I can't find a way of doing it.
Thanks in advance!
(This, of course, would be super easy on a SQL database.)
What you basically want to do is join the two indices, on the "productPackId".
This is not possible in elasticsearch over two different indices.
There is a simple solution:
Iterate over every document in the index with the image URLs (index 2) and run an update by query against index 1, using the productPackId to build the query. That way you can add the imageUrl values to index 1.
Elasticsearch does not have any concept of joins across indices.
HTH.
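The iteration described above could be sketched like this. The helper only builds the request bodies for the _update_by_query endpoint; how the "productPacks" documents are read (for example with a scroll) and how the requests are sent is left out. The function name is hypothetical, and the script uses the "source" key with Painless, which is an assumption about the Elasticsearch version (older 5.x releases use "inline" instead).

```python
def build_update_requests(product_packs, target_index="products"):
    """Return one (path, body) pair per pack for POST <path>.

    product_packs is an iterable of dicts with productPackId and imageUrl,
    i.e. the _source of each document in the "productPacks" index.
    """
    requests = []
    for pack in product_packs:
        body = {
            # Select every product that belongs to this pack.
            "query": {"term": {"productPackId": pack["productPackId"]}},
            # Copy the pack's imageUrl onto each selected product.
            "script": {
                "lang": "painless",
                "source": "ctx._source.imageUrl = params.imageUrl",
                "params": {"imageUrl": pack["imageUrl"]},
            },
        }
        requests.append(("/%s/_update_by_query" % target_index, body))
    return requests

packs = [{"productPackId": 1, "name": "Pack 1",
          "imageUrl": "https://mywebsite.com/image1.jpg"}]
for path, body in build_update_requests(packs):
    print("POST", path, body)
```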
The result you expect can only be achieved with a SQL request:
https://www.elastic.co/guide/en/elasticsearch/reference/master/xpack-sql.html

Elasticsearch extract/add id's from multiple queries

I have multiple queries that need to filter data in Elasticsearch. These queries return the IDs of documents from indexes that match the filter.
However, depending on the user's selection, I need to perform another operation: subtract or add the unique document IDs of the current query from or to the combined result of the previous queries. The maximum number of queries is 5.
Is there an option in Elasticsearch to subtract/add document IDs from a previous query? Right now I am doing this part in PHP with a foreach iteration, which takes a lot of time.
Edit
Example :
OK, let's say we have one query on an index that contains:
{"query":{"bool":{"filter":[{"wildcard":{"182_empanalyzed":"example"}}]}}}
We then need to subtract the document IDs returned by the following query on the same index:
{"query":{"bool":{"must_not":[{"nested":{"path":"184","query":{"exists":{"field":"184.*"}}}}]}}}
Keep in mind that these queries are examples with only one condition each; there might be more complex queries that search on many fields. And for each following query there is an option to subtract/add document IDs.
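One possible way to express the subtract/add step inside Elasticsearch rather than in PHP, assuming all queries target the same index, is to wrap them in a bool query: must_not removes the documents of the second query, should (with minimum_should_match) unions them. The helper names below are hypothetical; the two queries are the ones from the example.

```python
def subtract(base_query, excluded_query):
    """Documents matching base_query but not excluded_query."""
    return {"bool": {"filter": [base_query], "must_not": [excluded_query]}}

def add(first_query, second_query):
    """Documents matching either query (union, no duplicates)."""
    return {"bool": {"should": [first_query, second_query],
                     "minimum_should_match": 1}}

query_a = {"bool": {"filter": [{"wildcard": {"182_empanalyzed": "example"}}]}}
query_b = {"bool": {"must_not": [
    {"nested": {"path": "184", "query": {"exists": {"field": "184.*"}}}}]}}

# Request body for "A minus B"; further steps can wrap this result again.
body = {"query": subtract(query_a, query_b)}
print(body)
```

Because each helper returns an ordinary query, up to five steps can be chained by passing the previous result as the first argument.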

Querying and returning Sub Document/Nested Objects in ElasticSearch

Good day:
I currently have the following structure indexed: school -> children, meaning that for every school document there's a list of children sub-documents. Children is a nested list of objects inside School. My objective is to query the parent by school.id = id and return only the sub-documents matching children.userId = userId, while paginating the children with size/from. I'm not sure how to accomplish this, but any help using NEST would be appreciated.
Thanks.
EDIT:
I didn't realize you only wanted to paginate the results from a single document. In that case you can use inner_hits, which has its own from and size parameters.
Reference: inner_hits documentation
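As a rough illustration of the inner_hits approach, here is the raw request body as a Python dict (the same structure can be expressed with NEST). The field names id, children and children.userId follow the question; the function name and the paging defaults are assumptions.

```python
def children_page_request(school_id, user_id, offset=0, page_size=10):
    """Search body: one school, with only its matching children paginated."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Restrict to the single parent school document.
                    {"term": {"id": school_id}},
                    {
                        "nested": {
                            "path": "children",
                            "query": {"term": {"children.userId": user_id}},
                            # from/size here page through the matching
                            # nested children, not the school documents.
                            "inner_hits": {"from": offset, "size": page_size},
                        }
                    },
                ]
            }
        }
    }

print(children_page_request("school-1", "user-42", offset=20, page_size=10))
```

The paginated children are then found under inner_hits of the school hit in the response, not in its _source.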
ORIGINAL ANSWER:
I don't think you can paginate directly on the inner object when you have a nested type. Instead you would want to index school and children in separate documents and use a join type to create a parent/child relationship between them. Then you could use a has_parent query to search for children and paginate on the children returned.
Reference:
How to create the mapping: Join Relationship
How to create the query: has_parent query
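A sketch of the has_parent variant, assuming a join field whose parent relation is named "school" and child documents that carry a top-level userId field (both names are assumptions, as is the function name). Here the top-level from/size paginate the children directly.

```python
def children_of_school_request(school_id, user_id, offset=0, page_size=10):
    """Search body returning child documents of one school, paginated."""
    return {
        "from": offset,
        "size": page_size,
        "query": {
            "bool": {
                "filter": [
                    # Condition on the child document itself.
                    {"term": {"userId": user_id}},
                    # Condition on the parent the child is joined to.
                    {
                        "has_parent": {
                            "parent_type": "school",
                            "query": {"term": {"id": school_id}},
                        }
                    },
                ]
            }
        },
    }

print(children_of_school_request("school-1", "user-42", offset=10))
```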

elasticsearch: decide which query should run first

We have a simple web page, where the user can provide some input and query the database. We currently use mongodb but want to migrate to elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute firstly the parent query and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and child entries have the exact same structure and reside in the same index. We cannot have different indices, since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
id: "e32452365321",
name: "name",
type: "type",
ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you to upgrade your Elasticsearch version if possible. A lot has happened since 1.7 and, to be honest, I can't tell whether everything in the article linked below is valid for such an old version (it probably isn't).
But to your actual question: hopefully I am understanding you correctly, but you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one nested query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing I can mention: calculating score does take time. So if sorting is not based on the elasticsearch _score, you want to use boolean filter queries. This would also apply, if you want to sort only by _score of parent matches, then you could put the query for children into a filter.
update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general, you want to send the second query with the least possible amount of ids/terms. While getting counts for both queries is not as bad as you might think, because the results are most likely still cached, does it actually help? Because if you're going from child to parent, you would have to count the ancestors (field values), and not the actual document count.
I would argue, that the most expensive operation is very often fetching result source from disk. So whichever way you go, you probably should only fetch what you need in the first query. So your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience in comparing speed of those approaches. My guess would be, that an id filter might be faster in general. But that's just a guess...
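The first option (parent ids first, then a terms filter on ancestors) might look roughly like this. The bodies use the 1.x "filtered" query since the question mentions v1.7; the helper names are hypothetical, and it is assumed that each entry's _id equals the id stored in ancestors.

```python
def parent_ids_request(parent_search):
    """Step 1: match parents, skipping _source since only ids are needed."""
    return {"_source": False,
            "query": {"query_string": {"query": parent_search}}}

def ids_from_hits(response):
    """Collect document ids from a search response dict."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

def children_request(search, parent_ids):
    """Step 2: match entries that have any of the parent ids as ancestor."""
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": search}},
                "filter": {"terms": {"ancestors": parent_ids}},
            }
        }
    }

step1 = parent_ids_request("name:parentTest AND NOT type:BLOCK")
# Stand-in for the response of step 1; a real one comes from the cluster.
fake_response = {"hits": {"hits": [{"_id": "id1"}, {"_id": "id2"}]}}
step2 = children_request("type:BLOCK AND name:test",
                         ids_from_hits(fake_response))
print(step2)
```

The second option is the mirror image: request only the ancestors field in step 1 and use an ids filter on the collected values in step 2.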

combine fields of different documents in same index

I have two types of documents in my index:
doc1
{
"category":"15",
"url":"http://stackoverflow.com/questions/ask"
}
doc2
{
"url":"http://stackoverflow.com/questions/ask",
"requestsize":"231",
"logdate":"22/12/2012",
"username":"mehmetyeneryilmaz"
}
Now I need a query that filters on the shared url field and returns the fields of both documents:
result:
{
"category":"15",
"url":"http://stackoverflow.com/questions/ask",
"requestsize":"231",
"logdate":"22/12/2012",
"username":"mehmetyeneryilmaz"
}
The results returned by Elasticsearch are always per document: if multiple documents satisfy your query/filter, they always appear as separate documents in the result and are never merged into a single document. Hence merging them on the client side is one option. To avoid fetching the complete documents and get only the relevant fields, you can use "fields" in your query.
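A minimal sketch of the client-side merge, assuming the hits are available as dicts with a _source key (as in a standard search response) and that every document has a url field. The function name is hypothetical.

```python
def merge_by_url(hits):
    """Merge the _source of all hits that share the same url into one dict."""
    merged = {}
    for hit in hits:
        source = hit["_source"]
        # Later documents overwrite earlier ones on conflicting field names.
        merged.setdefault(source["url"], {}).update(source)
    return list(merged.values())

hits = [
    {"_source": {"category": "15",
                 "url": "http://stackoverflow.com/questions/ask"}},
    {"_source": {"url": "http://stackoverflow.com/questions/ask",
                 "requestsize": "231",
                 "logdate": "22/12/2012",
                 "username": "mehmetyeneryilmaz"}},
]
print(merge_by_url(hits))
```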
If this is not what you need and you still want to narrow down the result in the query itself, you can use the top_hits aggregation. It gives you the complete list of documents under a single bucket, but each hit also carries a _source field containing the complete document.
See:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-top-hits-aggregation.html
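For illustration, a request body that groups documents by url and lists them per bucket with top_hits could look like the following. It assumes the url field is indexed as a single unanalyzed term, and it uses the 1.x "filtered" query to match the linked 1.4 documentation (newer versions would use a bool filter instead). The function and aggregation names are made up for the example.

```python
def group_by_url_request(url, docs_per_bucket=10):
    """Search body: one bucket per url, each listing its documents."""
    return {
        "size": 0,  # only the aggregation result is of interest
        "query": {"filtered": {"filter": {"term": {"url": url}}}},
        "aggs": {
            "by_url": {
                "terms": {"field": "url"},
                "aggs": {
                    "docs": {"top_hits": {"size": docs_per_bucket}}
                },
            }
        },
    }

print(group_by_url_request("http://stackoverflow.com/questions/ask"))
```

The documents in each bucket are still separate hits, so combining their fields into one object remains a client-side step.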

Resources