XQuery for element disregarding parent - xpath

I'm new to XQuery and XPath,
I'm trying to query for a certain element type, say a "city". However, this element is listed under various other types of elements, such as "country" and "region". What is a good way to query for all "city" element types regardless of their parent element?
thank you

You could write //city to find all city elements anywhere in the document.
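You could also use descendant-or-self::city (without a leading slash), which differs from the first expression in that it is evaluated relative to the context node (the point from which you execute the expression) rather than the document root. With a leading slash, /descendant-or-self::city starts at the root and selects the same elements as //city.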
If you would like to limit the city elements to all with country as parent, you would write: //country/city.
If you would like to get city elements at a specific depth in a tree, write something like: /*/*/city.
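As a rough illustration of the three expressions above, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The sample document and the helper city_names are made up for this example. ElementTree supports only a subset of XPath and evaluates paths relative to an element, so each expression is written in its relative form (.//city instead of //city, and ./*/city instead of /*/*/city, because the root element is already the context).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element and city names are made up.
SAMPLE = """
<world>
  <country>
    <city>Alpha</city>
    <region>
      <city>Beta</city>
    </region>
  </country>
  <region>
    <city>Gamma</city>
  </region>
</world>
"""

root = ET.fromstring(SAMPLE)


def city_names(path):
    """Return the text of every element matched by an ElementTree path."""
    return [el.text for el in root.findall(path)]


# Every city at any depth (the //city case).
all_cities = city_names(".//city")

# Only cities whose parent is a country (the //country/city case).
country_cities = city_names(".//country/city")

# Cities two levels below the root element (the /*/*/city case);
# <world> is already the context, so one "*" step is dropped.
depth_cities = city_names("./*/city")

print(all_cities)
print(country_cities)
print(depth_cities)
```

In a full XPath or XQuery processor the absolute forms from the answer can be used directly.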

Related

Elasticsearch get sibling documents for a document matching a query

I have parent and child documents in my Elasticsearch index related through a join: https://www.elastic.co/guide/en/elasticsearch/reference/6.3/parent-join.html.
I would like to be able to submit a query which matches on child documents and returns the siblings of the matching child documents.
My situation is that I have students divided into groups. Each student in my index is a separate child document, and all students in the same group have the same parentId. The parent document contains no meaningful fields other than a groupId. I want to get the list of all students who are in the same group as student X with a single query.
For example my query would look similar to:
{
  "query": {
    "match": {
      "studentName": "Bob"
    }
  }
}
And my response would list all the students who are in the same group as "Bob"
NOTE: I realize this problem could easily be solved by nesting the children who are in a group together into a single document. However, I cannot do this for my use case, because I need to support a second query: searching for a student by name and returning the results sorted by relevance. If I nest the student documents inside the same document, to my understanding, I can no longer achieve this second query.
Does anyone know if the search for siblings query is possible?
Or more broadly does anyone know of any ES construct that would allow me to achieve both searching for students in the group with student X with a single query AND searching for student by name in a single query?
It looks like that can be achieved by nesting has_child inside has_parent. Still, you can't sort by the child documents' properties, and this query is going to be slow depending on your index size.
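A minimal sketch of what such a nested query body could look like, built as a Python dict so it can be serialized and sent with any client. The relation names "group" and "student" are assumptions (the question does not name the join relations), and sibling_query is a hypothetical helper. The inner has_child finds parents that have a child matching the name; the outer has_parent then returns every child of those parents, which includes the matching student itself.

```python
import json


def sibling_query(student_name, parent_type="group", child_type="student"):
    """Build a query body returning all children that share a parent with a match.

    parent_type and child_type are assumed relation names, not taken from the question.
    """
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {
                    "has_child": {
                        "type": child_type,
                        "query": {"match": {"studentName": student_name}},
                    }
                },
            }
        }
    }


body = sibling_query("Bob")
print(json.dumps(body, indent=2))
```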

ES 6.x, join type, when parent and child doc have the same field. store it as one field or two?

As we know, ES 6.x stores parent and child documents in the same index under a single mapping type.
My model is:
parent: {id, name, point}
child: {id, point}
What is the recommended way to design the ES document?
{id, name, point} or {id, name, point, child_point}
The question is: when parent and child have the field with the same name, should I store it in one field or two?
I believe you can go both ways. Which one to go depends on what kind of queries you want to perform against parent and child documents.
if you reuse field names
Let's say you make name a reused field name.
You can query all documents by name, and parent documents will be returned along with child documents. This may be undesirable (or maybe not, depending on what you want).
In order to query by name only parent or only child documents you can still use a must like this: name: X AND type: parent (or child).
If you reuse text fields, the relevance scoring might be affected - because a field with the same name will use the same inverted index, and parent token frequencies will affect relevance of child text searches.
if you don't reuse field names
Let's say you decide for parent documents to only populate "parent fields", and for child documents - only "child fields". For example, in parent you would use name and in child childName.
In this case to search for only parent documents by name it would be enough to use one query: name: X. To search for only child documents by name: childName: X.
If you need to mix the results (parent and child documents in the same result set), you can still combine these two queries via a should.
The index for full text search of parent's name will not affect that one of childName.
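The two approaches can be sketched as bool query bodies, again as plain Python dicts. This is only an illustration under assumptions: the discriminator field type comes from the "name: X AND type: parent" shorthand above and is hypothetical, as are the helper names; match and term are used here as one possible choice of clause.

```python
def by_name_reused(name, doc_type):
    """Reused field name: narrow a name match to one side with a second must clause."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"name": name}},
                    # "type" is a hypothetical field holding "parent" or "child".
                    {"term": {"type": doc_type}},
                ]
            }
        }
    }


def by_name_separate(name):
    """Separate field names: mix parents and children by OR-ing two matches."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"name": name}},       # parent documents
                    {"match": {"childName": name}},  # child documents
                ]
            }
        }
    }


parents_only = by_name_reused("X", "parent")
mixed = by_name_separate("X")
```

With separate field names, a query for only parents or only children needs no bool wrapper at all: a single match on name or on childName is enough.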
So, which one to choose?
has_child and has_parent queries seem to not affect the choice, since they will distinguish the parent from child automatically.
Select reuse of the fields if you do a lot of queries that query parent and child documents in the same way.
Otherwise, do not reuse the field names.
Hope that helps!

elasticsearch: decide which query should run first

We have a simple web page, where the user can provide some input and query the database. We currently use mongodb but want to migrate to elasticsearch, since the queries are faster.
There are some required search fields, like start and end date, and some optional ones, like a search string to match an entry, or a parent search string, to match parent entries. Parent-child relations are just described through fields containing each entry's ancestors ids.
The question is the following: If both search and parent search string are provided, is there a way to know before executing the queries, which query should be executed first, in order to provide results faster and to be more performant?
For example, it could be that a specific parent search results in only 2 docs/parent entries, and then we can fetch all children matching the search string. In that case we should execute the parent query first and then the entry query.
One option would be to get the count of both queries and then execute first the one with the smallest count, but isn't this solution worse, since the queries are going to be executed twice? Once for the count and once for the actual query.
Are there any other options to solve this?
PS. We use elasticsearch v1.7
Example
Let's say the user wants to search for all entries matching the following fields.
searchString: type:BLOCK AND name:test
parentSearchString: name:parentTest AND NOT type:BLOCK
This means that we either have to
fetch all entries (parents) matching the parentSearchString and store their ids. Then, we have to fetch all entries that match the searchString and also have to contain any of the parent ids in the ancestors field.
OR
fetch all entries that match the searchString and store all ancestors ids. Then fetch all entries that match the parentSearchString and their id is one of the ancestors ids.
Just to clarify, both parent and child entries have the exact same structure and reside in the same index. We cannot have different indices, since the parent-child relation can be nested up to 10 levels deep, so an entry can be both a parent and a child. An entry looks more or less like:
{
  id: "e32452365321",
  name: "name",
  type: "type",
  ancestors: "id1 id2 id3" // stored in node as an array of ids
}
First of all, I would advise you to upgrade your Elasticsearch version if possible. A lot has happened since 1.7 and, to be honest, I can't tell whether everything written in the following article is valid for such an old version (probably it isn't).
But to your actual question: if I understand you correctly, you are trying to estimate how costly a query is for Elasticsearch? Well, you don't have to. If you provide all 'queries' in one nested query, Elasticsearch will do that for you: https://www.elastic.co/blog/elasticsearch-query-execution-order
Regarding speed, there is one other thing worth mentioning: calculating scores takes time. So if sorting is not based on the Elasticsearch _score, you want to use boolean filter queries. The same applies if you want to sort only by the _score of parent matches: you could then put the query for children into a filter.
update
Thanks to your example, I now see the problem. Self-referential parent-child relations are unfortunately not supported by Elasticsearch, so your approach is probably right. You might want to check out the short chapter of the documentation about application-side joins.
So yes, in general you want to send the second query with the smallest possible number of ids/terms. Getting counts for both queries is not as bad as you might think, because the results are most likely still cached, but does it actually help? If you go from child to parent, you would have to count the distinct ancestors (field values), not the matching documents.
I would argue, that the most expensive operation is very often fetching result source from disk. So whichever way you go, you probably should only fetch what you need in the first query. So your options are:
Fetch only the id of parent matches, and then use a terms filter on ancestors in the second query.
Or, fetch only the ancestors field of child matches, and use an id filter in your second query.
Unfortunately, I can't help you more than that, since I don't have enough experience in comparing speed of those approaches. My guess would be, that an id filter might be faster in general. But that's just a guess...
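The two options could be sketched as request bodies like the following. This is a hedged sketch, not a tested recipe for 1.7: it assumes the 1.x-style filtered query with terms and ids filters (newer versions use bool with a filter clause instead), assumes ancestors is indexed so that a terms filter matches whole ids, and all function names are hypothetical.

```python
def query_string(q):
    """Wrap a Lucene-style search string in a query_string clause."""
    return {"query_string": {"query": q}}


def filtered(q, flt):
    """1.x-style filtered query: a scored query narrowed by an unscored filter."""
    return {"query": {"filtered": {"query": query_string(q), "filter": flt}}}


# Option 1: parents first.
def parent_ids_request(parent_search):
    """Match parents but skip _source; each hit still carries its _id."""
    return {"query": query_string(parent_search), "_source": False}


def children_request(search, parent_ids):
    """Match entries whose ancestors field contains any of the parent ids."""
    return filtered(search, {"terms": {"ancestors": list(parent_ids)}})


# Option 2: children first.
def ancestors_request(search):
    """Match entries and return only their ancestors field."""
    return {"query": query_string(search), "_source": ["ancestors"]}


def collect_ancestors(hits):
    """Deduplicate ancestor ids across hits, so the second query stays small."""
    ids = set()
    for hit in hits:
        ids.update(hit["_source"]["ancestors"])
    return sorted(ids)


def parents_request(parent_search, ancestor_ids):
    """Match parents whose _id is one of the collected ancestor ids."""
    return filtered(parent_search, {"ids": {"values": list(ancestor_ids)}})


# Made-up hits, shaped like the "hits" array of a search response.
fake_hits = [
    {"_source": {"ancestors": ["id1", "id2"]}},
    {"_source": {"ancestors": ["id2", "id3"]}},
]
second = parents_request("name:parentTest AND NOT type:BLOCK", collect_ancestors(fake_hits))
```

Note that collect_ancestors returns the distinct ancestor ids, which is the number that matters when comparing the size of the second query in the child-first direction.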

parent-child documents relation retrieval

Could you help me with a small problem regarding parent-child document relations?
Considering JSON, I have objects, each of them contains an array of sub-objects. Sub-objects contain some text fields.
I need to maintain full-text-search on these objects and construct snippets. I need highlighting for building snippets.
If I use nested objects, highlighting does not deal with them.
Therefore, I use Parent-Child relationships.
Now I need to retrieve parent documents whose children match the query_string. Furthermore, I need to get the highlighted fields of the matched children and associate each child with its corresponding parent to construct snippets in my application.
Is it possible to accomplish my goal in one query?
I think you should consider using the children aggregation. With it you can retrieve child items within their parents. Since it is an aggregation, you cannot get the whole document (just the id), but you do retrieve the relationship. Then, with another query, you can quickly get the document details.
Link here : https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-children-aggregation.html
And more details : https://www.elastic.co/guide/en/elasticsearch/guide/current/children-agg.html

LinkedList of objects added in alphabetical order according to object param

What would be the best way to add objects into my LinkedList in alphabetical order of one of the object's parameters? I have a class that takes in last name, first name, and some other stuff. I've made an object of that class, the parameters are all user-submitted, and I have to store every object that's made in a LinkedList. The objects must be added to the linked list in alphabetical order according to the last name. What would be the best way to do this?
Thanks!!
You could do a binary search through your list, using the “compareTo” method to find the correct index at which to insert the new value.
A binary search compares the key of the middle element with a given key, in this case the last name of your new element. If the keys match, you are done: that is the correct index. If they do not match and the middle element's key is greater than your key, repeat the search on the left half of the list; otherwise, repeat it on the right half.
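The search described above can be sketched as follows. This is an illustration in Python on an array-backed list, not the asker's Java code; the Person class and its fields are hypothetical, and string comparison on the lowercased last name stands in for compareTo. One caveat worth keeping in mind: in a java.util.LinkedList, reaching the middle element by index is itself a linear walk, so the binary search saves comparisons but not traversal.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Hypothetical record; the question only says it holds a last and first name.
    last_name: str
    first_name: str


def insertion_index(people, new_person):
    """Binary-search a list sorted by last name for the slot of new_person."""
    key = new_person.last_name.lower()
    lo, hi = 0, len(people)
    while lo < hi:
        mid = (lo + hi) // 2
        if people[mid].last_name.lower() <= key:
            lo = mid + 1  # new key belongs in the right half
        else:
            hi = mid      # new key belongs in the left half
    return lo


def insert_sorted(people, new_person):
    """Insert new_person so that the list stays sorted by last name."""
    people.insert(insertion_index(people, new_person), new_person)


people = []
for last, first in [("Smith", "Ann"), ("Adams", "Bob"), ("Miller", "Cy")]:
    insert_sorted(people, Person(last, first))

print([p.last_name for p in people])
```

Entries with equal last names are placed after the existing ones, so insertion order is preserved among ties.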
