How to use Elasticsearch nested queries by object key instead of object property

Following the Elasticsearch example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
  nested_objects: [ <== array
    { name: "x", value: 123 },
    { name: "y", value: 456 } <== "name" property searchable
  ]
}
But what if I want the nested objects to be arranged in a key-value structure that gets updated with new objects, and I want to search by the KEY? Example:
{
  nested_objects: { <== key-value, not array
    "x": { value: 123 },
    "y": { value: 456 }, <== how can I search by "x" and "y" keys?
    "..." <== more arbitrary keys are added now and then
  }
}
Thank you!

You can try to do this using the query_string query, like this:
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "nested_objects.\\*.value:123"
    }
  }
}
It will try to match the value field of any sub-field of nested_objects.

Ok, so my final solution after some ES insights is as follows:
1. The fact that my object keys "x", "y", ... are arbitrary makes a mess of my index mapping. Generally speaking, it's not good ES practice to plan this kind of structure. So, for the sake of the mappings, I resort to the structure described in the "Weighted tags" article:
{ "name":"x", "value":123 },
{ "name":"y", "value":456 },
...
2. This means that when it's time to update the value of the sub-object named "x", I have a harder (and slower) time finding it: I first need to fetch the entire top-level object, traverse the sub-objects until I find the one named "x", update its value, and then write the entire sub-object array back to ES (see the sketch of the lookup step below this list).
3. The above approach also causes concurrency issues when I have multiple processes updating the same index. ES has optimistic locking that I can use to retry when needed, or I can queue the updates and handle them serially.
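For reference, the lookup step from point 2 could look like this. A minimal sketch, assuming nested_objects is mapped as a nested field and name as a keyword:
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "nested_objects",
      "query": {
        "term": { "nested_objects.name": "x" }
      }
    }
  }
}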

Related

Indexing strategy for hierarchical structures on ElasticSearch

Let's say I have hierarchical types such as in the example below:
base_type
  child_type1
    child_type3
  child_type2
child_type1 and child_type2 inherit metadata properties from base_type. child_type3 has all properties inherited from both child_type1 and base_type.
To add to the example, here are several objects with their properties:
base_type_object: {
  base_type_property: "bto_prop_value_1"
},
child_type1_object: {
  base_type_property: "ct1o_prop_value_1",
  child_type1_property: "ct1o_prop_value_2"
},
child_type2_object: {
  base_type_property: "ct2o_prop_value_1",
  child_type2_property: "ct2o_prop_value_2"
},
child_type3_object: {
  base_type_property: "ct3o_prop_value_1",
  child_type1_property: "ct3o_prop_value_2",
  child_type3_property: "ct3o_prop_value_3"
}
When I query for base_type_object, I expect to search base_type_property values in each and every one of the child types as well. Likewise, if I query for child_type1_property, I expect to search through all types that have such property, meaning objects of type child_type1 and child_type3.
I see that mapping types have been removed. What I'm wondering is whether this use case warrants indexing under separate indices.
My current line of thinking, using the example above, would be to create 4 indices: base_type_index, child_type1_index, child_type2_index and child_type3_index. Each index would only have mappings for its own properties, so base_type_index would only have base_type_property, child_type1_index would have child_type1_property, etc. Indexing child_type1_object would create an entry in both the base_type_index and child_type1_index indices.
This seems convenient because, as far as I can see, it's possible to search multiple indices using GET /my-index-000001,my-index-000002/_search. So I would theoretically just need to list the hierarchy of my types in the GET request: GET /base_type_index,child_type1_index/_search.
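For illustration, a search across the hierarchy for a base property might then look like this (a sketch using the index, field and value names from the example above):
GET /base_type_index,child_type1_index/_search
{
  "query": {
    "match": { "base_type_property": "bto_prop_value_1" }
  }
}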
To make it easier to understand, here is how it would be indexed:
base_type_index
base_type_object: {
  base_type_property: "bto_prop_value_1"
},
child_type1_object: {
  base_type_property: "ct1o_prop_value_1"
},
child_type2_object: {
  base_type_property: "ct2o_prop_value_1"
},
child_type3_object: {
  base_type_property: "ct3o_prop_value_1"
}
child_type1_index
child_type1_object: {
  child_type1_property: "ct1o_prop_value_2"
},
child_type3_object: {
  child_type1_property: "ct3o_prop_value_2"
}
I think the values for child_type2_index and child_type3_index are apparent, so I won't list them, to keep the post at a reasonable length.
Does this make sense and is there a better way of indexing for my use case?

Match keys with sibling object JSONATA

I have a JSON object with the structure below. When looping over key_two I want to create a new object that I will return. The returned object should contain a title with the value of key_one's name where the id in key_one matches the node currently being looped over in key_two.
Both objects contain other keys that will also be included, but the first step I can't figure out is how to grab data from a sibling object while looping, and match it to the current value.
{
  "key_one": [
    {
      "name": "some_cool_title",
      "id": "value_one",
      ...
    }
  ],
  "key_two": [
    {
      "node": "value_one",
      ...
    }
  ]
}
This is a good example of a 'join' operation (in SQL terms). JSONata supports this in a path expression. See https://docs.jsonata.org/path-operators#-context-variable-binding
So in your example, you could write:
key_one@$k1.key_two[node = $k1.id].{
  "title": $k1.name
}
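Given the sample input above, this should evaluate to:
{
  "title": "some_cool_title"
}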
You can then add extra fields into the resulting object by referencing items from either of the original objects. E.g.:
key_one@$k1.key_two[node = $k1.id].{
  "title": $k1.name,
  "other_one": $k1.other_data,
  "other_two": other_data
}
See https://try.jsonata.org/--2aRZvSL
I seem to have found a solution for this.
[key_two].$filter($$.key_one, function($v, $k) {
  $v.id = node
}).{"title": name ? name : id}
Gives:
[
  {
    "title": "value_one"
  },
  {
    "title": "value_two"
  },
  {
    "title": "value_three"
  }
]
Leaving this here in case someone has a similar issue in the future.

Set hint for update to use indexes

As per the documentation, it is possible to provide a hint to an update.
Now I'm using the Java MongoDB client and MongoCollection to do an update.
For this update I cannot find any way to provide a hint about which index to use.
I can see in the logs that the update is doing a COLSCAN, so I want to provide the hint.
this.collection.updateOne(
  or(eq("_id", "someId"), eq("array1.id", "someId")),
  combine(
    addToSet("array1", new Document()),
    addToSet("array2", new Document())
  )
);
Indexes are available for both _id and array1.id.
I found out from the logs that the query for this update is using a COLSCAN to find the document.
Can anyone point me in the right direction?
I am using AWS DocumentDB, which is compatible with MongoDB v3.6.
Let's consider a document with an array of embedded documents:
{ _id: 1, arr: [ { fld1: "x", fld2: 43 }, { fld1: "r", fld2: 80 } ] }
I created an index on arr.fld1; this is a multikey index (indexes on array fields are known as multikey indexes). The _id field already has the default unique index.
The following query uses the indexes on both fields - arr.fld1 and the _id. The query plan generated using explain() on the query showed an index scan (IXSCAN) for both fields.
db.test.find( { $or: [ { _id: 2 }, { "arr.fld1": "m" } ] } )
Now the same query filter is used for the update operation as well. So, here is the update where we add two sub-documents to the array:
db.test.update(
  { $or: [ { _id: 1 }, { "arr.fld1": "m" } ] },
  { $addToSet: { arr: { $each: [ { "fld1": "xx" }, { "fld1": "zz" } ] } } }
)
Again, the query plan showed that both indexes are used for the update operation. Note that I have not used a hint for either the find or the update query.
I cannot come to a conclusion about what the issue is with your code or indexes (see note 1 below).
NOTES:
1. The above observations are based on queries run on a MongoDB server version 4.0 (as far as I know, they are valid for version 3.6 as well).
2. The explain method is used as follows for find and update: db.collection.explain().find( ... ) and db.collection.explain().update( ... ). Note that you cannot generate a query plan using explain() for the updateOne method; it is only available for the findAndModify() and update() methods. You can get a list of methods that can generate a query plan by running db.collection.explain().help() in the mongo shell.
Note on Java code:
The Java code to update an array field by adding multiple sub-documents is as follows:
collection.updateOne(
  or(eq("_id", 1), eq("arr.fld1", "m")),
  addEachToSet("arr", Arrays.asList(new Document("fld1", "value-1"), new Document("fld1", "value-2")))
);
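For completeness: newer Java drivers do let you pass an index hint for updates via UpdateOptions. This is a sketch assuming Java driver 3.12+ and MongoDB server 4.2+; it will not help on DocumentDB's MongoDB 3.6 compatibility:
import com.mongodb.client.model.UpdateOptions;
// plus the usual static imports: com.mongodb.client.model.Filters.*, com.mongodb.client.model.Updates.*

collection.updateOne(
  or(eq("_id", "someId"), eq("array1.id", "someId")),
  combine(
    addToSet("array1", new Document()),
    addToSet("array2", new Document())
  ),
  // hint with the index key pattern; hintString(...) takes the index name instead
  new UpdateOptions().hint(new Document("array1.id", 1))
);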

Having values as keys vs. having them as a nested object array in Elasticsearch

Currently, I have an Elasticsearch index with a field that has sub-fields, say A, B, C, as below:
"myfield":{
"A":{
"name":"A",
"prop1":{
"sub-prop1":1,
"sub-prop2":2
},
"prop2":{}
},
"B":{
"name":"B",
"prop1":{
"sub-prop1":3,
"sub-prop2":8,
"sub-prop3":4,
"sub-prop4":7,
},
"prop2":{}
},
"C":{}
}
As can be seen, the structure of the A and B fields is the same, but the sub-props under prop1 can be dynamic, meaning the mapping might change based on the documents added. That is not an issue in itself, since A and B exist as separate keys. However, it causes another problem: as documents keep being added, dynamic mapping means that sub-fields like A, B, C, D, ... keep getting added to the mapping, which in turn might cause the mapping to exceed index.mapping.total_fields.limit. To avoid that, I am planning to make "myfield" an array of objects in the mapping, so that A, B, C, ... are stored as array elements instead of being added to the mapping as new fields.
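For reference, the limit itself is a dynamic index setting and can be raised, though that only postpones the problem (a sketch; 2000 is an arbitrary value):
PUT my_index/_settings
{
  "index.mapping.total_fields.limit": 2000
}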
The question is: is this a feasible solution, and how would I search for, say, "myfield.A.prop1.sub-prop1" >= 3?
The new document structure looks something like:
"myfield":[
{
"name":"A",
"prop1":{
"sub-prop1":1,
"sub-prop2":2
},
"prop2":{}
},
{
"name":"B",
"prop1":{
"sub-prop1":3,
"sub-prop2":8,
"sub-prop3":4,
"sub-prop4":7,
},
"prop2":{}
},
{}
]
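For what it's worth, a search equivalent to "myfield.A.prop1.sub-prop1" >= 3 could then look like this. A minimal sketch, assuming myfield is mapped as a nested field and name as a keyword (field and value names are taken from the example above):
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "myfield",
      "query": {
        "bool": {
          "must": [
            { "term": { "myfield.name": "A" } },
            { "range": { "myfield.prop1.sub-prop1": { "gte": 3 } } }
          ]
        }
      }
    }
  }
}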

Index main object and sub-objects, and do a search on sub-objects (that returns sub-objects)

I have an object like this (simplified here). Each strain has many chromosomes, which have many loci, which have many features, which have many products, ... Here I just put one of each.
The structure in JSON is:
{
  "name": "my strain",
  "public": false,
  "authorized_users": [1, 23, 51],
  "chromosomes": [
    {
      "name": "C1",
      "locus": [
        {
          "name": "locus1",
          "features": [
            {
              "name": "feature1",
              "products": [
                {
                  "name": "product1"
                  //...
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
I want to add this object to Elasticsearch; for the moment I have indexed the objects separately: loci, features and products. Searching works (I type a keyword and look in the names of loci, features and products), but I need to duplicate data like public and authorized_users in each sub-object.
Can I index the whole object in Elasticsearch and still do a search at each level (locus, features, products) and get those individually (not the whole strain object)?
Yes, you can search at any level (i.e., with a query on a path like "chromosomes.locus.name").
But as you have arrays at each level, you will have to use nested objects (and nested queries) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
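For illustration, a locus-level search might look like this. A sketch assuming chromosomes and chromosomes.locus are mapped as nested; the index name strains is hypothetical:
GET strains/_search
{
  "query": {
    "nested": {
      "path": "chromosomes",
      "query": {
        "nested": {
          "path": "chromosomes.locus",
          "query": {
            "match": { "chromosomes.locus.name": "locus1" }
          }
        }
      }
    }
  }
}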
For your last question: no, you cannot get sub-objects individually; Elasticsearch returns the whole JSON _source of the matching document.
If you want only data from the sub-objects, you will have to use nested aggregations.
