Index a main object with sub-objects, and search on sub-objects (returning the sub-objects) - Elasticsearch

I have an object like this (simplified here). Each strain has many chromosomes, each chromosome has many loci, each locus has many features, each feature has many products, and so on. Here I show just one of each.
The structure in JSON is:
{
  "name": "my strain",
  "public": false,
  "authorized_users": [1, 23, 51],
  "chromosomes": [
    {
      "name": "C1",
      "locus": [
        {
          "name": "locus1",
          "features": [
            {
              "name": "feature1",
              "products": [
                {
                  "name": "product1"
                  //...
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
I want to add this object to Elasticsearch. For the moment I've added the objects separately: locus, features and products. Searching works (I want to type a keyword and look for it in the names of loci, features and products), but I have to duplicate data such as public and authorized_users in each sub-object.
Can I index the whole object in Elasticsearch and still search at the locus, feature and product levels? And can I get those sub-objects individually (without returning the whole strain object)?

Yes, you can search at any level (i.e. with a query on a field such as "chromosomes.locus.name").
But since you have arrays at each level, you will have to use nested objects (and nested queries) to get exactly what you want, which is a bit more complex:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.3/query-dsl-nested-query.html
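As a rough sketch, assuming Elasticsearch 7.x and that every array level (chromosomes, locus, features, products) should be its own nested field, the mapping could be generated like this. The code only builds the JSON request body; the types chosen for name, public and authorized_users are assumptions, not something stated in the question. Note that public and authorized_users appear once, at the top level, instead of being copied into every sub-object.

```javascript
// Builds one nested level: a searchable "name" plus, optionally, one child array.
function nestedLevel(childName, child) {
  const properties = { name: { type: "text" } };
  if (childName) {
    properties[childName] = child;
  }
  return { type: "nested", properties };
}

// Mapping for the whole strain document, assuming every array level is "nested".
// public and authorized_users live only at the top level (no duplication).
function buildStrainMapping() {
  const products = nestedLevel();
  const features = nestedLevel("products", products);
  const locus = nestedLevel("features", features);
  const chromosomes = nestedLevel("locus", locus);
  return {
    mappings: {
      properties: {
        name: { type: "text" },
        public: { type: "boolean" }, // assumed type
        authorized_users: { type: "integer" }, // assumed type
        chromosomes,
      },
    },
  };
}

console.log(JSON.stringify(buildStrainMapping(), null, 2));
```

The printed object would be the body of a create-index request (for example `PUT /strains`, where the index name is hypothetical). Keep in mind that each nested object is indexed as a separate hidden document, so strains with many products multiply the number of documents Elasticsearch stores.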
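For your last question: a hit is always the whole top-level document, so by default Elasticsearch returns the entire JSON source of the strain, not the sub-objects individually. You can hide that source with _source filtering and add inner_hits to the nested query to see which nested objects matched.
If you want only data from the sub-objects, you can also use nested aggregations.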
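A hedged sketch of the query side, assuming Elasticsearch 7.x with chromosomes and chromosomes.locus mapped as nested fields. The helper name buildNestedNameSearch is hypothetical. It wraps a match on the deepest level's name in one nested clause per level (the way multi-level nested queries are written in the Elasticsearch docs), requests inner_hits on the innermost clause, and optionally filters on the top-level public / authorized_users fields, so those no longer need to be duplicated.

```javascript
// Hypothetical helper: search the "name" field of one nested level,
// e.g. levels = ["chromosomes", "locus"], and return the matching
// sub-objects through inner_hits. If userId is given, keep only strains
// that are public or list that user in authorized_users.
function buildNestedNameSearch(levels, text, userId) {
  const fullPath = levels.join(".");
  // Innermost clause: full-text match on the deepest level's name field.
  let query = {
    nested: {
      path: fullPath,
      query: { match: { [`${fullPath}.name`]: text } },
      inner_hits: {}, // ask for the matching sub-objects themselves
    },
  };
  // Wrap it in one nested clause per enclosing level, from the inside out.
  for (let i = levels.length - 2; i >= 0; i--) {
    query = { nested: { path: levels.slice(0, i + 1).join("."), query } };
  }
  // Access control reads the top-level fields, stored once per strain.
  if (userId !== undefined) {
    query = {
      bool: {
        must: [query],
        filter: [
          {
            bool: {
              should: [
                { term: { public: true } },
                { term: { authorized_users: userId } },
              ],
              minimum_should_match: 1,
            },
          },
        ],
      },
    };
  }
  // Skip the whole strain source; the inner hits carry the sub-objects.
  return { _source: false, query };
}

const body = buildNestedNameSearch(["chromosomes", "locus"], "locus1", 23);
console.log(JSON.stringify(body, null, 2));
```

Each hit is still a strain document, but with `_source` disabled it mainly carries its `inner_hits` entry listing the matching loci.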

Related

ElasticSearch field with different types in one single index

We have a service that accepts multiple types of data, and we want to store that data in Elasticsearch so we can benefit from its search capabilities.
The data could be a string, a number, an object or an array of objects, as follows:
POST my-index/_doc/1
{
  "additionalData": [
    {
      "values": {
        "some-field": "some-value",
        "some-other-field": "some-value"
      }
    }
  ]
}
POST my-index/_doc/1
{
  "additionalData": [
    {
      "values": [12345, 9875]
    }
  ]
}
POST my-index/_doc/1
{
  "additionalData": [
    {
      "values": "Some text"
    }
  ]
}
Is there a way to store that in Elasticsearch, or is it better to store it in another NoSQL database such as MongoDB?
PS: we are using ES 7.x and would like to keep using ES.
If you don't need to search on those values, it's possible with a disabled field (i.e. a field mapped with "enabled": false, which Elasticsearch neither parses nor indexes; the value is still kept in _source and returned with the document).
However, if you want to search on those values, it's not possible: each field must have a single type (object, numeric, text, etc.), and you can then only store values of that type in the field.
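To make the type conflict concrete, the sketch below classifies the three example payloads the way a field mapping would have to see them, and then shows a disabled-field mapping for ES 7.x. jsonKind and disabledMapping are illustrative names, not part of any Elasticsearch API.

```javascript
// Returns the JSON "kind" that a mapping would have to accept for a value.
// Elasticsearch has no separate array type: an array is mapped by its elements.
function jsonKind(value) {
  if (Array.isArray(value)) {
    return value.length > 0 ? jsonKind(value[0]) : "empty";
  }
  if (value === null) {
    return "null";
  }
  return typeof value === "object" ? "object" : typeof value;
}

// The three example payloads from the question.
const docs = [
  { additionalData: [{ values: { "some-field": "some-value", "some-other-field": "some-value" } }] },
  { additionalData: [{ values: [12345, 9875] }] },
  { additionalData: [{ values: "Some text" }] },
];

// One field name, three incompatible kinds.
console.log(docs.map((d) => jsonKind(d.additionalData[0].values))); // [ 'object', 'number', 'string' ]

// Mapping sketch: additionalData is kept in _source but never parsed or indexed,
// so any of the three payloads is accepted (and none of them is searchable).
const disabledMapping = {
  mappings: {
    properties: {
      additionalData: { type: "object", enabled: false },
    },
  },
};
console.log(JSON.stringify(disabledMapping));
```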

Match keys with sibling object JSONata

I have a JSON object with the structure below. When looping over key_two, I want to create and return a new object. The returned object should contain a title whose value is the name of the key_one entry whose id matches the node of the current key_two entry.
Both objects contain other keys that will also be included, but the first step I can't figure out is how to grab data from a sibling object while looping and match it to the current value.
{
  "key_one": [
    {
      "name": "some_cool_title",
      "id": "value_one",
      ...
    }
  ],
  "key_two": [
    {
      "node": "value_one",
      ...
    }
  ]
}
This is a good example of a 'join' operation (in SQL terms). JSONata supports this in a path expression. See https://docs.jsonata.org/path-operators#-context-variable-binding
So in your example, you could write:
key_one@$k1.key_two[node = $k1.id].{
  "title": $k1.name
}
You can then add extra fields into the resulting object by referencing items from either of the original objects. E.g.:
key_one@$k1.key_two[node = $k1.id].{
  "title": $k1.name,
  "other_one": $k1.other_data,
  "other_two": other_data
}
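For readers more at home in JavaScript, here is a plain-JS sketch of what the JSONata join above computes: bind each key_one item to $k1, keep the key_two items whose node equals $k1.id, and emit one object per match. joinTitles and the sample data are hypothetical.

```javascript
// Plain-JavaScript equivalent of the JSONata join:
// for every key_one entry (k1), keep the key_two entries whose node equals
// k1.id and build one result object per match.
function joinTitles(data) {
  const results = [];
  for (const k1 of data.key_one || []) {
    for (const k2 of data.key_two || []) {
      if (k2.node === k1.id) {
        results.push({ title: k1.name });
      }
    }
  }
  return results;
}

const input = {
  key_one: [
    { name: "some_cool_title", id: "value_one" },
    { name: "other_title", id: "value_two" },
  ],
  key_two: [{ node: "value_one" }, { node: "value_three" }],
};

console.log(joinTitles(input)); // [ { title: 'some_cool_title' } ]
```

One difference to keep in mind: JSONata unwraps a single-item result, so with exactly one match the JSONata expression returns the object itself rather than a one-element array.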
See https://try.jsonata.org/--2aRZvSL
I seem to have found a solution for this.
[key_two].$filter($$.key_one, function($v, $k){
  $v.id = node
}).{"title": name ? name : id}
Gives:
[
  {
    "title": "value_one"
  },
  {
    "title": "value_two"
  },
  {
    "title": "value_three"
  }
]
Leaving this here in case someone has a similar issue in the future.

How to use Elasticsearch nested queries by object key instead of object property

Following the Elasticsearch example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
  nested_objects: [               <== array
    { name: "x", value: 123 },
    { name: "y", value: 456 }     <== "name" property searchable
  ]
}
But what if I want the nested objects to be arranged in a key-value structure that gets updated with new objects, and I want to search by the KEY? For example:
{
  nested_objects: {               <== key-value, not array
    "x": { value: 123 },
    "y": { value: 456 },          <== how can I search by "x" and "y" keys?
    "..."                         <== more arbitrary keys are added now and then
  }
}
Thank you!
You can try to do this using the query_string query, like this:
GET my_index/_search
{
  "query": {
    "query_string": {
      "query": "nested_objects.\\*.value:123"
    }
  }
}
It will try to match the value field of any sub-field of nested_objects.
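The same request can be built programmatically; the main trap is the backslash before the field-name wildcard, which has to survive both the JavaScript string and the JSON encoding. keyWildcardSearch is a hypothetical helper, and this sketch assumes nested_objects is mapped as a regular object field rather than as a nested type.

```javascript
// Hypothetical helper: builds the query_string search from the answer for an
// arbitrary parent object and leaf field.
function keyWildcardSearch(parent, leaf, value) {
  return {
    query: {
      query_string: {
        // "\\*" in this template literal is one backslash plus "*", i.e. the
        // escaped field-name wildcard; JSON.stringify doubles the backslash again.
        query: `${parent}.\\*.${leaf}:${value}`,
      },
    },
  };
}

const search = keyWildcardSearch("nested_objects", "value", 123);
console.log(JSON.stringify(search));
// {"query":{"query_string":{"query":"nested_objects.\\*.value:123"}}}
```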
OK, so my final solution, after some ES insights, is as follows:
1. Because my object keys "x", "y", ... are arbitrary, every new key adds new fields to my index mapping, which quickly makes a mess of it. Generally speaking, it's not good ES practice to design this kind of structure, so for the sake of the mapping I resort to the structure described in the "Weighted tags" article:
{ "name":"x", "value":123 },
{ "name":"y", "value":456 },
...
This means that, when it's time to update the value of the sub-object named "x", I have a harder (and slower) time finding it: I first need to fetch the entire top-level document, traverse the sub-objects until I find the one named "x", update its value, and then write the entire sub-object array back into ES.
The above approach also causes concurrency issues if multiple processes update the same document. ES has optimistic concurrency control (in recent versions, the if_seq_no and if_primary_term parameters) that I can use to detect conflicts and retry when needed; alternatively, I can queue updates and handle them serially.
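The read-modify-write flow described in point 1, together with the optimistic-concurrency retry, can be sketched as follows. The store is a HYPOTHETICAL in-memory stand-in for Elasticsearch that models only a sequence number (ES 7.x checks if_seq_no together with if_primary_term), and all function names are illustrative.

```javascript
// HYPOTHETICAL in-memory stand-in for Elasticsearch's optimistic concurrency
// control: put() with ifSeqNo fails if the document changed since it was read.
function createStore() {
  const docs = new Map();
  return {
    get(id) {
      const doc = docs.get(id);
      // Deep copy so callers cannot modify the stored version in place.
      return { source: JSON.parse(JSON.stringify(doc.source)), seqNo: doc.seqNo };
    },
    put(id, source, ifSeqNo) {
      const current = docs.get(id);
      if (current && ifSeqNo !== undefined && current.seqNo !== ifSeqNo) {
        throw new Error("version_conflict");
      }
      docs.set(id, { source, seqNo: current ? current.seqNo + 1 : 0 });
    },
  };
}

// Converts { x: { value: 123 }, ... } into [{ name: "x", value: 123 }, ...].
function toNameValueArray(obj) {
  return Object.entries(obj).map(([name, entry]) => ({ name, value: entry.value }));
}

// Finds the sub-object called `name` and sets its value (appending it if absent).
function setValueByName(tags, name, value) {
  const tag = tags.find((t) => t.name === name);
  if (tag) {
    tag.value = value;
  } else {
    tags.push({ name, value });
  }
  return tags;
}

// Read the whole document, change one tag, write the whole array back,
// and retry from a fresh read if another writer got there first.
function updateTag(store, id, name, value, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const { source, seqNo } = store.get(id);
    setValueByName(source.nested_objects, name, value);
    try {
      store.put(id, source, seqNo);
      return true;
    } catch (err) {
      if (err.message !== "version_conflict") throw err;
    }
  }
  return false;
}

const store = createStore();
store.put("1", { nested_objects: toNameValueArray({ x: { value: 123 }, y: { value: 456 } }) });
updateTag(store, "1", "x", 789);
console.log(JSON.stringify(store.get("1")));
```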

Count Unique Objects

A document in my index looks like this:
"_source": {
  "ProductName": "Random Product Name",
  "Views": {
    "Washington": [
      { "4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z" },
      { "4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z" },
      { "4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z" }
    ]
  }
}
I am trying to count the number of unique objects in Views.Washington.
In this case the result would be 2, since two of the objects (the first and third in the array) have the same key.
Obviously, my first thought was to use aggregations, but I am not sure how to use them with nested objects like these.
Can this be done with normal aggregations?
Will I need to use a script?
Yes, this can be done with aggregations, using a nested aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-nested-aggregation.html
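As a sketch of both routes: countUniqueKeys gives the exact answer for one document on the client, and buildUniqueVisitorsAgg shows a nested aggregation feeding a cardinality aggregation. The aggregation assumes a HYPOTHETICAL restructuring of Views.Washington into nested objects with explicit visitor_id and viewed_at fields (visitor_id mapped as keyword), because in the current layout the visitor IDs are field names, which a field-based aggregation does not count directly.

```javascript
// Exact, client-side count of distinct keys across the Views.Washington array.
function countUniqueKeys(entries) {
  const keys = new Set();
  for (const entry of entries) {
    for (const key of Object.keys(entry)) {
      keys.add(key);
    }
  }
  return keys.size;
}

const washington = [
  { "4nce5bbszjfppltvc": "2018-04-07T18:25:16.160Z" },
  { "4nce5bba8jfpowm4i": "2018-04-07T18:05:39.714Z" },
  { "4nce5bbszjfppltvc": "2018-04-07T18:36:23.928Z" },
];
console.log(countUniqueKeys(washington)); // 2

// HYPOTHETICAL restructured layout: Views.Washington mapped as "nested", each
// element { visitor_id, viewed_at }, with visitor_id mapped as keyword.
function buildUniqueVisitorsAgg() {
  return {
    size: 0,
    aggs: {
      washington: {
        nested: { path: "Views.Washington" },
        aggs: {
          // cardinality is approximate (HyperLogLog++), close to exact for small counts.
          unique_visitors: { cardinality: { field: "Views.Washington.visitor_id" } },
        },
      },
    },
  };
}

console.log(JSON.stringify(buildUniqueVisitorsAgg()));
```

Note that the aggregation counts distinct visitors across every document the query matches, so restrict the query to one product if you want a per-product figure.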

RethinkDB: Including a subdocument for nested doc

I am performing an operation, and it works, but I want to know if there is a better or more efficient way to do what I want.
I have an object in my db that looks like this:
{
  "id": "testId",
  "name": "testName",
  "products": [
    {
      "name": "product1",
      "info": "sampleInfo",
      "templateIds": [
        "asdf-1",
        "asdf-2"
      ]
    },
    {
      "name": "product2",
      "info": "sampleInfo",
      "templateIds": [
        "asdf-1",
        "asdf-2"
      ]
    }
  ]
}
As you can see, each "product" in the "products" array has a sub-array of templateIds. These match templates stored in another table. What I want to do is create a query that merges those templates onto each product object before I send it all back.
Currently I am doing this with sub-merges:
r.table('suites').get('testId').merge(function(suite){
  return {
    products: suite('products').merge(function(product){
      return {
        templates: r.expr(product('templateIds')).map(function(id) {
          return r.table('templates').get(id)
        })
      }
    })
  }
})
My question is: is there a more efficient way to do this? Or is there a completely different way of thinking I should employ to do this?
Thanks guys!
That looks right to me. The only thing I can think of is that r.table('templates').getAll(r.args(product('templateIds'))) is shorter than product('templateIds').map(function(id){ return r.table('templates').get(id); }) and might well be faster.
EDIT: If you have a small number of templates, another way to make this run faster would be to do the substitution in the client instead and cache the retrieved templates by ID. RethinkDB has to do a separate read for each template ID, even if it sees the same one over and over again, because it cannot tell whether caching those values is safe.
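A minimal sketch of that client-side substitution, in the same JavaScript as the question. fetchTemplate is a HYPOTHETICAL synchronous loader standing in for the real RethinkDB call; the Map keyed by template ID ensures each ID is fetched only once, even when several products share it.

```javascript
// Substitute templates on the client, caching each template by ID.
// HYPOTHETICAL: fetchTemplate(id) stands in for whatever loads one template.
function attachTemplates(suite, fetchTemplate, cache = new Map()) {
  const products = suite.products.map((product) => ({
    ...product,
    templates: product.templateIds.map((id) => {
      if (!cache.has(id)) {
        cache.set(id, fetchTemplate(id)); // first time this ID is seen
      }
      return cache.get(id);
    }),
  }));
  return { ...suite, products };
}

const suite = {
  id: "testId",
  name: "testName",
  products: [
    { name: "product1", info: "sampleInfo", templateIds: ["asdf-1", "asdf-2"] },
    { name: "product2", info: "sampleInfo", templateIds: ["asdf-1", "asdf-2"] },
  ],
};

let fetches = 0;
// Fake loader for the demo: returns a dummy template object per ID.
const result = attachTemplates(suite, (id) => {
  fetches += 1;
  return { id };
});
console.log(fetches); // 2
```

With the real, asynchronous driver you would cache the promise for each ID instead of the template itself, so concurrent lookups of the same ID still share one read.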
