Indexing strategy for hierarchical structures on ElasticSearch

Let's say I have hierarchical types, as in the example below:
base_type
    child_type1
        child_type3
    child_type2
child_type1 and child_type2 inherit metadata properties from base_type. child_type3 inherits all properties from both child_type1 and base_type.
To add to the example, here are several objects with their properties:
base_type_object: {
    base_type_property: "bto_prop_value_1"
},
child_type1_object: {
    base_type_property: "ct1o_prop_value_1",
    child_type1_property: "ct1o_prop_value_2"
},
child_type2_object: {
    base_type_property: "ct2o_prop_value_1",
    child_type2_property: "ct2o_prop_value_2"
},
child_type3_object: {
    base_type_property: "ct3o_prop_value_1",
    child_type1_property: "ct3o_prop_value_2",
    child_type3_property: "ct3o_prop_value_3"
}
When I query for base_type_object, I expect to search base_type_property values in each and every one of the child types as well. Likewise, if I query for child_type1_property, I expect to search through all types that have such property, meaning objects of type child_type1 and child_type3.
I see that mapping types have been removed. What I'm wondering is whether this use case warrants indexing under separate indices.
My current line of thinking using example above would be to create 4 indices: base_type_index, child_type1_index, child_type2_index and child_type3_index. Each index would only have mappings of their own properties, so base_type_index would only have base_type_property, child_type1_index would have child_type1_property etc. Indexing child_type1_object would create an entry on both base_type_index and child_type1_index indices.
This seems convenient because, as far as I can see, it's possible to search multiple indices using GET /my-index-000001,my-index-000002/_search. So in theory I would just need to list the hierarchy of my types in the GET request: GET /base_type_index,child_type1_index/_search.
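As a rough illustration of listing the hierarchy in the request, the sketch below derives the comma-separated index list from a parent map. The `PARENT` table, the helper names, and the `<type>_index` naming rule are assumptions taken from the example above, not part of any Elasticsearch API.

```python
# Hypothetical parent map mirroring the hierarchy in the question.
PARENT = {
    "base_type": None,
    "child_type1": "base_type",
    "child_type2": "base_type",
    "child_type3": "child_type1",
}


def type_chain(type_name):
    """Return the type and all of its ancestors, root first."""
    chain = []
    current = type_name
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def search_path(type_name):
    """Build the multi-index search path for one type's full hierarchy."""
    indices = ",".join(f"{t}_index" for t in type_chain(type_name))
    return f"/{indices}/_search"


print(search_path("child_type1"))
print(search_path("child_type3"))
```

The resulting path only selects which indices are searched; the query body would still be sent separately.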
To make it easier to understand, here is how it would be indexed:
base_type_index
    base_type_object: {
        base_type_property: "bto_prop_value_1"
    },
    child_type1_object: {
        base_type_property: "ct1o_prop_value_1"
    },
    child_type2_object: {
        base_type_property: "ct2o_prop_value_1"
    },
    child_type3_object: {
        base_type_property: "ct3o_prop_value_1"
    }
child_type1_index
    child_type1_object: {
        child_type1_property: "ct1o_prop_value_2"
    },
    child_type3_object: {
        child_type1_property: "ct3o_prop_value_2"
    }
I think values for child_type2_index and child_type3_index are apparent, so I won't list them in order to keep the post length at a more reasonable level.
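To make the proposed layout concrete, here is a minimal sketch of how one object could be split into per-index parts before indexing. The `PROPERTY_INDEX` table and `split_by_index` helper are hypothetical names for illustration; the sketch also leaves open how the parts would later be correlated (for example through a shared document id), since that is not specified here.

```python
# Hypothetical ownership table: which index holds which property.
PROPERTY_INDEX = {
    "base_type_property": "base_type_index",
    "child_type1_property": "child_type1_index",
    "child_type2_property": "child_type2_index",
    "child_type3_property": "child_type3_index",
}


def split_by_index(doc):
    """Group a document's properties by the index that owns them."""
    parts = {}
    for prop, value in doc.items():
        index = PROPERTY_INDEX[prop]
        parts.setdefault(index, {})[prop] = value
    return parts


child_type3_object = {
    "base_type_property": "ct3o_prop_value_1",
    "child_type1_property": "ct3o_prop_value_2",
    "child_type3_property": "ct3o_prop_value_3",
}

# One partial document per index in the type's hierarchy.
parts = split_by_index(child_type3_object)
for index_name, partial_doc in parts.items():
    print(index_name, partial_doc)
```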
Does this make sense and is there a better way of indexing for my use case?

Related

Having values as keys VS having them as a nested object array in ElasticSearch

Currently, I have an Elasticsearch index with a field that has subfields, say A, B, C, as below:
"myfield": {
    "A": {
        "name": "A",
        "prop1": {
            "sub-prop1": 1,
            "sub-prop2": 2
        },
        "prop2": {}
    },
    "B": {
        "name": "B",
        "prop1": {
            "sub-prop1": 3,
            "sub-prop2": 8,
            "sub-prop3": 4,
            "sub-prop4": 7
        },
        "prop2": {}
    },
    "C": {}
}
As can be seen, the A and B fields have the same structure, but the sub-props under prop1 can be dynamic: the mapping might change depending on the documents added. That in itself is not an issue, as A and B exist as separate keys. However, it causes another problem. As new documents keep being added, dynamic mapping may keep adding new sub-props, or new sub-fields like A, B, C, D and so on, to the mapping, which might cause the mapping to exceed index.mapping.total_fields.limit. To avoid that, I am planning to make "myfield" and "prop1" arrays of objects in the mapping instead, so that A, B, C... are stored as array elements rather than being added to the mapping as new fields.
The question is: is this a feasible solution, and how would I search for, say, "myfield.A.prop1.sub-prop1" >= 3?
The new mapping looks something like:
"myfield": [
    {
        "name": "A",
        "prop1": {
            "sub-prop1": 1,
            "sub-prop2": 2
        },
        "prop2": {}
    },
    {
        "name": "B",
        "prop1": {
            "sub-prop1": 3,
            "sub-prop2": 8,
            "sub-prop3": 4,
            "sub-prop4": 7
        },
        "prop2": {}
    },
    {}
]
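One way the search on the array form might be expressed is with a nested query that matches the element's name and the range on the same array element. The sketch below only builds the request body; it assumes that "myfield" is mapped with type nested and that "myfield.name" is a keyword field, neither of which is stated above, and `nested_range_query` is a hypothetical helper name.

```python
import json


def nested_range_query(name, sub_prop, minimum):
    """Build a search body matching one array element by name and value."""
    return {
        "query": {
            "nested": {
                "path": "myfield",
                "query": {
                    "bool": {
                        # Both clauses must match within the same element.
                        "filter": [
                            {"term": {"myfield.name": name}},
                            {"range": {f"myfield.prop1.{sub_prop}": {"gte": minimum}}},
                        ]
                    }
                },
            }
        }
    }


# Equivalent of "myfield.A.prop1.sub-prop1" >= 3 on the array layout.
body = nested_range_query("A", "sub-prop1", 3)
print(json.dumps(body, indent=2))
```

Note that in this layout the sub-props under prop1 are still object keys, so they would presumably still count toward the field limit unless prop1 is restructured as well.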

How to use Elasticsearch nested queries by object key instead of object property

Following the Elastic Search example in this article for a nested query, I noticed that it assumes the nested objects are inside an ARRAY and that queries are based on some object PROPERTY:
{
    nested_objects: [   <== array
        { name: "x", value: 123 },
        { name: "y", value: 456 }   <== "name" property searchable
    ]
}
But what if I want the nested objects arranged in a key-value structure that gets updated with new objects, and I want to search by the KEY? Example:
{
    nested_objects: {   <== key-value, not array
        "x": { value: 123 },
        "y": { value: 456 },   <== how can I search by "x" and "y" keys?
        "..."   <=== more arbitrary keys are added now and then
    }
}
Thank you!
You can try to do this using the query_string query, like this:
GET my_index/_search
{
    "query": {
        "query_string": {
            "query": "nested_objects.\\*.value:123"
        }
    }
}
It will try to match the value field of any sub-field of nested_objects.
Ok, so my final solution after some ES insights is as follows:
1. The fact that my object keys "x", "y", ... are arbitrary causes a mess in my index mapping. So generally speaking, it's not a good ES practice to plan this kind of structure... So for the sake of mappings, I resort to the structure described in the "Weighted tags" article:
{ "name":"x", "value":123 },
{ "name":"y", "value":456 },
...
This means that, when it's time to update the value of the sub-object named "x", I have a harder (and slower) time finding it: I first need to fetch the entire top-level object, traverse the sub-objects until I find the one named "x", and update its value. Then I write the entire sub-object array back to ES.
The above approach also causes concurrency issues if multiple processes update the same index. ES has optimistic locking that I can use to retry when needed, or I can queue updates and handle them serially.
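The read-modify-write cycle with an optimistic retry could be sketched roughly as follows. `FakeStore` is an in-memory stand-in invented for illustration, not an Elasticsearch client; a real implementation would rely on the version information that ES returns with each document. Appending a missing entry is also an assumption of this sketch.

```python
import copy


class VersionConflict(Exception):
    """Raised by the stand-in store when a write uses a stale version."""


class FakeStore:
    """In-memory stand-in for one document with a version counter."""

    def __init__(self, items):
        self.items = items
        self.version = 1

    def get(self):
        # Return a copy so callers cannot mutate the stored document.
        return copy.deepcopy(self.items), self.version

    def put(self, items, expected_version):
        if expected_version != self.version:
            raise VersionConflict()
        self.items = items
        self.version += 1


def set_named_value(items, name, value):
    """Update the entry called `name`, or append it if it is missing."""
    for item in items:
        if item["name"] == name:
            item["value"] = value
            return items
    items.append({"name": name, "value": value})
    return items


def update_with_retry(store, name, value, attempts=3):
    """Read, modify and write back, retrying on version conflicts."""
    for _ in range(attempts):
        items, version = store.get()
        try:
            store.put(set_named_value(items, name, value), version)
            return True
        except VersionConflict:
            continue
    return False


store = FakeStore([{"name": "x", "value": 123}, {"name": "y", "value": 456}])
update_with_retry(store, "x", 999)
print(store.items, store.version)
```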

Elasticsearch deep tree model

I'm researching database tools, and I'm not quite sure how Elasticsearch can cope with my requirements.
I have a tree data structure: a family tree.
The root is the first man, Adam, followed by his children, their children, and so on.
Elements look like this (ignore marriage relations; this data is just to convey the idea):
{
    id: 1,
    name: "Adam",
    parentId: 0
}, {
    id: 2,
    name: "Cain",
    parentId: 1
}, {
    id: 3,
    name: "Abel",
    parentId: 1
}, {
    id: 4,
    name: "johnny(Cain junior)",
    parentId: 2
}, … {
    id: 12324568,
    name: "Cain b",
    parentId: 1434
}
Queries I'd like to execute:
1. Full-text search on the element name. The response should include the documents and the path to them. For example, searching for 'Cain' should return:
   a. Adam/Cain
   b. ../David/Danny/Cain b
2. CRUD a person by id (ids are unique).
3. Get a family tree by id: respond with a hierarchical tree (nested JSON) with 'id' as the root.
The tree is about ~20-30 levels deep, with up to 10,000 elements.
Finally, my questions:
1. Can Elasticsearch provide this functionality?
2. Should I use the parent/child scheme?
3. How should the index mapping look?
To answer your questions:
3) Your index mapping could look something like this:
{
    "mappings": {
        "my_index": {
            "properties": {
                "id": {
                    "type": "integer" <-- numeric fields can be used for aggregations as-is; "fielddata" applies only to text fields
                },
                "parentId": {
                    "type": "integer"
                },
                "name": {
                    "type": "text" <-- can be text/keyword depending on your requirement
                }
            }
        }
    }
}
2) I would suggest using the parent-child mapping, so that you can have a one-to-many relationship. Elasticsearch maintains a map of how parents correspond with their children, and query-time joins are fast because of this mapping. You could read up on this SO to see how parent-child mapping benchmarks against nested.
1) You can always do a full-text search as long as the field is mapped as text. This should help you identify the difference between using the text type and the keyword type. You can add a single document to your index, or use a bulk request to add multiple documents. The same goes for the other CRUD operations. I'm still not sure how the hierarchical tree would come back when you request documents by a parent id.
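On the open point about paths and the hierarchical tree, one option is to assemble them on the application side from the flat documents. The sketch below assumes all documents have already been fetched into a list (plausible at up to 10,000 elements); `path_to` and `subtree` are hypothetical helper names, and nothing here is an Elasticsearch feature.

```python
from collections import defaultdict

# Flat documents as they might come back from the index.
people = [
    {"id": 1, "name": "Adam", "parentId": 0},
    {"id": 2, "name": "Cain", "parentId": 1},
    {"id": 3, "name": "Abel", "parentId": 1},
    {"id": 4, "name": "johnny(Cain junior)", "parentId": 2},
]

by_id = {p["id"]: p for p in people}
children_of = defaultdict(list)
for p in people:
    children_of[p["parentId"]].append(p["id"])


def path_to(person_id):
    """Return the slash-separated names from the root down to a person."""
    names = []
    current = by_id.get(person_id)
    while current is not None:
        names.append(current["name"])
        current = by_id.get(current["parentId"])
    return "/".join(reversed(names))


def subtree(root_id):
    """Return a nested dict for the person and all of their descendants."""
    node = by_id[root_id]
    return {
        "id": node["id"],
        "name": node["name"],
        "children": [subtree(child_id) for child_id in children_of[root_id]],
    }


print(path_to(4))
print(subtree(2))
```

The recursion depth equals the tree depth, so 20-30 levels is well within Python's default limit.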
Hope this helps!

How can I perform an Elasticsearch Multisearch with only suggesters?

I need to return suggestions from 4 separate suggesters, across two separate indices.
I am currently doing this by sending two separate requests to Elasticsearch (one for each index) and combining the results in my application. Obviously this does not seem ideal when the Multisearch API is available.
From playing with the Multisearch API I am able to combine these suggestion requests into one and it correctly retrieves results from all 4 completion suggesters from both indexes.
However, it also automatically performs a match_all query on the chosen indices. I can of course minimize the impact of this by setting searchType to count but the results are worse than the two separate curl requests.
It seems that no matter what I try I cannot prevent the Multisearch API from performing some sort of query over each index.
e.g.
{
    index: 'users',
    type: 'user'
},
{
    suggest: {
        users_suggest: {
            text: term,
            completion: {
                size: 5,
                field: 'users_suggest'
            }
        }
    }
},
{
    index: 'photos',
    type: 'photo'
},
{
    suggest: {
        photos_suggest: {
            text: term,
            completion: {
                size: 5,
                field: 'photos_suggest'
            }
        }
    }
}
A request like the one above, which clearly omits the {query:{}} part of the multisearch request, still performs a match_all query and returns everything in the index.
Is there any way to prevent the query taking place so that I can simply get the combined completion suggesters results? Or is there another way to search multiple suggesters on multiple indices in one query?
Thanks in advance
Set size to 0 in every request, so that no hits are returned, only suggestions:
{
    "size": 0,
    "suggest": {}
}
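As a minimal sketch of what the combined request might look like, the helper below builds the newline-delimited body for a multisearch call with size set to 0 in each search. The index and suggester names come from the question; `build_msearch_body` is a hypothetical helper, and sending the body to the cluster is left out.

```python
import json


def build_msearch_body(term):
    """Return an NDJSON multisearch body with two suggest-only searches."""
    searches = [
        ("users", "users_suggest"),
        ("photos", "photos_suggest"),
    ]
    lines = []
    for index, field in searches:
        # Header line: which index this search targets.
        lines.append(json.dumps({"index": index}))
        # Body line: no hits requested, only the completion suggester.
        lines.append(json.dumps({
            "size": 0,
            "suggest": {
                field: {
                    "text": term,
                    "completion": {"size": 5, "field": field},
                }
            },
        }))
    # The multisearch body must end with a newline.
    return "\n".join(lines) + "\n"


msearch_body = build_msearch_body("jo")
print(msearch_body)
```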

NEST: How to query against multiple indices and handle different subclasses (document types)?

I'm playing around with ElasticSearch in combination with NEST in my C# project. My use case includes several indices with different document types, which I have queried separately so far. Now I want to implement a global search function that queries all existing indices and document types and scores the results properly.
So my question: how do I accomplish that using NEST?
Currently I'm using the function SetDefaultIndex, but how can I define multiple indices?
For better understanding, this is the query I want to build with NEST:
{
    "query": {
        "indices": {
            "indices": [
                "INDEX_A",
                "INDEX_B"
            ],
            "query": {
                "term": {
                    "FIELD": "VALUE"
                }
            },
            "no_match_query": {
                "term": {
                    "FIELD": "VALUE"
                }
            }
        }
    }
}
TIA
You can explicitly tell NEST to use multiple indices:
client.Search<MyObject>(s=>s
    .Indices(new [] {"Index_A", "Index_B"})
    ...
)
If you want to search across all indices:
client.Search<MyObject>(s=>s
    .AllIndices()
    ...
)
Or if you want to search one index (that's not the default index):
client.Search<MyObject>(s=>s
    .Index("Index_A")
    ...
)
Remember that since Elasticsearch 0.19.8 you can also specify wildcards on index names:
client.Search<MyObject>(s=>s
    .Index("Index_*")
    ...
)
As for your indices_query:
client.Search<MyObject>(s=>s
    .AllIndices()
    .Query(q=>q
        .Indices(i=>i
            .Indices(new [] { "INDEX_A", "INDEX_B"})
            .Query(iq=>iq.Term("FIELD","VALUE"))
            .NoMatchQuery(iq=>iq.Term("FIELD", "VALUE"))
        )
    )
);
UPDATE
These tests show off how you can make C#'s covariance work for you:
https://github.com/Mpdreamz/NEST/blob/master/src/Nest.Tests.Integration/Search/SubClassSupport/SubClassSupportTests.cs
In your case, if the types are not all subclasses of a shared base, you can still use 'object', i.e.:
.Search<object>(s=>s
    .Types(typeof(Product),typeof(Category),typeof(Manufacturer))
    .Query(...)
);
This will search on /yourdefaultindex/products,categories,manufacturers/_search and setup a default ConcreteTypeSelector that understands what type each returned document is.
Using ConcreteTypeSelector(Func<dynamic, Hit<dynamic>, Type>) you can manually return a type based on some json value (on dynamic) or on the hit metadata.
