refer
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html
To update, add, or remove a nested object, we have to reindex the
whole document.
In this content, 'whole' mean
all document in single type that has the document
single document that updated, added, or removed
A nested object is just a component of a single document.
{
"users" : [
{ "username" : "pickypg", "country" : "US" }
]
}
If users is the nested array, then each object in it is a nested document. Every change to that array (add, update, or delete) causes all nested documents in the array to be rewritten with it.
On a related note: you never need to use nested type unless it's going to be an array in general (not all documents must make use of it, but at least some of them should or it's a big waste).
Not coincidentally, the entire document is also rewritten. The real burden is that each nested document, unlike the normal, whole documents, adds extra work when the segments get written.
Related
I'm creating an index with mappings the following way:
_elasticClient.Indices.Create(name, c => c.Map(/* omitted for brevity */));
_elasticClient.BulkAll(data, b => b
.Index(name)
.BackOffTime("30s")
.BackOffRetries(2)
.RefreshOnCompleted()
.MaxDegreeOfParallelism(Environment.ProcessorCount)
.Size(50)).Wait(TimeSpan.FromMinutes(15), _ => { });
_elasticClient.Indices.PutAlias(name, Alias);
I'm just trying to understand how the bulk update works and make sure I do this correctly. Even if I remove _elasticClient.Indices.Create, an index still seems to be created. Does a POST to index_v1/_bulk create the index if it doesn't exist, but update it with data if I create it first with my first line?
By default, if data is indexed into an index that does not yet exist, Elasticsearch will create the index and infer the mapping for documents based on the first document it sees. This can be useful, but typically for a search use case, you want to explicitly control the mapping for documents to apply specific analyzers, etc., so creating the index with the mapping you desire is the preferred approach.
Is there a way in ElasticSearch wherein I can remove some the objects in the nested field array.
So I have a nested field and it returns array of objects. I need to remove some objects in the nested field.
Is it possible to do so in the query or I need to do that in my code
These extra nested documents are hidden; we can’t access them directly. To update, add, or remove a nested object, we have to reindex the whole document. It’s important to note that, the result returned by a search request is not the nested object alone; it is the whole document.
Nested Objects Elastic search
As far as I know, In Elasticsearch you can't just remove part of a existing document. You should change the document (remove objects you don't need) and renew(rewrite) the document.
I have 2 fields type in my index;
doc1
{
"category":"15",
"url":"http://stackoverflow.com/questions/ask"
}
doc2
{
"url":"http://stackoverflow.com/questions/ask"
"requestsize":"231",
"logdate":"22/12/2012",
"username":"mehmetyeneryilmaz"
}
now I need such a query that filter in same url field and returns fields both of documents:
result:
{
"category":"15",
"url":"http://stackoverflow.com/questions/ask"
"requestsize":"231",
"logdate":"22/12/2012",
"username":"mehmetyeneryilmaz"
}
The results given by elasticsearch are always per document, means that if there are multiple documents satisfying your query/filter, they would always appear as a different documents in the result and never merged into a single document. Hence merging them at client side is the one option which you can use. To avoid getting complete document and just to get the relevant fields, you can use "fields" in your query.
If this is not what you need and still needs narrowing down the result from the query itself, you can use top hit aggregations. It will give you the complete list of documents under a single bucket. But it would also have source field which would contain the complete documents itself.
Try giving a read to page:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-top-hits-aggregation.html
I have a use-case, where I have got a set of predefined fields and also need to support adding dynamic fields to ElasticSearch with some basic searching on them. I am able to achieve this using dynamic template mapping. However, the frequency of adding such dynamic fields is quite high.
Consider the this ES document for the Event type:
{
"name":"Youth Conference",
"venue":"Ahmedabad",
"date":"10/01/2015",
"organizer":"Invincible",
"extensions":{
"about": {
"vision":"Visualizes the image of an ideal Country. ",
"mission":"Encapsulates the gravity of the top reformative solutions for betterment of Country."
}
// Any thing can go here..
}
}
In the example above, each event document may have any unknown/new fields. Hence, for every such new dynamic field introduced, ES will update the mapping of the type. My concern is what is the cost of adding new field mapping in the existing type?
I am planning to separate out all dynamic mappings(inside extensions) from Event type by introducing another type, say EventExtensions and using parent/child relationship to map it with Event type. I believe this may limit the cost(if any) of adding dynamic fields frequently to the type. However, to my knowledge, using parent/child relationship will need more memory.
The first thing to remember here is that field is per index and not per type.
So wherever you add new fields , it would be made in the same index. Be it , in another type or as parent or child.
So decoupling the new fields to another type but same index is not going to make any change.
Second field addition is not that very expensive thing. I know people who uses 1000 of fields and are good with it. That being said , there should be a tab on number of field so that it wont go out to crazy numbers.
Here we have multiple approaches to solve the problem
1) Lets assume that the new field data need not be exactly searchable. In this case , you can deserialize the entire JSON as a string and add it to a field. Also make sure this field is not indexed. This way you can search based on other fields but then on retrieval of the document , get the information that was deserialized.
2) Lets say the new field looks like this
{
"newInfo1" : "log Of Info",
"newInfo2" : "A lot more info"
}
Instead of this , you can use
{
"newInfo" : [
{
"fieldName" : "newInfo1",
"fieldValue" : "log Of Info"
},
{
"fieldName" : "newInfo2",
"fieldValue" : "A lot more info"
}
]
}
This way , fields wont increase. But then to make field level specific search , like give me all documents with filedName as newInfo2 and having the word more in it , you will need to make newInfo field nested.
Hope this helps.
If I have a object of class Car that has an nested object of class Engine where both classes have the field named "id" do I have to do anything special when I create the mapping? Or is it sufficient to add the type "nested" to the engine mapping.
Elasticsearch head GUI is showing unexpected rows, but the search seems to give the correct result so it would be good to know if I need to do anything else in the mapping if two or more objects have the same field name.
Seems like the structured query builder returns the engine document with the id that I search for when I select car.id from the dropdown.
There shouldn't be any problem, you can just use the dot notation to refer to the fields in the nested documents.
Also, if you have a single engine per car you don't need to declare the engine as nested in your mapping.