I'm reading about mapping in Elasticsearch and I keep seeing these two terms: nested field and depth. They seem almost equivalent to me, and I'm confused by them. Can anyone clear this up for me? Thank you.
And by the way, is there any way to check a document's depth via Kibana?
The source of confusion is probably that in Elasticsearch the term nested can be used in two different contexts:
"nested" in the ordinary JSON sense, i.e. a JSON object within a JSON object;
"nested" as the Elasticsearch nested data type.
In the mappings documentation page, when they mention "depth" they refer to the first meaning. There the setting index.mapping.depth.limit defines how deeply nested your JSON documents can be.
How is JSON depth interpreted by Elasticsearch mapping?
Here is an example of JSON document with depth 1:
{
  "name": "John",
  "age": 30
}
Now with depth 2:
{
  "name": "John",
  "age": 30,
  "cars": {
    "car1": "Ford",
    "car2": "BMW",
    "car3": "Fiat"
  }
}
By default (as of ES 6.3) the depth cannot exceed 20.
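If you need a different limit, it can be set in the index settings. A minimal sketch, assuming a hypothetical index name:
# my_deep_index is a hypothetical index name; the default limit is 20
PUT my_deep_index
{
  "settings": {
    "index.mapping.depth.limit": 30
  }
}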
What is a nested data type and why isn't it the same as a document with depth>1?
The nested data type allows you to index arrays of objects and query their items individually via the nested query. It means that Elasticsearch will index a document with such fields differently (see the page Nested Objects of the Definitive Guide for more explanation).
For instance, if in the following example we do not define "user" as a nested field in the mapping, a query for user.first: John and user.last: White will return a match, and that match will be incorrect:
{
  "group": "fans",
  "user": [
    {
      "first": "John",
      "last": "Smith"
    },
    {
      "first": "Alice",
      "last": "White"
    }
  ]
}
If we do define it as nested, Elasticsearch will index each item of the "user" list as an implicit sub-document and thus will use more resources: more disk and more memory. This is why there is another setting on the mappings: index.mapping.nested_fields.limit regulates how many different nested fields one can declare (it defaults to 50). To customize this you can see this answer.
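As an illustration, a minimal sketch of raising that limit at index creation time (the index name is hypothetical):
# my_index is a hypothetical index name; the default limit is 50
PUT my_index
{
  "settings": {
    "index.mapping.nested_fields.limit": 100
  }
}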
So, Elasticsearch documents with depth > 1 are not indexed as nested unless you explicitly ask it to do so, and that's the difference.
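To make the difference concrete, here is a minimal sketch of the corresponding nested query, assuming the document above lives in a hypothetical index my_index and "user" is mapped as nested:
# my_index is a hypothetical index name; "user" is assumed to be mapped as nested
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "John" } },
            { "match": { "user.last": "Smith" } }
          ]
        }
      }
    }
  }
}
With the nested mapping both conditions have to match inside the same user object, so replacing Smith with White here would correctly return no documents.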
Can I have nested fields inside nested fields?
Yes, you can! Just to end this confusion: you can define a nested field inside another nested field in a mapping. It will look something like this:
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested",
          "properties": {
            "name": {
              "type": "keyword"
            },
            "cars": {
              "type": "nested",
              "properties": {
                "brand": {
                  "type": "keyword"
                }
              }
            }
          }
        }
      }
    }
  }
}
But keep in mind that the number of implicit documents to be indexed will be multiplied, which is simply not that efficient.
Can I get the depth of my JSON objects from Kibana?
Most likely you can do it with scripts; check this blog post for further details: Using Painless in Kibana scripted fields.
Related
I have an index that contains documents of different types (not talking about _type here) and each document has a field document_type that states its type. Is it possible to define mappings for each type of document within this index?
Is it possible to define mappings for each type of document within this index?
No, not if you intend to use the same field name with different types. For instance, a field named id mapped as both string and integer won't work.
Having different document_type values basically indicates different domains. What you can do is group the information under each respective domain or type. For instance, an employee and a project both have an id and a name, but with different field types in this example. Some call that nesting.
An example index mapping:
PUT example
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  },
  "mappings": {
    "doc": {
      "properties": {
        "employee": {
          "properties": {
            "id": {
              "type": "integer"
            },
            "name": {
              "type": "text",
              "fields": {
                "keyword": {
                  "type": "keyword",
                  "ignore_above": 64
                }
              }
            }
          }
        },
        "project": {
          "properties": {
            "id": {
              "type": "keyword"
            },
            "name": {
              "type": "keyword",
              "ignore_above": 32
            }
          }
        }
      }
    }
  }
}
You can then index documents that carry both domains, each with its own field types:
PUT example/doc/1
{
  "employee": {
    "id": 4711,
    "name": "John Doe"
  },
  "project": {
    "id": "Project X",
    "name": "Firebrand"
  }
}
Others would argue for storing employee and project in separate indices. Whether that approach fits depends on your scenario, and it is also desirable: it allows both domains to evolve separately from each other.
Having separate employee and project indices also gives you an advantage regarding maintenance. For querying, some would argue that you can then group them with an alias. In the example above that doesn't make much sense, since the field types differ: a search for name over an analysed text field behaves differently than over a keyword. Querying across indices makes sense when the field types are the same.
No, if you want to use a single index, you would need to define a single mapping that combines the fields of each document type.
A better way might be to define separate indices on the same cluster for each document type. You can then create a single index alias that aliases to both of those indices if you want to be able to query across document types. Be sure that all fields that exist in both documents have the same data type in both mappings.
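As a rough sketch (with hypothetical index and alias names), such an alias could be set up like this:
# "employees", "projects" and "company" are hypothetical index/alias names
POST _aliases
{
  "actions": [
    { "add": { "index": "employees", "alias": "company" } },
    { "add": { "index": "projects", "alias": "company" } }
  ]
}
A search against company would then hit both indices.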
Having a single field name with more than one mapping type in the same index is not possible. Two options I can think of:
1. Separate the different doc types into separate indices.
2. Use different field names for different doc types, so that each name can have its own mapping. You can also use nesting, like type_a.my_field and type_b.my_field, both in the same index (see the sketch below).
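A minimal sketch of option 2, assuming hypothetical names type_a and type_b and a single mapping type as required in recent versions:
# my_index, type_a and type_b are hypothetical names
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "type_a": { "properties": { "my_field": { "type": "text" } } },
        "type_b": { "properties": { "my_field": { "type": "keyword" } } }
      }
    }
  }
}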
Should I create two indices, or what?
I have one entity goods and one entity shop. Should I create two indices or two types in Elasticsearch 6?
I have tried a mapping with two types, but it throws an exception.
How can I do this?
In Elasticsearch 6 you cannot create more than one doc type for an index. Earlier, for an index company you could have the doc types employee, infra, building, etc., but now that will throw an error.
In future versions the doc type will be removed completely, so you will only have to deal with indices.
An index in Elasticsearch is like a table in a relational database: every document that you store is a row, and the fields of that document are the columns.
Without seeing the data and knowing what you want to accomplish, it is pretty hard to suggest how you should plan your Elasticsearch schema, but this information can help you decide.
You can use one of these two options:
1) Index per document type
2) Custom type field
For option 2, keep a single mapping type and add a field that states the document's type, for example:
PUT twitter
{
  "mappings": {
    "_doc": {
      "properties": {
        "type": { "type": "keyword" },
        "goods": {
          "properties": {
            "field1": { "type": "text" },
            "field2": { "type": "keyword" }
          }
        },
        "shop": {
          "properties": {
            "field1": { "type": "text" },
            "field2": { "type": "date" }
          }
        }
      }
    }
  }
}
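Documents can then be filtered on the custom type field; a rough query sketch based on the mapping above:
# assumes the mapping sketched above
GET twitter/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term": { "type": "goods" } }
      ]
    }
  }
}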
see this
I have a document store with multiple types. Each document type has some basic metadata, like a uuid, and a single "entity" field holding stringified JSON with the actual content. This is because the document, even though it has a type, does not have a strict schema, and any user can provide data in any structure.
I need to be able to browse, filter and search through these documents so I will be putting them into ElasticSearch.
My question is: how should I structure the ES? I have read that having too many indices is not good for ES and that it is better to have as few indices as possible. But ES also does not like documents of the same type having different structures (mappings), and you cannot change the mapping of existing fields, only add new ones.
The "schema" is fixed for every document type and user, so I could create a new index for each user with the same type(s) in it, but as I've mentioned, having lots of indices is bad.
So what is the recommended design in such case?
This might sound crazy, but would it be feasible to parse the document into a key/value format where the key is the property path? The only issue I see here is that everything would have to be indexed as full text, which does not sound like a good idea.
Edit: seems like ES does this on its own https://www.elastic.co/guide/en/elasticsearch/reference/current/object.html but I'm still not sure what to do.
What you could do is have an array of nested objects with key and value fields, i.e. your mapping would look like:
"entity": {
"type": "nested",
"properties": {
"key": {
"type": "keyword"
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
That way you can store pretty much anything you want in the entity field without risking a mapping type explosion, for instance
{
  "uuid": "",
  "entity": [
    { "key": "myfield1", "value": "Some value" },
    { "key": "myfield2", "value": "Some value" },
    { "key": "myfield3", "value": "Some value" }
  ]
}
Then you'll have to make sure to use nested queries when querying your data, but it's definitely feasible.
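For example, a minimal query sketch (the index name is hypothetical, the field names follow the mapping above) that finds documents whose entity contains myfield1 with a given value:
# my_index is a hypothetical index name
GET my_index/_search
{
  "query": {
    "nested": {
      "path": "entity",
      "query": {
        "bool": {
          "must": [
            { "term": { "entity.key": "myfield1" } },
            { "match": { "entity.value": "Some value" } }
          ]
        }
      }
    }
  }
}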
I'm using Elastic 1.7.3 and I would like to boost some fields in an index with documents like this fictional example:
{
  "title": "Mickey Mouse",
  "content": "Mickey Mouse is a fictional ...",
  "related_articles": [
    { "title": "Donald Duck" },
    { "title": "Goofy" }
  ]
}
Here, for example, title is really important, content too, and related_articles a bit less so. My real documents have lots of fields and nested objects.
I would like to give more weight to the title field than to content, and more weight to content than to related_articles.
I have seen the title^5 approach, but I would have to repeat it in every query and (I guess) list all my fields instead of querying "_all".
I have searched a lot but mostly found deprecated solutions (e.g. _boost).
Since I used to work with Sphinx, I'm looking for something like its field weights option, where you can give more weight to fields that are more important in your index than others.
You're right that the _boost meta-field that you could use at the type level has been deprecated.
But you can still use the boost property when defining each field in your mapping, which will boost your field at indexing time.
Your mapping would look like this:
{
  "my_type": {
    "properties": {
      "title": {
        "type": "string", "boost": 5
      },
      "content": {
        "type": "string", "boost": 4
      },
      "related_articles": {
        "type": "nested",
        "properties": {
          "title": {
            "type": "string", "boost": 3
          }
        }
      }
    }
  }
}
You have to be aware, though, that it's not necessarily a good idea to boost your field at index time, because once set, you cannot change it unless you are willing to re-index all of your documents, whereas using query-time boosting achieves the same effect and can be changed more easily.
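As a rough sketch of the query-time alternative (the index name is hypothetical, the field names come from the question, the search term is just a placeholder, and related_articles is assumed to be mapped as nested):
# my_index is a hypothetical index name; "mickey" is a placeholder search term
GET my_index/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": { "query": "mickey", "boost": 5 } } },
        { "match": { "content": { "query": "mickey", "boost": 4 } } },
        {
          "nested": {
            "path": "related_articles",
            "query": {
              "match": { "related_articles.title": { "query": "mickey", "boost": 3 } }
            }
          }
        }
      ]
    }
  }
}
These boosts can be adjusted per query without re-indexing anything.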
Let's say I have the following mapping:
"site": {
"properties": {
"title": { "type": "string" },
"description": { "type": "string" },
"category": { "type": "string" },
"tags": { "type": "array" },
"point": { "type": "geo_point" }
"localities": {
type: 'nested',
properties: {
"title": { "type": "string" },
"description": { "type": "string" },
"point": { "type": "geo_point" }
}
}
}
}
I'm then doing a "_geo_distance" sort on the parent document and am able to sort the documents on "site.point". However, I would also like the nested localities to be sorted by "_geo_distance" inside the parent document.
Is this possible? If so, how?
Unfortunately, no (at least not yet).
A query in ElasticSearch just identifies which documents match the query, and how well they match.
To understand what nested documents are useful for, consider this example:
{
  "title": "My post",
  "body": "Text in my body...",
  "followers": [
    {
      "name": "Joe",
      "status": "active"
    },
    {
      "name": "Mary",
      "status": "pending"
    }
  ]
}
The above JSON, once indexed in ES, is functionally equivalent to the following. Note how the followers field has been flattened:
{
  "title": "My post",
  "body": "Text in my body...",
  "followers.name": ["Joe", "Mary"],
  "followers.status": ["active", "pending"]
}
A search for: followers with status == active and name == Mary would match this document... incorrectly.
Nested fields allow us to work around this limitation. If the followers field is declared to be of type nested instead of type object, then each object in it is indexed internally as a separate (invisible) sub-document. That means we can use a nested query or nested filter to query these nested documents as individual docs.
However, the output from the nested query/filter clauses only tells us if the main doc matches, and how well it matches. It doesn't even tell us which of the nested docs matched. To figure that out, we'd have to write code in our application to check each of the nested docs against our search criteria.
There are a few open issues requesting the addition of these features, but it is not an easy problem to solve.
The only way to achieve what you want is to index your sub-docs as separate documents, and to query and sort them independently. It may be useful to establish a parent-child relationship between the main doc and these separate sub-docs (see the parent-type mapping, the Parent & Child section of the index API docs, and the top-children and has-child queries).
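As a rough sketch of that approach (index and type names are hypothetical, using the legacy _parent mapping available in this era of Elasticsearch), each locality would be indexed as a child of its site and could then be queried and sorted on its own:
# hypothetical index/type names; uses the legacy _parent mapping of ES 1.x
PUT /my_index/locality/_mapping
{
  "locality": {
    "_parent": { "type": "site" },
    "properties": {
      "title": { "type": "string" },
      "point": { "type": "geo_point" }
    }
  }
}

# index a locality as a child of site document 12
PUT /my_index/locality/1?parent=12
{
  "title": "Some locality",
  "point": { "lat": 40.12, "lon": -71.34 }
}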
Also, an ES user has mailed the list about a new has_parent filter that they are currently working on in a fork. However, this is not available in the main ES repo yet.