elastic search 6 - use one or two type in one index? - elasticsearch

How can I use, create two index or what?
I have one entity goods and one entity shop, should I create two index or two type in elastic search 6?
I have tried two mapping two type but it throw Exception.
How Can I do ?

In elacticsearch 6, you cannot create more than one doc type for an index. Earlier for an index company you could have doc type employee, infra, 'building' etc but now you it will throw an error.
In future versions doc type will be completely removed, so you will only have to deal with index.
An index in the elasticsearch is like table in normal database. And every document that you store will be row, and fields of that document will be columns.
Without seeing the data and knowing what you want to accomplish it is pretty hard to suggest how you should plan the schema of elasticsearch, but these information can help you decide.

you can use one of these two options:
1)Index per document type
2)Custom type field
for option 2:
PUT twitter
{
"mappings": {
"goods": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "keyword" },
}
},
"shop": {
"properties": {
"field1": { "type": "text" },
"field2": { "type": "date" }
}
}
}
}
see this

Related

Good practice for ElasticSearch index mapping

Im new to Elasticsearch and I would like to know if there are any good practices for the use case I have.
I have heterogeneous data sent from an API that I save into a database (as a JSON) then save in Elasticsearch for search purposes. The data in sent in this format (because it's heterogeneous, the users can send any type of data, some metadata can be multivalued, other single values and the name of the key in the JSON may vary :)
{
"indices":{
"MultipleIndices":[
{
"index":"editors",
"values":[
"The Editing House",
"Volcan Editing"
]
},
{
"index":"colors",
"values":[
"Red",
"Blue"
]
}
],
"SimpleIndices":[
{
"index":"AuthorName",
"value": "George R. R. Martin"
},
{
"index":"NumberOfPages",
"value":"2898"
},
{
"index":"BookType",
"value":"Fantasy"
}
]
}
}
Once we receive this JSON, its formatted in the code and stored as a JSON in a database with this format :
{
"indices":{
"editors":[
"The Editing House",
"Volcan Editing"
],
"colors":[
"Red",
"Blue"
],
"AuthorName" : "George R. R. Martin"
"NumberOfPages" : "2898",
"BookType" : "Fantasy"
}
}
I then want to save this data into Elasticsearch, what's the best way I can map it ? Store it as a JSON in one field ? Will the search be efficilent if I do it this way ?
You must mapping each field individually.
You can take a look at field types to understand which type is ideal for your schema.
Another suggestion is to study the text analysis because it is responsible for the process of structuring the text to optimize the search.
My suggestion map:
PUT indices
{
"mappings": {
"properties": {
"editors": {
"type": "keyword"
},
"colors":{
"type": "keyword"
},
"author_name":{
"type": "text"
},
"number_pages":{
"type": "integer"
},
"book_type":{
"type": "keyword"
}
}
}
}
I think in your case, you don't have much choice apart from dynamic mapping, which Elasticsearch will generate for you as soon as first document is index in a particular index.
However, you can improve the process by using the dynamic template so that you can optimize your mapping, there is good examples of that in the official link I provided.

Elasticsearch Index Design based on very big nested array (could have more than 300,000 records)

We have a following index schema:
PUT index_data
{
"aliases": {},
"mappings": {
"properties": {
"Id": {
"type": "integer"
},
"title": {
"type": "text"
},
"data": {
"properties": {
"FieldName": {
"type": "text"
},
"FieldValue": {
"type": "text"
}
},
"type": "nested"
}
}
}
}
Id is a unique identifier. Here data field is an array and it could have more than 300,000 objects and may be more. Is it sensible and correct way to index this kind of data? Or we should change our design and make the schema like following:
PUT index_data
{
"aliases": {},
"mappings": {
"properties": {
"Id": {
"type": "integer"
},
"title": {
"type": "text"
},
"FieldName": {
"type": "text"
},
"FieldValue": {
"type": "text"
}
}
}
}
In this design, we cant use Id as a document id because with this design, id would be repeating. If we have 300,000 FieldName and FieldValues for one Id, Id would be repeating 300,000 time. The challenge here is to generate our custom id using some mechanism. Because we need to handle both insert and update cases.
In the first approach one document size would be too large so that it could contain an array of 300,000 objects or may be more.
In second approach, we would have too many documents. 75370611530 is the number we currently have. This is the number of FieldNames and FieldValues we have. How should we handle this kind of data? Which approach would be better? What should be the size of shards in this index?
I noticed that the current mapping is not nested. I assume you would need to be nested as the query seems to be "find value for key = key1".
If it is known that 300K objects are expected - It may not be a good idea. ES soft limit is 10K. Indexing issues are going to give trouble with this approach in addition to possible slow queries.
I doubt if indexing 75 billion documents for this purpose is useful - given the resources required, though it is feasible and will work.
May be consider RDBMS?

Creating an inverted index of a numberic field in Elasticsearch

I have a dataset of about 20 million records, with the following structure:
{"id": "123",
"cites":[
{"id":"234", "date":"2018-05-04"},
{"id":"456","date":"2018-02-01"}]
}
and I would like to make an index where I can see the list of the ids that cite an article, something like
{"id":"234", "cited_by":[{"id":"123"},{"id:"188"}]}
Which I understand is technically an inverted index. This can be static, so it could be computed just a single time. I've only seen documentation about inverted indices used for terms and their frequency in phrases, which is a very different use case.
I looked into using aggregations, but because the number of different ids is too large it runs out of buckets, and I am not sure 20 million buckets are possible and/or a good idea.
How could I generate this index? Is it possible in ElasticSearch, or would I need to write an external script that does this in batches?
Thank you so much!
No problem to use ElasicSearch in your case.
Script to make the index
PUT /city_index
{
"mappings": {
"citydata": {
"dynamic": "false",
"properties": {
"id": {
"type": "keyword"
},
"cited_by": {
"type": "object",
"properties": {
"id": {
"type": "keyword"
}
}
}
}
}
}
}

Multiple mappings for one index in Elasticsearch 6

I have an index that contains documents of different types (not talking about _type here) and each document has a field document_type that states their type. Is it possible to define mappings for each type of document within this index?
Is it possible to define mappings for each type of document within this index?
No, if you think of using the same field name with different types. For instance, field name id of type string and integer won't work.
Having different document_type basically indicates different domains. What you could do is to group information under each respective domain or type. For instance, an employee and project, both have an id and name, but different types in this example. Some call that nesting.
An example index mapping:
PUT example
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
},
"mappings": {
"doc": {
"properties": {
"employee": {
"properties": {
"id": {
"type": "integer"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 64
}
}
}
}
},
"project": {
"properties": {
"id": {
"type": "keyword"
},
"name": {
"type": "keyword",
"ignore_above": 32
}
}
}
}
}
}
}
If you write the information, with different types.
PUT example/doc/1
{
"employee": {
"id": 4711,
"name": "John Doe"
},
"project": {
"id": "Project X",
"name": "Firebrand"
}
}
Others would argue to store employee and project in separate indices. This approach depends on your scenario and is also desirable. You allow both domains to evolve separately from each other.
Having a separate employee and project index gives you an advantage regarding maintenance. For querying some would argue, that you can group than with an alias. In the above example, it doesn't make sense since the field types are different. A search for the name over an analysed text field is different than over a keyword. Querying makes sense if you have the same field type.
No, if you want to use a single index, you would need to define a single mapping that combines the fields of each document type.
A better way might be to define separate indices on the same cluster for each document type. You can then create a single index alias that aliases to both of those indices if you want to be able to query across document types. Be sure that all fields that exist in both documents have the same data type in both mappings.
Having a single field name with more than one mapping type in the same index is not possible. Two options I can think of:
1. Separate the different doc types to separate indices.
2. Use different fields names for different doc types, so that each name can have different mapping. You can also use nesting, like: type_a.my_field and type_b.my_field, both in the same index.

Elasticsearch custom mapping definition

I have to upload data to elk in the following format:
{
"location":{
"timestamp":1522751098000,
"resources":[
{
"resource":{
"name":"Node1"
},
"probability":0.1
},
{
"resource":{
"name":"Node2"
},
"probability":0.01
}]
}
}
I'm trying to define a mapping this kind of data and I produced he following mapping:
{
"mappings": {
"doc": {
"properties": {
"location": {
"properties" : {
"timestamp": {"type": "date"},
"resources": []
}
}
}
}
}
I have 2 questions:
how can I define the "resources" array in my mapping?
is it possible to define a custom type (e.g. resource) and use this type in my mapping (e.g "resources": [{type:resource}]) ?
There is a lot of things to know about the Elasticsearch mapping. I really highly suggest to read through at least some of their documentation.
Short answers first, in case you don't care:
Elasticsearch automatically allows storing one or multiple values of defined objects, there is no need to specify an array. See Marker 1 or refer to their documentation on array types.
I don't think there is. Since Elasticsearch 6 only 1 type per index is allowed. Nested objects is probably the closest, but you define them in the same file. Nested objects are stored in a separate index (internally).
Long answer and some thoughts
Take a look at the following mapping:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resources": { [1]
"type": "nested", [2]
"properties": {
"resource": {
"properties": {
"name": { [3]
"type": "text"
}
}
},
"probability": {
"type": "float"
}
}
}
}
}
}
}
}
This is how your mapping could look like. It can be done differently, but I think it makes sense this way - maybe except marker 3. I'll come to these right now:
Marker 1: If you define a field, you usually give it a type. I defined resources as a nested type, but your timestamp is of type date. Elasticsearch automatically allows storing one or multiple values of these objects. timestamp could actually also contain an array of dates, there is no need to specify an array.
Marker 2: I defined resources as a nested type, but it could also be an object like resource a little below (where no type is given). Read about nested objects here. In the end I don't know what your queries would look like, so not sure if you really need the nested type.
Marker 3: I want to address two things here. First, I want to mention again that resource is defined as a normal object with property name. You could do that for resources as well.
Second thing is more a thought-provoking impulse: Don't take it too seriously if something absolutely doesn't fit your case. Just take it as an opinion.
This mapping structure looks very inspired by a relational database approach. I think you usually want to define document structures for elasticsearch more for the expected searches. Redundancy is not a problem, but nested objects can make your queries complicated. I think I would omit the whole resources part and do it something like this:
"mappings": {
"doc": {
"properties": {
"location": {
"properties": {
"timestamp": {
"type": "date"
},
"resource": {
"properties": {
"resourceName": {
"type": "text"
}
"resourceProbability": {
"type": "float"
}
}
}
}
}
}
}
}
Because as I said, in this case resource can contain an array of objects, each having a resourceName and a resourceProbability.

Resources