Elasticsearch indexing for many different mapping types

I have implemented something like Class and Instance logic in my application, where I create an object named category that acts as a blueprint for its instances.
Users are free to create as many categories as they like, with whatever fields, so I used to create one new TYPE for each category in my Elasticsearch index mapping, until mapping types were deprecated in the latest upgrades.
With the latest upgrades of ES, I can think of only these 2 approaches:
creating one index for each category
keeping one object-type field, named after the TYPE, that holds the fields for each category, and updating this one mapping every time.
I am trying to decide which approach to take for the ES upgrade from version 5 to 7 while keeping this dynamic nature of my data modelling. Searches would be governed by the TYPE string, a system-generated ID for each category, hence the need to group fields based on the category they belong to.
OLD MAPPINGS - NOW DEPRECATED
first one - one for each TYPE (category)
{
  "type_cat1" : {
    "dynamic" : "strict",
    "properties" : {
      "field11" : {...}
    }
  }
}
second one and so on
{
  "type_cat2" : {
    "dynamic" : "strict",
    "properties" : {
      "field21" : {...}
    }
  }
}
NEW MAPPING WITH OBJECTS FOR EACH OLD TYPE
{
  "mappings" : {
    "properties" : {
      "type_cat1" : {
        "properties" : {
          "field11" : {...}
        }
      },
      "type_cat2" : {
        "properties" : {
          "field21" : {...}
        }
      }
    }
  }
}
ALTERNATIVE NEW MAPPING - ONE INDEX PER CATEGORY (not more than 500)
One index would be created separately for each category...
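A rough sketch of what one such per-category index could look like, reusing the strict dynamic setting from the old per-type mappings (the index name and field type are illustrative):
PUT type_cat1_index
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "field11": { "type": "text" }
    }
  }
}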
Please advise if a better approach is out there, or which one to choose among these...

I have a similar use-case at my workplace where the user can create an object with any number of fields, and each field can be of any datatype.
Our approach is similar to one of yours:
All categories will be mapped to a single index.
Whenever a new object is created, the index mappings are updated to accommodate the new object (a category in your case).
This is what our mappings look like when molded to your needs:
{
  "mappings": {
    "properties": {
      "category": {     // this is a field present in all documents
        "type": "keyword"
      },
      "createdTime": {  // this is a field present in all documents
        "type": "date"
      },
      "id": {           // this is a field present in all documents
        "type": "long"
      },
      "fields": {
        "properties": {
          "type_cat1": {
            "properties": {
              "field1": {...},
              "field2": {...}
            }
          },
          "type_cat2": {
            "properties": {
              "field1": {...},
              "field2": {...}
            }
          },
          "type_catN": {...}
        }
      }
    }
  }
}
Get all records of a certain category:
"category": "cat1"
Get all records of cat1 where field2 == "dummy_value":
"category": "cat1" AND "fields.type_cat1.field2.keyword": "dummy_value"
When a new category is created, the fields part of our mappings gets updated.
Extracting out the common fields (category, createdTime, id) eliminates redundancy in mappings.
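A minimal sketch of that mapping update for a hypothetical new category type_cat3 (the field name and type are illustrative):
PUT category_index/_mapping
{
  "properties": {
    "fields": {
      "properties": {
        "type_cat3": {
          "properties": {
            "field1": { "type": "keyword" }
          }
        }
      }
    }
  }
}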
Some worthy points:
As the number of unique categories is only 500, you can also go with a separate index per category. This is more beneficial if there are going to be many records (> 100,000) per category.
If the categories are sparse in nature (each category has relatively few records), then ES can easily handle everything in a single index.
If we assume 50 fields per category on average, then the total number of fields in the single-index approach will be 50 * 500 = 25,000. This is a manageable number.
Of course, in the end, many things will depend upon resources allocated to the cluster.

Related

Update restrictions on Elasticsearch Object type field

I have to store documents with a single field that contains a single JSON object. This object has a variable depth and a variable schema.
I configured a mapping like this:
"mappings": {
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "object"
}
}
}
It works fine and Elasticsearch creates and updates the mapping with the documents it receives.
The problem is that after some mapping updates, it rejects new documents and does not update the mapping anymore. When this happens I switch to a new index, and mapping updates work again for that index. I would like to know the right solution.
For example, the first document is:
{
  personalInfo: {
    firstName: "tom"
  },
  moviesStatistics: {
    count: 100
  }
}
The second document, which will update the Elasticsearch mapping, is:
{
  personalInfo: {
    firstName: "tom",
    lastName: "hanks"
  },
  moviesStatistics: {
    count: 100
  },
  education: {
    title: "a title..."
  }
}
Elasticsearch creates the mapping from doc1 and updates it with doc2, doc3, ... until a certain number of documents has been received. After that it starts to reject every document that does not match the fields of the latest mapping.
In the end I found the solution in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.13//dynamic-field-mapping.html
We can use dynamic mapping and simply use this mapping:
"mappings": {
"dynamic": "true"
}
You should also change some of the default restrictions that are mentioned here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.13//mapping-settings-limit.html
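For example, the total-fields limit is often what causes new documents to be rejected once the mapping has grown; it can be raised per index. A minimal sketch, assuming an index named my-index and an arbitrary new limit:
PUT my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}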

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using Fluentd. It has a field Data.CPU which is currently mapped as a string. The index name is health_gateway.
I have made some changes to the Python code which generates the data, so this field Data.CPU has now become an integer. But Elasticsearch is still showing it as a string. How can I update its data type?
I tried running the below command in Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc" : {
    "properties" : {
      "Data.CPU" : {"type" : "integer"}
    }
  }
}
But it gave me below error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status" : 400
}
There is also this document which says that using mutate we can convert the data type, but I am not able to understand it properly.
I do not want to delete and recreate the index, as I have created a visualization based on it which would also be deleted. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than the field really being called Data.CPU with a literal dot:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
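A minimal sketch of such a reindex, assuming the corrected mapping has already been created on a new index (the name health_gateway_v2 is illustrative):
POST _reindex
{
  "source": {
    "index": "health_gateway"
  },
  "dest": {
    "index": "health_gateway_v2"
  }
}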
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
Using the below mapping, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "keyword",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
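For completeness, a query against the integer sub-field might then look like the sketch below (the threshold is just an illustration; documents indexed before the sub-field existed would need to be reindexed or updated before they show up in such a query):
GET health_gateway/_search
{
  "query": {
    "range": {
      "Data.CPU.raw": {
        "gte": 80
      }
    }
  }
}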
Or you can create a new index with the correct mapping and reindex the data into it using the Reindex API.

Is it possible to update an existing field in an index through mapping in Elasticsearch?

I've already created an index, and it contains data from my MySQL database. I've got a few fields which are strings in my table but which I need as different types (integer & double) in Elasticsearch.
So I'm aware that I could do it through a mapping as follows:
{
  "mappings": {
    "my_type": {
      "properties": {
        "userid": {
          "type": "text",
          "fielddata": true
        },
        "responsecode": {
          "type": "integer"
        },
        "chargeamount": {
          "type": "double"
        }
      }
    }
  }
}
But I've only tried this when creating the index as a new one. What I want to know is: how can I update an existing field (i.e. chargeamount in this scenario) through a mapping PUT?
Is this possible? Any help would be appreciated.
Once a mapping type has been created, you're very constrained on what you can update. According to the official documentation, the only changes you can make to an existing mapping after it's been created are the following (a sketch of the multi-field case follows the list), and changing a field's type is not one of them:
In general, the mapping for existing fields cannot be updated. There
are some exceptions to this rule. For instance:
new properties can be added to Object datatype fields.
new multi-fields can be added to existing fields.
doc_values can be disabled, but not enabled.
the ignore_above parameter can be updated.
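For instance, a hedged sketch of the multi-field exception above: adding a double sub-field to the existing chargeamount field. This assumes chargeamount is currently mapped as text, that its values are numeric strings, and that the index and type are called my_index and my_type (all of these names are illustrative); existing documents would still need to be reindexed or updated before the sub-field is searchable:
PUT my_index/_mapping/my_type
{
  "properties": {
    "chargeamount": {
      "type": "text",
      "fields": {
        "double": {
          "type": "double"
        }
      }
    }
  }
}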

ElasticSearch performance when querying by element type

Assume that we have a dataset containing a collection of domains { domain.com, domain2.com } and also a collection of users { user#domain.com, angryuser#domain2.com, elastic#domain3.com }.
Let's also assume that both domains and users have several attributes in common, such as "domain", and that when the attribute name matches, the mapping and possible values match as well.
Then we load up our Elasticsearch index with both collections, separating them by type: domain and user.
Obviously in our system we would have many more users than domains, so when querying for domain-related data the expectation is that it would be much faster to filter the query by type, right?
My question is, having around 5 million users and 200k domains, why is it that when my index only contains domain data (the users were deleted), queries run much faster than when filtering the documents by their type? Shouldn't the performance be at least roughly similar? Currently we can match 20 domains per second if there are no users in the index, but this drops to 4 when we load up the users, even though we still filter by type.
Maybe it is something I'm missing, as I'm new to Elasticsearch.
UPDATE:
This is basically the query:
"query" : {
"flt_field": {
"domain_address": {
"like_text": "chroma",
"fuzziness": 0.3
}
}
}
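For reference, a sketch of how the type filter is presumably combined with that query, using the older query DSL that flt_field belongs to (the filtered query and type filter from the ES 1.x era; the exact structure is an assumption, since the question does not show it):
{
  "query": {
    "filtered": {
      "query": {
        "flt_field": {
          "domain_address": {
            "like_text": "chroma",
            "fuzziness": 0.3
          }
        }
      },
      "filter": {
        "type": {
          "value": "domain"
        }
      }
    }
  }
}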
And the mapping is something like this
"user": {
"properties": {
...,
"domain_address": {
"type": "string",
"boost": 2.4,
"similarity": "linear"
}
}
},
"domain": {
"properties": {
...,
"domain_address": {
"type": "string",
"boost": 2.4,
"similarity": "linear"
}
}
}
There are other fields in the ... sections, but their mapping should not influence the outcome, should it?

Avoid mapping multiple fields in elastic search

I have the following problem when indexing documents in Elasticsearch: my documents contain some fields that are not repeated in other documents, so I end up having a mapping of more than 100,000 elements. Let's see an example:
If I send something like this to an empty index:
{"example":{
"a1":123,
"a2":444,
"a3":52566,
"a4":7,
.....
"aN":11
}
}
It will create the following mapping:
{"example" : {
"properties" : {
"a1" : {
"type" : "long"
},
"a2" : {
"type" : "long"
},
"a3" : {
"type" : "long"
},
"a4" : {
"type" : "long"
},
.....
"aN" : {
"type" : "long"
}
}
}
}
Then if I send another document:
{"example":{
"b1":123,
"b2":444,
"b3":52566,
"b4":7,
.....
"bN":11
}
}
It will create a mapping twice the size of the one above.
The object is more complex than this, but the situation I'm in now is that the mapping is so big that it is killing the server.
How can I address this? Would the multi-field feature work in this scenario? I have tried it in several ways but it doesn't seem to work.
Thanks.
It is pretty tough to give you a definitive answer given that we have no idea of your use case, but my initial guess is that if you have a mapping of thousands of fields that have no logical bond, you've probably made some wrong choices about the architecture of your data. Could you tell us why you need to have thousands of fields with different names for a single document type? As it stands, there's not much we can do to point you in the right direction.
If you really want to do so, create the mapping as in the example below:
POST /index_name/_mapping/type_name
{
  "type_name": {
    "enabled": false
  }
}
It will give the required behavior: Elasticsearch will stop creating mappings for the fields, and will no longer parse and index the contents of your documents.
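If you only want to stop mapping the contents of the example object while keeping other fields mapped, the enabled flag can also be set on that object field alone; a minimal sketch, reusing the index and type names from the snippet above and the example object from the question:
PUT /index_name
{
  "mappings": {
    "type_name": {
      "properties": {
        "example": {
          "type": "object",
          "enabled": false
        }
      }
    }
  }
}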
See these links to get more information:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-dynamic-mapping.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-object-type.html#_enabled_3
