Add default value on a field while modifying existing elasticsearch mapping - elasticsearch

Let's say I have an Elasticsearch index with around 10M documents in it. Now I need to add a new field with a default value, e.g. is_hotel_type=0, for each and every ES document. Later I'll update it as per my requirements.
To do that I've modified myindex with a PUT request like below:
PUT myindex
{
  "mappings": {
    "rp": {
      "properties": {
        "is_hotel_type": {
          "type": "integer"
        }
      }
    }
  }
}
Then I run a Painless script query with POST to update all the existing documents with the value is_hotel_type=0:
POST myindex/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": "ctx._source.is_hotel_type = 0;"
}
But this process is very time consuming for a large index with 10M documents. In SQL we can usually set a default value when creating a new column. So my question:
Is there any way in Elasticsearch to add a new field with a default value? I've tried the PUT request below with null_value, but it doesn't work.
PUT myindex/_mapping/rp
{
  "properties": {
    "is_hotel_type": {
      "type": "integer",
      "null_value": 0
    }
  }
}
I just want to know: is there any other way to do this without the script query?
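For what it's worth, null_value in the mapping only applies when a field is explicitly set to null at index time; it has no effect on documents where the field is simply missing, which would explain why the mapping change alone doesn't backfill anything. If the update-by-query stays the only route, it can at least be parallelized and run in the background; a minimal sketch, reusing the index and field names above:
POST myindex/_update_by_query?slices=auto&wait_for_completion=false
{
  "query": { "match_all": {} },
  "script": "ctx._source.is_hotel_type = 0"
}
slices=auto lets Elasticsearch split the job into parallel slices, and wait_for_completion=false returns a task id that can be polled via the Tasks API instead of blocking the request.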

Related

Elasticsearch "data": { "type": "float" } query returns incorrect results

I have a query like the one below. When the date_partition field is "type" => "float", queries for 20220109, 20220108, and 20220107 all return the document. When the field is "type" => "long", only the query for 20220109 returns it, which is what I want.
For each of the queries below, the result is returned as if 20220109 had been queried.
--> 20220109, 20220108, 20220107
PUT date
{
  "mappings": {
    "properties": {
      "date_partition_float": {
        "type": "float"
      },
      "date_partition_long": {
        "type": "long"
      }
    }
  }
}
POST date/_doc
{
  "date_partition_float": "20220109",
  "date_partition_long": "20220109"
}
# this returns the document
GET date/_search
{
  "query": {
    "match": {
      "date_partition_float": "20220108"
    }
  }
}
# this returns nothing
GET date/_search
{
  "query": {
    "match": {
      "date_partition_long": "20220108"
    }
  }
}
Is this a bug, or is this how the float type works?
I have 2 years of data loaded into Elasticsearch (one index per day, like day-1, day-2, with 20 GB primary shard size per day, 15 TB in total). What is the best way to change the type of just this field?
I have 5 float fields in my mapping; what is the fastest way to change all of them?
Note: I have the solutions below in mind, but I'm afraid they are slow:
update by query API
reindex API
runtime field in the search request (especially this one; see the sketch below)
Thank you!
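For reference, the runtime-field approach could look like the sketch below; this assumes Elasticsearch 7.11+ and reuses the date_partition_float field from the example. Since _source preserves the original value, the script re-parses it rather than reading the lossy float:
GET date/_search
{
  "runtime_mappings": {
    "date_partition_runtime": {
      "type": "long",
      "script": "emit(Long.parseLong(params._source['date_partition_float'].toString()))"
    }
  },
  "query": {
    "match": { "date_partition_runtime": "20220108" }
  }
}
Parsing _source per document at search time is indeed slow on 15 TB of data, so this is best treated as a stopgap while reindexing.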
The date_partition field should have the date type with format=yyyyMMdd; that's the only sensible type to use here, not long, and float is even worse.
PUT date
{
  "mappings": {
    "properties": {
      "date_partition": {
        "type": "date",
        "format": "yyyyMMdd"
      }
    }
  }
}
It's not logical to query for 20220108 and have the 20220109 document returned in the results. That's not a bug, though: a 32-bit float only carries about seven significant decimal digits, so eight-digit values like 20220107, 20220108, and 20220109 all round to the same representable float.
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
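For example, with the date mapping above in place, a daily date_histogram over a range might look like this (a minimal sketch reusing the date index and date_partition field):
GET date/_search
{
  "size": 0,
  "query": {
    "range": {
      "date_partition": {
        "gte": "20220101",
        "lt": "20220201"
      }
    }
  },
  "aggs": {
    "per_day": {
      "date_histogram": {
        "field": "date_partition",
        "calendar_interval": "day"
      }
    }
  }
}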
You can either recreate the index with the appropriate type and reindex your data, or add a new field to your existing index and update it by query. Both options are valid.
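A minimal sketch of the first option, assuming the new index is called date-v2 (a hypothetical name) and has been created with the corrected mapping for the fields being copied. Because _source keeps the original string values, the new date field parses them cleanly during the copy; if a field also needs renaming, _reindex accepts a script:
POST _reindex
{
  "source": {
    "index": "date"
  },
  "dest": {
    "index": "date-v2"
  }
}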
This may be the answer to my question => https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

Update restrictions on Elasticsearch Object type field

I have to store documents where a single field contains a single JSON object. This object has variable depth and a variable schema.
I configured a mapping like this:
"mappings": {
"properties": {
"#timestamp": {
"type": "date"
},
"message": {
"type": "object"
}
}
}
It works fine at first, and Elasticsearch creates and updates the mapping with the documents it receives.
The problem is that after some number of mapping updates, it rejects new documents and does not update the mapping anymore. At that point I switch to new indices, and mapping updates happen again for those. I'd like to know the right solution.
For example, the first document is:
{
  "personalInfo": {
    "firstName": "tom"
  },
  "moviesStatistics": {
    "count": 100
  }
}
The second document, which will update the Elasticsearch mapping, is:
{
  "personalInfo": {
    "firstName": "tom",
    "lastName": "hanks"
  },
  "moviesStatistics": {
    "count": 100
  },
  "education": {
    "title": "a title..."
  }
}
Elasticsearch creates the mapping from doc1 and updates it with doc2, doc3, ... until a certain number of documents has been received. After that it starts to reject every document that doesn't match the fields of the latest mapping.
In the end I found the solution in the Elasticsearch documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.13//dynamic-field-mapping.html
We can use dynamic mapping and simply use this mapping:
"mappings": {
"dynamic": "true"
}
You should also raise some of the default mapping limits mentioned here:
https://www.elastic.co/guide/en/elasticsearch/reference/7.13//mapping-settings-limit.html
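The limit that typically causes these rejections is index.mapping.total_fields.limit, which defaults to 1000. A minimal sketch of raising it on an existing index (my-index is a placeholder name; pick a value that fits your schema):
PUT my-index/_settings
{
  "index.mapping.total_fields.limit": 2000
}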

How to update data type of a field in elasticsearch

I am publishing data to Elasticsearch using fluentd. It has a field Data.CPU which is currently mapped as a string. The index name is health_gateway.
I have made some changes in the Python code that generates the data, so this field Data.CPU has now become an integer, but Elasticsearch is still showing it as a string. How can I update its data type?
I tried running the command below in the Kibana Dev Tools:
PUT health_gateway/doc/_mapping
{
  "doc": {
    "properties": {
      "Data.CPU": { "type": "integer" }
    }
  }
}
But it gave me the error below:
{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "Types cannot be provided in put mapping requests, unless the include_type_name parameter is set to true."
  },
  "status": 400
}
There is also a document which says we can convert the data type using mutate, but I am not able to understand it properly.
I do not want to delete and recreate the index, because I have created a visualization based on this index, and deleting the index would delete the visualization too. Can anyone please help with this?
The short answer is that you can't change the mapping of a field that already exists in a given index, as explained in the official docs.
The specific error you got is because you included /doc/ in your request path (you probably wanted /<index>/_mapping), but fixing this alone won't be sufficient.
Finally, I'm not sure you really have a dot in the field name there. Last I heard it wasn't possible to use dots in field names.
Nevertheless, there are several ways forward in your situation... here are a couple of them:
Use a scripted field
You can add a scripted field to the Kibana index-pattern. It's quick to implement, but has major performance implications. You can read more about them on the Elastic blog here (especially under the heading "Match a number and return that match").
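As a rough sketch, such a scripted field would just be a Painless one-liner; this assumes Data.CPU has a keyword sub-field to read doc values from, which is not shown in the question:
Integer.parseInt(doc['Data.CPU.keyword'].value)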
Add a new multi-field
You could add a new multi-field. The example below assumes that CPU is a nested field under Data, rather than really being called Data.CPU with a literal .:
PUT health_gateway/_mapping
{
  "properties": {
    "Data": {
      "properties": {
        "CPU": {
          "type": "keyword",
          "fields": {
            "int": {
              "type": "short"
            }
          }
        }
      }
    }
  }
}
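Note that a newly added multi-field is only populated for documents indexed after the mapping change. One way to backfill it, sketched here, is an _update_by_query with no script, which rewrites each existing document in place so the new sub-field gets indexed:
POST health_gateway/_update_by_query?conflicts=proceed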
Reindex your data within ES
Use the Reindex API. Be sure to set the correct mapping on the target index.
Delete and reindex everything from source
If you are able to regenerate the data from source in a timely manner, without disrupting users, you can simply delete the index and reingest all your data with an updated mapping.
You can update the mapping by indexing the same field in multiple ways, i.e. by using multi-fields.
With the mapping below, Data.CPU.raw will be of integer type:
{
  "mappings": {
    "properties": {
      "Data": {
        "properties": {
          "CPU": {
            "type": "text",
            "fields": {
              "raw": {
                "type": "integer"
              }
            }
          }
        }
      }
    }
  }
}
Or you can create a new index with the correct mapping and reindex the data into it using the reindex API.

How to limit the max number of Elasticsearch documents in an index?

I've installed an Elasticsearch (version 7.x) cluster and created a new index. I want to limit the maximum number of documents in this index, say 10,000 documents at most.
The naive solution is to query the number of documents before inserting a new one. But this method can be inaccurate and also performs poorly (two requests per insert...).
How do I do it right?
The best practice is to use Index Lifecycle Management (ILM), which is part of the Basic license and enabled by default in Elasticsearch 7.3+.
You can set a rollover action on the number of documents (I put 5 max docs):
PUT _ilm/policy/my_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_docs": 5
          }
        }
      }
    }
  }
}
Now I create a template with the policy my_policy:
PUT _template/my_template
{
  "index_patterns": [
    "my-index*"
  ],
  "settings": {
    "index.blocks.read_only": true,
    "index.lifecycle.name": "my_policy",
    "index.lifecycle.rollover_alias": "my-index"
  }
}
Note that I put the setting "index.blocks.read_only": true in the template because when the rollover is applied, it will create a new index with the read_only parameter.
Now I can create my index:
PUT my-index-000001
{
  "settings": {
    "index.blocks.read_only": false
  },
  "aliases": {
    "my-index": {
      "is_write_index": true
    }
  }
}
That's it! After 5 documents, rollover will create a new read-only index and the write alias will move onto it.
You can test by indexing some new docs through the alias:
PUT my-index/_doc/1
{
  "field": "value"
}
Also, by default the ILM policy is only evaluated every 10 minutes; in order to test faster, you can change that with:
PUT /_cluster/settings
{
  "persistent": {
    "indices.lifecycle.poll_interval": "5s"
  }
}

How to use mapping in elasticsearch?

After processing logs with Logstash, all my fields have the same type, string, so I want to use a mapping in Elasticsearch to change some types, like ip, port, etc. But I don't know how to do it; I'm a complete beginner with Elasticsearch.
Any help?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
  "logstash": {
    "mappings": {
      "doc": {
        "properties": {
          "Foo": {
            "properties": {
              "x": {
                "type": "string"
              },
              "y": {
                "type": "string"
              }
            }
          }
        }
      }
    }
  }
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
  "Foo": {
    "x": "500",
    "y": "200"
  }
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x has the type string, it cannot be changed to a number. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
  "doc": {
    "properties": {
      "Foo": {
        "properties": {
          "x": {
            "type": "long"
          },
          "y": {
            "type": "long"
          }
        }
      }
    }
  }
}
Now, even if you index a doc with the properties as JSON strings, Elasticsearch will convert them to numbers:
PUT logstash/doc/1
{
  "Foo": {
    "x": "500",
    "y": "200"
  }
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch - that's on purpose, Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
  "query": {
    "range": {
      "Foo.x": {
        "gte": 500
      }
    }
  }
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html
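A minimal sketch of such a template, using the legacy template API from this Elasticsearch 1.x era; the template name is arbitrary, and the doc type and Foo fields are reused from the example above:
PUT _template/logstash_template
{
  "template": "logstash-*",
  "mappings": {
    "doc": {
      "properties": {
        "Foo": {
          "properties": {
            "x": { "type": "long" },
            "y": { "type": "long" }
          }
        }
      }
    }
  }
}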
