How to limit the max number of Elasticsearch documents in an index? - elasticsearch

I've installed an Elasticsearch (version 7.x) cluster and created a new index. I want to limit the maximum number of documents in this index, say to 10000 documents at most.
The naive solution is to query the number of documents before inserting a new document. But this method may be inaccurate, and it also performs poorly (two requests per insert).
How to do it right?

The best practice is to use Index Lifecycle Management (ILM), which is included in the Basic license and enabled by default in Elasticsearch 7.3+.
You can set a rollover action on the number of documents (here I use a maximum of 5 docs):
PUT _ilm/policy/my_policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"rollover": {
"max_docs": 5
}
}
}
}
}
}
Now I create a template that uses the policy my_policy:
PUT _template/my_template
{
"index_patterns": [
"my-index*"
],
"settings": {
"index.blocks.read_only" : true,
"index.lifecycle.name": "my_policy",
"index.lifecycle.rollover_alias": "my-index"
}
}
Note that I set "index.blocks.read_only": true because the new index created by the rollover will then be read-only.
Now I can create the first index, overriding the read-only setting and making it the write index of the alias:
PUT my-index-000001
{
"settings": {
"index.blocks.read_only": false
},
"aliases": {
"my-index": {
"is_write_index": true
}
}
}
That's it! Once the index holds 5 documents, the next ILM check rolls it over: a new read-only index is created and becomes the alias's write index, so further writes through the alias are blocked.
You can test by indexing some new docs through the alias:
PUT my-index/_doc/1
{
"field" : "value"
}
Also, by default ILM evaluates the policy every 10 minutes; for testing, you can shorten that with:
PUT /_cluster/settings
{
"persistent": {
"indices.lifecycle.poll_interval": "5s"
}
}
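The following Python sketch is a toy model, not Elasticsearch code, of two details of this setup: rollover names the new index by incrementing the trailing -NNNNNN counter (zero-padded to 6 digits), and ILM only evaluates max_docs once per indices.lifecycle.poll_interval, so the first index can end up with more than max_docs documents before writes are blocked. The function names and batch sizes are hypothetical.

```python
import re


def next_rollover_name(index_name):
    """Return the name rollover gives the next index: the trailing
    -<number> counter incremented and zero-padded to 6 digits."""
    match = re.fullmatch(r"(.*-)(\d+)", index_name)
    if match is None:
        raise ValueError(f"{index_name!r} does not end with -<number>")
    prefix, counter = match.groups()
    return f"{prefix}{int(counter) + 1:06d}"


def simulate_capped_writes(batches_per_poll, max_docs,
                           first_index="my-index-000001"):
    """Hypothetical model of the setup above. Each entry of batches_per_poll
    is the number of documents sent through the alias between two ILM polls.
    max_docs is only checked at poll time, and the index created by rollover
    inherits index.blocks.read_only: true, so later writes are rejected."""
    write_index = first_index
    counts = {write_index: 0}
    rejected = 0
    for batch in batches_per_poll:
        if write_index != first_index:   # the rolled-over write index is read-only
            rejected += batch
            continue
        counts[write_index] += batch
        if counts[write_index] >= max_docs:  # ILM poll: condition met, roll over
            write_index = next_rollover_name(write_index)
            counts[write_index] = 0
    return counts, rejected


counts, rejected = simulate_capped_writes([3, 4, 2], max_docs=5)
print(counts, rejected)
```

With batches of 3, 4 and 2 documents between polls, the first index ends up holding 7 documents and the last 2 are rejected, so max_docs behaves as an approximate cap rather than an exact one.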

Related

Elasticsearch “data”: { “type”: “float” } query returns incorrect results

I have queries like the ones below. When the date_partition field has type float, the 20220109 document is matched by queries for 20220109, 20220108 and 20220107.
When the field has type long, only the query for 20220109 matches it, which is what I want.
In other words, for each query value below, the float field returns the result as if 20220109 had been sent:
--> 20220109, 20220108, 20220107
PUT date
{
"mappings": {
"properties": {
"date_partition_float": {
"type": "float"
},
"date_partition_long": {
"type": "long"
}
}
}
}
POST date/_doc
{
"date_partition_float": "20220109",
"date_partition_long": "20220109"
}
# returns the document
GET date/_search
{
"query": {
"match": {
"date_partition_float": "20220108"
}
}
}
# returns nothing
GET date/_search
{
"query": {
"match": {
"date_partition_long": "20220108"
}
}
}
Is this a bug or is this how float type works ?
I have 2 years of data loaded into Elasticsearch as daily indices (day-1, day-2, ...), about 20 GB of primary shard size per day (15 TB total). What is the best way to change the type of just this field?
I have 5 fields of type float in my mapping; what is the fastest way to change all of them?
Note: I have the solutions below in mind, but I'm afraid they are slow:
update by query API
reindex API
runtime search request (especially this one)
Thank you!
The date_partition field should use the date type with format yyyyMMdd; that's the only sensible type here, not long and, even worse, float.
PUT date
{
"mappings": {
"properties": {
"date_partition": {
"type": "date",
"format": "yyyyMMdd"
}
}
}
}
It's not logical to query for 20220108 and have the 20220109 document returned in the results.
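As for why 20220108 matches the 20220109 document: a float field stores single-precision IEEE 754 values, which represent every integer exactly only up to 2^24 = 16,777,216. In the range of these dates, consecutive floats are 2 apart, so the stored value is rounded (and, assuming the query value goes through the same conversion, so is the value you search for). This Python sketch uses the standard struct module to mimic that float32 rounding; as_float32 is a hypothetical helper name.

```python
import struct


def as_float32(value):
    """Round a number to the nearest IEEE 754 single-precision value
    (round half to even), which is what a float field stores."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


for day in (20220106, 20220107, 20220108, 20220109, 20220110):
    print(day, "->", as_float32(day))
```

20220107, 20220108 and 20220109 all collapse to 20220108.0, while 20220106 and 20220110 stay distinct; a long field stores all of them exactly.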
Using the date type would also allow you to use proper time-based range queries and create date_histogram aggregations on your data.
You can either recreate the index with the adequate type and reindex your data, or add a new field to your existing index and update it by query. Both options are valid.
This may also answer my question: https://discuss.elastic.co/t/elasticsearch-data-type-float-returns-incorrect-results/300335

Elasticsearch alias not pointing to new indexes created by the rollover strategy

Elasticsearch: 7.15.2
We have an alias created in this way:
POST _aliases
{
"actions": [
{
"add": {
"index": "my-index-*",
"alias": "my-alias",
"is_write_index": false
}
}
]
}
And if I get the alias info I can see
GET _alias/my-alias
{
"my-index-2021.11.30-000001" : {
"aliases" : {
"my-alias" : { }
}
}
}
However, there is another index that was created automatically by the rollover policy, my-index-2021.11.30-000002, and the previously created alias my-alias does not point to it.
If I create a new alias from scratch with the same index pattern I can see both:
POST _aliases
{
"actions": [
{
"add": {
"index": "my-index-*",
"alias": "my-alias-2",
"is_write_index": false
}
}
]
}
GET _alias/my-alias-2
{
"my-index-2021.11.30-000001" : {
"aliases" : {
"my-alias-2" : { }
}
},
"my-index-2021.11.30-000002" : {
"aliases" : {
"my-alias-2" : { }
}
}
}
Is there something that I am missing? I was expecting the *-000002 index to also be pointed to by the alias my-alias without any manual operation.
The rollover policy simply creates a new index when the index size is greater than X GB.
Or do I have to modify the index template in order to add the "read" alias automatically? I already have an alias specified in the template, but that is the write alias for the rollover policy (which I cannot use for search because of our custom Elasticsearch configuration):
{
"index": {
"lifecycle": {
"rollover_alias": "my-write-alias"
}
}
}
When you create an alias using POST _aliases it will just create the alias on the matching indexes that currently exist, but if a new index is created later and matches the criteria, the alias will not be added to that index.
What you need to do is:
create an index template containing your alias definition
assign your rollover policy in the template's index settings (i.e. the index.lifecycle.name and index.lifecycle.rollover_alias settings)
Basically, like this:
PUT _index_template/my_index_template
{
"index_patterns": ["my-index-*"],
"template": {
"aliases": {
"my-alias": {}
},
"mappings": {
...
},
"settings": {
"index.lifecycle.name": "my_policy",
"index.lifecycle.rollover_alias": "my_write_alias"
}
}
}
After this is set up, every time the lifecycle policy creates a new index my-index-2021.MM.dd-000xyz, that new index will be pointed to by my-alias. Also worth noting that my_write_alias will always point to the latest index of the sequence, i.e. the write index.
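A toy Python model of the difference described above, assuming the simplified rule that a wildcard in POST _aliases is resolved once, at call time, while aliases declared in an index template are attached whenever a matching index is created. fnmatchcase stands in for Elasticsearch's wildcard matching, and the function names are hypothetical.

```python
from fnmatch import fnmatchcase


def add_alias_now(existing_indices, pattern, alias, aliases):
    """Model of POST _aliases with a wildcard index: the pattern is resolved
    once, against the indices that exist at call time."""
    for name in existing_indices:
        if fnmatchcase(name, pattern):
            aliases.setdefault(name, set()).add(alias)


def create_index(name, existing_indices, aliases, templates):
    """Model of index creation (e.g. by rollover): aliases declared in a
    matching index template are attached to the new index."""
    existing_indices.append(name)
    for pattern, template_aliases in templates.items():
        if fnmatchcase(name, pattern):
            aliases.setdefault(name, set()).update(template_aliases)


indices = ["my-index-2021.11.30-000001"]
aliases = {}
add_alias_now(indices, "my-index-*", "my-alias", aliases)

# No template alias: the rolled-over index is not covered by my-alias.
create_index("my-index-2021.11.30-000002", indices, aliases, templates={})

# Alias declared in a template matching my-index-*: the new index gets it.
create_index("my-index-2021.11.30-000003", indices, aliases,
             templates={"my-index-*": {"my-alias"}})
print(aliases)
```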

elasticsearch template doesn't change index ILM

In my Elasticsearch cluster I receive a daily index with a name like dstack-prod_dcbs-. I want ILM to apply to these indices as soon as they are received, but I don't know why ILM is not being added to them. Below you can find my command (I have already defined the "dstack-prod_dcbs-policy" ILM policy):
PUT _template/dstack-prod_dcbs
{
"index_patterns": ["dstack-prod_dcbs-*"],
"settings": {
"index.lifecycle.name": "dstack-prod_dcbs-policy"
}
}
but when I run
GET dstack-prod_dcbs*/_ilm/explain
below result returns
{
"indices" : {
"dstack-prod_dcbs-20200821" : {
"index" : "dstack-prod_dcbs-20200821",
"managed" : false
},
"dstack-prod_dcbs-2020-09-22" : {
"index" : "dstack-prod_dcbs-2020-09-22",
"managed" : false
}
}
}
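I believe ILM is an alternative to daily indices: with ILM, indices are rolled over when a condition in the policy is met (not when a new day starts). Also note that index templates are only applied when an index is created, so indices that already existed when the template was added stay unmanaged.
For rollover, you also need to define a rollover alias in the template: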
PUT _template/dstack-prod_dcbs
{
"index_patterns": ["dstack-prod_dcbs-*"],
"settings": {
"index.lifecycle.name": "dstack-prod_dcbs-policy",
"index.lifecycle.rollover_alias": "dstack-prod_dcbs"
}
}
Then you need to create the first index manually and assign it as the write index for the alias:
PUT dstack-prod_dcbs-000001
{
"aliases": {
"dstack-prod_dcbs":{
"is_write_index": true
}
}
}
After that, everything is handled automatically: on rollover a new index is created, and it is then assigned as the write index for the alias.
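To check which indices ILM is not managing (for example, older daily indices created before the bootstrap above), you can filter the _ilm/explain response. This Python sketch uses only the standard json module; unmanaged_indices is a hypothetical helper name, and the sample is the response shown in the question.

```python
import json


def unmanaged_indices(explain_response):
    """Return, sorted, the names of indices that an _ilm/explain response
    reports as not managed by any lifecycle policy."""
    body = json.loads(explain_response)
    return sorted(
        name for name, info in body["indices"].items()
        if not info.get("managed", False)
    )


sample = """{
  "indices": {
    "dstack-prod_dcbs-20200821": {"index": "dstack-prod_dcbs-20200821", "managed": false},
    "dstack-prod_dcbs-2020-09-22": {"index": "dstack-prod_dcbs-2020-09-22", "managed": false}
  }
}"""
print(unmanaged_indices(sample))
```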

Add default value on a field while modifying existing elasticsearch mapping

Let's say I have an Elasticsearch index with around 10M documents. Now I need to add a new field with a default value, e.g. is_hotel_type=0, to each and every document. Later I'll update it according to my requirements.
To do that, I modified the mapping of myindex with a PUT request like this:
PUT myindex/_mapping/rp
{
"properties": {
"is_hotel_type": {
"type": "integer"
}
}
}
Then I ran a Painless script with update by query (POST) to set is_hotel_type=0 on all the existing documents:
POST myindex/_update_by_query
{
"query": {
"match_all": {}
},
"script" : "ctx._source.is_hotel_type = 0;"
}
But this process is very time-consuming for a large index with 10M documents. In SQL we can usually set a default value when creating a new column. So my question:
Is there any way in Elasticsearch to add a new field with a default value? I've tried the PUT request below with null_value, but it doesn't work for me.
PUT myindex/_mapping/rp
{
"properties": {
"is_hotel_type": {
"type": "integer",
"null_value" : 0
}
}
}
I just want to know: is there any other way to do that without the update-by-query script?
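The null_value attempt most likely has no visible effect because null_value only substitutes a value for an explicit null in a document at index time. A document whose _source does not contain the field at all (like every one of the existing 10M documents) gets nothing indexed for it, and _source itself is never rewritten. The Python function below is a toy model of that rule, not Elasticsearch code; indexed_value is a hypothetical name.

```python
_MISSING = object()  # sentinel: the field is absent from _source


def indexed_value(source, field, null_value=None):
    """Toy model of what a mapped field with null_value contributes to the
    index for one document's _source."""
    value = source.get(field, _MISSING)
    if value is _MISSING:
        return None          # field absent: nothing indexed, null_value unused
    if value is None:
        return null_value    # explicit null: indexed as null_value
    return value             # regular value: indexed as-is


print(indexed_value({}, "is_hotel_type", null_value=0))
print(indexed_value({"is_hotel_type": None}, "is_hotel_type", null_value=0))
```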

How to use mapping in elasticsearch?

After processing logs with Logstash, all my fields have the same type, string, so I want to use a mapping in Elasticsearch to change the type of some of them, like ip, port, etc. However, I don't know how to do it; I'm a complete beginner in Elasticsearch.
Any help?
The first thing to do would be to install the Marvel plugin in Elasticsearch. It allows you to work with the Elasticsearch REST API very easily - to index documents, modify mappings, etc.
Go to the Elasticsearch folder and run:
bin/plugin -i elasticsearch/marvel/latest
Then go to http://localhost:9200/_plugin/marvel/sense/index.html to access Marvel Sense from which you can send commands. Marvel itself provides you with a dashboard about Elasticsearch indices, performance stats, etc.: http://localhost:9200/_plugin/marvel/
In Sense, you can run:
GET /_cat/indices
to learn what indices exist in your Elasticsearch instance.
Let's say there is an index called logstash.
You can check its mapping by running:
GET /logstash/_mapping
Elasticsearch will return a JSON document that describes the mapping of the index. It could be something like:
{
"logstash": {
"mappings": {
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "string"
},
"y": {
"type": "string"
}
}
}
}
}
}
}
}
...in this case doc is the document type (collection) in which you index documents. In Sense, you could index a document as follows:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
... that's a command to index the JSON object under the id 1.
Once a document field such as Foo.x has type string, it cannot be changed to a number. You have to set the mapping first and then reindex.
First delete the index:
DELETE logstash
Then create the index and set the mapping as follows:
PUT logstash
PUT logstash/doc/_mapping
{
"doc": {
"properties": {
"Foo": {
"properties": {
"x": {
"type": "long"
},
"y": {
"type": "long"
}
}
}
}
}
}
Now, even if you index a doc with the properties as JSON strings, Elasticsearch will convert them to numbers:
PUT logstash/doc/1
{
"Foo": {
"x":"500",
"y":"200"
}
}
Search for the new doc:
GET logstash/_search
Notice that the returned document, in the _source field, looks exactly the way you sent it to Elasticsearch - that's on purpose, Elasticsearch always preserves the original doc this way. The properties are indexed as numbers though. You can run a range query to confirm:
GET logstash/_search
{
"query":{
"range" : {
"Foo.x" : {
"gte" : 500
}
}
}
}
With respect to Logstash, you might want to set a mapping template for index name logstash-* since Logstash creates new indices automatically: http://www.elastic.co/guide/en/elasticsearch/reference/1.5/indices-templates.html
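With many Logstash fields to retype, writing the nested "properties" objects by hand gets tedious. This Python sketch builds that structure from dotted field paths, producing the same body as the PUT logstash/doc/_mapping request above; build_properties is a hypothetical helper, and any other field names or types you pass in (e.g. an ip or port field) are up to you.

```python
import json


def build_properties(field_types):
    """Turn {"Foo.x": "long", ...} into the nested "properties" structure of
    a mapping body: each dotted path segment becomes an object field."""
    root = {}
    for path, field_type in field_types.items():
        node = root
        *parents, leaf = path.split(".")
        for part in parents:
            # Create (or reuse) the object field and descend into its properties.
            node = node.setdefault(part, {"properties": {}})["properties"]
        node[leaf] = {"type": field_type}
    return {"properties": root}


mapping = {"doc": build_properties({"Foo.x": "long", "Foo.y": "long"})}
print(json.dumps(mapping, indent=2))
```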
