How to get the creation time of indices in Elasticsearch using Jest

I am trying to delete indexes from Elasticsearch that were created more than 24 hours ago. I can't find a way to get the creation time of the indices on a particular node. This can be accomplished with Spring Boot's Elasticsearch support, but I am using the Jest API.

You can get the settings.index.creation_date value that was stored at index creation time.
With curl you can get it easily using:
curl -XGET localhost:9200/your_index/_settings
You get:
{
  "your_index" : {
    "settings" : {
      "index" : {
        "creation_date" : "1460663685415",   <--- this is what you're looking for
        "number_of_shards" : "5",
        "number_of_replicas" : "1",
        "version" : {
          "created" : "1040599"
        },
        "uuid" : "dIG5GYsMTueOwONu4RGSQw"
      }
    }
  }
}
With Jest, you can get the same value using:
import io.searchbox.indices.settings.GetSettings;
GetSettings getSettings = new GetSettings.Builder().build();
JestResult result = client.execute(getSettings);
You can then inspect the JestResult to find the creation_date.
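For example, a minimal sketch of drilling into the parsed response (your_index is a placeholder index name, client is the JestClient you already execute actions on, and the JSON object comes from Gson via getJsonObject()):
import com.google.gson.JsonObject;
import io.searchbox.client.JestResult;
import io.searchbox.indices.settings.GetSettings;

GetSettings getSettings = new GetSettings.Builder().build();
JestResult result = client.execute(getSettings);

// Drill down to settings.index.creation_date (epoch milliseconds stored as a string)
JsonObject index = result.getJsonObject()
        .getAsJsonObject("your_index")
        .getAsJsonObject("settings")
        .getAsJsonObject("index");
long creationMillis = Long.parseLong(index.get("creation_date").getAsString());

// Example check: was the index created more than 24 hours ago?
boolean olderThanOneDay = System.currentTimeMillis() - creationMillis > 24L * 60 * 60 * 1000;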
If I may suggest something, curator would be a much handier tool for achieving what you need.
Simply run this once a day:
curator delete indices --older-than 1 --time-unit days
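For example, a crontab entry along these lines (the curator path is a placeholder for wherever it is installed on your system) runs it nightly at 01:00:
0 1 * * * /usr/local/bin/curator delete indices --older-than 1 --time-unit days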

Related

Elasticsearch "_cat/indices" api update delayed until search?

Elasticsearch (v7.9.2) has an API, _cat/indices, to show index status, but the last change to docs.count does not seem to be visible until a search or another update is made.
Is this behavior intended as a performance improvement?
And is there any way to keep it always up to date?
Update: how I observed this
I'm using Logstash to import data into ES.
In the browser I have opened http://localhost:9200/_cat/indices?v.
After each import I refresh the browser page, and the count usually changes.
After Logstash finishes and I terminate it, the count on the page is less than the count in the source DB (e.g. MySQL).
I then refresh the page again and again, but it doesn't change.
However, once I send a query request to the ES index from Postman and refresh again, docs.count changes and the total count becomes the same as in the source DB.
So, to summarize the behavior:
At first, docs.count does update after each import (i.e. insert).
But as importing continues for a while without any query on the index, the page's docs.count stops updating.
A query on the index then forces docs.count to update to the correct number.
After that, the steps above repeat. It looks like the update is delayed until it is actually needed.
And here are the index settings from http://localhost:9200/xxx/_settings (as requested in a comment):
{
  "xxx" : {
    "settings" : {
      "index" : {
        "number_of_shards" : "1",
        "provided_name" : "xxx",
        "creation_date" : "1602844600812",
        "analysis" : {
          "analyzer" : {
            "default_search" : {
              "type" : "ik_max_word"
            },
            "default" : {
              "type" : "ik_max_word"
            }
          }
        },
        "number_of_replicas" : "0",
        "uuid" : "qLFMHhyBQNOOs1u_EcJbBg",
        "version" : {
          "created" : "7090299"
        }
      }
    }
  }
}
I see the same issue on ES v7.9.3.
From the official ES docs:
To get an accurate count of Elasticsearch documents, use the cat count or count APIs.
The cat count API is accurate on my ES cluster.
GET _cat/count/log-uwsgi-2021?v
epoch      timestamp count
1638855942 05:45:42  500
The latest docs.count is only shown once a refresh has occurred.
Refreshes happen periodically, based on the index.refresh_interval setting.
From the documentation: Elasticsearch periodically refreshes indices every second, but only on indices that have received one search request or more in the last 30 seconds.
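If you need _cat/indices to reflect the latest writes without waiting for a search to trigger a refresh, you can force one manually; a sketch against the index from the question (log-uwsgi-2021):
POST /log-uwsgi-2021/_refresh
Alternatively, setting index.refresh_interval explicitly in the index settings makes refreshes happen unconditionally at that interval, rather than only on recently searched indices.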

Attempting to delete all the data for an Index in Elasticsearch

I am trying to delete all the documents, i.e. the data, from an index. I am using v6.6 along with the Dev Tools in Kibana.
In the past I have done this operation successfully, but now it is saying 'not_found':
{
  "_index" : "new-index",
  "_type" : "doc",
  "_id" : "_query",
  "_version" : 1,
  "result" : "not_found",
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  },
  "_seq_no" : 313,
  "_primary_term" : 7
}
Here is my Kibana statement:
DELETE /new-index/doc/_query
{
  "query": {
    "match_all": {}
  }
}
Also, here is the GET operation that verified the index exists and has data:
GET new-index/doc/_search
I verified the type is doc, but I can post the whole mapping if needed.
An easier way is to navigate in Kibana to Management -> Elasticsearch Index Management, select the indexes you would like to delete via the checkboxes, and click Manage index -> Delete index or Flush index depending on your need.
I was able to resolve the issue by using a delete by query:
POST new-index/_delete_by_query
{
  "query": {
    "match_all": {}
  }
}
Deleting documents is a problematic way to clear data.
It is preferable to delete the index itself:
DELETE [your-index]
from the Kibana console, and recreate it from scratch.
Even better is to create an index template, so the index (with its settings and mappings) is created automatically when the first document is indexed.
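A minimal sketch of such a legacy index template for ES 6.x/7.x (the template name and index pattern are placeholders; add your own settings and mappings):
PUT _template/new-index-template
{
  "index_patterns": ["new-index*"],
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}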
The only solutions currently are to either delete the index itself (faster) or use delete-by-query (slower):
https://www.elastic.co/guide/en/elasticsearch/reference/7.4/docs-delete-by-query.html
POST new-index/_delete_by_query?conflicts=proceed
{
  "query": {
    "match_all": {}
  }
}
The Delete API only removes a single document: https://www.elastic.co/guide/en/elasticsearch/reference/7.4/docs-delete.html
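For reference, deleting a single document by id looks like this (the id 1 is just an example):
DELETE new-index/doc/1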
My guess is that someone changed a field's name, so the field name in the DB (NoSQL) and the one in Elasticsearch no longer match. So Elasticsearch tried to delete that field, but the field was "not found".
It's not an error I would lose sleep over.

How to copy an Elasticsearch field to another field

I have a 100 GB ES index. I now need to change one field to a multi-field, e.g. username to username.username and username.raw (not_analyzed). I know the change will apply to incoming data, but how can I make it affect the old data as well? Should I use scroll to copy the whole index to a new one, or is there a better solution that copies just the one field?
There's a way to achieve this without reindexing all your data by using the update by query plugin.
Basically, after installing the plugin, you can run the following query and all your documents will get the multi-field re-populated.
curl -XPOST 'localhost:9200/your_index/_update_by_query' -d '{
  "query" : {
    "match_all" : {}
  },
  "script" : "ctx._source.username = ctx._source.username;"
}'
It might take a while to run on 100 GB of docs, but after it finishes, the username.raw field will be populated.
Note: for this plugin to work, scripting needs to be enabled.
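How exactly you enable it depends on your Elasticsearch version (this plugin predates the built-in _update_by_query, i.e. the 1.x/2.x era); roughly, in elasticsearch.yml it looks like the following, but check the docs for your release:
# ES 1.x
script.disable_dynamic: false
# ES 2.x
script.inline: true
script.indexed: true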
POST index/type/_update_by_query
{
  "query" : {
    "match_all" : {}
  },
  "script" : {
    "inline" : "ctx._source.username = ctx._source.username;",
    "lang" : "painless"
  }
}
This worked for me on ES 5.6; the one above did not!

Elasticsearch querying alias with routing giving partial results

In an effort to create a multi-tenant architecture for my project, I've created an Elasticsearch cluster with an index 'tenant':
"tenant" : {
"some_type" : {
"_routing" : {
"required" : true,
"path" : "tenantId"
},
Now, I've also created some aliases:
"tenant" : {
"aliases" : {
"tenant_1" : {
"index_routing" : "1",
"search_routing" : "1"
},
"tenant_2" : {
"index_routing" : "2",
"search_routing" : "2"
},
"tenant_3" : {
"index_routing" : "3",
"search_routing" : "3"
},
"tenant_4" : {
"index_routing" : "4",
"search_routing" : "4"
}
I've added some data with tenantId = 2.
After all that, I tried to query 'tenant_2', but I only got partial results, while querying the 'tenant' index directly returns the full results.
Why is that?
I was sure that routing is supposed to query all the shards that documents with tenantId = 2 reside on.
When you have created aliases in Elasticsearch, you have to do all operations through the aliases only, be it indexing, updating, or searching.
Try reindexing the data again and check, if possible (I hope it is a test index).
Remove all the indices:
curl -XDELETE 'localhost:9200/_all' # Warning!! Don't use this in production.
Use this command only if it is a test index.
Create the index again, create the aliases again, and do all the indexing, search, and delete operations via the alias name. Even the import of data should be done via the alias name.
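When you recreate the aliases, the add action of the _aliases API is where the index and search routing go; a sketch for one tenant (index and alias names taken from the question):
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "tenant",
        "alias": "tenant_2",
        "index_routing": "2",
        "search_routing": "2"
      }
    }
  ]
}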

Index huge data into Elasticsearch

I am new to Elasticsearch and have a huge amount of data (more than 16k large rows in a MySQL table). I need to push this data into Elasticsearch and am facing problems indexing it.
Is there a way to make indexing faster? How should I deal with huge data?
Expanding on the Bulk API
You will make a POST request to the /_bulk endpoint.
Your payload will follow the format below, where \n is the newline character:
action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
...
Make sure your JSON is not pretty-printed.
The available actions are index, create, update, and delete.
Bulk Load Example
To answer your question, here is what it looks like if you just want to bulk load data into your index:
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
The first line contains the action and metadata. In this case we are calling create: we will be inserting a document of type type1 into the index named test, with a manually assigned id of 3 (instead of Elasticsearch auto-generating one).
The second line contains all the fields in your mapping, which in this example is just field1 with a value of value3.
You just concatenate as many of these pairs as you'd like to insert into your index.
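A sketch of actually sending such a payload with curl (the file name bulk_payload.json is a placeholder; --data-binary is used so the literal newlines are preserved, and the last line of the file must also end with a newline):
curl -XPOST 'localhost:9200/_bulk' -H 'Content-Type: application/x-ndjson' --data-binary @bulk_payload.json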
This may be an old thread, but I thought I would comment anyway for anyone who is looking for a solution to this problem. The JDBC river plugin for Elasticsearch is very useful for importing data from a wide array of supported DBs.
Link to the JDBC river source here..
Using Git Bash's curl command, I PUT the following configuration document to allow communication between the ES instance and the MySQL instance:
curl -XPUT 'localhost:9200/_river/uber/_meta' -d '{
  "type" : "jdbc",
  "jdbc" : {
    "strategy" : "simple",
    "driver" : "com.mysql.jdbc.Driver",
    "url" : "jdbc:mysql://localhost:3306/elastic",
    "user" : "root",
    "password" : "root",
    "sql" : "select * from tbl_indexed",
    "poll" : "24h",
    "max_retries": 3,
    "max_retries_wait" : "10s"
  },
  "index": {
    "index": "uber",
    "type" : "uber",
    "bulk_size" : 100
  }
}'
Ensure you have the mysql-connector-java-VERSION-bin JAR in the river-jdbc plugin directory, which contains the jdbc-river plugin's necessary JAR files.
Try the bulk API:
http://www.elasticsearch.org/guide/reference/api/bulk.html
