ILM does support on document level in elasticsearch? - elasticsearch

I have created policy and applied policy on indices .
This policy allows us to set a expiry time for a document. Once the time has past, the expired documents are deleted.
Is it possible with latest version of Elasticsearch ?

As far as I know ILM deletes the entire index and criteria will be the duration of days since the index has been created. If you need to delete a specific document I feel you will need to leverage API to implement this. You can setup a cronjob which uses the below script to delete a specific doc.
curl -k -X POST "https://USERNAME:PASSWORD#localhost:9200/test/_delete_by_query?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"filter": [
{
"range": {
"#timestamp": {
"lt": "now-30d"
}
}
}
]
}
}
}
Reference : https://discuss.elastic.co/t/automatically-delete-older-documents/247078/9

Related

Get data from only the latest Elastic index in Grafana

I have a series of indexes in Elastic, myindex-YYYY.MM.DD. In a Grafana panel, I want to read data only from the latest such index each time. I have created a datasource [myindex-]YYYY.MM.DD with pattern Daily, but this reads from all indexes. I can't find out whether limiting to the latest index should be done in the data source or in the panel options.
An alternative could be to filter the documents so that I get only those whose #timestamp equals the max #timestamp, but I can't figure out this either. I can get the max #timestamp with this:
GET /myindex-*/_search
{
"size": 0,
"aggs": {
"max_timestamp": { "max": { "field": "#timestamp" } }
}
}
I’d need to save the result in a variable and use it in another query, but I can’t find a way to do this in Grafana.
My conclusion (from reading whatever I could find and from the absence of answers to this question) is that what I want is not possible to do directly. I ended up creating a myindex-latest alias to the latest of the myindex-YYYY.MM.DD series. I did this by running a script similar to the following (in my case it's being run by Logstash after creation of myindex-YYYY.MM.DD finishes):
#!/bin/bash
#
# This script creates elastic alias myindex-latest for the index
# myindex-YYYY.MM.DD, where YYYY.MM.DD is the current date.
curdate=`date +%Y.%m.%d`
read -r -d '' JSON <<EOF1
{
"actions": [
{
"remove": {
"index": "*",
"alias": "myindex-latest"
}
},
{
"add": {
"index": "myindex-$curdate",
"alias": "myindex-latest"
}
}
]
}
EOF1
curl -X POST \
-H "Content-Type: application/json" \
"http://es01:9200/_aliases" \
-d "$JSON"

elasticsearch : How can i tell _reindex api in to continue indexing docs while source index still receiving new docs?

I have daily created indices, these indices are filled by an agent which collects a logs every second of the day, and i'am reindexing them (by field) to new indices using _reindex api.
How can i tell _reindex api to still reindixing while the source index still receiving new documents ?
Any help woould be really appriciated!
Thank you
you cannot force reindex API to be online to reindex new received documents.
but I have solution. you can add a date field (index_time) to your source index. write an hourly cron job to run reindex API with a query to get last hour indexed docs via index_time.
POST _reindex
{
"source": {
"index": "my-index-000001",
"query": {
"filter" :{
"query": {
"range": {
"index_time": {"gte" : "now-1h"}
}
}
}
}
},
"dest": {
"index": "my-new-index-000001"
}
}

Delete Data from Elasticsearch Maxed out index

I have an single Elasticsearch index which has maximum number of documents. i want to delete some data from this index. i have tried running a delete_by_query but it fails. any sugestions for this, which i would able to keep the index and its data and also delete some old data from the same index ?
This is the request i was running.
curl -X GET "1localhost:9200/produdtion-index/_search?pretty" -H 'Content-Type: application/json' -d'
{
"query": {
"bool": {
"must": [
{ "range": { "submitDate": { "lte": "20200421" } } }
]
}
}
}
'
Try index life cycle managementhttps://www.elastic.co/guide/en/elasticsearch/reference/current/index-lifecycle-management.html
You can split the index based on the number of documents and then delete them

How to copy some ElasticSearch data to a new index

Let's say I have movie data in my ElasticSearch and I created them like this:
curl -XPUT "http://192.168.0.2:9200/movies/movie/1" -d'
{
"title": "The Godfather",
"director": "Francis Ford Coppola",
"year": 1972
}'
And I have a bunch of movies from different years. I want to copy all the movies from a particular year (so, 1972) and copy them to a new index of "70sMovies", but I couldn't see how to do that.
Since ElasticSearch 2.3 you can now use the built in _reindex API
for example:
POST /_reindex
{
"source": {
"index": "twitter"
},
"dest": {
"index": "new_twitter"
}
}
Or only a specific part by adding a filter/query
POST /_reindex
{
"source": {
"index": "twitter",
"query": {
"term": {
"user": "kimchy"
}
}
},
"dest": {
"index": "new_twitter"
}
}
Read more: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
The best approach would be to use elasticsearch-dump tool https://github.com/taskrabbit/elasticsearch-dump.
The real world example I used :
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=mapping
elasticdump \
--input=http://localhost:9700/.kibana \
--output=http://localhost:9700/.kibana_read_only \
--type=data
Check out knapsack:
https://github.com/jprante/elasticsearch-knapsack
Once you have the plugin installed and working, you could export part of your index via query. For example:
curl -XPOST 'localhost:9200/test/test/_export' -d '{
"query" : {
"match" : {
"myfield" : "myvalue"
}
},
"fields" : [ "_parent", "_source" ]
}'
This will create a tarball with only your query results, which you can then import into another index.
To reindex specific type from source index to destination index type syntax is
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
If the intent were to copy some portion of the data or the entire data to an index with the same settings/mappings as that of the original index one could use the clone api to achieve the same. Something like below:
POST /<index>/_clone/<target-index>
OR
PUT /<index>/_clone/<target-index>
However if the intent is to copy the data to a new index with the different settings/mappings than the original index one could use the reindex api to achieve the same. Something like below:
POST _reindex/
{
"source": {
"index": "source_index",
"type": "source_type",
"query": {
// add filter criteria
}
},
"dest": {
"index": "dest_index",
"type": "dest_type"
}
}
*Note: In case of reindex api the target index has to be created prior to actual api call.
For further reading on difference between clone and reindex refer What's the difference between cloning and reindexing an index in Elasticsearch?
You can do it easily with elasticsearch-dump (https://github.com/taskrabbit/elasticsearch-dump) in three steps. In the following example I copy the index "thor" to "thor2"
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=analyzer
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=mapping
elasticdump --input=http://localhost:9200/thor --output=http://localhost:9200/thor2 --type=data
Well the straightforward way to do this is to write code, with the API of your choice, querying for "year": 1972 and then indexing that data into a new index. You would use the Search api or the Scan and Scroll API to get all the documents and then either index them one by one or use the Bulk Api:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-search.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-scroll.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-bulk.html
Assuming you don't want to do this via code but are looking for a direct way of doing this, I suggest the Elasticsearch Snapshot and Restore. Basically you would take a snapshot of your existing index, restore it into a new index and then use the Delete command to delete all documents with a year other than 1972.
Snapshot And Restore
The snapshot and restore module allows to create snapshots of
individual indices or an entire cluster into a remote repository. At
the time of the initial release only shared file system repository was
supported, but now a range of backends are available via officially
supported repository plugins.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
Delete By Query API
The delete by query API allows to delete documents from one or more
indices and one or more types based on a query. The query can either
be provided using a simple query string as a parameter, or using the
Query DSL defined within the request body.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-delete-by-query.html
Since v7.4 the _clone api was introduced and can easily satisfy your need: (read for the relevant prerequisites and monitoring involved)
POST /<index>/_clone/<target-index>
Or:
PUT /<index>/_clone/<target-index>
You can use elasticdump --searchBody:
# Copy documents from movies to 70sMovies (filtering using query)
elasticdump \
--input=http://localhost:9200/movies \
--output=http://localhost:9200/70sMovies \
--type=data \
--searchBody="{\"query\":{\"term\":{\"username\": \"admin\"}}}" # <--- Your query here
more on elasticdump options here.

Delete all documents from index/type without deleting type

I know one can delete all documents from a certain type via deleteByQuery.
Example:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"term" : { "user" : "kimchy" }
}
}'
But i have NO term and simply want to delete all documents from that type, no matter what term. What is best practice to achieve this? Empty term does not work.
Link to deleteByQuery
I believe if you combine the delete by query with a match all it should do what you are looking for, something like this (using your example):
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d '{
"query" : {
"match_all" : {}
}
}'
Or you could just delete the type:
curl -XDELETE http://localhost:9200/twitter/tweet
Note: XDELETE is deprecated for later versions of ElasticSearch
The Delete-By-Query plugin has been removed in favor of a new Delete By Query API implementation in core. Read here
curl -XPOST 'localhost:9200/twitter/tweet/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
From ElasticSearch 5.x, delete_by_query API is there by default
POST: http://localhost:9200/index/type/_delete_by_query
{
"query": {
"match_all": {}
}
}
You can delete documents from type with following query:
POST /index/type/_delete_by_query
{
"query" : {
"match_all" : {}
}
}
I tested this query in Kibana and Elastic 5.5.2
Torsten Engelbrecht's comment in John Petrones answer expanded:
curl -XDELETE 'http://localhost:9200/twitter/tweet/_query' -d
'{
"query":
{
"match_all": {}
}
}'
(I did not want to edit John's reply, since it got upvotes and is set as answer, and I might have introduced an error)
Starting from Elasticsearch 2.x delete is not anymore allowed, since documents remain in the index causing index corruption.
Since ElasticSearch 7.x, delete-by-query plugin was removed in favor of new Delete By Query API.
The curl option:
curl -X POST "localhost:9200/my-index/_delete_by_query" -H 'Content-Type: application/json' -d' { "query": { "match_all":{} } } '
Or in Kibana
POST /my-index/_delete_by_query
{
"query": {
"match_all":{}
}
}
The above answers no longer work with ES 6.2.2 because of Strict Content-Type Checking for Elasticsearch REST Requests. The curl command which I ended up using is this:
curl -H'Content-Type: application/json' -XPOST 'localhost:9200/yourindex/_doc/_delete_by_query?conflicts=proceed' -d' { "query": { "match_all": {} }}'
In Kibana Console:
POST calls-xin-test-2/_delete_by_query
{
"query": {
"match_all": {}
}
}
(Reputation not high enough to comment)
The second part of John Petrone's answer works - no query needed. It will delete the type and all documents contained in that type, but that can just be re-created whenever you index a new document to that type.
Just to clarify:
$ curl -XDELETE 'http://localhost:9200/twitter/tweet'
Note: this does delete the mapping! But as mentioned before, it can be easily re-mapped by creating a new document.
Note for ES2+
Starting with ES 1.5.3 the delete-by-query API is deprecated, and is completely removed since ES 2.0
Instead of the API, the Delete By Query is now a plugin.
In order to use the Delete By Query plugin you must install the plugin on all nodes of the cluster:
sudo bin/plugin install delete-by-query
All of the nodes must be restarted after the installation.
The usage of the plugin is the same as the old API. You don't need to change anything in your queries - this plugin will just make them work.
*For complete information regarding WHY the API was removed you can read more here.
You have these alternatives:
1) Delete a whole index:
curl -XDELETE 'http://localhost:9200/indexName'
example:
curl -XDELETE 'http://localhost:9200/mentorz'
For more details you can find here -https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-delete-index.html
2) Delete by Query to those that match:
curl -XDELETE 'http://localhost:9200/mentorz/users/_query' -d
'{
"query":
{
"match_all": {}
}
}'
*Here mentorz is an index name and users is a type
I'm using elasticsearch 7.5 and when I use
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -d'
{
"query": {
"match_all": {}
}
}'
which will throw below error.
{
"error" : "Content-Type header [application/x-www-form-urlencoded] is not supported",
"status" : 406
}
I also need to add extra -H 'Content-Type: application/json' header in the request to make it works.
curl -XPOST 'localhost:9200/materials/_delete_by_query?conflicts=proceed&pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"match_all": {}
}
}'
{
"took" : 465,
"timed_out" : false,
"total" : 2275,
"deleted" : 2275,
"batches" : 3,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
Just to add couple cents to this.
The "delete_by_query" mentioned at the top is still available as a plugin in elasticsearch 2.x.
Although in the latest upcoming version 5.x it will be replaced by
"delete by query api"
Elasticsearch 2.3 the option
action.destructive_requires_name: true
in elasticsearch.yml do the trip
curl -XDELETE http://localhost:9200/twitter/tweet
For future readers:
in Elasticsearch 7.x there's effectively one type per index - types are hidden
you can delete by query, but if you want remove everything you'll be much better off removing and re-creating the index. That's because deletes are only soft deletes under the hood, until the trigger Lucene segment merges*, which can be expensive if the index is large. Meanwhile, removing an index is almost instant: remove some files on disk and a reference in the cluster state.
* The video/slides are about Solr, but things work exactly the same in Elasticsearch, this is Lucene-level functionality.
If you want to delete document according to a date.
You can use kibana console (v.6.1.2)
POST index_name/_delete_by_query
{
"query" : {
"range" : {
"sendDate" : {
"lte" : "2018-03-06"
}
}
}
}

Resources