How to really delete documents of a certain type in Elasticsearch

I would like to delete all of my documents of _type=varnish-request in my Elasticsearch cluster.
I installed the delete by query plugin (https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/plugins-delete-by-query.html)
I did DELETE http://localhost:9200/logstash*/_query
{
  "query": {
    "bool": {
      "must": [
        { "match": { "_type": "varnish-request" } },
        { "match": { "_index": "logstash-2016.02.05" } }
      ]
    }
  }
}
And the response looks OK:
{"took":2265842,"timed_out":false,"_indices":{"_all":{"found":3062614,"deleted":3062614,"missing":0,"failed":0},"logstash-2016.02.05":
{"found":3062614,"deleted":3062614,"missing":0,"failed":0}},"failures":[]}
curl http://localhost:9200/_cat/indices | sort
Before the cleanup:
yellow open logstash-2016.02.05 5 1 4618245 0 4.1gb 4.1gb
After the cleanup:
yellow open logstash-2016.02.05 5 1 1555631 3062605 4.1gb 4.1gb
The whole point is to lighten my ES server by removing useless data, but here I see that the index size is still the same.
I already checked "Delete documents of type in Elasticsearch" but no luck.
I tried the approach from "elasticsearch: how to free store size after deleting documents":
POST http://localhost:9200/logstash-2016.02.05/_forcemerge
{"_shards":{"total":10,"successful":5,"failed":0}}
But still
yellow open logstash-2016.02.05 5 1 1555631 3062605 4.1gb 4.1gb

The first step is correct. Now you simply need to call _optimize (or _forcemerge if you're using ES 2.1+) with only_expunge_deletes enabled. This will merge away the segments containing deleted documents and free some space.
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
or
curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
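To verify that the space is actually reclaimed once the merge finishes, you can look at the segments and the index stats again (a quick check against the same logstash-2016.02.05 index as above):
curl 'http://localhost:9200/_cat/segments/logstash-2016.02.05?v'
curl 'http://localhost:9200/_cat/indices/logstash-2016.02.05?v'
The docs.deleted count should drop back towards 0 and store.size should shrink accordingly.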

Related

Restore elasticsearch cluster onto another cluster

Hello, I have a 3-node Elasticsearch cluster (source) and a snapshot called
snapshot-1 which was taken from the source cluster.
I also have another 6-node Elasticsearch cluster (destination).
When I restore my destination cluster from snapshot-1 using this command:
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": ".security(.+)",
  "rename_replacement": "delete_$1",
  "include_aliases": false
}
'
and I got this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_restore_exception",
        "reason" : "[snapshot:snapshot-1 yjg/mHsYhycHQsKiEhWVhBywxQ] cannot restore index [.ilm-history-0003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
    ]
  }
}
So as you can see, the index .ilm-history-0003 already exists in the cluster. But how can I do the rename replacement for the .security, .ilm, .slm and .transform indices using only one rename_pattern?
like this one
"rename_pattern": ".security(.+)",
From my experience the rename pattern doesn't need to be super fancy, because you will probably either
a) delete the index afterwards (as your renaming pattern suggests), or
b) reindex data from the restored index into new indices. In that case the naming of the restored index is insignificant.
So this is what I would suggest:
Use the following rename pattern to include all indices. Again, from my experience, your first aim is to get the old data restored; after that you can manage the reindexing etc.
POST /_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_aliases": false,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}
This will prepend restored_ to the actual index names, resulting in restored indices such as:
restored_.security*
restored_.ilm*
restored_.slm*
restored_.transform*
I hope I could help you.
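If you go the rename route, you can quickly check what actually came back with the _cat API and the prefix as an index pattern (a sketch; adjust host and credentials to your cluster):
curl -u elastic:321 "192.168.2.15:9200/_cat/indices/restored_*?v"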
Solved it this way, by excluding the system indices from the restore:
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*,-.slm*,-.ilm*,-.transform*,-.security*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "include_aliases": false
}
'
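If you are not sure which index patterns to exclude (or rename), you can first list what the snapshot actually contains (a sketch, reusing the repository and snapshot names from above):
curl -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1?pretty"
The response includes an indices array, which makes it easier to decide what to exclude before running the restore.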

Getting error index.max_inner_result_window during rolling upgrade of ES from 5.6.10 to 6.8.10

I have 2 data nodes and 3 master nodes in an ES cluster. I was doing a rolling upgrade, as ES suggests, moving from 5.6.10 to 6.8.10.
As there should be zero downtime, I was testing that and ran into one error.
I upgraded the first data node and did some basic search testing; it worked fine. When I upgraded the 2nd node, search started breaking with the error below:
java.lang.IllegalArgumentException: Top hits result window is too large, the top hits aggregator [top]'s from + size must be less than or equal to: [100] but was [999]. This limit can be set by changing the [index.max_inner_result_window] index level setting.
index.max_inner_result_window -- This property was introduced in 6.x, and the master nodes are still on 5.6.10. So what is the solution with zero downtime?
Note: My indexing is stopped completely. My 2 data nodes are now on 6.8.10 and the master nodes are on 5.6.
Thanks
1 - Change the parameter on current indexes:
curl -X PUT "http://localhost:9200/_all/_settings?pretty" -H 'Content-Type: application/json' -d'
{
  "index.max_inner_result_window": "2147483647"
}
'
2 - Create a template for future indexes (the cluster is on 6.8, so the legacy _template API applies rather than _index_template):
curl -X PUT "http://localhost:9200/_template/template_max_inner_result?pretty" -H 'Content-Type: application/json' -d'
{
  "index_patterns": ["*"],
  "settings": {
    "index": {
      "max_inner_result_window": 2147483647
    }
  }
}
'
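To confirm the setting was applied, you can read it back filtered by name (a quick check, assuming the same localhost endpoint as above):
curl "http://localhost:9200/_all/_settings/index.max_inner_result_window?pretty"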

Deleting index from ElasticSearch (via Kibana) automatically being recreated?

I have created an ElasticSearch instance via AWS and have pushed some test data into it in order to play around with Kibana. I'm done playing around now and want to delete all my data and start again. I have run a delete command on my index:
Command
DELETE /uniqueindex
Response
{
"acknowledged" : true
}
However, almost immediately my index seems to reappear, and documents start showing up in its document count as well.
Command
GET /_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 e3LQWRvgSvqSL8CFTyw_SA 1 0 3 0 15.2kb 15.2kb
yellow open uniqueindex Y4tlNxAXQVKUs_DjVQLNnA 5 1 713 0 421.7kb 421.7kb
It's like it's auto-generating after the delete. It's clearly a setting or something, but being new to Elasticsearch/Kibana I'm not sure what I'm missing.
By default indices in Elasticsearch can be created automatically just by PUTing or POSTing a document.
You can change this behavior with action.auto_create_index, which lets you disable automatic creation entirely (indices then have to be created explicitly with a PUT) or whitelist only specific indices.
Quoting from the linked docs:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}
A leading + allows automatic creation of indices matching the pattern, while - forbids it.
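In your case, whatever pushed the test data is probably still writing and recreating the index. Once auto-creation is disabled you can confirm it: delete the index again, and any further write to it is rejected instead of silently recreating it (a sketch in the same console style, assuming a version where the _doc endpoint is available):
DELETE /uniqueindex
POST /uniqueindex/_doc
{
  "test": 1
}
The second request now fails with an index_not_found_exception instead of bringing uniqueindex back.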

Backup and restore some records of an elasticsearch index

I wish to take a backup of some records (e.g. only the latest 1 million records) of an Elasticsearch index and restore this backup on a different machine. It would be better if this could be done using available/built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (the following code), but it looks like it takes a backup of the whole index, not selected records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
  "type": "fs",
  "settings": {
    "compress" : true,
    "location": "es_data_dump"
  }
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
  "indices" : "index_name",
  "type": "fs",
  "settings": {
    "compress" : true,
    "location": "es_data_dump"
  }
}'
The format of backup could be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It can take any query, and after the reindex you have a new index as a backup which contains the requested records; you can easily copy it wherever you want.
Complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
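For example, something like the following copies only the documents matching a query into a backup index (a sketch; the @timestamp field, the index names, and the max_docs parameter, which is called size on versions before 7.x, are assumptions to adapt to your data):
curl -H 'Content-Type: application/json' -X POST "localhost:9200/_reindex?pretty" -d'
{
  "max_docs": 1000000,
  "source": {
    "index": "index_name",
    "query": {
      "range": { "@timestamp": { "gte": "now-7d" } }
    }
  },
  "dest": {
    "index": "index_name_backup"
  }
}'
The resulting index_name_backup can then be snapshotted on its own and restored on the other machine.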
In the end, I fetched the required data using the Python driver, because that is what I found easiest for the given use case.
For that, I ran an Elasticsearch query and stored its response in a file in newline-separated format, and later restored the data from it using another Python script. A maximum of 10,000 entries are returned per request, along with a scroll ID that is used to fetch the next 10,000 entries, and so on.
from elasticsearch import Elasticsearch

es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
_query = {"match_all": {}}  # the query selecting the records to back up

# The initial search opens a 5-minute scroll context and returns the first 10,000 hits
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # store this as you like, e.g. append to a newline-delimited file
    scroll_id = page['_scroll_id']
    page = es.scroll(scroll_id=scroll_id, scroll='5m')

Can anyone give a list of REST APIs to query elasticsearch?

I am trying to push my logs to elasticsearch through logstash.
My logstash.conf has 2 log files as input, elasticsearch as output, and grok as filter. Here is my grok match:
grok {
  match => [ "message", "(?<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?:\[%{GREEDYDATA:caller_thread}\]) (?:%{LOGLEVEL:level}) (?:%{DATA:caller_class})(?:\-%{GREEDYDATA:message})" ]
}
When Elasticsearch is started, all my logs are added to the Elasticsearch server, each under a separate index name as mentioned in logstash.conf.
My doubt is: how are my logs stored in Elasticsearch? I only know that they are stored under the index name mentioned in logstash.
The 'http://164.99.178.18:9200/_cat/indices?v' API gave me the following:
health status index      pri rep docs.count docs.deleted store.size pri.store.size
yellow open   tomcat-log   5   1       6478            0      1.9mb          1.9mb
yellow open   apache-log   5   1        212            0      137kb          137kb
But how are 'documents' and 'fields' created in Elasticsearch for my logs?
I read that Elasticsearch is a REST-based search engine. So are there any REST APIs that I could use to analyze my data in Elasticsearch?
Indeed.
curl localhost:9200/tomcat-log/_search
Will give you back the first 10 documents but also the total number of docs in your index.
curl -H 'Content-Type: application/json' localhost:9200/tomcat-log/_search -d '{
  "query": {
    "match": {
      "level" : "error"
    }
  }
}'
should give you all docs in tomcat-log which have level equal to error.
Have a look at this section of the book. It will help.
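To see how the fields were created for your logs, you can also look at the mapping that Logstash generated for each index (a quick check against the same host as above):
curl 'http://164.99.178.18:9200/tomcat-log/_mapping?pretty'
The response lists every field from your grok pattern (timestamp, caller_thread, level, caller_class, message) together with the type Elasticsearch assigned to it.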
