I have created an ElasticSearch instance via AWS and have pushed some test data into it in order to play around with Kibana. I'm done playing around now and want to delete all my data and start again. I have run a delete command on my index:
Command
DELETE /uniqueindex
Response
{
"acknowledged" : true
}
However, almost immediately the index seems to reappear, and the document count starts climbing again as well.
Command
GET /_cat/indices?v
Response:
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
green open .kibana_1 e3LQWRvgSvqSL8CFTyw_SA 1 0 3 0 15.2kb 15.2kb
yellow open uniqueindex Y4tlNxAXQVKUs_DjVQLNnA 5 1 713 0 421.7kb 421.7kb
It's as if the index is auto-generated after the delete. It's clearly a setting or something, but being new to Elasticsearch/Kibana I'm not sure what I'm missing.
By default, indices in Elasticsearch are created automatically as soon as a document is PUT or POSTed into them.
You can change this behavior with the action.auto_create_index setting, which lets you disable automatic creation entirely (indices then need to be created explicitly with a PUT) or whitelist only specific index patterns.
Quoting from the linked docs:
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "twitter,index10,-index1*,+ind*"
  }
}
PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "false"
  }
}
A leading + allows automatic creation of indices matching the pattern, while a leading - forbids it; patterns are checked in the order they are listed.
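For example, to stop the test index from silently coming back and then delete it for good, something along these lines should work. This is only a sketch: the endpoint is a placeholder for your AWS Elasticsearch domain, and the AWS-managed service restricts some cluster-level settings, so check that the settings update is actually accepted.
# Forbid auto-creation of uniqueindex (first matching pattern wins) while still allowing other indices
curl -H 'Content-Type: application/json' -X PUT "https://<your-aws-es-endpoint>/_cluster/settings" -d '
{
  "persistent": {
    "action.auto_create_index": "-uniqueindex,+*"
  }
}'
# Now the delete should stick, because stray writes can no longer recreate the index
curl -X DELETE "https://<your-aws-es-endpoint>/uniqueindex"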
Related
I wish to take a backup of some records (e.g. only the latest 1 million records) of an Elasticsearch index and restore this backup on a different machine. It would be better if this could be done using available/built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (code below), but it looks like it takes a backup of the whole index rather than selected records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "es_data_dump"
  }
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
  "indices": "index_name",
  "type": "fs",
  "settings": {
    "compress": true,
    "location": "es_data_dump"
  }
}'
The format of backup could be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It can take any query, so after reindexing you have a new index as a backup which contains only the requested records; you can then easily copy it wherever you want.
Complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
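For example, a rough sketch of what that could look like (the @timestamp range is only a stand-in for whatever query selects your latest 1 million records, and the top-level max_docs parameter is called size in older Elasticsearch versions):
curl -H 'Content-Type: application/json' -X POST "localhost:9200/_reindex" -d '
{
  "max_docs": 1000000,
  "source": {
    "index": "index_name",
    "query": {
      "range": { "@timestamp": { "gte": "now-7d/d" } }
    }
  },
  "dest": {
    "index": "index_name_backup"
  }
}'
You can then snapshot or copy index_name_backup on its own and restore it on the other machine.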
In the end, I fetched the required data using the Python client, because that is what I found easiest for the given use case.
For that, I ran an Elasticsearch query and stored its response in a file in newline-separated format, and later restored the data from it using another Python script. A maximum of 10,000 entries are returned per request this way, along with a scroll ID that is used to fetch the next 10,000 entries, and so on.
from elasticsearch import Elasticsearch

es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)

# _query holds the Elasticsearch query selecting the records to back up
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # Store this as you like, e.g. append to a newline-separated file
    scrollId = page['_scroll_id']
    page = es.scroll(scroll_id=scrollId, scroll='5m')
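For reference, the same scroll loop can be driven with plain REST calls; the match_all query is just a placeholder, and the scroll_id must be copied from the previous response:
# open a 5-minute scroll context and fetch the first 10000 hits
curl -H 'Content-Type: application/json' -X POST "localhost:9200/ct_analytics/_search?scroll=5m" -d '
{
  "size": 10000,
  "query": { "match_all": {} }
}'
# fetch the next page with the _scroll_id returned by the previous call
curl -H 'Content-Type: application/json' -X POST "localhost:9200/_search/scroll" -d '
{
  "scroll": "5m",
  "scroll_id": "<_scroll_id from the previous response>"
}'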
I have got an index with 5 primary shards and no replicas.
One of my shards (shard 1) is in an unassigned state. When I checked the log file, I found the error below:
2obv65.nvd, _2vfjgt.fdx, _3e3109.si, _3dwgm5_Lucene45_0.dvm, _3aks2g_Lucene45_0.dvd, _3d9u9f_76.del, _3e30gm.cfs, _3cvkyl_es090_0.tim, _3e309p.nvd, _3cvkyl_es090_0.blm]]; nested: FileNotFoundException[_101a65.si]; ]]
When I checked the index, I could not find the _101a65.si file for shard 1.
I am unable to locate the missing .si file. I have tried a lot but could not get shard 1 assigned again.
Is there any other way to get shard 1 assigned again, or do I need to delete the entire shard 1 data?
Please suggest.
Normally in the stack trace you should see the path to the corrupted shard, something like MMapIndexInput(path="path/to/es/db/nodes/node_number/indices/name_of_index/1/index/some_file") (here the 1 is the shard number).
Normally deleting path/to/es/db/nodes/node_number/indices/name_of_index/1 should help the shard recover. If you still see it unassigned, try sending this command to your cluster (as per the documentation it should work, though I'm not sure about the exact ES 1.x syntax and commands):
POST _cluster/reroute
{
  "commands": [
    {
      "allocate": {
        "index": "myIndexName",
        "shard": 1,
        "node": "myNodeName",
        "allow_primary": true
      }
    }
  ]
}
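Before and after the reroute you can check which shards are still unassigned via the _cat API, for example:
curl "localhost:9200/_cat/shards" | grep UNASSIGNED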
I have created several indices in Elasticsearch. The result of http://localhost:9200/_cat/indices?v is as follows.
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open idx_post:user:5881bc8eb46a870d958d007a 5 1 6 0 31.3kb 31.3kb
yellow open idx_usr 5 1 2 0 14.8kb 14.8kb
yellow open idx_post:group:587dc2b57a8bee13bdab6592 5 1 1 0 6.1kb 6.1kb
I am new to elasticsearch and I tried to filter data by using the index idx_usr.
http://localhost:9200/idx_usr/_search?pretty=1
It works fine and returns the expected values. But when I try to grab data by using the index idx_post, it returns an error saying the index is not found.
http://localhost:9200/idx_post/_search?pretty=1
This gives the result:
{
"error" : "IndexMissingException[[idx_post] missing]",
"status" : 404
}
If anyone can identify the reason, I would be really thankful.
Update:
This is how I create the index. Assume that there is an object called data, and I am using an ES client called this.esClient:
var _esData = {
    // the index name embeds the group id, e.g. idx_post:group:587dc2b57a8bee13bdab6592
    index: "idx_post:group:" + data.group_id,
    type: 'posts',
    id: data.post_id.toString()
};
_esData['body'] = data;
this.esClient.index(_esData, function (error, response) {
    if (error)
        console.log(error);
    callBack(response);
});
You can get the results you want by querying your indices like this, with a wildcard:
http://localhost:9200/idx_post*/_search?pretty=1
                              ^
                              |
                          add this
Because the index does not exist:
{
"error" : "IndexMissingException[[idx_post] missing]",
"status" : 404
}
You are trying to search in idx_post, while _cat returns the names idx_post:user:5881bc8eb46a870d958d007a and idx_post:group:587dc2b57a8bee13bdab6592.
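So either query the indices under the names they were actually created with, or use a wildcard that matches them, for example:
# search one of the indices that actually exists
curl "localhost:9200/idx_post:group:587dc2b57a8bee13bdab6592/_search?pretty=1"
# or search across everything that starts with idx_post
curl "localhost:9200/idx_post*/_search?pretty=1"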
I would like to delete all of my documents of _type=varnish-request from my Elasticsearch.
I installed the delete by query plugin (https://www.elastic.co/guide/en/elasticsearch/plugins/2.0/plugins-delete-by-query.html)
I did DELETE http://localhost:9200/logstash*/_query
{
  "query": {
    "bool": {
      "must": [
        { "match": { "_type": "varnish-request" } },
        { "match": { "_index": "logstash-2016.02.05" } }
      ]
    }
  }
}
And the response was OK:
{
  "took": 2265842,
  "timed_out": false,
  "_indices": {
    "_all": { "found": 3062614, "deleted": 3062614, "missing": 0, "failed": 0 },
    "logstash-2016.02.05": { "found": 3062614, "deleted": 3062614, "missing": 0, "failed": 0 }
  },
  "failures": []
}
curl http://localhost:9200/_cat/indices | sort
Before the clean
yellow open logstash-2016.02.05 5 1 4618245 0 4.1gb 4.1gb
After the clean
yellow open logstash-2016.02.05 5 1 1555631 3062605 4.1gb 4.1gb
The whole point is to lighten my ES server by removing useless data. But here I see that the index size is still the same.
I already checked Delete documents of type in Elasticsearch, but no luck.
I tried elasticsearch: how to free store size after deleting documents:
POST http://localhost:9200/logstash-2016.02.05/_forcemerge
{"_shards":{"total":10,"successful":5,"failed":0}}
But still
yellow open logstash-2016.02.05 5 1 1555631 3062605 4.1gb 4.1gb
The first step is correct. Now you simply need to call _optimize (or _forcemerge if you're using ES 2.1+) with only_expunge_deletes enabled. This will merge away the segments containing deleted documents and free some space.
curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true'
or
curl -XPOST 'http://localhost:9200/_forcemerge?only_expunge_deletes=true'
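If you only want to reclaim space on the index you just cleaned, rather than running the merge cluster-wide, you can target it directly, e.g.:
curl -XPOST 'http://localhost:9200/logstash-2016.02.05/_forcemerge?only_expunge_deletes=true'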
I am trying to push my logs to elasticsearch through logstash.
My logstash.conf has 2 log files as input, elasticsearch as output, and grok as a filter. Here is my grok match:
grok {
  match => [ "message", "(?<timestamp>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?:\[%{GREEDYDATA:caller_thread}\]) (?:%{LOGLEVEL:level}) (?:%{DATA:caller_class})(?:\-%{GREEDYDATA:message})" ]
}
When Elasticsearch is started, all my logs are added to the Elasticsearch server under separate index names, as mentioned in logstash.conf.
My doubt is: how are my logs stored in Elasticsearch? I only know that they are stored under the index names mentioned in logstash.conf.
The 'http://164.99.178.18:9200/_cat/indices?v' API gave me the following:
health status index pri rep docs.count docs.deleted store.size pri.store.size
yellow open tomcat-log 5 1 6478 0 1.9mb 1.9mb
yellow open apache-log 5 1 212 0 137kb 137kb
But how are 'documents' and 'fields' created in Elasticsearch for my logs?
I read that Elasticsearch is a REST-based search engine. So, are there any REST APIs that I could use to analyze my data in Elasticsearch?
Indeed.
curl localhost:9200/tomcat-log/_search
Will give you back the first 10 documents but also the total number of docs in your index.
curl localhost:9200/tomcat-log/_search -d '{
  "query": {
    "match": {
      "level": "error"
    }
  }
}'
might give you all docs in tomcat-log which have a level equal to error.
Have a look at this section of the book. It will help.
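If you want to go beyond fetching raw documents, the aggregations API is the usual way to analyze this kind of log data. As a rough sketch (the level field comes from your grok pattern; depending on your mapping you may need a not_analyzed or keyword sub-field instead):
curl localhost:9200/tomcat-log/_search -d '{
  "size": 0,
  "aggs": {
    "log_levels": {
      "terms": { "field": "level" }
    }
  }
}'
This returns one bucket per log level with a document count, without returning the documents themselves.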