Indices with ! in their name can't be filtered for restore - elasticsearch

I have an ES cluster with indices named like web.analytics.data.api!monthly!2018-07_v0 and I take regular snapshots/backups.
Now, when I want to restore all of them, everything works fine. If I want to restore just a specific index, however, ES won't do it. The command I use:
curl -X POST "localhost:9200/_snapshot/s3_backups/20191218_060001/_restore?pretty&wait_for_completion=true" -H 'Content-Type: application/json' -d'
{
  "indices": "web.analytics.data.api!monthly!2018-07_v0",
  "index_settings": {
    "index.number_of_replicas": 0
  }
}
'
The result I get is:
{
  "snapshot" : {
    "snapshot" : "20191218_060001",
    "indices" : [ ],
    "shards" : {
      "total" : 0,
      "failed" : 0,
      "successful" : 0
    }
  }
}
Please note that if I use an index without ! in its name (e.g. .kibana), it works fine. Any ideas how I can solve this? Preferably without telling the developers to rename the indices. The ES in question is version 1.7.3. I am aware it is EOL, but it is what I have to work with right now.

So it was my bad in the end. The index I tried to restore did not exist (a typo in its name), but I had been told that ! was problematic, so I did not double-check. The test indices were picked by me, so of course those were correct...
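For anyone hitting an empty "indices" : [ ] result like the one above, a quick sanity check is to compare the name you are requesting against the snapshot's own index list (available via GET _snapshot/&lt;repo&gt;/&lt;snapshot&gt;) before restoring. A minimal sketch in Python, assuming you have already fetched that list; the index names here are the ones from this question:

```python
from fnmatch import fnmatch

def matching_indices(requested, snapshot_indices):
    """Return the snapshot indices matched by a restore 'indices' expression.

    Supports the same *-style wildcards Elasticsearch accepts; an empty
    result means the restore would report "indices" : [ ] as above.
    """
    return [name for name in snapshot_indices if fnmatch(name, requested)]

# Indices actually present in the snapshot:
snapshot_indices = ["web.analytics.data.api!monthly!2018-07_v0", ".kibana"]

# The ! character is matched literally, so the correct name is found...
print(matching_indices("web.analytics.data.api!monthly!2018-07_v0", snapshot_indices))

# ...while a typo in the name silently matches nothing:
print(matching_indices("web.analytics.data.api!montly!2018-07_v0", snapshot_indices))
```

Running this check first would have surfaced the typo immediately, since the second call returns an empty list.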


Restore elasticsearch cluster onto another cluster

Hello, I have a 3-node Elasticsearch cluster (source) and a snapshot called snapshot-1 taken from the source cluster, and I have another 6-node Elasticsearch cluster (destination). When I restore my destination cluster from snapshot-1 using this command:
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "rename_pattern": ".security(.+)",
  "rename_replacement": "delete_$1",
  "include_aliases": false
}
'
I get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "snapshot_restore_exception",
        "reason" : "[snapshot:snapshot-1 yjg/mHsYhycHQsKiEhWVhBywxQ] cannot restore index [.ilm-history-0003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
So as you can see, the index .ilm-history-0003 already exists in the cluster. But how can I do a rename replacement for the .security, .ilm, .slm, and .transform indices using only one rename_pattern?
like this one
"rename_pattern": ".security(.+)",
From my experience the rename pattern doesn't need to be fancy, because you will probably either
a) delete the index (as your rename pattern suggests), or
b) reindex data from the restored index into new indices. In that case the name of the restored index is insignificant.
So this is what I would suggest:
Use the following rename pattern to include all indices. Again, from my experience, your first aim is to get the old data restored; after that you can manage the reindexing etc.
POST /_snapshot/REPOSITORY_NAME/SNAPSHOT_NAME/_restore
{
  "indices": "*",
  "ignore_unavailable": true,
  "include_aliases": false,
  "include_global_state": false,
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}
This will prepend restored_ to the actual index names, resulting in restored indices like:
restored_.security*
restored_.ilm*
restored_.slm*
restored_.transform*
I hope I could help you.
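rename_pattern/rename_replacement are applied as a regex substitution on each index name. As a rough sanity check you can mimic this with Python's re module (note Python writes the backreference as \1 where Elasticsearch uses $1):

```python
import re

def rename(index_name, pattern="(.+)", replacement=r"restored_\1"):
    """Mimic Elasticsearch's rename_pattern/rename_replacement: the pattern
    "(.+)" captures the whole index name and the replacement prepends
    "restored_" to it."""
    return re.sub(pattern, replacement, index_name)

print(rename(".ilm-history-0003"))  # restored_.ilm-history-0003
print(rename(".security-7"))        # restored_.security-7
```

This makes it easy to check what a given pattern would do to your index names before running the actual restore.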
Solved it this way:
curl -X POST -u elastic:321 "192.168.2.15:9200/_snapshot/snapshot_repository/snapshot-1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
  "indices": "*,-.slm*,-.ilm*,-.transfrom*,-security*",
  "ignore_unavailable": true,
  "include_global_state": false,
  "include_aliases": false
}
'
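The trick here is Elasticsearch's multi-target syntax: expressions are evaluated left to right, and a leading - removes names matched so far. A toy emulation in Python to illustrate the idea (this is not ES's actual resolver, and the index names are hypothetical):

```python
from fnmatch import fnmatch

def resolve(expression, all_indices):
    """Toy left-to-right resolver for ES multi-target syntax like '*,-.slm*'."""
    selected = []
    for expr in expression.split(","):
        if expr.startswith("-"):
            # Exclusion: drop anything already selected that matches.
            selected = [i for i in selected if not fnmatch(i, expr[1:])]
        else:
            # Inclusion: add matching indices not yet selected.
            selected += [i for i in all_indices if fnmatch(i, expr) and i not in selected]
    return selected

indices = [".slm-history-1", ".ilm-history-0003", "web-logs", "orders"]
print(resolve("*,-.slm*,-.ilm*", indices))  # ['web-logs', 'orders']
```

So "*" first selects everything in the snapshot, and each subsequent -pattern strips out one family of system indices, leaving only the indices you actually want restored.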

Backup and restore some records of an elasticsearch index

I wish to take a backup of some records (e.g. only the latest 1 million) of an Elasticsearch index and restore this backup on a different machine. It would be better if this could be done using available/built-in Elasticsearch features.
I've tried Elasticsearch snapshot and restore (following code), but it looks like it takes a backup of the whole index, not selected records.
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump?pretty=true" -d '
{
  "type": "fs",
  "settings": {
    "compress" : true,
    "location": "es_data_dump"
  }
}'
curl -H 'Content-Type: application/json' -X PUT "localhost:9200/_snapshot/es_data_dump/snapshot1?wait_for_completion=true&pretty=true" -d '
{
  "indices" : "index_name",
  "type": "fs",
  "settings": {
    "compress" : true,
    "location": "es_data_dump"
  }
}'
The format of backup could be anything, as long as it can be successfully restored on a different machine.
You can use the _reindex API. It accepts any query; after the reindex you have a new index as a backup containing the requested records, which you can then copy wherever you want.
Complete information is here: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-reindex.html
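For instance, a _reindex request body that copies only matching documents into a backup index could look like the sketch below (the index names and query are placeholders; max_docs caps the copy on recent Elasticsearch versions, older releases used a top-level size, and sort inside source is accepted but deprecated in newer releases):

```python
import json

# Hypothetical _reindex body: copy the latest matching docs into a backup index.
reindex_body = {
    "max_docs": 1000000,
    "source": {
        "index": "index_name",
        "query": {"range": {"timestamp": {"gte": "now-7d"}}},
        "sort": [{"timestamp": "desc"}],
    },
    "dest": {"index": "index_name_backup"},
}

# This is what you would POST to localhost:9200/_reindex:
print(json.dumps(reindex_body, indent=2))
```

After the reindex completes, a snapshot of index_name_backup alone gives you a restorable backup of just those records.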
In the end, I fetched the required data using the Python driver, because that is what I found easiest for the given use case.
For that, I ran an Elasticsearch query and stored its response in a file in newline-separated format, then later restored the data from it using another Python script. A maximum of 10000 entries are returned this way, along with a scroll ID used to fetch the next 10000 entries, and so on.
from elasticsearch import Elasticsearch

es = Elasticsearch(timeout=30, max_retries=10, retry_on_timeout=True)
# The initial search opens a 5-minute scroll context and returns the first page.
page = es.search(index=['ct_analytics'], body={'size': 10000, 'query': _query, 'stored_fields': '*'}, scroll='5m')
while len(page['hits']['hits']) > 0:
    es_data = page['hits']['hits']  # Store this as you like
    scroll_id = page['_scroll_id']
    # Fetch the next page using the scroll ID from the previous response.
    page = es.scroll(scroll_id=scroll_id, scroll='5m')
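The dump-to-file half can be as simple as writing each hit as one JSON object per line (newline-delimited JSON), which the restore script can then stream back line by line. A minimal sketch of that round-trip, independent of Elasticsearch (the hits here are hypothetical):

```python
import json

def dump_hits(hits, path):
    # One JSON document per line (newline-delimited JSON / NDJSON).
    with open(path, "w") as f:
        for hit in hits:
            f.write(json.dumps(hit["_source"]) + "\n")

def load_docs(path):
    # Each line is parsed independently, so the file can be streamed.
    with open(path) as f:
        return [json.loads(line) for line in f]

hits = [{"_source": {"id": 1, "msg": "a"}}, {"_source": {"id": 2, "msg": "b"}}]
dump_hits(hits, "backup.ndjson")
print(load_docs("backup.ndjson"))
```

This format also plugs neatly into the _bulk API on the restore side, since bulk requests are themselves newline-delimited.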

Autocompletion elasticsearch

I'm following along with the tutorial for Elasticsearch's completion suggester here. It's pretty easy to get going, but I'm unable to get completions for more than one word. In the example, single incomplete words give great results, e.g.
"Nir" -> "options":[{"text":"Nevermind Nirvana..."
"Nev" -> "options":[{"text":"Nevermind Nirvana..."
But the following fail:
"Nirvana Nev" -> Nothing!
"Nevermind Nir" -> Nothing!
I can get it to work by populating combinatorial options, e.g.
curl -X PUT "localhost:9200/music/_doc/1?refresh" -H 'Content-Type: application/json' -d'
{
  "suggest" : {
    "input": [ "Nevermind", "Nirvana", "Nirvana Nevermind", "Nevermind Nirvana" ],
    "weight" : 34
  },
  "title" : "Nevermind by Nirvana"
}
'
But this approach will quickly lead to a massive number of input variants.
There must be a better way?
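To see why the combinatorial approach doesn't scale: covering every word order for a title with n words takes n! inputs. A quick illustration of the blowup (plain Python, nothing ES-specific):

```python
from itertools import permutations
from math import factorial

def all_orderings(title):
    """Every word-order variant you would have to add to 'input'."""
    words = title.split()
    return [" ".join(p) for p in permutations(words)]

# A 3-word title already needs 6 variants...
print(all_orderings("Nevermind by Nirvana"))

# ...and a 6-word title would need 720:
print(factorial(6))  # 720
```

So pre-generating permutations is only workable for very short titles; for anything longer, a different analyzer or query-side approach is needed.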

AWS Datapipeline, EmrActivity step to run a hive script fails immediately with 'No such file or directory'

I've got a simple DataPipeline job which has only a single EmrActivity, with a single step attempting to execute a Hive script from my S3 bucket.
The config for the EmrActivity looks like this:
{
  "name" : "Extract and Transform",
  "id" : "HiveActivity",
  "type" : "EmrActivity",
  "runsOn" : { "ref" : "EmrCluster" },
  "step" : ["command-runner.jar,/usr/share/aws/emr/scripts/hive-script --run-hive-script --args -f s3://[bucket-name-removed]/s1-tracer-hql.q -d INPUT=s3://[bucket-name-removed] -d OUTPUT=s3://[bucket-name-removed]"]
}
And the config for the corresponding EmrCluster resource it's running on:
{
  "id" : "EmrCluster",
  "type" : "EmrCluster",
  "name" : "Hive Cluster",
  "keyPair" : "[removed]",
  "masterInstanceType" : "m3.xlarge",
  "coreInstanceType" : "m3.xlarge",
  "coreInstanceCount" : "2",
  "coreInstanceBidPrice": "0.10",
  "releaseLabel": "emr-4.1.0",
  "applications": ["hive"],
  "enableDebugging" : "true",
  "terminateAfter": "45 Minutes"
}
The error message I'm getting is always the following:
java.io.IOException: Cannot run program "/usr/share/aws/emr/scripts/hive-script --run-hive-script --args -f s3://[bucket-name-removed]/s1-tracer-hql.q -d INPUT=s3://[bucket-name-removed] -d OUTPUT=s3://[bucket-name-removed]" (in directory "."): error=2, No such file or directory
at com.amazonaws.emr.command.runner.ProcessRunner.exec(ProcessRunner.java:139)
at com.amazonaws.emr.command.runner.CommandRunner.main(CommandRunner.java:13)
...
The main error message being "... (in directory "."): error=2, No such file or directory".
I've logged into the master node and verified the existence of /usr/share/aws/emr/scripts/hive-script. I've also tried specifying an S3-based location for the hive-script, among a few other places; always the same error.
I can manually create a cluster directly in EMR that looks exactly like what I'm specifying in this DataPipeline, with a step that uses the identical "command-runner.jar,/usr/share/aws/emr/scripts/hive-script ..." command string, and it works without error.
Has anyone experienced this and can advise me on what I'm missing and/or doing wrong? I've been at this one for a while now.
I'm able to answer my own question, after some long research and trial and error.
There were 3 things, maybe 4, wrong with my step script:
needed 'script-runner.jar' rather than 'command-runner.jar', as we're running a script (I ended up just pulling it from EMR's libs dir on S3)
needed to get the 'hive-script' from elsewhere, so I also went to the public EMR libs dir on S3 for this
a fun one (yay, thanks AWS): the step args (everything after the 'hive-script' specification) need to be comma-separated in DataPipeline, as opposed to space-separated as when specifying args in a step directly in EMR
And then the "maybe 4th":
included the base path on S3 and the specific Hive release we're working with for the hive-script (I added this after seeing something similar in an AWS blog, but haven't yet tested whether it makes a difference in my case; too drained with everything else)
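The comma-separation point above amounts to: take the step string you would type in the EMR console, split it on whitespace, and join the pieces with commas. A small helper to sketch the conversion (note this naive split would break any argument that itself contains spaces; the bucket name is a placeholder):

```python
def emr_step_to_datapipeline(step):
    """Convert a space-separated EMR console step command into the
    comma-separated form DataPipeline's EmrActivity "step" field expects."""
    return ",".join(step.split())

step = ("s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar "
        "s3://us-east-1.elasticmapreduce/libs/hive/hive-script "
        "--run-hive-script --args -f s3://my-bucket/s1-tracer-hql.q")
print(emr_step_to_datapipeline(step))
```

Running the helper over a console-style command gives exactly the kind of comma-joined string shown in the working EmrActivity config below.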
So, in the end, my working EmrActivity ended up looking like this:
{
  "name" : "Extract and Transform",
  "id" : "HiveActivity",
  "type" : "EmrActivity",
  "runsOn" : { "ref" : "EmrCluster" },
  "step" : ["s3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar,s3://us-east-1.elasticmapreduce/libs/hive/hive-script,--base-path,s3://us-east-1.elasticmapreduce/libs/hive/,--hive-versions,latest,--run-hive-script,--args,-f,s3://[bucket-name-removed]/s1-tracer-hql.q,-d,INPUT=s3://[bucket-name-removed],-d,OUTPUT=s3://[bucket-name-removed],-d,LIBS=s3://[bucket-name-removed]"]
}
Hope this helps save someone else from the same time-sink I invested. Happy coding!

Couchdb view Queries

Could you please help me create a view? Below is the requirement:
select * from personaccount where name="srini" and user="pup" order by lastloggedin
I have to send name and user as input to the view, and the data should be sorted by lastloggedin.
Below is the view I have created, but it is not working:
{
  "language": "javascript",
  "views": {
    "sortdatetimefunc": {
      "map": "function(doc) {
        emit({
          lastloggedin: doc.lastloggedin,
          name: doc.name,
          user: doc.user
        }, doc);
      }"
    }
  }
}
And this is the curl command I am using:
http://uta:password@localhost:5984/personaccount/_design/checkdatesorting/_view/sortdatetimefunc?key={\"name:srini\",\"user:pup\"}
My questions are:
As sorting is done on the key, and I want it on lastloggedin, I have included that in the emit function as well.
But I am passing only name and user as parameters. Do we need to pass all the parameters that we put in the key?
First of all, thank you for the reply. I have done the same, and I am getting errors. Please help.
Could you please try this on your PC? I am posting all the commands:
curl -X PUT http://uta:password@localhost:5984/person-data
curl -X PUT http://uta:password@localhost:5984/person-data/srini -d '{"Name":"SRINI", "Idnum":"383896", "Format":"NTSC", "Studio":"Disney", "Year":"2009", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:44:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/raju -d '{"Name":"RAJU", "Idnum":"456787", "Format":"FAT", "Studio":"VFX", "Year":"2010", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:50:37+0100"}'
curl -X PUT http://uta:password@localhost:5984/person-data/vihar -d '{"Name":"BALA", "Idnum":"567876", "Format":"FAT32", "Studio":"YELL", "Year":"2011", "Rating":"PG", "lastTimeOfCall": "2012-02-08T19:55:37+0100"}'
Here's the view I created, as you suggested:
{
  "_id": "_design/persondestwo",
  "_rev": "1-0d3b4857b8e6c9e47cc9af771c433571",
  "language": "javascript",
  "views": {
    "personviewtwo": {
      "map": "function (doc) {\u000a emit([ doc.Name, doc.Idnum, doc.lastTimeOfCall ], null);\u000a}"
    }
  }
}
I have fired this command from curl:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
I got this error :
[4] 3000
curl: (3) [globbing] error: bad range specification after pos 99
[5] 1776
[6] 2736
[3] Done descending=true
[4] Done(3) curl -X GET http://uta:password#localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]
[5] Done endkey=["SRINI","383896"]
I do not know what this error is.
I have also tried passing the parameters the way below, and it does not help:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?key={\"Name\":\"SRINI\",\"Idnum\": \"383896\"}&descending=true
But I get different errors about escape sequences.
Overall I just want this query to be satisfied through the view:
select * from person-data where Name="SRINI" and Idnum="383896" order by lastTimeOfCall
My concern is how to pass multiple parameters from the curl command, as I get a lot of errors when I do it the above way.
First off, you need to use an array as your key. I would use:
function (doc) {
emit([ doc.name, doc.user, doc.lastLoggedIn ], null);
}
This basically outputs all the documents in order by name, then user, then lastLoggedIn. You can use the following URL to query.
/_design/checkdatesorting/_view/sortdatetimefunc?startkey=["srini","pup"]&endkey=["srini","pup",{}]&include_docs=true
Second, notice I did not output doc as the value in emit. Emitting the whole document takes up much more disk space, especially if your documents are fairly large. Just use include_docs=true instead.
Lastly, refer to the CouchDB Wiki, it's pretty helpful.
I just stumbled upon this question. The errors you are getting are caused by not escaping this command:
curl -X GET http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=["SRINI","383896"]&endkey=["SRINI","383896",{}]descending=true&include_docs=true
The & character has a special meaning on the command-line and should be escaped when part of an actual parameter.
So you should put quotes around the whole URL and escape the quotes inside it (note the & that was also missing before descending):
curl -X GET "http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo?startkey=[\"SRINI\",\"383896\"]&endkey=[\"SRINI\",\"383896\",{}]&descending=true&include_docs=true"
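An alternative to hand-escaping is to let a library build the query string: JSON-encode each key parameter and URL-encode the result, so the shell never sees a bare & or quote. A sketch using only the Python standard library (the database, credentials, and design document are the ones from this question):

```python
import json
from urllib.parse import urlencode

base = "http://uta:password@localhost:5984/person-data/_design/persondestwo/_view/personviewtwo"

# CouchDB expects startkey/endkey values to be JSON, so serialize them first;
# urlencode then percent-escapes the quotes, brackets, and braces.
params = urlencode({
    "startkey": json.dumps(["SRINI", "383896"]),
    "endkey": json.dumps(["SRINI", "383896", {}]),
    "include_docs": "true",
})
url = base + "?" + params
print(url)
```

The resulting URL is safe to pass to curl (or any HTTP client) without further quoting, because every shell-special character in the key parameters is percent-encoded.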
