How to migrate Elasticsearch to a new server? - elasticsearch

The old server will be decommissioned. It runs Elasticsearch v6.8, and the new server has the same version installed. I now want to migrate all data to the new server. Is my procedure correct?
Old server: add path.repo to elasticsearch.yml, for example:
path.repo: ["/data/backup"]
Restart the Elasticsearch service.
On the old server, register the snapshot repository:
curl -H "Content-Type: application/json" \
  -XPUT http://192.168.50.247:9200/_snapshot/my_backup \
  -d '{ "type": "fs", "settings": { "location": "/data/backup", "compress": true } }'
Create the snapshot:
curl -XPUT http://192.168.50.247:9200/_snapshot/my_backup/news0618
Restore on the new server (IP: 192.168.10.49):
curl -XPOST http://192.168.10.49:9200/_snapshot/my_backup/news0618/_restore
Will these operations migrate all the data?

If you are using fs as the snapshot repository type, this will not work: your new instance runs on a different host and has no access to the old host's file system. You need a shared location such as a mounted volume (e.g. NFS), S3, or Azure Blob Storage.
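For example, a minimal sketch of registering such a shared repository, assuming a hypothetical NFS share mounted at /mnt/es-backup on both servers and listed in path.repo on both; the same registration is run against each cluster:
curl -H "Content-Type: application/json" \
  -XPUT http://192.168.50.247:9200/_snapshot/my_backup \
  -d '{ "type": "fs", "settings": { "location": "/mnt/es-backup", "compress": true } }'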
You should use reindexing rather than snapshot and restore; it is simpler. Refer to the Elasticsearch reindex-from-remote documentation for details.
Steps:
Whitelist the remote host in elasticsearch.yml on your new Elasticsearch instance, using the reindex.remote.whitelist property:
reindex.remote.whitelist: "192.168.50.247:9200"
Restart the new Elasticsearch instance.
Reindexing:
curl -X POST "http://192.168.10.49:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": {
      "host": "http://192.168.50.247:9200"
    },
    "index": "source-index-name",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "dest-index-name"
  }
}
'
Refer to the documentation for reindexing many indices; a loop sketch is shown below.
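For instance, a minimal sketch for reindexing several indices in one loop; the index names index1, index2, and index3 are placeholders:
for idx in index1 index2 index3; do
  curl -X POST "http://192.168.10.49:9200/_reindex?pretty" -H 'Content-Type: application/json' -d"
  {
    \"source\": { \"remote\": { \"host\": \"http://192.168.50.247:9200\" }, \"index\": \"$idx\" },
    \"dest\": { \"index\": \"$idx\" }
  }"
done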
Warning: The destination should be configured as wanted before calling _reindex. Reindex does not copy the settings from the source or its associated template. Mappings, shard counts, replicas, and so on must be configured ahead of time.
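As an illustration, a sketch of pre-creating the destination index with explicit settings and mappings before calling _reindex; the shard counts, the _doc type, and the title field here are placeholders, not taken from the question:
curl -X PUT "http://192.168.10.49:9200/dest-index-name" -H 'Content-Type: application/json' -d'
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
  "mappings": { "_doc": { "properties": { "title": { "type": "text" } } } }
}
'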
Hope this helps!

Related

How to setup elasticsearch snapshot repository for multinode?

I set up a snapshot repository for a one-node ES cluster running inside a container.
version 7.7.1
path.repo=[/usr/share/elasticsearch/data/snapshot]
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/data/snapshot"
  }
}
It works well.
However, in a multi-node cluster it fails with a RepositoryVerificationException.
How should I change the above code to be able to use it?
I came across these sources, but both of them are unclear about what to do exactly:
https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-register-repository.html
https://discuss.elastic.co/t/snapshot-and-path-repo-on-one-cluster-node/155717
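For reference, a shared file system repository in a multi-node cluster requires the same path to be mounted on every master and data node and whitelisted in each node's path.repo; a minimal sketch, assuming a hypothetical shared mount at /mnt/es-snapshots:
# elasticsearch.yml on every node (same mount point everywhere), then restart the nodes
path.repo: ["/mnt/es-snapshots"]
PUT /_snapshot/my_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-snapshots"
  }
}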

Elasticsearch snapshot restore to another cluster

How can I restore an Elasticsearch snapshot to another cluster, without repository-s3, repository-hdfs, repository-azure, or repository-gcs?
This answer is for Elasticsearch 7.14. It is possible to host a snapshot repository on an NFS share. Since you would like to restore a snapshot of one cluster to another, you need to meet the following prerequisites:
The NFS should be accessible from both source and destination cluster.
The version of the source and destination clusters should be the same. At most, the destination cluster can be one major version higher than the source cluster, e.g. you can restore a 5.x snapshot to a 6.x cluster, but not to a 7.x cluster.
Ensure that the shared NFS directory is owned by uid:gid = 1000:0 (the elasticsearch user) and that appropriate permissions are granted (chmod -R 777 <appropriate NFS directory> as the elasticsearch user); see the sketch below.
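A minimal sketch of that setup, assuming a hypothetical NFS export nfs-server:/exports/es-snapshots and the repository path /usr/share/elasticsearch/snapshotrepo used later in this answer:
# on every node of both clusters
mount -t nfs nfs-server:/exports/es-snapshots /usr/share/elasticsearch/snapshotrepo
chown -R 1000:0 /usr/share/elasticsearch/snapshotrepo
# elasticsearch.yml on every node of both clusters, then restart
path.repo: ["/usr/share/elasticsearch/snapshotrepo"]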
Now, I am detailing the steps that you could take to copy the data.
Create a repository of type fs (named registry1 here) on the source cluster:
PUT http://10.29.61.189:9200/_snapshot/registry1
{
  "type": "fs",
  "settings": {
    "location": "/usr/share/elasticsearch/snapshotrepo",
    "compress": true
  }
}
Take a snapshot into the created repository:
PUT http://10.29.61.189:9200/_snapshot/registry1/snapshot_1?wait_for_completion=true
{
  "indices": "employee,manager",
  "ignore_unavailable": true,
  "include_global_state": false,
  "metadata": {
    "taken_by": "binita",
    "taken_because": "test snapshot restore"
  }
}
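Optionally, you can verify that the snapshot completed before moving on, using the standard snapshot info API; a sketch:
GET http://10.29.61.189:9200/_snapshot/registry1/snapshot_1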
Create a repository of type url on the destination cluster. Type url ensures that the same repository (i.e. the shared NFS path) is read-only with respect to the destination cluster: the destination cluster can only read and restore snapshots, it cannot write them.
PUT http://10.29.59.165:9200/_snapshot/registry1
{
  "type": "url",
  "settings": {
    "url": "file:/usr/share/elasticsearch/snapshotrepo"
  }
}
Restore the snapshot generated on the source cluster (in step 2) to the destination cluster:
POST http://10.29.59.165:9200/_snapshot/registry1/snapshot_1/_restore
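To confirm the restore on the destination cluster, you can list the restored indices with the cat API; a sketch:
GET http://10.29.59.165:9200/_cat/indices/employee,manager?v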
For more info, refer : https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshots-restore-snapshot.html
Finally, I found a solution that works; please read it carefully and follow the steps.
I have two Elasticsearch clusters and I want to migrate data from elastic_01 to elastic_02, i.e. take a snapshot on elastic_01 and restore it to elastic_02. Let's go.
Important
Verify that elastic_01 and elastic_02 both have the folder "/home/snapshot/".
If it does not exist, create this folder first.
Set the correct permissions on this folder.
Verify that the elastic_01 and elastic_02 versions are the same.
Elasticsearch snapshot documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-snapshots.html
(01) set elastic_01 snapshot settings
$ curl -XPUT '/_snapshot/first_backup' -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "/home/snapshot/",
"compress": true
}
}'
(02) add snapshot location to elasticsearch.yml (elastic_01)
Edit the elasticsearch.yml file, add this line, save, and restart Elasticsearch:
path.repo: ["/home/snapshot/"]
(03) create snapshot (elastic_01)
$ curl -XPUT "/_snapshot/first_backup/snapshot_1?wait_for_completion=true"
(04) set elastic_02 snapshot settings
$ curl -XPUT '/_snapshot/first_backup' -H 'Content-Type: application/json' -d '{
"type": "fs",
"settings": {
"location": "/home/snapshot/",
"compress": true
}
}'
(05) add snapshot location to elasticsearch.yml (elastic_02)
Edit the elasticsearch.yml file, add this line, save, and restart Elasticsearch:
path.repo: ["/home/snapshot/"]
(06) create snapshot (elastic_02)
$ curl -XPUT "/_snapshot/first_backup/snapshot_1?wait_for_completion=true"
(07) copy the elastic_01 snapshot to elastic_02
Delete the elastic_02 snapshot folder contents: $ rm -rf /home/snapshot/*
Copy the elastic_01 snapshot folder contents to the elastic_02 snapshot folder (see the sketch below).
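A minimal sketch of that copy step, assuming a hypothetical SSH user elastic and hostname elastic_02 for the destination server:
$ rsync -a /home/snapshot/ elastic@elastic_02:/home/snapshot/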
(08) list snapshot
$ curl -XGET '/_snapshot/first_backup/_all?pretty'
This will show the backed-up indices and snapshot-related data.
(09) restore elastic search snapshot
$ curl -XPOST "/_snapshot/first_backup/snapshot_1/_restore?wait_for_completion=true"
NOTE: We need to set the "include_global_state" parameter to "true" to restore the template as well, as per https://www.elastic.co/guide/en/elasticsearch/client/curator/current/option_include_gs.html
curl -X POST "localhost:9200/_snapshot/my_backup/snapshot_1/_restore?pretty" -H 'Content-Type: application/json' -d'
{
"include_global_state": true
}
'
{
"accepted" : true
}
Is your idea to first create a snapshot on nodeB, then delete its data and overwrite it with nodeA's data at this location?
According to Elastic's documentation, however, nodeB should mount the NFS directory read-only, so that it has no write permission, for example by using a repository of type url:
PUT _snapshot/local
{
"type": "url",
"settings": {
"url": "file:/home/esdata/snapshot"
}
}

Spark REST API, submit application NullPointerException on Windows

I used my PC as the Spark server and, at the same time, as the Spark worker, using Spark 2.3.1.
At first I used Ubuntu 16.04 LTS.
Everything worked fine: I ran the SparkPi example (using spark-submit and spark-shell) and it ran without problems.
I also try to run it using REST API from Spark, with this POST string:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:/home/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
After testing this and that, I had to move to Windows, since it will be done on Windows anyway.
I was able to run the server and worker (manually), add winutils.exe, and run the SparkPi example using spark-shell and spark-submit; everything ran there too.
The problem is when I used the REST API, using this POST string:
curl -X POST http://192.168.1.107:6066/v1/submissions/create --header "Content-Type:application/json" --data '{
"action": "CreateSubmissionRequest",
"appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"clientSparkVersion": "2.3.1",
"appArgs": [ "10" ],
"environmentVariables" : {
"SPARK_ENV_LOADED" : "1"
},
"mainClass": "org.apache.spark.examples.SparkPi",
"sparkProperties": {
"spark.jars": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.driver.supervise":"false",
"spark.executor.memory": "512m",
"spark.driver.memory": "512m",
"spark.submit.deployMode":"cluster",
"spark.app.name": "SparkPi",
"spark.master": "spark://192.168.1.107:7077"
}
}'
Only the path is slightly different, but my worker always fails.
The logs said:
"Exception from the cluster: java.lang.NullPointerException
org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:151)
org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:173)
org.apache.spark.deploy.worker.DriverRunner$$anon$1.run(DriverRunner.scala:92)"
I have searched, but no solution has come up yet.
So, finally I found the cause.
I read the source at:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/DriverRunner.scala
From inspecting it, I concluded that the problem is not in Spark itself; the parameter was simply not being read correctly, which means I had used the wrong parameter format.
So, after trying several things, this is the right one:
"appResource": "file:D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
changed to:
"appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar"
And I did the same with the spark.jars property; a corrected sketch of both entries follows.
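For clarity, these are the two corrected entries (appResource at the top level, spark.jars inside sparkProperties), using the paths from the question above:
"appResource": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",
"spark.jars": "file:///D:/Workspace/Spark/spark-2.3.1-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.3.1.jar",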
That little difference had cost me almost 24 hours of work...

How to remove orphaned tasks in Apache Mesos?

The problem may be caused by Mesos and Marathon being out of sync, but the solution mentioned on GitHub doesn't work for me.
When I found the orphaned tasks, what I did was:
restart Marathon
Marathon does not sync the orphaned tasks, but starts new tasks instead.
The orphaned tasks still take up resources, so I have to delete them.
I found all orphaned tasks under framework ef169d8a-24fc-41d1-8b0d-c67718937a48-0000, and
curl -XGET http://c196:5050/master/frameworks
shows that this framework is listed under unregistered_frameworks:
{
  "frameworks": [
    .....
  ],
  "completed_frameworks": [ ],
  "unregistered_frameworks": [
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000",
    "ef169d8a-24fc-41d1-8b0d-c67718937a48-0000"
  ]
}
I tried to delete the framework by framework ID (so that the tasks under the framework would be deleted too):
curl -XPOST http://c196:5050/master/teardown -d 'frameworkId=ef169d8a-24fc-41d1-8b0d-c67718937a48-0000'
but I get No framework found with specified ID.
So, how to delete orphaned tasks?
There are two options.
Option 1: register a framework with the same framework id, do reconciliation, and kill all tasks you receive. For example, you can do it in the following manner:
Download the code git clone https://github.com/janisz/mesos-cookbook.git
Change dir cd mesos-cookbook/4_understanding_frameworks
In scheduler.go, change master to your URL
If you want to mimic some other framework create /tmp/framework.json and fill it with FrameworkInfo data:
{
  "id": "<mesos-framework-id>",
  "user": "<framework-user>",
  "name": "<framework-name>",
  "failover_timeout": 3600,
  "checkpoint": true,
  "hostname": "<hostname>",
  "webui_url": "<framework-web-ui>"
}
Run it go run scheduler.go scheduler.pb.go mesos.pb.go
Get list of all tasks curl localhost:9090
Delete task with curl -X DELETE "http://10.10.10.10:9090/?id=task_id"
Option 2: wait until failover_timeout expires, so that Mesos deletes these tasks for you.

Programmatically set Kibana's default index pattern

A Kibana newbie would like to know how to set the default index pattern programmatically, rather than setting it through the Kibana UI in a web browser when viewing Kibana for the first time, as mentioned on the page https://www.elastic.co/guide/en/kibana/current/setup.html
Elasticsearch stores all Kibana metadata in the .kibana index. Kibana configuration such as defaultIndex and advanced settings is stored under the index/type/id .kibana/config/4.5.0, where 4.5.0 is your Kibana version.
So you can set or change the defaultIndex with the following steps:
Add the index you want to set as defaultIndex to Kibana. You can do that by executing the following command:
curl -XPUT http://<es node>:9200/.kibana/index-pattern/your_index_name -d '{"title" : "your_index_name", "timeFieldName": "timestampFieldNameInYourInputData"}'
Change your Kibana config to set the index added earlier as the defaultIndex:
curl -XPUT http://<es node>:9200/.kibana/config/4.5.0 -d '{"defaultIndex" : "your_index_name"}'
Note: Make sure you use the correct index name everywhere, a valid timestamp field name, and your Kibana version; for example, if you are using Kibana 4.1.1, replace 4.5.0 with 4.1.1.
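To verify the result, you can read the config document back with a standard document GET; a sketch using the same placeholders as above:
curl -XGET http://<es node>:9200/.kibana/config/4.5.0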
In Kibana 6.5.3 this can be achieved by calling the Kibana API:
curl -X POST "http://localhost:5601/api/saved_objects/index-pattern/logstash" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d'
{
  "attributes": {
    "title": "logstash-*",
    "timeFieldName": "@timestamp"
  }
}
'
The docs are here; they do mention that the feature is experimental.
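Note that creating the index pattern alone does not necessarily mark it as the default. One way to then set it as the default is the Kibana advanced-settings endpoint; this is a sketch under the assumption that the (unofficial) settings API available in 6.x accepts a defaultIndex value equal to the saved object id created above:
curl -X POST "http://localhost:5601/api/kibana/settings/defaultIndex" -H 'kbn-xsrf: true' -H 'Content-Type: application/json' -d'
{
  "value": "logstash"
}
'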
