Import Ceph RBD snapshot

I just wonder if there is a way to import a Ceph RBD snapshot that we have previously exported, but without recovering the current image state?
Thanks in advance.

Related

How to set HDFS as state backend for Flink

I want to store Flink state in HDFS so that after a crash I can recover the Flink state from HDFS. I am planning to write state to HDFS every 60 seconds. How can I achieve this?
Is this the configuration I need to follow?
https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/state/state_backends.html#setting-default-state-backend
And where do I specify the checkpoint interval? Any link or sample code would be helpful.
Choosing where checkpoints are stored (e.g., HDFS) is separate from deciding which state backend to use for managing your working state (which can be on-heap, or in local files managed by the RocksDB library).
These two concepts were cleanly separated in Flink 1.12. In earlier versions of Flink, the two appeared to be more strongly related than they actually are, because the filesystem and RocksDB state backend constructors took a file URI as a parameter specifying where the checkpoints should be stored.
The best way to manage all of this is to leave it out of your code and to specify the configuration you want in flink-conf.yaml, e.g.:
state.backend: filesystem
state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints
execution.checkpointing.interval: 10s
Information about checkpointing and savepointing can be found at https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/checkpointing/
For how to configure HDFS as a filesystem, see https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/filesystems/overview/
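For the 60-second interval mentioned in the question, a minimal flink-conf.yaml sketch might look like the following; the NameNode host, port, and checkpoint path are placeholders to adjust for your cluster, and the retention option is only available in recent Flink versions:
# hypothetical flink-conf.yaml fragment
state.backend: filesystem
state.checkpoints.dir: hdfs://namenode-host:8020/flink-checkpoints
# checkpoint every 60 seconds, matching the requirement above
execution.checkpointing.interval: 60s
# keep the latest checkpoint when a job is cancelled so it can still be used for recovery
execution.checkpointing.externalized-checkpoint-retention: RETAIN_ON_CANCELLATION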

Is it possible to upgrade the database providing metadata storage for an HDP cluster?

We have a Hadoop cluster with a very old PostgreSQL database (9.2) storing cluster metadata.
Is it possible to replace it with a more up-to-date version? I am concerned about breaking the cluster; what should I consider?
@Luis Sisamon
My recommendation would be to dump the 9.2 database and import it into the new version of your choice. Assuming the import works without any errors, you should be able to move from the old database to the new one. If you have concerns, I would test this out on a dev cluster first before trying it on the live/prod system.
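As a rough sketch of the dump-and-load step with the standard PostgreSQL client tools (the host names, database name, and user below are placeholders, and the services using the database should be stopped while you migrate):
# on a host that can reach the old 9.2 server: dump the metadata database to a plain SQL file
pg_dump -h old-db-host -U postgres metadata_db > metadata_db.sql
# on the new PostgreSQL server: create an empty database, then load the dump into it
createdb -h new-db-host -U postgres metadata_db
psql -h new-db-host -U postgres -d metadata_db -f metadata_db.sql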

Import data to HDFS from AWS S3 using Sqoop

I am using distcp (for batch data) to get data from S3.
But according to the Sqoop website we can import from S3 to HDFS. I tried, but every time I get a connection build error:
https://sqoop.apache.org/docs/1.99.7/user/examples/S3Import.html
So, can anyone tell me how I can do this properly?
Also, what can I do to get automatic syncing of incremental data?
You may want to take a look at s3distcp instead. See https://aws.amazon.com/blogs/big-data/seven-tips-for-using-s3distcp-on-amazon-emr-to-move-data-efficiently-between-hdfs-and-amazon-s3/
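As a hedged sketch (the bucket name and paths are placeholders), s3-dist-cp on an EMR node, or plain distcp with the s3a connector, would look something like:
# copy an S3 prefix into HDFS using s3-dist-cp (ships with EMR)
s3-dist-cp --src s3://my-bucket/input/ --dest hdfs:///data/input/
# plain distcp with the s3a connector works outside EMR as well;
# -update skips files that already exist in the destination with the same size
hadoop distcp -update s3a://my-bucket/input/ hdfs:///data/input/
Re-running the -update variant on a schedule is one simple way to approximate incremental syncing.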

HBase export snapshot - CorruptedSnapshotException

I am working on a project which has 1 TB of data in HBase. For backup purposes I read about snapshots.
The HBase snapshot is on one cluster and I want to export it to a different cluster, but I am getting:
Caused by:
org.apache.hadoop.hbase.ipc.RemoteWithExtrasException(org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException):
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException:
So what other files do I need to include in my export?
And is it possible to restore the snapshot in another cluster, e.g. by moving the snapshot directory from one cluster to another via WinSCP?
If you are getting a CorruptedSnapshotException, it is because the snapshot information read from the filesystem is not valid. So please check whether your export command was right.
Example:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot30072017 -copy-to hdfs://127.0.0.1:9000/hbase -mappers 8 -bandwidth 100
Please read this Issue tracker.
The above command runs eight map tasks to export the snapshot to the other cluster, with bandwidth limited to 100 MB/s.
Note: the org.apache.hadoop.hbase.snapshot.ExportSnapshot tool copies all the data related to a snapshot (HFiles, logs, and snapshot metadata) to another cluster.
Snapshot details can be found under the HDFS location /apps/hbase/data/.hbase-snapshot/ (Cloudera VM path). Copy those files to the other cluster and restore with restore_snapshot 'snapshot_name'.
Please read this HBase snapshot documentation.
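For reference, a minimal end-to-end sketch; the table name and destination NameNode address are placeholders, and the snapshot name is the one from the example above:
# on the source cluster, in the HBase shell: take the snapshot
snapshot 'my_table', 'snapshot30072017'
# on the source cluster, from the command line: export it to the destination cluster's HBase root dir
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot30072017 -copy-to hdfs://dest-namenode:8020/hbase -mappers 8 -bandwidth 100
# on the destination cluster, in the HBase shell: verify the snapshot arrived and restore it
list_snapshots
restore_snapshot 'snapshot30072017'
If the table already exists on the destination cluster it has to be disabled before restoring; clone_snapshot is an alternative that materialises the snapshot under a new table name.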

Setting up a single backup node for an elasticsearch cluster?

Given an Elasticsearch cluster with several machines, I would like to have a single machine (a special node) located in a different geographical region that can effectively sync with the cluster for read-only purposes (i.e. no writes to the special node, and the special node should be able to handle all queries on its own). Is this possible, and how can it be done?
With Elasticsearch 1.0 (currently available as RC1) you can use the snapshot & restore API; have a look at this blog post too to learn more.
You can basically make a snapshot of your indices, then copy the snapshot over to the secondary location and restore it into a different cluster. The nice part is that snapshots are incremental, which means that only the files that have changed since the last snapshot are actually backed up. You can then create snapshots at regular intervals, and import them into the secondary cluster.
If you are not using 1.0 yet, I would suggest having a look at it; snapshot & restore is a great addition. You can still make backups manually and restore them with 0.90, but you don't have a nice API for it and need to do pretty much everything manually.
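A rough sketch of that flow with curl; the host names, repository path, and snapshot name are placeholders, and both clusters need a filesystem repository pointing at a copy of the same snapshot data:
# on the primary cluster: register a shared filesystem repository and take a snapshot
curl -XPUT 'http://primary:9200/_snapshot/my_backup' -d '{"type": "fs", "settings": {"location": "/mnt/backups/my_backup"}}'
curl -XPUT 'http://primary:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'
# copy /mnt/backups/my_backup to the secondary site, register the same repository there, then restore
curl -XPUT 'http://secondary:9200/_snapshot/my_backup' -d '{"type": "fs", "settings": {"location": "/mnt/backups/my_backup"}}'
curl -XPOST 'http://secondary:9200/_snapshot/my_backup/snapshot_1/_restore'
Newer Elasticsearch versions additionally require the repository location to be whitelisted via path.repo in elasticsearch.yml and a Content-Type: application/json header on the curl calls.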
